smerny Posted January 22, 2014

So I have a webpage where I pull several rows from the database and build a list on the page. Any one of these rows can be updated independently at any time. For the past several years I've used something like this:

function execRefresh() {
    $("#refreshIcon").fadeIn("fast");
    $("#'. $p .'").load("/data.php?inid='. $p .'&rand=" + Math.random(), "", function () {
        $("#refreshIcon").fadeOut("slow");
    });
    setTimeout(execRefresh, 20000);
}

$(document).ready(function () {
    execRefresh();
});

This reloads the entire content from the database to repopulate the list on the webpage. I'm wondering if there is a better way to do this. The downsides I'd like to overcome, if possible, are (and I realize these two overlap a bit):

1. I am loading the entire table for every visitor even when nothing has changed (unnecessary loads/database pings).
- Is there a way to only pull new data if there is a change?
- Is there a way to only pull the rows that were changed?
- Is there a way to avoid making a call to the database for every single user?

2. I have to pick an interval that determines how often I reload the data. Currently it is 20 seconds, which means that if an update occurs right after the user loaded data, it could be up to 20 seconds before they see that change (not loading when necessary).
- Is there a way to tell the client there has been a change so it updates immediately, without extended periods of stale data and without simply shortening the interval (and thus having more unnecessary loads)?

I know that, for example, Google chat is nearly instantaneous in telling you "Someone is typing" and then showing what they sent as a chat message. I imagine they don't have millions of users constantly pinging a database of chats and typing statuses every second. What would the best way to do this be? I assume it's relatively common and there are best practices for things such as this. Thanks
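For context, data.php is basically just a query plus echoing the list markup. Roughly like this, simplified and with made-up connection details, table, and column names:

<?php
// data.php - simplified sketch of the endpoint the .load() call hits.
// The connection details, table, and column names here are made up.
$pdo  = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$inid = isset($_GET['inid']) ? (int) $_GET['inid'] : 0;

// Pull every visible row for this list on every request (the current approach).
$stmt = $pdo->prepare('SELECT title, value FROM items WHERE list_id = ? AND status = 1');
$stmt->execute(array($inid));

// Echo the HTML fragment that .load() drops straight into the container.
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    echo '<li>' . htmlspecialchars($row['title']) . ': '
       . htmlspecialchars($row['value']) . '</li>';
}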
Psycho Posted January 22, 2014 (edited)

Not knowing all the details about the situation and "how" the data can change, it's difficult to give definitive answers.

"Is there a way to only pull new data if there is a change?"
Yes. Pass a timestamp to the PHP page. When the query is run, only get records that have changed since that timestamp. If no records are returned, then you know there is no new data.

"Is there a way to only pull the rows that got changed?"
Yes, see above. But the trick is in using that information to update only the records that need to be updated. Again, not knowing what the data is or how it can change, it is difficult to really provide a good response. Are you only dealing with new and changed records? What about deleted records? What happens when there are no records? Etc., etc.

"Is there a way to make it so that I don't have to make a call to the database for every single user?"
Yes, use a cache file.

"Is there a way to tell the client there has been a change and it needs to update so that it doesn't have extended periods of time where the data isn't updated without just making the interval shorter (and thus having more unnecessary loads)?"
Yes, it is possible to implement a "push" approach with JavaScript. I've never done it myself - only read about it.

Here is what I would suggest. Modify the data.php page as follows:

1. Define a cache file.
2. Check whether the cache file exists and, if it does, when it was last updated.
3a. If the file exists and is less than n seconds old, read the content of the cache file and return it to the browser (in this case, to the jQuery call).
3b. If the cache file does not exist, or is too old, run the query to get the new data. Then process that data into the necessary output, save it to the cache file, and pass it back to the browser.

Now, if you have 100 people with the page open, you will only be running a query once every 20 seconds across all the users. All the other times you will simply be reading the cache file and returning it to the user. With this process you can probably reduce the refresh period without putting too much load on the server.

Edited January 22, 2014 by Psycho
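Here is a rough, untested sketch of those steps in data.php. The cache path, max age, query, and markup are just placeholders you would swap for your own:

<?php
// data.php - sketch of the cache-file approach from the steps above.
// The cache path, max age, query, and markup are placeholders.
$cacheFile = '/tmp/list_cache.html';
$maxAge    = 20; // seconds before the cache counts as stale

if (file_exists($cacheFile) && (time() - filemtime($cacheFile)) < $maxAge) {
    // Fresh enough: serve the cached output with no database hit.
    readfile($cacheFile);
    exit;
}

// Cache missing or too old: run the query and rebuild the output.
$pdo  = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$rows = $pdo->query('SELECT title, value FROM items WHERE status = 1')
            ->fetchAll(PDO::FETCH_ASSOC);

$html = '';
foreach ($rows as $row) {
    $html .= '<li>' . htmlspecialchars($row['title']) . ': '
           . htmlspecialchars($row['value']) . '</li>';
}

// Save it for everyone else, then return it to this request.
file_put_contents($cacheFile, $html);
echo $html;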
smerny Posted January 22, 2014

Thanks for your response. I have to leave soon, so I will read it more thoroughly later; meanwhile, I don't know if it will change your answer at all, but I will answer your questions.

"Not knowing all the details about the situation and "how" the data can change, it's difficult to give definitive answers. ... Are you only dealing with new and changed records? What about deleted records? What happens when there are no records? Etc., etc."

I have a table with 100+ items, but only 10-15 of these items are showing at any time (there is a field determining whether or not they belong in this list, and a moderator can change that field to decide whether an item shows up). I allow the public to modify a certain field relating to these items, but because I want to keep a history of these edits, I keep this "field" in its own table with a foreign key matching each "edit" to the item it belongs to.

On the webpage with the list, I show the items (various fields that don't generally change) that have the status field set to 1, along with a couple of fields from the latest record in the "edits" table associated with each item. The "edits" table gets many updates (actually inserts) throughout the day; these inserts come from public users submitting a form. The "status" field of the main items table is generally only updated a few times, during the same half-hour period every week.

There will not be deleted records, but if an "edit" is flagged as bad, I show the previous edit instead. This is currently done by showing, for each list item, the most recent "edit" that does not have the flag set.

If you believe I should change any of this to get a better result, please feel free to offer any suggestions.
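To put that in concrete terms, the lookup is essentially "latest non-flagged edit for each visible item". Simplified, and with the real table and column names changed, it is roughly:

<?php
// Sketch of the "latest non-flagged edit per visible item" query.
// Table and column names (items, edits, status, flagged) are stand-ins.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

$sql = "SELECT i.id, i.title, e.value, e.created_at
        FROM items AS i
        JOIN edits AS e ON e.item_id = i.id
        WHERE i.status = 1
          AND e.flagged = 0
          AND e.created_at = (
              SELECT MAX(e2.created_at)
              FROM edits AS e2
              WHERE e2.item_id = i.id AND e2.flagged = 0
          )";

$items = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);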
smerny Posted January 23, 2014 (edited)

Okay, I'm not exactly sure how this "cache file" thing would work. I need to update the page for all of the users viewing it as soon as possible. If I have a high refresh rate it will mean too many database pings (there are often over 100 people on at once, which already means 5 database pings per second even with 20-second intervals). But if I don't ping the database each time, how will the page update?

My thoughts/questions after seeing your suggestions: does this cache file mean that I create an XML or JSON file (or something like that) with the current data? Then instead of checking the database, each refresh (let's say I bring it down to a couple of seconds instead of 20) checks when this cache file was last written. If the cache file is less than (let's say) 2 seconds old, the page just reads it. If the cache file is more than 2 seconds old, it pings the database and creates a new cache file. This should mean the database only gets pinged every 2 seconds even with 100 people, instead of every 0.2 seconds as it is now. Do I have the basic idea right?

This leads me to a couple more questions:
- Will this create any race conditions where multiple clients hit the page once the cache file is 2 seconds old, before the first client finishes creating the new cache file?
- Will checking the modification time of this file and loading it really be much quicker than just reading the database?

Edited January 23, 2014 by smerny
Psycho Posted January 23, 2014 (Marked as Solution)

"Does this cache file mean that I create an XML or JSON file (or something like that) with the current data? Then instead of checking the database, each refresh checks when this cache file was last written ... Do I have the basic idea right?"

Yes, but how you store the data in the cache file is up to you - whatever is most expedient. Typically you would store it in a structured format such as XML or JSON and pass that to the JavaScript, which updates the actual HTML. But in this case it may be better to take the data, create the actual HTML, and store that in the cache. Maybe - see my comments at the end.

"Will this create any race conditions where multiple clients hit the page once the cache file is 2 seconds old, before the first client finishes creating the new cache file?"

Possibly, especially if you are updating it often and getting a lot of hits. But there are methods to lock the file until a user is done reading it. So, at worst, there might be a slight delay in updating the cache while another user is reading the content. I don't have any specifics on how to do this - but I'm sure a Google search will give you some pointers.

"Will checking the modification time of this file and loading it really be much quicker than just reading the database?"

It should be. Every time you load a web page the web server has to read all the PHP files, right? If you're not sure, test it: write a loop that 1) runs the query to get the records versus 2) reads the data from a cache file, then compare the times for each.

Having said all that, one of the details you provided may decide the best approach:

"I have a table with 100+ items, but only 10-15 of these items are showing at any time . . ."

If you are only showing 10-15 records, would all users be viewing the same 10-15 records, or can they differ? If they are the same, then I would go with the cache file and store the actual HTML content in it. Then you can just pass the contents directly back to the JavaScript and update the page. However, if users can be viewing different sets of records, then you would want to store the data in a structured format such as XML or JSON and parse it to pick out the records that need to be sent back to each client. With only 100 records in total, this still might be more efficient than a query - but at some point the cost of parsing the cache file becomes greater than the cost of a DB query. The only way to find out is to test.
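That said, a rough (untested) starting point for the locking is PHP's flock(). The idea is that only one request rebuilds a stale cache while the others wait briefly; the path, max age, and rebuild callback below are placeholders:

<?php
// Read the cache, or rebuild it under an exclusive lock so that several
// requests hitting a stale cache at once don't all run the query.
// $rebuild is whatever function runs the query and builds the output.
function getCachedOutput($cacheFile, $maxAge, $rebuild) {
    $fp = fopen($cacheFile, 'c+'); // create if missing, don't truncate
    if ($fp === false) {
        return call_user_func($rebuild); // fall back to a direct query
    }

    flock($fp, LOCK_EX); // one request at a time past this point

    clearstatcache();
    $stat = fstat($fp);
    if ($stat['size'] > 0 && (time() - filemtime($cacheFile)) < $maxAge) {
        $output = stream_get_contents($fp);  // still fresh: just read it
    } else {
        $output = call_user_func($rebuild);  // stale: rebuild it once
        ftruncate($fp, 0);
        rewind($fp);
        fwrite($fp, $output);
    }

    flock($fp, LOCK_UN);
    fclose($fp);
    return $output;
}

The trade-off is that readers also take the lock, so reads are serialized; for a small HTML fragment that delay should be negligible.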
smerny Posted January 24, 2014 (edited)

Thanks, that gives me something to work with anyway. I'm also going to look into the "long poll" or "Comet" approach based on what I'm seeing here: http://stackoverflow.com/a/1086448/684368

Edited January 24, 2014 by smerny
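From that answer, the server side seems to boil down to a script that holds the request open until something changes. My rough (untested) reading of it, reusing the cache file's modification time as the "did anything change?" check so the held connections don't hit the database:

<?php
// longpoll.php - rough sketch of the long-poll / Comet idea.
// Untested; the change check here is just the cache file's mtime.
set_time_limit(0);

$cacheFile = '/tmp/list_cache.html';  // placeholder path
$clientTs  = isset($_GET['since']) ? (int) $_GET['since'] : 0;
$start     = time();
$timeout   = 25; // give up and have the client reconnect after ~25 seconds

while (true) {
    clearstatcache();
    $serverTs = file_exists($cacheFile) ? filemtime($cacheFile) : 0;

    if ($serverTs > $clientTs) {
        // The cache was rebuilt since the client last looked: send it now.
        echo json_encode(array('since' => $serverTs,
                               'html'  => file_get_contents($cacheFile)));
        exit;
    }

    if (time() - $start >= $timeout) {
        // Nothing changed before the timeout: client just reconnects.
        echo json_encode(array('since' => $clientTs, 'html' => null));
        exit;
    }

    sleep(1); // check again in a second while the request stays open
}

The client would call this again immediately each time it returns, passing back the "since" value, so a change shows up within about a second of the cache being rebuilt instead of waiting out a fixed polling interval.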
Psycho Posted January 24, 2014

Yeah, Comet is the push approach I referred to previously. I'm not sure how you would implement it with each user potentially having different records on their page, though.