
speed vs complexity


drifter


OK so I was noticing the other day that the average visitor to my site goes through about 2.5 pages of search results - same search, just different page number (limits) - and they also view several items (which are passed through the same search criteria to verify permissions).

So I had this idea: take any search that comes to my site, create a table from the SELECT, and write all of the results to that table. On each call I check whether there is already a table holding that query, and if there is I query that table instead. All tables expire after 10 minutes.

So the first time through I lose some speed because I create a new table, but on the additional search pages I gain a ton, because I no longer have a complex SQL query - I have a SELECT * FROM xxx LIMIT 20,20 on a much smaller table (twice: once for the results, once for the count).
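
Roughly what I have in mind, as a sketch - the table naming, the md5 key, and the exact expiry handling here are just illustrative, not my actual code:

[code]
<?php
// Rough sketch of the result-table cache. $db is an open mysqli connection,
// $sql is the full search query WITHOUT any LIMIT clause.
function cached_search(mysqli $db, $sql, $page, $perPage = 20)
{
    $cacheTable = 'search_cache_' . md5($sql);   // one cache table per distinct search

    // Is there already a cache table for this exact search?
    $exists = $db->query("SHOW TABLES LIKE '$cacheTable'")->num_rows > 0;

    if (!$exists) {
        // First hit: materialize up to 1500 rows of the full search into its own table.
        // A separate cleanup job would drop cache tables older than 10 minutes.
        $db->query("CREATE TABLE `$cacheTable` AS $sql LIMIT 1500");
    }

    // Every later page is a trivial query against the small table
    $offset = ($page - 1) * $perPage;
    $rows  = $db->query("SELECT * FROM `$cacheTable` LIMIT $offset, $perPage")
                ->fetch_all(MYSQLI_ASSOC);
    $count = $db->query("SELECT COUNT(*) FROM `$cacheTable`")->fetch_row()[0];

    return array('rows' => $rows, 'total' => $count);
}
[/code]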

So now I have set it up so that everyone on an even IP address runs on the old system and everyone on an odd IP runs on the new one, so I can compare the two at the same time and account for server load.
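
The split itself is just a parity check on the visitor's IP - something like this (the two search function names are placeholders, not my real code):

[code]
<?php
// Route visitors by IP parity: odd IPs get the new result-table cache path,
// even IPs stay on the old direct-query path.
$ip     = ip2long($_SERVER['REMOTE_ADDR']);
$useNew = ($ip !== false) && ($ip % 2 !== 0);

if ($useNew) {
    $result = cached_search($db, $sql, $page);   // new table-cache path (hypothetical name)
} else {
    $result = direct_search($db, $sql, $page);   // old direct query (hypothetical name)
}
[/code]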

The results are an average gain of about 10% on pages that take about .2-.3 seconds to generate.

So my question is... (finally) - is gaining the extra .02-.04 seconds really worth all the extra complexity this brings to my code? What about on very busy servers? I am obviously not worried about visitors seeing pages .02 seconds faster; I am more wondering how much of a difference that can make as the server gets busier. Also, will it help on a busy server if searches are querying 20 different tables rather than all querying the one main table?

I'd say the extra complexity is not worth it. Additional complexity just creates more possible errors and problems.

You need to assess how much traffic it would take to make the speed of your current system unacceptable. Then you need to estimate when you would expect to reach that kind of traffic - if ever. Lastly, you need to determine how far ahead of a possible problem you want to react. If, from your load tests and your expected traffic, you don't see a problem for 5 years, it's probably nothing to worry about, as you would probably have moved to something completely new by then. If, however, you foresee a problem within 6 months to a year, THEN you should start looking for solutions.

In any event, you could probably improve performance by redesigning your database, your queries, or both - not to mention hardware upgrades. With your benchmarks (if done properly) you should be able to determine where your bottlenecks are and where you could get the most performance gains.

Well, I put the code in for a couple of hours, splitting the traffic based on IP between the new and old scripts, and the results were a lot bigger than I thought - at the busiest time I saw up to 70 extra tables in my DB. So if I went with this and was peaking out at 200+ extra tables, that would not cause a problem for MySQL, would it?

(Note: times are with APC disabled.)
[code]
RUNS   avg time           median time   script   test_period
-------------------------------------------------------------
 800   0.24488356305286   0.122019      ITEM     NEW
 592   0.44784119426952   0.1331        ITEM     OLD
-------------------------------------------------------------
1251   0.60897496259279   0.205683      SEARCH   NEW
 440   0.82222866654735   0.616499      SEARCH   OLD

[/code]

[quote author=mjdamato link=topic=119415.msg489073#msg489073 date=1166637810]
In any event, you could probably improve performance by redesigning your database, your queries, or both - not to mention hardware upgrades. With your benchmarks (if done properly) you should be able to determine where your bottlenecks are and where you could get the most performance gains.
[/quote]

I have been doing a lot of DB tweaking and query tweaking and it has helped a lot. I project problems when my traffic is about 5 times what it is now, and I expect that within a year. Really I want to hold out until I can afford a second server and move my DB and some data files over there. I do still want efficiency though - even if a company is HUGE it can buy more hardware, but you would rather pay for 50 servers than 60.

I think you are on the right track as far as contemplating problems. I'm not sure whether your approach is valid, though - I've never seen that method used before.

But one thing I'm not sure you are accounting for is scalability. For illustrative purposes: if 70 extra tables cause a delay of 10 seconds, that does not mean that 280 extra tables will cause a delay of 40 seconds. The delay (or performance hit) will most likely increase at an increasing rate.

If you can, do a performance test using benchmarking software that can simulate [i]n[/i] number of users to see when the system breaks.

Isn't this just reinventing the MySQL query cache (assuming you're using MySQL...)?

[quote]pages that take about .2-.3 seconds to generate.[/quote]Are your tables properly indexed?

[quote]also will it help on a busy server if search are querying 20 different tables rather then all querying the one main table?[/quote]I've always been under the impression that querying large tables is faster.

Well, the first thing about the query cache is that it stores an exact query - including limits - and I have very few truly duplicate queries. I do, however, have a lot of people who query page one (LIMIT 0,20), then page two (LIMIT 20,20), then view an item, say item 15 (which is derived from the same query). I have noted an average of 2.5 pages of results viewed per search, with about 2 items viewed per page - that is about 4.5 queries plus 2.5 queries for the count, so around 7 different queries in the eyes of the MySQL query cache, but really one query with different limits. So what I am suggesting and testing is taking any query, LIMIT 1500, and dumping it all into a table. Then I can search that small table quickly using just limits - no joins, no = or != - just a SELECT *.

If the query cache would cache results without the limits, that would help a lot... does it? Is there a setting or something?

I have tried several similar caching systems.

1. Fetch the entire query into an array, serialize the array, and store it in the database as character data.
2. Fetch the entire query into an array, serialize the array, compress the result, and store it in the DB as binary data.
3. Fetch the entire query into an array and store it in the DB in an array data type (I don't think this is possible in MySQL; I was using Postgres).

In all cases, the caching tables had just key, data and updated (for expiring old results) columns.

Of course this means you must fetch and unserialize the array each time.
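
A rough sketch of the first variant - the table name, columns, and TTL here are illustrative, not the exact code I used:

[code]
<?php
// Sketch of option 1: cache a whole result set as serialized character data.
// Assumes a cache table like:
//   CREATE TABLE query_cache (
//       cache_key CHAR(32) PRIMARY KEY,
//       data      LONGTEXT,
//       updated   DATETIME
//   );
function get_cached_result(mysqli $db, $sql, $ttlSeconds = 600)
{
    $key       = md5($sql);
    $threshold = date('Y-m-d H:i:s', time() - $ttlSeconds);

    // Try the cache first, ignoring entries older than the TTL
    $stmt = $db->prepare("SELECT data FROM query_cache WHERE cache_key = ? AND updated > ?");
    $stmt->bind_param('ss', $key, $threshold);
    $stmt->execute();
    $row = $stmt->get_result()->fetch_row();

    if ($row) {
        return unserialize($row[0]);          // cache hit: unserialize and return
    }

    // Cache miss: run the real query, store the serialized rows, return them
    $rows = $db->query($sql)->fetch_all(MYSQLI_ASSOC);
    $data = serialize($rows);

    $stmt = $db->prepare("REPLACE INTO query_cache (cache_key, data, updated) VALUES (?, ?, NOW())");
    $stmt->bind_param('ss', $key, $data);
    $stmt->execute();

    return $rows;
}
[/code]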

Another option is to store the data in a text file or text files. You can split the files into page-sized chunks if you know (or can make an educated guess at) the page size the user will be using. The filesystem can be a very effective caching system, much like a hash table - lookups of a particular filename are very fast in most filesystems.
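
The text-file variant could look something like this - the cache directory, naming scheme, and the $runQuery callback are made up for illustration:

[code]
<?php
// Sketch of the filesystem approach: one file per (query, page) pair,
// holding a serialized slice of the result set.
function get_cached_page($sql, $page, $perPage, callable $runQuery, $ttl = 600)
{
    $file = '/tmp/search_cache/' . md5($sql) . ".page$page";

    // Fresh cache file? Filename lookup is cheap on most filesystems.
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        return unserialize(file_get_contents($file));
    }

    // Miss: run the real query for just this page and cache the slice
    $rows = $runQuery($sql, ($page - 1) * $perPage, $perPage);

    if (!is_dir(dirname($file))) {
        mkdir(dirname($file), 0777, true);
    }
    file_put_contents($file, serialize($rows));

    return $rows;
}
[/code]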
