Jump to content

PHP + MYSQL: best way to handle connections for autosuggest


j1982

Recommended Posts

Hi,

 

I've done a lot of searching on this topic, but different websites seem to give different recommendations.

 

There are two basic parts of functionality to a business application I'm working on:

 

1- A search box. When a user types in it, it will display a list of suggestions (using java/ajax).

2- A display area. Once the user has completed the search (ie clicked one of the suggestions), more details about the searched item will be displayed.

 

My question is - what would be the best way to handle the connections to MYSQL when performing the auto-suggest?

 

For example - when the search box is typed in, ajax will call "getsuggestion.php" passing the current search string.

This file will then connect to the database, search for the value the user has typed, then send the results back to ajax & be displayed to the user.

 

What I'm really wanting to know, is should "getsuggestion.php" open AND close the mysql connection each time it is called? Or should I not be closing the connection?

 

I'm asking this because the application will be used by potentially >1000 people at the company I work for, and I don't want there to be database connectivity issues - or speed issues when performing the autosuggest.

 

Any hints/tips would be greatly appreciated.

 

Thanks

Link to comment
Share on other sites

What I'm really wanting to know, is should "getsuggestion.php" open AND close the mysql connection each time it is called? Or should I not be closing the connection?

 

Because each request is a separate request it doesn't really matter if you do it yourself or not as php will close the connection for you anyway.

Link to comment
Share on other sites

If each request is going over HTTP to a script that runs once, there's no way to make the mysql connection persistent. It has to be reopened every time, and PHP will close it when the script exits. IMHO, the large number of mysql connections is just the tip of the iceberg for the problems you'll have with this. Making an ajax/http call on every keystroke, opening a connection, running an full table scan on SELECT LIKE 'S%', then returning the result set, is going to be very slow. First you're going to hit the wall when Apache starts taking 10 requests per user per second. Each of those Apache workers is going to have to run a PHP process that takes at least 15M of memory. Each PHP process is going to have to open a mysql connection. This is a very inefficient way to do it.

 

At the very least, you can write a daemon that stays alive and keeps a single connection open to your database, and listens on a local socket. When requests come in, the web-facing scripts connect via socket to the daemon, which runs the db request for them. This way you have an orderly single thread handling your DB work. Secondly, don't do the call every time the user presses a key. Do it on batch when it's been 1/2 second or so since the last key press. And don't do it unless there are a few characters in the field. You can improve things by having PHP actually cache result sets, depending on the size of the table you're searching. And if you wanted to go further, you could sockets instead of HTTP requests to connect to the clients, and run the whole thing on the command line as a stand-alone PHP process. Not having to run Apache or do a new HTTP call with every keystroke will  make it anywhere from 10x to 30x less resource-intensive.

Link to comment
Share on other sites

Thanks Thorpe & Scootstah, I was (somehow) under the impression that the connection remained open until it was manually closed! Good to know that's not the case.

 

Jstrike> Thank you very much for the detailed suggestion. Really appreciate it. I'm not familiar with daemon's & how that might operate. Have done some searching on it, and found something called "Solr" which might be another option to look into..

 

Having said that, the table which would be searched in my original method would (obviously) have an index on the text being searched, and there is only one field being searched by the 'like' statement & a limit of 10 results.

When you say it would be slow with multiple users & multiple searches being performed, are we talking many seconds? Or an extra .0xx of a second over a more efficient method? (generally speaking - obviously this is dependant on system config)

Link to comment
Share on other sites

Well - not that the query would take seconds (probably, barring a deadlock against an insert :D) ...but microseconds add up. I speak from a humbling experience -- I did exactly what you're talking about for a corporate intranet app in 2006. It works - it's still running - but only for a handful of users per location, never more than 20 on at a time. It ran fast when there were a few hundred contacts in the list it's searching; now with about 10,000 it runs slowly, and you can feel the drag. It's the most-complained-about product bug they're not willing to have rebuilt. And that's a tiny dictionary. I didn't know as much then... I wouldn't do it that way now, especially for an app with 1000+ users (live & learn, but don't apologize, hehe).

 

A few points:

* It may not help much to put an index on a VARCHAR or TEXT field in a very large table, especially if the table is user-updated and you're using InnoDB as the engine. I don't know whether we're talking about a dictionary of the english language or a few thousand words; obviously that makes a huge difference. If it's a very large dictionary, one thing you can do is partition the table by the first letter. That makes your index or table scans a lot smaller.

* Presumably your database is going to be doing a lot of other things. If this dictionary takes inserts, you'll want it to be InnoDB for row-level instead of table-level locking. But that means much slower index scans. Also, it's a lot easier to scale in the future by putting more webservers in front of a database with plenty of extra cycles to spare, than it is to replicate the database. Is the dictionary small enough to store with shared memory in APC or memcached? If not, what about caching it occasionally to a SQLite table... that way you're not taxing your primary database with these select requests, and it's always free to take inserts.

* If you have to open a lot of PHP connections sequentially to mysql, set wait_timeout LOW (I like 6 seconds) and max_user_connections high (500) in your my.cnf file. PHP has a nasty habit of sometimes not cleaning up its sockets right when a script terminates. Always use mysql_close(). Don't use persistent connections - ever. The only time I use them is when I have a long-running PHP daemon that has to make calls to a remote MySQL server on the other side of the planet, and then they have to be watched really closely to make sure they're still alive.

* If the database is under 500k or so, you could even reasonably cache it to SQLite in Javascript and have no wait time and no extra connection overhead once the client is loaded. If done wisely, you could even just cache part of it (e.g., there are 17,576 sets of results for all 3 letter combinations -- maybe you don't need 10 suggestions and 4 would be enough -- in that case store the suggestions for all 3 letter combinations locally and only hit the DB on the 4th letter. If the average suggestion is 10 characters, you could store 342k of most common results on the client side; then hope that 50% of people stop typing and take a suggestion after 3 letters; I bet google has figures on that; heck, your whole dictionary might fit into 342k anyway!)

* Apache will be the biggest problem. You'll need plenty of spare workers running for all these microscopic little hits. When you don't have enough spare workers, Apache has to spawn a new thread and bootstrap PHP again. Rough thumbnail, figure that usually sucks up about 2% of the whole CPU for half a second on a Quadcore Xeon. Therefore you're exposing the entire system to being crashed by ten users who can type at 40 WPM, or one user maliciously holding down "N". Check out how Google cuts you off if you do that. Lots of extra RAM is your friend here.

 

This is one of those optimization problems where it's a game of inches, and there's no silver bullet. The trouble is it's the kind of little thing that can bring a perfectly healthy LAMP stack to its knees very quickly. The goal is just to shave down the response time as much as you can, using every nasty trick in the book. Every processor cycle and connection saved counts. Always assume the client believes this thing can scale to at least 100x what it was built for, and will blame you if it doesn't.

Link to comment
Share on other sites

Thanks for the detailed reply! I've taken your advice on board for a smaller app I'm working on at work; your suggestions have been very valuable for that, so thanks :D

 

Given the requirements of the autosuggest functionality, the "..where row like x%" method isn't going to work anyway.

 

I've been looking into it a bit more, and it looks like Solr is going to provide the most useful search functionality (ie faceted search, etc).

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.