Jump to content

Using Saved Queries


opcode

Recommended Posts

I have a client site that requires saved search queries. It's not an auto site but let's use that as an example since it's easy.

For example, say you signed in and searched for Blue Ford F-250's under $20,000 within 200 miles of Seattle. Now say you want to

save that query to a list, so you can come back later, login, click a button and run the search again. My first question is what

is the best way to store the query in a database, as one long string (SELECT * FROM table WHERE color = 'blue'....) or use separate

fields for color, brand, model, price etc... and then build the query string from those when the button is clicked?

 

My other question is that the client wants to send an email to any customer when a new product is added to the site that matches their

saved search query. So you'd get an email that said "Someone just offered a blue 2007 Ford F-250 for $18,999 click here to see it."

What is the best way to go about this to use the least amount of server resources as possible? If we had 50,000 users with 3 saved

searches each, how do you compare to all 150,000 of them without crashing the server? I was thinking maybe when the first field fails,

say it's a Blue Ford F-150 for $11,000, skip to the next record before you check the price, location etc, to save those resources.

 

I'm using PHP 5, OOP, MySQL 5.0.45 on an Apache server (Media Temple) if that helps. I'd appreciate any advice, I really want to make this thing rock.

Is there maybe a PHP pattern or model for this, or even an existing class in PEAR or something to look at as an example?

 

Link to comment
Share on other sites

The answer is entirely dependent on your schema, but the first thing I'd point out is that the interface to search is completely different from the actual SQL required.  What you should be saving is the state of the UI that then leads to the queries.

 

If you've solved this elegantly, then there should be some normalization involved, and hopefully you will have a table or set of tables that can describe a saved query. 

 

For efficiency, when a user saves a query, the first thing that should happen is, your code should determine if this saved query already exists.  If it doesn't then you:

 

-Create a new saved query

-Link the saved query to that particular user.

 

IF however, the query is found:

 

-Link the saved query to the user.

 

Obviously this will save resources, as users who have the same criteria will share the saved search.

 

I don't know what database you're using, but this is a good strategy if you're using mysql.

 

- Have a replicated mysql server for running your saved queries off.  You can use maatkit to control the amount of drift between your servers.  Check it out.

 

At batch time (stop the replication on the slave server)

Run your saved queries process on the slave, and not the master.  This will insure that master (your transaction database) isn't effected by the load of the batch process

Have your batch process run through the saved queries, and mail results.  Since you have a many to many relationship between users and saved queries, you'll save substantial overhead anytime users have saved the same query, which I would predict will happen quite frequently.

 

One important thing to consider -- how long will stored queries persist?  Make sure you have a timestamp on the relationship between the user and the query.  You can then easily have a pruning script that runs, that clears out obsolete queries (maybe it's 90 or 180 days?)  or whatever criteria.  This also allows you to indicate if a user no longer wants to receive updates (otherwise you'll be violating canspam).  Obviously there's no reason to run a query if there are no users interested in the results.

 

 

Hope this helps you.

 

Link to comment
Share on other sites

Great idea on checking if the query already exists. So there would be a saved_searches table which has a unique ID per row, and there would be a field in the users table with maybe a comma separated list of query IDs they're diggin' on?

 

I only have one server so I don't know if the rest of your post will help me. This site is actually for more perishable items (food) that lasts a week or less, so search queries could be valid for say 30 days max.

 

The idea is that this stuff is super perishable so if an item comes up for sale, you want to know about it immediately, on your iPhone at 2:45am. Thus, this is the idea:

 

Say I am a buyer so I searched for a Blue 1997-99 Ford F-250 under $20,000 in Seattle.

Say there were no hits, so I saved it in my Saved Searches feature.

 

Three days later a seller comes along and posts a 1997 Ford F-250 for $16,000 in Seattle.

I want THAT one new post to be compared against all saved searches in the database, and when it

Finds a match, email that user that we found them a truck, and then keep going down the

List until the next matched query, till the end. So essentially it would be like this:

 

(newPost is the properties of the truck the seller just put up - still using cars, it's simple)

 

$newPost = array(make  => “Ford”, model  => “F-250”, color => “blue” …..);

 

$numIDs = mysql_query COUNT(userID);  // counts how many users are in the table with potential saved searches.

 

For ($i = 1; $i <= $numIDs; $i++) {

 

    Check first userID for search queries, if found compare to $newPost

 

    If (found) {

      Send email alert

  }

  Else {

      go to next userID row

  }

}

 

Not everyone will have saved searches, but we have to plan for it, and some will have a lot of saved searches.

I know it's a drag to launch a huge query and send emails every time a new product goes up, but the client needs this functionality. If we have to go dual servers, I'll have to lok into that.

 

 

Link to comment
Share on other sites

Great idea on checking if the query already exists. So there would be a saved_searches table which has a unique ID per row, and there would be a field in the users table with maybe a comma separated list of query IDs they're diggin' on?

 

Sort of.  You should not use a column in the user table, but rather a seperate table that links the user to the saved queries.  This might look something like:

 

user_saved_searches

---------------------

user_id  (primary key)

saved_searches_id  (primary key)

created_on timestamp

 

 

This way, when you find a match on the query, it's a simple 3 way join from saved_search to user_saved_searches to user, to get the list of people who need to be emailed.

 

As for the rest of the pseudo-code, I can only again say, that any solution is going to be highly dependent on your application and the nature of the existing database and search.  You could have a compressed format for the search that you store in a db, which will make an indexed search against that column very efficient for finding similar queries, however, if you have a ranged component, then this is no longer trivial.  When I say range, what I mean is, if there is ranged pricing for example, where a person can say:  Less than $10.  Now an item comes in that is $5.  This is going to match pricing criteria for a whole bunch of people potentially, so that is not a simple thing to determine.  You may need to cast a larger net and then eliminate people with saved queries who fall outside the range(s). 

 

 

 

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.