Jump to content

Recommended Posts

Hi,

 

I was wondering if I could ask at what point it is more efficient to search in PHP rather than MySQL when querying data from the database please?

 

i.e. If a database has tens of thousands of entries/rows per table and plenty of concurrent users accessing the data is it better to A) use a straight MySQL query:

SELECT tablecolumn2 FROM tablename WHERE tablecolumn1='1234';

 

B) to grab all the rows and then sort 'and' search the data in PHP using:

SELECT tablecolumn2 FROM tablename;

 

or C) to sort and grab all the rows and then search the data in PHP using

SELECT tablecolumn2 FROM tablename ORDER BY tablecolumn1;

 

 

As a second question, would it be appropriate to split up a data table with minimum tens of thousands of rows to speed up data access. e.g. By splitting an users tables into 26 tables (users_a, users_b, users_c, ... users_z) or even into 676 tables (users_aa, users_ab, ... users_ba, users_bb, ... users_zzz), or is this bad practice?

Link to comment
https://forums.phpfreaks.com/topic/212179-searching-in-mysql-vs-php-query/
Share on other sites

Method A) will always be at least 10x more efficient than using php to perform the same task.

 

The database search engine is compiled code. Php is a parsed/tokenized/interpreted language.

 

Also, retrieving all the data into php every time you perform a search using php code would be many times more memory intensive because you must now transfer all that data to the mysql php client and then fetch it into memory that is accessible to php.

With a very bad database design and with only a small amount of data, it might be possible to produce a situation where php code could search through some data faster than a database engine could, but this would also require that the specific data be cached and not actually retrieved from the database before each search.

 

If you are going to the trouble of executing a query to select data, you might as well get that query to select just the data that you want, in the order that you want it, and with anything like date/number formatting already done so that your relatively slow parsed/tokenized/interpreted php code only needs to iterate over the data and display it.

Thank you for that info :)

 

So only for partial search matching across different fields or something more complex would it be worth doing the processing in PHP then?

 

Do you have thoughts on my second question on database structures please, the benefits or issues with splitting tables into sub-tables (breaking 'table' into 'table_a' to 'table_z' or 'table_aa' to 'table_zz')?

Okay, cheers :)

 

Update:

 

So are there 'any' instances where it be more efficient to grab all the data (and move the processing to PHP) in a large scale frequently accessed system? I've seen it done for some search cases so it must be beneficial sometimes?

 

and, Would it be appropriate to split up a data table with minimum tens of thousands of rows to speed up data access.

 

e.g. By splitting an users tables into 26 tables (users_a, users_b, users_c, ... users_z) or even into 676 tables (users_aa, users_ab, ... users_ba, users_bb, ... users_zzz), or is this bad practice?

In 30 years of consulting, programming and database design and administration; the number of times that I have found it more benefical to query data in code instead of at the database could be counted on one hand; even with compiled languages. Database engines are designed to manage the data, they do it very well. If you can write faster data processing, then you should build a new database engine and sell it!  IF you design the database correctly AND design your queries correctly, the database engine will provide the optimum processing; this is even more true with a interpreted language such as PHP. When you think you have found a case where you need to do this, you need to review the database design, indexing scheme, and query design; and then FIX THE DATABASE! If you are trying to get one of us to tell you specific cases where you need to do it; we can't. It would depend entirely on the database and the application.  If you come across a situation where you think you need to do it, you can post the specific case and, most likely, we will be able to tell you how to improve the query or the indexes or the database design.

 

Text searching is an interesting case. If you are trying to find every article that contains the words; "PHP", "mySQL", "search", and "multiple"; a traditional SQL SELECT, using LIKE ... OR LIKE ... OR ... will scan the entire table. mySql provides a FULLTEXT INDEX capability to assist in this type of query. There are also other solutions available. However, even without the FULLTEXT Index, I am confident that the SQL SELECT using LIKE will perform faster than selecting ALL of the rows from the table, retrieving them one-by-one into PHP, running substr() on the column data (4 times), and keeping the ones that pass.

 

As to splitting tables, one word -- DON'T -- or if you prefer, four words -- DO NOT DO IT!  Your example is a very bad idea. Consider this. The user table should have a unique identifier that is used to reference a user in other tables. For optimum storage and performance this identifier is usually an integer (not the user's name).  So, if you have a column called LastChangeUser in a table called CustomerAccounts, and it contains the user's ID, say 14287; are you going to do 676 queries to find the user's name? Or are you going to write a UNION query with 676 SELECT's? That will be some UGLY code either way.  Also, in a database of the size you propose (which is NOT unusual) referential integrity is going to be critical.  You will want (or at least, I would want) to have foreign key constraints between these tables. Splitting the table will make it impossible to define foreign key constraints to the user table. As a result, someone will be able to delete a user and leave rows in other tables that reference the deleted user's ID.

 

If you are in the planning stages of an application that you expect will really boom; and you expect the database to be big; you MUST spend a significant amount of time designing the database.  This includes normalization, indexing, foreign keys, other constraints, triggers, storage requirements, etc. [Note: normalization and indexing are a must for any size database] You also need to plan periodic processes: backups, index rebuilds, table analysis, etc. Query plans are based on information that the server stores including number of rows and index cardinality. As tables grow this meta-data can become stale.  If the meta-data is stale, query performance can degrade because the plan is based on false assumptions.

Thank you ever so much for that thorough answer. That was most useful.

 

I guess I was looking to confirm or debunk methods that colleagues had recommended to me in the past before committing any as my own practices.

 

For example, I had been advised for text searching to never use LIKE queries on large tables as I had been instructed they can take too long to run and can affect the whole system. Therefore since my colleagues actually perform searches similar to example you listed but by grabbing all the table rows in PHP and searching there (and assigning different keywords points to order the matches) I was particularly interested to hear that also should remain a MySQL query. The reason they had given me for this technique was that a single SELECT * was quite instantaneous and would have less of an impact on the database than checking every row when the data was also being regularly updated by other concurrent users.

 

With regard to splitting tables, the concept that was recommended was one in which a user with a username robert would have his record stored within either users_r or users_ro (depending on whether 26 or 676 tables was more a efficient split) and within that table he might have userid 14287. This would mean the PHP queries would be of the form: "SELECT * FROM users_".substr($username, 0, 1)." WHERE username='".$username."'"; I was advised that this would reduce access times considerably and again reduce the impact any one query would have on any others running.

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.