Jump to content

querying huge mysql tables


SchweppesAle

Recommended Posts

was wondering if you guys see anything wrong with the following query.  whenever I execute it the screen will continue to load, not sure if it's a memory issue or a problem with the syntax.

 

SELECT email_addresses.email_addressid 
			, email_addresses.email_address
			, email_addresses.status 
			, email_addresses_secondary.email_addressid AS spamID
			, email_addresses_secondary.email_address AS spamEmail
			, email_addresses_secondary.status AS spamStatus
		 FROM email_addresses_secondary
		INNER JOIN email_addresses ON 
			(TRIM(email_addresses_secondary.email_address) = TRIM(email_addresses.email_address)) 
		 	WHERE (email_addresses_secondary.status = 1)

Link to comment
Share on other sites

syntax problem

from what i see you use the select syntax in a bad way

 

SELECT email_addresses.email_addressid

 

you already use tablename.columnname and specify another table

FROM email_addresses_secondary

 

remove email_addresses. field from all select statement

 

email_addresses.email_addressid => email_addressid

Link to comment
Share on other sites

syntax problem

from what i see you use the select syntax in a bad way

 

SELECT email_addresses.email_addressid

 

you already use tablename.columnname and specify another table

FROM email_addresses_secondary

 

remove email_addresses. field from all select statement

 

email_addresses.email_addressid => email_addressid

 

so don't specify what table I'm selecting the column from?

Link to comment
Share on other sites

i think it should be like this...not sure thought


		 SELECT email_addressid
            , email_address
            , status
            , email_addresses_secondary.email_addressid AS spamID
            , email_addresses_secondary.email_address AS spamEmail
            , email_addresses_secondary.status AS spamStatus
          FROM email_addresses
         INNER JOIN email_addresses_secondary ON
            (TRIM(email_addresses_secondary.email_address) = TRIM(email_addresses.email_address))
             WHERE (email_addresses_secondary.status = 1)

Link to comment
Share on other sites

I guess the question is -- how large is your table.  The key problem here, is most likely in your join comparison.

 

(TRIM(email_addresses_secondary.email_address) = TRIM(email_addresses.email_address)) guarantees that no indexes will be used in the join, so each table will be table scanned and compared row by row.  If you have a lot of rows, this is going to take a long time and your script may be timing out.

 

You can explore these issues by using the EXPLAIN EXTENDED {YOUR QUERY HERE, no curly brackets} which is most easily issued inside the mysql command line client or in phpMyAdmin's sql window. 

Link to comment
Share on other sites

I guess the question is -- how large is your table.  The key problem here, is most likely in your join comparison.

 

(TRIM(email_addresses_secondary.email_address) = TRIM(email_addresses.email_address)) guarantees that no indexes will be used in the join, so each table will be table scanned and compared row by row.  If you have a lot of rows, this is going to take a long time and your script may be timing out.

 

You can explore these issues by using the EXPLAIN EXTENDED {YOUR QUERY HERE, no curly brackets} which is most easily issued inside the mysql command line client or in phpMyAdmin's sql window. 

 

we're talking maybe 120k and 240k entries for each table.  Unfortunately, I didn't design the database.  It's pretty bad though :P

 

I'll try that recommendation, we'll see what turns up

Link to comment
Share on other sites

It's not a memory issue.  Mysql grabs the memory it uses when it starts, and does not dynamically acquire or release memory while it's running.  That's all handled internally.

 

The issue is that your query is taking way too long.  At the range of joining a 120k row table to a 240k row table, that requires every single row in each table to be read from disk and compared -- yes that is going to take a long time, limited really by your IO performance. 

 

Since there's no other limiting criteria, it is also no surprise that it takes forever, and the WHERE (email_addresses_secondary.status = 1) is probably not helpful because it is likely a low cardinality value that will also not benefit from any indexes that might be available.

 

I don't know how often you need to run this query, looks like some sort of report?  What you need to do is change the php settings so the script does not time out before the query returns, and the query is going to take a LONG time.  It probably is crushing your mysql server when it runs, but it will eventually return.

 

If you had indexes on the email_address columns in each table, it would help this query, however because you are using TRIM() the indexes can not be used, even if they existed. 

 

In the php.ini file, there's a setting "max_execution_time" which limits the time a script can run to 30 seconds by default.  You can set it to 0 and allow unlimited run time.  In order to test this, that is probably the best course to take if you really need this query.

 

 

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.