Big Tables and Slowness


12 replies to this topic

#1 neoform

neoform
  • Members
  • PipPipPip
  • Advanced Member
  • 241 posts
  • LocationMontreal

Posted 17 October 2006 - 01:58 PM

So i'm working with a table with 8.1 million rows (business listings), and i have a second table containing a list of all the categories those businesses are found in which has about 15.6 million rows.

From what i can tell i've indexed everything properly, yet i'm still running into long load times for seemingly simple queries..

SELECT COUNT(*) FROM listings WHERE longitude != 0
(i've got longitude indexed) yet this query takes a minimum of 5 seconds to load (uncached)..  any ideas as to why this might be happening?

I've also got a number of other queries i'm running (much more complex) that take up to 10 seconds to load but i'm guessing it's because of a number of inner joins taking place..

question is, is there a way of speeding this up without splitting the table up into smaller ones? (i hate doing that)

# Table "listings" DDL

CREATE TABLE `listings` (
  `id` mediumint(8) unsigned NOT NULL auto_increment,
  `url_name` char(50) default NULL,
  `longitude` double NOT NULL,
  `latitude` double NOT NULL,
  `name` char(255) default NULL,
  `address` char(255) default NULL,
  `state` smallint(5) unsigned default NULL,
  `city` mediumint(8) unsigned default NULL,
  `zip` char(6) default NULL,
  `phone` char(10) default NULL,
  `fax` char(10) default NULL,
  `email` char(128) default NULL,
  `website` char(255) default NULL,
  `description` text,
  `hours` text,
  `payment_options` text,
  `locations` smallint(5) unsigned default NULL,
  `established_on` date default NULL,
  `image` enum('yes','no') default 'no',
  `keywords` text,
  `added_on` datetime default NULL,
  `rating` tinyint(3) unsigned default NULL,
  PRIMARY KEY  (`id`),
  UNIQUE KEY `url` (`url_name`),
  KEY `locate2` (`longitude`,`latitude`),
  KEY `long` (`longitude`),
  KEY `city` (`city`),
  KEY `state` (`state`)
) ENGINE=MyISAM AUTO_INCREMENT=8164229 DEFAULT CHARSET=latin1



# Table "listings_link" DDL

CREATE TABLE `listings_link` (
  `listing_id` mediumint(8) unsigned NOT NULL default '0',
  `category_id` smallint(5) unsigned NOT NULL default '0',
  PRIMARY KEY  (`listing_id`,`category_id`),
  KEY `category_id` (`category_id`),
  KEY `listing_id` (`listing_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1


Newsique.com Social News Network

#2 obsidian

obsidian
  • Staff Alumni
  • Advanced Member
  • 3,202 posts
  • LocationSeattle, WA

Posted 17 October 2006 - 02:18 PM

just for kicks, when you are counting rows, try just counting on the id column instead of *. i don't know for sure if mysql handles aggregates the same as selects or not, but i know with selects, your query is usually substantially faster if you only select those items you need:
SELECT COUNT(id) FROM listings WHERE longitude != 0

again, don't know if it will help, but typically optimizing even the smallest things when you're searching through millions of records can help.

good luck
You can't win, you can't lose, you can't break even... you can't even get out of the game.

<?php
while (count($life->getQuestions()) > 0)
{   $life->study(); } ?>

#3 neoform

neoform
  • Members
  • PipPipPip
  • Advanced Member
  • 241 posts
  • LocationMontreal

Posted 17 October 2006 - 02:30 PM

just ran it with COUNT(id)

*waits*

..

*minutes go by*

..

yeah, that's not working too good.. ;)

#4 shoz

shoz
  • Staff Alumni
  • Advanced Member
  • 600 posts

Posted 17 October 2006 - 03:14 PM

If you do an "EXPLAIN" on the query ("EXPLAIN SELECT ...") you should see that it uses no indexes. That's because MySQL can't use an index for a != comparison.

The following should be faster
SELECT ((SELECT COUNT(*) FROM listings) - (SELECT COUNT(*) FROM listings WHERE longitude = 0)) AS num

As far as why the other queries take 10 seconds or more, you should post the queries and the EXPLAIN output.

If the database does not change frequently and the result is being used as a search result perhaps for users to browse through, you may want to store the result of the query in a table for quick retrieval. You can recreate the table every 1hr or so depending on how often you think is best.
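For instance, a minimal sketch of that cache-table idea (the table name and the refresh schedule are illustrative, not something from this thread; you'd rebuild it from cron or similar):

```sql
-- Rebuild a one-row cache of the located-listings count periodically.
DROP TABLE IF EXISTS listings_located_count;
CREATE TABLE listings_located_count (
  located INT UNSIGNED NOT NULL
) ENGINE=MyISAM;

INSERT INTO listings_located_count
SELECT (SELECT COUNT(*) FROM listings)
     - (SELECT COUNT(*) FROM listings WHERE longitude = 0);

-- Page loads then do a trivial lookup instead of scanning 8M rows:
SELECT located FROM listings_located_count;
```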

#5 neoform

neoform
  • Members
  • PipPipPip
  • Advanced Member
  • 241 posts
  • LocationMontreal

Posted 17 October 2006 - 03:24 PM

Ooo, that's good to know..  i didn't realize that indexes didn't work for != operators..  (any reason why?)


as for the big query.. caching the results would be nice, but it's not possible in this case: the cache table would be far too large, since the query calculates the distance from point A (you) to point B (each business) and orders them by distance..

$query = "
										SELECT 
											listings.id, 
											listings.url_name, 
											listings.name, 
											listings.address, 
											listings.zip, 
											listings.website, 
											listings.phone, 
											listings.fax,  
											listings.rating, 
											listings.city,
											listings.state, 
											(
												3958 
												* 3.1415926 
												* sqrt(
													(listings.latitude - ".$city['latitude'].") 
													* (listings.latitude - ".$city['latitude'].") 
													+ cos(listings.latitude / 57.29578) 
													* cos(".$city['latitude']." / 57.29578) 
													* (listings.longitude - ".$city['longitude'].") 
													* (listings.longitude - ".$city['longitude'].")
													) 
												/ 180
											) distance 
										FROM
											listings_link 
										INNER JOIN listings ON listings.id = listings_link.listing_id 
										WHERE
											(listings_link.category_id = '".$category['id']."' ".$where.")
										ORDER BY 
											url_name ASC 
										LIMIT 
											".(($_page - 1) * LOCAL_PER_PAGE_SEARCH).", ".LOCAL_PER_PAGE_SEARCH."
									";

Resulting query..

SELECT listings.id, listings.url_name, listings.name, listings.address, listings.zip, listings.website, listings.phone, listings.fax, listings.rating, listings.city, listings.state, ( 3958 * 3.1415926 * sqrt( (listings.latitude - 40.394700) * (listings.latitude - 40.394700) + cos(listings.latitude / 57.29578) * cos(40.394700 / 57.29578) * (listings.longitude - -99.121000) * (listings.longitude - -99.121000) ) / 180 ) distance FROM listings_link INNER JOIN listings ON listings.id = listings_link.listing_id WHERE (listings_link.category_id = '1360' ) ORDER BY distance ASC LIMIT 0, 15

Explained..

| id | select_type | table         | type   | possible_keys                  | key         | key_len | ref                                     | rows | Extra                           |
+----+-------------+---------------+--------+--------------------------------+-------------+---------+-----------------------------------------+------+---------------------------------+
|  1 | SIMPLE      | listings_link | ref    | PRIMARY,category_id,listing_id | category_id | 2       | const                                   |  200 | Using temporary; Using filesort |
|  1 | SIMPLE      | listings      | eq_ref | PRIMARY                        | PRIMARY     | 3       | localdirectory.listings_link.listing_id |    1 |                                 |
+----+-------------+---------------+--------+--------------------------------+-------------+---------+-----------------------------------------+------+---------------------------------+

I understand filesort is slower, but i can't figure any way of indexing that..

#6 fenway

fenway
  • Staff Alumni
  • MySQL Si-Fu / PHP Resident Alien
  • 16,199 posts
  • LocationToronto, ON

Posted 17 October 2006 - 03:59 PM

Well, you have some redundant indexes, but those would just make inserts slower.  As for !=, just think about it... if I give you a sorted list of 1000 numbers and say "find 543", that's easy.  But if I say "show me all that aren't 543", you have to actually check them all.  You can only use an index to find what is there, not what isn't.

I'm not sure you could make that last query much faster; the EXPLAIN doesn't really leave much room for improvement when you sort on an expression.
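One common workaround for sort-on-a-distance-expression queries (a sketch, not tested against this schema; the 0.5-degree radius is an illustrative cutoff, and the centre coordinates are the ones from the example query above) is to prefilter with a bounding box, so an index on the coordinates can shrink the candidate set before the expression is evaluated and sorted:

```sql
-- Only rows inside the box are considered; the ORDER BY on the
-- expression still filesorts, but over far fewer rows.
SELECT listings.id, listings.url_name,
       (3958 * 3.1415926 * SQRT(
          (listings.latitude - 40.3947) * (listings.latitude - 40.3947)
          + COS(listings.latitude / 57.29578) * COS(40.3947 / 57.29578)
          * (listings.longitude - -99.121) * (listings.longitude - -99.121)
       ) / 180) AS distance
FROM listings_link
INNER JOIN listings ON listings.id = listings_link.listing_id
WHERE listings_link.category_id = '1360'
  AND listings.longitude BETWEEN -99.121 - 0.5 AND -99.121 + 0.5
  AND listings.latitude  BETWEEN  40.3947 - 0.5 AND  40.3947 + 0.5
ORDER BY distance
LIMIT 0, 15;
```

Note that because longitude is a range condition, only the longitude part of the existing locate2 (longitude, latitude) index can be used, and the optimizer may still prefer the category_id index; EXPLAIN would show which it picks.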
Seriously... if people don't start reading this before posting, I'm going to consider not answering at all.

#7 neoform

neoform
  • Members
  • PipPipPip
  • Advanced Member
  • 241 posts
  • LocationMontreal

Posted 17 October 2006 - 04:17 PM

*shakes fist at the mysql developers*

Make me an index that indexes numbers that don't exist!! ;)

#8 shoz

shoz
  • Staff Alumni
  • Advanced Member
  • 600 posts

Posted 17 October 2006 - 05:14 PM

I did a test with a large table using a similar JOIN but didn't have anywhere near a 10 sec query time.

The EXPLAIN shows
Using temporary; Using filesort

There should at least be a "using where" in the output. I'm guessing that probably has something to do with it.

How long does the query take to do the JOIN without the LIMIT and the ORDER BY?
SELECT
listings.id, listings.url_name, listings.name, listings.address, listings.zip, listings.website,
listings.phone, listings.fax, listings.rating, listings.city, listings.state,
( 3958 * 3.1415926 * sqrt( (listings.latitude - 40.394700) * (listings.latitude - 40.394700) + cos(listings.latitude / 57.29578) * cos(40.394700 / 57.29578) * (listings.longitude - -99.121000) * (listings.longitude - -99.121000) ) / 180 ) distance
FROM
listings_link
INNER JOIN
listings
ON
listings.id = listings_link.listing_id
WHERE
(listings_link.category_id = '1360' )

You might try an "ANALYZE TABLE listings" and "ANALYZE TABLE listings_link" and see if that helps.

If you create a table that holds the result of the JOIN without the WHERE you may be able to speed things up by searching on that table. Again, this would be a table that you'd recreate periodically.
CREATE TABLE list_and_link ....
INSERT INTO list_and_link SELECT ..... FROM table1 INNER JOIN table2 ON table1.id = table2.id
ALTER TABLE list_and_link ADD INDEX ...
SELECT l.*, COS * 1111/222 AS distance FROM list_and_link AS l WHERE category = ... ORDER BY distance

You should be able to shorten the table creation process by using the CREATE TABLE ... SELECT syntax, but I'd rather create the table definition by hand, because MySQL may choose problematic column types when the definition is inferred from the SELECT.
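For contrast, the shorthand form that warning refers to would look roughly like this (a sketch only; the table name and aliases are illustrative, and the copied columns get their types inferred from the SELECT, which is exactly what can go wrong):

```sql
-- Denormalized join of the two tables, rebuilt periodically.
CREATE TABLE list_and_link
SELECT ll.category_id, l.*
FROM listings AS l
INNER JOIN listings_link AS ll ON l.id = ll.listing_id;
```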

EDIT: Btw, make sure the EXPLAIN output is from the query posted. I'm mentioning this because the code snippet you posted, although almost the same as the "Resulting query", is not identical: it orders by url_name rather than distance.

#9 neoform

neoform
  • Members
  • PipPipPip
  • Advanced Member
  • 241 posts
  • LocationMontreal

Posted 17 October 2006 - 06:30 PM

If i do the query without ORDER BY it takes 0.16 seconds.

Without ORDER BY and LIMIT it takes 0.01 seconds.. (weird that it'd be faster to call more info....)

here's an EXPLAIN on the query you reposted..
+----+-------------+---------------+--------+--------------------------------+-------------+---------+-----------------------------------------+------+-------+
| id | select_type | table         | type   | possible_keys                  | key         | key_len | ref                                     | rows | Extra |
+----+-------------+---------------+--------+--------------------------------+-------------+---------+-----------------------------------------+------+-------+
|  1 | SIMPLE      | listings_link | ref    | PRIMARY,category_id,listing_id | category_id | 2       | const                                   |  200 |       |
|  1 | SIMPLE      | listings      | eq_ref | PRIMARY                        | PRIMARY     | 3       | localdirectory.listings_link.listing_id |    1 |       |
+----+-------------+---------------+--------+--------------------------------+-------------+---------+-----------------------------------------+------+-------+


I'm doing the analyze on the tables.. (never done that before... took a bloody long time)

Once it's done i'll try the list_and_link idea, but..  won't that eat up a ton more ram in the process, since it's going to be basically double the size of the listings table.. ?

mysql> ANALYZE TABLE listings;
+-------------------------+---------+----------+----------+
| Table                   | Op      | Msg_type | Msg_text |
+-------------------------+---------+----------+----------+
| localdirectory.listings | analyze | status   | OK       |
+-------------------------+---------+----------+----------+
1 row in set (6 min 35.16 sec)

mysql> ANALYZE TABLE listings_link;
+------------------------------+---------+----------+----------+
| Table                        | Op      | Msg_type | Msg_text |
+------------------------------+---------+----------+----------+
| localdirectory.listings_link | analyze | status   | OK       |
+------------------------------+---------+----------+----------+
1 row in set (35.63 sec)


#10 shoz

shoz
  • Staff Alumni
  • Advanced Member
  • 600 posts

Posted 17 October 2006 - 06:58 PM

If the join is that fast without the ORDER BY, then it seems to be taking far too long to sort 200 rows. The result is 200 rows, yes?
#11 neoform

neoform
  • Members
  • PipPipPip
  • Advanced Member
  • 241 posts
  • LocationMontreal

Posted 17 October 2006 - 07:07 PM

If the join is that fast without the ORDER BY then it seems to be taking far too long to sort 200 rows. The result is 200 rows yes?



yeah, it's for the 200 rows, I felt it was taking too long for that amount as well.. :P

btw, if you wanna see the site, it's here: http://local.netdeal...s.ca/listings/ 

and example listing page is: http://local.netdeal...ns_consultants/

if you view the source for the page i stuck the SQL queries (3 of them, scroll to see them all) as an <!-- html comment -->


Maybe it's my imagination, but everything is loading a LOT faster after having run "ANALYZE TABLE"..  is it just me?


#12 shoz

shoz
  • Staff Alumni
  • Advanced Member
  • 600 posts

Posted 17 October 2006 - 07:50 PM

Maybe it's my imagination, but everything is loading a LOT faster after having run "ANALYZE TABLE"..  is it just me?


ANALYZE TABLE may have helped. As far as whether or not it's your imagination, I'd say that you should run the queries manually and see if they're faster.

#13 fenway

fenway
  • Staff Alumni
  • MySQL Si-Fu / PHP Resident Alien
  • 16,199 posts
  • LocationToronto, ON

Posted 17 October 2006 - 10:15 PM

Well, yes, if your index statistics weren't nicely distributed, then the optimizer wouldn't optimize very well!  You should be running ANALYZE TABLE on all of your tables regularly, especially after any large updates.



