
Archived

This topic is now archived and is closed to further replies.

neoform

Big Tables and Slowness

Recommended Posts

So I'm working with a table of 8.1 million rows (business listings), and I have a second table, with about 15.6 million rows, listing all the categories those businesses are found in.

From what I can tell I've indexed everything properly, yet I'm still running into long load times for seemingly simple queries..

[code]SELECT COUNT(*) FROM listings WHERE longitude != 0[/code] (I've got longitude indexed), yet this query takes a minimum of 5 seconds to run (uncached)..  Any ideas as to why this might be happening?

I've also got a number of other (much more complex) queries that take up to 10 seconds, but I'm guessing that's because of the inner joins involved..

Question is, is there a way of speeding this up without splitting the table into smaller ones? (I hate doing that.)

[code]# Table "listings" DDL

CREATE TABLE `listings` (
  `id` mediumint(8) unsigned NOT NULL auto_increment,
  `url_name` char(50) default NULL,
  `longitude` double NOT NULL,
  `latitude` double NOT NULL,
  `name` char(255) default NULL,
  `address` char(255) default NULL,
  `state` smallint(5) unsigned default NULL,
  `city` mediumint(8) unsigned default NULL,
  `zip` char(6) default NULL,
  `phone` char(10) default NULL,
  `fax` char(10) default NULL,
  `email` char(128) default NULL,
  `website` char(255) default NULL,
  `description` text,
  `hours` text,
  `payment_options` text,
  `locations` smallint(5) unsigned default NULL,
  `established_on` date default NULL,
  `image` enum('yes','no') default 'no',
  `keywords` text,
  `added_on` datetime default NULL,
  `rating` tinyint(3) unsigned default NULL,
  PRIMARY KEY  (`id`),
  UNIQUE KEY `url` (`url_name`),
  KEY `locate2` (`longitude`,`latitude`),
  KEY `long` (`longitude`),
  KEY `city` (`city`),
  KEY `state` (`state`)
) ENGINE=MyISAM AUTO_INCREMENT=8164229 DEFAULT CHARSET=latin1



# Table "listings_link" DDL

CREATE TABLE `listings_link` (
  `listing_id` mediumint(8) unsigned NOT NULL default '0',
  `category_id` smallint(5) unsigned NOT NULL default '0',
  PRIMARY KEY  (`listing_id`,`category_id`),
  KEY `category_id` (`category_id`),
  KEY `listing_id` (`listing_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

[/code]

Just for kicks, when you're counting rows, try counting on the id column instead of *. I don't know for sure whether MySQL handles aggregates the same way as selects, but I know with selects your query is usually substantially faster if you only select the columns you need:
[code]
SELECT COUNT(id) FROM listings WHERE longitude != 0
[/code]

Again, I don't know if it will help, but typically optimizing even the smallest things can help when you're searching through millions of records.

good luck

just ran it with COUNT(id)

*waits*

..

*minutes go by*

..

yeah, that's not working too well.. ;)

If you do an "EXPLAIN" on the query ("EXPLAIN SELECT ...") you should see that the query uses no indexes. That's because MySQL can't use an index for !=.

The following should be faster
[code]
SELECT ((SELECT COUNT(*) FROM listings) - (SELECT COUNT(*) FROM listings WHERE longitude = 0)) AS num
[/code]
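The rewrite works because `longitude = 0` and `longitude != 0` partition the table exactly (which holds here, since longitude is NOT NULL; with a nullable column the two counts wouldn't add up). A quick sanity check of the equivalence, using SQLite and made-up rows rather than the real tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (id INTEGER PRIMARY KEY, longitude REAL NOT NULL)")
conn.executemany("INSERT INTO listings (longitude) VALUES (?)",
                 [(0,), (0,), (-99.121,), (12.5,), (71.3,)])

# Direct form: the != predicate forces a full scan in old MySQL
direct = conn.execute(
    "SELECT COUNT(*) FROM listings WHERE longitude != 0").fetchone()[0]

# Rewritten form: two counts that each CAN use an index on longitude
rewritten = conn.execute(
    "SELECT (SELECT COUNT(*) FROM listings) - "
    "(SELECT COUNT(*) FROM listings WHERE longitude = 0)").fetchone()[0]

print(direct, rewritten)  # both 3
```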

As for why the other queries take 10 seconds or more, you should post the queries and the EXPLAIN output.

If the database doesn't change frequently and the result is used as a search result, perhaps for users to browse through, you may want to store the result of the query in a table for quick retrieval. You can recreate that table every hour or so, depending on how often you think is best.

Ooo, that's good to know..  I didn't realize indexes don't work with the != operator..  (any reason why?)


As for the big query.. caching the results would be nice, but it's not possible in this case: the resulting cache table would be far too large, since the query calculates the distance from point a (you) to point b (the business) and orders by distance..

[code]$query = "
SELECT
listings.id,
listings.url_name,
listings.name,
listings.address,
listings.zip,
listings.website,
listings.phone,
listings.fax, 
listings.rating,
listings.city,
listings.state,
(
3958
* 3.1415926
* sqrt(
(listings.latitude - ".$city['latitude'].")
* (listings.latitude - ".$city['latitude'].")
+ cos(listings.latitude / 57.29578)
* cos(".$city['latitude']." / 57.29578)
* (listings.longitude - ".$city['longitude'].")
* (listings.longitude - ".$city['longitude'].")
)
/ 180
) distance
FROM
listings_link
INNER JOIN listings ON listings.id = listings_link.listing_id
WHERE
(listings_link.category_id = '".$category['id']."' ".$where.")
ORDER BY
url_name ASC
LIMIT
".(($_page - 1) * LOCAL_PER_PAGE_SEARCH).", ".LOCAL_PER_PAGE_SEARCH."
";[/code]

Resulting query..

[code]SELECT listings.id, listings.url_name, listings.name, listings.address, listings.zip, listings.website, listings.phone, listings.fax, listings.rating, listings.city, listings.state, ( 3958 * 3.1415926 * sqrt( (listings.latitude - 40.394700) * (listings.latitude - 40.394700) + cos(listings.latitude / 57.29578) * cos(40.394700 / 57.29578) * (listings.longitude - -99.121000) * (listings.longitude - -99.121000) ) / 180 ) distance FROM listings_link INNER JOIN listings ON listings.id = listings_link.listing_id WHERE (listings_link.category_id = '1360' ) ORDER BY distance ASC LIMIT 0, 15[/code]

Explained..

[code]
+----+-------------+---------------+--------+--------------------------------+-------------+---------+-----------------------------------------+------+---------------------------------+
| id | select_type | table         | type   | possible_keys                  | key         | key_len | ref                                     | rows | Extra                           |
+----+-------------+---------------+--------+--------------------------------+-------------+---------+-----------------------------------------+------+---------------------------------+
|  1 | SIMPLE      | listings_link | ref    | PRIMARY,category_id,listing_id | category_id | 2       | const                                   |  200 | Using temporary; Using filesort |
|  1 | SIMPLE      | listings      | eq_ref | PRIMARY                        | PRIMARY     | 3       | localdirectory.listings_link.listing_id |    1 |                                 |
+----+-------------+---------------+--------+--------------------------------+-------------+---------+-----------------------------------------+------+---------------------------------+
[/code]

I understand filesort is slower, but I can't figure out any way of indexing that..
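For what it's worth, that distance expression is a flat-earth approximation of great-circle distance: 3958 is the earth's radius in miles and 57.29578 is degrees per radian, so 3958 * pi / 180 is roughly miles per degree. Here's a standalone Python version of the same arithmetic, handy for sanity-checking results outside MySQL (the function name is mine, not from the thread):

```python
import math

def approx_distance_miles(lat1, lon1, lat2, lon2):
    """Flat-earth approximation matching the SQL expression; inputs in degrees."""
    return (3958 * math.pi / 180) * math.sqrt(
        (lat1 - lat2) ** 2
        + math.cos(lat1 / 57.29578) * math.cos(lat2 / 57.29578)
        * (lon1 - lon2) ** 2
    )

# One degree of latitude is roughly 69 miles
print(round(approx_distance_miles(40.0, -99.121, 41.0, -99.121), 2))  # 69.08
```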

Well, you have some redundant indexes (long duplicates the first column of locate2, and listing_id duplicates the first column of the primary key), but those would just make the inserts slower.  As for !=, just think about it... if I give you a sorted list of 1000 numbers and say find 543, that's easy.  But if I say show me all that aren't 543, you have to actually check them all.  You can only use an index to find out what is there, not what isn't.
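The sorted-list analogy can be sketched directly (illustrative Python, not MySQL internals): equality is a binary search that touches only a handful of entries, while the inequality still has to examine every entry.

```python
import bisect

data = list(range(1000))  # stands in for a sorted index of 1000 numbers

# Equality: binary search touches O(log n) entries
i = bisect.bisect_left(data, 543)
found = i < len(data) and data[i] == 543

# Inequality: no search order helps; every entry must be checked
not_543 = [x for x in data if x != 543]

print(found, len(not_543))  # True 999
```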

I'm not sure you could make that last query much faster; the EXPLAIN doesn't really leave you much room for improvement when you sort on an expression.

*shakes fist at the mysql developers*

Make me an index that indexes numbers that don't exist!! ;)

I did a test with a large table using a similar JOIN but didn't have anywhere near a 10 sec query time.

The EXPLAIN shows
[code]
Using temporary; Using filesort
[/code]

There should at least be a "using where" in the output. I'm guessing that probably has something to do with it.

How long does the query take to do the JOIN without the LIMIT and the ORDER BY?
[code]
SELECT
listings.id, listings.url_name, listings.name, listings.address, listings.zip, listings.website,
listings.phone, listings.fax, listings.rating, listings.city, listings.state,
( 3958 * 3.1415926 * sqrt( (listings.latitude - 40.394700) * (listings.latitude - 40.394700) + cos(listings.latitude / 57.29578) * cos(40.394700 / 57.29578) * (listings.longitude - -99.121000) * (listings.longitude - -99.121000) ) / 180 ) distance
FROM
listings_link
INNER JOIN
listings
ON
listings.id = listings_link.listing_id
WHERE
(listings_link.category_id = '1360' )
[/code]

You might try an "ANALYZE TABLE listings" and "ANALYZE TABLE listings_link" and see if that helps.

If you create a table that holds the result of the JOIN without the WHERE you may be able to speed things up by searching on that table. Again, this would be a table that you'd recreate periodically.
[code]
CREATE TABLE list_and_link ( ... );
INSERT INTO list_and_link SELECT ... FROM table1 INNER JOIN table2 ON table1.id = table2.id;
ALTER TABLE list_and_link ADD INDEX ( ... );
SELECT l.*, ( ... ) AS distance FROM list_and_link AS l WHERE category_id = ... ORDER BY distance;
[/code]

You should be able to shorten the table creation process by using CREATE TABLE ... SELECT syntax, but I'd rather create the table definition by hand, because with the "create table select" syntax MySQL may choose column types that would be problematic.
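As a concrete (if toy-sized) version of the materialized-join idea, with SQLite standing in for MySQL and the sample rows invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE listings (id INTEGER PRIMARY KEY, name TEXT, latitude REAL, longitude REAL);
CREATE TABLE listings_link (listing_id INTEGER, category_id INTEGER,
                            PRIMARY KEY (listing_id, category_id));
INSERT INTO listings VALUES (1, 'Acme', 40.39, -99.12), (2, 'Globex', 41.0, -98.5);
INSERT INTO listings_link VALUES (1, 1360), (2, 1360), (2, 7);

-- materialize the join once, then search the flat table
CREATE TABLE list_and_link AS
  SELECT ll.category_id, l.* FROM listings_link ll
  INNER JOIN listings l ON l.id = ll.listing_id;
CREATE INDEX lal_cat ON list_and_link (category_id);
""")

rows = conn.execute(
    "SELECT name FROM list_and_link WHERE category_id = 1360 ORDER BY name").fetchall()
print(rows)  # [('Acme',), ('Globex',)]
```

Note this uses the CREATE TABLE ... AS SELECT shortcut; per the caveat above, on MySQL you may prefer an explicit column definition.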

EDIT: Btw, make sure the EXPLAIN output is from the query posted. I mention this because the code snippet you posted, although almost the same as the "Resulting query", is not the same (one orders by url_name, the other by distance).

If I do the query without the ORDER BY, it takes 0.16 seconds.

Without the ORDER BY and the LIMIT, it takes 0.01 seconds.. (weird that it'd be faster while returning more rows....)

here's an EXPLAIN on the query you reposted..
[code]
+----+-------------+---------------+--------+--------------------------------+-------------+---------+-----------------------------------------+------+-------+
| id | select_type | table         | type   | possible_keys                  | key         | key_len | ref                                     | rows | Extra |
+----+-------------+---------------+--------+--------------------------------+-------------+---------+-----------------------------------------+------+-------+
|  1 | SIMPLE      | listings_link | ref    | PRIMARY,category_id,listing_id | category_id | 2       | const                                   |  200 |       |
|  1 | SIMPLE      | listings      | eq_ref | PRIMARY                        | PRIMARY     | 3       | localdirectory.listings_link.listing_id |    1 |       |
+----+-------------+---------------+--------+--------------------------------+-------------+---------+-----------------------------------------+------+-------+
[/code]


I'm running the ANALYZE on the tables.. (never done that before... it took a bloody long time)

Once that's done I'll try the list_and_link idea, but..  won't that eat up a ton more RAM in the process, since it's going to be basically double the size of the listings table..?

[code]
mysql> ANALYZE TABLE listings;
+-------------------------+---------+----------+----------+
| Table                  | Op      | Msg_type | Msg_text |
+-------------------------+---------+----------+----------+
| localdirectory.listings | analyze | status  | OK      |
+-------------------------+---------+----------+----------+
1 row in set (6 min 35.16 sec)
[/code]

[code]
mysql> ANALYZE TABLE listings_link;
+------------------------------+---------+----------+----------+
| Table                        | Op      | Msg_type | Msg_text |
+------------------------------+---------+----------+----------+
| localdirectory.listings_link | analyze | status  | OK      |
+------------------------------+---------+----------+----------+
1 row in set (35.63 sec)
[/code]

If the join is that fast without the ORDER BY, then it seems to be taking far too long to sort 200 rows. The result is 200 rows, yes?
[quote author=shoz link=topic=111770.msg453292#msg453292 date=1161111518]
If the join is that fast without the ORDER BY then it seems to be taking far too long to sort 200 rows. The result is 200 rows yes?


[/quote]

yeah, it's for the 200 rows; I felt it was taking too long for that amount as well.. :P

btw, if you wanna see the site, it's here: http://local.netdeals.ca/listings/

An example listing page is: http://local.netdeals.ca/listings/new_york/manhattan/computer_communications_consultants/

If you view the source for the page, I stuck the SQL queries (3 of them; scroll to see them all) in as an <!-- html comment -->


Maybe it's my imagination, but everything is loading a LOT faster after having run "ANALYZE TABLE"..  or is it just me?

[quote=neoform]
Maybe it's my imagination, but everything is loading a LOT faster after having run "ANALYZE TABLE"..  is it just me?
[/quote]

ANALYZE TABLE may have helped. As far as whether or not it's your imagination, I'd say that you should run the queries manually and see if they're faster.

Well, yes, if your index statistics weren't nicely distributed, then the optimizer wouldn't optimize very well!  You should be running ANALYZE TABLE on all of your tables regularly, especially after any large updates.

