JChilds Posted September 8, 2009 Share Posted September 8, 2009 Using the following, I am having to wait upwards of 4 second for this query to execute. (for transaction 3) SELECT Sales.*, items.description FROM Sales LEFT JOIN Items ON sales.barcode = items.barcode WHERE sales.trans_id = ID I have uploaded a dump from phpmyadmin http://www.noexcuse.com.au/tmt.sql.gz Is there a faster way to execute this query? Or am I stuck having it take this long? Quote Link to comment https://forums.phpfreaks.com/topic/173475-left-join-extremely-slow-any-way-to-remedy-16000-lines/ Share on other sites More sharing options...
elflacodepr Posted September 8, 2009 Share Posted September 8, 2009 All depends, as the file may be too big...it will take up 4 seconds for sure. As for information you provided I can see you are doing a SELECT command which retrieves 16k rows? If so I recommend to use the LIMIT function so you don't have that heavy load of rows coming up at once. Quote Link to comment https://forums.phpfreaks.com/topic/173475-left-join-extremely-slow-any-way-to-remedy-16000-lines/#findComment-914434 Share on other sites More sharing options...
JChilds Posted September 8, 2009 Author Share Posted September 8, 2009 I thought using Where sales.Trans_ID = ID (which will only return around 15 lines) would make it faster. And for those 15 lines, select where sales.barcode = items.barcode and return items.description would be then searching the 16,000 line in items 15 times... Is there anything I can do about this. I can't see how using LIMIT will help. (unless there is a way of limiting the join to one items.barcode per sales.barcode [ as there will only be one matching for each anyway]) Quote Link to comment https://forums.phpfreaks.com/topic/173475-left-join-extremely-slow-any-way-to-remedy-16000-lines/#findComment-914438 Share on other sites More sharing options...
DavidAM Posted September 8, 2009 Share Posted September 8, 2009 [*]Create an index on sales.Trans_ID [*]Create an index on sales.barcode [*]Create an index on items.barcode [*]Avoid using SELECT * unless you absolutely need every column in the table; list the columns individually Are the sales.barcode and items.barcode columns defined using (exactly) the same datatype? What are the row counts on these two tables? Is the LEFT JOIN necessary? Is it possible to have a sale without an item? If not, remove LEFT and use a straight JOIN. Post the table descriptions and row counts and let's see what we can see Quote Link to comment https://forums.phpfreaks.com/topic/173475-left-join-extremely-slow-any-way-to-remedy-16000-lines/#findComment-914490 Share on other sites More sharing options...
JChilds Posted September 8, 2009 Author Share Posted September 8, 2009 Are the sales.barcode and items.barcode columns defined using (exactly) the same datatype? This was the problem. I don't know how I overlooked that. Thankyou very much for your help. Will putting an index on sales.barcode and items.barcode help at all? Becasue they are each unique (within their own table) Currently the items table has 16,000 rows (took five years to populate that many) The sales table currently has only ~20 rows(to test). It will grow by ~1 million per year, although the Trans_ID Cardinality will only grow by ~50,000 per year so should be manageable. Quote Link to comment https://forums.phpfreaks.com/topic/173475-left-join-extremely-slow-any-way-to-remedy-16000-lines/#findComment-915115 Share on other sites More sharing options...
DavidAM Posted September 9, 2009 Share Posted September 9, 2009 Will putting an index on sales.barcode and items.barcode help at all? Becasue they are each unique (within their own table) Almost definitely. Since your where clause is limiting on trans_id, an index here will help find the sales records quicker. Once the sales records are found, we need a quick way to find the items records. That means an index is needed on items.barcode (unique or not). The index on sales.barcode may or may not help with this query. Do you rebuild (or refresh, or whatever its called on your server) your indexes periodically? Considering the size of the items table and how long you have been collecting data, the table (and indexes) may be fragmented. Refreshing them will help make searches quicker. Some servers might just ignore the index if it is too badly fragmented and do a table scan anyway. Of course, I'm speaking from experience that is years old, but the concepts apply. Indexes are (usually) faster to search than a table scan. Fresh indexes are in order and can be searched even faster. Also, if all of the data (from a particular table) that is needed by a query is contained in a single index on that table, the server may scan only that index and never look at the actual data blocks. This is one reason to avoid using SELECT *. For instance, if you have a single index on sales.trans_id, sales.barcode, and an index on items.barcode, and all you were returning is the barcode, this query could run totally against the indexes. Since there are fewer columns in the index, there is more "data" per block so fewer disk reads are required. The values are the same as in the data segment, so there's no sense in looking there. Again, I'm speaking from years-old knowledge of the internals of a particular database server (almost definitely NOT the one you are using) and your server may behave differently. But the logic makes sense. Give the server as much opportunity to find the data as quickly as it can and it will be your friend. Most database servers provide a way to view the plan that is generated for a query. That should show you what indexes are being used, intermediate sorts, etc. Sometimes the plan is different for SELECT *, don't know why, except to say that once the server sees the *, it KNOWS it will have to read the data, so if some index is big enough it may decide a table scan would be faster anyway. Also, the plan may change as the table size changes (and indexes fragment). Also, and I don't mean to preach, SELECT * means more data read, more data extracted, more data sent to the client ... it chews up resources like mad (database-server cpu, database-server memory, bandwidth, client memory, client cpu). Quote Link to comment https://forums.phpfreaks.com/topic/173475-left-join-extremely-slow-any-way-to-remedy-16000-lines/#findComment-915388 Share on other sites More sharing options...
fenway Posted September 10, 2009 Share Posted September 10, 2009 The LEFT JOIN vs INNER JOIN is going to be signficiant here. Quote Link to comment https://forums.phpfreaks.com/topic/173475-left-join-extremely-slow-any-way-to-remedy-16000-lines/#findComment-916308 Share on other sites More sharing options...
corbin Posted September 10, 2009 Share Posted September 10, 2009 Errrr... Were you saying up there that SELECT * has something to do with indexes? What you're selecting doesn't affect indexes being used as far as I know. Also, I have some issues with your table schema. CREATE TABLE IF NOT EXISTS `items` ( `Barcode` varchar(20) NOT NULL, `Code` text NOT NULL, `Description` varchar(34) NOT NULL, `Qty` int(3) NOT NULL, `Price` text NOT NULL, `Department` varchar(64) NOT NULL, `Supplier` varchar(64) NOT NULL, KEY `Barcode` (`Barcode`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1; Based on your data dump, barcode is always a number. Numeric datatypes take up less space and index more quickly than textual data types. In other words, if barcode will always be a number, it should be a numeric type. (Edit: Oh, just saw that your barcodes are not always numeric....) Also, if barcode is always going to be unique, then you could go ahead and make it a primary key. Why is price a text field? That should be of the type DECIMAL(). Department should probably be an int that maps to another table instead of repeating the depo name over and over again. Same goes with supplier. Normalizing the department and supplier fields will in force data conformity and take less space. CREATE TABLE IF NOT EXISTS `tendered` ( `Timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, `Trans_ID` int(11) NOT NULL, `Cash` text NOT NULL, `EFTPOS` text NOT NULL, `Total` text NOT NULL, `GST` text NOT NULL ) ENGINE=MyISAM DEFAULT CHARSET=latin1; It's generally bad practice to use a reserved name... You should rename the timestamp column if you don't already have code that depends on it. "int(11)" might as well just be "int". The length is really only useful when autofill is turned on (int(5) still uses 4 bytes for example), and a 32 bit int can only be 10 digits anyway. Cash and Total should be DECIMAL(), not text. Quote Link to comment https://forums.phpfreaks.com/topic/173475-left-join-extremely-slow-any-way-to-remedy-16000-lines/#findComment-916332 Share on other sites More sharing options...
DavidAM Posted September 10, 2009 Share Posted September 10, 2009 corbin, all very good points. Normalize the data and use logical datatypes. That will definitely improve performance. What you're selecting doesn't affect indexes being used as far as I know. I don't know the specifics on how MySql determines the plan for a query. But I think it could make a difference. For example: Table: Books; Columns: ID (auto-increment & Primary Key), ISBN, Title, Author, Publisher, Synopsis, some other stuff; Index: (ID, ISBN) Table: Sales; Columns: ID (auto-increment & Primary Key), dateSold, BookID, qty, price, maybe some other stuff too; Index: (dateSold, BookID) SELECT b.ID, b.ISBN, s.dateSold FROM Books b JOIN sales s ON s.BookID = b.ID WHERE dateSold BETWEEN 2008-01-01 AND 2008-12-31 This query is covered by the indexes. If you use the Date index on the Sales table, and the ISBN index on the Books table, you never have to look at the data pages. So there are fewer disk seeks and reads. Now if instead of writing the query that way, I write this (because I'm too lazy to be specific about what I want and I'll just reference the fields I need on the front-end): SELECT * FROM Books b JOIN sales s ON s.BookID = b.ID WHERE dateSold BETWEEN 2008-01-01 AND 2008-12-31 The server has to read the data pages because I've asked for all of the data from the tables. This would probably use the Date Index on Sales and the Primary Key (ID) on Books. Or it might just do a table scan on one table or the other. Or it might use the same indexes. However, it will definitely have to read the data pages in addition to whatever index it uses because it has to retrieve ALL of the data from the selected rows. And that was my primary point. Using Select * can cause additional reads that are not needed if you don't actually need everything. This is a forced example, but I think it makes the point. Back when I did this for a living, I actually needed to create composite indexes to cover a query that would have run much longer than acceptable without it. (I did not design the database, and could not correct the design as it was a third-party application). Yes, the additional indexes used a lot of disk space, but it was a trade-off. Quote Link to comment https://forums.phpfreaks.com/topic/173475-left-join-extremely-slow-any-way-to-remedy-16000-lines/#findComment-916423 Share on other sites More sharing options...
corbin Posted September 11, 2009 Share Posted September 11, 2009 I do not disagree that * reads/transfers unncessary data if the user does not plan to actually use all of it (and it causes the over head of having to get the column names). What I'm saying though is that the actual seek time (not read/transfer time) is entirely dependent on the WHERE clause, not the SELECT <cols> stuff. As far as I know, when ever the index is searched, a read still takes place once the rows are found. I could be wrong though, because as you're saying, logically if the data is already in the index, there's no reason to read the row also. If I get bored enough later I'll make a table with a couple million rows or so and see how much of a difference it takes. (Oh, also, it's worth noting that just because something makes sense doesn't mean that's the way MySQL does it... The way you're explaining makes more sense than reading the table too, but I'm fairly sure that MySQL will still read the table.) (Oh, also, in the above, when I say "read the table" I mean reading the chunk of the file on the hard drive [or from memory depending on setup] that corresponds with the data being pulled in the query.) Blerh, now I really have no idea... What you're saying does make more sense (if you're saying what I think you're saying)... *Heads off to google* Quote Link to comment https://forums.phpfreaks.com/topic/173475-left-join-extremely-slow-any-way-to-remedy-16000-lines/#findComment-916425 Share on other sites More sharing options...
JChilds Posted September 11, 2009 Author Share Posted September 11, 2009 I just changed items.price to decimal and now all the prices have been rounded to whole numbers....!! Quote Link to comment https://forums.phpfreaks.com/topic/173475-left-join-extremely-slow-any-way-to-remedy-16000-lines/#findComment-916466 Share on other sites More sharing options...
corbin Posted September 11, 2009 Share Posted September 11, 2009 http://dev.mysql.com/doc/refman/5.1/en/precision-math-decimal-changes.html I didn't say the full arguments of the DECIMAL type. The declaration syntax for a DECIMAL column is DECIMAL(M,D). The ranges of values for the arguments in MySQL 5.1 are as follows: M is the maximum number of digits (the precision). It has a range of 1 to 65. (Older versions of MySQL allowed a range of 1 to 254.) D is the number of digits to the right of the decimal point (the scale). It has a range of 0 to 30 and must be no larger than M. Quote Link to comment https://forums.phpfreaks.com/topic/173475-left-join-extremely-slow-any-way-to-remedy-16000-lines/#findComment-916467 Share on other sites More sharing options...
JChilds Posted September 11, 2009 Author Share Posted September 11, 2009 Woops, Makes more sense Now to revert to the dump I uploaded. Quote Link to comment https://forums.phpfreaks.com/topic/173475-left-join-extremely-slow-any-way-to-remedy-16000-lines/#findComment-916474 Share on other sites More sharing options...
corbin Posted September 11, 2009 Share Posted September 11, 2009 Hopefully you didn't lose any data. I should have mentioned that x.x. Quote Link to comment https://forums.phpfreaks.com/topic/173475-left-join-extremely-slow-any-way-to-remedy-16000-lines/#findComment-916476 Share on other sites More sharing options...
JChilds Posted September 11, 2009 Author Share Posted September 11, 2009 It's ok. Currently running both systems (the old, and my in progress new) side by side so I still have the old ones records (in ms sql), plus the ones I uploaded. I have written a touch screen POS in vb, and the database is mysql, which is where I wish I was more skilled. Quote Link to comment https://forums.phpfreaks.com/topic/173475-left-join-extremely-slow-any-way-to-remedy-16000-lines/#findComment-916487 Share on other sites More sharing options...
fenway Posted September 11, 2009 Share Posted September 11, 2009 As far as I know, when ever the index is searched, a read still takes place once the rows are found. I could be wrong though, because as you're saying, logically if the data is already in the index, there's no reason to read the row also. And mysql does precisely this if you happen to have a multi-column key and specify only such columns in the select list and where clause. Quote Link to comment https://forums.phpfreaks.com/topic/173475-left-join-extremely-slow-any-way-to-remedy-16000-lines/#findComment-916660 Share on other sites More sharing options...
DavidAM Posted September 11, 2009 Share Posted September 11, 2009 That's too bad. It seems like such a waste. But good to know. I guess I'll stay off my soapbox for a while. Quote Link to comment https://forums.phpfreaks.com/topic/173475-left-join-extremely-slow-any-way-to-remedy-16000-lines/#findComment-916737 Share on other sites More sharing options...
fenway Posted September 11, 2009 Share Posted September 11, 2009 That's too bad. It seems like such a waste. But good to know. I guess I'll stay off my soapbox for a while. I think you've misunderstood -- there is no reading from the table in the aforementioned case. I was agreeing with the latter statement. Quote Link to comment https://forums.phpfreaks.com/topic/173475-left-join-extremely-slow-any-way-to-remedy-16000-lines/#findComment-916739 Share on other sites More sharing options...
DavidAM Posted September 11, 2009 Share Posted September 11, 2009 COOL! I did misunderstand. Thanks. Quote Link to comment https://forums.phpfreaks.com/topic/173475-left-join-extremely-slow-any-way-to-remedy-16000-lines/#findComment-916774 Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.