optimizing 5GB 23 million table queries

PyraX · August 3, 2012

Hi,

I have a table mysql table with 23million records that is over 5GB and performing select queries such as WHERE x like '%y%' is taking forever.

Using:

AppServ 2.5.10

Apache 2.2.8

PHP 5.2.6

MySQL 5.0.51b

phpMyAdmin-2.10.3

Software and configuration wise what can I do to run these queries faster?

There is only 1 user that needs to access it

Willing to use cloud server also

table is very simple:

CREATE TABLE IF NOT EXISTS `data` (

`did` int(100) NOT NULL auto_increment,

`dfile` varchar(255) NOT NULL,

`0` varchar(255) NOT NULL,

`1` varchar(255) NOT NULL,

`2` varchar(255) NOT NULL,

`3` varchar(255) NOT NULL,

`4` varchar(255) NOT NULL,

`5` varchar(255) NOT NULL,

`6` varchar(255) NOT NULL,

`7` varchar(255) NOT NULL,

`8` varchar(255) NOT NULL,

`9` varchar(255) NOT NULL,

`10` varchar(255) NOT NULL,

`11` varchar(255) NOT NULL,

`12` varchar(255) NOT NULL,

`13` varchar(255) NOT NULL,

`14` varchar(255) NOT NULL,

`15` varchar(255) NOT NULL,

`16` varchar(255) NOT NULL,

`17` varchar(255) NOT NULL,

`18` varchar(255) NOT NULL,

`19` varchar(255) NOT NULL,

`20` varchar(255) NOT NULL,

`21` varchar(255) NOT NULL,

`22` varchar(255) NOT NULL,

`23` varchar(255) NOT NULL,

PRIMARY KEY (`did`)

) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=23415599 ;

Thanks in advance

PyraX

DavidAM · August 4, 2012

if WHERE x like '%y%' is a requirement, you will have to use a FULLTEXT index or some other indexing scheme.

Anytime you use a wildcard at the start of a LIKE expression, it prevents the server from using an index for that field. So, if this is the only condition in your WHERE clause, the server will have to read EVERY row in the table. If you can remove the first wildcard (like so): WHERE x like 'y%' then you can create an index on "x", and the server will (have the option to) use it to read fewer rows.

You might consider re-designing your database. Anytime you have non-meaningful names for columns, including numbered column names for "repeating" data; it generally means the database is not normalized. Contrary to what many think, fewer tables does not necessarily make a better database. If you can normalize that database, you could more than likely get better performance, even with 24 times as many rows.

cpd · August 4, 2012

You might consider re-designing your database. Anytime you have non-meaningful names for columns, including numbered column names for "repeating" data; it generally means the database is not normalized. Contrary to what many think, fewer tables does not necessarily make a better database. If you can normalize that database, you could more than likely get better performance, even with 24 times as many rows.

This may seem a little pointless but I wanted to emphasise this point. A relational database should be normalised ideally to the 3rd form minimum (you can research the various normalisation forms online). I would argue numbered columns are not normalised as opposed to "generally means" and strongly recommend re-designing the database as its usually an integral part of your system.

Again, apologies if that seemed a little unnecessary but I feel quite strongly about normalisation and whilst I'm not perfect, do insist upon it provided its suitable for the given scenario.

Sign In

optimizing 5GB 23 million table queries

Recommended Posts

PyraX

Link to comment

Share on other sites

DavidAM

Link to comment

Share on other sites

cpd

Link to comment

Share on other sites

Archived

Browse

Activity

Important Information