Jump to content

What's the fastest way to search INNODB tables?


Fluoresce

Recommended Posts

I present videos on my site. The videos (their titles, descriptions, etc.) are stored in a MySQL database.

I want visitors to be able to search my database for videos.

My MySQL version is 5.5.40-36.1.

The storage engine of my tables is INNODB.

The two columns I want searched are:

  1. title (varchar; unique index)
  2. description (text; no index)

According to my research, the best way to allow users to search my database would be to use FULLTEXT, but my INNODB tables are incompatible. FULLTEXT is available for INNODB in MySQL 5.6.4 and above.

I would rather not use LIKE '%...%', and I would rather not use a third party solution such as Sphinx because my database will never grow particularly large (I doubt it'll ever exceed 6,000–8,000 videos).

What do you guys recommend? Is it okay to use LIKE with a database containing 6,000–8,000 items?

Link to comment
Share on other sites

you could change

+----------------+
| video (innodb) |
+----------------+
|  video_id      |
|  title         |
|  description   |
|  author        |
|  duration      |
|  etc           |
+----------------+

to

+----------------+        +----------------+                     
| video (innodb) |        | search (myisam)| 
+----------------+        +----------------+
|  video_id      |--------|  video_id      |
|                |        |  title         | (fulltext)
|                |        |  description   | (fulltext)
|  author        |        +----------------+
|  duration      |        
|  etc           |
+----------------+

then

SELECT .... FROM video INNER JOIN search USING (video_id) WHERE MATCH ... AGAINST ...

  • Like 1
Link to comment
Share on other sites

Thanks for the suggestion, Barand, but that would make my life difficult when it comes to reviewing the videos that are added to my database (they are added by other people). I would have to review the contents of both tables before accepting a video.

 

What do you think of the following idea?

 

  1. Create a new MyISAM table called search or whatever with just two columns: (1) video_id and (2) title_and_description.
  2. Strip the HTML tags, HTML entities and all punctuation from the videos' titles and descriptions.
  3. Insert each video's title and description into the title_and_description field, which will have a FULLTEXT index.

 

Then, when users conduct a query, I can search the search table but select from the videos table.

 

Of course, I would have to keep updating the search table, maybe by way of a cron job.

 

Is this idea terrible? Are there any better ideas?

Link to comment
Share on other sites

I don't see why you need a cron job. Make the writing of the "search" record part of the acceptance process. You just need a one-off process to create the table from your existing video records.

 

Also you don't have to merge the fields into one to have a full text index. You can index multiple fields.

Link to comment
Share on other sites

there is any reason why your video table need to use the Innodb storage engine?.... are you implementing referential integrity or transactions on it?

 

To tell you the truth, I don't know anything about storage engines. I thought I need InnoDB because I use a foreign key constraint. However, I'm not even sure if the constraint is necessary.

 

Please let me explain my setup, and tell me if I can switch my video table to MyISAM.

 

Basically, I have two tables:

 

video_table

video_id

category_id

subcategory_id

title

description

embed_code

views

thumbnail

 

category_table

category_id

category_name

 

As you can see, each video can be assigned to two categories with:

 

video_table.category_id

video_table.subcategory_id

 

These two columns have foreign key constraints. In other words, I can't delete or update a category in the category table if a video has been assigned to that category (unless, of course, I change the foreign key setting to cascade, null, or whatever).

 

That's the only reason I was using InnoDB. It just prevents me from accidentally deleting a category that's being used.

 

The truth is, I never touch the category table; it's almost never altered in any way. I might add some more categories to it in the future, but that's it. On the other hand, the video table is added to every day.

 

Do I need InnoDB?

 

I don't even know what "implementing referential integrity or transactions" means! :happy-04:

Edited by Fluoresce
Link to comment
Share on other sites

There are other reasons to utilize innodb besides the ones mentioned.  I highly recommend you stay with it.  

 

You could use Barand's suggestion -- have a denormalized search table, where you simply replicate the values into a small myisam table for the benefit of the search.   The coding and interruption required would be minimal. You'd simply need to write the replication script and schedule it in cron.  

 

To your original question one of the benefits of innodb is that it has a true data cache.  It's called the "buffer pool".  If you have sufficient resources on your server, you could increase the size of the buffer pool to insure that data is coming from cache.  It is often possible with a small fairly stable database like the one you describe where you are almost entirely "READ/SELECT" based, to have a pool where the data and indexes for the entire table will be in the buffer pool.  

 

At that point a SELECT '%...' will be far less disruptive than normally because it will be coming from memory.  An 8-10k row table is tiny in the database world.

 

You would have to invest some time trying to figure out the size of your overall database and the tables in particular, and you'd need to understand your overall memory usage to determine if you could allocate more RAM to mysql that you currently do.

 

You can start by looking at http://dev.mysql.com/doc/refman/5.5/en/innodb-buffer-pool.html

 

The innodb_buffer_pool_size and innodb_buffer_pool_instances are the only params you really need to understand and possibly change.   Whatever you do, these are good params to understand.  Although it might be a stretch if you are a novice sysadmin, the free tool, innotop is fantastic for making it easy to monitor the effectiveness of your caching and figuring out your cache hit ratio.  I would think based on your description that you should be aiming for close to 100% cache hit with your innodb tables.

  • Like 1
Link to comment
Share on other sites

I don't see why you need a cron job. Make the writing of the "search" record part of the acceptance process. You just need a one-off process to create the table from your existing video records.

 

:confused:

 

Please elaborate, Barand. Remember that I know very little about databases and MySQL. :happy-04:

Link to comment
Share on other sites

The innodb_buffer_pool_size and innodb_buffer_pool_instances are the only params you really need to understand and possibly change.

 

Thanks, Gizmola, but this is all a bit complicated for a novice such as me. Please tell me if I understand you correctly.

 

InnoDB has a data cache. By adjusting the following parameters, I can make InnoDB use the data cache instead of repeatedly running queries.

 

innodb_buffer_pool_size

innodb_buffer_pool_instances.

 

If I can do this, then I can keep InnoDB and use LIKE '%...%' statements on my existing table without having to create an additional MyISAM table.

 

Is that correct?

 

But I use shared hosting. Do I even have access to innodb_buffer_pool_size and innodb_buffer_pool_instances?

Edited by Fluoresce
Link to comment
Share on other sites

:confused:

 

Please elaborate, Barand. Remember that I know very little about databases and MySQL. :happy-04:

From what you have said it seems there is a two step process

  1. A video is submitted by the user
  2. You vet the video for acceptance and it is then loaded to the site and video table updated

On that second step you could write the record for the video to the search table.

  • Like 1
Link to comment
Share on other sites

MySQL is an odd database because it has pluggable engines that can work entirely differently.  MyISAM is the base engine and of course InnoDB was a pluggable engine that became very popular.  

 

Just about all DB's have some form of caching for queries.  In the case of MyISAM this extends to the parsing of the queries themselves but not the actual data.  With the buffer_pool in innodb -- the actual data retrieved is cached.  So, in terms of what you were asking, when you do a LIKE '%...' query, the table will have to be scanned starting at row 1 and continuing to the last row in order to return a result.  With Innodb and a large enough buffer pool, essentially the data will already be in memory and the tablescan process will be much faster than it would be reading the data from disk.

 

Unfortunately, with shared hosting, you are not in control of those parameters as you're sharing a mysql server your hosting company provides, either on your server or some secondary server they host.

 

In the shared host scenario, I would advise you to implement Barand's suggestion.  

  • Like 1
Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.