Jump to content

[SOLVED] SQLite, LIKE vs regular expression?


Axeia

Recommended Posts

I'm about to head off to bed and hoping to safe myself some work tomorrow..

 

I got a SQLstatement that's like this:

SELECT * FROM File WHERE downloaded IS NULL AND filename LIKE '%Open%' AND filename LIKE '%File%';

So in this example I'm looking if the strings "Open" and "File" occur in the column filename, and if they do then it's a result I'm interested in.

 

Now "Open" and "File" were just to illustrate the idea, instead of those 2 it can be somewhere between 1 and ~6 words with PHP building up either query (with either a regex or LIKE's).

Would it be faster to use a regular expression or a bunch of LIKE's like I have now?

 

I don't want the words to occur in a particular order, so I'm not sure how the regular expression would look like.. have always been looking for a particular order of things up till now.

Link to post
Share on other sites

Well laffin, I'm not sure if that's what Axeia is going after (I hope it helps).

I just saw the string of LIKE and AND keywords, and thought it might be easier to manage via REGEXP instead. As far as speed of execution of one method vs the other is concerned, I couldn't be sure to be honest.

 

I'll find out soon enough if what I suggested is heading in the right direction or not...

Link to post
Share on other sites

Almost, but I want both to occur before a match is made, so more like

SELECT * FROM File WHERE downloaded IS NULL AND filename REGEXP('.*Open.*File.*');

but I can't figure out how to toggle the case insensitive flag..

Link to post
Share on other sites

I've become quite adept at writing them, it's the reading later on if you don't document them properly that's a killer.

But I usually try to substr / strpos my way trough things as it's many times faster.

 

In this case it's hardly an option where I'm using a SQL(ite) query to get the results, cause doing things in the query itself instead of just selecting something big and then letting PHP get the proper results is

A) Bad practice, will make it hard(er) to port the application to another language. Queries tend to be more portable.

B) Slower (or well more often than not it is)

 

I'll just mark this as resolved, as I realised I do actually want to the words in a particular order. Decided to use a single query as well as sqlite doesn't seem to like a lot of small selects.[/code]

Link to post
Share on other sites

Almost, but I want both to occur before a match is made, so more like

SELECT * FROM File WHERE downloaded IS NULL AND filename REGEXP('.*Open.*File.*');

but I can't figure out how to toggle the case insensitive flag..

 

Well, as far as I understand it, the records fetched will be in alphabetical order in accordance to the complete WHERE statement..

As for your regex pattern, you are looking for all those words to be in a filename field (in other words, all these words MUST be in a filename field) where I understood to look for either Open or File, so perhaps I misunderstood...

 

According to the mySQL Dev zone, to make it case insensitive, use 'binary':

 

SELECT * FROM FILE WHERE downloaded IS NULL AND filename REGEXP BINARY ('.*Open.*File.*'); ?

 

Granted, when I tested REGEXP involving peoples' names for example, whatever I was looking for, I could use simply lower case, and REGEXP could find it regardless.. so if there are two people I am interested in finding (Wilkins and Kramer):

 

WHERE transactions.cust_id = customers.cust_id AND cust_name REGEXP 'kra|wil';

 

Both Wilkins and Kramer's records show up. So I'm not sure if it is a mySQL configuration thing, or if the data type of the table in interest comes into play in this regard. When I used BINARY in the mix, it didn't yield any results.. so I must do some more self research into BINARY...

 

EDIT - Nevermind, just re-read in the link I provided:

REGEXP is not case sensitive, except when used with binary strings.

 

Link to post
Share on other sites

defo would liek to the line is smaller less code when using regex

I checked beforehand and the maximum length for a query is 1 million bytes, so not really a problem.

The query itself is created by PHP anyway so the typing is minimal :).

Ended up with this in case you find it interesting

if( $epInfo['prefix'] != '' )
{
$specials = empty( $epInfo['specialTag'] ) ? '' : '%'.implode( "%", $epInfo['specialTag'] ); 

$queryLikes[] = sprintf( 
               '"%s%%%s%%%s%%" OR filename LIKE "%s%%%s%%"',
	$tmp = sqlite_escape_string'_', '__', $epInfo['prefix'] ),
	$epInfo['episode'],
	'v'.$epInfo['versionPlusOne'].$specials,
	$tmp,
	$epInfo['episodePlusOne'].$specials
);
}

And yes.. that's not exactly pretty with the % escaping %.. makes it look very messy.

Link to post
Share on other sites

defo would liek to the line is smaller less code when using regex

I checked beforehand and the maximum length for a query is 1 million bytes, so not really a problem.

The query itself is created by PHP anyway so the typing is minimal :).

Ended up with this in case you find it interesting

if( $epInfo['prefix'] != '' )
{
$specials = empty( $epInfo['specialTag'] ) ? '' : '%'.implode( "%", $epInfo['specialTag'] ); 

$queryLikes[] = sprintf( 
               '"%s%%%s%%%s%%" OR filename LIKE "%s%%%s%%"',
	$tmp = sqlite_escape_string'_', '__', $epInfo['prefix'] ),
	$epInfo['episode'],
	'v'.$epInfo['versionPlusOne'].$specials,
	$tmp,
	$epInfo['episodePlusOne'].$specials
);
}

And yes.. that's not exactly pretty with the % escaping %.. makes it look very messy.

 

we should start talking more about regex on this site because on the frameworks its used everywhere, validation etc and other things

Link to post
Share on other sites

They're also known for being hard to maintain and read ;)

and of course the famous

"You have a problem, you decide to use a regular expression. Now you have 2 problems".

 

But yeah they can do some pretty nifty things, I just tend to use the alternative if there is any. Most of the time as it's faster and easier to read. (though this one might be the exception as far as readability is concerned ;) )

Link to post
Share on other sites

for me the main advantage would be to do somthing in a tiny peice of code and then just comment next to that one functioner what it does, and also not having to do any iteration etc just being able to do it from the power of a expression is a powerfull high level thing

Link to post
Share on other sites
  • 4 weeks later...

Just wondering if yer looking for BOTH OPEN and FILE,

couldnt u use LIKE like:

SELECT * FROM File WHERE downloaded IS NULL AND filename LIKE '%Open%File%';

 

A percent symbol ("%") in the pattern matches any sequence of zero or more characters in the string. An underscore ("_") in the pattern matches any single character in the string. Any other character matches itself or it's lower/upper case equivalent (i.e. case-insensitive matching). (A bug: SQLite only understands upper/lower case for ASCII characters. The LIKE operator is case sensitive for unicode characters that are beyond the ASCII range. For example, the expression 'a' LIKE 'A' is TRUE but 'æ' LIKE 'Æ' is FALSE.).
Link to post
Share on other sites

just a thought the search query can contain up to1-6 keywords as u say

so not shure the above query is wut yer looking for. as that would result in a search pattern where all keywords must be in a specific sequence.

so maybe if yer keywords are in an array.

 

I checked (REGEXP) in SQLITE is user defined, otherwise it returns an error.

but if u need to do it in sqlite maybe using LIKE and ANDS will work

 

than came up with this code, prolly some optimization is in due order.

but it generates the LIKE as expected :)

<?php
$keywords=array("open","file");

function sqlesc($var,$quote=TRUE)
{
return is_int($var)?$var:(is_string($var)?(($quote?"'":'').sqlite_escape_string($var).($quote?"'":'')):FALSE);
}
function sqllike(&$var,$key,$param)
{
$var= "{$param} LIKE '%".sqlesc($var,FALSE)."%'";
}

$param="filename";
array_walk($keywords,'sqllike',$param);
$like=implode(' AND ',$keywords);
echo $like;


?>

Link to post
Share on other sites

To be fair I've changed my approach again since then (appropriate the effort though).

I'm using a simple preselection as the base (with LIKE) and then use PHP to further filter down the values  as I now got both a whitelist and a blacklist which would mean a ridiculously complicated query something would have to match against which I'd guess would be really slow.

 

As I'm guessing LIKE (if equal in number to REGEXP) would be the faster one of the TWO and has few false positives I took that as the base.

This leaves PHP with little work to do, but does mean my application might have become a bit harder to port to another language (and I can't seem to get PHP + QT working) but mweh.. Haven't decided yet if I will at all and if I do I still don't know whether it will be to java or C# (mono).

Link to post
Share on other sites

Archived

This topic is now archived and is closed to further replies.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.