Jump to content

Archived

This topic is now archived and is closed to further replies.

Axeia

[SOLVED] SQLite, LIKE vs regular expression?

Recommended Posts

I'm about to head off to bed and hoping to safe myself some work tomorrow..

 

I got a SQLstatement that's like this:

SELECT * FROM File WHERE downloaded IS NULL AND filename LIKE '%Open%' AND filename LIKE '%File%';

So in this example I'm looking if the strings "Open" and "File" occur in the column filename, and if they do then it's a result I'm interested in.

 

Now "Open" and "File" were just to illustrate the idea, instead of those 2 it can be somewhere between 1 and ~6 words with PHP building up either query (with either a regex or LIKE's).

Would it be faster to use a regular expression or a bunch of LIKE's like I have now?

 

I don't want the words to occur in a particular order, so I'm not sure how the regular expression would look like.. have always been looking for a particular order of things up till now.

Share this post


Link to post
Share on other sites

Could you not use the REGEXP keyword?

 

SELECT * FROM FILE WHERE downloaded IS NULL AND filename REGEXP 'Open|File';

Share this post


Link to post
Share on other sites

Heh, the tricks ya can pick up here :)

 

Thanks nrg, shure this will help a lot of ppl out.

 

Share this post


Link to post
Share on other sites

Well laffin, I'm not sure if that's what Axeia is going after (I hope it helps).

I just saw the string of LIKE and AND keywords, and thought it might be easier to manage via REGEXP instead. As far as speed of execution of one method vs the other is concerned, I couldn't be sure to be honest.

 

I'll find out soon enough if what I suggested is heading in the right direction or not...

Share this post


Link to post
Share on other sites

Almost, but I want both to occur before a match is made, so more like

SELECT * FROM File WHERE downloaded IS NULL AND filename REGEXP('.*Open.*File.*');

but I can't figure out how to toggle the case insensitive flag..

Share this post


Link to post
Share on other sites

iv been thinking for more time to implament more regex its just so cryptic untill u memorize it practice makes perfect.

Share this post


Link to post
Share on other sites

I've become quite adept at writing them, it's the reading later on if you don't document them properly that's a killer.

But I usually try to substr / strpos my way trough things as it's many times faster.

 

In this case it's hardly an option where I'm using a SQL(ite) query to get the results, cause doing things in the query itself instead of just selecting something big and then letting PHP get the proper results is

A) Bad practice, will make it hard(er) to port the application to another language. Queries tend to be more portable.

B) Slower (or well more often than not it is)

 

I'll just mark this as resolved, as I realised I do actually want to the words in a particular order. Decided to use a single query as well as sqlite doesn't seem to like a lot of small selects.[/code]

Share this post


Link to post
Share on other sites

Almost, but I want both to occur before a match is made, so more like

SELECT * FROM File WHERE downloaded IS NULL AND filename REGEXP('.*Open.*File.*');

but I can't figure out how to toggle the case insensitive flag..

 

Well, as far as I understand it, the records fetched will be in alphabetical order in accordance to the complete WHERE statement..

As for your regex pattern, you are looking for all those words to be in a filename field (in other words, all these words MUST be in a filename field) where I understood to look for either Open or File, so perhaps I misunderstood...

 

According to the mySQL Dev zone, to make it case insensitive, use 'binary':

 

SELECT * FROM FILE WHERE downloaded IS NULL AND filename REGEXP BINARY ('.*Open.*File.*'); ?

 

Granted, when I tested REGEXP involving peoples' names for example, whatever I was looking for, I could use simply lower case, and REGEXP could find it regardless.. so if there are two people I am interested in finding (Wilkins and Kramer):

 

WHERE transactions.cust_id = customers.cust_id AND cust_name REGEXP 'kra|wil';

 

Both Wilkins and Kramer's records show up. So I'm not sure if it is a mySQL configuration thing, or if the data type of the table in interest comes into play in this regard. When I used BINARY in the mix, it didn't yield any results.. so I must do some more self research into BINARY...

 

EDIT - Nevermind, just re-read in the link I provided:

REGEXP is not case sensitive, except when used with binary strings.

 

Share this post


Link to post
Share on other sites

defo would liek to the line is smaller less code when using regex

I checked beforehand and the maximum length for a query is 1 million bytes, so not really a problem.

The query itself is created by PHP anyway so the typing is minimal :).

Ended up with this in case you find it interesting

if( $epInfo['prefix'] != '' )
{
$specials = empty( $epInfo['specialTag'] ) ? '' : '%'.implode( "%", $epInfo['specialTag'] ); 

$queryLikes[] = sprintf( 
               '"%s%%%s%%%s%%" OR filename LIKE "%s%%%s%%"',
	$tmp = sqlite_escape_string'_', '__', $epInfo['prefix'] ),
	$epInfo['episode'],
	'v'.$epInfo['versionPlusOne'].$specials,
	$tmp,
	$epInfo['episodePlusOne'].$specials
);
}

And yes.. that's not exactly pretty with the % escaping %.. makes it look very messy.

Share this post


Link to post
Share on other sites

defo would liek to the line is smaller less code when using regex

I checked beforehand and the maximum length for a query is 1 million bytes, so not really a problem.

The query itself is created by PHP anyway so the typing is minimal :).

Ended up with this in case you find it interesting

if( $epInfo['prefix'] != '' )
{
$specials = empty( $epInfo['specialTag'] ) ? '' : '%'.implode( "%", $epInfo['specialTag'] ); 

$queryLikes[] = sprintf( 
               '"%s%%%s%%%s%%" OR filename LIKE "%s%%%s%%"',
	$tmp = sqlite_escape_string'_', '__', $epInfo['prefix'] ),
	$epInfo['episode'],
	'v'.$epInfo['versionPlusOne'].$specials,
	$tmp,
	$epInfo['episodePlusOne'].$specials
);
}

And yes.. that's not exactly pretty with the % escaping %.. makes it look very messy.

 

we should start talking more about regex on this site because on the frameworks its used everywhere, validation etc and other things

Share this post


Link to post
Share on other sites

They're also known for being hard to maintain and read ;)

and of course the famous

"You have a problem, you decide to use a regular expression. Now you have 2 problems".

 

But yeah they can do some pretty nifty things, I just tend to use the alternative if there is any. Most of the time as it's faster and easier to read. (though this one might be the exception as far as readability is concerned ;) )

Share this post


Link to post
Share on other sites

for me the main advantage would be to do somthing in a tiny peice of code and then just comment next to that one functioner what it does, and also not having to do any iteration etc just being able to do it from the power of a expression is a powerfull high level thing

Share this post


Link to post
Share on other sites

Just wondering if yer looking for BOTH OPEN and FILE,

couldnt u use LIKE like:

SELECT * FROM File WHERE downloaded IS NULL AND filename LIKE '%Open%File%';

 

A percent symbol ("%") in the pattern matches any sequence of zero or more characters in the string. An underscore ("_") in the pattern matches any single character in the string. Any other character matches itself or it's lower/upper case equivalent (i.e. case-insensitive matching). (A bug: SQLite only understands upper/lower case for ASCII characters. The LIKE operator is case sensitive for unicode characters that are beyond the ASCII range. For example, the expression 'a' LIKE 'A' is TRUE but 'æ' LIKE 'Æ' is FALSE.).

Share this post


Link to post
Share on other sites

just a thought the search query can contain up to1-6 keywords as u say

so not shure the above query is wut yer looking for. as that would result in a search pattern where all keywords must be in a specific sequence.

so maybe if yer keywords are in an array.

 

I checked (REGEXP) in SQLITE is user defined, otherwise it returns an error.

but if u need to do it in sqlite maybe using LIKE and ANDS will work

 

than came up with this code, prolly some optimization is in due order.

but it generates the LIKE as expected :)

<?php
$keywords=array("open","file");

function sqlesc($var,$quote=TRUE)
{
return is_int($var)?$var:(is_string($var)?(($quote?"'":'').sqlite_escape_string($var).($quote?"'":'')):FALSE);
}
function sqllike(&$var,$key,$param)
{
$var= "{$param} LIKE '%".sqlesc($var,FALSE)."%'";
}

$param="filename";
array_walk($keywords,'sqllike',$param);
$like=implode(' AND ',$keywords);
echo $like;


?>

Share this post


Link to post
Share on other sites

To be fair I've changed my approach again since then (appropriate the effort though).

I'm using a simple preselection as the base (with LIKE) and then use PHP to further filter down the values  as I now got both a whitelist and a blacklist which would mean a ridiculously complicated query something would have to match against which I'd guess would be really slow.

 

As I'm guessing LIKE (if equal in number to REGEXP) would be the faster one of the TWO and has few false positives I took that as the base.

This leaves PHP with little work to do, but does mean my application might have become a bit harder to port to another language (and I can't seem to get PHP + QT working) but mweh.. Haven't decided yet if I will at all and if I do I still don't know whether it will be to java or C# (mono).

Share this post


Link to post
Share on other sites

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.