MikeM-2468, posted March 1, 2013
https://forums.phpfreaks.com/topic/275086-match-a-chunk-of-text/

I think I need to use a regex to match part of a string. I've used preg_match before, but my brain hasn't grasped the intricacies of adding regex. For background, I'm grabbing input from a form, then querying MySQL to see if there is a match in the database. I'm looking for a match of 5 consecutive characters.

$input = "test12345";
$found_match = "west1298";
or
$found_match = "test1278";

Is preg_match enough or do I need more?

requinix, posted March 1, 2013

MySQL doesn't support the regex syntax you would need for this, and running all the data through PHP would be crazy. Any alternatives?

MikeM-2468 (author), posted March 4, 2013

I'm open to suggestions.

Christian F., posted March 4, 2013 (edited March 4, 2013)

You may be able to use similar_text() for this, but it doesn't match consecutive substrings. That said, there are some comments on its manual page that might be useful, not to mention a reference to a book which contains a lot of knowledge on stuff like this.

requinix, posted March 4, 2013

For short search strings you can

WHERE field LIKE "%test1%" OR field LIKE "%est12%" OR field LIKE "%st123%" OR field LIKE "%t1234%" OR field LIKE "%12345%"

(i.e., a bunch of LIKEs over each set of five consecutive characters)

A variation of that would be using a kind of index table with two columns, queried with

WHERE field = "test1" OR field = "est12" OR field = "st123" OR field = "t1234" OR field = "12345"

The difference is that this query would perform a lot faster (if you index that one field) and allow you to search on longer strings.

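The list of LIKE clauses does not have to be typed by hand. A minimal sketch, assuming the column is called `field` as in the post above; `chunks()` and `buildLikeQuery()` are hypothetical helper names, and the `?` placeholders are meant to be bound through a prepared statement rather than concatenated into the SQL:

```php
<?php
// Hypothetical helper: every distinct run of $n consecutive bytes in $s.
function chunks(string $s, int $n = 5): array
{
    $out = [];
    for ($i = 0, $len = strlen($s); $i + $n <= $len; $i++) {
        $out[] = substr($s, $i, $n);
    }
    return array_values(array_unique($out));
}

// Hypothetical helper: returns a WHERE fragment and the values to bind to it.
// The column name `field` is an assumption carried over from the post above.
function buildLikeQuery(string $input, int $n = 5): array
{
    $params = [];
    foreach (chunks($input, $n) as $chunk) {
        // Escape LIKE wildcards (% and _) so they are matched literally.
        $params[] = '%' . addcslashes($chunk, '%_\\') . '%';
    }
    if ($params === []) {
        return ['', []]; // input shorter than $n: nothing to search for
    }
    $where = implode(' OR ', array_fill(0, count($params), 'field LIKE ?'));
    return [$where, $params];
}

[$where, $params] = buildLikeQuery('test12345');
echo $where, "\n";
print_r($params);
```

The number of clauses grows with the length of the input, which is why the post limits this approach to short search strings.
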
MikeM-2468 (author), posted March 4, 2013

Since regex won't work in the MySQL query, can I read all of the entries into an array and then match to the array with regex? The query won't be very large - maybe 10 results, growing by about 5 per year.

requinix, posted March 4, 2013

If you have a small number like that, sure.

MikeM-2468 (author), posted March 4, 2013

OK. In that case, what's the regex voodoo to get the match I need?

requinix, posted March 4, 2013

If you can make sure that the input doesn't contain a certain character, either by validating it or removing any found, and pretty much any character will do, then you can do something like

/([^#]{5}).*?\#.*?\1/

It tries to find five characters on the left of a # then the same five on the right. You'd match it against the string $input . "#" . $found_match.

However, the simplest way would be a couple of nested loops - not so bad when you consider how few passes they would make.

$input = "test12345";
$found_matches = array("west1298", "test1278");

foreach ($found_matches as $match) {
    for ($i = 0, $ilen = strlen($input); $i + 5 <= $ilen; $i++) {
        if (strpos($match, substr($input, $i, 5)) !== false) {
            // found a match
        }
    }
}

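The pattern above can be tried out with preg_match. A minimal sketch, assuming `#` is the separator character; `sharedChunk()` is a hypothetical helper name, and the `s` modifier is an addition so that `.` also crosses line breaks:

```php
<?php
// Hypothetical helper: returns the first run of $n characters from $input
// that also occurs in $candidate, or null if there is none.
function sharedChunk(string $input, string $candidate, int $n = 5): ?string
{
    // '#' is used as the separator, so it must not occur in either string.
    // Removing it is the simplest option; validating and rejecting is another.
    $input = str_replace('#', '', $input);
    $candidate = str_replace('#', '', $candidate);

    // Single quotes keep \# and \1 intact for the regex engine.
    $pattern = '/([^#]{' . $n . '}).*?\#.*?\1/s';
    if (preg_match($pattern, $input . '#' . $candidate, $m) === 1) {
        return $m[1];
    }
    return null;
}

var_dump(sharedChunk('test12345', 'west1298')); // string(5) "est12"
var_dump(sharedChunk('test12345', 'test1278')); // string(5) "test1"
var_dump(sharedChunk('test12345', 'abcdefgh')); // NULL
```

Note that removing the separator from the data can join characters that were not adjacent in the original string, so validation may be the safer choice where that matters.
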
Christian F., posted March 5, 2013 (edited March 5, 2013)

Not entirely sure this can be done with Regular Expressions, to be honest. If it is possible, then you'd probably be looking at a recursive pattern with named references and lookaheads. An extremely complex expression, in other words, which I suspect would require a lot of resources to compile.

A better approach in this case would be to make a very simple tokenizer, and have it parse the strings incrementally, one character (group) at a time. This is quite easily done by using mb_substr, mb_strpos and mb_strlen, plus a loop. Using the MB functions ensures that it doesn't break on multi-byte characters.

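The loop described above could look roughly like this. It is only a sketch, assuming UTF-8 input and that the mbstring extension is loaded; `hasSharedRun()` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: true if any run of $n consecutive characters
// (not bytes) from $input also occurs in $candidate.
// Assumes UTF-8 strings and the mbstring extension.
function hasSharedRun(string $input, string $candidate, int $n = 5): bool
{
    $len = mb_strlen($input, 'UTF-8');
    for ($i = 0; $i + $n <= $len; $i++) {
        $chunk = mb_substr($input, $i, $n, 'UTF-8');
        if (mb_strpos($candidate, $chunk, 0, 'UTF-8') !== false) {
            return true;
        }
    }
    return false;
}

// Three two-byte characters are six bytes but only three characters,
// so no five-character run exists.
$short = "\u{00e9}\u{00e9}\u{00e9}";
var_dump(hasSharedRun($short, $short));              // bool(false)
var_dump(hasSharedRun('test12345', 'west1298'));     // bool(true)
```

With strlen and substr the first call would count bytes instead and report a five-byte match, which is the breakage the MB functions avoid.
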
requinix, posted March 5, 2013 (edited March 5, 2013)

Speaking of tokenizing, this problem is a form of the LCS (longest common subsequence) problem with one key difference: if the two characters do not match then the new value is 0. Oh, and you can immediately return success if you hit five matching characters.

    l   e   l   e   p   h   o   n   e
  +---+---+---+---+---+---+---+---+---+
l | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
  +---+---+---+---+---+---+---+---+---+
l | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
  +---+---+---+---+---+---+---+---+---+
e | 0 | 2 | 0 | 2 | 0 | 0 | 0 | 0 | 1 |
  +---+---+---+---+---+---+---+---+---+
p | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 |
  +---+---+---+---+---+---+---+---+---+
h | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 |
  +---+---+---+---+---+---+---+---+---+
o | 0 | 0 | 0 | 0 | 0 | 0 | 5*|   |   |
  +---+---+---+---+---+---+---+---+---+

    b   a   a   c   c   b
  +---+---+---+---+---+---+
a | 0 | 1 | 1 | 0 | 0 | 0 |
  +---+---+---+---+---+---+
a | 0 | 1 | 2 | 0 | 0 | 0 |
  +---+---+---+---+---+---+
a | 0 | 1 | 2 | 0 | 0 | 0 |
  +---+---+---+---+---+---+
c | 0 | 0 | 0 | 3 | 1 | 0 |
  +---+---+---+---+---+---+
c | 0 | 0 | 0 | 1 | 4 | 0 |
  +---+---+---+---+---+---+
c | 0 | 0 | 0 | 1 | 2 | 0 |
  +---+---+---+---+---+---+
c | 0 | 0 | 0 | 1 | 2 | 0 |
  +---+---+---+---+---+---+
c | 0 | 0 | 0 | 1 | 2 | 0 |
  +---+---+---+---+---+---+

Plenty of room for optimizations too.

[edit] Better examples.

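The tables above follow a simple rule: a cell is the upper-left neighbour plus one when the two characters match, and 0 otherwise (which makes it the longest common substring variant). A minimal sketch of that rule with the early exit, keeping only the previous row in memory; `sharesRun()` is a hypothetical helper name and it compares bytes, not multi-byte characters:

```php
<?php
// Hypothetical helper: true if $a and $b share a run of at least $n bytes.
function sharesRun(string $a, string $b, int $n = 5): bool
{
    $lenA = strlen($a);
    $lenB = strlen($b);
    // $prev[$j] is the length of the matching run that ends at the
    // previous character of $a and at $b[$j - 1]; index 0 is padding.
    $prev = array_fill(0, $lenB + 1, 0);
    for ($i = 1; $i <= $lenA; $i++) {
        $curr = array_fill(0, $lenB + 1, 0); // mismatches stay at 0
        for ($j = 1; $j <= $lenB; $j++) {
            if ($a[$i - 1] === $b[$j - 1]) {
                $curr[$j] = $prev[$j - 1] + 1; // extend the diagonal run
                if ($curr[$j] >= $n) {
                    return true; // early exit, like the 5* cell above
                }
            }
        }
        $prev = $curr;
    }
    return false;
}

var_dump(sharesRun('llepho', 'lelephone'));  // bool(true), the run is "lepho"
var_dump(sharesRun('aaaccccc', 'baaccb'));   // bool(false), longest run is 4
```

The two calls correspond to the two tables: the first stops at the cell marked 5*, the second never gets past 4.
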
MikeM-2468 (author), posted March 6, 2013

I'm going to see what I can do with the PHP based stuff. Since it's a small result, I might be able to do it with mb_substr(), mb_strpos() and mb_strlen().

MikeM-2468 (author), posted March 6, 2013

This seems to work for what I need:

$input = "xxxxssworxxxickle";
$wordlist = array("password", "pickle", "passwest", "wordgame", "swords", "orxxx");
$charactercheckcount = 5;
$charactercheckcountoffset = $charactercheckcount - 1;

for ($x = 0; $x < count($wordlist); $x++) {
    $wordlistitem = $wordlist[$x];
    $wordlistitemlength = strlen($wordlistitem);
    $loop = 0;
    while ($loop < $wordlistitemlength - $charactercheckcountoffset) {
        $checkstring = substr($wordlistitem, $loop, $charactercheckcount);
        $match = strpos($input, $checkstring);
        // strpos() returns 0 for a match at the very start of $input,
        // so compare against false instead of testing truthiness
        if ($match !== false) {
            echo "Match found";
            exit();
        }
        ++$loop;
    }
}

rama schneider, posted May 1, 2013

MySQL supports regexp searches - I do this on a regular basis from my php code. See http://dev.mysql.com/doc/refman/5.1/en/regexp.html

requinix, posted May 1, 2013

rama schneider said: "MySQL supports regexp searches - I do this on a regular basis from my php code. See http://dev.mysql.com/doc/refman/5.1/en/regexp.html"

As I said in the very first reply to this thread, which you clearly didn't read, MySQL doesn't support the regex syntax you would need for this.

rama schneider, posted May 2, 2013 (edited May 2, 2013)

Well ... before you get snooty about it - exactly what is it about finding "a match of 5 consecutive characters" that have come from an HTML form that MySQL's REGEXP can't do?

// NOTE: make sure any data used to access a database is properly escaped - this example does not do this.
$form_input = 'test1234';
$start = 0; // placeholder: offset of the five-character chunk to test
$consecutive_chars = substr($form_input, $start, 5);
// REGEXP matches anywhere in the field, so no wildcards are needed
// (a pattern starting with * is rejected as an invalid repetition)
$sql = "SELECT * FROM `table` WHERE `field` REGEXP '" . $consecutive_chars . "'";

The algorithm can get more complex as more fields or tables are searched, but overall it's a simple search as I understand it.

requinix, posted May 2, 2013

Right. Now repeat that for every substring.

SELECT * FROM table WHERE field REGEXP 'test1' OR field REGEXP 'est12' OR field REGEXP 'st123' OR field REGEXP 't1234'

(And since all that does is check string contents, a LIKE might be better.)

rama schneider, posted May 3, 2013

REGEXP (test1|est12| ....) - it has always worked for me. As you point out, if one is going to check each possibility one at a time then LIKE would probably be quicker. But REGEXP would work well for what the original poster wants to do.

The main point being that one can offload this simple type of search to the MySQL server, which is very efficient at doing just this thing.

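The alternation can be assembled from the form input in PHP. A minimal sketch, assuming the input is restricted to letters and digits so that no regex metacharacter can end up in the pattern; `buildRegexpPattern()` is a hypothetical helper name, and the table and column names are the placeholders used earlier in the thread:

```php
<?php
// Hypothetical helper: builds "test1|est12|..." from every run of $n
// consecutive characters, or returns null if the input is unusable.
function buildRegexpPattern(string $input, int $n = 5): ?string
{
    // Assumption: only ASCII letters and digits are accepted, which keeps
    // regex metacharacters such as | . * ( ) out of the pattern.
    if (preg_match('/\A[A-Za-z0-9]+\z/', $input) !== 1) {
        return null;
    }
    $chunks = [];
    for ($i = 0, $len = strlen($input); $i + $n <= $len; $i++) {
        $chunks[] = substr($input, $i, $n);
    }
    if ($chunks === []) {
        return null; // input shorter than $n
    }
    return implode('|', array_unique($chunks));
}

$pattern = buildRegexpPattern('test1234');
// The pattern is meant to be bound to the placeholder by a prepared statement.
$sql = 'SELECT * FROM `table` WHERE `field` REGEXP ?';
var_dump($pattern); // string(23) "test1|est12|st123|t1234"
```

This sends one pattern to the server instead of a chain of OR clauses; whether it is faster than the LIKE version would have to be measured on the actual data.
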