Wuhtzu Posted May 14, 2007

Hey

I would like some advice and ideas on how to perform a search without the use of a database and functionality such as MySQL's LIKE. Basically, I have an array which contains filenames, and I would like to find the entries in this array which look like a given filename. Let me give an example:

Array containing file names:

Array
(
    [0] => pictures-of-something_1.ext
    [1] => picturesofsomething2.ext
    [2] => myfile.ext
    [3] => WeIrDcAsE.eXt
    [4] => stupid file name.ext
)

Search and result:

pictures_of_something.ext will find pictures-of-something_1.ext and picturesofsomething2.ext
weirdcase.ext will find WeIrDcAsE.eXt
stupid_file.ext will find stupid file name.ext
myfile.gif will find myfile.ext

So I would like to be able to find entries which differ in case, in the use of space, _ and -, in numbering, and in extension. Any ideas on how to accomplish or approach this?

One idea would be to take the filename, e.g. pictures_of_something_1.ext, and replace the underscores, the numbering and .ext with a wildcard in a regex. But I would like to hear whether you have other ideas regarding this task.

Best regards
Wuhtzu
Wuhtzu Posted May 15, 2007

Hmm... maybe I chose the wrong board. I thought of posting in PHP Help, but since this is not a concrete coding question I chose this one.

Thanks to those who have stopped by so far. For those who stop by in the future, I can widen the type of advice I would like: if you know of any articles about search in general, or explanations of specific search mechanisms (for example, how MySQL's LIKE works), I would be glad to hear about them.

Thanks again
Wuhtzu
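On the question of how LIKE works: at its core, a LIKE pattern is a literal string in which % stands for any run of characters and _ stands for exactly one character, and the whole value has to match. The following is only a rough sketch of that idea in PHP, not how MySQL implements it. The function name likeToRegex is made up for this example, escape sequences are ignored, and matching is assumed to be case-insensitive (which depends on the collation in MySQL).

```php
<?php
// Hypothetical helper: translate a LIKE-style pattern into a PCRE pattern.
// Assumptions: no escape character support, single-byte (ASCII) input,
// case-insensitive matching.
function likeToRegex($like)
{
    $regex = '';
    foreach (str_split($like) as $ch) {
        if ($ch === '%') {
            $regex .= '.*';                  // % matches any run of characters
        } elseif ($ch === '_') {
            $regex .= '.';                   // _ matches exactly one character
        } else {
            $regex .= preg_quote($ch, '/');  // everything else is literal
        }
    }
    // Anchor both ends: LIKE has to match the whole value.
    return '/^' . $regex . '$/is';
}

// Usage: preg_match() returns 1 when the subject matches the pattern.
var_dump(preg_match(likeToRegex('%file%'), 'stupid file name.ext'));
```

Because the pattern is anchored at both ends, a bare 'myfile' does not match 'myfile.ext'; the wildcards have to be written out, just as with LIKE.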
redbullmarky Posted May 15, 2007

Since a search for 'myfile.gif' should also find 'myfile.ext', it's probably fair to say the extension is negligible.

Let's say that $search contains what you're looking for, and the array $haystack contains the data. First we'd get rid of the extension, then we'd make the $search and $haystack elements "common", i.e. take out all the characters that don't matter and put everything in the same case:

<?php
$search = 'pictures_of_something.ext';
$haystack = array (
    'pictures-of-something_1.ext',
    'picturesofsomething2.ext',
    'myfile.ext',
    'WeIrDcAsE.eXt',
    'stupid file name.ext'
);

// simple function to convert filename to lowercase alpha, without extension
function getCommon($file)
{
    // get filename part in lowercase; pathinfo() also copes with
    // names that contain no dot or more than one dot
    $filename = strtolower(pathinfo($file, PATHINFO_FILENAME));

    // remove unwanted stuff (separators, digits, anything that isn't a-z)
    $filename = preg_replace('/[^a-z]/', '', $filename);

    return $filename;
}

$needle = getCommon($search);
$results = array(); // our results will go here

// now check each array element
foreach ($haystack as $item) {
    if ($needle === getCommon($item)) {
        $results[] = $item;
    }
}

// output results!
echo '<pre>';
print_r($results);
echo '</pre>';
?>

As it's used a fair few times, I've put the "common" converter into a little function called getCommon. What it does is remove the extension (as we don't really need it for the sake of the search), leaving us with a lowercase filename. Then we just strip out all the characters we don't like. So if you search for "pictures_of_something.ext", it gets converted into "picturesofsomething". Running the first two elements of the $haystack array through getCommon also returns "picturesofsomething", and voila: 2 matches.

Hope that helps
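One case from the original question that an exact comparison does not cover is "stupid_file.ext will find stupid file name.ext", because the normalised search term ("stupidfile") is only the beginning of the normalised entry ("stupidfilename"). A possible variation, sketched here under the assumption that a substring match on the normalised names is acceptable, is to test with strpos() instead of equality. The names normalizeName and findLookalikes are made up for this sketch and are not from the post above.

```php
<?php
// Hypothetical helper: lowercase the base name and keep only the letters a-z.
function normalizeName($file)
{
    $base = strtolower(pathinfo($file, PATHINFO_FILENAME));
    return preg_replace('/[^a-z]/', '', $base);
}

// Hypothetical helper: return every entry whose normalised name contains
// the normalised search term.
function findLookalikes($search, array $haystack)
{
    $needle = normalizeName($search);
    if ($needle === '') {
        return array(); // nothing meaningful to search for
    }

    $results = array();
    foreach ($haystack as $item) {
        if (strpos(normalizeName($item), $needle) !== false) {
            $results[] = $item;
        }
    }
    return $results;
}

// Usage with the array from the original question.
$haystack = array(
    'pictures-of-something_1.ext',
    'picturesofsomething2.ext',
    'myfile.ext',
    'WeIrDcAsE.eXt',
    'stupid file name.ext'
);
print_r(findLookalikes('stupid_file.ext', $haystack));
```

The trade-off is that a short search term can match many entries, so a minimum length for the search term may be worth adding.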
Wuhtzu Posted May 15, 2007

You are a genius, man. I appreciate the code, but the following was more than enough to paint the picture:

"let's say that $search contains what you're looking for, and the array $haystack contains the data. first we'd get rid of the extension, then we'd make the $search and $haystack elements "common" - ie, take out all the characters that don't matter and put it all in the same case"

I think it's a good solution to the problem!

Thanks a lot
Wuhtzu
johnrcornell Posted May 19, 2007

If you're looking to search through the contents of an array for permutations of natural language or anything like that, you'll have to write some regex. It looks intimidating at first, but it's fun once you figure it out.

Sorry for the vague answer. I'm just trying to provide a jumping-off point for the next step, in case you find that you mean something more ambitious by "search."
Wuhtzu Posted May 20, 2007

I didn't mean anything more ambitious by "search". I just wanted to be able to pick out lookalikes, and you are right, it's some regex I need for the job.
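For reference, the regex idea from the first post (treat separators, numbering and the extension as wildcards) could be sketched roughly as follows. This is one possible reading of that idea, not code posted in the thread: the helper names buildLookalikePattern and regexSearch are made up, and the pattern simply allows any non-letter characters between the letters of the search name and ignores whatever follows.

```php
<?php
// Hypothetical helper: build a tolerant, case-insensitive pattern from a filename.
// Returns null when the name contains no letters at all.
function buildLookalikePattern($search)
{
    $base = strtolower(pathinfo($search, PATHINFO_FILENAME));
    $letters = preg_replace('/[^a-z]/', '', $base);
    if ($letters === '') {
        return null;
    }
    // Allow any run of non-letters (space, _, -, digits, ...) between letters.
    // Only the start is anchored, so trailing numbering, extra words and the
    // extension are ignored.
    return '/^' . implode('[^a-z]*', str_split($letters)) . '/i';
}

// Hypothetical helper: return the entries that match the generated pattern.
function regexSearch($search, array $haystack)
{
    $pattern = buildLookalikePattern($search);
    if ($pattern === null) {
        return array();
    }
    // preg_grep() keeps the original keys, so reindex the result.
    return array_values(preg_grep($pattern, $haystack));
}
```

Called as regexSearch('stupid_file.ext', $haystack) with the array from the first post, this picks out 'stupid file name.ext'. Since the pattern is built only from the letters a-z, no further escaping is needed before it goes into the regex.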
448191 Posted May 21, 2007

similar_text()
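As a rough illustration of that suggestion: similar_text() returns the number of matching characters and can also report a similarity percentage through its optional third argument, which is passed by reference. The sketch below compares lowercased base names and keeps entries above a threshold. The function name findSimilar and the 80 percent cut-off are assumptions chosen for this example and would need tuning against real data.

```php
<?php
// Hypothetical helper: keep the entries whose base name is at least
// $minPercent similar to the base name of the search term.
function findSimilar($search, array $haystack, $minPercent = 80.0)
{
    $needle = strtolower(pathinfo($search, PATHINFO_FILENAME));

    $results = array();
    foreach ($haystack as $item) {
        $candidate = strtolower(pathinfo($item, PATHINFO_FILENAME));

        // similar_text() writes the similarity percentage into $percent.
        $percent = 0.0;
        similar_text($needle, $candidate, $percent);

        if ($percent >= $minPercent) {
            $results[] = $item;
        }
    }
    return $results;
}

// Usage with the array from the original question.
$haystack = array(
    'pictures-of-something_1.ext',
    'picturesofsomething2.ext',
    'myfile.ext',
    'WeIrDcAsE.eXt',
    'stupid file name.ext'
);
print_r(findSimilar('weirdcase.ext', $haystack));
```

Unlike the normalise-and-compare approach, this gives a graded score, so the threshold decides how loose the matching is. A search term that is much shorter than the stored name (such as stupid_file.ext against stupid file name.ext) scores lower and may need a lower cut-off.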
Wuhtzu Posted May 21, 2007

How nice of you to spoil my great regex effort! Well, similar_text() seems to do the job too. Thanks, 448191.