physaux Posted November 3, 2009

Hey,

OK, so here is my issue: I have an opened file, roughly 80,000 characters long. I am using preg_match() to find a unique piece of text in this file, about 300 times. Is there a more efficient way to do this?

**There is a pattern that I have not taken advantage of but want to: each term that I search for appears (somewhere, at a non-constant distance) AFTER the spot where the previous term was found.

So, is it possible to start the search at the place where the previous text was found (somehow move the cursor there), so that I do not waste time and resources, hundreds of times over, checking a region where I know my text is not?

Thanks

https://forums.phpfreaks.com/topic/180033-solved-making-preg_match-more-efficient-when-batch-used-how-to-start-from-prev-spot/
MadTechie Posted November 3, 2009

You mean like match all? Do you have an example?
physaux Posted November 3, 2009 Author

Here is an example. File contents:

    ajkdkansdkUNIQ1382NAME=sally
    idsdckdsjckUNIQ78a2NAME=bob
    kdjalkdsklaUNIQ8912NAME=tom
    osaijdoasksUNIQ8291NAME=charles
    aksjdkaskssUNIQds89NAME=sandy
    skdjsakjdskUNIQ8238NAME=rock
    ...

And I have an array of "accepted names":

    $accepted[1] = "tom";
    $accepted[2] = "rock";
    ...

Now, I am searching the text like so:

    preg_match('~UNIQ([a-z0-9]{1,4})NAME=' . $accepted[1] . '~', $textfile, $match);

So $match[1] would be 8912, and so on.

The thing is, as you can see, I have LOTS of names and a LONG name list. I don't want to search through the whole list every time, because I know the names are in the same order; only some are missing from the accepted list. There is also lots of garbage text in the file, which I have now edited out. The program still takes about 40 seconds to finish. I know it could be faster.

Kinda get it now? What do you think?
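One possible reading of the "match all" suggestion is a single pass over the text with every accepted name in one alternation, instead of one preg_match() call per name. The following is only a sketch under assumptions: the sample text is copied from the example above, the `NAME=` separator and the 1 to 4 character ID are taken from that sample, and the variable names `$text` and `$ids` are made up for illustration.

```php
<?php
// Sample data modeled on the example in the thread (hypothetical content).
$text = "ajkdkansdkUNIQ1382NAME=sally\n"
      . "idsdckdsjckUNIQ78a2NAME=bob\n"
      . "kdjalkdsklaUNIQ8912NAME=tom\n"
      . "osaijdoasksUNIQ8291NAME=charles\n"
      . "aksjdkaskssUNIQds89NAME=sandy\n"
      . "skdjsakjdskUNIQ8238NAME=rock\n";
$accepted = [1 => 'tom', 2 => 'rock'];

// Escape each name so it is treated literally, then join into "tom|rock".
$quoted = array_map(function ($name) {
    return preg_quote($name, '~');
}, $accepted);
$pattern = '~UNIQ([a-z0-9]{1,4})NAME=(' . implode('|', $quoted) . ')\b~';

// One scan of the whole text; each entry holds [full match, id, name].
preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

$ids = [];
foreach ($matches as $m) {
    $ids[$m[2]] = $m[1];
}

print_r($ids);
```

With this approach the text is scanned once regardless of how many names are accepted, so the order of the names no longer matters.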
Alex Posted November 3, 2009

You can create a loop and utilize the second to last parameter of file_get_contents(), the offset (and the last parameter, the maximum length, if you know that the name will be within ~X characters), to only grab a specific part of the file. Ex:

    for ($i = 0, $position = 0; $i < count($accepted); $i++) {
        // Read the file starting where the previous match ended.
        $content = file_get_contents('somefile.txt', false, null, $position);
        // '...' stands for the pattern built from $accepted[$i].
        if (preg_match('...', $content, $matches, PREG_OFFSET_CAPTURE)) {
            // Offsets are relative to $content, so add them to $position.
            $position += $matches[1][1] + strlen($matches[1][0]);
        }
    }

Where $matches[1][1] would be the starting position of the first captured group within $content.
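preg_match() itself also accepts a starting offset (in bytes) as its fifth argument, so the file can be read once and each search resumed where the previous one ended. A minimal sketch, assuming the sample data and the `UNIQ<id>NAME=<name>` layout from the example earlier in the thread; `$text`, `$offset` and `$ids` are illustrative names, not anything from the original script.

```php
<?php
// Sample data modeled on the example in the thread (hypothetical content).
$text = "ajkdkansdkUNIQ1382NAME=sally\n"
      . "idsdckdsjckUNIQ78a2NAME=bob\n"
      . "kdjalkdsklaUNIQ8912NAME=tom\n"
      . "osaijdoasksUNIQ8291NAME=charles\n"
      . "aksjdkaskssUNIQds89NAME=sandy\n"
      . "skdjsakjdskUNIQ8238NAME=rock\n";
$accepted = ['tom', 'rock']; // assumed to be in the same order as the file

$ids = [];
$offset = 0;
foreach ($accepted as $name) {
    $pattern = '~UNIQ([a-z0-9]{1,4})NAME=' . preg_quote($name, '~') . '\b~';
    // The fifth argument tells PCRE where in $text to start searching.
    if (preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE, $offset) === 1) {
        $ids[$name] = $m[1][0];
        // $m[0] is the full match: [matched text, byte offset in $text].
        $offset = $m[0][1] + strlen($m[0][0]);
    }
    // If a name is not found, $offset is left unchanged for the next name.
}

print_r($ids);
```

Because the offsets reported with PREG_OFFSET_CAPTURE are relative to the start of `$text` even when a starting offset is given, no running total is needed here.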
physaux Posted November 3, 2009 Author

OK cool, thanks. I've got another question: do you know how I could remove lines that do not contain a certain character? I want to "process" my text file before I batch read it, and delete all the lines without the first required part of the preg_match(). So far, I have this:

    <?php
    $file = fopen($filename, "r");
    while (!feof($file)) {
        $temparray = explode(" ", fgets($file));
        $hadrequired = false;
        foreach ($temparray as $result) {
            if ($result == $required) {
                $hadrequired = true;
            }
        }
        if (!$hadrequired) {
            // DELETE THIS LINE SOMEHOW
        }
    }
    ?>

I know I would probably have to change the mode to "rw" or whatnot. What do you think?

EDIT: I also had a new idea. Maybe the reason it is so slow is that the first part of the preg_match() pattern shows up in every entry. Is there any way for it to NOT search for the first part, since it is not unique, and search only for the second part, then take the text minus 5 characters? That way should be much faster.
MadTechie Posted November 3, 2009

Why not, instead of deleting the line, just not write it to the new cleaned-up file? I.e. (untested):

    <?php
    $file = fopen($filename, "r");
    $newFile = fopen('newfile.txt', 'w');
    // fgets() returns false at end of file, which ends the loop.
    while (($line = fgets($file)) !== false) {
        // Strip the line ending so the last token can match too.
        $temparray = explode(" ", rtrim($line, "\r\n"));
        $hadrequired = false;
        foreach ($temparray as $result) {
            if ($result == $required) {
                $hadrequired = true;
                break;
            }
        }
        // Copy the line once, and only if it had the required token.
        if ($hadrequired) {
            fwrite($newFile, $line);
        }
    }
    fclose($newFile);
    fclose($file);
    ?>
physaux Posted November 3, 2009 Author

Hm, I like it, thanks. I will definitely use that.

But also, would you have any idea about better searching? I think my search is inefficient because the first part of the preg_match() pattern is on practically every line, whereas the ending of it is unique. So preg_match() goes sniffing at every line where it detects the first part, then fails because the line doesn't have the second part.

Wouldn't it be better if I only searched for the second part, then got the line number where it is, and then searched that line using the prefix? Anyone know how I could do this?
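The idea of searching for the unique part first and only then checking the prefix could be sketched with strpos(), which does a plain substring search with no regex engine involved. This is an illustration under assumptions, not a tested fix for the original script: it assumes the `UNIQ<id>NAME=<name>` layout and the 1 to 4 character ID from the sample, and `$text`, `$needle` and `$ids` are hypothetical names.

```php
<?php
// Sample data modeled on the example in the thread (hypothetical content).
$text = "ajkdkansdkUNIQ1382NAME=sally\n"
      . "idsdckdsjckUNIQ78a2NAME=bob\n"
      . "kdjalkdsklaUNIQ8912NAME=tom\n"
      . "osaijdoasksUNIQ8291NAME=charles\n"
      . "aksjdkaskssUNIQds89NAME=sandy\n"
      . "skdjsakjdskUNIQ8238NAME=rock\n";
$accepted = ['tom', 'rock']; // assumed to be in the same order as the file

$ids = [];
$offset = 0;
foreach ($accepted as $name) {
    // Step 1: find the unique part with a plain substring search.
    // Caveat: 'NAME=tom' would also hit 'NAME=tommy'; real data may
    // need an extra check on the character after the name.
    $needle = 'NAME=' . $name;
    $pos = strpos($text, $needle, $offset);
    if ($pos === false) {
        continue; // name not present after the previous hit
    }

    // Step 2: inspect only the few characters in front of the hit.
    // "UNIQ" plus an ID of at most 4 characters is at most 8 characters.
    $start = max(0, $pos - 8);
    $prefix = substr($text, $start, $pos - $start);
    if (preg_match('~UNIQ([a-z0-9]{1,4})$~', $prefix, $m) === 1) {
        $ids[$name] = $m[1];
    }

    // The next search resumes after this hit.
    $offset = $pos + strlen($needle);
}

print_r($ids);
```

The regex here only ever runs on an 8-character string, so the common `UNIQ` prefix on every line no longer costs anything.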
MadTechie Posted November 3, 2009

How large is large? Can I see a copy (zipped or something)?
physaux Posted November 3, 2009 Author

The file is 34 MB. Yes, MB. Holy $hit.
MadTechie Posted November 3, 2009

Meh, no biggie. Can you upload it to an FTP/host or something, or shorten it? Whatever is easiest.
physaux Posted November 3, 2009 Author

I'm gonna PM you the details.
physaux Posted November 3, 2009 Author

Wow! I filtered the file down to only include potential lines. This took the size down from 30 MB to 300 KB. Then I searched this new file, and everything went super fast.

Thanks for all the help, guys!!
MadTechie Posted November 3, 2009

Well, that's "more efficient".