[SOLVED] making preg_match() more efficient when batch used; how to start from prev spot?

physaux · November 3, 2009

Hey, OK, so here is my issue:

I have an open file, roughly 80,000 characters long.

I am using preg_match() to find a unique piece of text in this file, about 300 times.

Is there a more efficient way to do this?

There is a pattern that I have not taken advantage of but want to:

- Each next term that I search for appears somewhere AFTER the spot where the previous term was found (the distance is not constant).

So, is it possible to start the search at the place where the previous text was found (somehow move the cursor there), so that I do not waste time and resources, hundreds of times, checking a region where I know my text is not?

Thanks

MadTechie · November 3, 2009

You mean like preg_match_all()?

Do you have an example?

physaux · November 3, 2009

Here is an example:

File contents:

ajkdkansdkUNIQ1382NAME=sally
idsdckdsjckUNIQ78a2NAME=bob
kdjalkdsklaUNIQ8912NAME=tom
osaijdoasksUNIQ8291NAME=charles
aksjdkaskssUNIQds89NAME=sandy
skdjsakjdskUNIQ8238NAME=rock
...

And I have an array of "accepted names"

$accepted[1] = "tom";
$accepted[2] = "rock";
...

Now, I am searching the text like so:

preg_match('~UNIQ([a-z0-9]{1,4})NAME=' . $accepted[1] . '~', $textfile, $match);

So, $match[1] would be 8912, and so on.

The thing is, as you can see, I have LOTS of names, and a LONG name list.

I don't want to search through the whole list every time, because I know the names are in the same order; only some are missing from the accepted list. There is also lots of garbage text in the file, which I have now edited out. The program still takes about 40 seconds to finish. I know it could be faster.

Kind of get it now? What do you think?
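
Since the names appear in the file in the same order as in the accepted list, one option is to scan the text a single time instead of once per name. The following is only a minimal sketch of that idea, not a tested solution for the real file: the sample text is copied from the post above, and it assumes that names consist of lowercase letters only and that IDs are 1 to 4 characters of [a-z0-9]. The variable names $wanted, $all and $ids are made up for the illustration.

```php
<?php
// Sample data copied from the post above; a real file would be
// loaded with file_get_contents() instead.
$textfile = "ajkdkansdkUNIQ1382NAME=sally\n"
          . "idsdckdsjckUNIQ78a2NAME=bob\n"
          . "kdjalkdsklaUNIQ8912NAME=tom\n"
          . "osaijdoasksUNIQ8291NAME=charles\n"
          . "aksjdkaskssUNIQds89NAME=sandy\n"
          . "skdjsakjdskUNIQ8238NAME=rock\n";
$accepted = array("tom", "rock");

// Flip the list so that checking a name is a key lookup, not a scan.
$wanted = array_flip($accepted);

// One pass over the text: capture every ID and the name that follows it.
// Assumption: names are lowercase letters only.
preg_match_all('~UNIQ([a-z0-9]{1,4})NAME=([a-z]+)~', $textfile, $all, PREG_SET_ORDER);

// Keep only the IDs whose name is in the accepted list.
$ids = array();
foreach ($all as $m) {
    if (isset($wanted[$m[2]])) {
        $ids[$m[2]] = $m[1];
    }
}
print_r($ids);
```

With this approach the cost depends on the size of the file rather than on the number of accepted names, because the text is only walked once.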

Alex · November 3, 2009

You can create a loop and use the second-to-last parameter of file_get_contents() (the offset), plus the last parameter (the maximum length) if you know that the name will be within ~X characters, to grab only a specific part of the file.

Ex:

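for ($i = 0, $position = 0; $i < count($accepted); $i++)
{
    // Read the file starting at byte $position (4th argument is the offset).
    $content = file_get_contents('somefile.txt', false, null, $position);
    // Search the content that was just read, not the name itself.
    if (preg_match('...', $content, $matches, PREG_OFFSET_CAPTURE)) {
        // Offsets are relative to $content, so add them to $position.
        $position += $matches[1][1] + strlen($matches[1][0]);
    }
}

Where $matches[1][1] would be the starting position of the first captured group, relative to the start of $content.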
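
If the whole file is already in a string, preg_match() can also be told where to start through its fifth argument, which avoids re-reading the file for every name. A rough sketch, using the sample text from earlier in the thread and assuming IDs are 1 to 4 characters of [a-z0-9]; $offset and $ids are names chosen for the illustration:

```php
<?php
// Sample data copied from earlier in the thread.
$textfile = "ajkdkansdkUNIQ1382NAME=sally\n"
          . "idsdckdsjckUNIQ78a2NAME=bob\n"
          . "kdjalkdsklaUNIQ8912NAME=tom\n"
          . "osaijdoasksUNIQ8291NAME=charles\n"
          . "aksjdkaskssUNIQds89NAME=sandy\n"
          . "skdjsakjdskUNIQ8238NAME=rock\n";
$accepted = array("tom", "rock");

$offset = 0;      // byte position where the next search starts
$ids = array();
foreach ($accepted as $name) {
    // preg_quote() escapes any regex metacharacters in the name;
    // \b stops "tom" from matching the start of a longer name.
    $pattern = '~UNIQ([a-z0-9]{1,4})NAME=' . preg_quote($name, '~') . '\b~';

    // The 5th argument makes the search start at byte $offset.
    if (preg_match($pattern, $textfile, $m, PREG_OFFSET_CAPTURE, $offset)) {
        $ids[$name] = $m[1][0];
        // $m[0][1] is the start of the whole match, counted from the
        // beginning of $textfile; resume right after the match.
        $offset = $m[0][1] + strlen($m[0][0]);
    }
}
print_r($ids);
```

Note that a name that is missing from the text still costs a scan from $offset to the end of the string, so this helps most when nearly all accepted names are present.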

physaux · November 3, 2009

Ok cool, thanks.

I got another question, do you know how I could remove lines that do not contain a certain character?

I want to "process" my text file before I batch read it, and delete all the lines without the first required part of the preg_match().

So far, I have this:

<?php
$file=fopen($filename,"r");

while(!feof($file)){
$temparray = explode("	", fgets($file));
$hadrequired = false;
foreach($temparray as $result){
	if($result == $required){
		$hadrequired = true;
	}
}
if(!$hadrequired){
	//DELETE THIS LINE SOMEHOW
}
}
?>

I know I would probably have to change it to "rw" or whatnot. What do you think?

EDIT: I also had a new idea. Maybe the reason it is so slow is that the first part of the preg_match() pattern shows up in every entry. Is there any way for it to NOT search for the first part, since it is not unique, and only search for the second part, then take the text 5 characters before it? That way should be much faster.

MadTechie · November 3, 2009

Why not, instead of deleting the line, just not write it to the new cleaned-up file?

i.e. (untested)

<?php
$file = fopen($filename, "r");
$newFile = fopen('newfile.txt', 'w');
while (($line = fgets($file)) !== false) {
   // Split on tabs, ignoring the trailing line break.
   $temparray = explode("\t", rtrim($line, "\r\n"));
   // Keep the line only if one of its fields equals $required.
   if (in_array($required, $temparray)) {
      fwrite($newFile, $line);
   }
}
fclose($newFile);
fclose($file);
?>
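
If the requirement is simply "the line contains a certain piece of text" rather than "one tab-separated field equals it", a substring test with strpos() may be enough. The sketch below is one possible variant under that assumption; filterLines() is a hypothetical helper name, and the temporary files and the 'UNIQ' marker are only there to make the demonstration self-contained.

```php
<?php
// Hypothetical helper: copy only the lines that contain $marker
// from the file $in to the file $out, and return how many were kept.
function filterLines($in, $out, $marker)
{
    $src = fopen($in, 'r');
    $dst = fopen($out, 'w');
    $kept = 0;
    // Read line by line so the whole file never has to fit in memory.
    while (($line = fgets($src)) !== false) {
        if (strpos($line, $marker) !== false) {
            fwrite($dst, $line);
            $kept++;
        }
    }
    fclose($src);
    fclose($dst);
    return $kept;
}

// Small demonstration with temporary files.
$in = tempnam(sys_get_temp_dir(), 'in');
$out = tempnam(sys_get_temp_dir(), 'out');
file_put_contents($in, "garbage line\nkdjalkdsklaUNIQ8912NAME=tom\nmore garbage\n");
$kept = filterLines($in, $out, 'UNIQ');
$result = file_get_contents($out);
unlink($in);
unlink($out);
```

The comparison uses !== false because strpos() returns 0 when the marker is at the very start of a line.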

physaux · November 3, 2009

Hm, I like it, thanks. I will definitely use that.

But also, would you have any idea about better searching?

I think my search is inefficient because the first part of the preg_match() pattern is on practically every line, whereas the ending of it is unique.

So preg_match() goes sniffing at every line where it detects the first part, then fails there because the line doesn't have the second part.

Wouldn't it be better if I only searched for the second part, then got the line number where it is, and then searched that line using the prefix?

Anyone know how I could do this?
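
One way to search for the unique part first is to skip the regex altogether: find "NAME=" plus the name with strpos(), then read the characters just before it. This is a sketch under two loudly stated assumptions that come only from the sample data in this thread: every entry ends with "\n", and the ID is always exactly 4 characters directly after "UNIQ". If either does not hold for the real file, the look-back step would need to change.

```php
<?php
// Sample data copied from earlier in the thread.
$textfile = "ajkdkansdkUNIQ1382NAME=sally\n"
          . "idsdckdsjckUNIQ78a2NAME=bob\n"
          . "kdjalkdsklaUNIQ8912NAME=tom\n"
          . "osaijdoasksUNIQ8291NAME=charles\n"
          . "aksjdkaskssUNIQds89NAME=sandy\n"
          . "skdjsakjdskUNIQ8238NAME=rock\n";
$accepted = array("tom", "rock");

$offset = 0;      // where the next strpos() search starts
$ids = array();
foreach ($accepted as $name) {
    // Assumption: every entry ends with "\n", so "tom" cannot
    // accidentally match the start of a longer name.
    $needle = 'NAME=' . $name . "\n";
    $pos = strpos($textfile, $needle, $offset);
    if ($pos === false || $pos < 8) {
        continue; // name not found, or no room for "UNIQ" + ID before it
    }
    // Assumption: the ID is exactly 4 characters, right after "UNIQ".
    if (substr($textfile, $pos - 8, 4) === 'UNIQ') {
        $ids[$name] = substr($textfile, $pos - 4, 4);
    }
    // Resume after this entry, since names appear in list order.
    $offset = $pos + strlen($needle);
}
print_r($ids);
```

Plain string functions avoid regex backtracking entirely, which is why this can be faster when the common prefix appears on almost every line.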

MadTechie · November 3, 2009

How large is large ?

can I see a copy (zipped or something)

physaux · November 3, 2009

The file is 34 MB.

Yes, MB.

holy $hit.

MadTechie · November 3, 2009

Meh, no biggie. Can you upload it to an FTP/host or something?

Or shorten it, whatever is easiest.

physaux · November 3, 2009

I'm gonna PM you the details.

physaux · November 3, 2009

Wow!

I filtered the file down to only include potential lines.

This took the size down from 30 MB to 300 KB.

Then I searched this new file, and everything went super fast.

Thanks for all the help guys!!

MadTechie · November 3, 2009

Well that's "more efficient"
