Getting the info I need?

Nodral · August 18, 2011

Hi All

I'm absolutely cr@p at regex and to be quite honest I'm also being a bit lazy here as I'm on a short timescale and haven't got hours to read through loads of tutorials at the moment. (I will be doing soon though coz it's not fair on you guys out there)

I have a huge text file which I need to search through to find the statement REGUK01. It only ever appears once in the file. Two newlines before it there is a percentage, and that is the value I need to pull out. For example, it's 69.1% this time, but it changes daily and I receive a new text file every day.

Any thoughts / ideas / help?

Psycho · August 18, 2011

If you could provide a couple of examples of the text that will be parsed it would be helpful. Is the percentage the only thing on the next line or is it embedded in other text. Is the percentage always in the format dd.d% or can it be a single digit (or no digits) before the decimal? Is there always 1 digit after the decimal? Can there be a percentage on the same line as "REGUK01"?

Psycho · August 18, 2011

Without more info, I can't be certain this will fulfill your needs, but this might work for you:

if (preg_match("#REGUK01.*?(\d{1,2}(\.\d+)?%)#s", $text, $match)) {
    $percent = $match[1]; // only set when a match was found
}

Notes:

1. It finds the first "percentage" that follows "REGUK01". So if you have a percentage on the same line, it will find that one instead of the one on the next line. For that matter, if the first match is somewhere after the next line, it will find that as well.

2. It will match a "percentage" that is in any of the following formats:

1%

12%

1.2%

1.23% (or any number of digits following the decimal)

12.3%

12.34% (or any number of digits following the decimal)

There must be one or two digits at the beginning. The decimal point is optional, and when it exists it must be followed by one or more digits.
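To see what note 2 means in practice, here is a small sketch that runs only the percentage part of the pattern against a few sample strings. The helper name `matchPercent` and the sample values are made up for illustration and are not part of the original suggestion.

```php
<?php
// Hypothetical helper: returns the first "percentage" found in $text
// using the same sub-pattern as above, or null when nothing matches.
function matchPercent(string $text): ?string
{
    if (preg_match('#\d{1,2}(\.\d+)?%#', $text, $m)) {
        return $m[0];
    }
    return null;
}

// Sample values (assumed, for illustration only).
foreach (['1%', '12%', '1.2%', '12.34%', '100.0%', 'abc'] as $sample) {
    var_dump(matchPercent($sample));
}
```

One thing worth noticing: because the pattern allows at most two leading digits, a value such as 100.0% is only partially matched (as "00.0%"), so the digit count would need widening if three-digit percentages can occur.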

xyph · August 18, 2011

I think the percentage he wants is BEFORE the string REGUK01.

Your best bet is to post the text file.

Nodral · August 18, 2011

Here is an excerpt from the textfile showing the bit I want.

   
      Month to Date Summary UK
      From: 01/08/2011 00:00
      To: 18/08/2011 00:00
     



       
          Responses
     Overall
      Satisfaction
     Rent Next
      Time
     Recommend
          
      GB
     UK
     4344
     8.6
     8.6
     8.5
     69.1% <--------------------THIS IS THE FIGURE I NEED TO PASS TO MY SCRIPT AND CHANGES DAY BY DAY
     
      REGUK01 <-----------------THIS ONLY EVER APPEARS HERE
     London
     611
     8.5
     8.5
     8.4
     65.7%
     
      TERUK11
     Heathrow 
     253
     8.4
     8.4
     8.2
     61.9%
     
      LHRT01
     London 
     252
     8.4
     8.4
     8.2
     61.8%
     
      LHRT10
     Heathrow 
     1
     10.0
     10.0
     10.0
     100.0%
     
      TERUK12
     Central London Territory
     200
     8.7
     8.7
     8.6
     72.5%

Psycho · August 18, 2011

I don't know of a good method to check for number of line breaks since they can be different between OSes and such. Your example above has "REGUK01" after the very first percent. I could provide a regex to find the very first percent in the file, but I have a suspicion that might not always be correct. Someone might have a regex solution for you, but I'm not sure how to work it out. The only solution I can come up with would be to iterate through each line.

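$lines = file('filename.txt');
$percent = null; // stays null if REGUK01 is not found
foreach ($lines as $index => $line)
{
    // The percentage sits two lines above the REGUK01 line
    if (strpos($line, 'REGUK01') !== false && $index >= 2)
    {
        $percent = trim($lines[$index - 2]);
        break;
    }
}
echo "Percent: {$percent}";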
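If the number of blank lines between the percentage and REGUK01 might vary, one possible variation on the loop above is to walk backwards from the REGUK01 line to the nearest non-blank line. This is only a sketch: the function name `findPercentBefore` is hypothetical, and the sample lines are copied from the excerpt rather than read from a real file.

```php
<?php
// Hypothetical helper: find the line containing $marker, then walk
// backwards to the nearest non-blank line and return it if it looks
// like a percentage. Returns null when nothing suitable is found.
function findPercentBefore(array $lines, string $marker): ?string
{
    foreach ($lines as $index => $line) {
        if (strpos($line, $marker) === false) {
            continue;
        }
        for ($i = $index - 1; $i >= 0; $i--) {
            $candidate = trim($lines[$i]);
            if ($candidate === '') {
                continue; // skip blank or whitespace-only lines
            }
            return substr($candidate, -1) === '%' ? $candidate : null;
        }
        return null; // nothing but blank lines (or nothing at all) before the marker
    }
    return null; // marker not present
}

// Sample lines taken from the excerpt posted above; in real use this
// would be the array returned by file('filename.txt').
$lines = ["     8.5\n", "     69.1%\n", "     \n", "      REGUK01\n", "     London\n"];
echo "Percent: " . findPercentBefore($lines, 'REGUK01') . "\n";
```

Returning null instead of a guess makes it easy for the calling script to detect a day where the file layout has changed.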

xyph · August 19, 2011

<?php 

$pattern = '/([0-9]{1,3}(?:\.[0-9]+){0,1})%\s+REGUK01/';

$subject = getData();

preg_match($pattern, $subject, $matches);

print_r( $matches );

function getData() {
return <<<HEREDOC
   
      Month to Date Summary UK
      From: 01/08/2011 00:00
      To: 18/08/2011 00:00
     



       
          Responses
     Overall
      Satisfaction
     Rent Next
      Time
     Recommend
          
      GB
     UK
     4344
     8.6
     8.6
     8.5
     69.1%
     
      REGUK01
     London
     611
     8.5
     8.5
     8.4
     65.7%
     
      TERUK11
     Heathrow 
     253
     8.4
     8.4
     8.2
     61.9%
     
      LHRT01
     London 
     252
     8.4
     8.4
     8.2
     61.8%
     
      LHRT10
     Heathrow 
     1
     10.0
     10.0
     10.0
     100.0%
     
      TERUK12
     Central London Territory
     200
     8.7
     8.7
     8.6
     72.5%
HEREDOC;
}

?>

Hope that helps.

In English:


([0-9]{1,3}(?:\.[0-9]+){0,1})%\s+REGUK01

Match the regular expression below and capture its match into backreference number 1 «([0-9]{1,3}(?:\.[0-9]+){0,1})»
   Match a single character in the range between “0” and “9” «[0-9]{1,3}»
      Between one and 3 times, as many times as possible, giving back as needed (greedy) «{1,3}»
   Match the regular expression below without capturing it (non-capturing group) «(?:\.[0-9]+){0,1}»
      Between zero and one times, as many times as possible, giving back as needed (greedy) «{0,1}»
      Match the character “.” literally «\.»
      Match a single character in the range between “0” and “9” «[0-9]+»
         Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “%” literally «%»
Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s+»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the characters “REGUK01” literally «REGUK01»
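For use in a daily script, the pattern could be wrapped in a small function that returns the number itself. The following is a minimal sketch under a few assumptions: the function name `percentBeforeRegion` and the file name in the comment are invented for illustration, and `{0,1}` is written as the equivalent `?`.

```php
<?php
// Hypothetical wrapper around the pattern above: returns the percentage
// that immediately precedes REGUK01 as a float, or null if it is absent.
function percentBeforeRegion(string $text): ?float
{
    $pattern = '/([0-9]{1,3}(?:\.[0-9]+)?)%\s+REGUK01/';
    if (preg_match($pattern, $text, $matches) !== 1) {
        return null;
    }
    return (float) $matches[1]; // group 1 is the number without the % sign
}

// In real use $text would come from the daily file, e.g.
// $text = file_get_contents('report.txt'); // file name is an assumption
$text = "     8.5\n     69.1%\n     \n      REGUK01\n     London\n";
var_dump(percentBeforeRegion($text));
```

Because `\s+` matches any run of spaces, tabs and line breaks, this does not depend on how many blank lines sit between the percentage and REGUK01, or on the line-ending style of the file.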
