Nodral, posted August 18, 2011

Hi all,

I'm absolutely cr@p at regex and, to be quite honest, I'm also being a bit lazy here, as I'm on a short timescale and haven't got hours to read through loads of tutorials at the moment. (I will be doing that soon, though, because it's not fair on you guys out there.)

I have a huge text file that I need to search for the string REGUK01. This will only ever appear once in the file. Before it there are two line breaks, and before those a percentage. I need to pull out that particular percentage: e.g. it's 69.1% this time, but it changes daily, and I receive a new text file every day.

Any thoughts / ideas / help?
Psycho, posted August 18, 2011

If you could provide a couple of examples of the text to be parsed, that would help. Is the percentage the only thing on the next line, or is it embedded in other text? Is the percentage always in the format dd.d%, or can there be a single digit (or no digits) before the decimal point? Is there always exactly one digit after the decimal point? Can there be a percentage on the same line as "REGUK01"?
Psycho, posted August 18, 2011

Without more info, I can't be certain this will fulfill your needs, but this might work for you:

```php
// $text holds the contents of the report file
if (preg_match("#REGUK01.*?(\d{1,2}(\.\d+)?%)#s", $text, $match)) {
    $percent = $match[1];
}
```

Notes:

1. It finds the first "percentage" that follows "REGUK01". So if there is a percentage on the same line, it will find that one instead of the one on the next line. For that matter, if the first match is further down than the next line, it will find that as well.
2. It will match a "percentage" in any of the following formats:
   1%
   12%
   1.2%
   1.23% (or any number of digits following the decimal point)
   12.3%
   12.34% (or any number of digits following the decimal point)
   There must be one or two digits at the beginning. The decimal point is optional, and when it exists there must be one or more digits after it.
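To see both caveats from the notes in action, here is a small sketch that wraps Psycho's pattern in a helper and runs it on two sample strings. The helper name `firstPercentAfterMarker` and the sample layouts (one table row per line, a blank line before the code) are assumptions for illustration, loosely based on the excerpt later in the thread, not the real report file.

```php
<?php
// Hypothetical helper around Psycho's pattern, so it can be tried on samples.
function firstPercentAfterMarker(string $text): ?string
{
    // Lazy .*? stops at the FIRST "percentage" after REGUK01; /s lets . cross newlines.
    if (preg_match("#REGUK01.*?(\d{1,2}(\.\d+)?%)#s", $text, $match)) {
        return $match[1];
    }
    return null; // no REGUK01, or no percentage after it
}

// Assumed layout: the wanted figure (69.1%) sits BEFORE the marker.
$excerpt = "GB UK 4344 8.6 8.6 8.5 69.1%\n\nREGUK01\nLondon 611 8.5 8.5 8.4 65.7%";
echo firstPercentAfterMarker($excerpt), "\n"; // 65.7% (the London row, not 69.1%)

// Only one or two leading digits are allowed, so 100.0% is matched from its second digit.
$hundred = "REGUK01\nHeathrow 1 10.0 10.0 10.0 100.0%";
echo firstPercentAfterMarker($hundred), "\n"; // 00.0%
```

The first result shows why the direction matters here: searching forward from the marker lands on the next row's figure.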
xyph, posted August 18, 2011

I think the percentage he wants is BEFORE the string REGUK01. Your best bet is to post the text file.
Nodral (author), posted August 18, 2011

Here is an excerpt from the text file showing the bit I want.

```
Month to Date Summary UK
From: 01/08/2011 00:00 To: 18/08/2011 00:00
Responses Overall Satisfaction Rent Next Time Recommend
GB UK 4344 8.6 8.6 8.5 69.1%      <-- THIS IS THE FIGURE I NEED TO PASS TO MY SCRIPT; IT CHANGES DAY BY DAY

REGUK01                           <-- THIS ONLY EVER APPEARS HERE
London 611 8.5 8.5 8.4 65.7%

TERUK11
Heathrow 253 8.4 8.4 8.2 61.9%

LHRT01
London 252 8.4 8.4 8.2 61.8%

LHRT10
Heathrow 1 10.0 10.0 10.0 100.0%

TERUK12
Central London Territory 200 8.7 8.7 8.6 72.5%
```
Psycho, posted August 18, 2011

I don't know of a good way to check for a specific number of line breaks, since they can differ between operating systems and such. Your example above has "REGUK01" right after the very first percentage. I could provide a regex to find the very first percentage in the file, but I suspect that might not always be correct.

Someone might have a regex solution for you, but I'm not sure how to work it out. The only solution I can come up with is to iterate through each line:

```php
$lines = file('filename.txt');
$percent = null;

foreach ($lines as $index => $line) {
    if (strpos($line, 'REGUK01') !== false) {
        // The percentage is expected two lines above the marker line
        if ($index >= 2) {
            $percent = trim($lines[$index - 2]);
        }
        break;
    }
}

echo "Percent: {$percent}";
```
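On the line-ending worry: PCRE's `\R` escape matches `\r\n`, `\n` or `\r` alike, so splitting on it gives the same lines whatever OS produced the file. The sketch below combines that with the loop idea. It assumes (this is not confirmed in the thread) that the wanted figure is the last percentage on the nearest non-blank line above the marker; the function name `percentBeforeMarker` is hypothetical.

```php
<?php
// Hypothetical helper: line-by-line search that ignores the line-ending style.
// Assumption: the figure is the last percentage on the nearest non-blank line above the marker.
function percentBeforeMarker(string $text, string $marker = 'REGUK01'): ?string
{
    $lines = preg_split('/\R/', $text); // \R = \r\n, \n or \r

    foreach ($lines as $index => $line) {
        if (strpos($line, $marker) === false) {
            continue;
        }
        // Walk upwards, skipping blank lines, until a line with content is found.
        for ($i = $index - 1; $i >= 0; $i--) {
            if (trim($lines[$i]) === '') {
                continue;
            }
            // Take the last percentage on that line, e.g. "69.1" from "... 8.5 69.1%".
            if (preg_match_all('/(\d+(?:\.\d+)?)%/', $lines[$i], $m)) {
                return end($m[1]);
            }
            return null; // nearest non-blank line has no percentage
        }
        return null; // marker is on the first line
    }
    return null; // marker not found
}

// With a real file (filename is a placeholder):
// $percent = percentBeforeMarker(file_get_contents('filename.txt'));
$sample = "GB UK 4344 8.6 8.6 8.5 69.1%\r\n\r\nREGUK01\r\nLondon 611 8.5 8.5 8.4 65.7%";
var_dump(percentBeforeMarker($sample)); // string(4) "69.1"
```

Skipping blank lines instead of hard-coding `$index - 2` means one extra or missing blank line does not break the lookup.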
xyph, posted August 19, 2011

```php
<?php
$pattern = '/([0-9]{1,3}(?:\.[0-9]+){0,1})%\s+REGUK01/';
$subject = getData();

preg_match($pattern, $subject, $matches);
print_r($matches);

function getData()
{
    return <<<HEREDOC
Month to Date Summary UK
From: 01/08/2011 00:00 To: 18/08/2011 00:00
Responses Overall Satisfaction Rent Next Time Recommend
GB UK 4344 8.6 8.6 8.5 69.1%

REGUK01
London 611 8.5 8.5 8.4 65.7%

TERUK11
Heathrow 253 8.4 8.4 8.2 61.9%

LHRT01
London 252 8.4 8.4 8.2 61.8%

LHRT10
Heathrow 1 10.0 10.0 10.0 100.0%

TERUK12
Central London Territory 200 8.7 8.7 8.6 72.5%
HEREDOC;
}
?>
```

Hope that helps. Here is the pattern in plain English:

([0-9]{1,3}(?:\.[0-9]+){0,1})%\s+REGUK01

1. «([0-9]{1,3}(?:\.[0-9]+){0,1})» Match the expression below and capture its match into backreference number 1:
   «[0-9]{1,3}» Match a single character in the range between “0” and “9”, between one and three times, as many times as possible, giving back as needed (greedy).
   «(?:\.[0-9]+){0,1}» Match the group below between zero and one times, as many times as possible, giving back as needed (greedy). The group is non-capturing, so it creates no backreference of its own:
      «\.» Match the character “.” literally.
      «[0-9]+» Match a single character in the range between “0” and “9”, between one and unlimited times, as many times as possible, giving back as needed (greedy).
2. «%» Match the character “%” literally.
3. «\s+» Match a whitespace character (spaces, tabs, and line breaks), between one and unlimited times, as many times as possible, giving back as needed (greedy). Because this includes both \r and \n, the line-ending style of the file does not matter.
4. «REGUK01» Match the characters “REGUK01” literally.

So $matches[1] holds the number without the % sign (69.1 for this data).
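xyph's pattern can be generalised into a reusable helper so the same code works for any region code in the daily file. In this sketch the function name `percentForCode`, the `\b` word boundary after the code, and the conversion to a float are illustrative choices, not something from the thread; `preg_quote()` escapes the code in case it ever contains regex metacharacters.

```php
<?php
// Hypothetical helper built on xyph's pattern: region code passed in, float returned.
function percentForCode(string $text, string $code): ?float
{
    // Number with optional decimals, then "%", then any whitespace (including
    // blank lines in any line-ending style), then the code as a whole word.
    $pattern = '/([0-9]{1,3}(?:\.[0-9]+)?)%\s+' . preg_quote($code, '/') . '\b/';

    if (preg_match($pattern, $text, $matches) !== 1) {
        return null; // code not found, or no percentage directly before it
    }
    return (float) $matches[1];
}

$report = "GB UK 4344 8.6 8.6 8.5 69.1%\n\nREGUK01\nLondon 611 8.5 8.5 8.4 65.7%\n\nTERUK11";
var_dump(percentForCode($report, 'REGUK01')); // float(69.1)
var_dump(percentForCode($report, 'TERUK11')); // float(65.7)
var_dump(percentForCode($report, 'LHRT01'));  // NULL
```

Returning `null` instead of reading `$matches[1]` unconditionally avoids an undefined-index notice on a day when the code is missing from the file.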