Rottingham Posted December 27, 2007

OK, this is my first attempt at using regex. I've been coding for years and can't believe I survived this long without it, but the day has come that I must accept it.

I am trying to parse 6000 lines in the following style:

"792036": { d:"Holmes Heavy Duty Slide Bolt", p:"9.11", q:"1" },

My goal is to extract the first 6 digits, everything within the quotes after d:, and everything within the quotes after p:. For this line, I'm hoping to get an array like this:

array[0] = 792036
array[1] = Holmes Heavy Duty Slide Bolt
array[2] = 9.11

My regex statement

ereg("\"[1-9]{6}\"", $line, $regs);

gives me some encouraging but also frustrating results:

Warning: Invalid argument supplied for foreach() in /home/macactio/public_html/topsoftweb/test.php on line 48
"101303": { d:"4015 NS 13 Key Blank", p:"0.00", q:"0" },
Warning: Invalid argument supplied for foreach() in /home/macactio/public_html/topsoftweb/test.php on line 48
"270625": { d:"Dor-O-Matic SCREW.1022 Dog Screws (25 Pack)", p:"64.00", q:"0" },
"271766" "271766": { d:"BRIG 691259 Key Blank, High Security", p:"45.00", q:"0" },

You will notice that the first line gets a warning, the second line gets a warning, but the third line actually gets the number, albeit with the " signs too. I need to remove the \"...\" from my regex. I can't figure out a) why it works on some of the lines and not others, and b) how to get the rest of the parts I need. I think I have an idea, but here is my very simple function so someone can tell me why it works sometimes and not on every line:

foreach($file_lines as $line) {
    unset($regs);
    // Interpret Line
    // "792036": { d:"Holmes Heavy Duty Slide Bolt", p:"9.11", q:"1" },
    ereg("\"[1-9]{6}\"", $line, $regs);
    foreach($regs as $reg)
        echo $reg." ";
    echo $line;
    echo "<br>";
}
dsaba Posted December 27, 2007

Here's a preg (PCRE) solution; I hear it's faster than ereg (POSIX). You can use preg_match_all() to grab all the matches in one pass:

~"([0-9]{6})": { d:"([^"]+)", p:"([0-9]\.[0-9]{2})", q:"([0-9])" }~

Tested: http://nancywalshee03.freehostia.com/regextester/regex_tester.php?seeSaved=yyvefddg

I also noticed that, in your error report above, one of your lines of data does not follow the format you specified:

"271766" "271766": { d:"BRIG 691259 Key Blank, High Security", p:"45.00", q:"0" }
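A minimal sketch of the pattern above applied to the sample line from the first post, using preg_match() on a single line. The variable names ($line, $pattern, $m, $withZero) are illustrative and not from the thread. The second half shows one likely reason the original ereg() call matched only some lines: the class [1-9] excludes the digit 0, so IDs such as 101303 cannot match. That explanation is an inference from the sample output, not something stated in the thread.

```php
<?php
// Sample line copied from the first post.
$line = '"792036": { d:"Holmes Heavy Duty Slide Bolt", p:"9.11", q:"1" },';

// The pattern from the reply above; ~ is the delimiter, so " needs no escaping.
$pattern = '~"([0-9]{6})": { d:"([^"]+)", p:"([0-9]\.[0-9]{2})", q:"([0-9])" }~';

if (preg_match($pattern, $line, $m)) {
    // $m[0] is the full match; $m[1]..$m[4] are the capture groups.
    echo $m[1], ' | ', $m[2], ' | ', $m[3], "\n";
}

// [1-9] excludes the digit 0, so an ID containing a zero never matches.
$withZero = '"101303": { d:"4015 NS 13 Key Blank", p:"0.00", q:"0" },';
$oldMatches = preg_match('~"[1-9]{6}"~', $withZero); // no match
$newMatches = preg_match('~"[0-9]{6}"~', $withZero); // match
echo "old: $oldMatches, new: $newMatches\n";
```

Because the capture groups sit inside the quotes, the extracted values come back without the surrounding " characters.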
Rottingham Posted December 27, 2007

Thanks man, looks like a little more success. I changed my code to the following:

foreach($file_lines as $line) {
    // Interpret Line
    // "792036": { d:"Holmes Heavy Duty Slide Bolt", p:"9.11", q:"1" },
    // Places three space separated words into $regs[1], $regs[2] and $regs[3].
    //ereg("[0-9]{6}", $line, $regs);
    preg_match_all('~"([0-9]{6})": { d:"([^"]+)", p:"([0-9]\.[0-9]{2})", q:"([0-9])" }~', $line, $regs, PREG_SET_ORDER);
    echo $regs[0][1].' ';
    echo $line;
    echo "<br>";
}

You can see the results here: http://macaction.org/topsoftweb/test.php

Unfortunately, it still works on some lines but not the others.
dsaba Posted December 27, 2007

You need to show the input data you're working with. The regex I supplied worked fine with what you showed; I cannot see what your problem is if I don't see the input data.

Instead of going through each line of the input data with foreach($file_lines as $line), read it all at once:

<?php
$data = file_get_contents('whatever.txt');
$pat = '~"([0-9]{6})": { d:"([^"]+)", p:"([0-9]\.[0-9]{2})", q:"([0-9])" }~';
preg_match_all($pat, $data, $out);
foreach ($out[0] as $k => $fullMatch) {
    $num = $out[1][$k];
    $d = $out[2][$k];
    $p = $out[3][$k];
    $q = $out[4][$k];
    echo "$num<br>$d<br>$p<br>$q<br><br>";
}
?>

The matches array on my website is verbatim the same array that preg_match_all() puts in $out with no special flags set.
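The two snippets in this thread index the result array differently, which is easy to trip over. The sketch below, assuming two sample records copied from the first post and illustrative variable names ($byGroup, $byRecord), contrasts the default ordering of preg_match_all() with PREG_SET_ORDER.

```php
<?php
// Two sample records from the thread, joined as they might appear in a file.
$data = '"792036": { d:"Holmes Heavy Duty Slide Bolt", p:"9.11", q:"1" },' . "\n"
      . '"101303": { d:"4015 NS 13 Key Blank", p:"0.00", q:"0" },';

$pat = '~"([0-9]{6})": { d:"([^"]+)", p:"([0-9]\.[0-9]{2})", q:"([0-9])" }~';

// Default (PREG_PATTERN_ORDER): $byGroup[1] holds every ID,
// $byGroup[2] every description, and so on.
$count = preg_match_all($pat, $data, $byGroup);

// PREG_SET_ORDER: $byRecord[0] holds all groups of the first record,
// $byRecord[1] all groups of the second record, and so on.
preg_match_all($pat, $data, $byRecord, PREG_SET_ORDER);

echo $byGroup[1][1], "\n";  // ID of the second record, group-first indexing
echo $byRecord[1][1], "\n"; // the same value, record-first indexing
```

With the default ordering, $out[1][$k] is the ID of match number $k; with PREG_SET_ORDER the indices are swapped, so the same value is $out[$k][1].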
Rottingham Posted December 27, 2007

Hmm... You can view my results again at http://macaction.org/topsoftweb/test.php

You can view the source file at http://macaction.org/topsoftweb/parts_prices.txt

There are 6997 lines of those parts in that file, and I'm only getting 1400 results. I'm not sure what the deal is.
dsaba Posted December 27, 2007

This works:

~"([0-9]*)": { d:"([^"]*)", p:"([0-9]{1,}\.[0-9]{2})", q:"([0-9]{1,})" }~

The problem was that your data has varying formats:

- changed + to * in the d: group, because the description can be blank
- changed p to accept one or more digits before the decimal point ({1,})
- changed the others accordingly

Here is a reference to simple regex symbols/terms: http://www.regular-expressions.info/reference.html
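A small sketch comparing the earlier pattern with the revised one above. The first two records are copied from the thread; the third is a made-up record with an empty description and a two-digit quantity, included only to exercise the relaxed groups. The names $strict, $relaxed, and $rows are illustrative.

```php
<?php
// Records 1 and 2 come from the thread; record 3 is hypothetical.
$data = implode("\n", [
    '"792036": { d:"Holmes Heavy Duty Slide Bolt", p:"9.11", q:"1" },',
    '"270625": { d:"Dor-O-Matic SCREW.1022 Dog Screws (25 Pack)", p:"64.00", q:"0" },',
    '"123456": { d:"", p:"0.00", q:"12" },',
]);

// Earlier pattern: one digit before the decimal point, non-empty description,
// single-digit quantity.
$strict  = '~"([0-9]{6})": { d:"([^"]+)", p:"([0-9]\.[0-9]{2})", q:"([0-9])" }~';
// Revised pattern from the post above.
$relaxed = '~"([0-9]*)": { d:"([^"]*)", p:"([0-9]{1,}\.[0-9]{2})", q:"([0-9]{1,})" }~';

$strictCount  = preg_match_all($strict, $data, $ignored);
$relaxedCount = preg_match_all($relaxed, $data, $rows, PREG_SET_ORDER);

echo "strict: $strictCount, relaxed: $relaxedCount\n";
foreach ($rows as $row) {
    // $row[1] = ID, $row[2] = description, $row[3] = price, $row[4] = quantity
    printf("%s | %s | %s | %s\n", $row[1], $row[2], $row[3], $row[4]);
}
```

Counting matches against the number of input lines, as done here, is a quick way to spot records that a pattern silently skips.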
Rottingham Posted December 27, 2007

Thanks a ton! That did the trick. I really appreciate your help.
dsaba Posted December 27, 2007

*Edited last post. Glad to help. Return the favor on the forums.