lokie538 Posted January 17, 2009

Hi, I'm trying to extract data from a website, but for some reason it's not working. This is the code on the website:

    <tr><td width="90">Suburb(s):</td><td> COOROY, COOROY MOUNTAIN, LAKE MACDONALD, TINBEERWAH </td></tr>

and this is the code I'm trying to use to get it:

    preg_match('~<tr><td width="90">Suburb(s):</td><td>(.*?[^<])</td></tr>~i', $file, $yourpost3);
    print $yourpost3[1]; /// $file just uses a saved html file

The bit I'm unsure of is (.*?[^<]). I don't know what this means. It returns this error:

    Notice: Undefined offset: 1 in C:\wamp\www\get.php on line 15

.josh Posted January 17, 2009

It means the regex failed, so there's no $yourpost3[1] defined.
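
A minimal sketch of the point above: preg_match returns 1 on a match, 0 on no match, and false on error, so checking that value before reading the captures avoids the "Undefined offset" notice. The $html string here is a hypothetical stand-in for the saved file, not the poster's actual data.

```php
<?php
// Hypothetical input standing in for the saved HTML file.
$html = '<tr><td width="90">Suburb(s):</td><td> COOROY, TINBEERWAH </td></tr>';

// Pattern copied from the question; it does not match this input.
$pattern = '~<tr><td width="90">Suburb(s):</td><td>(.*?[^<])</td></tr>~i';

$result = preg_match($pattern, $html, $matches);

if ($result === 1) {
    // Only read the captures once a match is confirmed.
    echo $matches[1], "\n";
} elseif ($result === 0) {
    // On a failed match $matches is an empty array, so index 1 does not exist.
    echo "No match found.\n";
} else {
    // false means the regex engine itself reported an error.
    echo "Regex error code: ", preg_last_error(), "\n";
}
```

With this guard in place the script reports the failed match instead of raising a notice, which makes it clearer that the pattern is the thing to debug.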

lokie538 Posted January 17, 2009

So have you got any ideas why it wouldn't work and failed?

nrg_alpha Posted January 17, 2009

To elaborate a bit more on the actual meaning of (.*?[^<]): .*? matches anything (except a newline) zero or more times, but the ? makes it lazy, so it takes as few characters as it can. [^<] then matches exactly one character that is anything but a <. Because the quantifier is lazy, the engine extends the match one character at a time and retests after each step, stopping as soon as the rest of the pattern (</td></tr>) can match.

The one problem I do see is this in your pattern: Suburb(s). Inside a pattern, parentheses form a capturing group, so (s) matches a plain s rather than the literal text (s), and you need to escape them. I suspect this is what you are looking for?

    $str = <<<DATA
    <tr><td width="90">Suburb(s):</td><td> COOROY, COOROY MOUNTAIN, LAKE MACDONALD, TINBEERWAH </td></tr>
    DATA;

    preg_match('#<tr><td width="90">Suburb\(s\):</td><td>([^<]+)#s', $str, $match);
    echo '<pre>'.print_r($match[1], true);

Output:

    COOROY, COOROY MOUNTAIN, LAKE MACDONALD, TINBEERWAH

You can have a look at the regex resources page to learn more about regex.
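
To make the escaping point concrete, here is a small sketch that compares the unescaped and escaped forms, and then shows one alternative: letting preg_quote escape the literal part of the pattern. The variable names ($html, $label, $pattern) are illustrative assumptions, and preg_quote is a swapped-in convenience rather than the approach used in the reply above.

```php
<?php
// Sample row from the thread, on a single line.
$html = '<tr><td width="90">Suburb(s):</td><td> COOROY, COOROY MOUNTAIN, LAKE MACDONALD, TINBEERWAH </td></tr>';

// Unescaped: (s) is a capture group, so this looks for the text "Suburbs:".
$unescaped = preg_match('#Suburb(s):#', $html);

// Escaped: \( and \) match literal parentheses.
$escaped = preg_match('#Suburb\(s\):#', $html);

// Alternative: let preg_quote escape every special character in the literal
// prefix. The second argument is the delimiter used around the pattern.
$label = '<tr><td width="90">Suburb(s):</td><td>';
$pattern = '#' . preg_quote($label, '#') . '([^<]+)#';

if (preg_match($pattern, $html, $match) === 1) {
    // trim() drops the spaces that surround the list inside the cell.
    echo trim($match[1]), "\n";
}
```

Using preg_quote is mainly useful when the literal text comes from somewhere else (a variable, a config value) and escaping it by hand would be error-prone.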

lokie538 Posted January 17, 2009

Yep, that works a treat, thanks mate!!! Now I just have to find how to remove all the white spaces and line breaks so it is just a string. For instance the output should be "cooroy, cooroy mountain, lake macdonald, tinbeerwah".

I've been looking at this http://www.gskinner.com/RegExr/ trying to understand more hehe.

Thanks for your help!!

Edit: this kinda works to remove the white spaces!

    $apples = str_replace(" ", "", $match[1]);
    echo '<pre>'.print_r($match[1], true);
    echo $apples;
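
One possible way to get the single-line string described above is to collapse every run of whitespace (spaces, tabs, line breaks) into one space and then trim the ends. This is a sketch only: the layout of $raw below is an assumed example of what the captured cell might look like, and the strtolower call is there only because the desired output in the post is lowercase.

```php
<?php
// Assumed shape of the captured text: entries separated by line breaks and indentation.
$raw = " COOROY,\n   COOROY MOUNTAIN,\n   LAKE MACDONALD,\n   TINBEERWAH\n";

// \s+ matches any run of whitespace, including newlines; replace each run with one space.
$collapsed = preg_replace('/\s+/', ' ', $raw);

// Remove the leading and trailing space left over, then lowercase the result.
$clean = strtolower(trim($collapsed));

echo $clean, "\n";
```

Unlike replacing every single space with an empty string, this keeps the space inside multi-word names such as COOROY MOUNTAIN.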

nrg_alpha Posted January 17, 2009

Or if you wanted to break the remaining display into their own separate entries, you could also do this:

    $arr = preg_split('#(?:\s{2,}|, )#', $match[1], -1, PREG_SPLIT_NO_EMPTY);
    echo '<pre>'.print_r($arr, true);

Output:

    Array
    (
        [0] => COOROY
        [1] => COOROY MOUNTAIN
        [2] => LAKE MACDONALD
        [3] => TINBEERWAH
    )

nrg_alpha Posted January 17, 2009

Don't forget to flag this as TOPIC SOLVED.

lokie538 Posted January 17, 2009

Thanks for the help, you're a legend!! A legend of the internet!!

nrg_alpha Posted January 17, 2009

hehe.. not quite... I'm still a peon.