
preg_match() help needed


lokie538


Hi,

 

I'm trying to extract data from a website, but for some reason it's not working.

 

This is the code on the website:

 			  			<tr><td width="90">Suburb(s):</td><td>
		  			
		  			



                            
                            COOROY, 
                            


                            
                            COOROY MOUNTAIN, 
                            


                            
                            LAKE MACDONALD, 
                            


                            
                            TINBEERWAH
                            


                        </td></tr> 

 

and this is the code I'm trying to use to get it:

preg_match('~<tr><td width="90">Suburb(s):</td><td>(.*?[^<])</td></tr>~i', $file, $yourpost3);

print $yourpost3[1]; 

// $file holds the contents of a saved HTML file

 

The bit I'm unsure of is (.*?[^<]). I don't know what this means.

 

It returns this error: Notice: Undefined offset: 1 in C:\wamp\www\get.php on line 15

https://forums.phpfreaks.com/topic/141183-preg_match-help-needed/

To elaborate a bit more on the actual meaning of that line:

 

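.*? matches any character (except a newline, since the s modifier is not set) zero or more times, and the trailing ? makes it lazy, so it takes as few characters as it can. [^<] then matches exactly one character that is not a <, and that character has to be followed immediately by </td></tr>. So the group captures everything up to the closing tags, as long as the last captured character is not a <.

Two things stop it from matching your page. First, Suburb(s): inside a pattern, parentheses form a capturing group, so (s) matches a plain "s" rather than the literal text "(s)". You need to escape them: Suburb\(s\). Second, the suburb list is spread over several lines, and . will not cross a line break without the s modifier. Because nothing matches, $yourpost3 stays empty, which is where the Undefined offset: 1 notice comes from.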
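
As a rough illustration, here is the posted pattern tried three ways against a shortened, made-up copy of the markup ($html and the variable names are assumptions, not the real page). As posted, (s) is a capture group that matches a bare "s", so it fails. With the parentheses escaped it still fails, because . does not match a newline. With the s modifier added it matches. Checking the return value of preg_match before reading index 1 also avoids the Undefined offset notice.

```php
<?php
// Shortened, hypothetical copy of the markup from the first post.
$html = "<tr><td width=\"90\">Suburb(s):</td><td>\n  COOROY, \n  TINBEERWAH\n</td></tr>";

// As posted: "(s)" is a capture group, so the regex looks for "Suburbs:".
$original = '~<tr><td width="90">Suburb(s):</td><td>(.*?[^<])</td></tr>~i';

// Parentheses escaped, but "." still stops at newlines.
$escaped = '~<tr><td width="90">Suburb\(s\):</td><td>(.*?[^<])</td></tr>~i';

// Parentheses escaped and the "s" modifier added so "." also matches newlines.
$dotall = '~<tr><td width="90">Suburb\(s\):</td><td>(.*?[^<])</td></tr>~is';

$results = [];
foreach (['original' => $original, 'escaped' => $escaped, 'dotall' => $dotall] as $name => $pattern) {
    // preg_match returns 1 on a match and 0 otherwise; only read $m[1] after a match.
    $results[$name] = preg_match($pattern, $html, $m) === 1 ? trim($m[1]) : null;
}

var_dump($results);
```

Only the last variant should come back with the suburb text; the first two should be NULL.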

 

I suspect this is what you are looking for?

$str = <<<DATA
	  			<tr><td width="90">Suburb(s):</td><td>
		  			
		  			



                            
                            COOROY, 
                            


                            
                            COOROY MOUNTAIN, 
                            


                            
                            LAKE MACDONALD, 
                            


                            
                            TINBEERWAH
                            


                        </td></tr>
DATA;
preg_match('#<tr><td width="90">Suburb\(s\):</td><td>([^<]+)#s', $str, $match);
echo '<pre>'.print_r($match[1], true);

 

output:

  			
		  			



                            
                            COOROY, 
                            


                            
                            COOROY MOUNTAIN, 
                            


                            
                            LAKE MACDONALD, 
                            


                            
                            TINBEERWAH
                            


                        

You can have a look at the regex resources page to learn more about regex.

Yep that works a treat thanks mate!!!

 

Now I just have to find out how to remove all the whitespace and line breaks so it is just a single string. For instance, the output should be "cooroy, cooroy mountain, lake macdonald, tinbeerwah"

 

I've been looking at http://www.gskinner.com/RegExr/ to try to understand more, hehe.

 

Thanks for your help!!

 

Edit: this kinda works to remove the whitespace!

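// Remove every literal space character from the captured text.
$apples = str_replace(" ", "", $match[1]);
// Print the untouched capture for comparison.
echo '<pre>'.print_r($match[1], true);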

echo $apples;
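
One catch with the str_replace approach is that it also drops the spaces inside names (COOROY MOUNTAIN becomes COOROYMOUNTAIN) while the tabs and line breaks stay. A possible alternative, sketched here with a shortened stand-in for $match[1] (the $captured sample is an assumption), is to collapse each run of whitespace to a single space, trim the ends, and lowercase the result:

```php
<?php
// Shortened stand-in for $match[1] from the earlier preg_match call.
$captured = "\n\t\t\n      COOROY, \n\n      COOROY MOUNTAIN, \n\n      LAKE MACDONALD, \n\n      TINBEERWAH\n   ";

// \s+ matches any run of spaces, tabs and line breaks; replace each run with one space.
$flat = preg_replace('/\s+/', ' ', $captured);

// Drop the leading/trailing space and lowercase to get the requested format.
$suburbs = strtolower(trim($flat));

echo $suburbs; // cooroy, cooroy mountain, lake macdonald, tinbeerwah
```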

Or, if you want to break the result into separate entries, you could also do this:

 

$arr = preg_split('#(?:\s{2,}|, )#', $match[1], -1, PREG_SPLIT_NO_EMPTY);
echo '<pre>'.print_r($arr, true);

 

Output:

Array
(
    [0] => COOROY
    [1] => COOROY MOUNTAIN
    [2] => LAKE MACDONALD
    [3] => TINBEERWAH
)
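
A slightly more forgiving variant, offered as a sketch only: the split pattern above expects a space after each comma, so an entry written as "COOROY MOUNTAIN," with the line break directly after the comma would keep its trailing comma. Trimming first and splitting on a comma with any surrounding whitespace avoids that, and implode can then rebuild the single string asked for earlier. The $captured sample is a made-up stand-in for $match[1].

```php
<?php
// Shortened stand-in for $match[1]; the second entry has no space after its comma.
$captured = "\n   COOROY, \n\n   COOROY MOUNTAIN,\n\n   LAKE MACDONALD, \n\n   TINBEERWAH\n  ";

// Trim the block first, then split on a comma plus any whitespace around it.
$arr = preg_split('#\s*,\s*#', trim($captured), -1, PREG_SPLIT_NO_EMPTY);

// Join the entries back into one lowercase, comma-separated string.
$joined = strtolower(implode(', ', $arr));

print_r($arr);
echo $joined;
```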

