Hi,

 

I'm trying to extract data from a website, but for some reason it's not working.

 

This is the code on the website:

 			  			<tr><td width="90">Suburb(s):</td><td>
		  			
		  			



                            
                            COOROY, 
                            


                            
                            COOROY MOUNTAIN, 
                            


                            
                            LAKE MACDONALD, 
                            


                            
                            TINBEERWAH
                            


                        </td></tr> 

 

And this is the code I'm trying to use to get it:

preg_match('~<tr><td width="90">Suburb(s):</td><td>(.*?[^<])</td></tr>~i', $file, $yourpost3);

print $yourpost3[1]; 

/// $file just uses a saved html file

 

The bit I'm unsure of is (.*?[^<]). I don't know what this means.

 

It returns this error: Notice: Undefined offset: 1 in C:\wamp\www\get.php on line 15


To elaborate a bit more on the actual meaning of that line:

 

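.*? // . matches any character (except a newline), * means zero or more times, and the trailing ? makes it lazy, so it takes as few characters as it can and only grows one character at a time when the rest of the pattern fails to match. [^<] then matches exactly one character that is anything but a <, so the capture has to end on a character that is not a <, right before </td></tr>. Because . does not match newlines by default, this can't get across the line breaks inside your cell unless you add the s modifier.

The other problem I see in your pattern is Suburb(s): inside a pattern, parentheses form a group, so this looks for the text Suburbs: instead of Suburb(s):. You need to escape them as \( and \). Since the pattern never matches, $yourpost3 stays empty, which is where the Undefined offset: 1 notice comes from.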
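
Here is a minimal sketch of why the original pattern finds nothing: the unescaped (s) makes it look for "Suburbs:", and without the s modifier the dot cannot cross line breaks. The $html sample is a shortened, made-up stand-in for the saved page, and the variable names are only for illustration.

```php
<?php
// Shortened stand-in for the saved page; the cell content spans several lines.
$html = "<tr><td width=\"90\">Suburb(s):</td><td>\n   COOROY, \n   TINBEERWAH\n</td></tr>";

// Original pattern: (s) is an unescaped group, so the regex looks for "Suburbs:".
$original = '~<tr><td width="90">Suburb(s):</td><td>(.*?[^<])</td></tr>~i';
$originalResult = preg_match($original, $html, $m1);

// Escaped parentheses plus the s modifier, so the dot can cross line breaks.
$fixed = '~<tr><td width="90">Suburb\(s\):</td><td>(.*?)</td></tr>~is';
$fixedResult = preg_match($fixed, $html, $m2);

var_dump($originalResult); // int(0): no match, so $m1[1] is never set
var_dump($fixedResult);    // int(1)
var_dump($m2[1]);          // the raw cell text, whitespace included
```

This variant keeps the closing </td></tr> in the pattern. The [^<]+ approach used further down works just as well here, as long as the cell contains no nested tags.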

 

I suspect this is what you are looking for?

$str = <<<DATA
	  			<tr><td width="90">Suburb(s):</td><td>
		  			
		  			



                            
                            COOROY, 
                            


                            
                            COOROY MOUNTAIN, 
                            


                            
                            LAKE MACDONALD, 
                            


                            
                            TINBEERWAH
                            


                        </td></tr>
DATA;
preg_match('#<tr><td width="90">Suburb\(s\):</td><td>([^<]+)#s', $str, $match);
echo '<pre>'.print_r($match[1], true);

 

output:

  			
		  			



                            
                            COOROY, 
                            


                            
                            COOROY MOUNTAIN, 
                            


                            
                            LAKE MACDONALD, 
                            


                            
                            TINBEERWAH
                            


                        

You can have a look at the regex resources page to learn more about regex.
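
On the notice from the first post: it appears whenever $match[1] is read after a failed match. A minimal sketch of guarding against that is below, assuming you are happy to fall back to an empty string. The $str sample and the $suburbs variable are made up for illustration.

```php
<?php
// Hypothetical sample input; in the thread this would be the saved HTML file.
$str = "<tr><td width=\"90\">Suburb(s):</td><td>\n  COOROY, \n  TINBEERWAH\n</td></tr>";

// preg_match() returns 1 on a match, 0 on no match, and false on error.
if (preg_match('#<tr><td width="90">Suburb\(s\):</td><td>([^<]+)#s', $str, $match) === 1) {
    $suburbs = $match[1];
} else {
    // No match: fall back to an empty string instead of reading $match[1].
    $suburbs = '';
}

echo '<pre>'.$suburbs;
```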

Yep, that works a treat, thanks mate!

 

Now I just have to find out how to remove all the whitespace and line breaks so it is just a single string. For instance, the output should be "cooroy, cooroy mountain, lake macdonald, tinbeerwah".

 

I've been looking at this http://www.gskinner.com/RegExr/ to try to understand more, hehe.

 

Thanks for your help!!

 

Edit: this kinda works to remove the white spaces!

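// Removes every space character, including the ones inside names such as
// "COOROY MOUNTAIN"; tabs and line breaks are left in place.
$apples = str_replace(" ", "", $match[1]);
echo '<pre>'.print_r($match[1], true);

echo $apples;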
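
If the goal is the single lowercase string mentioned above, one possible approach is to collapse each run of whitespace into one space rather than removing spaces outright. This is only a sketch: $raw is a made-up, shortened stand-in for $match[1], and $clean is a name chosen for illustration.

```php
<?php
// Stand-in for $match[1] from the earlier post (made-up, shortened sample).
$raw = "\n\t\t  COOROY, \n\n      COOROY MOUNTAIN, \n\n      LAKE MACDONALD, \n\n      TINBEERWAH\n   ";

// Collapse every run of whitespace (spaces, tabs, line breaks) into one space,
// then trim the ends and lowercase the result.
$clean = strtolower(trim(preg_replace('/\s+/', ' ', $raw)));

echo $clean; // cooroy, cooroy mountain, lake macdonald, tinbeerwah
```

Unlike str_replace(" ", "", ...), this keeps the single space inside names like "cooroy mountain".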

Or, if you want to break the remaining text into separate entries, you could do this:

 

$arr = preg_split('#(?:\s{2,}|, )#', $match[1], -1, PREG_SPLIT_NO_EMPTY);
echo '<pre>'.print_r($arr, true);

 

Output:

Array
(
    [0] => COOROY
    [1] => COOROY MOUNTAIN
    [2] => LAKE MACDONALD
    [3] => TINBEERWAH
)
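
From that array, getting back to one comma-separated string is a short step. A small sketch, with the array typed in by hand to match the output above ($line is just an illustrative name):

```php
<?php
// Same entries as in the output above.
$arr = ['COOROY', 'COOROY MOUNTAIN', 'LAKE MACDONALD', 'TINBEERWAH'];

// Join the entries with ", " and lowercase the whole string.
$line = strtolower(implode(', ', $arr));

echo $line; // cooroy, cooroy mountain, lake macdonald, tinbeerwah
```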
