[SOLVED] Take data from another webpage

Carnacior · December 18, 2008

<?
error_reporting(E_ALL);
$file = file_get_contents('http://tv.bascalie.ro/program~data-20-decembrie-2008~post-pro-tv.html');
preg_match_all("/<tr style=\"background-color:#cccccc;\">(.*)<\/table><\/td><\/tr><\/table><\/td><\/tr><\/table>/", $file, $matches, PREG_SET_ORDER);
echo $matches[0];  
?>

This is how far I got, but I keep getting Notice: Undefined offset: 0 in C:\xampp\htdocs\programtv\read.php on line 12

What can I do?

I want to extract the data between

<tr style="background-color:#cccccc;">

and

</table></td></tr></table></td></tr></table>

premiso · December 18, 2008

Be so kind as to inform us what line 12 is.

An undefined offset notice usually means that you are trying to read an array element that does not exist.
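
As a rough sketch of how to avoid the notice (the sample input here is made up, not the real page): preg_match_all() returns the number of matches it found, so that count can be checked before indexing into $matches. Note also that with PREG_SET_ORDER each entry of $matches is itself an array, so the captured text would be at $matches[0][1] rather than $matches[0].

```php
<?php
// Hypothetical sample input that does NOT contain the pattern.
$file = '<p>no table rows here</p>';

// preg_match_all() returns the number of full matches (0 if none, false on error).
$count = preg_match_all(
    '/<tr style="background-color:#cccccc;">(.*?)<\/table>/s',
    $file,
    $matches,
    PREG_SET_ORDER
);

if ($count > 0) {
    // With PREG_SET_ORDER, $matches[0] is the first match set:
    // index 0 is the full match, index 1 is the first captured group.
    echo $matches[0][1];
} else {
    echo 'No match found';
}
```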

Carnacior · December 18, 2008

It's the echo line... sorry, I shortened the code when I posted it here.

premiso · December 18, 2008

I see. Not a problem. Unfortunately I am not great at regex, but from what I know, I think your pattern is too specific...

preg_match_all('/<tr style="background-color:#cccccc;">(.*)<\/table>/', $file, $matches, PREG_SET_ORDER);

Not sure if that will work, though. This probably would have been better posted in the regex forum ^.-

Carnacior · December 18, 2008

I was thinking of that too, but maybe it can be solved with something else.

sloth456 · December 18, 2008

You're in luck: I just spent a million years getting to grips with scraping myself. It's annoying at first, but once you crack it, it's useful as hell.

Try this:

$file = file_get_contents('http://tv.bascalie.ro/program~data-20-decembrie-2008~post-pro-tv.html');
preg_match_all('/<tr style="background-color:#cccccc;">(.*?)<\/table><\/td><\/tr><\/table><\/td><\/tr><\/table>/', $file, $matches);
print_r($matches);

I made 3 changes to your code.

1) In the regular expression you started and ended the string with "/ and /". I've changed them to '/ and /'. Much easier, because then you do not have to escape all your double quotes with backslashes.

2) I've changed (.*) to (.*?). Trust me, I know very little about regular expressions, but (.*?) is about the only thing I ever use. It means "grab whatever's here, but as little as possible": the ? makes the match lazy, so it stops at the first place the rest of the pattern can match instead of the last.

3) I've changed echo $matches[0] to print_r($matches). print_r is a nice little function that outputs the full array, indexes included, so you can see where your content is stored.

Usually when I'm doing this kind of scraping I find that $matches[0] does not contain what I want and $matches[1] does. Take a look and see which bit you need.
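
To make the difference concrete, here is a small sketch on a made-up two-line sample (not the real page). It also adds the s modifier, which lets . match newlines; without it, a pattern like (.*) cannot span line breaks, which may be why the original found nothing if the target HTML is spread over several lines.

```php
<?php
// Hypothetical sample: two table cells on separate lines.
$html = "<td>first</td>\n<td>second</td>";

// Greedy: (.*) grabs as much as possible, so it runs to the LAST </td>.
// The s modifier lets . match newline characters as well.
preg_match('/<td>(.*)<\/td>/s', $html, $greedy);

// Lazy: (.*?) grabs as little as possible, so it stops at the FIRST </td>.
preg_match('/<td>(.*?)<\/td>/s', $html, $lazy);

echo $greedy[1], "\n"; // everything from after the first <td> up to the last </td>
echo $lazy[1], "\n";   // only the content of the first cell

// Index 0 holds the whole matched text, index 1 only the captured group.
echo $lazy[0], "\n";
```

This is also why index 1 is usually the interesting one: index 0 still includes the surrounding tags from the pattern.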

A full tutorial can be found at http://www.thefutureoftheweb.com/blog/web-scrape-with-php-tutorial

If you need any more help, I'll see what I can do.

premiso · December 18, 2008

I was thinking of that too, but maybe it can be solved with something else.

It can...

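<?php
error_reporting(E_ALL);
$file = file_get_contents('http://tv.bascalie.ro/program~data-20-decembrie-2008~post-pro-tv.html');

// explode() splits on a literal string; split() was deprecated in PHP 5.3 and removed in PHP 7.
// Everything after the first highlighted row ends up in $matches[1].
$matches = explode('<tr style="background-color:#cccccc;">', $file);
// Cut that off at the first closing </table>.
$matches = explode('</table>', $matches[1]);
echo $matches[0];
?>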

That should work lol. =)
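
A slightly more defensive variant of the same idea, as a sketch only: extract_between() is a hypothetical helper name, and the sample markup is invented. It returns null when either marker is missing instead of raising an undefined offset notice. The nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: returns the text between $start and $end,
// or null if either marker is not present.
function extract_between(string $html, string $start, string $end): ?string
{
    // Limit 2: split only at the first occurrence of the start marker.
    $parts = explode($start, $html, 2);
    if (count($parts) < 2) {
        return null; // start marker not found
    }

    // Split the remainder at the first occurrence of the end marker.
    $parts = explode($end, $parts[1], 2);
    if (count($parts) < 2) {
        return null; // end marker not found
    }

    return $parts[0];
}

// Invented sample markup for illustration.
$sample = '<table><tr style="background-color:#cccccc;"><td>20:00 News</td></table>';
echo extract_between($sample, '<tr style="background-color:#cccccc;">', '</table>');
```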

Carnacior · December 18, 2008

Damn, that was fast. Thank you very much, guys.

I'll go run a few tests and I'll be back with results or questions.

Question...

How can I strip the links (the href) from the grabbed data?

Later edit:

I have done it myself:

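// explode() replaces split(), which was removed in PHP 7.
$matches = explode('<tr style="background-color:#cccccc;">', $file);
$matches = explode('</table>', $matches[1]);
$out = $matches[0];

// Remove opening <a ...> tags, then closing </a> tags; the link text stays.
$text = preg_replace('@<a\b[^>]*>@i', '', $out);
$text = str_ireplace('</a>', '', $text);
echo $text;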
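
For what it's worth, both steps can probably be folded into one pattern. This is a sketch on an invented sample string: the optional / covers closing tags, and \b keeps the pattern from also matching tags that merely start with "a", such as <abbr>.

```php
<?php
// Invented sample: two links with different casing and attributes.
$out = '<td><a href="/show/1" class="x">News</a> at <A HREF="/t">20:00</A></td>';

// One pass: match <a ...> or </a>, case-insensitively, and remove it.
// The text between the tags is left in place.
$text = preg_replace('@</?a\b[^>]*>@i', '', $out);

echo $text;
```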

sloth456 · December 18, 2008

Try the following as your regular expression:

'/<tr style="background-color:#cccccc;">.*?<a href="(.*?)".*?<\/table><\/td><\/tr><\/table><\/td><\/tr><\/table>/'

.*? just means "skip over whatever is here"

(.*?) means "grab this"
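
If the goal is to collect every link target in the grabbed block rather than just the first, a sketch along these lines may help. The sample markup is invented, and it assumes the href values are wrapped in double quotes, as in the pattern above.

```php
<?php
// Invented sample block with two links.
$block = '<tr><td><a href="/show/1">News</a></td><td><a href="/show/2">Film</a></td></tr>';

// [^>]* skips over any other attributes inside the tag,
// and (.*?) grabs the href value up to the next double quote.
preg_match_all('/<a\b[^>]*\bhref="(.*?)"/is', $block, $matches);

// $matches[1] holds only the captured href values.
print_r($matches[1]);
```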

Carnacior · December 18, 2008

Thanks for the info, sloth.

premiso · December 18, 2008

Thanks for the info, sloth.

Second that! =) That tidbit of information helps me out a ton with RegEX too. Thanks!
