
[SOLVED] Take data from another webpage


Carnacior


<?
error_reporting(E_ALL);
$file = file_get_contents('http://tv.bascalie.ro/program~data-20-decembrie-2008~post-pro-tv.html');
preg_match_all("/<tr style=\"background-color:#cccccc;\">(.*)<\/table><\/td><\/tr><\/table><\/td><\/tr><\/table>/", $file, $matches, PREG_SET_ORDER);
echo $matches[0];  
?>

This is how far I got, but I keep getting Notice: Undefined offset: 0 in C:\xampp\htdocs\programtv\read.php on line 12 :(

What can I do?

I want to extract the data between

<tr style="background-color:#cccccc;">

and

</table></td></tr></table></td></tr></table>


I see. Not a problem. Unfortunately I'm not great at regex, but from what I know, I think your pattern is too specific...

 

preg_match_all('/<tr style=\"background-color:#cccccc;\">(.*)<\/table>/', $file, $matches, PREG_SET_ORDER);

 

Not sure if that will work, but yeah. It probably would have been better to post this in the regex forum ^.-

You're in luck, I just spent a million years getting to grips with scraping myself. It's annoying at first, but once you crack it, it's useful as hell.

 

Try this:

 

$file = file_get_contents('http://tv.bascalie.ro/program~data-20-decembrie-2008~post-pro-tv.html');
preg_match_all('/<tr style="background-color:#cccccc;">(.*?)<\/table><\/td><\/tr><\/table><\/td><\/tr><\/table>/', $file, $matches);
print_r($matches);

 

I made 3 changes to your code.

 

1) You wrapped the regular expression in "/ and /"; I've changed them to '/ and /'. Much easier, because then you don't have to escape all your double quotes with backslashes.

 

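2) I've changed (.*) to (.*?). Trust me, I know very little about regular expressions, but (.*?) is about the only thing I ever use. The ? makes the match lazy, so it grabs as little as possible and stops at the first closing marker instead of running on to the last one.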

 

3) I've changed echo $matches[0] to print_r($matches). print_r is a nice little function that outputs the full array, keys included, so you can see where your content is stored.
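To make change 2 a bit more concrete, here's a small sketch of greedy versus lazy matching. The sample strings are made up for illustration and are not taken from the real page. It also shows one thing that may matter here: by default . does not match a newline, so if the block you want spans several lines (I'm assuming it does on that page, I haven't checked), the pattern needs the s modifier.

```php
<?php
// Hypothetical sample markup, not the real page.
$html = '<b>one</b> and <b>two</b>';

// Greedy: .* runs on to the LAST </b> in the string.
preg_match('/<b>(.*)<\/b>/', $html, $greedy);

// Lazy: .*? stops at the FIRST </b> it can.
preg_match('/<b>(.*?)<\/b>/', $html, $lazy);

echo $greedy[1], "\n"; // one</b> and <b>two
echo $lazy[1], "\n";   // one

// By default . does not match a newline, so a block that
// spans several lines only matches with the s modifier.
$multi   = "<b>line 1\nline 2</b>";
$without = preg_match('/<b>(.*?)<\/b>/', $multi, $m1);  // no match
$with    = preg_match('/<b>(.*?)<\/b>/s', $multi, $m2); // match
```

If the lazy pattern still returns nothing on the real page, adding s after the closing / is the first thing I'd try.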

 

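Usually when I'm doing this kind of scraping I find that $matches[0] does not contain what I want and $matches[1] does: $matches[0] holds the full matches, surrounding tags included, while $matches[1] holds just what the (.*?) group captured. Take a look and see which bit you need.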
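Here's a rough sketch of how $matches is laid out, again on a made-up string. It compares the default ordering with PREG_SET_ORDER, and shows what you get when nothing matches, which is most likely where the "Undefined offset: 0" notice comes from.

```php
<?php
// Hypothetical sample markup, not the real page.
$html = '<i>a</i><i>b</i>';

// Default (PREG_PATTERN_ORDER): grouped by capture group.
preg_match_all('/<i>(.*?)<\/i>/', $html, $byGroup);
// $byGroup[0] holds the full matches: '<i>a</i>', '<i>b</i>'
// $byGroup[1] holds the captured text: 'a', 'b'

// PREG_SET_ORDER: grouped by match instead.
preg_match_all('/<i>(.*?)<\/i>/', $html, $byMatch, PREG_SET_ORDER);
// $byMatch[0] is the first match:  '<i>a</i>', 'a'
// $byMatch[1] is the second match: '<i>b</i>', 'b'

// When nothing matches, PREG_SET_ORDER leaves an empty array,
// so reading index 0 raises an undefined offset notice.
$count = preg_match_all('/<u>(.*?)<\/u>/', $html, $none, PREG_SET_ORDER);

if ($count > 0) {
    echo $none[0][1];
} else {
    echo "no match\n";
}
```

Checking the return value of preg_match_all before touching the array avoids the notice either way.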

 

A full tutorial can be found at http://www.thefutureoftheweb.com/blog/web-scrape-with-php-tutorial

 

If you need any more help, I'll see what I can do.

I was thinking about that too, but maybe it can be solved with something else :)

 

It can...

 

<?php
error_reporting(E_ALL);
$file = file_get_contents('http://tv.bascalie.ro/program~data-20-decembrie-2008~post-pro-tv.html');

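// Keep everything after the first grey header row...
$matches = explode('<tr style="background-color:#cccccc;">', $file);
// ...then cut it off at the next closing </table>.
$matches = explode('</table>', $matches[1]);
echo $matches[0];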
?>

 

That should work lol. =)
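If you'd rather not get a notice when the page layout changes and a marker goes missing, something along these lines might be safer. It's only a sketch: between() is a helper name I made up, and the sample string stands in for the real page.

```php
<?php
/**
 * Return the text between $start and $end, or null if either
 * marker is missing. Hypothetical helper, not a built-in.
 */
function between($haystack, $start, $end)
{
    $from = strpos($haystack, $start);
    if ($from === false) {
        return null; // opening marker not found
    }
    $from += strlen($start);

    $to = strpos($haystack, $end, $from);
    if ($to === false) {
        return null; // closing marker not found
    }
    return substr($haystack, $from, $to - $from);
}

// Made-up markup standing in for the downloaded page.
$html = '<p>skip</p><tr style="background-color:#cccccc;">'
      . '<td>News</td></table><p>rest</p>';

$row = between($html, '<tr style="background-color:#cccccc;">', '</table>');
echo $row === null ? "markers not found\n" : $row . "\n";
```

No regex needed, and a missing marker gives you null to check instead of an undefined offset.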

Damn, that was fast. Thank you very much, guys :)

I'll go run a few tests and I'll be back with results or questions :)

 

Question...

How can I strip the links (the href) from the grabbed data?

 

Later edit:

I have done it myself :D

$matches = explode('<tr style="background-color:#cccccc;">', $file);
$matches = explode('</table>', $matches[1]);
$out = $matches[0];

// Drop the opening <a ...> tags, then the closing </a> tags, keeping the link text.
$text = preg_replace('@<a\b[^>]*>@i', '', $out);
$text = str_ireplace('</a>', '', $text);
echo $text;
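For what it's worth, the link stripping can probably be done in one pass as well. This is just a sketch on a made-up table cell: one regex removes both the opening and closing a tags, and strip_tags is an alternative if you want to drop every tag except the ones you list.

```php
<?php
// Hypothetical sample cell, not the real page.
$html = '<td><a href="/show/1">Evening News</a> at <b>19:00</b></td>';

// One pass: remove <a ...> and </a>, keep the link text.
// \b stops the pattern from also hitting tags like <abbr>.
$noLinks = preg_replace('@</?a\b[^>]*>@i', '', $html);
echo $noLinks, "\n";

// Alternative: strip every tag except the ones listed.
$onlyBold = strip_tags($html, '<b>');
echo $onlyBold, "\n";
```

The first keeps the table markup and only loses the links; the second flattens everything down to text plus the bold tags.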
