<?
error_reporting(E_ALL);
$file = file_get_contents('http://tv.bascalie.ro/program~data-20-decembrie-2008~post-pro-tv.html');
preg_match_all("/<tr style=\"background-color:#cccccc;\">(.*)<\/table><\/td><\/tr><\/table><\/td><\/tr><\/table>/", $file, $matches, PREG_SET_ORDER);
echo $matches[0];  
?>

This is how far I got, but I keep getting Notice: Undefined offset: 0 in C:\xampp\htdocs\programtv\read.php on line 12 :(

What can I do?

I want to extract the data between

<tr style="background-color:#cccccc;">

and

</table></td></tr></table></td></tr></table>


I see. Not a problem. Unfortunately I am not great at regex, but from what I know, I think your pattern is too specific.

 

preg_match_all('/<tr style=\"background-color:#cccccc;\">(.*)<\/table>/', $file, $matches, PREG_SET_ORDER);

 

I'm not sure that will work, though. This probably would have been better posted in the regex forum ^.-

You're in luck: I just spent ages getting to grips with scraping myself. It's annoying at first, but once you crack it, it's useful as hell.

 

Try this:

 

$file = file_get_contents('http://tv.bascalie.ro/program~data-20-decembrie-2008~post-pro-tv.html');
preg_match_all('/<tr style="background-color:#cccccc;">(.*?)<\/table><\/td><\/tr><\/table><\/td><\/tr><\/table>/', $file, $matches);
print_r($matches);

 

I made 3 changes to your code.

 

1) In the regular expression you wrapped the pattern in "/ and /"; I've changed them to '/ and /'. Much easier, because then you do not have to escape all your double quotes with backslashes.

 

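2) I've changed (.*) to (.*?). Trust me, I know very little about regular expressions, but (.*?) is about the only thing I ever use. The ? makes the match lazy: it grabs as little as possible up to the first occurrence of whatever follows, where (.*) grabs as much as it can.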

 

3) I've changed echo $matches[0] to print_r($matches).  print_r is a nice little function that outputs the full array, keys included, so you can see where your content is stored.
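
To make change 2 concrete, here is a minimal sketch of greedy versus lazy matching. The markup in $html is made up for illustration and is not the real page. It also shows one thing that may matter for the original problem: without the s modifier, the dot does not match newlines, so a pattern can silently find nothing when the HTML spans several lines.

```php
<?php
// Hypothetical sample markup (two rows, each spanning two lines).
$html = "<tr class=\"row\">first\nline</table> middle <tr class=\"row\">second\nline</table>";

// Greedy: (.*) runs on to the LAST </table> in the string.
// The s modifier lets the dot match newlines as well.
preg_match('/<tr class="row">(.*)<\/table>/s', $html, $greedy);

// Lazy: (.*?) stops at the FIRST </table> after the opening tag.
preg_match('/<tr class="row">(.*?)<\/table>/s', $html, $lazy);

// Same lazy pattern without s: the dot cannot cross the newline,
// so nothing matches and preg_match() returns 0.
$count = preg_match('/<tr class="row">(.*?)<\/table>/', $html, $noDotAll);

var_dump($greedy[1], $lazy[1], $count);
```

Whether the s modifier is needed for the real page depends on whether the wanted block contains line breaks, which is an assumption worth checking with print_r first.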

 

Usually when I'm doing this kind of scraping I find that $matches[0] does not contain what I want and $matches[1] does.  Take a look and see which bit you need.
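
The reason for that, as a rough sketch on made-up markup: in the default order, $matches[0] holds the full matches (tags included) and $matches[1] holds the first captured group. With PREG_SET_ORDER you instead get one sub-array per match, and when nothing matches the result is an empty array, which is where an "Undefined offset: 0" notice comes from. The variable names below are only for illustration.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page.
$html = '<td class="show">News</td><td class="show">Film</td>';
$pattern = '/<td class="show">(.*?)<\/td>/s';

// Default order (PREG_PATTERN_ORDER): results are grouped by capture group.
// $byGroup[0] = full matches with tags, $byGroup[1] = contents of group 1.
preg_match_all($pattern, $html, $byGroup);

// PREG_SET_ORDER: one sub-array per match, each holding [full match, group 1].
preg_match_all($pattern, $html, $bySet, PREG_SET_ORDER);

// A pattern that matches nothing: with PREG_SET_ORDER, $none is an empty array,
// so reading $none[0] would raise a notice. Check the return value first.
$found = preg_match_all('/<div>(.*?)<\/div>/s', $html, $none, PREG_SET_ORDER);
if ($found === 0) {
    echo "no match\n";
}

print_r($byGroup[1]);
print_r($bySet);
```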

 

A full tutorial can be found at http://www.thefutureoftheweb.com/blog/web-scrape-with-php-tutorial

 

If you need any more help, I'll see what I can do.

I was thinking of that too, but maybe it can be solved some other way :)

 

It can...

 

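<?php
error_reporting(E_ALL);
$file = file_get_contents('http://tv.bascalie.ro/program~data-20-decembrie-2008~post-pro-tv.html');

// explode() splits on a literal string; it replaces split(), which was removed in PHP 7.
// Everything after the opening marker ends up in $matches[1].
$matches = explode('<tr style="background-color:#cccccc;">', $file);
// Cut at the first </table>; the wanted chunk is now $matches[0].
$matches = explode('</table>', $matches[1]);
echo $matches[0];
?>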

 

That should work lol. =)
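
One caveat with cutting the page at two markers like this: if either marker is missing, the array index does not exist and you get a notice again. A minimal sketch of a guarded version follows; between() is a hypothetical helper name and the sample markup is made up, so treat it as one possible shape, not the only one.

```php
<?php
/**
 * Hypothetical helper: return the text between the first $start marker
 * and the next $end marker, or null when either marker is missing.
 */
function between(string $html, string $start, string $end): ?string
{
    // Limit 2: split once, at the first occurrence of the start marker.
    $parts = explode($start, $html, 2);
    if (count($parts) < 2) {
        return null; // start marker not found
    }
    // Split the remainder once, at the first occurrence of the end marker.
    $rest = explode($end, $parts[1], 2);
    if (count($rest) < 2) {
        return null; // end marker not found
    }
    return $rest[0];
}

// Made-up sample standing in for the downloaded page.
$html = '<table><tr style="background-color:#cccccc;"><td>20:00 News</td></table>';
$row = between($html, '<tr style="background-color:#cccccc;">', '</table>');
echo $row === null ? "markers not found\n" : $row . "\n";
```

Passing the longer closing sequence from the first post as $end should return the whole block up to that sequence and not just the part before the first </table>.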

Damn, that was fast. Thank you very much, guys :)

I'll go run a few tests and will be back with results or questions :)

 

Question...

How can I strip the links (the href anchors) from the grabbed data?

 

Later edit:

I have done it myself :D

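// explode() replaces split(), which was removed in PHP 7.
$matches = explode('<tr style="background-color:#cccccc;">', $file);
$matches = explode('</table>', $matches[1]);
$out = $matches[0];

// Strip opening <a ...> tags; \b stops the pattern from also matching <abbr> or <area>.
$text = preg_replace('@<a\b[^>]*>@i', '', $out);
// Strip closing </a> tags, keeping the link text.
$text = str_ireplace('</a>', '', $text);
echo $text;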
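
For anyone reading later, the link stripping above can probably be folded into a single pattern. This is only a sketch on made-up sample markup: the optional slash catches both opening and closing anchor tags, and the word boundary keeps tags such as <abbr> intact.

```php
<?php
// Made-up sample standing in for the extracted table row.
$out = '<td><a href="/show/1" class="x">News</a> at <A HREF="/t">20:00</A> <abbr title="t">PM</abbr></td>';

// </?a   : an opening or closing anchor tag
// \b     : word boundary, so <abbr> and <area> are left alone
// [^>]*> : any attributes up to the end of the tag
// i      : case-insensitive, so <A HREF=...> is caught too
$text = preg_replace('@</?a\b[^>]*>@i', '', $out);

echo $text, "\n";
```

The link text itself stays in place; only the tags are removed.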
