Carnacior, posted December 18, 2008:

<?
error_reporting(E_ALL);
$file = file_get_contents('http://tv.bascalie.ro/program~data-20-decembrie-2008~post-pro-tv.html');
preg_match_all("/<tr style=\"background-color:#cccccc;\">(.*)<\/table><\/td><\/tr><\/table><\/td><\/tr><\/table>/", $file, $matches, PREG_SET_ORDER);
echo $matches[0];
?>

This is how far I got, but I keep getting:

Notice: Undefined offset: 0 in C:\xampp\htdocs\programtv\read.php on line 12

What can I do? I want to extract the data between <tr style="background-color:#cccccc;"> and </table></td></tr></table></td></tr></table>.
premiso, posted December 18, 2008:

Be so kind as to tell us what line 12 is. An "undefined offset" notice usually means that you are trying to print an element of an array that is not there.
Carnacior, posted December 18, 2008:

It's the echo line... sorry, I shrank the code when I posted it here.
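Since line 12 is the echo, the notice tells us preg_match_all() found nothing: it returns the number of matches, and when that is 0, $matches is an empty array with no index 0. (Had it matched, echo would still misbehave, because with PREG_SET_ORDER $matches[0] is itself an array and echo prints "Array".) A minimal guard might look like this; firstSet() and the sample $html are hypothetical stand-ins for the real page.

```php
<?php
// preg_match_all() returns the match count (or false on error).
// With 0 matches and PREG_SET_ORDER, $matches is [] so $matches[0]
// does not exist, which is exactly the "Undefined offset: 0" notice.

function firstSet(string $pattern, string $html): ?array
{
    $count = preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);
    if ($count === false || $count === 0) {
        return null; // pattern failed or found nothing: nothing to echo
    }
    return $matches[0]; // an array: [0] = whole match, [1] = first group
}

// Hypothetical page content that does not contain the target row.
$html = '<p>no table here</p>';
$set = firstSet('/<tr style="background-color:#cccccc;">(.*)<\/table>/', $html);

if ($set === null) {
    echo "No match found\n";
} else {
    echo $set[1], "\n"; // echo the captured string, not the whole array
}
```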
premiso, posted December 18, 2008:

I see. Not a problem. Unfortunately I am not great at regex, but from what I know, I think you are being too descriptive:

preg_match_all('/<tr style=\"background-color:#cccccc;\">(.*)<\/table>/', $file, $matches, PREG_SET_ORDER);

Unsure if that will work, but yeah. This would probably have been better posted in the Regex forum ^.-
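None of the patterns in this thread use the s modifier. One likely reason for zero matches, assuming the page's table markup spans several lines (the 2008 page can no longer be checked), is that "." does not match a newline by default, so (.*) cannot run across line breaks. The sketch below runs the same pattern with and without s on a hypothetical multi-line sample.

```php
<?php
// Hypothetical sample: the row's content is split over several lines,
// as scraped HTML usually is.
$html = "<table><tr style=\"background-color:#cccccc;\">\n"
      . "<td>20:00 News</td>\n"
      . "</tr></table>";

$pattern = '/<tr style="background-color:#cccccc;">(.*?)<\/table>/';

// Without "s": "." stops at "\n", so the pattern cannot reach </table>.
$without = preg_match_all($pattern, $html, $m1);

// With "s" (dot-all): "." also matches "\n", so the match succeeds.
$with = preg_match_all($pattern . 's', $html, $m2);

echo "without s: $without, with s: $with\n";
echo trim($m2[1][0]), "\n"; // the captured row content
```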
Carnacior, posted December 18, 2008:

I was thinking about that too, but maybe it can be solved some other way.
sloth456, posted December 18, 2008:

You're in luck: I just spent a million years getting to grips with scraping myself. It's annoying at first, but once you crack it, it's useful as hell. Try this:

$file = file_get_contents('http://tv.bascalie.ro/program~data-20-decembrie-2008~post-pro-tv.html');
preg_match_all('/<tr style="background-color:#cccccc;">(.*?)<\/table><\/td><\/tr><\/table><\/td><\/tr><\/table>/', $file, $matches);
print_r($matches);

I made 3 changes to your code:

1) Your regular expression was wrapped in "/ and /"; I've changed that to '/ and /'. That's much easier, because then you do not have to escape all your double quotes with backslashes.
2) I've changed (.*) to (.*?). Trust me, I know very little about regular expressions, but (.*?) is about the only thing I ever use; it just means "grab whatever's here".
3) I've changed echo $matches[0] to print_r($matches). print_r is a nice little command that outputs the full array, including its index numbers, so you can see where your content is stored. Usually when I'm doing this kind of scraping, I find that $matches[0] does not contain what I want and $matches[1] does. Take a look and see which bit you need.

A full tutorial can be found at http://www.thefutureoftheweb.com/blog/web-scrape-with-php-tutorial

If you need any more help, I'll see what I can do.
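To make changes (2) and (3) concrete: the ? in (.*?) makes the group lazy, so it stops at the first "</table>" that lets the rest of the pattern match, whereas greedy (.*) runs on to the last one. And without the PREG_SET_ORDER flag, preg_match_all() uses its default PREG_PATTERN_ORDER layout, where $matches[0] lists the full matches and $matches[1] lists what the first group captured. The two-row $html below is a hypothetical sample.

```php
<?php
// Hypothetical sample with two highlighted rows.
$html = '<tr style="background-color:#cccccc;"><td>A</td></table>'
      . '<tr style="background-color:#cccccc;"><td>B</td></table>';

// Greedy (.*) runs to the LAST "</table>": one big match.
preg_match_all('/<tr style="background-color:#cccccc;">(.*)<\/table>/', $html, $greedy);

// Lazy (.*?) stops at the FIRST "</table>" after each opening tag.
preg_match_all('/<tr style="background-color:#cccccc;">(.*?)<\/table>/', $html, $lazy);

// Default PREG_PATTERN_ORDER: [0] = full matches, [1] = group-1 captures.
print_r($lazy[1]);             // the two captured cells
echo count($greedy[1]), "\n";  // 1
```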
premiso, posted December 18, 2008:

Quoting Carnacior: "maybe it can be solved some other way."

It can...

<?php
error_reporting(E_ALL);
$file = file_get_contents('http://tv.bascalie.ro/program~data-20-decembrie-2008~post-pro-tv.html');
$matches = split('<tr style="background-color:#cccccc;">', $file);
$matches = split('</table>', $matches[1]);
echo $matches[0];
?>

That should work lol. =)
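A note for anyone running this today: split() was deprecated in PHP 5.3 and removed in PHP 7. Both delimiters here are plain text, so explode() can do the same job. The sketch below also checks that the start marker exists, since $matches[1] would otherwise be another undefined offset. extractBetween() is a hypothetical helper name and $html a hypothetical sample.

```php
<?php
// Same idea as the split() version, using explode() on literal markers.
function extractBetween(string $html, string $start, string $end): ?string
{
    $parts = explode($start, $html, 2);   // [before, after] the start marker
    if (count($parts) < 2) {
        return null;                      // start marker not found
    }
    $rest = explode($end, $parts[1], 2);  // [wanted text, remainder]
    return $rest[0];                      // whole rest if $end is missing
}

$html = '<table><tr style="background-color:#cccccc;"><td>News</td></table>';
$result = extractBetween($html, '<tr style="background-color:#cccccc;">', '</table>');
echo $result ?? 'Marker not found', "\n";
```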
Carnacior, posted December 18, 2008:

Damn, that was fast. Thank you very much, guys! I'll go run a few tests and will be back with results or questions.

Question: how can I strip the links (the href) from the grabbed data?

Later edit: I have done it myself:

$matches = split('<tr style="background-color:#cccccc;">', $file);
$matches = split('</table>', $matches[1]);
$out = $matches[0];
$text = preg_replace('@<a[^>]*.*?>@si', '', $out);
$text = str_replace("</a>", "", $text);
echo $text;
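The two calls above can be folded into one pattern that removes both <a ...> and </a> while keeping the link text. The \b after "a" is there because '@<a[^>]*.*?>@si' also matches other tags that start with "a", such as <abbr>. This is a sketch: unlinkAnchors() and the sample $out are hypothetical. If every tag should go, the built-in strip_tags() does that in one call.

```php
<?php
// Remove opening and closing anchor tags, case-insensitively,
// leaving the text between them untouched.
function unlinkAnchors(string $html): string
{
    // </?   optional slash for the closing tag
    // a\b   the tag name "a" exactly (not <abbr>, <area>, ...)
    // [^>]* any attributes, up to the closing ">"
    return preg_replace('@</?a\b[^>]*>@i', '', $html);
}

// Hypothetical fragment of the grabbed row.
$out = '<td><a href="/emisiune/1">20:00 News</a> <abbr>HD</abbr></td>';
echo unlinkAnchors($out), "\n";
```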
sloth456, posted December 18, 2008:

Try the following as your regular expression:

'/<tr style="background-color:#cccccc;">.*?<a href="(.*?)".*?<\/table><\/td><\/tr><\/table><\/td><\/tr><\/table>/'

.*? just means "skip over whatever is here"; (.*?) means "grab this".
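The same "skip vs. grab" idea can collect every link in the row. The pattern above returns one href per matched block (the first link after the opening row), so this sketch instead targets the <a> tags directly and captures each href value. It uses [^"]* in place of (.*?) because the value ends at the next quote. The $html snippet is hypothetical.

```php
<?php
// Hypothetical row with two links.
$html = '<tr style="background-color:#cccccc;">'
      . '<td><a href="/show/1">News</a></td>'
      . '<td><a href="/show/2">Movie</a></td></tr>';

// <a\s      an anchor tag followed by whitespace
// [^>]*     skip other attributes (not captured)
// ([^"]*)   grab the href value up to the closing quote
preg_match_all('/<a\s[^>]*href="([^"]*)"/i', $html, $matches);

// $matches[0]: the full matched text; $matches[1]: just the URLs.
print_r($matches[1]);
```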
Carnacior, posted December 18, 2008:

Thanks for the info, sloth.
premiso, posted December 18, 2008:

Quoting Carnacior: "Thanks for the info, sloth."

Second that! =) That tidbit of information helps me out a ton with regex too. Thanks!