playwright Posted June 2, 2010

Hello, I'm new to PHP, so I need some real help here. I'm trying to create a web scraper that grabs a forum's content and shows only the posts. The source code is here:

    <html>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
    <?php
    // Download the page and parse it; @ hides warnings about malformed HTML.
    $html = file_get_contents('http://www.......');
    $dom = new DOMDocument();
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    // The class name must be quoted inside the XPath expression.
    $posts = $xpath->query('//*[@class="postTextContainer"]');
    foreach ($posts as $post) {
        echo $post->nodeValue, "<br/> \n";
    }
    ?>
    </html>

Can anyone tell me how I could grab all the posts that are in the same thread? Right now I can only grab the posts on the page at the above URL. I think it's called multiple-page scraping?

Link to comment https://forums.phpfreaks.com/topic/203696-help-with-web-scraping/

playwright Posted June 2, 2010 Author

I also want to ask how I can delete the content that sits between two tags inside the content I have grabbed with the above code. More specifically, the tag is <div class="........">bla bla</div>.

newb Posted June 3, 2010

Do a preg_match.

playwright Posted June 3, 2010 Author

Do you know which regex might fit <div class="quotationHeaderText">bla bla bla </div>?

newb Posted June 3, 2010

Something like this:

    if (preg_match('|<div class="quotationHeaderText">(.*)</div>|U', $html, $result)) {
        $match = $result[1];
    }
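
Note that the snippet above captures the text inside the div rather than deleting it. A minimal sketch of the deletion itself with `preg_replace` follows. The helper name `stripQuoteHeaders` is hypothetical, and the pattern assumes the div contains no nested `<div>`; with nested divs the match would stop at the first `</div>`.

```php
<?php
// Hypothetical helper: strips every quotationHeaderText div from a chunk of HTML.
function stripQuoteHeaders(string $html): string
{
    // 's' lets '.' match newlines; '.*?' is lazy, so each match stops at the first '</div>'.
    // Assumption: the div has no nested <div> inside it.
    $pattern = '|<div class="quotationHeaderText">.*?</div>|s';
    $result = preg_replace($pattern, '', $html);

    // preg_replace returns null on a regex error; fall back to the unchanged input.
    return $result ?? $html;
}
```

The function would be applied to the raw HTML string before it is handed to `loadHTML`, so the quote headers never reach the parsed document.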

playwright Posted June 3, 2010 Author

Unfortunately it gives me an error:

    Parse error: syntax error, unexpected T_STRING

Are you sure I can leave whitespace between div and class?

inversesoft123 Posted June 3, 2010

    $pieces = explode("<div class=\"quotationHeaderText\">", $Data);

Try something like this.
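
As an alternative to regex or string splitting, the unwanted div can be removed through the same DOM and XPath objects the original script already builds. The following is only a sketch: the function name `extractPostsWithoutQuoteHeaders` is hypothetical, and the class names are the ones quoted in this thread.

```php
<?php
// Hypothetical helper: returns the text of each post with quote headers removed.
function extractPostsWithoutQuoteHeaders(string $html): array
{
    $dom = new DOMDocument();
    // Collect HTML parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // Snapshot the matching nodes, then detach each one from its parent.
    $unwanted = iterator_to_array($xpath->query('//div[@class="quotationHeaderText"]'));
    foreach ($unwanted as $node) {
        $node->parentNode->removeChild($node);
    }

    // Read the post text only after the unwanted nodes are gone.
    $posts = [];
    foreach ($xpath->query('//*[@class="postTextContainer"]') as $post) {
        $posts[] = trim($post->nodeValue);
    }
    return $posts;
}
```

Because the nodes are removed before `nodeValue` is read, the quoted header text no longer appears in the output, and nested markup inside the div is not a problem.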

playwright Posted June 3, 2010 Author

Any idea how I can parse multiple pages?
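
The thread does not include an answer to this, so what follows is only a minimal sketch of one common approach: request each page of the thread in turn and stop when a page yields no posts. Everything specific here is an assumption. The name `scrapeThread` and its parameters are hypothetical, and how a page number maps to a URL depends on the forum being scraped, so that mapping is passed in as a callback.

```php
<?php
// Hypothetical helper: collects post texts from consecutive pages of one thread.
// $pageUrl maps a page number to a URL; $fetch returns the HTML or false on failure.
function scrapeThread(callable $pageUrl, callable $fetch, int $maxPages = 50): array
{
    $posts = [];
    $previousPage = null;

    for ($page = 1; $page <= $maxPages; $page++) {
        $html = $fetch($pageUrl($page));
        if ($html === false || $html === '') {
            break; // request failed or the page was empty
        }

        $dom = new DOMDocument();
        $old = libxml_use_internal_errors(true); // keep parse warnings quiet
        $dom->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($old);

        $xpath = new DOMXPath($dom);
        $pagePosts = [];
        foreach ($xpath->query('//*[@class="postTextContainer"]') as $node) {
            $pagePosts[] = trim($node->nodeValue);
        }

        // Stop on a page without posts, or on a page identical to the previous
        // one (a site may serve its last page again for out-of-range numbers).
        if ($pagePosts === [] || $pagePosts === $previousPage) {
            break;
        }

        $posts = array_merge($posts, $pagePosts);
        $previousPage = $pagePosts;
    }

    return $posts;
}

// Possible usage; the '?page=' pattern is an assumption, not taken from the thread:
// $posts = scrapeThread(
//     function (int $n) { return 'http://www.......' . '?page=' . $n; },
//     'file_get_contents'
// );
```

Passing the fetcher as a callback keeps the loop independent of `file_get_contents`, so it can be swapped for cURL or for canned HTML when trying the function out. The `$maxPages` cap is a safety limit against endless loops.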
This topic is now archived and is closed to further replies.