playwright, Posted June 2, 2010

Hello, I'm new to PHP, so I need some real help here. I'm trying to create a web scraper that grabs a forum's content and shows only the posts. The source code is here:

<html>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<?php
$html = file_get_contents('http://www.......');
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// The class name must be quoted inside the XPath expression.
$key = $xpath->query('//*[@class="postTextContainer"]');
foreach ($key as $keys) {
    echo $keys->nodeValue, "<br/> \n";
}
?>
</html>

Can anyone tell me how I could grab all the posts that are in the same thread? Right now I can only grab the posts on the page at the above URL. I think it's called multiple-page scraping?
playwright, Posted June 2, 2010

I also want to ask how I can delete the content that sits between two tags inside the content I have grabbed with the above code. More specifically, the tag is <div class="........">bla bla</div>
newb, Posted June 3, 2010

Do a preg_match.
playwright, Posted June 3, 2010

Do you know which regex might fit <div class="quotationHeaderText">bla bla bla </div>?
newb, Posted June 3, 2010

Something like this:

if (preg_match('|<div class="quotationHeaderText">(.*)</div>|U', $html, $result)) {
    $match = $result[1];
}
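Since the goal stated earlier in the thread is to delete the quotation header rather than capture it, preg_replace() with an empty replacement may be a closer fit than preg_match(). The following is only a minimal sketch: the sample markup in $html is made up for illustration, and it assumes the header div contains no nested <div> elements (a regex cannot reliably handle nesting).

```php
<?php
// Hypothetical sample markup; the real forum's HTML may differ.
$html = '<div class="postTextContainer">'
      . '<div class="quotationHeaderText">bob wrote:</div>'
      . 'Actual reply text</div>';

// Lazy match (.*?) stops at the first closing </div>;
// the "s" modifier lets "." also match line breaks inside the header.
$pattern = '|<div class="quotationHeaderText">.*?</div>|s';

// preg_replace() with an empty replacement deletes every match.
$clean = preg_replace($pattern, '', $html);

echo $clean, "\n";
```

The pattern is kept in single quotes so the double quotes around the class name need no escaping.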
playwright, Posted June 3, 2010

Unfortunately it gives me an error:

Parse error: syntax error, unexpected T_STRING

Are you sure I can leave whitespace between div and class?
inversesoft123, Posted June 3, 2010

Try something like this:

$pieces = explode("<div class=\"quotationHeaderText\">", $Data);
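As an alternative to splitting the string, the DOM approach from the first post could probably drop the unwanted nodes directly before the post text is read. This is a rough sketch, not a tested solution for the actual forum: the sample markup is invented, and it assumes the quotation header is a <div> whose class attribute is exactly quotationHeaderText.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = '<html><body><div class="postTextContainer">'
      . '<div class="quotationHeaderText">bob wrote:</div>'
      . 'Actual reply text</div></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect markup warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Copy the matches into a plain array so the loop does not depend
// on the node list while nodes are being removed from the tree.
$headers = iterator_to_array($xpath->query('//div[@class="quotationHeaderText"]'));
foreach ($headers as $header) {
    $header->parentNode->removeChild($header);
}

// Read the post text only after the headers are gone.
$posts = [];
foreach ($xpath->query('//*[@class="postTextContainer"]') as $post) {
    $posts[] = trim($post->nodeValue);
}

print_r($posts);
```

Working on the parsed tree avoids regex and quoting problems, at the cost of a few more lines.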
playwright, Posted June 3, 2010

Any idea how I can parse multiple pages?
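One possible approach, sketched under some loud assumptions: the thread's pages can be reached by putting a page number into the URL, and a page past the end contains no postTextContainer elements. Neither is confirmed for the forum in question. The names extractPosts and scrapeThread, the forum.example URLs, and the page=%d pattern are all hypothetical; the fetcher is passed in as a callable so the demo runs on in-memory pages instead of the network.

```php
<?php
// Extract the text of every post found in one page of HTML.
function extractPosts(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input, so bail out early
    }
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // silence warnings from sloppy markup
    $dom->loadHTML($html);
    libxml_clear_errors();

    $posts = [];
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//*[@class="postTextContainer"]') as $node) {
        $posts[] = trim($node->nodeValue);
    }
    return $posts;
}

// Walk pages 1..$maxPages, stopping at the first page that yields no posts.
// $urlPattern is a sprintf() pattern with one %d for the page number (assumed URL scheme).
// $fetch is any callable mapping a URL to HTML, for example 'file_get_contents'.
function scrapeThread(callable $fetch, string $urlPattern, int $maxPages = 50): array
{
    $all = [];
    for ($page = 1; $page <= $maxPages; $page++) {
        $html = $fetch(sprintf($urlPattern, $page));
        $posts = ($html === false) ? [] : extractPosts($html);
        if ($posts === []) {
            break; // nothing found: treat this as the end of the thread
        }
        $all = array_merge($all, $posts);
    }
    return $all;
}

// Demo with an in-memory fetcher; the URLs and markup are made up.
$fakePages = [
    'http://forum.example/thread?page=1' =>
        '<div class="postTextContainer">first</div><div class="postTextContainer">second</div>',
    'http://forum.example/thread?page=2' =>
        '<div class="postTextContainer">third</div>',
];
$fetch = function (string $url) use ($fakePages) {
    return $fakePages[$url] ?? '<html><body></body></html>';
};

$posts = scrapeThread($fetch, 'http://forum.example/thread?page=%d');
print_r($posts);
```

Against a real site, 'file_get_contents' could be passed as $fetch. Some forums serve the last page again for out-of-range page numbers, in which case the loop would need a different stop condition, such as following the "next" link or comparing against the previous page's posts.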