chupacabrot Posted January 22, 2014

I'm quite new to regular expressions. I looked through the questions already asked and unfortunately couldn't find an answer.

I want to extract specific parts of my website and simply echo them to a new page. My list of categories is structured alphabetically, like this:

<a href="architecture.html">ARCHITECTURE</a><br />
<a href="art.html">ART</a><br />
<a href="avantgarde.html">AVANTGARDE</a><br />
. . . and so on.

What I'm actually trying to do is extract all the categories as plain text and echo them on the screen. In this case I need to extract every string that starts with ">A and ends with </a (assuming I don't have any other similar pattern in my code).

I found this piece of code on Stack Overflow that is supposed to extract anything that sits between tags, but unfortunately it doesn't work for me.

HTML part:

<div name="changeable_text">**GET THIS TEXT**</div>

PHP part:

$categories = file_get_contents( $url);
libxml_use_internal_errors( true);
$doc = new DOMDocument;
$doc->loadHTML( $categories);
$xpath = new DOMXpath( $doc);
$node = $xpath->query( '//div[@name="changeable_text"]')->item( 0);
echo $node->textContent; // This will print **GET THIS TEXT**

https://forums.phpfreaks.com/topic/285575-extracting-specific-string-out-of-html-code/
Ch0cu3r Posted January 22, 2014 (edited)

> I found this piece of code on Stack Overflow that is supposed to extract anything that sits between tags, but unfortunately it doesn't work for me.

The PHP code you posted works fine: it finds the div tags whose name attribute is set to "changeable_text" and prints the text of the first one.

> My list of categories is structured alphabetically, like this:
> <a href="architecture.html">ARCHITECTURE</a><br />
> <a href="art.html">ART</a><br />
> <a href="avantgarde.html">AVANTGARDE</a><br />
> . . . and so on.

To get all anchor tags on the page, you'd use //a as the XPath query. If you only want the category links, you need to specify the container they belong to, e.g.

$categories = '<div id="categories">
<a href="architecture.html">ARCHITECTURE</a><br />
<a href="art.html">ART</a><br />
<a href="avantgarde.html">AVANTGARDE</a><br />
</div>';

libxml_use_internal_errors( true);
$doc = new DOMDocument;
$doc->loadHTML( $categories);
$xpath = new DOMXpath( $doc);

// find all anchor tags that are direct children of the <div id="categories"> tag
$categorylinks = $xpath->query('//div[@id="categories"]/a');

// loop through the links and echo the link text
foreach($categorylinks as $link) {
    echo $link->textContent .'<br />';
}

Edited January 22, 2014 by Ch0cu3r
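Since the original question asked only for the entries starting with "A", the same DOM approach can filter on the link text with XPath's starts-with() function, so no regular expression is needed. This is only a minimal sketch, assuming the links sit directly inside a <div id="categories"> as in the example above; the function name categoriesStartingWith and the extra BIOLOGY entry are made up for illustration.

```php
<?php
// Hypothetical helper: returns the text of every <a> that is a direct child of
// <div id="categories"> and whose text starts with $prefix.
function categoriesStartingWith(string $html, string $prefix): array
{
    libxml_use_internal_errors(true); // keep parser warnings about sloppy HTML quiet
    $doc = new DOMDocument;
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    // starts-with() and normalize-space() are standard XPath 1.0 functions.
    // This simple quoting assumes $prefix contains no double quote.
    $query = sprintf(
        '//div[@id="categories"]/a[starts-with(normalize-space(.), "%s")]',
        $prefix
    );

    $names = [];
    foreach ($xpath->query($query) as $link) {
        $names[] = trim($link->textContent);
    }
    return $names;
}

// Sample markup; BIOLOGY is an invented entry that the filter should skip.
$html = '<div id="categories">
<a href="architecture.html">ARCHITECTURE</a><br />
<a href="art.html">ART</a><br />
<a href="biology.html">BIOLOGY</a><br />
</div>';

echo implode("\n", categoriesStartingWith($html, 'A')), "\n";
```

With the sample markup this should list ARCHITECTURE and ART and skip BIOLOGY. The comparison is case-sensitive, so 'a' and 'A' are different prefixes.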
chupacabrot (Author) Posted January 22, 2014

Assuming that http://www.mywebsiteforexample.com/categories.html has a div called 'categories en', here's what I tried, and for some reason it still doesn't work:

<?php
$url = 'http://www.mywebsiteforexample.com/categories.html';
libxml_use_internal_errors( true);
$doc = new DOMDocument;
$doc->loadHTML( $url);
$xpath = new DOMXpath( $doc);

// please notice the space in the div's name - maybe that is causing the trouble
$categorylinks = $xpath->query('//div[@id="categories en"]/a');

// loop through the links and echo the link text
foreach($categorylinks as $link) {
    echo $link->textContent .'<br />';
}
?>
Ch0cu3r Posted January 22, 2014 (edited)

You need to pass the URL to file_get_contents first. loadHTML expects the HTML itself, not a URL.

$contents = file_get_contents($url); // load the html into the variable
libxml_use_internal_errors( true);
$doc = new DOMDocument;
$doc->loadHTML($contents); // pass in the html
...

Edited January 22, 2014 by Ch0cu3r
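file_get_contents() returns false when the page cannot be read, so it may be worth checking the result before handing it to loadHTML(). A rough sketch of that check follows; fetchHtml is a made-up helper name, and the temporary file only stands in for the real URL so the snippet runs without a network connection.

```php
<?php
// Hypothetical helper: read a page (URL or local path) and fail loudly if nothing comes back.
function fetchHtml(string $source): string
{
    // @ hides the PHP warning; the false return value is checked explicitly below.
    $contents = @file_get_contents($source);
    if ($contents === false || $contents === '') {
        throw new RuntimeException("Could not read any HTML from $source");
    }
    return $contents;
}

// Stand-in for the real page so the sketch runs offline.
$path = tempnam(sys_get_temp_dir(), 'cat');
file_put_contents($path, '<div id="categories"><a href="art.html">ART</a></div>');

$contents = fetchHtml($path);

libxml_use_internal_errors(true);
$doc = new DOMDocument;
$doc->loadHTML($contents); // pass the HTML itself, not the URL
$xpath = new DOMXPath($doc);

foreach ($xpath->query('//div[@id="categories"]/a') as $link) {
    echo $link->textContent, "\n";
}

unlink($path); // remove the temporary stand-in file
```

For a real URL, allow_url_fopen has to be enabled in php.ini, otherwise file_get_contents() cannot open remote addresses at all.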
chupacabrot (Author) Posted January 22, 2014

Well, unfortunately it still doesn't work. :\

<?php
$url = 'http://www.mywebsiteforexample.com/categories.html';
$contents = file_get_contents($url);
libxml_use_internal_errors( true);
$doc = new DOMDocument;
$doc->loadHTML( $contents);
$xpath = new DOMXpath( $doc);

// please notice the space in the div's name - maybe that is causing the trouble
$categorylinks = $xpath->query('//div[@id="categories en"]/a');

// loop through the links and echo the link text
foreach($categorylinks as $link) {
    echo $link->textContent .'<br />';
}
?>
Ch0cu3r Posted January 22, 2014

Can you post the HTML?
chupacabrot (Author) Posted January 23, 2014

<!-- start of categories list -->
<div class="categories en"> </div>
<a href="../agriculture.html" target="_blank">agriculture</a><br />
<a href="../avantgarde.html" target="_blank">avantgarde</a><br />
<a href="../azyx.html" target="_blank">azyx</a><br />
Ch0cu3r Posted January 23, 2014

Is it right that you close the div tag as soon as you open it? The closing div needs to go after the anchor tags:

<!-- start of categories list -->
<div class="categories en"> <!-- open div -->
<a href="../agriculture.html" target="_blank">agriculture</a><br />
<a href="../avantgarde.html" target="_blank">avantgarde</a><br />
<a href="../azyx.html" target="_blank">azyx</a><br />
</div> <!-- close div -->
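To see why the early </div> matters: the query only looks at <a> elements that are children of the div, so with the div closed straight away it matches nothing. Judging by the markup as posted, the div also carries class="categories en" rather than an id, so the attribute named in the XPath expression would need to be @class to match it. A small sketch comparing both versions of the markup; countCategoryLinks is a hypothetical helper, and the markup is shortened from the post above.

```php
<?php
// Hypothetical helper: count the <a> elements that are direct children of
// <div class="categories en"> in the given HTML.
function countCategoryLinks(string $html): int
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument;
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    // An exact string comparison, so the space inside the attribute value is not a problem.
    return $xpath->query('//div[@class="categories en"]/a')->length;
}

// Div closed immediately: the links end up as siblings of the div, not children.
$broken = '<div class="categories en"> </div>
<a href="../agriculture.html" target="_blank">agriculture</a><br />
<a href="../avantgarde.html" target="_blank">avantgarde</a><br />';

// Div closed after the links: the links are children of the div.
$fixed = '<div class="categories en">
<a href="../agriculture.html" target="_blank">agriculture</a><br />
<a href="../avantgarde.html" target="_blank">avantgarde</a><br />
</div>';

echo countCategoryLinks($broken), "\n"; // 0
echo countCategoryLinks($fixed), "\n";  // 2
```

Checking the length of the query result like this is also a quick way to tell "the query matched nothing" apart from "the loop printed nothing visible".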
Solution: chupacabrot (Author) Posted January 23, 2014

Yep, just noticed that, thanks! Problem solved!