SecureMind Posted July 20, 2011

My PHP skills are modest, and I would like to be able to visit a given URL and count how many links (or other items, such as img tags) exist on the page. For example:

1.) take a URL like "http://www.google.com"
2.) visit that URL from within the script
3.) walk through the code of the page, finding all the href instances
4.) return a count of the number of links found

Sounds simple enough, but I fear it won't be. I assume that I should start with curl, but I'm not sure. Any advice is appreciated.

Link to comment https://forums.phpfreaks.com/topic/242490-count-the-number-of-links-or-some-other-tag-from-a-website/
teynon Posted July 21, 2011

You're going to want to use preg_match_all with a pattern such as "<a [^>]+>": http://www.php.net/manual/en/function.preg-match-all.php

<?php
// Fetch the page and count every opening <a ...> tag.
// The i flag also matches upper-case tags such as <A HREF="...">.
$contents = file_get_contents("http://www.google.com");
$count = preg_match_all("@<a [^>]+>@i", $contents, $matches);
echo $count;
print_r($matches);
?>

Edit: Tested code / updated above.
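The regex approach above can be tried without touching the network. The following is only a minimal sketch: the helper name `countTagsRegex` and the sample markup are made up for illustration and are not from the thread. It applies the same `preg_match_all` idea to an inline HTML string, adding a word boundary so that counting `a` does not also count tags like `abbr`.

```php
<?php
// Hypothetical helper (not from the thread): counts opening tags of a given
// name in an HTML string using a regular expression.
function countTagsRegex(string $html, string $tag): int
{
    // \b keeps "a" from also matching <abbr> or <article>; the i flag
    // accepts upper-case tags such as <A HREF="...">.
    $pattern = '@<' . preg_quote($tag, '@') . '\b[^>]*>@i';
    $count = preg_match_all($pattern, $html);

    // preg_match_all() returns false on a pattern error; treat that as zero.
    return $count === false ? 0 : $count;
}

// Made-up sample markup for the demonstration.
$sample = '<p><a href="/one">One</a> <A HREF="/two">Two</A> '
    . '<abbr title="x">y</abbr> <img src="a.png"></p>';

echo countTagsRegex($sample, 'a'), "\n";   // prints 2
echo countTagsRegex($sample, 'img'), "\n"; // prints 1
```

A regex like this only looks at the text, so it will also count tags that appear inside comments or scripts. That is one reason a real HTML parser tends to be the safer choice.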
.josh Posted July 21, 2011

see this post
teynon Posted July 21, 2011

Crayon's solution is much better than preg. Please use his example.
SecureMind Posted July 21, 2011

I did some more digging based on the links you suggested and found: http://www.phpfreaks.com/forums/index.php?topic=317646.msg1497723#msg1497723

Based on the initial post in that thread, I came up with:

<?php
if (isset($_POST['Submit'])) {

    // First function: get the links from the page
    function urlstatspoller($link) {
        $ret = array(); // href values found, one entry per link
        $html = @file_get_contents($link); // fetch the page while suppressing any errors
        if ($html === false || $html === '') {
            return $ret; // nothing fetched, so nothing to count
        }
        $dom = new DOMDocument; // set up a new DOM object
        $dom->preserveWhiteSpace = false; // do not preserve whitespace (set before loading)
        @$dom->loadHTML($html); // parse the HTML while suppressing parser warnings
        $links = $dom->getElementsByTagName('a'); // collect the "a" tags in the page
        // Walk through each "a" tag and check for an href to make sure it's a link
        foreach ($links as $tag) {
            if ($tag->hasAttribute('href')) {
                $ret[] = $tag->getAttribute('href'); // append so repeated URLs are all counted
            }
        }
        return $ret;
    }

    // Second function: get the images from the page
    function imgstatspoller($link) {
        $ret = array(); // src values found, one entry per image
        $html = @file_get_contents($link); // fetch the page while suppressing any errors
        if ($html === false || $html === '') {
            return $ret; // nothing fetched, so nothing to count
        }
        $dom = new DOMDocument; // set up a new DOM object
        $dom->preserveWhiteSpace = false; // do not preserve whitespace (set before loading)
        @$dom->loadHTML($html); // parse the HTML while suppressing parser warnings
        $images = $dom->getElementsByTagName('img'); // collect the "img" tags in the page
        // Walk through each "img" tag and record its src
        foreach ($images as $tag) {
            $ret[] = $tag->getAttribute('src'); // img tags have no child nodes, so only the attribute is read
        }
        return $ret;
    }

    // Get the link to search from the web form
    $link = $_POST['address'];

    // Call the URL stats polling function
    $urls = urlstatspoller($link);

    // Call the image stats polling function
    $imgs = imgstatspoller($link);

    // Output findings as CSV: # of links, # of images
    // (count() already returns 0 for an empty array)
    echo count($urls), ",", count($imgs), ",";
}
?>
<br /><br />
<form action="" method="post" enctype="multipart/form-data" name="link">
<input name="address" type="text" value="" />
<input name="Submit" type="Submit" />
</form>

I'm sure that there is a more elegant solution, and even this needs a little more work to fit my input/output format needs, but this will work for now. I'm just trying to get some really basic stats from a list of links.

Thank you both for your help! I really appreciate it.
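Since the link and image functions differ only in the tag name, the counting step can be reduced to one small helper. This is a rough sketch rather than a drop-in replacement: `countTagsDom` and the sample markup are hypothetical names chosen for illustration, and the helper takes an HTML string (for instance the result of `file_get_contents`) so it can be tried without a network request.

```php
<?php
// Hypothetical helper (not from the thread): parses an HTML string and
// returns how many elements with the given tag name it contains.
function countTagsDom(string $html, string $tag): int
{
    if (trim($html) === '') {
        return 0; // DOMDocument::loadHTML() does not accept an empty string
    }

    // Collect parser warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // DOMNodeList::length counts every matching element, duplicates included.
    return $dom->getElementsByTagName($tag)->length;
}

// Made-up sample markup: three "a" tags (two share an href) and two images.
$sample = '<html><body>'
    . '<a href="/one">One</a><a href="/one">One again</a><a name="top"></a>'
    . '<img src="a.png"><img src="a.png">'
    . '</body></html>';

echo countTagsDom($sample, 'a'), ',', countTagsDom($sample, 'img'), "\n"; // prints 3,2
```

Because the helper reads the length of the node list, repeated URLs are counted once per occurrence, and `a` tags without an href (such as named anchors) are counted too. Filtering on the href attribute would be needed if only real links should count.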
Archived
This topic is now archived and is closed to further replies.