SecureMind, posted July 20, 2011

My PHP skills are modest, and I would like to be able to visit a given URL and count how many links (or other tags, like img tags) exist on the page, i.e.:

1.) take a URL like "http://www.google.com"
2.) visit that URL from within the script
3.) walk through the code of the page, finding all the href instances
4.) return a count of the number of links found

Sounds simple enough, but I fear it won't be. I assume that I should start with cURL, but I'm not sure. Any advice is appreciated.
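On the cURL question: step 2 (fetching the page) can be done with either cURL or file_get_contents(). Here is a minimal sketch of a fetch helper, assuming PHP 7.1+. The function name fetch_html() and the 10-second timeout are illustrative choices, not something from this thread. It uses cURL when the extension is loaded and falls back to file_get_contents(), which needs allow_url_fopen for http:// URLs.

```php
<?php
// Hypothetical helper: return a page's HTML as a string, or null on failure.
function fetch_html(string $url): ?string
{
    if (function_exists('curl_init')) {
        $ch = curl_init($url);
        if ($ch === false) {
            return null;
        }
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects, e.g. http -> https
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds (arbitrary)
        $html = curl_exec($ch);
        return is_string($html) ? $html : null;
    }
    // Fallback: @ hides the warning; failure is signalled by a false return
    $html = @file_get_contents($url);
    return $html === false ? null : $html;
}
```

For example, $html = fetch_html("http://www.google.com"); gives you the page source (or null if the fetch failed), ready to be counted with a regex or DOMDocument.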
teynon, posted July 21, 2011

You're going to want to use preg_match_all with a pattern such as "<a [^>]+>":
http://www.php.net/manual/en/function.preg-match-all.php

<?php
// Fetch the raw HTML of the page (requires allow_url_fopen)
$contents = file_get_contents("http://www.google.com");
// Count every opening <a ...> tag; the i modifier also catches <A HREF=...>
$count = preg_match_all("@<a [^>]+>@i", $contents, $matches);
echo $count;
print_r($matches); // the matched tags themselves
?>

Edit: Tested code / updated above.
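If you do stay with a regex, a slightly more tolerant version of that pattern might look like the sketch below. The function name count_tags_regex() is hypothetical. It accepts any tag name, ignores case, also counts tags with no attributes (plain <a>), and uses \b so that "a" does not also match <abbr> or <area>. It is still a heuristic: anything that merely looks like a tag, for example inside an HTML comment, is counted too.

```php
<?php
// Hypothetical helper: count opening tags named $tag with a regular expression.
function count_tags_regex(string $html, string $tag): int
{
    // \b after the name rejects longer names (<abbr> for "a");
    // [^>]* allows zero or more attributes; /i makes it case-insensitive
    $pattern = '/<' . preg_quote($tag, '/') . '\b[^>]*>/i';
    return (int) preg_match_all($pattern, $html); // false on error becomes 0
}
```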
.josh, posted July 21, 2011

see this post
teynon, posted July 21, 2011

Crayon's solution is much better than the preg approach. Please use his example.
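For comparison with the regex, here is a sketch that counts any tag by actually parsing the HTML with DOMDocument, assuming PHP 7+ with the DOM extension; count_tags() is a hypothetical name. Because the markup is parsed, commented-out tags are not counted, and libxml_use_internal_errors() is used instead of @ to keep parser warnings about sloppy real-world HTML off the page.

```php
<?php
// Hypothetical helper: count elements named $tag using the DOM parser.
function count_tags(string $html, string $tag): int
{
    if ($html === '') {
        return 0; // loadHTML() rejects an empty string
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parse warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);         // restore the caller's setting
    return $dom->getElementsByTagName($tag)->length;
}
```

Counting images instead of links is then just count_tags($html, 'img').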
SecureMind (author), posted July 21, 2011

I did some more digging based on the links you suggested and found:
http://www.phpfreaks.com/forums/index.php?topic=317646.msg1497723#msg1497723

Based on the initial post in that thread, I came up with:

<?php
if (isset($_POST['Submit'])) {

    // Fetch the page once and parse it into a DOM object
    function loadpage($link) {
        $dom = new DOMDocument(); // sets up a new DOM object
        $dom->preserveWhiteSpace = false; // must be set before loading to have any effect
        $html = @file_get_contents($link); // false if the page could not be fetched
        if ($html !== false && $html !== '') {
            @$dom->loadHTML($html); // @ suppresses warnings about malformed HTML
        }
        return $dom;
    }

    // First function: get the links from the page
    function urlstatspoller($dom) {
        $ret = array();
        // Walk through each "a" tag and keep only those with an href, so that
        // named anchors such as <a name="top"> are not counted as links
        foreach ($dom->getElementsByTagName('a') as $tag) {
            if ($tag->hasAttribute('href')) {
                $ret[] = $tag->getAttribute('href');
            }
        }
        return $ret;
    }

    // Second function: get the images from the page
    function imgstatspoller($dom) {
        $ret = array();
        // Walk through each "img" tag and keep its src
        // (<img> has no child nodes, so there is no text to read from it)
        foreach ($dom->getElementsByTagName('img') as $tag) {
            if ($tag->hasAttribute('src')) {
                $ret[] = $tag->getAttribute('src');
            }
        }
        return $ret;
    }

    // Get the link to search from the web form
    $link = $_POST['address'];
    $dom = loadpage($link);

    // Call the URL and image stats polling functions
    $urls = urlstatspoller($dom);
    $imgs = imgstatspoller($dom);

    // Output findings as comma-separated values: #of links, #of images
    // (count() already returns 0 for an empty array)
    echo count($urls), ",", count($imgs), ",";
}
?>
<br /><br />
<form action="" method="post" enctype="multipart/form-data" name="link">
<input name="address" type="text" value="" />
<input name="Submit" type="submit" />
</form>

I'm sure there is a more elegant solution, and even this needs a little more work to fit my input/output format, but it will do for now. I'm just trying to get some really basic stats from a list of links. Thanks to both of you for your help! I really appreciate it.
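Since the goal is basic stats for a list of links, here is a sketch that gets both counts from a single parse and writes one CSV row per URL. It assumes PHP 7.1+; page_stats() and write_stats_csv() are hypothetical names. The XPath expressions count only <a> tags that have an href and <img> tags that have a src.

```php
<?php
// Hypothetical helper: return [#links, #images] for one HTML string.
function page_stats(string $html): array
{
    if ($html === '') {
        return [0, 0]; // loadHTML() rejects an empty string
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // XPath count() yields a float, so cast to int
    $links  = (int) $xpath->evaluate('count(//a[@href])');
    $images = (int) $xpath->evaluate('count(//img[@src])');
    return [$links, $images];
}

// Hypothetical helper: write "url,#links,#images" rows to an open stream.
function write_stats_csv(array $urls, $out): void
{
    foreach ($urls as $url) {
        $html = @file_get_contents($url); // false if the page can't be fetched
        [$links, $images] = page_stats($html === false ? '' : $html);
        // pass the escape character explicitly to avoid PHP 8.4's deprecation notice
        fputcsv($out, [$url, $links, $images], ',', '"', '\\');
    }
}
```

From the command line, write_stats_csv(["http://www.google.com", "http://www.php.net"], STDOUT); would print one url,links,images line per address; a page that cannot be fetched is reported as 0,0.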