I'm just learning PHP and I have a web scraper I'm working on using Simple HTML DOM. It's almost complete but still lacks a bit of logic.

What I want the script to do is scrape multiple pages and compare the links, and if a matching domain is found linked from more than one page, send an email.

What I've come up with works for matching a domain that's hard-coded into the script, but I want to match domains from other pages. Also, the script sends an email for every match it finds, but I just want one email with all the matching domains.

I believe array_intersect() is the function I need to be working with, but I can't figure this out. I will be so happy if I can get this completed. Thanks for your time and consideration.

Here is my code:

    // Pull in PHP Simple HTML DOM Parser
    include("simple_html_dom.php");

    $sitesToCheck = array(
        array("url" => "http://www.google.com"),
        array("url" => "http://www.yahoo.com"),
        array("url" => "http://www.facebook.com")
    );

    // For every page to check...
    foreach($sitesToCheck as $site) {
        $url = $site["url"];

        // Get the URL's current page content
        $html = file_get_html($url);

        // Find all links
        foreach($html->find('a') as $element) {
            $href = $element->href;
            $link = $href;
            $pattern = '/\w+\..{2,3}(?:\..{2,3})?(?:$|(?=\/))/i';
            $domain = $link;
            if (preg_match($pattern, $domain, $matches) === 1) {
                $domain = $matches[0];
            }

            // This works for matching google.com
            // but I want to match with $domain from other sites
            if (preg_match("/google.com/", $domain)) {
                mail("someone@example.com","Match found",$domain);
            } else {
                echo "A match was not found." . "<br />";
            }
        }
    }
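One possible way to get the "linked from more than one page, one email" behaviour is sketched below. It is not taken from the thread. It swaps array_intersect() for a per-domain tally, because array_intersect() across all pages would only keep domains that appear on every page, not on two or more. The helper names extractDomain() and findSharedDomains(), the sample page URLs, and the use of parse_url() in place of the regex are all assumptions made for illustration. The hrefs are hard-coded so the sketch runs without Simple HTML DOM or network access.

```php
<?php
// Hypothetical helper: reduce a link's href to a host name,
// or return null when the href has no host (relative links, mailto:, etc.).
function extractDomain(string $href): ?string
{
    $host = parse_url($href, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }
    // Assumption: www.example.com and example.com count as the same domain.
    return preg_replace('/^www\./', '', strtolower($host));
}

// Hypothetical helper: takes [page URL => list of hrefs] and returns
// [domain => list of pages linking to it], keeping only domains
// that are linked from more than one page.
function findSharedDomains(array $hrefsByPage): array
{
    $pagesByDomain = [];
    foreach ($hrefsByPage as $pageUrl => $hrefs) {
        $domainsOnPage = [];
        foreach ($hrefs as $href) {
            $domain = extractDomain($href);
            if ($domain !== null) {
                $domainsOnPage[$domain] = true; // count each domain once per page
            }
        }
        foreach (array_keys($domainsOnPage) as $domain) {
            $pagesByDomain[$domain][] = $pageUrl;
        }
    }
    return array_filter($pagesByDomain, function ($pages) {
        return count($pages) > 1;
    });
}

// Sample data standing in for the hrefs scraped from each page (made up).
$hrefsByPage = [
    'http://page-one.example'   => ['http://www.example.com/a', 'http://example.org/', '/local'],
    'http://page-two.example'   => ['https://example.com/b', 'mailto:someone@example.com'],
    'http://page-three.example' => ['http://example.net/'],
];

$shared = findSharedDomains($hrefsByPage);

if ($shared) {
    // Build a single message body listing every shared domain.
    $body = '';
    foreach ($shared as $domain => $pages) {
        $body .= $domain . ' is linked from: ' . implode(', ', $pages) . "\n";
    }
    echo $body;
    // In the real script, one call after the loop would send one email:
    // mail("someone@example.com", "Matches found", $body);
}
```

In the original script, the inner loop would presumably append each $element->href to $hrefsByPage[$url] instead of calling mail(), and the tally and the single mail() call would run once after the outer foreach has finished.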