drizac Posted July 13, 2011

Hi,

First-time poster, long-time reader.

Background: I want to extract different URLs from websites using file_get_contents() and a regex. The number of URLs differs from website to website.

Status quo: I have the result in a variable, but when I use print_r the result is an array inside another array. This is my code:

    $html = file_get_contents($html);
    preg_match_all('/'http:\/\/website.com\/.+#1/', $html , $strng);
    print_r ($strng);

Result:

    Array
    (
        [0] => Array
            (
                [0] => 'http://website.com/first#1
                [1] => 'http://slideshop.com/website-second#1
            )
    )

Question: How can I print the result as a list when the array is inside another array?

Thank you in advance,
D
AyKay47 Posted July 13, 2011

To print each individual result, you would have something like:

    $html = file_get_contents($html);
    // stray quote removed from the original pattern so that it parses
    preg_match_all('/http:\/\/website.com\/.+#1/', $html, $strng);
    print $strng[0][0];
    print $strng[0][1];
    // etc.

Or you can use nested foreach loops, which work however many matches there are:

    foreach ($strng as $group) {
        // each $group is the list of matches for one capture group
        foreach ($group as $val) {
            echo $val . "\n";
        }
    }
cags Posted July 13, 2011

With the default flag (PREG_PATTERN_ORDER), the top-level array represents the capture groups, and the inner arrays are the items matched by each capture group. Index 0 always holds the matches of the entire pattern. Since you don't use a capture group in your pattern, index 0 is the only entry you get. Therefore, if you wish to loop through all of the strings that matched the entire pattern, you can use:

    foreach ($strng[0] as $url) {
        echo $url;
    }
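To make the layout described above concrete, here is a minimal sketch with one capture group added. The sample HTML, the variable name $matches, and the tightened pattern ([^#"]+ instead of .+) are assumptions for illustration, not the original poster's code.

```php
<?php
// Hypothetical sample input; the URLs are made up for illustration.
$html = "<a href=\"http://website.com/first#1\">one</a>\n"
      . "<a href=\"http://website.com/second#1\">two</a>\n";

// One capture group around the path segment. [^#"]+ is used instead of .+
// so a match cannot run past the closing quote (an assumption, not the
// original pattern). ~ is the delimiter, so the slashes need no escaping.
preg_match_all('~http://website\.com/([^#"]+)#1~', $html, $matches);

// $matches[0] holds every full match, $matches[1] holds capture group 1.
foreach ($matches[0] as $i => $url) {
    echo $url, ' (path: ', $matches[1][$i], ")\n";
}
```

With this input, $matches[0] should contain the two full URLs and $matches[1] the strings "first" and "second", so the same index pairs a full match with its captured part.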
drizac (Author) Posted July 15, 2011

Thanks a lot, that really helped. Is there a point award system somewhere?
.josh Posted July 15, 2011

I'm not entirely sure what you're really trying to do here, because your preg_match_all as-is produces a syntax error, and you also say you are trying to grab URLs from a website but have a specific URL named in the regex. Are you trying to match URLs that belong to a specific domain?

In any case, whether it's all links or a specific domain, your regex will match the URL wherever it appears in the page source. So if it is, for instance, in a JavaScript variable or displayed on the page as plain text (not a link), your regex is going to match it. You should extend your pattern so it matches only when the URL appears in the href attribute of an anchor tag.

Also, it looks like you're attempting an exact "head" match. Is that what you really want? For instance, a pattern written for http://www.somesite.com will not match https://www.somesite.com or www.somesite.com.

And overall, you shouldn't really be using regex to parse links or other HTML to begin with. You should instead use DOM.
Here is a function to grab link URLs on a page:

    function get_links($page, $url = false) {
        $xml = new DOMDocument();
        $xml->loadHTML($page);
        $links = array();
        foreach ($xml->getElementsByTagName('a') as $link) {
            $href = $link->getAttribute('href');
            if ($url) {
                // keep only links whose host matches $url (case-insensitive);
                // the cast covers relative links, which have no host
                if (strcasecmp($url, (string) parse_url($href, PHP_URL_HOST)) == 0)
                    $links[] = $href;
            } else {
                $links[] = $href;
            }
        }
        return $links;
    } // end get_links

Get all links:

    // get the page content
    $page = file_get_contents('http://www.somesite.com');
    // get all the links
    $links = get_links($page);
    echo "<pre>"; print_r($links); echo "</pre>";

Output:

    Array
    (
        [0] => http://www.somelink.com
        [1] => /
        [2] => /some/path/to/file.html
        [3] => http://www.someotherlink.com/a/b/c/d.html
    )

Get only links from www.xyz.com:

    // get the page content
    $page = file_get_contents('http://www.somesite.com');
    // get only the links whose host is www.xyz.com
    $links = get_links($page, 'www.xyz.com');
    echo "<pre>"; print_r($links); echo "</pre>";

Output:

    Array
    (
        [0] => http://www.xyz.com
        [1] => http://www.xyz.com/some/file.html
    )
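To illustrate the point about a regex matching the URL anywhere in the source while DOM looks only at real links, here is a rough sketch. The page content, the URL, and the variable names are invented for the example, and the call to libxml_use_internal_errors() is an addition that is not part of the function above.

```php
<?php
// Hypothetical page: the same URL appears in a script, in plain text,
// and in exactly one anchor tag.
$page = '<html><body>'
      . '<script>var u = "http://www.somesite.com/a";</script>'
      . '<p>Visit http://www.somesite.com/a today.</p>'
      . '<a href="http://www.somesite.com/a">link</a>'
      . '</body></html>';

// Regex over the raw source: counts every occurrence, link or not.
$regexCount = preg_match_all('~http://www\.somesite\.com/\w+~', $page, $m);

// DOM: collect only the href attributes of <a> elements.
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($page);
$hrefs = array();
foreach ($doc->getElementsByTagName('a') as $a) {
    $hrefs[] = $a->getAttribute('href');
}

echo "regex matches: $regexCount\n";
echo "anchor hrefs: ", count($hrefs), "\n";
```

On this input the regex should report three matches and the DOM approach a single href, which is the difference between searching the page source and reading actual links.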