physaux Posted January 18, 2010

Hey guys, here is what I have so far:

    $regex = '/<a(.+?)\/a>/';
    preg_match($regex, $htmlcode, $output);
    echo $output[1] . '<br>';

But I think I am doing it wrong. Here is what I want to do: I have all the HTML code from a page. I want to extract all the links into an array, and preferably get the anchor text too. So I want my output to look like this:

    $finaloutput[1]['url'] = "http://google.com";
    $finaloutput[1]['anchor'] = "google";
    $finaloutput[2]['url'] = "http://phpfreaks.com";
    $finaloutput[2]['anchor'] = "phpfreaks";
    ...

Could anyone please point me in the right direction to do this? Thank you!

Link to comment https://forums.phpfreaks.com/topic/188917-trying-to-use-preg-match-to-get-all-links-in-html-source-code/
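A minimal sketch of one reason the snippet in the first post returns a single result: preg_match stops after the first match, while preg_match_all collects every match. The sample $htmlcode string below is made up for illustration, and the pattern is deliberately simplified (it assumes a double-quoted href and plain-text anchor content), so it is a sketch under those assumptions rather than a robust HTML parser.

```php
<?php
// Hypothetical sample input, not taken from the thread.
$htmlcode = '<p><a href="http://google.com">google</a> and '
          . '<a href="http://phpfreaks.com">phpfreaks</a></p>';

// Simplified pattern: assumes a double-quoted href and no nested tags in the anchor text.
$regex = '/<a\s[^>]*href="([^"]*)"[^>]*>([^<]*)<\/a>/i';

// PREG_SET_ORDER groups the captures per match instead of per capture group.
preg_match_all($regex, $htmlcode, $matches, PREG_SET_ORDER);

$finaloutput = array();
foreach ($matches as $m) {
    // $m[1] is the href value, $m[2] is the text between the tags.
    $finaloutput[] = array('url' => $m[1], 'anchor' => $m[2]);
}

print_r($finaloutput);
```

Note that the resulting array is indexed from 0, not from 1 as in the desired output written in the post.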
JAY6390 Posted January 18, 2010

Take a look at the article I wrote a while ago: http://www.jaygilford.com/php/common-questions/how-to-get-all-links-from-a-web-page/
physaux Posted January 18, 2010

Ok, I am reading it now, thanks. I'll post back if I am still having problems.
physaux Posted January 18, 2010

Ok, it is not working; it just prints out "Array ()". I tried changing echo print_r to just print_r, and then nothing was output. Please help!

    <?php
    if($_POST){
        $domains = explode("\n", $_POST['domains']);
        $ch = curl_init();
        foreach($domains as $url) {
            curl_setopt ($ch, CURLOPT_URL, $url);
            curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6");
            curl_setopt ($ch, CURLOPT_TIMEOUT, 60);
            curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
            curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
            curl_setopt ($ch, CURLOPT_REFERER, 'http://www.google.com/search?q=best+community+forum');
            $AskApache_result = curl_exec ($ch);
            $pattern = '%<a [^>]+href="(?P<url>[^"]+)"[^>*]*>(?P<text>[^< ]+)</a>%si';
            preg_match_all($pattern, $AskApache_result, $matches);
            $urls = array();
            foreach($matches['url'] as $k=>$v) {
                $urls[$k] = array('url' => $v,'text' => $matches['text'][$k]);
            }
            echo print_r($urls, true);
            flush();
            //ob_flush();
        }
    }
    ?>

What is wrong? Thank you!
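One way to check whether the pattern itself contributes to the empty result is to run it against a few hand-written links. The sketch below uses the pattern exactly as posted; the sample links and labels are hypothetical. As written, the pattern needs at least one character between `<a ` and `href`, and its text group excludes spaces, so a link with href as the first attribute, or with anchor text of more than one word, is skipped. The last lines show a possible (unconfirmed) second issue: a textarea usually submits lines ending in "\r\n", so explode("\n") can leave a trailing "\r" on each URL unless the lines are trimmed.

```php
<?php
// Pattern copied from the post above.
$pattern = '%<a [^>]+href="(?P<url>[^"]+)"[^>*]*>(?P<text>[^< ]+)</a>%si';

// Hypothetical test links, made up for illustration.
$samples = array(
    'href first'      => '<a href="http://example.com/">Example</a>',
    'text with space' => '<a class="nav" href="http://example.com/">Example site</a>',
    'attr, one word'  => '<a class="nav" href="http://example.com/">Example</a>',
);

$results = array();
foreach ($samples as $label => $html) {
    // preg_match_all returns the number of full matches (0 when nothing matches).
    $results[$label] = preg_match_all($pattern, $html, $matches);
    echo $label . ': ' . $results[$label] . " match(es)\n";
}

// Assumed input: two URLs separated by "\r\n", as a textarea would typically send them.
$raw = "http://example.com/\r\nhttp://example.org/\r\n";

// trim() strips the stray "\r"; array_filter() drops the empty last line.
$domains = array_filter(array_map('trim', explode("\n", $raw)));
print_r($domains);
```

If the pattern is the cause, parsing with DOM instead of a regular expression avoids these restrictions altogether.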
physaux Posted January 18, 2010

Well, I see that I am having the exact same problem even with the basic script from the article; it is not outputting anything: http://www.jaygilford.com/php/common-questions/how-to-get-all-links-from-a-web-page/

Can anyone help, please?
nrg_alpha Posted January 18, 2010

My suggestion would be to use something like DOM / DOMXPath. For example, suppose I wanted to fetch all the links from, say, http://www.sfu.ca/; this would be one way to do it:

    $dom = new DOMDocument;
    libxml_use_internal_errors(true);
    @$dom->loadHTMLFile('http://www.cs.sfu.ca/'); // insert url of choice
    libxml_use_internal_errors(false);
    $xpath = new DOMXPath($dom);
    $aTag = $xpath->query('//a[@href]'); // search for all anchor tags that provide an href attribute
    $finaloutput = array(); // declaring array $finaloutput
    foreach ($aTag as $url) {
        $finaloutput[] = array('url' => $url->getAttribute('href'), 'anchor' => $url->nodeValue);
    }
    echo '<pre>' . print_r($finaloutput, true);
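The same approach also works when the page source is already in a string (for example, the result of curl_exec), by using loadHTML instead of loadHTMLFile. A minimal sketch follows; the $htmlcode markup is a hypothetical sample, and trimming the anchor text is an added choice, not part of the post above.

```php
<?php
// Hypothetical sample markup; in practice this would be the fetched page source.
$htmlcode = '<html><body>'
          . '<a href="http://google.com">google</a>'
          . '<a href="http://phpfreaks.com"><b>php</b> freaks</a>'
          . '<a name="top">no href here</a>'
          . '</body></html>';

// Collect parser warnings internally instead of printing them.
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($htmlcode);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
$finaloutput = array();
// Only anchors that carry an href attribute are selected.
foreach ($xpath->query('//a[@href]') as $a) {
    $finaloutput[] = array(
        'url'    => $a->getAttribute('href'),
        'anchor' => trim($a->textContent), // text of the link, including text inside nested tags
    );
}

print_r($finaloutput);
```

Because the text is read from the parsed tree, anchor text containing spaces or nested tags is returned whole, and the anchor without an href is left out.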
physaux Posted January 18, 2010

Ah thanks, that works great.
This topic is now archived and is closed to further replies.