physaux Posted January 18, 2010

Hey guys, here is what I have so far:

    $regex = '/<a(.+?)\/a>/';
    preg_match($regex, $htmlcode, $output);
    echo $output[1] . '<br>';

But I think I am doing it wrong. Here is what I want to do: I have all the HTML code from a page. I want to extract all the links into an array, and preferably get the anchor text too. So I want my output to be like so:

    $finaloutput[1]['url'] = "http://google.com";
    $finaloutput[1]['anchor'] = "google";
    $finaloutput[2]['url'] = "http://phpfreaks.com";
    $finaloutput[2]['anchor'] = "phpfreaks";
    ...

Could anyone please point me in the right direction to do this? Thank you!
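For reference, a minimal sketch of the difference between the two calls: preg_match() stops at the first match, while preg_match_all() collects every match. The sample $htmlcode string and the pattern below are assumptions made for illustration (the post does not show the real page), and a regex like this only copes with simple, double-quoted href attributes.

```php
<?php
// Hypothetical sample input; the original post does not show $htmlcode.
$htmlcode = '<p><a href="http://google.com">google</a> and '
          . '<a href="http://phpfreaks.com">phpfreaks</a></p>';

// Illustrative pattern: capture the href value and the anchor text
// of each simple <a> tag with a double-quoted href.
$regex = '/<a\s[^>]*href="([^"]+)"[^>]*>(.*?)<\/a>/is';

// PREG_SET_ORDER groups the captures per match instead of per group.
preg_match_all($regex, $htmlcode, $matches, PREG_SET_ORDER);

$finaloutput = array();
foreach ($matches as $i => $m) {
    // $i + 1 gives the 1-based keys used in the question.
    $finaloutput[$i + 1] = array(
        'url'    => $m[1],
        'anchor' => strip_tags($m[2]), // drop any markup nested in the link text
    );
}

print_r($finaloutput);
```

With this sample input, $finaloutput[1] and $finaloutput[2] hold the google and phpfreaks entries in the shape asked for above.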
JAY6390 Posted January 18, 2010

Take a look at the article I wrote a while ago: http://www.jaygilford.com/php/common-questions/how-to-get-all-links-from-a-web-page/
physaux Posted January 18, 2010

OK, I am reading it now, thanks. I'll post back if I am still having problems.
physaux Posted January 18, 2010

OK, it is not working; it just prints out "Array ()". I tried changing echo print_r to just print_r, and then nothing was output. Please help!

    <?php
    if($_POST){
        $domains = explode("\n", $_POST['domains']);
        $ch = curl_init();
        foreach($domains as $url) {
            curl_setopt ($ch, CURLOPT_URL, $url);
            curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6");
            curl_setopt ($ch, CURLOPT_TIMEOUT, 60);
            curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
            curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
            curl_setopt ($ch, CURLOPT_REFERER, 'http://www.google.com/search?q=best+community+forum');
            $AskApache_result = curl_exec ($ch);
            $pattern = '%<a [^>]+href="(?P<url>[^"]+)"[^>*]*>(?P<text>[^< ]+)</a>%si';
            preg_match_all($pattern, $AskApache_result, $matches);
            $urls = array();
            foreach($matches['url'] as $k=>$v) {
                $urls[$k] = array('url' => $v,'text' => $matches['text'][$k]);
            }
            echo print_r($urls, true);
            flush();
            //ob_flush();
        }
    }
    ?>

What is wrong? Thank you!
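The thread never confirms why the result is empty, so the following is only a sketch of two things that could contribute. First, the pattern from the post requires at least one character between "<a " and href, and it rejects link text containing a space. Second, if $_POST['domains'] comes from a textarea, lines are normally separated by "\r\n", so explode("\n", ...) leaves a trailing "\r" on each URL. The sample markup and the $posted string are hypothetical.

```php
<?php
// Pattern copied unchanged from the post above.
$pattern = '%<a [^>]+href="(?P<url>[^"]+)"[^>*]*>(?P<text>[^< ]+)</a>%si';

// Hypothetical sample markup, not taken from the thread.
$html = '<a class="x" href="http://google.com">google</a>'       // matches
      . '<a class="x" href="http://phpfreaks.com">PHP Freaks</a>' // space in text: no match
      . '<a href="http://example.com">example</a>';               // href comes first: no match

preg_match_all($pattern, $html, $matches);
print_r($matches['url']); // only the first link is captured

// Hypothetical textarea submission: lines are separated by "\r\n".
$posted = "http://google.com\r\nhttp://phpfreaks.com\r\n";

// trim() strips the stray "\r"; array_filter() drops the empty last line.
$domains = array_filter(array_map('trim', explode("\n", $posted)));
print_r($domains);
```

Checking the return value of curl_exec() for false (and reading curl_error($ch)) before running the pattern would show whether the page was fetched at all.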
physaux Posted January 18, 2010

Well, I see that I am having the exact same problem even with the basic script from the article; it is not outputting anything: http://www.jaygilford.com/php/common-questions/how-to-get-all-links-from-a-web-page/

Can anyone help, please?
nrg_alpha Posted January 18, 2010

My suggestion would be to use something like DOM / DOMXPath. So for example, suppose I wanted to fetch all the links from, say, http://www.sfu.ca/; this would be one way to do it:

    $dom = new DOMDocument;
    libxml_use_internal_errors(true);
    @$dom->loadHTMLFile('http://www.cs.sfu.ca/'); // insert url of choice
    libxml_use_internal_errors(false);
    $xpath = new DOMXPath($dom);
    $aTag = $xpath->query('//a[@href]'); // search for all anchor tags that provide an href attribute
    $finaloutput = array(); // declaring array $finaloutput
    foreach($aTag as $url){
        $finaloutput[] = array('url' => $url->getAttribute('href'), 'anchor' => $url->nodeValue);
    }
    echo '<pre>'.print_r($finaloutput, true);
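The same approach can be tried without a network request by parsing a string with loadHTML(). This is a minimal sketch under that assumption; the sample markup is hypothetical and only stands in for a fetched page.

```php
<?php
// Hypothetical sample markup standing in for a fetched page.
$html = '<html><body>'
      . '<a href="http://google.com">google</a>'
      . '<a name="top">no href here</a>'
      . '<a href="http://phpfreaks.com"><b>php</b>freaks</a>'
      . '</body></html>';

$dom = new DOMDocument;
libxml_use_internal_errors(true);  // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();             // discard the collected warnings
libxml_use_internal_errors(false);

$xpath = new DOMXPath($dom);
$finaloutput = array();

// '//a[@href]' selects only anchors that carry an href attribute.
foreach ($xpath->query('//a[@href]') as $a) {
    $finaloutput[] = array(
        'url'    => $a->getAttribute('href'),
        'anchor' => trim($a->nodeValue), // nodeValue is the text of the link, nested tags removed
    );
}

print_r($finaloutput);
```

Unlike the regex versions, this skips the anchor without an href and still returns "phpfreaks" for the link whose text is partly wrapped in a nested tag. The array is 0-based; if the 1-based keys from the first post are wanted, the key can be set explicitly inside the loop.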
physaux Posted January 18, 2010

Ah thanks, that works great.