xor83 Posted March 2, 2009
How can I extract URLs from a web page, keeping only the ones that point to one specific site? For example, given "www.abc.com/32432/file.zip", it should search for abc.com, and the extension can be zip, rar, or 001. Any help?
br0ken Posted March 2, 2009
Use file_get_contents($url) to get the HTML of the website in question. Then loop through that content looking for <a> tags. Each time you find an anchor tag, use substr() and strpos() to extract the link, and add each link to an array. Once you've done this, go through the links and check whether each one is from the domain you're looking for. You can use the parse_url() function to do this, or a combination of strpos() and substr().
If you need more information on the functions given above, please see these links:
http://uk2.php.net/substr
http://uk.php.net/strpos
http://uk2.php.net/function.file-get-contents
http://uk3.php.net/function.parse-url
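The steps described above could be sketched roughly as follows. This is only a minimal illustration, not a robust HTML parser: the helper names extract_links() and filter_by_domain() are made up for this example, the code assumes double-quoted href attributes, and the sample HTML (including the other.example host) is invented.

```php
<?php
// Hypothetical helper: collect href values from <a> tags using only
// stripos(), strpos() and substr(). Assumes double-quoted href attributes.
function extract_links(string $html): array
{
    $links = [];
    $offset = 0;
    // Find each "<a " (case-insensitive), then the href=" inside that tag.
    while (($tag = stripos($html, '<a ', $offset)) !== false) {
        $tagEnd = strpos($html, '>', $tag);
        if ($tagEnd === false) {
            break; // unterminated tag, stop scanning
        }
        $tagText = substr($html, $tag, $tagEnd - $tag);
        $hrefPos = stripos($tagText, 'href="');
        if ($hrefPos !== false) {
            $start = $hrefPos + strlen('href="');
            $end = strpos($tagText, '"', $start);
            if ($end !== false) {
                $links[] = substr($tagText, $start, $end - $start);
            }
        }
        $offset = $tagEnd + 1;
    }
    return $links;
}

// Hypothetical helper: keep links whose host is $domain or a subdomain of it.
function filter_by_domain(array $links, string $domain): array
{
    $kept = [];
    $suffix = '.' . $domain;
    foreach ($links as $link) {
        $host = parse_url($link, PHP_URL_HOST);
        if (!is_string($host)) {
            continue; // no host found (e.g. relative link or missing scheme)
        }
        $host = strtolower($host);
        if ($host === $domain || substr($host, -strlen($suffix)) === $suffix) {
            $kept[] = $link;
        }
    }
    return $kept;
}

// In real use: $html = file_get_contents($url);
$html = '<a href="http://www.abc.com/32432/file.zip">one</a> '
      . '<a href="http://other.example/file.zip">two</a>';
$abcLinks = filter_by_domain(extract_links($html), 'abc.com');
print_r($abcLinks);
```

Note that parse_url() only reports a host when the link includes a scheme such as http://, so a bare "www.abc.com/32432/file.zip" would be skipped by this sketch.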
a-scripts.com Posted March 2, 2009
I think regular expressions are your friend here. Using preg_match_all() with a suitable pattern, you can get exactly what you need.
premiso Posted March 2, 2009
<?php
$string = '<a href="http://www.abc.com/rand/file.r01">
<a href="http://www.abc.com/rand/file.r02">
<a href="http://www.abc.com/rand/file.r03">';

// Dots are escaped so they match a literal "." rather than any character.
// The lazy group captures everything after the domain up to the closing quote.
preg_match_all('~www\.abc\.com/(.+?)"~is', $string, $matches);

echo "<pre>" . print_r($matches, true) . "</pre>";
?>
Rough example, but it should work. Outputs:
Array
(
    [0] => Array
        (
            [0] => www.abc.com/rand/file.r01"
            [1] => www.abc.com/rand/file.r02"
            [2] => www.abc.com/rand/file.r03"
        )

    [1] => Array
        (
            [0] => rand/file.r01
            [1] => rand/file.r02
            [2] => rand/file.r03
        )

)
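To also restrict the matches to the extensions asked about in the first post (zip, rar, 001) and return whole URLs without the trailing quote, the preg_match_all() idea above could be extended along these lines. The pattern and the sample HTML are assumptions made for illustration, not something posted in this thread, and a regex like this will not cope with every kind of markup.

```php
<?php
// Sample input; in real use this could come from file_get_contents($url).
$html = '<a href="http://www.abc.com/32432/file.zip">a</a>
<a href="http://www.abc.com/rand/file.rar">b</a>
<a href="http://www.abc.com/rand/file.001">c</a>
<a href="http://www.abc.com/rand/page.html">d</a>
<a href="http://www.xyz.com/rand/file.zip">e</a>';

// Assumed pattern, piece by piece:
//   (?<![\w.-])         do not start in the middle of a longer host name
//   (?:https?://)?      optional scheme
//   (?:[\w-]+\.)*       optional subdomains such as "www."
//   abc\.com/           the wanted domain, with literal dots
//   [^"'\s]*            the path, with no quotes or whitespace
//   \.(?:zip|rar|001)   one of the wanted extensions
//   (?=["'])            must be followed by the closing quote (not captured)
$pattern = '~(?<![\w.-])(?:https?://)?(?:[\w-]+\.)*abc\.com/[^"\'\s]*\.(?:zip|rar|001)(?=["\'])~i';

preg_match_all($pattern, $html, $matches);

// Drop duplicates and reindex from 0.
$urls = array_values(array_unique($matches[0]));
print_r($urls);
```

With the sample input, only the three abc.com links ending in .zip, .rar, and .001 are kept; the .html page and the xyz.com link are skipped.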
xor83 Posted March 3, 2009
Thanks, all, for your replies.
premiso: I tested many samples, but none of them worked the way I wanted. The code you gave me works exactly the way I wanted. Thanks, guru...