seb hughes Posted November 13, 2007

I've written a script that grabs all URLs from a page, removes duplicates, and outputs them. From a link such as <a href="www.domain.com">akhajha</a> I need just the www.domain.com or domain.net part. If the link is www.domain.org/helllloooooo.html, it needs to be trimmed to www.domain.org. I've written the program; it's just this regex that is driving me nuts.

$url = $_POST['url'];
$banned = array($url); // never list the page's own URL
$html = file_get_contents($url);

// This is the pattern that needs fixing: it keeps the path and query string.
$matches = array();
$status = preg_match_all('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@', $html, $matches);

// Drop duplicates and banned URLs, keeping first-seen order.
$unique = array_diff(array_unique($matches[1]), $banned);

foreach ($unique as $link) {
    echo $link . "<br />";
}

Please help me fix this problem. Thanks.

Link to comment https://forums.phpfreaks.com/topic/77160-preg_mtach_all-url-regex-help-need/
effigy Posted November 13, 2007

%href=[\'"]?(?:https?://)?([^\s/\'"]+)%
seb hughes Posted November 13, 2007 Author

%href=[\'"]?(?:https?://)?([^\s/\'"]+)%

$status = preg_match_all(%href=[\'"]?(?:https?://)?([^\s/\'"]+)%, $html, $matches);

I get a parse error.
effigy Posted November 13, 2007

Patterns are strings--single quote it.
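For readers following along, here is a minimal sketch of what quoting the pattern looks like in practice. The sample markup in $html is made up for illustration and is not from the thread; the pattern itself is the one posted above.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = '<a href="http://www.domain.org/helllloooooo.html">one</a> '
      . "<a href='https://hello.php.net/manual/'>two</a>";

// The pattern is an ordinary PHP string: % is the regex delimiter, and each
// single quote inside the single-quoted literal is escaped as \'.
$pattern = '%href=[\'"]?(?:https?://)?([^\s/\'"]+)%';

preg_match_all($pattern, $html, $matches);

// $matches[1] holds the first capture group: everything after the optional
// scheme, up to the next slash, quote, or whitespace.
print_r($matches[1]);
```

Passing the pattern without quotes, as in the previous post, makes PHP try to parse the `%` as the modulo operator, which is where the parse error comes from.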
seb hughes Posted November 13, 2007 Author

I got it to work, but it also shows page names like image.php?dfdf=dfsdkfdkfkd, which it shouldn't do.
effigy Posted November 13, 2007

Is "www." always a requirement? There are "ww2."s I think, along with a variety of domain endings.
seb hughes Posted November 13, 2007 Author

> Is "www." always a requirement? There are "ww2."s I think, along with a variety of domain endings.

All I need it to do is get whatever is in between the quotes of <a href="whatisinhere">, but trimmed to the domain name, so www.domain.com, hello.php.net, or hwello.hello.php.net. It can't return www.domain.com/what_ever_else_is_here.html. If it doesn't have the www., that's even better.
seb hughes Posted November 13, 2007 Author

How can I get just domain.tld from an "a href" tag? This has been driving me crazy for DAYS.
effigy Posted November 13, 2007

The challenge is that domain extensions can look like file extensions. Is "image.php" a domain? No--we can see that, but the computer cannot unless we give it specific direction. Here are just a few examples.
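One possible way to give that "specific direction" is sketched below. It is not from the thread: it assumes that only href values starting with an explicit http:// or https:// point at a domain, so relative links such as image.php?dfdf=dfsdkfdkfkd are skipped. The helper name hostFromHref and the sample markup are hypothetical; parse_url() is PHP's built-in URL parser.

```php
<?php
// Hypothetical helper (not from the thread): return the host of an href
// value, or null when the value is a relative path such as image.php?x=y.
function hostFromHref($href)
{
    // Assumption: only values with an explicit http(s) scheme are treated
    // as pointing at a domain; bare "www.domain.com" values are ignored.
    if (!preg_match('%^https?://%i', $href)) {
        return null;
    }
    $host = parse_url($href, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }
    // Lowercase the host and drop a leading "www.", as the poster prefers.
    return preg_replace('/^www\./i', '', strtolower($host));
}

// Hypothetical sample markup standing in for the fetched page.
$html = '<a href="http://www.domain.org/helllloooooo.html">a</a>'
      . '<a href="image.php?dfdf=dfsdkfdkfkd">b</a>'
      . '<a href="https://hello.php.net/manual/">c</a>'
      . '<a href="http://WWW.domain.org/other.html">d</a>';

// Capture the quoted value of every href attribute.
preg_match_all('%href\s*=\s*["\']([^"\']+)["\']%i', $html, $matches);

$domains = array();
foreach ($matches[1] as $href) {
    $host = hostFromHref($href);
    if ($host !== null) {
        $domains[$host] = true; // array keys de-duplicate for free
    }
}
$domains = array_keys($domains);

print_r($domains);
```

The trade-off of this assumption is that scheme-less links written as www.domain.com are dropped rather than guessed at, which sidesteps the image.php ambiguity at the cost of missing some hosts.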