slushpuppie Posted November 10, 2009

i have a variable called $raw, which is a string of html code i pull in from another source. however, i need to turn any relative links like:

<a href="pageOne.html">Page One</a>

into fully pathed links, like:

<a href="http://www.domain.com/pageOne.html">Page One</a>

right now i am doing this:

$patterns[0] = '/<a href="/';
$replacements[0] = '<a href="http://www.domain.com';
$raw = preg_replace($patterns, $replacements, $raw);

which IS working, however i'm sure any of you looking at this can see its inherent flaws. for example, if the link markup is:

<a style="color:red;" href="pageTwo">Page Two</a>

my pattern would not catch that. it will also put http://www.domain.com at the start of any link that already starts with a domain.

what i need is a pattern that finds any link that does not have http:// or https:// at the beginning of the href, and puts http://www.domain.com at the start of it, while leaving any other attributes of the a tag alone.

the $raw variable is coming from a consistent source, so i know that just adding http://www.domain.com to the start of the hrefs will do what i want: with that, the path will be complete.

any help or insight would be GREATLY appreciated. thank you.
.josh Posted November 10, 2009

$raw = preg_replace('~href\s?=\s?"((?!https?://)[^"]*)"~i','href="http://www.domain.com/$1"',$raw);
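For anyone trying to follow how the pattern above works, here is a commented version of the same regex, written with the x (extended) modifier so it can be spread over several lines. The pattern and replacement are the ones from the post; the sample $raw string and the $pattern and $result variable names are made up for illustration.

```php
<?php
// Same pattern as in the post, spread out with the x modifier so each piece
// can carry a comment. With x, literal whitespace in the pattern is ignored
// and # starts a comment that runs to the end of the line.
$pattern = '~
    href \s? = \s? "      # href=, optional whitespace, opening double quote
    (                     # start of capture group 1 (the link target)
      (?!https?://)       # lookahead: give up if the value starts with http:// or https://
      [^"]*               # everything up to the closing double quote
    )
    "                     # closing double quote
~ix';

// Made-up sample input: one relative link with an extra attribute,
// one link that is already absolute.
$raw = '<a style="color:red;" href="pageTwo.html">Page Two</a> '
     . '<a href="http://www.other.com/x.html">Elsewhere</a>';

// $1 in the replacement is whatever capture group 1 matched.
$result = preg_replace($pattern, 'href="http://www.domain.com/$1"', $raw);

echo $result, "\n";
```

Only the first link is rewritten, and its style attribute is left alone because the pattern starts matching at href rather than at the opening of the a tag. The second link is skipped because the negative lookahead fails on http://.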
slushpuppie Posted November 10, 2009

seriously... it's like you regex guys are not human... i can never comprehend how your patterns work, but they always seem to... thank you.
nrg_alpha Posted November 10, 2009

You can always have a look at these starter links to help kickstart the learning process. It's not as bad as it looks: everyone starts somewhere, and regex is indeed daunting at first, but it's more easily tameable than you think. Much like other aspects of programming, it just takes some time and practice.

http://www.phpfreaks.com/tutorial/regular-expressions-part1---basic-syntax
http://www.regular-expressions.info/
http://weblogtoolscollection.com/regex/regex.php

Obviously, Google will give even more results, but these should be more than enough to get you started. And if you get stuck, the non-human regex members here can help you out along the learning journey too!
thebadbad Posted November 10, 2009

If you're looking for a more robust way of translating relative paths to absolute paths, there's a function at http://w-shadow.com/blog/2007/07/16/how-to-extract-all-urls-from-a-page-using-php/. A way to use it:

<?php
// Resolve $relative against the base URL $absolute.
// Returns the absolute URL, or false if $relative cannot be parsed.
function relative2absolute($absolute, $relative) {
    $p = @parse_url($relative);
    if (!$p) {
        // $relative is a seriously malformed URL
        return false;
    }
    // Already has a scheme, so it is absolute: return it untouched
    if (isset($p['scheme'])) {
        return $relative;
    }

    $parts = parse_url($absolute);

    if (substr($relative, 0, 1) == '/') {
        // Root-relative path: drop the empty element before the leading slash
        $cparts = explode('/', $relative);
        array_shift($cparts);
    } else {
        // Document-relative path: start from the directory of the base URL
        if (isset($parts['path'])) {
            $aparts = explode('/', $parts['path']);
            array_pop($aparts);
            $aparts = array_filter($aparts);
        } else {
            $aparts = array();
        }
        $rparts = explode('/', $relative);

        // Resolve "." and ".." segments
        $cparts = array();
        foreach (array_merge($aparts, $rparts) as $part) {
            if ($part === '.') {
                continue;
            } else if ($part === '..') {
                array_pop($cparts);
            } else {
                $cparts[] = $part;
            }
        }
    }
    $path = implode('/', $cparts);

    // Rebuild scheme://user:pass@host/ from the base URL
    $url = '';
    if (isset($parts['scheme'])) {
        $url = $parts['scheme'] . '://';
    }
    if (isset($parts['user'])) {
        $url .= $parts['user'];
        if (isset($parts['pass'])) {
            $url .= ':' . $parts['pass'];
        }
        $url .= '@';
    }
    if (isset($parts['host'])) {
        $url .= $parts['host'] . '/';
    }
    $url .= $path;
    return $url;
}

// Rewrite every href and src attribute in $raw, keeping the original quote style
$raw = preg_replace_callback(
    '~\b(href|src)\s?=\s?([\'"])(.+?)\2~is',
    function ($matches) {
        return $matches[1] . '=' . $matches[2]
            . relative2absolute('http://www.domain.com/', $matches[3])
            . $matches[2];
    },
    $raw
);
?>
.josh Posted November 10, 2009

there are some limitations to the regex i supplied:

1) it assumes your href attribute is wrapped in double quotes.
2) if there are nested double quotes inside it (escaped, like if there's some js in there...) it's going to break.
3) if you have a relative path with a leading /, the result will be http://www.domain.com//blahblah (which won't actually break the url, but i thought i'd mention it).
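As a rough sketch of how limitations 1 and 3 could be worked around, the variant below accepts single or double quotes and trims a leading slash before prefixing. The absolutize_hrefs() helper, its parameter names, and the sample input are hypothetical and not from the thread. Limitation 2 (escaped quotes inside the value) is not addressed, and values such as mailto: links or #anchors would still get the prefix.

```php
<?php
// Hypothetical helper (not from the thread): prefixes relative href values
// with $base, accepting single or double quotes around the value.
function absolutize_hrefs($html, $base)
{
    // Normalise the base so exactly one slash joins it to the path
    $base = rtrim($base, '/');

    return preg_replace_callback(
        // Group 1: the quote character; group 2: the value, captured only
        // if it does not start with http:// or https://.
        // \1 requires the same quote character to close the value.
        '~\bhref\s*=\s*(["\'])((?!https?://).*?)\1~i',
        function ($m) use ($base) {
            // Strip any leading slash from the target to avoid a double slash
            return 'href=' . $m[1] . $base . '/' . ltrim($m[2], '/') . $m[1];
        },
        $html
    );
}

// Made-up sample input: single-quoted root-relative link, double-quoted
// relative link with another attribute, and an already absolute link.
$raw = "<a href='/pageOne.html'>One</a> "
     . "<a class=\"x\" href=\"pageTwo.html\">Two</a> "
     . "<a href=\"https://example.org/\">Three</a>";

echo absolutize_hrefs($raw, 'http://www.domain.com/'), "\n";
```

The first two links come out pointing at http://www.domain.com/pageOne.html and http://www.domain.com/pageTwo.html with their original quote style, and the third is left unchanged.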