Xeoncross Posted August 6, 2007

I would like to create a function (or two) that takes input, pulls the URL out of each link, and replaces the link with the plain URL. Later in the script I want to change the URL back into a link, but this time with a shortened version of the URL as the link text. So this:

This is my text with a link to <a href="http://mysite.com/mypage.html">This is my site</a>

becomes:

This is my text with a link to http://mysite.com/mypage.html

then finally:

This is my text with a link to <a href="http://mysite.com/mypage.html">http://mysite.com/my...</a>

Right now I have the URL-to-link step taken care of (thanks to php.net):

<?php
function hyperlink($text) {
    // match protocol://address/path/
    $text = ereg_replace("[a-zA-Z]+://([-]*[.]?[a-zA-Z0-9_/-?&%])*", "<a href=\"\\0\">\\0</a>", $text);
    //$text = ereg_replace("[a-zA-Z]+://([-]*[.]?[a-zA-Z0-9_/-?&%])*", "<a href=\"\\0\">". shorten_word('\\0', 5, '...')."</a>", $text);
    // match www.something
    $text = ereg_replace("(^| )(www([-]*[.]?[a-zA-Z0-9_/-?&%])*)", "\\1<a href=\"http://\\2\">\\2</a>", $text);
    return $text;
}
?>

This will turn URLs into links, but how do I start off by pulling the URLs out of links and leaving just the URL? Also, in this code I tried to use a shorten_word() function, but it didn't work, so I commented it out. Does anyone know how I can get something like that to work as well? Something like http://us3.php.net/manual/en/function.substr.php, perhaps?

Link to comment https://forums.phpfreaks.com/topic/63622-pulling-urls-from-links-then-turning-them-back/
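A note on the commented-out line: PHP evaluates shorten_word('\\0', 5, '...') once, on the literal two-character string \0, before ereg_replace() ever sees a match, so the helper never receives the URL. A callback-based replacement avoids that. The following is only a minimal sketch: shorten_word() here is a hypothetical stand-in for the helper mentioned in the post, hyperlink_short() and its simplified URL pattern are assumptions, and preg_replace_callback() is swapped in for ereg_replace(), which was removed in PHP 7.

```php
<?php
// Hypothetical helper: cut $word to $max characters and append $suffix.
function shorten_word($word, $max, $suffix = '...')
{
    return strlen($word) > $max ? substr($word, 0, $max) . $suffix : $word;
}

// Turn bare protocol://host/path URLs into links with shortened link text.
// The pattern is deliberately simple: it stops at whitespace, quotes and angle brackets.
function hyperlink_short($text, $max = 20)
{
    return preg_replace_callback(
        '#[a-zA-Z]+://[^\s<>"\']+#',
        function ($m) use ($max) {
            // The callback runs once per match, so the helper sees the real URL.
            $url   = htmlspecialchars($m[0], ENT_QUOTES, 'UTF-8');
            $label = htmlspecialchars(shorten_word($m[0], $max), ENT_QUOTES, 'UTF-8');
            return '<a href="' . $url . '">' . $label . '</a>';
        },
        $text
    );
}

echo hyperlink_short('This is my text with a link to http://mysite.com/mypage.html');
```

With the default of 20 characters this reproduces the "http://mysite.com/my..." link text from the example. Trailing punctuation directly after a URL would be swallowed by this pattern, so real input may need a stricter expression.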
pplexr Posted August 6, 2007

<?php
$pat = '/(<a href="([\w\W]*?)">([\w\W]*?)<\/a>)/';
$content = ' This is my text with a link to <a href="http://mysite.com/mypage.html">This is my site</a> ';
if (preg_match_all($pat, $content, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // swap the link text ($match[3]) for the URL ($match[2])
        $content = str_replace($match[3], $match[2], $content);
    }
    echo $content;
}
?>

Result:

This is my text with a link to <a href="http://mysite.com/mypage.html">http://mysite.com/mypage.html</a>

Is that what you want?
Xeoncross Posted August 8, 2007 (Author)

Not quite, but it is another good start. The goal is a way to clean XSS and extra stuff out of links that users submit. That is why I want to pull the URL out of the link: then I can clean everything else, and when I am done I can turn the URL back into a link.

<a class="myclass" href="/">This is a link</a>

That made it through the filter, which is bad. I tried fixing the code, but I am having trouble:

<?php
//$pat = '/(<a href="([\w\W]*?)">([\w\W]*?)<\/a>)/';
$pat = '/(<a(.*)href="([\w\W]*?)"(.*)>([\w\W]*?)<\/a>)/';
$content = 'This is my text with a link to <a href="http://mysite.com/mypage.html">This is my site</a>'.
    '<a class="myclass" href="/">This is it</a>';
if (preg_match_all($pat, $content, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        $content = str_replace($match[1], $match[3]. ':::'. $match[5], $content);
    }
    echo $content;
}
?>

I was hoping I could get the above to work like this:

<a href="http://site.com">Read</a> <a class="myclass" href="this.html" target="_blank">this</a> page.

to

http://site.com:::Read this.html:::this page.

which I could change back to

<a href="http://site.com">Read</a> <a href="this.html">this</a> page.

when I was done with the other cleaning functions.

http://www.ilovejackdaniels.com/regular_expressions_cheat_sheet.png
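One probable cause of the trouble is the greedy (.*) groups: with two links on the same line, they can run across the first link's closing tag and into the second link. Restricting the attribute filler to [^>]* and keeping the link text lazy keeps each match inside a single link. The sketch below is an assumption-laden illustration, not a hardened filter; links_to_tokens() is a hypothetical name, and it only handles double-quoted href values.

```php
<?php
// Replace every <a ... href="URL" ...>TEXT</a> with "URL:::TEXT".
// [^>]* cannot cross a tag boundary and .*? stops at the first </a>,
// so two links on one line are matched separately.
function links_to_tokens($html)
{
    return preg_replace(
        '#<a\s[^>]*?href="([^"]*)"[^>]*>(.*?)</a>#is',
        '$1:::$2',
        $html
    );
}

echo links_to_tokens(
    '<a href="http://site.com">Read</a> <a class="myclass" href="this.html" target="_blank">this</a> page.'
);
// prints: http://site.com:::Read this.html:::this page.
```

Because the replacement is built only from the two captured groups, extra attributes such as class and target are dropped along the way.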
effigy Posted August 8, 2007

<pre>
<?php
$tests = array(
    'This is my text with a link to <a href="http://mysite.com/mypage.html">This is my site</a>',
    '<a class="myclass" href="/"><b>This is a link</b></a>',
    '<a href="http://site.com">Read</a> <a class="myclass" href="this.html" target="_blank">this</a> page.',
);
foreach ($tests as $test) {
    // drop every tag except <a>
    $test = strip_tags($test, '<a>');
    preg_match_all('#<a[^>]+href="(.+?)"[^>]*>(.*?)</a>#', $test, $matches, PREG_SET_ORDER);
    print_r($matches);
    foreach ($matches as $match) {
        // rebuild each link from its URL and text only
        echo '<a href="' . $match[1] . '">' . $match[2] . '</a> ';
    }
    echo '<br>';
}
?>
</pre>
Xeoncross Posted August 8, 2007 (Author)

OK, your code really helped. I just reworked it into this:

<?php
$text = 'This is my text with a link to <a href="http://mysite.com/mypage.html">This is my site</a><br />'. "\n".
    '<a class="myclass" href="/"><b>This is a link</b></a><br />'. "\n".
    '<a href="">target<a href="site.html">Link</a> link</a><br />'. "\n".
    '<a href="javascript:alert(\'XSS\')">javascript:alert(\'XSS\')</a>'. "\n".
    '<a href="http://site.com">Read</a> <a class="myclass" href="this.html" target="_blank">this</a> page.<br />';

function clean_links($text) {
    preg_match_all('#(<a[^>]+href="(.+?)"[^>]*>(.*?)</a>)#', $text, $matches, PREG_SET_ORDER);
    //print_r($matches);
    foreach ($matches as $match) {
        $text = str_replace($match[0], '['. htmlentities($match[2]. '::::'. $match[3], ENT_QUOTES, 'UTF-8'). ']', $text);
    }
    return $text;
}

$text = htmlentities(strip_tags(clean_links($text)), ENT_QUOTES, 'UTF-8');

//Now turn our "URL::::LINKTEXT" into links (DOESN'T WORK!)
$text_with_links = ereg_replace("(\[([a-zA-Z0-9_/-?&%:]*):::[a-zA-Z0-9_/-?&%:]*)\])*", "<a href=\"\\2\">\\3</a>", $text);

print "<pre>$text</pre>\n\n\n<br /><br /><pre>$text_with_links</pre>";
?>

However, I am not able to change the links from [URL::::LINKTEXT] back into regular links.
Xeoncross Posted August 8, 2007 (Author)

This works better, but only for the last link:

<?php
$text_with_links = ereg_replace("(\[([A-Za-z0-9\.]*):::.*)\])", "<a href=\"\\2\">\\3</a>", $text_with_links);
?>
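Three things in the two ereg_replace() attempts look likely to get in the way: the parentheses are unbalanced, the replacement refers to a third group (\\3) that the pattern never captures, and the greedy .* can run from the first placeholder to the last closing bracket. A lazy, bracket-aware pattern sidesteps all three. This is a rough sketch, assuming the [URL::::LINKTEXT] placeholder format from the earlier post; tokens_to_links() is a hypothetical name and preg_replace() is used in place of ereg_replace().

```php
<?php
// Turn "[URL::::TEXT]" placeholders back into links.
// [^\]]* can never run past the closing bracket, so each placeholder
// is matched on its own even when several share a line.
function tokens_to_links($text)
{
    return preg_replace(
        '#\[([^\]]*?)::::([^\]]*)\]#',
        '<a href="$1">$2</a>',
        $text
    );
}

echo tokens_to_links('[http://site.com::::Read] [this.html::::this] page.');
// prints: <a href="http://site.com">Read</a> <a href="this.html">this</a> page.
```

This step restores whatever URL the placeholder holds, including a javascript: one, so the URL still has to be checked somewhere before or during this step.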
effigy Posted August 9, 2007

How do you want to handle nested links?
Xeoncross Posted August 15, 2007 (Author)

If there is a nested link, I guess I would just want to delete it (unless there was an easy way to change it back into 2+ links).
Xeoncross Posted September 5, 2007 (Author)

Bump.

So if no one can do the above, how about just checking links with a regex to make sure nothing like this gets by:

<a href="javascript:alert('XSS')">javascript:alert('XSS')</a>
<a href="this.com"><a href="site.com">This</a>site</a>
<a href="site.com" STYLE="background-image: url(javascript:alert('XSS'))">site.com</a>

That is what I wanted to do with the original code anyway.
effigy Posted September 6, 2007

Which links are bad? Is "this.com" wrong because it doesn't have "http://"? Or better yet, which links should be allowed?
Xeoncross Posted September 6, 2007 (Author)

I don't mind. If someone wants to make a link to "/" or "invalid-URL.sud.cudjd.ud.sud.duf.uk", I couldn't care less; my spam filter will catch that. All I want is to keep XSS out of my links, whether by pulling the URL and LINKTEXT out of the post and then turning them back into a link later, or by just using a regex to make sure links don't have extra stuff in them (like the three in my last post). Either way is fine with me.
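Since any relative or ordinary URL is acceptable and only script-bearing ones are not, one option is to check just the scheme of each href against a short allowlist. The sketch below is an assumption, not something proposed in the thread: is_safe_href() is a hypothetical helper, and the list of allowed schemes (http, https, mailto) is only an example.

```php
<?php
// Hypothetical helper: accept relative URLs and a small allowlist of schemes.
function is_safe_href($url)
{
    // Decode entities and drop whitespace/control characters, which can be
    // used to disguise a scheme such as "java<TAB>script:".
    $url = html_entity_decode($url, ENT_QUOTES, 'UTF-8');
    $url = preg_replace('/[\x00-\x20]+/', '', $url);

    // No "scheme:" prefix means a relative URL such as "/" or "this.html".
    if (!preg_match('#^([a-z][a-z0-9+.\-]*):#i', $url, $m)) {
        return true;
    }

    // Otherwise the scheme must be on the allowlist.
    return in_array(strtolower($m[1]), array('http', 'https', 'mailto'), true);
}

var_dump(is_safe_href('/'));                        // bool(true)
var_dump(is_safe_href('http://site.com'));          // bool(true)
var_dump(is_safe_href("javascript:alert('XSS')"));  // bool(false)
```

This is not exhaustive. A scheme-less "host:port" URL such as site.com:8080 is rejected because it looks like a scheme, and the URL should still be escaped with htmlspecialchars() when it is written back into an href.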
effigy Posted September 6, 2007

How about using a pattern to mask valid links, then using strip_tags to get rid of anything you missed? You can use (?!javascript:) to avoid the JS and (?:.(?!style=))+ as a filler between the a tag's attributes.
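The mask-then-strip idea could look roughly like the sketch below. It is one possible reading of the suggestion, not effigy's actual code: mask_and_clean_links() and the [[LINKn]] placeholder format are hypothetical, and instead of the (?:.(?!style=))+ filler it simply rebuilds each link from its URL and text, which drops style and every other attribute.

```php
<?php
// Sketch of "mask valid links, strip everything else, restore the links".
function mask_and_clean_links($html)
{
    $kept = array();

    // Match <a ... href="URL" ...>TEXT</a> where URL does not start with
    // "javascript:" and TEXT contains no nested <a> tag.
    $pattern = '#<a\s[^>]*?href="(?!\s*javascript:)([^"]*)"[^>]*>((?:(?!<a[\s>]).)*?)</a>#is';

    // Step 1: swap each acceptable link for a numbered placeholder.
    $masked = preg_replace_callback($pattern, function ($m) use (&$kept) {
        $kept[] = array($m[1], strip_tags($m[2]));
        return '[[LINK' . (count($kept) - 1) . ']]';
    }, $html);

    // Step 2: remove every remaining tag, then escape what is left.
    $safe = htmlspecialchars(strip_tags($masked), ENT_QUOTES, 'UTF-8');

    // Step 3: rebuild each kept link from its URL and text only.
    return preg_replace_callback('#\[\[LINK(\d+)\]\]#', function ($m) use ($kept) {
        if (!isset($kept[(int) $m[1]])) {
            return $m[0]; // not one of our placeholders, leave it alone
        }
        list($url, $label) = $kept[(int) $m[1]];
        return '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">'
            . htmlspecialchars($label, ENT_QUOTES, 'UTF-8') . '</a>';
    }, $safe);
}

echo mask_and_clean_links(
    '<a href="site.com" STYLE="background-image: url(javascript:alert(\'XSS\'))">site.com</a>'
);
// prints: <a href="site.com">site.com</a>
```

For a nested link, the outer anchor is stripped and the inner one is kept. A javascript: link loses its tag and survives only as escaped text. A negative lookahead on "javascript:" is a denylist, though, and can be evaded by obfuscated schemes, so an allowlist of schemes is the more robust check.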
This topic is now archived and is closed to further replies.