regex: replace links captions by url


gdfhghjdfghgfhf

Hello,

 

I have a string that looks like this (a lot of different links with different attributes):

 

<a href="http://www.mediafire.com/something">http://www.mediafire.com/something</a>
text text text text text text text text text text text 
<a href="http://www.somelink.com">
text text text text text text text 
<a href="http://megaupload.com/something/blabla">Some link without WWW</a>
text text text text text text text text text text text text text text 
<a title="somename" href="http://www.rapidshare.com/download" style="color:#000000" target="_blank">some confusing link</a>
...........more text here..........
<a href="http://www.microsoft.com">
....more text...........
<a href="http://www.4shared.com/download.php?file=myfile"><img src="a_link_with_an_image.gif"></a>

 

I'm trying to replace each link's caption with the URL of the link, BUT only for links pointing to these sites:

mediafire.com, megaupload.com, rapidshare.com and 4shared.com

 

Additionally, I want to wrap each replaced link in [DL][/DL] tags.

 

The links can come with or without WWW and with or without additional attributes (like style, target, title, etc.).

 

Here is the result I would want with the above example:

[DL]<a href="http://www.mediafire.com/something">http://www.mediafire.com/something</a>[/DL]
text text text text text text text text text text text 
<a href="http://www.somelink.com">link</a>
text text text text text text text 
[DL]<a href="http://megaupload.com/something/blabla">http://megaupload.com/something/blabla</a>[/DL]
text text text text text text text text text text text text text text 
[DL]<a title="somename" href="http://www.rapidshare.com/download" style="color:#000000" target="_blank">http://www.rapidshare.com/download</a>[/DL]
...........more text here..........
<a href="http://www.microsoft.com">
....more text...........
[DL]<a href="http://www.4shared.com/download.php?file=myfile">http://www.4shared.com/download.php?file=myfile</a>[/DL]

 

Thanks a LOT to whoever can help me with this! I'm a newbie to regex and I gave up after a lot of searches :(


Hmm... difficult to say. Is that input and output definitely correct, or is it an example you quickly knocked up? The reason I ask is that line 3 of the input is not a valid anchor link, nor is the microsoft one further down. With regex this can make a big difference, since the obvious way to match your pattern will require searching for the '</a>' part.

<?php
$test = '<a href="http://www.mediafire.com/something">http://www.mediafire.com/something</a>
text text text text text text text text text text text 
<a href="http://www.somelink.com">
text text text text text text text 
<a href="http://megaupload.com/something/blabla">Some link without WWW</a>
text text text text text text text text text text text text text text 
<a title="somename" href="http://www.rapidshare.com/download" style="color:#000000" target="_blank">some confusing link</a>
...........more text here..........
<a href="http://www.microsoft.com">
....more text...........
<a href="http://www.4shared.com/download.php?file=myfile"><img src="a_link_with_an_image.gif"></a>';
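// $1 = the opening <a ...> tag, $2 = the href value; the old caption between the tags is dropped
$pattern = '#(<a\s[^>]*href="([^"]*(mediafire\.com|rapidshare\.com|megaupload\.com|4shared\.com)[^"]*)"[^>]*>).*?</a>#';
echo preg_replace($pattern, '[DL]$1$2</a>[/DL]', $test);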
?>

Sorry, my bad: \1 assumed ("|\') was pattern 1; I didn't see the opening bracket at the start. It should have been \2, which means you will also need to change the $2 in the replace pattern to $3, as it's been moved up one. Looking at the pattern, though, I'm not convinced it will work: as there are no lazy quantifiers, I think it may grab multiple URLs in one.
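
To illustrate the concern about lazy quantifiers, here is a small sketch that runs a greedy and a lazy version of a simplified anchor pattern over two links on one line. The sample string and the variable names are invented for the illustration; they are not taken from the pattern discussed above.

```php
<?php
// Hypothetical sample: two anchors on the same line.
$html = '<a href="http://a.example/1">one</a> and <a href="http://b.example/2">two</a>';

// Greedy: .* runs on to the LAST </a>, so both links land in a single match.
preg_match_all('#<a\s[^>]*>.*</a>#', $html, $greedy);

// Lazy: .*? stops at the FIRST </a>, so each link is matched on its own.
preg_match_all('#<a\s[^>]*>.*?</a>#', $html, $lazy);

echo count($greedy[0]), "\n"; // 1
echo count($lazy[0]), "\n";   // 2
```

With the greedy version, a replacement would turn everything from the first opening tag to the last closing tag into one link, which is the "multiple URLs in one" problem.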

I would probably parse each URL with parse_url(), for more reliable results:

 

<?php
$str = '<a href="http://www.mediafire.com/something">http://www.mediafire.com/something</a>
text text text text text text text text text text text
<a href="http://www.somelink.com"></a>
text text text text text text text
<a href="http://megaupload.com/something/blabla">Some link without WWW</a>
text text text text text text text text text text text text text text
<a title="somename" href="http://www.rapidshare.com/download" style="color:#000000" target="_blank">some confusing link</a>
...........more text here..........
<a href="http://www.microsoft.com"></a>
....more text...........
<a href="http://www.4shared.com/download.php?file=myfile"><img src="a_link_with_an_image.gif"></a>';
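function _callback($matches) {
    // $matches[0] is the whole anchor, $matches[2] the href value
    $domains = array('mediafire.com', 'megaupload.com', 'rapidshare.com', '4shared.com');
    $domain = parse_url($matches[2], PHP_URL_HOST);
    if (!is_string($domain)) {
        // relative or malformed URL: no host to check, so leave the link alone
        return $matches[0];
    }
    // remove any subdomains: keep only the last two labels of the host
    $domain = implode('.', array_slice(explode('.', strtolower($domain)), -2));
    if (in_array($domain, $domains)) {
        $matches[0] = "[DL]{$matches[0]}[/DL]";
    }
    return $matches[0];
}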
$str = preg_replace_callback('~<a\b[^>]+\bhref\s?=\s?([\'"])(.+?)\1[^>]*>.*?</a>~is', '_callback', $str);
echo $str;
?>
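
The callback above wraps matching links in [DL] tags but leaves the caption as it is. A possible variant that also swaps the caption for the URL, as the original question asks, could look like the sketch below. It assumes an extra capture group around the opening tag; the name dlCaptionCallback and the short sample string are made up for the example.

```php
<?php
// Sketch only: dlCaptionCallback is a hypothetical name, not from the thread.
function dlCaptionCallback($matches) {
    $domains = array('mediafire.com', 'megaupload.com', 'rapidshare.com', '4shared.com');
    // $matches[1] = opening tag, $matches[2] = quote character, $matches[3] = href value
    $host = parse_url($matches[3], PHP_URL_HOST);
    if (!is_string($host)) {
        return $matches[0]; // no usable host: leave the link untouched
    }
    // keep only the last two labels, e.g. www.mediafire.com -> mediafire.com
    $domain = implode('.', array_slice(explode('.', strtolower($host)), -2));
    if (!in_array($domain, $domains)) {
        return $matches[0];
    }
    // rebuild the link with the URL as its caption and wrap it in [DL] tags
    return "[DL]{$matches[1]}{$matches[3]}</a>[/DL]";
}

// Same idea as the pattern above, with the opening tag captured as group 1.
$pattern = '~(<a\b[^>]+\bhref\s?=\s?([\'"])(.+?)\2[^>]*>).*?</a>~is';
$html = '<a href="http://megaupload.com/something/blabla">Some link without WWW</a> '
      . '<a href="http://www.somelink.com">link</a>';

$result = preg_replace_callback($pattern, 'dlCaptionCallback', $html);
echo $result, "\n";
```

Only the megaupload link is rewritten here; the somelink.com anchor is returned unchanged because its host is not in the list.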

 

If you add a domain with a double TLD (e.g. .co.uk) to the $domains array, you would have to rewrite the code.
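
One possible way around that limitation, sketched here on the assumption that a plain suffix comparison is good enough for your list: instead of cutting the host down to its last two parts, check whether the host equals an allowed domain or ends with a dot plus that domain. The function name isAllowedHost and the example.co.uk entry are made up for illustration.

```php
<?php
// Hypothetical helper (not from the thread): true when $host is one of the
// allowed domains or a subdomain of one of them.
function isAllowedHost($host, array $domains) {
    if (!is_string($host) || $host === '') {
        return false; // parse_url() found no host
    }
    $host = strtolower($host);
    foreach ($domains as $domain) {
        $suffix = '.' . $domain;
        if ($host === $domain || substr($host, -strlen($suffix)) === $suffix) {
            return true;
        }
    }
    return false;
}

// example.co.uk is an invented entry standing in for a double-TLD domain.
$domains = array('mediafire.com', 'example.co.uk');

var_dump(isAllowedHost(parse_url('http://www.mediafire.com/something', PHP_URL_HOST), $domains)); // true
var_dump(isAllowedHost(parse_url('http://files.example.co.uk/x', PHP_URL_HOST), $domains));       // true
var_dump(isAllowedHost(parse_url('http://notmediafire.com/x', PHP_URL_HOST), $domains));          // false
```

Inside the callback, the in_array() check would then be replaced by a call to this helper with the host returned by parse_url().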
