
Using REGEX to harvest links in HTML


Rottingham


Hello

 

I have a replace_links() function that searches the HTML source and harvests all the links into an array. I'm finishing some eNewsletter software and have a few last bugs to work out.
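For comparison, here is a minimal sketch of the same harvesting step using PHP's DOM extension (`DOMDocument`) instead of a regex. This is a swapped-in technique, not the poster's code. The function name `harvest_links_dom()` is hypothetical, and the sketch assumes the `dom` extension is enabled. Because the HTML parser reads quoted attribute values itself, spaces inside an `href` need no special handling.

```php
<?php
// Hypothetical helper: collect every <a href="..."> with the DOM parser instead of a regex.
function harvest_links_dom(string $html): array
{
    $doc = new DOMDocument();
    // Keep libxml warnings about imperfect markup out of the page output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        // Skip anchors such as <a name="top"> that have no href.
        if ($a->hasAttribute('href')) {
            $links[] = ['href' => $a->getAttribute('href'), 'text' => $a->textContent];
        }
    }
    return $links;
}

print_r(harvest_links_dom('<a href="mailto:someone@example.com?subject=Thanks for visiting">Mail me</a>'));
```

One caveat: without a charset declaration in the markup, `loadHTML()` treats the input as ISO-8859-1, so UTF-8 templates may need a `<meta charset>` tag before parsing.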

 

My regex is:

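// $template_source holds the newsletter's HTML source.
$pattern = "/<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>/siU";
if ( !preg_match_all( $pattern, $template_source, $matches, PREG_SET_ORDER ) ) {
	// Escape the pattern so its "<a" is displayed instead of being parsed as a tag.
	echo "Failed to find any matches using regex: " . htmlspecialchars( $pattern ) . ".<br>";
	return;
}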

 

This works beautifully! I then rewrite each link to point to a click tracker and insert it back into the HTML.
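The rewrite step described above can be sketched with `preg_replace_callback()`, which passes each matched `href` value to a PHP function and splices the returned string back into the HTML. This is a minimal sketch, not the original code: `add_click_tracking()`, the tracker URL, and its `url` query parameter are hypothetical placeholders. The pattern reads a quoted value up to its matching quote, so spaces inside the value are kept; unquoted `href` values are not handled here.

```php
<?php
// Hypothetical helper: route every quoted href on an <a> tag through a click tracker.
function add_click_tracking(string $html, string $trackerBase): string
{
    // Group 1: "<a ... href=", group 2: the quote character, group 3: the URL.
    // The \2 backreference ends the value at the matching quote, so spaces are allowed.
    $pattern = '/(<a\s[^>]*?\bhref\s*=\s*)(["\'])(.*?)\2/si';

    $result = preg_replace_callback($pattern, function (array $m) use ($trackerBase): string {
        // Turn entities such as &amp; back into plain characters before encoding.
        $url = html_entity_decode($m[3], ENT_QUOTES | ENT_HTML5);
        // "url" is an assumed parameter name for the (hypothetical) tracker script.
        $tracked = $trackerBase . '?url=' . rawurlencode($url);
        // Re-emit the attribute with its original quote, escaped for HTML.
        return $m[1] . $m[2] . htmlspecialchars($tracked, ENT_QUOTES) . $m[2];
    }, $html);

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

$html = '<a href="mailto:someone@example.com?subject=Thanks for visiting">Mail me</a>';
echo add_click_tracking($html, 'http://example.com/track.php'), "\n";
```

Encoding the whole original URL with `rawurlencode()` keeps characters such as `?`, `&` and spaces from being confused with the tracker's own query string.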

 

My dilemma is that if the link contains any whitespace, the regex fails. For example, with a mailto link like this:

 

<a href="mailto:[email protected]?subject=Thanks for visiting">Mail me</a>

 

The regex fails on that link. If I replace the space with any other character, such as an underscore, it works fine. How can I allow whitespace inside the link with that regex? I'm not very good at this stuff!
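The space problem comes from the character class `[^\" >]` in the original pattern: it excludes spaces, so the captured `href` value can never contain one. A minimal sketch of one fix, assuming ordinary `<a ... href=...>` markup, is to read a quoted value up to its closing quote and only stop at whitespace when the value is unquoted. The function name `harvest_links()` is hypothetical. The pattern uses PCRE's branch reset group `(?|...)`, so all three href forms land in capture group 1, and it drops the `U` (ungreedy) modifier in favour of explicit lazy `.*?`.

```php
<?php
// Hypothetical helper: match <a> tags whose href is double-quoted, single-quoted, or unquoted.
// Quoted values may contain spaces; unquoted values stop at whitespace or '>'.
function harvest_links(string $html): array
{
    $pattern = '/<a\s[^>]*?\bhref\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s>]+))[^>]*>(.*?)<\/a>/si';
    $links = [];
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // $m[1] is the href value (the branch reset numbers every alternative as group 1),
            // $m[2] is the link text.
            $links[] = ['href' => $m[1], 'text' => $m[2]];
        }
    }
    return $links;
}

// someone@example.com is a placeholder address.
$html = '<p><a href="mailto:someone@example.com?subject=Thanks for visiting">Mail me</a> '
      . "<a href='http://example.com/a page.html'>Single</a> "
      . '<a href=http://example.com/>Bare</a></p>';

print_r(harvest_links($html));
```

A regex like this still assumes reasonably well-formed markup; an `href` inside an HTML comment, for instance, would be matched too.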

 

Thanks for any help.

https://forums.phpfreaks.com/topic/223102-using-regex-to-harvest-links-in-html/

Archived

This topic is now archived and is closed to further replies.
