ryanh_106 Posted February 29, 2008

Hi, I am using preg_match_all() to try to split the source for a hyperlink into the web address and the text, e.g.

<a href="test.php">Test Page</a>
link: test.php
text: Test Page

Can anyone help? Links may contain title="" attributes, so I need to account for those. I originally had something like this:

preg_match_all('/<a href="(.*)"( title=(.*))?>(.*)<\/a>/', $newhtml, $matches); // Find all links

but this caused problems because if there was a title, the index of the link text in $matches moved up one, so I couldn't extract it. So I just did this:

preg_match_all('/<a (.*)>(.*)<\/a>/', $newhtml, $matches); // Find all links

which was fine, as I only need to test the link address, I don't need to use it, so if it had title="" at the end it didn't matter. But this doesn't work if there is HTML in the link text (such as <strong>).

How can I do this effectively? Hope I explained that well enough. Please help, I'm getting really fed up with this!

Cheers
effigy Posted February 29, 2008

"this doesnt work if there is HTML in the link text (such as <strong>)"

Use htmlspecialchars or strip_tags?
ryanh_106 (Author) Posted February 29, 2008

Thanks for the assistance, but unfortunately this would also take out the <a> tags that I need. I ended up solving this by ditching regex altogether, as follows:

- Split the HTML by "<a"
- Split each result by "</a>"
- Finally split by ">"

This left me roughly with the link to be tested and the text to be dumped to screen. Pretty messy, but it works and this is urgent! Thanks for the help anyway!
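The three splitting steps described above could look roughly like the sketch below. The function name split_links() and the shape of the returned array are assumptions made for illustration; the poster's actual code was not shown. The whitespace check is also an addition here, so that tags such as <abbr> are not mistaken for links.

```php
<?php
// Hypothetical helper illustrating the split-based approach from the post.
// The name and return format are assumptions, not the poster's code.
function split_links(string $html): array
{
    $links = [];

    // Step 1: split by "<a". The first chunk is whatever precedes the first link.
    $chunks = explode('<a', $html);
    array_shift($chunks);

    foreach ($chunks as $chunk) {
        // Added guard: a real <a ...> tag has whitespace right after "<a",
        // which filters out <abbr>, <address> and similar tags.
        if (!preg_match('/^\s/', $chunk)) {
            continue;
        }

        // Step 2: split by "</a>" and keep only the part before it.
        $beforeClose = explode('</a>', $chunk, 2)[0];

        // Step 3: split once by ">" to separate attributes from link text.
        $parts = explode('>', $beforeClose, 2);
        if (count($parts) < 2) {
            continue; // malformed fragment, skip it
        }

        $links[] = [
            'attributes' => trim($parts[0]), // e.g. href="..." title="..."
            'text'       => $parts[1],       // may still contain inner HTML
        ];
    }

    return $links;
}
```

Splitting only once on ">" (the limit argument of 2) is what keeps inner tags such as <strong> inside the link text. Note that this sketch is case-sensitive, so it would miss tags written as <A ...>.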
effigy Posted February 29, 2008

So there are <a> tags within <a> tags?
dsaba Posted February 29, 2008

<a href..> tags can surround more than just plain text, including other HTML tags and images. From your specifications, here's the regex:

~<a(?:(?!href=).)*href=(['"])(?P<href>(?:(?!\1).)*)['"][^>]*>(?P<text>[^<]*)</a>~is

http://nancywalshee03.freehostia.com/regextester/regex_tester.php?seeSaved=cd4p3rtz
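A minimal sketch of how a pattern like the one above might be used with preg_match_all(), assuming each negative lookahead is wrapped in a non-capturing group, as in (?:(?!href=).)*. One deliberate change is made here and is not part of the posted pattern: the text group is written as a lazy .*? instead of [^<]*, so that link text containing tags such as <strong> is still captured. The function name extract_links() is hypothetical.

```php
<?php
// Sketch only. The lazy (?P<text>.*?) group is a substitution made for this
// example so nested tags in the link text match; the posted pattern used [^<]*.
$pattern = '~<a(?:(?!href=).)*href=([\'"])(?P<href>(?:(?!\1).)*)[\'"][^>]*>(?P<text>.*?)</a>~is';

// Hypothetical helper: returns one ['href' => ..., 'text' => ...] entry per link.
function extract_links(string $html, string $pattern): array
{
    $links = [];

    // PREG_SET_ORDER groups the captures per match, so each link is one array
    // and the named groups can be read regardless of optional attributes.
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        $links[] = ['href' => $m['href'], 'text' => $m['text']];
    }

    return $links;
}
```

Reading the captures by name ('href', 'text') rather than by number sidesteps the shifting-index problem from the first post, since an optional title attribute no longer changes where the link text ends up.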