regex to parse html code


nasser bahaj


Hi all,

I am developing code that adds links to certain words.

The code looks like this:

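$html = '<b> download firefox program now from this link <a href="http://www.download8.com/firefox">download firefox program now</a></b><br>';
echo $html;

// words to link, and the URL each word should point to
$words2 = array("firefox", "program");
$links2 = array("http://www.firefox.com", "http://www.soft.com");

$words = array();
$links = array();
for ($i = 0; $i < count($words2); $i++) {
    // match the word surrounded by spaces, case-insensitively
    $words[] = "/ " . preg_quote($words2[$i], '/') . " /i";
    $links[] = " <a href='$links2[$i]' style='color:red;'>$words2[$i]</a> ";
}

// 4th argument of preg_replace is the limit: at most 5 replacements per pattern
$limit = 5;
$html = preg_replace($words, $links, $html, $limit);
echo $html;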

 

Now, this code works, but if one of the words ("firefox", "program") is already inside a link, it breaks the original link, because it puts a link inside a link.

 

For example, take this HTML code:

$html = '<b> download firefox program now from this link <a href="http://www.download8.com/firefox">download firefox program now</a></b><br>';

 

I want to put the URL "http://www.firefox.com" on the word "firefox", but first I want to check whether the word is already a link (already inside <a></a> tags), so that existing links in the code are not damaged.

I hope you can help.

Thank you.


Based on your current code, you could use a simple negative lookbehind assertion to check that the word isn't preceded by www. or http://. Note that I also wrapped the word in the \b escape sequence so it only matches at word boundaries, so that 'programmer' wouldn't match.

"#(?<!www\.|http://)\b{$word}\b#"

Of course, I based this on your code, which doesn't actually use anchor links; a more complex system would need a different solution.
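To illustrate the lookbehind idea, here is a minimal PHP sketch. The sample text and the variables $word, $url, $text and $result are made up for illustration; preg_quote() is added so that a word containing regex metacharacters is matched literally, and the i modifier makes the match case-insensitive.

```php
<?php
// Illustrative values, not taken from the original code.
$word = 'firefox';
$url  = 'http://www.firefox.com';

// Negative lookbehind: skip the word if "www." or "http://" sits right before it.
// \b on both sides means "firefoxes" or "myfirefox" are not matched.
$pattern = '#(?<!www\.|http://)\b' . preg_quote($word, '#') . '\b#i';

$text = 'Get firefox from www.firefox.com, not from a firefoxes mirror.';

// $0 is the matched text, so the original capitalisation is kept.
$result = preg_replace($pattern, '<a href="' . $url . '">$0</a>', $text);
echo $result, "\n";
```

This only protects a word that directly follows www. or http://. In the example HTML from the question the href is http://www.download8.com/firefox, where "firefox" follows "com/", so this pattern would still match it there, and it would also match the word inside the existing link text.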


Thank you, Mr Cags, for your reply.

The problem is still unsolved. Here is a clearer example.

With this code:

$html = '<b> download firefox program now from this link <a href="http://www.download8.com/firefox">download firefox program now</a></b><br>';

the word "firefox" appears three times.

I want to turn these words into the anchor link "<a href='http://www.firefox.com'>firefox</a>", but not every "firefox", because some of them are already inside an anchor link, like this one: "<a href="http://www.download8.com/firefox">download firefox program now</a>". If I put a link on that word, it will destroy the existing link.

So the solution to this problem is to replace a word with a link only if the word is not already inside an anchor link.

Thanks
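A single regex gets fragile as the HTML grows more complex. As an alternative approach (not the one used in this thread), here is a sketch using PHP's DOM extension: it parses the fragment, visits only text nodes that have no <a> ancestor, and wraps the word in a new link there. The function name linkWordOutsideAnchors() is hypothetical, and the sketch assumes the input is an ASCII or ISO-8859-1 HTML fragment (DOMDocument::loadHTML() does not detect UTF-8 without a charset hint) and that the installed libxml supports LIBXML_HTML_NOIMPLIED.

```php
<?php
// Hypothetical helper: link $word to $url only in text that is not inside an <a>.
function linkWordOutsideAnchors(string $html, string $word, string $url): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings instead of printing them (loadHTML is noisy).
    libxml_use_internal_errors(true);
    // Wrap the fragment in a <div> and stop libxml from adding <html>, <body>
    // and a doctype, so the fragment can be serialised back on its own.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Snapshot every text node that has no <a> ancestor before changing the tree.
    $textNodes = iterator_to_array($xpath->query('//text()[not(ancestor::a)]'));
    $pattern = '/\b(' . preg_quote($word, '/') . ')\b/i';

    foreach ($textNodes as $node) {
        // Split around the word; the captured word lands at odd indexes.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // the word does not occur in this text node
        }
        $parent = $node->parentNode;
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $link = $doc->createElement('a');
                $link->setAttribute('href', $url);
                $link->appendChild($doc->createTextNode($part));
                $parent->insertBefore($link, $node);
            } elseif ($part !== '') {
                $parent->insertBefore($doc->createTextNode($part), $node);
            }
        }
        // The original text node has been rebuilt in front of itself; drop it.
        $parent->removeChild($node);
    }

    // Serialise the children of the wrapper <div> back to a string.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$html = '<b> download firefox program now from this link <a href="http://www.download8.com/firefox">download firefox program now</a></b><br>';
$result = linkWordOutsideAnchors($html, 'firefox', 'http://www.firefox.com');
echo $result, "\n";
```

For the example HTML, only the plain-text "firefox" gets a new link; the one in the href attribute and the one in the existing link text are left alone, because attributes are not text nodes and the link text has an <a> ancestor.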

 


As I said in my previous post, a more complex case requires a more complex solution. Perhaps something like this would be more appropriate:

 

'#\bfirefox\b(?![^<]*>|[^<]*</a)#'
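To show what the lookahead does, here is a minimal PHP sketch that applies the pattern above to the example HTML from the question (with the closing </b> tag fixed). The variable names and the replacement link are illustrative; the comments explain each branch of the negative lookahead.

```php
<?php
$html = '<b> download firefox program now from this link <a href="http://www.download8.com/firefox">download firefox program now</a></b><br>';

// \bfirefox\b  : the whole word "firefox"
// (?![^<]*>    : ...not followed by text that reaches ">" before any "<"
//                (the word is inside a tag, e.g. the href attribute)
// |[^<]*</a)   : ...nor by text that reaches "</a" before any other "<"
//                (the word is inside an existing link's text)
$pattern = '#\bfirefox\b(?![^<]*>|[^<]*</a)#';

// $0 is the matched word; single quotes keep PHP from touching "$0".
$result = preg_replace($pattern, '<a href="http://www.firefox.com">$0</a>', $html);
echo $result, "\n";
```

Only the first "firefox" is linked. The lookahead only inspects text up to the next "<", so it assumes the link text contains no other tags: in something like <a href="...">get <b>firefox</b></a> the word would still be matched.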

 

 

