Help parsing HTML


phreek

Howdy,

I've found several sites that talk about using regex to parse HTML, but none are close enough to what I need for me to make them work. (Regex noob here.)

I will be parsing an HTML file that should be valid; if it is not, someone else needs to fix it, so I can assume valid XHTML. What I need is to find every tag with an id attribute and pull the id value out so I can compare it to a list I will have. I do not need access to the HTML between tags; I won't be touching it. At most I may need to echo something inside the element with the id, but it would be positioned at the very beginning, right after the opening tag.

To be as descriptive as possible, here is a small example: the HTML file contains "....  <div id="one"> here is some text</div> ..."

I need the value of the id and the ability to reference the end of the opening tag so I can stick something in before the "here is some text" part.

Regex may not be the best or fastest way to handle this; if not, feel free to let me know. I just decided to try it this way to learn a little more about regular expressions.

Thanks in advance, guys.


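<pre>
<?php
// Sample input: two tags that carry an id attribute.
$str = <<<DATA
<div id="one"> here is some text</div>
<div id="two"> here is some other text</div>
DATA;
// Group 1 = the opening tag up to (not including) its closing ">"; group 2 = the id value.
echo htmlspecialchars(preg_replace_callback('/(<[^>]+id="(.+?)"[^>]*)>/', 'html_id', $str));
// Called once per matched opening tag; returns the replacement text.
function html_id ($matches) {
	$id = $matches[2];
	if ($id === 'one') {
		// Re-emit the tag with something added just before the closing ">".
		return $matches[1] . ' attr="value">';
	}
	else {
		// Leave every other tag unchanged.
		return $matches[0];
	}
}
?>
</pre>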
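
Since the question also asked whether regex is the best tool here: if the input really can be assumed to be valid XHTML, PHP's DOM extension is one alternative worth a look. The following is only a rough sketch under that assumption. The wrapping `<body>` root, the `$known` list, and the `[inserted]` marker text are made up for illustration and are not part of the original question.

```php
<?php
// Sketch only: assumes well-formed XHTML with a single root element.
$xhtml = <<<DATA
<body>
<div id="one"> here is some text</div>
<div id="two"> here is some other text</div>
<p>no id here</p>
</body>
DATA;

$known = ['one'];          // hypothetical list of ids to act on

$doc = new DOMDocument();
$doc->loadXML($xhtml);     // strict XML parsing; invalid markup makes this fail
$xpath = new DOMXPath($doc);

$ids = [];
// '//*[@id]' selects every element that has an id attribute.
foreach ($xpath->query('//*[@id]') as $el) {
    $id = $el->getAttribute('id');
    $ids[] = $id;
    if (in_array($id, $known, true)) {
        // Insert text as the first child, i.e. right after the opening tag.
        $el->insertBefore($doc->createTextNode('[inserted]'), $el->firstChild);
    }
}

$out = $doc->saveXML($doc->documentElement);
print_r($ids);
echo $out;
```

The parser handles attribute order, quoting, and whitespace for you, which is where hand-written patterns tend to get fragile. The trade-off is that `loadXML` rejects markup that is not well-formed, so this only fits if the "valid XHTML" assumption holds.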


Thank you very much.

That pointed me where I needed to go. Just to make sure my understanding of regex is solid so far:

changing

 '/(<[^>]+id="(.+?)"[^>]*)>/' 

  - to -

 '/(<[^>]+id\s*?=\s*?"(.+?)"[^>]*)>/' 

should make it match any number of spaces between id and the equals sign, as well as between the equals sign and the first quotation mark, correct?

Thanks again.
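
One way to check this kind of change is to run both patterns against a few sample tags. The sketch below does that; the sample tags and the variable names `$strict`, `$relaxed`, and `$results` are made up for the test. Note that `\s` matches any whitespace character (tabs and newlines too), not only spaces, and that the lazy `\s*?` should match the same text as a plain `\s*` here, because the token that follows is fixed.

```php
<?php
// The two patterns from the post, compared on sample opening tags.
$strict  = '/(<[^>]+id="(.+?)"[^>]*)>/';
$relaxed = '/(<[^>]+id\s*?=\s*?"(.+?)"[^>]*)>/';

// Hypothetical sample tags: no whitespace, spaces, and tab/newline around "=".
$samples = [
    '<div id="one">',
    '<div id = "two">',
    "<div id\t=\n\"three\">",
];

$results = [];
foreach ($samples as $tag) {
    // Record the captured id value for each pattern, or null when there is no match.
    $s = preg_match($strict, $tag, $m1) ? $m1[2] : null;
    $r = preg_match($relaxed, $tag, $m2) ? $m2[2] : null;
    $results[] = [$s, $r];
}

var_dump($results);
```

With these samples, the original pattern only captures an id from the first tag, while the changed pattern captures one from all three.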
