phreek Posted April 8, 2008

Howdy,

I've found several sites that talk about using regex to parse HTML, but none are close enough to what I need for me to make them work (regex noob here).

I will be parsing an HTML file that should be valid; if it is not, someone else needs to fix it, so I can assume valid XHTML. What I need is to find every tag with an id attribute and pull the id value out so I can compare it to a list I will have. I do not need access to the HTML between tags; I won't be touching it. At most I may need to echo something inside the element with the id, but it would be positioned at the very beginning of the element's content.

To be as descriptive as possible, here is a small example. The HTML file contains:

".... <div id="one"> here is some text</div> ..."

I need the value of the id and the ability to reference the end of the opening tag, so I can stick something in before the "here is some text" part.

Regex may not be the best or fastest way to handle this; if not, feel free to let me know. I just decided to try it this way to learn a little more about regular expressions.

Thanks in advance, guys.
effigy Posted April 8, 2008

<pre>
<?php
$str = <<<DATA
<div id="one"> here is some text</div>
<div id="two"> here is some other text</div>
DATA;

// Group 1 is the opening tag without its closing ">"; group 2 is the id value.
echo htmlspecialchars(preg_replace_callback('/(<[^>]+id="(.+?)"[^>]*)>/', 'html_id', $str));

// Add an attribute to the tag whose id is "one"; return every other tag unchanged.
function html_id ($matches) {
    $id = $matches[2];
    if ($id == 'one') {
        return $matches[1] . ' attr="value">';
    }
    else {
        return $matches[0];
    }
}
?>
</pre>
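For the "compare that to a list" part of the question, here is a minimal sketch that reuses the pattern from the post above with preg_match_all. The sample markup and the $wanted list are made up for illustration; they are not from the original poster's data.

```php
<?php
// Hypothetical sample input and id list, for illustration only.
$html = <<<DATA
<div id="one"> here is some text</div>
<p class="x">no id here</p>
<div id="two"> here is some other text</div>
DATA;

$wanted = array('one', 'three');

// Same pattern as in the post above; group 2 captures the id value.
preg_match_all('/(<[^>]+id="(.+?)"[^>]*)>/', $html, $matches);

$found   = $matches[2];                                     // ids present in the markup
$present = array_values(array_intersect($wanted, $found));  // wanted ids that were found
$missing = array_values(array_diff($wanted, $found));       // wanted ids that were not found

print_r($found);
print_r($present);
print_r($missing);
```

With this input, $found should hold the ids "one" and "two", $present only "one", and $missing only "three". Note that the pattern looks for the letters id followed by =", so it would also pick up attribute names that merely end in "id"; that may or may not matter for the files in question.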
phreek Posted April 8, 2008 (Author)

Thank you very much. That pointed me where I needed to go. Just to make sure my understanding of regex is solid so far: changing

'/(<[^>]+id="(.+?)"[^>]*)>/'

to

'/(<[^>]+id\s*?=\s*?"(.+?)"[^>]*)>/'

should make it match any amount of whitespace between id and the equals sign, as well as between the equals sign and the opening quotation mark. Correct?

Thanks again.
effigy Posted April 8, 2008

Correct, but there's no need for laziness in this case; just use \s*. The \s is specific enough to be safe for greedy matching: it cannot match the = or the quotation mark that follows, so a greedy \s* stops in the same place the lazy version would.
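To illustrate the greedy \s* version, here is a rough sketch that also covers the original request to place something right after the opening tag. The input, the $inserts map, and the bracketed marker text are assumptions made up for this example, and the anonymous function requires PHP 5.3 or later.

```php
<?php
// Hypothetical input: ids written with varying whitespace around "=".
$html = <<<DATA
<div id="one"> here is some text</div>
<div id = "two"> here is some other text</div>
<div id  ="three"> untouched</div>
DATA;

// Hypothetical map of id => text to place right after the opening tag.
$inserts = array('one' => '[first]', 'two' => '[second]');

$result = preg_replace_callback(
    // Greedy \s* on both sides of "=", as suggested in the post above.
    '/(<[^>]+id\s*=\s*"(.+?)"[^>]*)>/',
    function ($m) use ($inserts) {
        $id = $m[2];
        // Re-emit the whole opening tag, then append the text if this id is listed.
        return isset($inserts[$id]) ? $m[0] . $inserts[$id] : $m[0];
    },
    $html
);

echo $result;
```

All three spacing variants are matched by the one pattern; only the tags whose ids appear in $inserts get text added, and the third tag is returned unchanged.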