kyleabaker Posted December 4, 2013

I'm trying to write a regex that takes an input string and returns each anchor tag it finds. For example, the following string should return 3 results:

This is an test <a href="link1.html" data-modal="sdfdsf87ds87fdsf8bds8fb">string</a> example to <a href="link2.html">parse</a> and return <a class="someclass" href="link3.html">some</a> anchor tag results.

My expected results are:

1. <a href="link1.html" data-modal="sdfdsf87ds87fdsf8bds8fb">string</a>
2. <a href="link2.html">parse</a>
3. <a class="someclass" href="link3.html">some</a>

I'm testing this at http://regexpal.com/, and the problem is that my regex ( <a (.+)[^<]*</a> ) selects everything from the start of the first anchor tag to the end of the last anchor tag. I can't figure out how to split these apart. Any suggestions so that it returns each tag as a separate result in the match array?

Thanks in advance!

dalecosp Posted December 4, 2013

Try spaces, newlines, etc.? Or, perhaps, use something like DOMDocument to read the HTML instead of a regexp.

requinix Posted December 4, 2013

Quoting dalecosp: "Or, perhaps, use something like DOMDocument to read the HTML instead of a regexp."

That. Very that. Not only are regular expressions the wrong tool for dealing with HTML, DOMDocument is actually better at doing what you want. See getElementsByTagName.

Edited December 4, 2013 by requinix

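For readers who want to try the DOMDocument route suggested above, here is a minimal sketch. It assumes the input is the sample string from the first post; the variable names ($html, $doc, $anchors, $hrefs) are illustrative and not from the thread.

```php
<?php
// Sample input from the original question.
$html = 'This is an test <a href="link1.html" data-modal="sdfdsf87ds87fdsf8bds8fb">string</a> '
      . 'example to <a href="link2.html">parse</a> and return '
      . '<a class="someclass" href="link3.html">some</a> anchor tag results.';

// Collect parser warnings internally instead of printing them,
// since real-world HTML is often not well formed.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

$anchors = []; // serialized markup of each <a> element
$hrefs   = []; // href attribute of each <a> element

foreach ($doc->getElementsByTagName('a') as $a) {
    $anchors[] = $doc->saveHTML($a);      // the element as an HTML string
    $hrefs[]   = $a->getAttribute('href');
}

libxml_clear_errors();
```

With this approach each anchor is a node, so attributes and link text can be read directly (for example with getAttribute or the textContent property) rather than being picked out of the matched string afterwards.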
dalecosp Posted December 4, 2013

Yeah, that's what I use; he didn't say if it was a requirement ... never can tell when people are doing coursework ;)

.josh Posted December 5, 2013 (Solution)

I agree that in general a DOM parser is the better choice for DOM parsing and manipulation, but regex isn't a bad alternative if what you are looking for is regular. If whole anchor tags are all you want, this regex should work ($anchors will hold the results):

preg_match_all('~<a\s+.*?</a>~is',$string,$anchors);

If, however, you want to parse individual attributes or just the "text" of the anchor, then a DOM parser would definitely be better.

Since you are trying this in a regex tester: <a\s+.*?</a> is the actual pattern, and i and s are modifiers. The i modifier makes the match case-insensitive, and the s modifier lets the dot match newline characters, in case the "text" inside the anchor tags contains newlines. In other words, make sure to set those flags in the tester as well.

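The reason the original pattern grabbed everything is that (.+) is greedy: it consumes as much as it can and only backtracks far enough to reach the last </a>. The lazy .*? in the accepted pattern instead stops at the first </a> it can reach. A small sketch comparing the two on the sample string from the first post (the $greedy variable name is illustrative, not from the thread):

```php
<?php
// Sample input from the original question.
$string = 'This is an test <a href="link1.html" data-modal="sdfdsf87ds87fdsf8bds8fb">string</a> '
        . 'example to <a href="link2.html">parse</a> and return '
        . '<a class="someclass" href="link3.html">some</a> anchor tag results.';

// Original attempt: the greedy (.+) runs to the end of the string and
// backtracks only to the LAST </a>, so everything becomes one match.
preg_match_all('~<a (.+)[^<]*</a>~', $string, $greedy);

// Accepted answer: the lazy .*? stops at the FIRST </a> after each <a,
// so each anchor becomes its own match.
preg_match_all('~<a\s+.*?</a>~is', $string, $anchors);

echo count($greedy[0]), " match with the greedy pattern\n";
echo count($anchors[0]), " matches with the lazy pattern\n";
print_r($anchors[0]);
```

Full matches are in index 0 of the result array, so $anchors[0] holds the three anchor strings. Note that this still assumes anchors are not nested and that </a> does not appear inside an attribute value.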