
Regex to match and return all anchor tags in a string


kyleabaker


I'm trying to write a regex that takes an input string and returns each anchor tag it finds as a separate result. For example, for the following string, it should return 3 results:

 

 

This is an test <a href="link1.html" data-modal="sdfdsf87ds87fdsf8bds8fb">string</a> example to <a href="link2.html">parse</a> and return <a class="someclass" href="link3.html">some</a> anchor tag results.

 

My expected results are:

1. <a href="link1.html" data-modal="sdfdsf87ds87fdsf8bds8fb">string</a>

2. <a href="link2.html">parse</a>

3. <a class="someclass" href="link3.html">some</a>

 

 

I'm testing this at http://regexpal.com/. The problem is that my regex ( <a (.+)[^<]*</a> ) selects everything from the start of the first anchor tag to the end of the last anchor tag, and I can't figure out how to split the matches apart.

 

Any suggestions so it returns each tag as a separate result in the match array?

 

Thanks in advance!

Or, perhaps, use something like DOMDocument to read the HTML instead of a regexp.

That. Very that.

 

Not only are regular expressions the wrong tool for dealing with HTML, DOMDocument is actually better at doing what you want.

 

See DOMDocument::getElementsByTagName.
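
For reference, a minimal sketch of what the DOMDocument suggestion could look like for the string in the question. The function name extractAnchors is made up for illustration and is not part of any library; the sketch assumes the input is an HTML fragment that libxml can parse.

```php
<?php
// Hypothetical helper: returns each <a> element in $html as its own HTML string.
function extractAnchors(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of emitting them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $anchors = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        // saveHTML() with a node argument serializes only that node.
        $anchors[] = $doc->saveHTML($a);
    }
    return $anchors;
}

$string = 'This is an test <a href="link1.html" data-modal="sdfdsf87ds87fdsf8bds8fb">string</a> example to <a href="link2.html">parse</a> and return <a class="someclass" href="link3.html">some</a> anchor tag results.';

print_r(extractAnchors($string));
```

Each array element is one complete anchor tag, so there is nothing to split apart afterwards.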

I agree that a DOM parser is generally the better choice for DOM parsing and manipulation, but regex isn't a bad alternative if what you are looking for is regular. If the whole tags are all you want, this regex should work (the full matches end up in $anchors[0]):

 

preg_match_all('~<a\s+.*?</a>~is', $string, $anchors);

If, however, you want to parse individual attributes or just the "text" of the anchor, etc., then using a DOM parser would definitely be better.
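
To illustrate that last point, here is a rough sketch of pulling the href and the text out of each anchor with DOMDocument. The function name anchorDetails and the array keys are hypothetical, chosen only for this example.

```php
<?php
// Hypothetical helper: returns the href attribute and text of every <a> in $html.
function anchorDetails(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of emitting them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $details = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $details[] = [
            'href' => $a->getAttribute('href'), // empty string if the attribute is missing
            'text' => $a->textContent,          // text inside the tag, markup stripped
        ];
    }
    return $details;
}

print_r(anchorDetails('return <a class="someclass" href="link3.html">some</a> anchor tag results.'));
```

Doing the same with a regex would need a separate pattern per attribute and would have to cope with attribute order and quoting, which is where the DOM approach pays off.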

 

Since you are testing at regexpal: <a\s+.*?</a> is the actual pattern, and "i" and "s" are modifiers. "i" makes the match case-insensitive, and "s" lets the dot match newline chars, in case the "text" inside the anchor tags contains newlines (so IOW make sure to set those flags in the tester as well). The part that fixes your original problem is the lazy quantifier .*?, which stops at the first </a> it can reach instead of the last one.
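
A quick way to see the difference is to run both patterns against the string from the question. This is only a sketch; the variable names $greedy and $lazy are made up for the comparison.

```php
<?php
$string = 'This is an test <a href="link1.html" data-modal="sdfdsf87ds87fdsf8bds8fb">string</a> example to <a href="link2.html">parse</a> and return <a class="someclass" href="link3.html">some</a> anchor tag results.';

// Original pattern: the greedy (.+) runs to the end of the line and only
// backtracks as far as the LAST </a>, so all three tags land in one match.
preg_match_all('~<a (.+)[^<]*</a>~', $string, $greedy);

// Suggested pattern: the lazy .*? expands one character at a time and stops
// at the FIRST </a> it reaches, so each tag becomes its own match.
preg_match_all('~<a\s+.*?</a>~is', $string, $lazy);

echo count($greedy[0]), "\n"; // 1
echo count($lazy[0]), "\n";   // 3
print_r($lazy[0]);
```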

Archived

This topic is now archived and is closed to further replies.
