RegExp to parse links

lsousa · February 5, 2009

Yes, I know you shouldn't use regular expressions to parse HTML, but this example applies to other things too. Anyway, here is my problem:

<?php
$input = '<html><a href="whatever">nome1</a><a href="whatever&prodID=isso">nome2</a></html>';

// Intended to match only the link whose href contains prodID
$ret = preg_match_all('|href="(.*?prodID=.*?)</a>|', $input, $matches);

var_export($matches);
?>

I want it to return only the second link, because that's the one that has prodID as an argument, but even with non-greedy search this is the result:

array (
  0 =>
  array (
    0 => 'href="whatever">nome1</a><a href="whatever&prodID=isso">nome2</a>',
  ),
  1 =>
  array (
    0 => 'whatever">nome1</a><a href="whatever&prodID=isso">nome2',
  ),
)

Shouldn't $matches[1][0] contain only the second link? How can I write a regexp that returns only the second link: 'whatever&prodID=isso">nome2'?

Thanks

effigy · February 5, 2009

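Your first .*? will match anything up to prodID, even if it has to go past another double quote or into another tag. Non-greedy only means the match ends as early as possible; it still starts at the first href=" the engine finds.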

Try |href="([^"]+prodID=[^"]+)"[^>]*>(.*?)</a>|.
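For reference, here is a minimal sketch of that pattern run against the sample input from the question. The variable names $pattern and $count are just illustrative; the idea is that [^"]+ cannot cross a double quote, so the match has to stay inside a single href value.

```php
<?php
// Same sample input as in the question above.
$input = '<html><a href="whatever">nome1</a><a href="whatever&prodID=isso">nome2</a></html>';

// [^"]+ cannot cross a double quote, so prodID= must appear inside one href value.
// [^>]* skips any other attributes before the tag closes.
$pattern = '|href="([^"]+prodID=[^"]+)"[^>]*>(.*?)</a>|';

$count = preg_match_all($pattern, $input, $matches);

// $matches[1] holds the matching URLs, $matches[2] the link texts.
var_export($matches[1]);
var_export($matches[2]);
```

With this input only the second link should match, so $matches[1][0] is the URL and $matches[2][0] is the link text. For real HTML with single-quoted or unquoted attributes the pattern would need adjusting, or a DOM parser would be the safer choice.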
