

Trouble with regular expressions

3 replies to this topic

#1 iamX

  • New Members
  • Newbie
  • 3 posts

Posted 13 March 2006 - 01:59 PM

Hey there,

Hope someone can help me with the following.

I'm creating a script that indexes the links on a page to build simple sitemaps.
I have already stripped all tags except <a>, and I use preg_match_all to collect the links with the following:

$matched = preg_match_all("{m=more&id=(.*?)>(.*?)</a>}", $txtonly, $match);

But now the problem:
Some links are preceded by an image that links to the same m=more&id=### target.

Since the <img> tags have been stripped, those image links are now empty, so when I rebuild the links I get:
<a href=sitemap.php?m=more&id=###></a> and
<a href=sitemap.php?m=more&id=###>Link name</a>
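A minimal PHP sketch that reproduces this, assuming a hypothetical $txtonly holding one emptied image link followed by the text link to the same id (the id 12 is made up for illustration). The original pattern collects both links, because the second (.*?) is allowed to match an empty name:

```php
<?php
// Hypothetical sample: an image link whose <img> was stripped, followed by
// the text link to the same id.
$txtonly = '<a href=sitemap.php?m=more&id=12></a>'
         . '<a href=sitemap.php?m=more&id=12>Link name</a>';

// Original pattern; "{" and "}" are used as the regex delimiters.
$matched = preg_match_all("{m=more&id=(.*?)>(.*?)</a>}", $txtonly, $match);

echo $matched, "\n"; // prints 2
// $match[1] holds the ids:   "12", "12"
// $match[2] holds the names: "" (the emptied image link), "Link name"
```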

I tried changing the second (.*?) to (.+?), but then the result is:
<a href=sitemap.php?m=more&id=###></a><a href=sitemap.php?m=more&id=###>Link name</a>
(a double result: the empty link and the real one come back as a single match)
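A sketch of why (.+?) merges the two links, on a hypothetical input of the kind described above (id 12 is made up). Right after the first > the text is </a>, so a name of at least one character cannot end there, and the lazy group keeps growing until it reaches the next </a>:

```php
<?php
// Hypothetical sample: emptied image link + text link with the same id.
$txtonly = '<a href=sitemap.php?m=more&id=12></a>'
         . '<a href=sitemap.php?m=more&id=12>Link name</a>';

// The name must now be at least one character long.
$matched = preg_match_all("{m=more&id=(.*?)>(.+?)</a>}", $txtonly, $match);

echo $matched, "\n";     // prints 1
echo $match[2][0], "\n"; // prints </a><a href=sitemap.php?m=more&id=12>Link name
```

Rebuilding a link from that single match reproduces exactly the doubled output shown in the question.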

Matching with [a-zA-Z0-9] instead gives no results at all.

Am I missing something or approaching this the wrong way?

#2 wickning1

  • Members
  • Advanced Member
  • 405 posts

Posted 13 March 2006 - 02:26 PM

$matched = preg_match_all("{m=more&id=(.*?)>([^<]+?)</a>}", $txtonly, $match);

Try that.
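A sketch of what this pattern does on a hypothetical input like the one in the question (id 12 is made up). [^<]+? cannot start at the </a> of the emptied image link, so the engine backtracks into the first (.*?) instead. That group is lazy, but it still grows across > into the next link until the rest of the pattern can match:

```php
<?php
// Hypothetical sample: emptied image link + text link with the same id.
$txtonly = '<a href=sitemap.php?m=more&id=12></a>'
         . '<a href=sitemap.php?m=more&id=12>Link name</a>';

$matched = preg_match_all("{m=more&id=(.*?)>([^<]+?)</a>}", $txtonly, $match);

echo $matched, "\n";     // prints 1
echo $match[1][0], "\n"; // prints 12></a><a href=sitemap.php?m=more&id=12
echo $match[2][0], "\n"; // prints Link name
```

The name group is clean now, but the id group has swallowed the empty link, so rebuilding still produces two <a> tags.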

#3 iamX

  • New Members
  • Newbie
  • 3 posts

Posted 13 March 2006 - 03:59 PM

QUOTE(wickning1 @ Mar 13 2006, 03:26 PM):
> $matched = preg_match_all("{m=more&id=(.*?)>([^<]+?)</a>}", $txtonly, $match);
>
> Try that.

Thanks for replying, wickning1 :)

Unfortunately, I still get the double results:
one link without a name (where the <img> was) and one with the correct name, as follows:

<a href=sitemap.php?m=more&id=###></a><a href=sitemap.php?m=more&id=###>Linkname</a>

Is there maybe a way to separate id=### from the link name,
so that when the same ID appears twice, only one result comes back?
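One way to do that, as a minimal sketch: keep the original pattern (on input like the hypothetical sample below its ids come out cleanly separated, because the name group may be empty), then collect the links into an array keyed by id, letting a non-empty name replace an empty one. The sample input and the $links array are hypothetical:

```php
<?php
// Hypothetical sample: an emptied image link, its text link, and one more link.
$txtonly = '<a href=sitemap.php?m=more&id=12></a>'
         . '<a href=sitemap.php?m=more&id=12>Link name</a>'
         . '<a href=sitemap.php?m=more&id=7>Other page</a>';

// Original pattern; PREG_SET_ORDER gives one [full match, id, name] per link.
preg_match_all("{m=more&id=(.*?)>(.*?)</a>}", $txtonly, $match, PREG_SET_ORDER);

// One entry per id: the first name seen is kept, unless it is empty and a
// later link with the same id has a real name.
$links = [];
foreach ($match as $m) {
    $id = $m[1];
    $name = $m[2];
    if (!isset($links[$id]) || $links[$id] === '') {
        $links[$id] = $name;
    }
}

foreach ($links as $id => $name) {
    echo "<a href=sitemap.php?m=more&id=$id>$name</a>\n";
}
// prints:
// <a href=sitemap.php?m=more&id=12>Link name</a>
// <a href=sitemap.php?m=more&id=7>Other page</a>
```

An id that only ever appears with an empty name (an image link with no text link) stays in $links with an empty name, so you may want to skip those when printing.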

#4 iamX

  • New Members
  • Newbie
  • 3 posts

Posted 13 March 2006 - 05:51 PM

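Aah, I found it!

The first (.*?) was the troublemaker. It is lazy rather than greedy, but when the rest of the pattern fails it still keeps expanding across the > and the empty </a> into the next link.

My code now looks like this: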

$matched = preg_match_all("{m=more&id=([A-Z0-9]+?)>([^<]+?)</a>}", $txtonly, $match);

No more double hrefs :)
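A quick check of the final pattern on a hypothetical input like the one from the question (id 12 is made up). With the id limited to [A-Z0-9], the first group can no longer run across > into the next link, and [^<]+? rejects the empty name, so the emptied image link is simply skipped:

```php
<?php
// Hypothetical sample: emptied image link + text link with the same id.
$txtonly = '<a href=sitemap.php?m=more&id=12></a>'
         . '<a href=sitemap.php?m=more&id=12>Link name</a>';

// Final pattern from this post.
$matched = preg_match_all("{m=more&id=([A-Z0-9]+?)>([^<]+?)</a>}", $txtonly, $match);

// Rebuild the links in the format used in this thread.
foreach ($match[1] as $i => $id) {
    echo "<a href=sitemap.php?m=more&id=$id>{$match[2][$i]}</a>\n";
}
// prints: <a href=sitemap.php?m=more&id=12>Link name</a>
```

[A-Z0-9] is case-sensitive, so if the ids can contain lowercase letters, add the i modifier after the closing delimiter ("{...}i") or use [A-Za-z0-9]; for purely numeric ids, [0-9]+ is enough.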
