



Trouble with regular expressions


Hey there,

Hope someone can help me with the following.

I'm creating a script that indexes the links on pages to build simple sitemaps.
I have already stripped all tags except <a>, and I use preg_match_all to collect the links with the following:

$matched = preg_match_all("{m=more&id=(.*?)>(.*?)</a>}", $txtonly, $match);

But now the problem:
Some links are preceded by an image that is wrapped in a link to the same m=more&id=### target.

Since the <img> tags have been stripped, that image link is now empty, so when I rebuild the links the result is:
<a href=sitemap.php?m=more&id=###></a> and
<a href=sitemap.php?m=more&id=###>Link name</a>

I tried replacing the second (.*?) with (.+?), but then the result is:
<a href=sitemap.php?m=more&id=###></a><a href=sitemap.php?m=more&id=###>Link name</a>
(double result)

Matching with [a-zA-Z0-9] instead returns no results at all.

Am I missing something or approaching this the wrong way?
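A small reproduction of the two attempts described above. The input is a made-up sample (id 12 stands in for ###) and assumes the tags were already stripped as described; the variable names $m1 and $m2 are just for illustration.

```php
<?php
// Hypothetical sample: the stripped image link comes first, then the text link.
$txtonly = '<a href=sitemap.php?m=more&id=12></a><a href=sitemap.php?m=more&id=12>Link name</a>';

// Pattern from the question: one match per <a>, so the empty image link shows up too.
$n1 = preg_match_all('{m=more&id=(.*?)>(.*?)</a>}', $txtonly, $m1);
// $n1 is 2; $m1[2] holds ['', 'Link name'].

// Requiring at least one character in the name: the empty link can no longer match
// on its own, so the lazy (.+?) keeps expanding through "</a><a href=..." instead.
$n2 = preg_match_all('{m=more&id=(.*?)>(.+?)</a>}', $txtonly, $m2);
// $n2 is 1; $m2[2][0] is '</a><a href=sitemap.php?m=more&id=12>Link name'.

print_r($m1[2]);
print_r($m2[2]);
```

With (.+?), the single match swallows the empty link, which is why the rebuilt output contains both <a> tags.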

Reply from wickning1:
$matched = preg_match_all("{m=more&id=(.*?)>([^<]+?)</a>}", $txtonly, $match);

Try that.

Reply from the original poster:
QUOTE (wickning1 @ Mar 13 2006, 03:26 PM):
$matched = preg_match_all("{m=more&id=(.*?)>([^<]+?)</a>}", $txtonly, $match);

Try that.

Thanks for replying, wickning1 :)

Unfortunately I still get the double results:
one without a link name (where the <img> was) and one with the correct link, as follows:

<a href=sitemap.php?m=more&id=###></a><a href=sitemap.php?m=more&id=###>Linkname</a>

Is there maybe a way to separate id=### from the link name,
so that when an ID appears twice, only one result is returned?
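A sketch of the separation asked for here, assuming you keep the original pattern from the question: read each match with PREG_SET_ORDER, skip matches whose link name is empty, and use the id as an array key so each id is recorded only once. The sample input and the $links array are hypothetical.

```php
<?php
// Hypothetical sample: an empty image link and a text link sharing id 12, plus id 34.
$txtonly = '<a href=sitemap.php?m=more&id=12></a><a href=sitemap.php?m=more&id=12>Link name</a>'
         . '<a href=sitemap.php?m=more&id=34>Other page</a>';

// Original pattern from the question: one match per <a>, including empty ones.
preg_match_all('{m=more&id=(.*?)>(.*?)</a>}', $txtonly, $match, PREG_SET_ORDER);

// Key the results by id so each id appears once; skip matches with no link text.
$links = [];
foreach ($match as $m) {
    $id   = $m[1];
    $name = trim($m[2]);
    if ($name === '' || isset($links[$id])) {
        continue; // empty image link, or an id that was already recorded
    }
    $links[$id] = $name;
}

foreach ($links as $id => $name) {
    echo "<a href=sitemap.php?m=more&id=$id>$name</a>\n";
}
```

PHP converts numeric-string keys such as '12' into integer keys; that makes no difference when echoing the links, but it matters if you compare keys strictly.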

Follow-up from the original poster:
Aah I found it!

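The first (.*?) was the troublemaker. It is lazy, not greedy, but it will still expand past the > of the empty link when that is the only way for the rest of the pattern to match, so a single match covered both links.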

My code came out like this:

$matched = preg_match_all("{m=more&id=([A-Z0-9]+?)>([^<]+?)</a>}", $txtonly, $match);

No more double hrefs :)
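A sketch comparing wickning1's pattern with the final one on the same made-up input (id 12 stands in for ###; $loose, $strict and $count are illustrative names). With ([^<]+?) the name cannot contain <, so the engine widens the lazy id group instead; restricting the id to [A-Z0-9] means it can never cross a >, so the attempt at the empty link simply fails and matching resumes at the next link.

```php
<?php
$txtonly = '<a href=sitemap.php?m=more&id=12></a><a href=sitemap.php?m=more&id=12>Link name</a>';

// wickning1's pattern: the name group refuses '<', so the lazy id group grows
// across the empty link until a '>' followed by real text is found.
preg_match_all('{m=more&id=(.*?)>([^<]+?)</a>}', $txtonly, $loose);
// $loose[1][0] is '12></a><a href=sitemap.php?m=more&id=12'

// Final pattern: the id may only contain A-Z and 0-9, so it cannot cross a '>'.
$count = preg_match_all('{m=more&id=([A-Z0-9]+?)>([^<]+?)</a>}', $txtonly, $strict);

// Rebuild the sitemap links from the captured ids and names.
for ($i = 0; $i < $count; $i++) {
    echo "<a href=sitemap.php?m=more&id={$strict[1][$i]}>{$strict[2][$i]}</a>\n";
}
```

Without the i modifier, [A-Z0-9] accepts only uppercase letters and digits, which is enough for numeric ids; a class such as [^>]+? would also stop at the > if ids can contain other characters.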
