iamX Posted March 13, 2006

Hey there,

I hope someone can help me with the following. I'm creating a script that indexes the links on pages to build easy sitemaps. I have already stripped all tags except <a>, and I use preg_match_all to collect the links with the following:

$matched = preg_match_all("{m=more&id=(.*?)>(.*?)</a>}", $txtonly, $match);

Now the problem: some links are preceded by a graphic that is wrapped in the same m=more&id=###> link. Since the <img> tags have been stripped, that link is empty, so the rebuilt output contains both <a href=sitemap.php?m=more&id=###></a> and <a href=sitemap.php?m=more&id=###>Link name</a>.

I tried changing the second (.*?) to (.+?), but then the result is:

<a href=sitemap.php?m=more&id=###></a><a href=sitemap.php?m=more&id=###>Link name</a> (a double result)

Matching on [a-zA-Z0-9] gives no result at all.

Am I missing something, or am I approaching this the wrong way?
wickning1 Posted March 13, 2006

$matched = preg_match_all("{m=more&id=(.*?)>([^<]+?)</a>}", $txtonly, $match);

Try that.
iamX Posted March 13, 2006

QUOTE (wickning1 @ Mar 13 2006, 03:26 PM):
$matched = preg_match_all("{m=more&id=(.*?)>([^<]+?)</a>}", $txtonly, $match);
Try that.

Thanks for replying, wickning1 :)

Unfortunately I still get the double results: one without a link name (where the <img> was) and one with the correct link, as follows:

<a href=sitemap.php?m=more&id=###></a><a href=sitemap.php?m=more&id=###>Linkname</a>

Is there maybe a way to separate id=### from the link name, so that when the same ID occurs twice it only yields one result?
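For anyone reproducing this, here is a minimal sketch of where the double result seems to come from. The sample string and the ID 123 are made up for illustration; the pattern is the one suggested above. It looks as if preg_match_all finds only one match, but the first capture group has swallowed the whole empty link, so rebuilding the tag from group 1 prints both anchors.

```php
<?php
// Hypothetical sample input: every tag except <a> has been stripped,
// so the link that used to wrap the <img> is now empty.
$txtonly = '<a href=sitemap.php?m=more&id=123></a>'
         . '<a href=sitemap.php?m=more&id=123>Link name</a>';

// The pattern from the reply above; "." is also allowed to match ">" and "<".
preg_match_all('{m=more&id=(.*?)>([^<]+?)</a>}', $txtonly, $match);

// One match in total, not two.
echo count($match[0]), "\n";

// Group 1 is not just the ID: it runs from the first ID to the second one.
echo $match[1][0], "\n";

// Group 2 is the link name as expected.
echo $match[2][0], "\n";

// Rebuilding the link from group 1 therefore reproduces both anchors.
$rebuilt = '<a href=sitemap.php?m=more&id=' . $match[1][0] . '>' . $match[2][0] . '</a>';
echo $rebuilt, "\n";
```

Because [^<]+? needs at least one character, the match cannot end at the empty link, so the lazy (.*?) keeps growing until it reaches the next ">" that is followed by text.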
iamX Posted March 13, 2006

Aah, I found it! The first (.*?) was the troublemaker: even though it is lazy, it can still run past the > and swallow the empty link. My code came out like this:

$matched = preg_match_all("{m=more&id=([A-Z0-9]+?)>([^<]+?)</a>}", $txtonly, $match);

No double hrefs anymore :)
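A short sketch of that final pattern in use, assuming a made-up input string and numeric IDs (the real pages are not shown in this thread). Restricting the first group to [A-Z0-9] means it can no longer cross a ">", so the empty link simply fails to match and only the named link is rebuilt. The $links array and the PREG_SET_ORDER flag are choices made for this example, not part of the original script.

```php
<?php
// Hypothetical sample input with an empty link followed by the real one.
$txtonly = '<a href=sitemap.php?m=more&id=123></a>'
         . '<a href=sitemap.php?m=more&id=123>Link name</a>';

// Final pattern from the post: the ID group may only contain A-Z and 0-9.
// PREG_SET_ORDER groups the captures per match, which makes the loop simpler.
preg_match_all('{m=more&id=([A-Z0-9]+?)>([^<]+?)</a>}', $txtonly, $match, PREG_SET_ORDER);

// Rebuild one sitemap link per match.
$links = [];
foreach ($match as $m) {
    $links[] = "<a href=sitemap.php?m=more&id={$m[1]}>{$m[2]}</a>";
}

echo implode("\n", $links), "\n";
```

Note that [A-Z0-9] only covers upper-case letters and digits; if the IDs could contain lower-case letters, the class would presumably need a-z as well, or the pattern an i modifier.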