DJphp Posted November 25, 2007

I have a script that scrapes a web page for URLs.

    preg_match_all($urlpatternLink, $data, $matchesLink);
    foreach ($matchesLink[1] as $u) {
        echo $u;
    }

The output looks like:

    showthread.php?s=49432320845584ac41dadf97ecf4db95&t=26149
    showthread.php?s=49432320845584ac41dadf97ecf4db95&t=26149&page=2
    showthread.php?s=49432320845584ac41dadf97ecf4db95&p=274948#post274948

I do not want to print the lines that contain "&p", that is, lines like the last two ('&p' is found in both '&p=' and '&page'). So I tried to use substr_count as below:

    $unwantedLink1 = "&p";
    preg_match_all($urlpatternLink, $data, $matchesLink);
    foreach ($matchesLink[1] as $u) {
        $matchUnwantedLinkCount = substr_count($u, $unwantedLink1);
        echo $matchUnwantedLinkCount;
        if ( $matchUnwantedLinkCount == 0 ) {
            $urlLink = $u;
        }
        echo $urlLink;
    }

Unfortunately, $matchUnwantedLinkCount is always 0, so the output is the same as before: all results are displayed, not just those that do not contain "&p".

Any help with using substr_count, or with another way of doing what I need, would be appreciated.

Thanks,
DJphp
DJphp (Author) Posted November 25, 2007

Solved. The scrape produced URLs that still contained HTML-encoded special characters, so the literal string "&p" never appeared in them. I used htmlspecialchars_decode() to clean up the URLs before checking them, and my use of substr_count works now.
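For anyone landing here with the same problem, here is a minimal sketch of the fix described above. It assumes the scraped links had "&" encoded as "&amp;", which would explain why substr_count never found "&p". The sample links (with a shortened session id) and the filterLinks() helper are hypothetical and only for illustration; they are not from the original script.

```php
<?php
// Hypothetical sample input: links as they might appear in raw HTML source,
// where "&" is encoded as "&amp;" (an assumption about the scraped data).
$scrapedLinks = [
    'showthread.php?s=abc&amp;t=26149',
    'showthread.php?s=abc&amp;t=26149&amp;page=2',
    'showthread.php?s=abc&amp;p=274948#post274948',
];

$unwantedLink1 = '&p';

// filterLinks() is an illustrative helper, not part of the original script.
// It decodes HTML entities first, then keeps only the links that do not
// contain the unwanted substring.
function filterLinks(array $links, string $unwanted): array
{
    $kept = [];
    foreach ($links as $u) {
        // Turn "&amp;" back into "&" so the substring check can match.
        $decoded = htmlspecialchars_decode($u);
        if (substr_count($decoded, $unwanted) === 0) {
            $kept[] = $decoded;
        }
    }
    return $kept;
}

$wantedLinks = filterLinks($scrapedLinks, $unwantedLink1);

// Print only the links that passed the check.
foreach ($wantedLinks as $link) {
    echo $link, "\n";
}
```

Note that the links are collected and printed only when the count is 0. In the loop from the first post, echo $urlLink runs on every iteration, so even with decoding in place it would print the previously accepted link again whenever an unwanted link is skipped.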