DJphp Posted November 21, 2007

Hello,

While scraping a web page for URLs with this expression:

$urlLink = "/<a[^>]+href=\"(showthread\.php\?s=[^\"]+)/i";

and using preg_match_all and printf to display the results, I get the following (a subset):

showthread.php?s=21be590fe8a4b4e317a7fa54b3ff8230&t=26041
showthread.php?s=21be590fe8a4b4e317a7fa54b3ff8230&t=26041&page=2
showthread.php?s=21be590fe8a4b4e317a7fa54b3ff8230&p=274081#post274081

and so on.

I would like to capture only URLs like the first line and exclude results like lines 2 and 3. That is, I want to exclude anything of the form:

showthread.php?s=****&p=****
showthread.php?s=****&t=****&page=****

and keep only results of the form:

showthread.php?s=****&t=****

My idea was to exclude any result that contains "&p=" or "&page=", but I can't get that to work. Any help would be appreciated.

DJphp
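The "exclude anything containing &p= or &page=" idea can be done as a second pass instead of inside the capturing regex. A minimal sketch, assuming the links in the page use a raw "&" (not "&amp;") and that the sample HTML below stands in for the scraped page: capture with the original pattern, then use preg_grep with PREG_GREP_INVERT to keep only the entries that do not match the exclusion pattern.

```php
<?php
// Hypothetical sample markup standing in for the scraped page.
$html = <<<HTML
<a href="showthread.php?s=21be590fe8a4b4e317a7fa54b3ff8230&t=26041"></a>
<a href="showthread.php?s=21be590fe8a4b4e317a7fa54b3ff8230&t=26041&page=2"></a>
<a href="showthread.php?s=21be590fe8a4b4e317a7fa54b3ff8230&p=274081#post274081"></a>
HTML;

// Step 1: capture every showthread.php link with the original pattern.
$urlLink = '/<a[^>]+href="(showthread\.php\?s=[^"]+)/i';
preg_match_all($urlLink, $html, $matches);

// Step 2: drop any captured URL containing "&p=" or "&page=".
// PREG_GREP_INVERT keeps the entries that do NOT match; array_values reindexes from 0.
$threadUrls = array_values(preg_grep('/&p(?:age)?=/', $matches[1], PREG_GREP_INVERT));

print_r($threadUrls);
```

Splitting the work this way keeps each regex simple, at the cost of a second pass over the results.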
DJphp Posted November 22, 2007 (Author)

Hi, would anyone have any ideas?

Thanks,
DJphp
effigy Posted November 27, 2007

Try something like this:

<pre>
<?php
$data = <<<DATA
<a href="showthread.php?s=21be590fe8a4b4e317a7fa54b3ff8230&t=26041"></a>
<a href="showthread.php?s=21be590fe8a4b4e317a7fa54b3ff8230&t=26041&page=2"></a>
<a href="showthread.php?s=21be590fe8a4b4e317a7fa54b3ff8230&p=274081#post274081"></a>
DATA;

// Consume the URL one non-quote character at a time, refusing to step onto "&p=" or "&page=";
// the closing quote must follow, so any link containing either is rejected entirely.
preg_match_all('/<a[^>]+href="(showthread\.php\?s=(?:(?!&p(?:age)?=)[^"])+)"/i', $data, $matches);
// Drop the full-match entry so only the captured URLs remain.
array_shift($matches);
print_r($matches);
?>
</pre>
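An alternative is to whitelist the wanted shape rather than blacklist the unwanted ones. This sketch assumes (hypothetically; the thread does not confirm it) that the s value is always hexadecimal and t is always numeric. Requiring the closing quote right after the t digits rejects the "&page=2", "&p=..." and "#post..." variants without any lookahead.

```php
<?php
// Hypothetical sample markup standing in for the scraped page.
$html = <<<HTML
<a href="showthread.php?s=21be590fe8a4b4e317a7fa54b3ff8230&t=26041"></a>
<a href="showthread.php?s=21be590fe8a4b4e317a7fa54b3ff8230&t=26041&page=2"></a>
<a href="showthread.php?s=21be590fe8a4b4e317a7fa54b3ff8230&p=274081#post274081"></a>
HTML;

// Assumption: s is a hex session id and t is a numeric thread id.
// The '"' after \d+ means nothing may follow the thread id inside the href.
$pattern = '/<a[^>]+href="(showthread\.php\?s=[0-9a-f]+&t=\d+)"/i';
preg_match_all($pattern, $html, $matches);

print_r($matches[1]);
```

This is stricter than the lookahead version: a link with any other extra parameter is also skipped, which may or may not be what you want.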