DJphp Posted November 21, 2007

Hello,

Whilst scraping a web page for URLs with this expression:

$urlLink = "/<a[^>]+href=\"(showthread\.php\?s=[^\"]+)/i";

and using preg_match_all and printf to display the results, I get these results (showing a subset):

showthread.php?s=21be590fe8a4b4e317a7fa54b3ff8230&t=26041
showthread.php?s=21be590fe8a4b4e317a7fa54b3ff8230&t=26041&page=2
showthread.php?s=21be590fe8a4b4e317a7fa54b3ff8230&p=274081#post274081

and so on. What I would like to do is capture only URLs like the first line and exclude results like lines 2 and 3.

That is, I want to exclude anything of the form:

showthread.php?s=****&p=****
showthread.php?s=****&t=****&page=****

and only display results of the form:

showthread.php?s=****&t=****

My thought was to exclude any result that includes "&p=" or "&page=", but I just cannot get that to work. Any help would be appreciated.

DJphp

Link to comment https://forums.phpfreaks.com/topic/78321-limiting-results-using-regexp-from-a-url-scraping-query/
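The idea of excluding any result containing "&p=" or "&page=" can also be done without changing the regex: collect every match with the original pattern, then filter the captured URLs in a second pass. This is a minimal sketch, assuming the three sample links stand in for the scraped page and that the hrefs contain a literal "&" rather than "&amp;"; the variable names $html and $threads are hypothetical.

```php
<?php
// Sample markup standing in for the scraped page (assumption: hrefs use a
// literal "&", not the HTML entity "&amp;").
$html = <<<'HTML'
<a href="showthread.php?s=21be590fe8a4b4e317a7fa54b3ff8230&t=26041"></a>
<a href="showthread.php?s=21be590fe8a4b4e317a7fa54b3ff8230&t=26041&page=2"></a>
<a href="showthread.php?s=21be590fe8a4b4e317a7fa54b3ff8230&p=274081#post274081"></a>
HTML;

// The pattern from the question, unchanged: it captures every showthread.php link.
$urlLink = "/<a[^>]+href=\"(showthread\.php\?s=[^\"]+)/i";
preg_match_all($urlLink, $html, $matches);

// Second pass: keep only captured URLs containing neither "&p=" nor "&page=".
// array_values() renumbers the surviving entries from 0.
$threads = array_values(array_filter($matches[1], function ($url) {
    return strpos($url, '&p=') === false && strpos($url, '&page=') === false;
}));

foreach ($threads as $url) {
    printf("%s\n", $url);
}
```

The regex stays simple and the exclusion list is easy to extend, at the cost of a second loop over the results.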
DJphp (Author) Posted November 22, 2007

Hi,

Would anyone have any ideas?

Thanks,
DJphp
effigy Posted November 27, 2007

Try something like this:

<pre>
<?php
$data = <<<DATA
<a href="showthread.php?s=21be590fe8a4b4e317a7fa54b3ff8230&t=26041"></a>
<a href="showthread.php?s=21be590fe8a4b4e317a7fa54b3ff8230&t=26041&page=2"></a>
<a href="showthread.php?s=21be590fe8a4b4e317a7fa54b3ff8230&p=274081#post274081"></a>
DATA;

// Consume the href one character at a time, but only while the text ahead
// is not "&p=" or "&page=". The closing quote must follow, so any URL
// containing either parameter fails to match.
preg_match_all('/<a[^>]+href="(showthread\.php\?s=(?:(?!&p(?:age)?=)[^"])+)"/i', $data, $matches);

// Drop the full-match array, leaving only the captured URLs.
array_shift($matches);
print_r($matches);
?>
</pre>
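Another option is to describe the URL you want to keep instead of the parameters you want to reject. This whitelist pattern is an alternative to the lookahead approach, not part of the original answer; it assumes, based only on the sample URLs, that the s value is hexadecimal, the t value is numeric, and the href contains a literal "&" (raw HTML using "&amp;" would need the pattern adjusted).

```php
<?php
$data = <<<'DATA'
<a href="showthread.php?s=21be590fe8a4b4e317a7fa54b3ff8230&t=26041"></a>
<a href="showthread.php?s=21be590fe8a4b4e317a7fa54b3ff8230&t=26041&page=2"></a>
<a href="showthread.php?s=21be590fe8a4b4e317a7fa54b3ff8230&p=274081#post274081"></a>
DATA;

// Accept only s=<hex>&t=<digits> followed directly by the closing quote,
// so any extra parameter (&page=, &p=) or #fragment makes the href fail to match.
$pattern = '/<a[^>]+href="(showthread\.php\?s=[0-9a-f]+&t=\d+)"/i';
preg_match_all($pattern, $data, $matches);

// $matches[1] holds the captured URLs.
$threads = $matches[1];
print_r($threads);
```

Because the closing quote must directly follow the thread id, any extra parameter causes the link to be skipped, including parameters not listed in the question, which may or may not be what you want.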
This topic is now archived and is closed to further replies.