fapapfap Posted December 28, 2011

So I would like to scrape a page to make a list of URLs. The source has lots of URLs in the following format:

<a href="http://www.site.com/page.php?action=p&id=99">TEXT</a>, <a href="http://www.site.com/page.php?action=p&id=97">TEXT</a>, <a href="http://www.site.com/page.php?action=p&id=98">TEXT</a>

and carries on in exactly the same format. Can someone help me with the code I need to scrape the page and extract just the links into a list using regex?

Link to comment https://forums.phpfreaks.com/topic/253924-how-extract-some-data-using-regex/
scootstah Posted December 28, 2011

Seems to work:

```php
<?php
// Match each anchor whose href follows the page.php?action=p&id=N format.
$pattern = '~<a href="http:\/\/www\.site\.com\/page\.php\?action=p\&id=[0-9]+">[a-zA-Z]+<\/a>~i';
$text = 'So I would like to scrape a page to make a list of URLs. The source has lots of URLs in the following format: <a href="http://www.site.com/page.php?action=p&id=99">TEXT</a>, <a href="http://www.site.com/page.php?action=p&id=97">TEXT</a>, <a href="http://www.site.com/page.php?action=p&id=98">TEXT</a> and carries on in exactly the same format. Can someone help me with the code I need to scrape the page and extract just the links into a list using regex?';

if (preg_match_all($pattern, $text, $matches)) {
    // Pass true so print_r() returns the dump as a string instead of printing it immediately.
    echo '<pre>' . print_r($matches, true) . '</pre>';
}
```
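The pattern above matches whole `<a>` tags, while the question asks for just the links. One way to get only the URLs is to put a capture group around the `href` value. This is a minimal sketch, not part of the original answer; it assumes the exact URL shape from the question, and `$html` and `$urls` are illustrative names.

```php
<?php
// Sample input copied from the question; in practice this would be the fetched page source.
$html = '<a href="http://www.site.com/page.php?action=p&id=99">TEXT</a>, '
      . '<a href="http://www.site.com/page.php?action=p&id=97">TEXT</a>, '
      . '<a href="http://www.site.com/page.php?action=p&id=98">TEXT</a>';

// The parentheses form capture group 1, which holds only the href value.
// With "~" as the delimiter, forward slashes need no escaping.
$pattern = '~<a href="(http://www\.site\.com/page\.php\?action=p&id=[0-9]+)">~i';

$urls = [];
if (preg_match_all($pattern, $html, $matches)) {
    // $matches[0] holds the full matches, $matches[1] the captured URLs.
    $urls = $matches[1];
}

echo implode("\n", $urls), "\n";
```

If the real page source writes the ampersand as `&amp;`, the `&` in the pattern would need to be adjusted to match it.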
fapapfap (Author) Posted December 28, 2011

Thank you so much, perfect!
fapapfap (Author) Posted December 28, 2011

Another question, please: some of the data will be formatted in the following form. How do I deal with this in the context of the answer above?

<a href="http://www.site.com/page.php?action=p&uid=448"><span style="color:#0066FF;">TEXT</span></a>
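The thread was archived without a reply to this follow-up. One possible approach, offered only as a sketch, is to make the `<span>` wrapper optional in the pattern and to accept `uid=` as well as `id=`, since the sample above uses `uid`. The helper name `extractLinks` is hypothetical, and the pattern assumes the links look exactly like the two samples in this thread.

```php
<?php
// Hypothetical helper (not from the thread): returns every matching href found in $html.
function extractLinks(string $html): array
{
    // u?id                -> accepts both "id=" and "uid=" (assumption based on the two samples)
    // (?:<span[^>]*>)?    -> optional opening <span ...> around the link text
    // [^<]*               -> link text: any run of characters other than "<"
    // (?:</span>)?        -> optional closing </span>
    $pattern = '~<a href="(http://www\.site\.com/page\.php\?action=p&u?id=[0-9]+)">'
             . '(?:<span[^>]*>)?[^<]*(?:</span>)?</a>~i';

    preg_match_all($pattern, $html, $matches);

    return $matches[1]; // group 1 holds only the URLs; empty array when nothing matches
}

// One plain link and one link with the <span> wrapper, both taken from the posts above.
$html = '<a href="http://www.site.com/page.php?action=p&id=99">TEXT</a>, '
      . '<a href="http://www.site.com/page.php?action=p&uid=448">'
      . '<span style="color:#0066FF;">TEXT</span></a>';

print_r(extractLinks($html));
```

If the markup varies more than this, for example extra attributes on the `<a>` tag or nested tags inside the link, an HTML parser such as PHP's `DOMDocument` is usually more robust than a regex.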
This topic is now archived and is closed to further replies.