shaunie
Posted March 14, 2011

Hi,

I have a script that scrapes merchant names from Amazon Marketplace, for example:
http://www.amazon.co.uk/gp/offer-listing/B002PLB2F4/?condition=new

The following code works where the seller has a logo:

preg_match_all('/<ul class="sellerInformation">([\s]+)(.*?)<img src="(.*?)" width="(.*?)" alt="(.*?)" height="(.*?)" border="(.*?)" \/><\/a>/', $html, $merchants);

and this works where merchants don't have a logo:

preg_match_all('/<ul class="sellerInformation">([\s]+)<li><div class="seller"><span class="sellerHeader">Seller:<\/span>([\s]+)<a href="(.*?)"><b>(.*?)<\/b><\/a>/', $html, $merchants2);

How can I combine these regular expressions so that I just get one array of merchant names?

Many thanks for your advice...
sasa
Posted March 15, 2011

Try:

<?php
$url = 'http://www.amazon.co.uk/gp/offer-listing/B002PLB2F4/?condition=new';
$html = file_get_contents($url);
preg_match_all('~<ul class="sellerInformation">.*?(alt="|<b>)([^"<]+)("|<)~is', $html, $matchesarray);
print_r($matchesarray[2]);
?>
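To see why the combined pattern in the reply above covers both cases, here is a minimal sketch that runs the same idea against a small inline sample instead of the live page. The sample markup, seller names, and variable names are assumptions modeled on the two patterns in the question, not the actual Amazon HTML. It also uses a non-capturing group, so the names end up in group 1 rather than group 2.

```php
<?php
// Hypothetical markup modeled on the two patterns in the question;
// it is NOT copied from the real Amazon page.
$html = <<<HTML
<ul class="sellerInformation">
<li><a href="/shops/A1"><img src="logo.gif" width="120" alt="Acme Books" height="30" border="0" /></a></li>
</ul>
<ul class="sellerInformation">
<li><div class="seller"><span class="sellerHeader">Seller:</span> <a href="/shops/A2"><b>Bobs Media</b></a></div></li>
</ul>
HTML;

// After each seller block opens, lazily skip ahead to whichever comes
// first: alt=" (logo case) or <b> (text-only case), then capture the
// name up to the next double quote or "<".
// Flags: i = case-insensitive, s = "." also matches newlines.
preg_match_all(
    '~<ul class="sellerInformation">.*?(?:alt="|<b>)([^"<]+)~is',
    $html,
    $matches
);

// Group 1 holds the merchant names from both cases in one array.
$merchants = array_map('trim', $matches[1]);
print_r($merchants);
```

One thing to watch for: because of the s flag, a seller block that contains neither alt=" nor <b> would let .*? run on into the next block, so it is worth checking the results against a real page.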