natasha_thomas Posted July 23, 2010

Friends,

I want to scrape the product titles from a webpage into an array. I am not sure whether it is possible, but someone said it can be done with PHP DOM, cURL, or something like that.

http://www.nextag.com/serv/main/buyer/OutPDir.jsp?search=paintball&perpagePersistent=60

From this webpage, I want to scrape all the product listings (all in blue) into an array, and then echo the product listings from that array.

So the output from the above link would be:

Timex T5K238 Men Digital Sport Resin Strap Watch New Field Ops Watch - Accessories Suunto Wrist-Top Computer Watch Kenneth Cole Leather Band Men Slim Watch EVISU Men's EV-7012-02 Habara Stainless Steel Organic Digital Watch . . . . (through the 60th product listing)

Could anyone help me write this code?

Wish you all a wonderful weekend ahead,
Natty

Link to comment: https://forums.phpfreaks.com/topic/208703-scrapping-the-product-titles-from-a-webpage-in-an-array/
natasha_thomas Posted July 23, 2010 (Author)

The code that I have come up with is:

    <?php
    $dom = new DomDocument();
    libxml_use_internal_errors(true);
    if ($dom->loadHtmlFile('http://www.nextag.com/serv/main/buyer/OutPDir.jsp?search=paintball&perpagePersistent=60')) {
        $xpath = new DomXPath($dom);
        $nodekw = stripslashes(str_replace('+', '-', urlencode($node->nodeValue)));
        echo "<a href='".($nodekw).".htm'>".Ucwords($node->nodeValue)."</a>". "<br/>\n";
    }
    echo "<br/>\n";
    // Product Titles [Requirements #2]
    foreach ($xpath->query('//a[@class="underline"]') as $node) {
        echo $node->nodeValue . "<br/>\n";
    }
    ?>

But the output is:

Field Armor Stalker Exoskeleton
$170
Tippmann A5 Paintball Marker Gun 4+1 MEGA Set
$214
Liquid Image VideoMask 311 Large
$150
Kingman Spyder MR1 Military Tactical Paintball Marker Gun MEGA - Olive
$135 to $143
4X30 Compact Illuminated Rubber Armor Scope
$65 to $68
Tippmann 98 Custom ACT AK Tactical Paintball Gun w/Sling
$180
Cabela's 1851 Navy .44 Caliber Revolver with Starter Kit
$240
Classic Army Armalite M15 RIS Sportline
$179
Tippmann Alpha Black Tactical Edition Paintball Marker -ARMY CAMO
$150
Tippmann Custom 98 Paintball Marker Black
. . . .
(through the 60th product)

The problem is that I only want the product listings and not the prices. Note the prices starting with $. So ideally the output that I need is:

Field Armor Stalker Exoskeleton
Tippmann A5 Paintball Marker Gun 4+1 MEGA Set
Liquid Image VideoMask 311 Large
Kingman Spyder MR1 Military Tactical Paintball Marker Gun MEGA - Olive
4X30 Compact Illuminated Rubber Armor Scope
Tippmann 98 Custom ACT AK Tactical Paintball Gun w/Sling
Cabela's 1851 Navy .44 Caliber Revolver with Starter Kit
Classic Army Armalite M15 RIS Sportline
Tippmann Alpha Black Tactical Edition Paintball Marker -ARMY CAMO
Tippmann Custom 98 Paintball Marker Black
. . . .
(through the 60th product)

Also, how do I jumble the output? I mean, I want to shuffle the array elements so that the output is different every time. What code changes do I need?

Thanks,
Natty
natasha_thomas Posted July 25, 2010 (Author)

Can anyone help?
wildteen88 Posted July 25, 2010

Change

    $xpath->query('//a[@class="underline"]')

to

    $xpath->query('//a[@class="underline" and starts-with(@id,"opPNLink")]')
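A minimal sketch of how that stricter query behaves, run against a small made-up HTML snippet rather than the live page. The markup and the ids (opPNLink_1, opPriceLink_1 and so on) are assumptions for illustration only; the real page may be structured differently.

```php
<?php
// Hypothetical markup: title links carry an id starting with "opPNLink",
// price links share the same class but use a different id prefix.
$html = '<html><body>
  <a class="underline" id="opPNLink_1">Field Armor Stalker Exoskeleton</a>
  <a class="underline" id="opPriceLink_1">$170</a>
  <a class="underline" id="opPNLink_2">Liquid Image VideoMask 311 Large</a>
  <a class="underline" id="opPriceLink_2">$150</a>
</body></html>';

libxml_use_internal_errors(true); // keep warnings about sloppy HTML quiet
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Match only links whose class is "underline" AND whose id starts with "opPNLink".
$titles = array();
foreach ($xpath->query('//a[@class="underline" and starts-with(@id,"opPNLink")]') as $node) {
    $titles[] = trim($node->nodeValue);
}

print_r($titles);
```

With this sample markup only the two title strings end up in $titles; the price links are skipped because their ids do not match the prefix.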
natasha_thomas Posted July 25, 2010 (Author)

Hey dear wildTeen (nice ID),

Thanks for your reply. Have a look at my code, modified based on your changes:

    <?php
    $nodekw = "diamond red gold ring";
    $dom = new DomDocument();
    libxml_use_internal_errors(true);
    if ($dom->loadHtmlFile('http://www.nextag.com/serv/main/buyer/OutPDir.jsp?search='. $nodekw .'&perpagePersistent=60')) {
        $xpath = new DomXPath($dom);
    }
    foreach ($xpath->query('//a[starts-with(@id,"opPNLink")]') as $node) {
        $nodekw = stripslashes(str_replace('+', '-', urlencode($node->nodeValue)));
        echo "<a href='".($nodekw).".htm'>".Ucwords($node->nodeValue)."</a>". "<br/>\n";
    }
    ?>

It echoes nothing. Is there anything else I need to change?

Another thing: is there any way to shuffle the array elements so that they are echoed in random order? I tried to use shuffle(), but was not successful. How can it be achieved?

Regards,
Natasha T
natasha_thomas Posted July 25, 2010 (Author)

UPDATE

Thanks to WildTeen88, who suggested an alternate way to parse the titles.

Another thing: is there any way to shuffle the array elements so that they are echoed in random order? I tried to use shuffle(), but was not successful. How can it be achieved?
wildteen88 Posted July 25, 2010

Change

    foreach ($xpath->query('//a[@class="underline"]') as $node)

to

    $products = $xpath->query('//a[@class="underline"]');
    shuffle($products);
    foreach ($products as $node)
natasha_thomas Posted July 25, 2010 (Author)

Quote (wildteen88):

    Change foreach ($xpath->query('//a[@class="underline"]') as $node) to $products = $xpath->query('//a[@class="underline"]'); shuffle($products); foreach ($products as $node)

Dear WildTeen,

As you said, I changed my code, and now I see this error:

    Warning: shuffle() expects parameter 1 to be array, object given in

I see the output, but it is not shuffled; it stays static, so it seems shuffle() is not working.

For reference, my code is:

    $products = $xpath->query('//a[@class="underline"]');
    shuffle($products);
    foreach ($products as $node)

Am I doing anything wrong?

Thanks
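The warning arises because DOMXPath::query() returns a DOMNodeList object, while shuffle() only accepts a plain array. One possible fix, sketched here against a made-up three-link document (the markup and product names are placeholders, not taken from the real page), is to copy the node list into an array with iterator_to_array() before shuffling:

```php
<?php
// Small made-up document so the example runs without the live page.
$html = '<html><body>
  <a class="underline">Product A</a>
  <a class="underline">Product B</a>
  <a class="underline">Product C</a>
</body></html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// query() returns a DOMNodeList object, which shuffle() rejects.
$nodeList = $xpath->query('//a[@class="underline"]');

// Copy the nodes into a plain PHP array first, then shuffle that array.
$products = iterator_to_array($nodeList);
shuffle($products);

foreach ($products as $node) {
    echo $node->nodeValue . "<br/>\n";
}
```

The order of the three lines changes from run to run. This variant keeps the DOM nodes themselves in the array, which is handy if attributes such as href are needed later; storing only the text values works just as well.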
wildteen88 Posted July 25, 2010

I've never used DOM/XPath, so I'm learning too. Hmm, this is what I came up with; it probably could have been done better by extending the DomXPath class, maybe.

    $nodekw = "diamond red gold ring";
    $dom = new DomDocument();
    libxml_use_internal_errors(true);
    if ($dom->loadHtmlFile('http://www.nextag.com/serv/main/buyer/OutPDir.jsp?search='. $nodekw .'&perpagePersistent=60')) {
        $xpath = new DomXPath($dom);
        $products = array();

        // add to products array, skipping links whose text starts with "$" (prices)
        foreach ($xpath->query('//a[@class="underline"]') as $node) {
            if(substr($node->nodeValue, 0, 1) != '$')
                $products[] = $node->nodeValue;
        }

        // now we can shuffle the results
        shuffle($products);

        // output the products
        foreach($products as $product) {
            $nodekw = stripslashes(str_replace('+', '-', urlencode($product)));
            echo "<a href='".($nodekw).".htm'>".Ucwords($product)."</a>". "<br/>\n";
            echo '<hr />';
        }
    }
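As an alternative to the substr() check, the price links could probably be excluded inside the XPath expression itself. The sketch below is only an illustration under assumptions: the sample markup is invented, $slug is a hypothetical variable name, and htmlspecialchars() is an addition not present in the code above.

```php
<?php
// Invented sample: titles and prices share the same class, as in the thread.
$html = '<html><body>
  <a class="underline">tippmann custom 98 paintball marker</a>
  <a class="underline">$150</a>
  <a class="underline">classic army armalite m15</a>
  <a class="underline"> $179</a>
</body></html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Keep only links whose trimmed text does NOT start with a dollar sign.
$query = '//a[@class="underline" and not(starts-with(normalize-space(.), "$"))]';

$products = array();
foreach ($xpath->query($query) as $node) {
    $products[] = trim($node->nodeValue);
}

shuffle($products); // $products is a plain array, so shuffle() accepts it

foreach ($products as $product) {
    // urlencode() turns spaces into "+"; swap those for "-" to build the file name.
    $slug = str_replace('+', '-', urlencode($product));
    echo "<a href='" . $slug . ".htm'>" . htmlspecialchars(ucwords($product)) . "</a><br/>\n";
}
```

normalize-space() also catches prices that have leading whitespace, which a plain substr($text, 0, 1) comparison would let through.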
natasha_thomas Posted July 25, 2010 (Author)

Many thanks to WildTeen88.