
How to sort data without JavaScript while scraping with PHP



I'm trying to scrape data from google.de/shopping.

Consider the following URL:

http://www.google.de/products/catalog?hl=de&q=4242002690209&cid=2594634728159287170

I need to sort this product list by 'Endpreis' (German for "final price") and get the lowest one. Clicking 'Endpreis' in a normal browser does the job, because the sorting is done with JavaScript. However, when I scrape the page with PHP, the list is not sorted.
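One way around the JavaScript is not to rely on the page's sorting at all: scrape every price and sort them in PHP. A minimal sketch, assuming the Endpreis values have already been extracted from the HTML as German-formatted strings like "1.049,00 €" (the extraction step is not shown, and the sample values and helper names are hypothetical):

```php
<?php
// Sketch: sort scraped prices in PHP instead of relying on the page's
// JavaScript. Assumes German-formatted price strings ("1.049,00 €").

// Convert "1.049,00 €" to 1049.0; returns null if no number is present.
function parse_german_price(string $text): ?float
{
    // Keep only digits, dots and commas (drops "€", "ab", spaces, ...)
    $clean = preg_replace('/[^0-9.,]/', '', $text);
    if ($clean === null || $clean === '') {
        return null;
    }
    // German format: "." separates thousands, "," marks the decimals
    $clean = str_replace(['.', ','], ['', '.'], $clean);
    return is_numeric($clean) ? (float) $clean : null;
}

// Return the lowest parsable price, or null if none could be parsed.
function lowest_price(array $priceStrings): ?float
{
    $prices = array_filter(
        array_map('parse_german_price', $priceStrings),
        function ($p) { return $p !== null; }
    );
    if (!$prices) {
        return null;
    }
    sort($prices, SORT_NUMERIC); // ascending, like sorting by "Endpreis"
    return $prices[0];
}

// Hypothetical sample values standing in for scraped "Endpreis" cells
$endpreise = ['1.049,00 €', '989,90 €', 'ab 1.010,00 €', 'k. A.'];
echo lowest_price($endpreise), "\n"; // 989.9
```

Entries with no number in them, such as "k. A.", are skipped rather than treated as 0, so they can never win the "lowest price" comparison.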


Obviously, I needed to check the JavaScript involved, and I did. Here is my analysis: when I looked through the JavaScript file ps-js.js, I found the function logClick:

D("logClick",function(a,b,c,d,e,f){document.images&&((new Image).src=gb("/products/log","?ptab=pp_click","&pp_exp=",d,"&pp_vert=",b,"&pp_sec=",c,"&pp_lk=",f,"&cid=",e,"&pp_durl=",a));return j})


The corresponding HTML is:

href="javascript:void(0);" onclick="reloadSection('#scoring=tps', 'ps-sellers');"
onmousedown="return logClick('\x2Fproducts\x2Fcatalog?hl=de\x26q=4242002690209\x26cid=2594634728159287170\x26scoring=tps', 'cc', 'Overview', 'tabless', '2594634728159287170', 'Endpreis')"
class="">Endpreis</a>
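The first argument passed to logClick() is a URL written with JavaScript hex escapes (\x2F is "/", \x26 is "&"). Decoding it shows that the Endpreis link points at the same catalog URL plus &scoring=tps. A small sketch for decoding such attributes in PHP; decode_js_hex_escapes() and first_logclick_arg() are hypothetical helpers, not part of any library:

```php
<?php
// Sketch: decode the \xNN escapes used in the onmousedown attribute and
// read the URL passed as logClick()'s first argument. Helper names are
// hypothetical.

// Replace each JavaScript "\xNN" escape with the character it encodes.
function decode_js_hex_escapes(string $s): string
{
    return preg_replace_callback(
        '/\\\\x([0-9A-Fa-f]{2})/',
        function (array $m): string {
            return chr(hexdec($m[1]));
        },
        $s
    );
}

// Return the decoded first quoted argument of logClick(...), or null.
function first_logclick_arg(string $attr): ?string
{
    if (preg_match("/logClick\\('([^']*)'/", $attr, $m)) {
        return decode_js_hex_escapes($m[1]);
    }
    return null;
}

// onmousedown value of the Endpreis link, as quoted above (nowdoc: no escaping)
$attr = <<<'JS'
return logClick('\x2Fproducts\x2Fcatalog?hl=de\x26q=4242002690209\x26cid=2594634728159287170\x26scoring=tps', 'cc', 'Overview', 'tabless', '2594634728159287170', 'Endpreis')
JS;

echo first_logclick_arg($attr), "\n";
// /products/catalog?hl=de&q=4242002690209&cid=2594634728159287170&scoring=tps
```

Whether the server honours scoring=tps when it arrives as an ordinary query-string parameter is an assumption worth testing, not something established in this thread.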


When I request this URL directly:

http://www.google.de/products/log?ptab=pp_click&pp_exp=tabless&pp_vert=cc&pp_sec=Overview&pp_lk=Endpreis&cid=2594634728159287170


Nothing happens.
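That fits what ps-js.js shows: logClick() only assigns the /products/log URL to the src of a new Image, i.e. it sends a tracking request, while the actual sorting is triggered by reloadSection('#scoring=tps', 'ps-sellers'). So instead of the log URL, one thing to try is requesting the catalog page with scoring=tps in the query string. A sketch, assuming the server accepts the parameter there (not confirmed in this thread); build_catalog_url() is a hypothetical helper:

```php
<?php
// Sketch, assuming the catalog page accepts scoring=tps as an ordinary
// query-string parameter (not confirmed). build_catalog_url() is hypothetical.

function build_catalog_url(string $ean, string $cid, array $extra = []): string
{
    // cid is kept as a string: the value is too large for a 32-bit int
    $params = array_merge(['hl' => 'de', 'q' => $ean, 'cid' => $cid], $extra);
    // Explicit "&" separator, independent of the arg_separator.output setting
    return 'http://www.google.de/products/catalog?' . http_build_query($params, '', '&');
}

$sortedUrl = build_catalog_url('4242002690209', '2594634728159287170', ['scoring' => 'tps']);
echo $sortedUrl, "\n";
// http://www.google.de/products/catalog?hl=de&q=4242002690209&cid=2594634728159287170&scoring=tps

// The page itself would then be fetched as usual, e.g.
// $list_price = file_get_contents($sortedUrl);
```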


Any workaround or help getting the lowest Endpreis from the product list would be really appreciated.

Thank you

The code I'm using:

<?php

$get_EAN = '4242002690209';

// Fetch a page with cURL; returns the response body, or false on failure.
function fetch_page($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, "spider");
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_AUTOREFERER, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // CURLOPT_BINARYTRANSFER dropped: it has had no effect since PHP 5.1.3
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}

// Step 1: search Google Shopping for the EAN
$url = "http://www.google.de/search?hl=de&tbm=shop&q=" . urlencode($get_EAN) . "&oq=" . urlencode($get_EAN);
$get_product = fetch_page($url);

// Collect the product links from the result list
preg_match_all('~<div class="pslimain"><h3 class="r"><a href="(.*?)"~s', (string) $get_product, $get_price);

// Step 2: take the numeric catalog id (cid) from the first product link
$get_cid = null;
if (isset($get_price[1][0]) && preg_match('#cid=(\d+)#', $get_price[1][0], $r)) {
    $get_cid = $r[1];
}
if ($get_cid === null) {
    die("No cid found for EAN $get_EAN\n");
}

// Step 3: fetch the seller list for that product (EAN no longer hard-coded)
$url = "http://www.google.de/products/catalog?hl=de&q=" . urlencode($get_EAN) . "&cid=" . $get_cid;
$list_price = fetch_page($url);

print_r($get_price[1][0]);
print_r($get_product);
print_r($list_price);

?>
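In the code above, the cid is pulled out of the first product link with a regular expression. A somewhat sturdier sketch uses parse_url() and parse_str(), after decoding HTML entities such as &amp; that scraped hrefs often contain; extract_cid() is a hypothetical helper:

```php
<?php
// Sketch: read the cid from a scraped product link with parse_url() and
// parse_str() instead of a regex. extract_cid() is a hypothetical helper.

function extract_cid(string $href): ?string
{
    // Scraped hrefs usually carry "&amp;" rather than "&"
    $href = html_entity_decode($href, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $query = parse_url($href, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null; // no query string at all
    }
    parse_str($query, $params);
    $cid = $params['cid'] ?? null;
    // Keep it as a string and accept digits only
    return (is_string($cid) && ctype_digit($cid)) ? $cid : null;
}

echo extract_cid('/products/catalog?hl=de&amp;q=4242002690209&amp;cid=2594634728159287170'), "\n";
// 2594634728159287170
```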

Mate, I think you are seeing something I'm not. What I see is that pagination uses the same JS function, and I'm not able to traverse the pages with PHP. I thought about doing what you suggested but couldn't, so I posted here  :-[

  <a id="next-n-start"
href="javascript:void(0);" onclick="reloadSection('#start=10', 'ps-sellers');"
onmousedown="return logClick('\x2Fproducts\x2Fcatalog?hl=de\x26q=4242002690209\x26cid=2594634728159287170\x26cpo=1\x26sa=N\x26start=10', 'cc', 'Overview', 'tabless', '2594634728159287170', 'ps-sellers-frame_Weiter \x26raquo\x3B')"
>Weiter »</a>
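Even though the Weiter link is driven by JavaScript, its markup still carries the offset of the next page (#start=10). A sketch that reads that value from the scraped HTML, assuming the link keeps the id next-n-start and the reloadSection('#start=NN', ...) form quoted above; next_page_start() is a hypothetical helper, and null means no next link was found:

```php
<?php
// Sketch: read the next page's offset from the "Weiter" link in scraped
// HTML. Assumes the id "next-n-start" and the reloadSection('#start=NN', ...)
// form quoted above; next_page_start() is hypothetical.

function next_page_start(string $html): ?int
{
    // [^>]* keeps the match inside the opening <a ...> tag
    if (preg_match('/id="next-n-start"[^>]*reloadSection\(\'#start=(\d+)\'/', $html, $m)) {
        return (int) $m[1];
    }
    return null; // no "next" link: probably the last page
}

// Shortened copy of the link above (onmousedown omitted)
$weiterLink = <<<'HTML'
<a id="next-n-start"
href="javascript:void(0);" onclick="reloadSection('#start=10', 'ps-sellers');"
>Weiter »</a>
HTML;

echo next_page_start($weiterLink), "\n"; // 10
```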


A little more help would be appreciated.

I went to the URL in the first post, opened Chrome's Developer Tools and switched to the Network tab. Then I clicked the header in the table and looked for the address of the page requested by JavaScript.


Thanks a lot, Psycho.

Actually, Salathe found the trick to get all the results on one page, which is definitely the best route. But what I suggested earlier is still valid, and would be necessary if the site didn't offer the option to get all the results on one page. So, just to elaborate on what I was suggesting:

Click the Page Next button and look at how the URL changes.


You replied:

Mate, I think you are seeing something I'm not. What I see is that pagination uses the same JS function, and I'm not able to traverse the pages with PHP. I thought about doing what you suggested but couldn't . . .

When I clicked a link to go to another page, the URL changed as follows:

http://www.google.de/products/catalog?hl=de&q=4242002690209&cid=2594634728159287170
http://www.google.de/products/catalog?hl=de&q=4242002690209&cid=2594634728159287170#start=10
http://www.google.de/products/catalog?hl=de&q=4242002690209&cid=2594634728159287170#start=20

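So, if you had to, you could increase the start=nn value and grab one page at a time with file_get_contents() until no new records came back, i.e. until you had all the records. Note that the part after # is never sent to the server, so the offset would have to go in the query string instead (&start=10, as in the logClick URL of the Weiter link). But, thankfully, you don't need to do that.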
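The loop described above could be sketched like this. $fetchPage is a hypothetical callable that takes an offset and returns that page's records as strings; in a real scraper it would request the page and parse the seller rows. A page limit guards against looping forever:

```php
<?php
// Sketch of "increase start until no new records come back".
// $fetchPage is a hypothetical callable: offset in, list of record strings out.

function collect_all_records(callable $fetchPage, int $pageSize = 10, int $maxPages = 50): array
{
    $all = [];
    for ($page = 0; $page < $maxPages; $page++) {
        $records = $fetchPage($page * $pageSize);
        // Records not seen yet (array_diff compares string values)
        $new = array_diff($records, $all);
        if (!$new) {
            break; // empty page, or the server repeated an earlier page
        }
        $all = array_merge($all, array_values($new));
    }
    return $all;
}

// Stand-in for the real site, so the loop can be tried offline.
// A real fetcher would request the catalog URL with &start=$start
// (e.g. via file_get_contents()) and parse the seller rows.
$fakeSellers = ['Shop A', 'Shop B', 'Shop C', 'Shop D', 'Shop E'];
$fetch = function (int $start) use ($fakeSellers): array {
    return array_slice($fakeSellers, $start, 2);
};

print_r(collect_all_records($fetch, 2));
```

Because array_diff() compares string values, a record that appears identically on two pages is kept only once; stopping on "nothing new" also covers servers that keep returning the last page for out-of-range offsets.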
