PHP screen scraping


Makwana

Hi all, hope someone can help with this:

I am scraping a handful of sites for some information using the simplehtmldom extension, which works fine except when I try to access JavaServer Pages (JSP). Can anyone tell me why this isn't working and whether there is some way around it, please? I assume it has something to do with missing cookie information, but I have no idea really. Currently I am trying to parse directly from the URL, but I could feasibly pass the page in as a file or a string if I can get the pages into either.

Thanks in advance

  • 2 weeks later...

Can you go to the site that isn't loading and get a valid cookie from your browser?

You could try setting the scraper's cookie value to the valid one from your browser and then running the script. I've done that before where logging in was required. Caveat: I did it using Ruby. It's particularly easy if you've got a browser plugin that lets you view the cookie.
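
For anyone wanting to try the cookie approach in PHP rather than Ruby, here is a minimal sketch using the cURL extension. The function names `build_cookie_header` and `fetch_with_cookies` are made up for illustration, and the URL and cookie value in the usage comment are placeholders. Java servlet containers commonly name the session cookie `JSESSIONID`, but that is an assumption about the target site, so check what your browser actually shows.

```php
<?php
// Build a Cookie request header value from name => value pairs,
// e.g. ['a' => '1', 'b' => '2'] becomes "a=1; b=2".
function build_cookie_header(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

// Fetch a page as a string, sending the given cookies with the request.
// Requires the cURL extension.
function fetch_with_cookies(string $url, array $cookies): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_COOKIE         => build_cookie_header($cookies),
    ]);
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    return $html;
}

// Hypothetical usage (placeholder URL and cookie value copied from the browser):
// $html = fetch_with_cookies('http://example.com/page.jsp', ['JSESSIONID' => 'value-from-browser']);
// $dom  = str_get_html($html); // str_get_html() comes from simple_html_dom.php
```

Because the page comes back as a string, it can be handed to the parser as a string instead of parsing directly from the URL.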

Can you go to the site that isn't loading and get a valid cookie from your browser?

This tends to work, but occasionally a site is smarter and ties the user-agent string to the cookie and session values. If adding the cookie doesn't get it working, then you may need to spoof the UA too.
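
A rough sketch of sending both the cookie and a spoofed UA, this time with PHP's built-in HTTP stream context rather than cURL. The function names are hypothetical, and the cookie and user-agent values are placeholders; copy both from the same browser session, since the site may check that they match.

```php
<?php
// Build stream-context options that send a browser cookie and User-Agent.
// $cookieHeader is the raw header value, e.g. "JSESSIONID=abc".
function build_browser_context_options(string $cookieHeader, string $userAgent): array
{
    return [
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent,
            'header'     => "Cookie: " . $cookieHeader . "\r\n",
        ],
    ];
}

// Fetch a page as a string using the options above.
function fetch_as_browser(string $url, string $cookieHeader, string $userAgent): string
{
    $context = stream_context_create(
        build_browser_context_options($cookieHeader, $userAgent)
    );
    $html = file_get_contents($url, false, $context);
    if ($html === false) {
        throw new RuntimeException('Request failed for ' . $url);
    }
    return $html;
}

// Hypothetical usage (placeholder URL, cookie, and UA string):
// $html = fetch_as_browser(
//     'http://example.com/page.jsp',
//     'JSESSIONID=value-from-browser',
//     'user-agent string copied from the same browser'
// );
```

If the request still fails with both values set, the site is probably checking something else, so this is only a first thing to try.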
