Makwana Posted August 25, 2011
Hi all, hope someone can help with this. I am scraping a handful of sites for some information using the simplehtmldom extension, which is working fine except when I try to access JavaServer Pages. Can anyone tell me why this isn't working and whether there is some way around it, please? I assume it's something to do with missing cookie information, but I have no idea really. Currently I am trying to parse directly from the URL, but I could feasibly pass it as a file or string if I can get the pages into either. Thanks in advance.
Maq Posted August 25, 2011
Can we see your code? The server-side language shouldn't really matter; the end result is HTML.
tehprofessor Posted September 2, 2011
Can you go to the site that isn't loading and get a valid cookie from your browser? You could try setting the scraper's cookie value to the valid one from your browser and then run the script. I've done that before where logging in was required. Caveat: I did it using Ruby. It's particularly easy if you've got a browser plugin that lets you view the cookie.
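For anyone wanting to try the browser-cookie idea in PHP rather than Ruby, here is a rough sketch using a stream context. The helper name `cookie_context`, the `example.com` URL, and the `JSESSIONID` cookie name are assumptions for illustration only (JSESSIONID is the usual session cookie on JSP sites, but check what your browser actually holds). It also assumes your copy of simplehtmldom provides `str_get_html()` and a `file_get_html()` that accepts a context as its third argument.

```php
<?php
// Hypothetical helper: build an HTTP stream context that sends a cookie
// copied from the browser with every request made through it.
function cookie_context(string $cookie)
{
    return stream_context_create([
        'http' => [
            'method' => 'GET',
            // Raw request header; the cookie string is "name=value" pairs
            // separated by "; " exactly as the browser shows them.
            'header' => "Cookie: $cookie\r\n",
        ],
    ]);
}

// Placeholder value: paste the real cookie name and value from your browser.
$context = cookie_context('JSESSIONID=PASTE_VALUE_FROM_BROWSER');

// Either fetch the page into a string and parse that
// (URL is a placeholder, and simple_html_dom.php must be included first):
// $html = file_get_contents('http://example.com/page.jsp', false, $context);
// $dom  = str_get_html($html);

// ...or hand the context straight to simplehtmldom:
// $dom  = file_get_html('http://example.com/page.jsp', false, $context);
```

Session cookies expire, so the copied value will probably only work for a while before you need to grab a fresh one.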
requinix Posted September 2, 2011
Quoting tehprofessor: "Can you go to the site that isn't loading and get a valid cookie from your browser?"
This tends to work, but occasionally a site is smarter and incorporates the user-agent string into its cookie and session values. If adding the cookie doesn't get it working, then you may need to spoof the UA too.
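If the cookie alone is not enough, a sketch along these lines sends a matching User-Agent as well. The helper name `browser_like_context` and the placeholder strings are made up for illustration. The assumption is that the site ties the session to the browser that created it, so the UA should be the exact string from the same browser the cookie was copied from.

```php
<?php
// Hypothetical helper: stream context that sends both the browser's cookie
// and the browser's User-Agent string.
function browser_like_context(string $cookie, string $userAgent)
{
    return stream_context_create([
        'http' => [
            'method'     => 'GET',
            // Sent as the User-Agent header, because no User-Agent line
            // is present in 'header' below.
            'user_agent' => $userAgent,
            'header'     => "Cookie: $cookie\r\n",
        ],
    ]);
}

// Placeholder values: copy both from the same browser session.
$context = browser_like_context(
    'JSESSIONID=PASTE_VALUE_FROM_BROWSER',
    'PASTE_THE_EXACT_USER_AGENT_STRING_FROM_YOUR_BROWSER'
);

// The context can then be passed to file_get_contents() as its third argument:
// $html = file_get_contents('http://example.com/page.jsp', false, $context);
```

Most browsers show the UA string in their developer tools under the request headers, next to the Cookie header.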