
Screen scraping a dynamic website


jokerkiller


Hi..

 

I am really stuck with this problem, and hope that someone can help me.

I need to extract text (prices, etc.) from a Danish website that sells cellphones. The site is generated dynamically with JavaScript, so I can't just extract the values directly from the HTML.

This is what the site looks like:

 

https://www.telmore.dk/t2/shop/shop.do?shopUrl=product.action&code=01007032857&minUsage=199&name=nokia-5800

 

Basically I need to get the "1299" and "199" so I can store them in a database for my price comparison site.

 

Any ideas how this can be done?

I thought it could maybe be done by simulating a browser in a PHP script and then scraping the content from that browser.

I am no expert programmer, so please explain in a way I can understand.  ;)

And tell me if this should be posted in another forum.

 

Thanks, Lars


I haven't looked at the site, but is the content being generated with AJAX? If so, just make the same AJAX call the page makes and get the content from that response.

 

Sorry JAY, but I don't know whether the site is generated with AJAX, since I don't know anything about AJAX programming. Do you have any idea how to check this? And how can I make the same AJAX call as the website?

Well this is the page that actually contains that middle box:

 

view-source:https://shopmore.telmore.dk/product.action?telmore2SessionId=C51016F14B892EAB2CD43A3BF752F3D1&shopUrl=product.action&code=01007032857&minUsage=199&name=nokia-5800

 

The initial values (1299 and 199) are in that source. So you'll have to use file_get_contents or cURL on that page and use regular expressions to grab those two values.
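
A rough sketch of the regex step, in case it helps. The real markup of that page isn't shown in this thread, so the `$sample` string, the "kr" currency marker, and the `extractPrices` helper name are all assumptions made for illustration; the pattern would need adjusting to whatever the page source actually contains.

```php
<?php
// Hypothetical helper: collect every number that is directly followed
// by a "kr" marker in an HTML string. The "1299 kr" layout is an
// assumption, not something confirmed from the real page.
function extractPrices(string $html): array
{
    // \b(\d{2,5}) captures a standalone 2-5 digit number,
    // \s*kr\b requires "kr" right after it (optional whitespace between).
    preg_match_all('/\b(\d{2,5})\s*kr\b/i', $html, $matches);

    // $matches[1] holds the captured digit strings; convert them to ints.
    return array_map('intval', $matches[1]);
}

// Made-up sample standing in for the real page source.
$sample = '<div class="price">1299 kr</div><span class="usage">199 kr/md.</span>';

$prices = extractPrices($sample);
print_r($prices);

// Against the live page, the HTML could be fetched first, for example:
// $html = file_get_contents($url);   // $url = the product.action address above
// if ($html !== false) {
//     $prices = extractPrices($html);
// }
```

file_get_contents returns false on failure, so it is worth checking the result before running the regex. If the site needs cookies or a particular user agent, cURL gives more control over the request than file_get_contents does.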

 

Good luck.

Archived

This topic is now archived and is closed to further replies.
