Scraping an Ajax website

solomos · January 12, 2013

Hi

I'm quite new to PHP and web development in general (but very old in programming :) ), so I'll need some help here, please.

I'm trying to scrape a website that uses Ajax (a JS script), so its content changes every 30 minutes.

I used Firebug to find the source script, but from there on I'm not sure how to continue: how to catch the HTML content and parse it.

Any advice, ideas, examples or help is really welcome.

Sol

Christian F. · January 12, 2013

To scrape the site as rendered, you'd need a JS parser, and considering the content changes every 30 minutes via an AJAX call, I don't think that would be feasible.

What I would do instead, if you have permission to do this in the first place, is send a request to the same URL that the site's AJAX handler calls. Then use that response to grab the contents you need. There's no point in scraping the site when you have an interface ready to hand you whatever you need.
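A minimal sketch of that suggestion, written in Python for illustration. It assumes the endpoint is a plain GET URL copied from Firebug's Net panel (`AJAX_URL` below is a hypothetical placeholder, not the real site) and that adding an `X-Requested-With: XMLHttpRequest` header, which jQuery adds to its same-origin Ajax calls, is enough to get the same response the browser receives. Some sites also require cookies or extra query parameters, which this sketch does not handle.

```python
import urllib.request

# Hypothetical endpoint: replace with the URL Firebug shows for the XHR call.
AJAX_URL = "https://example.com/ajax/latest.php"


def build_ajax_request(url: str) -> urllib.request.Request:
    """Build a GET request that mimics the browser's XHR call.

    Assumption: the server only looks at these headers; many sites ignore
    X-Requested-With entirely, others use it to decide what to return.
    """
    return urllib.request.Request(
        url,
        headers={
            "X-Requested-With": "XMLHttpRequest",
            "User-Agent": "Mozilla/5.0 (personal script)",
        },
    )


def fetch_ajax(url: str, timeout: float = 10.0) -> str:
    """Send the request and return the response body decoded as text."""
    req = build_ajax_request(url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling `fetch_ajax(AJAX_URL)` on a schedule (the content here changes every 30 minutes) returns the raw body, which you then parse instead of the rendered page.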

Backslider · January 12, 2013

If you know the Ajax URL, then that is all you need to get the content. How exactly depends on how the site is structured.
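What "depends on how the site is structured" usually comes down to the response format: Ajax handlers commonly return either JSON or a ready-made HTML fragment. A minimal sketch, assuming you pass in the body and the `Content-Type` header yourself; `parse_ajax_body` and `TextCollector` are hypothetical helper names, and the HTML branch only extracts text, not structure.

```python
import json
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect non-empty text chunks from an HTML fragment.

    Note: text inside <script> or <style> tags is collected too.
    """

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def parse_ajax_body(body: str, content_type: str):
    """Return decoded JSON for JSON responses, else text chunks from HTML."""
    if "json" in content_type.lower():
        return json.loads(body)
    collector = TextCollector()
    collector.feed(body)
    collector.close()  # flush any buffered trailing text
    return collector.chunks
```

For example, `parse_ajax_body('{"score": [2, 1]}', "application/json")` gives a dict, while an HTML fragment such as `<div><b>Home</b> 2 - 1 <i>Away</i></div>` gives `["Home", "2 - 1", "Away"]`.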

If you have permission to use the content, why not just ask them for a database dump? If you don't have permission, then you are very likely breaking the law.

solomos · January 13, 2013

Thanks for your responses.

Yes, I do know the Ajax URL, from what Firebug shows me.

Why would I be breaking the law? I'm not going to collect data. The data I scrape will be discarded every time and won't be stored anywhere, since it will be useless afterwards. Also, this is strictly for private use. Finally, I'm only getting data that is already public.

Unless scraping is against the law in general. If so, I could grab a screenshot and use OCR to get the results I'd like. That way I wouldn't be breaking the law, would I?

MDCode · January 13, 2013

Just because you're not collecting it doesn't make it any less illegal. I don't see the point of doing it this way if you can get the data legally.

DavidAM · January 13, 2013

Two terms to know: "Copyright" and "Terms of Service"

If you are scraping the site to present the data on another site, you are likely violating the Copyright of the site.

If you are scraping the site for any reason, you are likely violating the site's Terms of Service. Most of them say that you are not allowed to access the site through any automated method.

You need to read the site's policies and be sure that your actions are within the policies. You may want to contact the site administrator and see if they have an RSS feed or an API for people who want to retrieve the data for their own use.
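If the site does offer a feed, reading it is much simpler than scraping. A minimal sketch, assuming a standard RSS 2.0 feed (`<rss><channel><item>` with `title` and `link` children); `parse_rss_items` is a hypothetical helper, and an Atom feed would need different, namespaced element names.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(xml_text: str):
    """Return a list of {"title", "link"} dicts, one per RSS <item>."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        # findtext returns None when the child element is missing.
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
        })
    return items
```

You would fetch the feed text over HTTP first and pass it in; only the item-level `title` and `link` are read here, so the channel's own title is ignored.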

solomos · January 13, 2013

OK, I wasn't aware of that.

But I checked, and it's not in their Terms of Service, so I suppose I'm OK for the moment.

But I'll ask for an RSS feed as well.
