Scraping an Ajax website


solomos

Hi

I'm quite new to PHP and to web development in general (but a very old hand at programming :)), so I'll need some help here, please.

 

I'm trying to scrape a website that uses Ajax (a JS script), so the content keeps changing every 30 minutes.

 

I used Firebug to find the source script, but from there on I'm not sure how to continue: how to fetch the HTML content and parse it.

 

Any advice, ideas, examples, or help are really welcome.

 

Sol


To scrape the rendered site you would need a JS parser, and considering the content changes every 30 minutes via an AJAX call, I don't think that would be feasible.

 

What I would do instead, if you have permission to do this in the first place, is send a request to the same URL that the site's AJAX handler uses, then use the response to grab the contents you need. There is no point in scraping the site when you have an interface ready to hand you whatever you need. ;)
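As a rough illustration of requesting the AJAX handler directly, here is a minimal sketch. It is written in Python for brevity; the same idea carries over to PHP with cURL. The URL, the User-Agent string, and the function names are hypothetical placeholders, and the `X-Requested-With` header is only an assumption about what the handler might expect (many JS libraries send it). The real endpoint and headers are whatever Firebug shows for the call.

```python
import urllib.request

# Hypothetical endpoint: substitute the URL Firebug shows for the AJAX call.
AJAX_URL = "http://example.com/ajax/handler.php"


def build_request(url):
    """Build a GET request that resembles the browser's AJAX call."""
    return urllib.request.Request(
        url,
        headers={
            # Assumption: some handlers check for this header.
            "X-Requested-With": "XMLHttpRequest",
            # Hypothetical identifier for this script.
            "User-Agent": "private-fetch-script/0.1",
        },
    )


def fetch_body(url, timeout=10):
    """Return the response body as text, using the declared charset if any."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


# Usage (performs a real network request, so it is left commented out):
# print(fetch_body(AJAX_URL)[:500])
```

Since the content reportedly changes every 30 minutes, there should be no need to call this more often than that.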


If you know the Ajax URL, then that is all you need to get the content. How exactly depends on how the site is structured.
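To illustrate how the handling depends on the response's structure, here is a small sketch in Python (chosen for brevity; PHP has equivalents such as `json_decode` and `DOMDocument`). It assumes the handler returns either JSON or an HTML fragment containing table cells. Both the assumption and the names `CellText` and `extract` are hypothetical; the actual format is whatever the site sends.

```python
import json
from html.parser import HTMLParser


class CellText(HTMLParser):
    """Collect the text of every <td> cell in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current cell.
        if self._in_td:
            self.cells[-1] += data


def extract(body):
    """Return parsed JSON if the body is JSON, otherwise the <td> texts."""
    try:
        return json.loads(body)
    except ValueError:
        parser = CellText()
        parser.feed(body)
        parser.close()
        return [cell.strip() for cell in parser.cells]
```

If the response turns out to be JSON, the parsing step is trivial; an HTML fragment needs a parser tailored to the markup the site actually uses.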

 

If you have permission to use the content, why not just ask them for a database dump? If you don't have permission, then you are very likely breaking the law.


Thanks for your responses.

 

Yes, I do know the Ajax URL, at least as far as Firebug shows it to me.

 

Why would I be breaking the law? I'm not going to collect data. The data I scrape will be discarded every time and won't be stored anywhere, since it will be useless afterwards. Also, this is strictly for private use. Finally, I'm only getting data that is already public.

 

Unless scraping is against the law in general. If so, I could grab a screenshot and use OCR to get the results I'd like. That way I wouldn't be breaking the law...


Two terms to know: "Copyright" and "Terms of Service"

 

If you are scraping the site to present the data on another site, you are likely violating the site's copyright.

 

If you are scraping the site for any reason, you are likely violating the site's Terms of Service. Most of them say that you are not allowed to access the site through any automated method.

 

You need to read the site's policies and be sure that your actions are within the policies. You may want to contact the site administrator and see if they have an RSS feed or an API for people who want to retrieve the data for their own use.

