solomos Posted January 12, 2013

Hi, I'm quite new to PHP and to web development as a whole (but very old in programming), so I'll need some help here, please. I'm trying to scrape a website that uses Ajax (a JS script), so the content keeps changing every 30 minutes. I used Firebug to find the source script, but from there on I'm not sure how to continue: how to catch the HTML content and parse it. Any advice, ideas, examples, or help is really welcome.

Sol
Christian F. Posted January 12, 2013

To scrape the rendered site you need a JS parser, and considering the content changes every 30 minutes via an AJAX call, I don't think that'd be feasible. What I would do instead, if you have permission to do this in the first place, is send a request to the same URL that the site's AJAX handler calls, then use the response to grab the contents you need. There's no point in scraping the site when you have an interface ready to hand you whatever you need.
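A minimal sketch of that approach, assuming the AJAX handler returns an HTML fragment and that the values of interest sit in table cells. The URL, the `X-Requested-With` header, the `//td` query, and the function names `fetchFragment` and `extractCells` are all placeholders for illustration; nothing in the thread says how the real site is structured.

```php
<?php
// Hypothetical endpoint: replace with the URL Firebug shows for the AJAX call.
const AJAX_URL = 'https://example.com/ajax/handler.php';

/**
 * Fetch the raw response body from the AJAX endpoint.
 * Returns null if the request fails. Requires allow_url_fopen to be enabled.
 */
function fetchFragment(string $url): ?string
{
    $context = stream_context_create([
        'http' => [
            'timeout' => 10,
            // Some handlers only answer requests that look like AJAX calls (assumption).
            'header'  => "X-Requested-With: XMLHttpRequest\r\n",
        ],
    ]);
    $body = file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}

/**
 * Return the trimmed text of every <td> in an HTML fragment.
 * The //td query is an assumption; adjust it to the real markup.
 */
function extractCells(string $html): array
{
    if (trim($html) === '') {
        return [];
    }
    $doc = new DOMDocument();
    // Collect libxml warnings about sloppy real-world markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $cells = [];
    foreach ((new DOMXPath($doc))->query('//td') as $node) {
        $cells[] = trim($node->textContent);
    }
    return $cells;
}
```

Something like `extractCells(fetchFragment(AJAX_URL) ?? '')` would then give an array of cell texts. If the handler turns out to return JSON rather than HTML, `json_decode($body, true)` on the fetched body would replace the DOM step.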
Backslider Posted January 12, 2013

If you know the Ajax URL, then that is all you need to get the content. How exactly you do it depends on how the site is structured. If you have permission to use the content, why not just ask them for a database dump? If you don't have permission, then you are very likely breaking the law.
solomos (Author) Posted January 13, 2013

Thanks for your responses. Yes, I do know the Ajax URL, meaning what Firebug shows me. Why am I breaking the law? I'm not going to collect data. The data I'm going to scrape will be destroyed every time and won't be stored anywhere, since it will be useless afterwards. Also, this is strictly for private use only. Finally, I'm getting data that is already public. Unless scraping is against the law in general. If so, I could grab a screenshot and use OCR methods to get the results I'd like. That way I wouldn't be breaking the law...
MDCode Posted January 13, 2013 (edited January 13, 2013 by SocialCloud)

Just because you're not collecting the data doesn't make it any less illegal. I don't get the purpose of doing it this way if you can get the data legally.
DavidAM Posted January 13, 2013

Two terms to know: "Copyright" and "Terms of Service".

If you are scraping the site to present the data on another site, you are likely violating the site's copyright. If you are scraping the site for any reason, you are likely violating the site's Terms of Service. Most of them say that you are not allowed to access the site through any automated method.

You need to read the site's policies and be sure that your actions are within them. You may want to contact the site administrator and see if they have an RSS feed or an API for people who want to retrieve the data for their own use.
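If the site does offer an RSS feed, reading it from PHP takes only a few lines. This is a rough sketch that assumes a standard RSS 2.0 layout (`channel` containing `item` elements with `title` and `link`); the function name `rssItems` is made up for the example, and the real feed may be structured differently.

```php
<?php
/**
 * Return the title and link of every <item> in an RSS 2.0 document.
 * Returns an empty array if the XML cannot be parsed or has no items.
 */
function rssItems(string $xml): array
{
    // Collect parse errors quietly instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $feed = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($feed === false || !isset($feed->channel->item)) {
        return [];
    }

    $items = [];
    foreach ($feed->channel->item as $item) {
        $items[] = [
            'title' => (string) $item->title,
            'link'  => (string) $item->link,
        ];
    }
    return $items;
}
```

The XML string itself could come from `file_get_contents()` on the feed URL the administrator gives you.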
solomos (Author) Posted January 13, 2013

OK, I wasn't aware of that. But I checked, and it's not in their terms of service, so I suppose I'm OK for the moment. I'll ask for an RSS feed as well, though.