WinnieThePujols Posted June 26, 2007

How do I go about collecting data from another website? Basically, what I'd like to do is set up nightly retrieval of data from this page (and other pages on that site):

http://web.minorleaguebaseball.com/milb/stats/stats.jsp?n=Jarrett%2520Hoffpauir&pos=&sid=milb&t=p_pbp&pid=459440

and then have it inserted into a database on my server. I'd basically want to read everything on that page as a string, and then explode pieces of it into individual chunks that would then be added to a MySQL database.

I have no idea where to even start. I asked someone in the past and he called it a "grepper" or something. Can someone please help me get started? In a nutshell: how do I set up automatic retrieval from a separate site?

Thanks for the help.
Dragen Posted June 26, 2007

I think you might need to look into cURL. I don't know much about it myself, but I'm trying to use it for some PayPal payments: sending the data to PayPal and reading the result. It may be what you're after, although you'd probably need a cron job to run it every 24 hours.
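For reference, here is a minimal sketch of what fetching a page with PHP's cURL extension might look like. The function name `fetch_page` and the timeout values are assumptions for illustration; nothing in this thread specifies them.

```php
<?php
// Hypothetical helper (not from the thread): fetch a URL with the cURL
// extension and return the response body, or null if the request failed.
function fetch_page(string $url, int $timeoutSeconds = 30): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => 10,    // assumed value: seconds to wait for a connection
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);

    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    // curl_exec() returns false on transport errors; treat HTTP 4xx/5xx as failures too.
    if ($body === false || $status >= 400) {
        return null;
    }
    return $body;
}
```

Calling `fetch_page($url)` from a script and scheduling that script with a crontab entry such as `0 3 * * * php /path/to/fetch_stats.php` (the path is hypothetical) would run it once a night at 03:00.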
WinnieThePujols Posted June 26, 2007

Can you elaborate on that a little bit? "Curl?" Is that like a function, or what is it, exactly?

Edit: never mind, I Googled it. Thanks for the suggestion. I'll look into it!
redarrow Posted June 26, 2007

Fully tested, OK. All you do is pick out the array elements (by number) that you need, place them in the database, and use a cron job to run the script at the times you need the database updated. I tried to post the result, but the page produces 2013 array elements, so it was too big to show you.

This process must have permission from the website owner, as the information is copyrighted to the website. If you get permission (via email) to import all the information, you could in essence make a mod for other clients so they have the saved database information on their websites, and charge them for the use of the daily backed-up results. Baseball is huge and I smell money, lol. Good luck, mate.

<?php
// URL of the page to collect information from
$url_collect_information_from = "http://web.minorleaguebaseball.com/milb/stats/stats.jsp?n=Jarrett%2520Hoffpauir&pos=&sid=milb&t=p_pbp&pid=459440";

// Read the whole page into one string (file_get_contents returns false on failure)
$get_the_information = file_get_contents($url_collect_information_from);
if ($get_the_information === false) {
    die("Could not fetch the page");
}

// Split the page on spaces into an array of chunks and dump it for inspection
$collect_information_in_array = explode(' ', $get_the_information);
print_r($collect_information_in_array);
?>
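The "pick the array elements you need and place them in the database" step could be sketched roughly as below. Everything named here is an assumption for illustration: the positions in `FIELD_POSITIONS`, the `player_stats` table, and its columns are made up, and the real positions would have to be read off the `print_r()` output.

```php
<?php
// Hypothetical map of column name => position in the exploded array.
// These positions are placeholders, not the real layout of the page.
const FIELD_POSITIONS = ['player' => 0, 'at_bats' => 3, 'hits' => 4];

// Pull the wanted positions out of the exploded array.
// Positions that do not exist come back as null.
function pick_fields(array $tokens, array $positions): array
{
    $row = [];
    foreach ($positions as $column => $index) {
        $row[$column] = isset($tokens[$index]) ? trim($tokens[$index]) : null;
    }
    return $row;
}

// Insert one row using a PDO prepared statement.
// The table and column names are assumptions; adjust them to your schema.
function save_row(PDO $pdo, array $row): void
{
    $stmt = $pdo->prepare(
        'INSERT INTO player_stats (player, at_bats, hits)
         VALUES (:player, :at_bats, :hits)'
    );
    $stmt->execute($row);
}
```

With a PDO connection to the MySQL database in `$pdo`, the usage would be along the lines of `save_row($pdo, pick_fields($collect_information_in_array, FIELD_POSITIONS));`. A prepared statement is used so that whatever text comes off the remote page is never pasted directly into the SQL.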
WinnieThePujols Posted June 26, 2007

Red,

See, if you look at the page in the browser versus the page fetched with cURL, it's actually quite different. Based on the source code, it looks like milb.com employs a ton of JavaScript: the actual stats and tables themselves don't show up in the source code. The only thing you see in the source code itself is this (presumably it's what prints the stats):

<script type="text/javascript">writeData();</script>

So what I'm thinking is that <script type="text/javascript">writeData();</script> EQUALS the stats table you see in the browser. Not sure how I'm going to work around that...

Edit: also, about the copyright thing. I think that MLB has conceded that players and stats cannot be copyrighted, so do I actually need permission (if I'm just using it for a non-profit site)?
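One way to start investigating, sketched under assumptions: list every script tag in the fetched HTML, so you can see which external files are loaded and what inline calls such as `writeData();` appear. The data presumably lives in one of those scripts or is requested by them, but that is a guess, not something confirmed in this thread. The function name `list_scripts` is hypothetical, and a regular expression is only a rough tool for HTML.

```php
<?php
// Hypothetical helper: return every <script> tag found in an HTML string as
// ['src' => external URL or null, 'inline' => code between the tags].
function list_scripts(string $html): array
{
    $scripts = [];

    // Capture each tag's attribute string and its inline body.
    preg_match_all('#<script\b([^>]*)>(.*?)</script>#is', $html, $matches, PREG_SET_ORDER);

    foreach ($matches as $match) {
        $src = null;
        // Look for a quoted src="..." or src='...' attribute.
        if (preg_match('/\bsrc\s*=\s*(["\'])(.*?)\1/i', $match[1], $attr)) {
            $src = $attr[2];
        }
        $scripts[] = ['src' => $src, 'inline' => trim($match[2])];
    }
    return $scripts;
}
```

Running `print_r(list_scripts($get_the_information));` on the fetched page would show the external script URLs worth opening next to look for the stats data.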