
Automatic Data Retrieval


WinnieThePujols


How do I go about collecting data from another website?

 

Basically what I'd like to do is set up nightly retrieval of data from this site:

http://web.minorleaguebaseball.com/milb/stats/stats.jsp?n=Jarrett%2520Hoffpauir&pos=&sid=milb&t=p_pbp&pid=459440

(and other pages on that site)

 

And then have it inserted into a database on my server. Basically, I'd want to read everything on that page as a string, then explode it into individual chunks that would be added to a MySQL database...
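A minimal sketch of the "read the page as a string, split it into chunks, store the chunks in MySQL" idea, using PDO prepared statements. The table `page_chunks`, its `chunk` column, the function names, and the connection details are all hypothetical. Splitting on whitespace is only the simplest possible chunking and would need to be replaced with whatever fits the real page.

```php
<?php
// Minimal sketch: split a page into chunks and store each chunk as one row.
// ASSUMPTIONS (hypothetical): a MySQL table created with
//   CREATE TABLE page_chunks (id INT AUTO_INCREMENT PRIMARY KEY, chunk TEXT);
// and the connection details in the commented-out example below.

// Split the raw page text on runs of whitespace, dropping empty pieces.
function split_into_chunks(string $page): array
{
    return preg_split('/\s+/', trim($page), -1, PREG_SPLIT_NO_EMPTY);
}

// Insert every chunk with one prepared statement inside a single transaction.
function store_chunks(PDO $db, array $chunks): void
{
    $stmt = $db->prepare('INSERT INTO page_chunks (chunk) VALUES (?)');
    $db->beginTransaction();
    foreach ($chunks as $chunk) {
        $stmt->execute([$chunk]);
    }
    $db->commit();
}

// Example wiring (needs a real MySQL server, so it is commented out):
// $db = new PDO('mysql:host=localhost;dbname=baseball;charset=utf8mb4', 'user', 'pass',
//               [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
// store_chunks($db, split_into_chunks(file_get_contents($url)));
```

Using a prepared statement means quotes or other odd characters in the scraped text cannot break (or inject into) the SQL.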

 

I have no idea where to even start. I asked someone in the past and he called it a "grepper" or something. Can someone please assist me in getting started?

 

In a nutshell... how do I set up automatic retrieval from a separate site?

 

Thanks for the help.


I think you might need to look into cURL. I don't know much about it myself, but I'm trying to use it for some PayPal payments: sending the data to PayPal and reading the result.

It may be what you're wanting.

You'd probably also need a cron job to run it every 24 hours.
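Following the cURL suggestion, a minimal sketch of a fetch function built on PHP's cURL extension. The function name `fetch_page`, the 30-second timeout, and the crontab schedule and paths in the comment are illustrative assumptions, not anything specific to milb.com.

```php
<?php
// Minimal sketch: fetch a page with PHP's cURL extension (ext-curl must be installed).
// To run a script like this nightly, a crontab entry along these lines could be used
// (hypothetical paths; runs every day at 03:00):
//   0 3 * * * /usr/bin/php /path/to/fetch_stats.php

// Return the response body, or null on a connection error or a non-200 status.
function fetch_page(string $url): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body as a string instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_TIMEOUT        => 30,   // give up after 30 seconds
    ]);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    // The handle is released automatically when $ch goes out of scope.

    return ($body === false || $status !== 200) ? null : $body;
}

// Usage (makes a network request, so it is left commented out):
// $html = fetch_page('http://web.minorleaguebaseball.com/milb/stats/stats.jsp?n=Jarrett%2520Hoffpauir&pos=&sid=milb&t=p_pbp&pid=459440');
```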


Fully tested, OK.

All you do is set up the array [numbers] you need, place them in the database, and use cron to run the script at the times you need the database updated.
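A minimal sketch of the "array of numbers" idea: keep the player IDs in a PHP array (or load them from the database) and build one stats URL per player. It assumes, without having checked the site, that only the `n` (player name) and `pid` (player ID) query parameters change between players; `build_stats_url` is a hypothetical helper. The example URL double-encodes the space in the name (`%2520`), and `http_build_query()` reproduces that when given `%20`.

```php
<?php
// Minimal sketch: one stats URL per player ID.
// ASSUMPTION: only `n` and `pid` vary; the other parameters are copied from the example URL.

function build_stats_url(int $pid, string $name): string
{
    $query = http_build_query([
        'n'   => str_replace(' ', '%20', $name), // encoded again below, giving %2520 as in the example URL
        'pos' => '',
        'sid' => 'milb',
        't'   => 'p_pbp',
        'pid' => $pid,
    ]);
    return 'http://web.minorleaguebaseball.com/milb/stats/stats.jsp?' . $query;
}

// pid => player name; add more entries (or load them from your database) as needed.
$players = [459440 => 'Jarrett Hoffpauir'];

foreach ($players as $pid => $name) {
    echo build_stats_url($pid, $name), "\n";
    // fetch, parse and store the page here, e.g. with file_get_contents() or cURL
}
```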

 

I tried to post the result, but the page produced 2013 array elements, so it was too big to show you here.

This process needs permission from the website owner, as the information is copyrighted by the website.

If you get permission (via email) to import all the information, you could in essence make a mod that lets other clients show the saved database information on their websites, and charge them for use of the daily backed-up results.

Baseball is huge, and I smell money, lol.

Good luck, mate.

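<?php

// Page to collect the information from
$url_collect_information_from = "http://web.minorleaguebaseball.com/milb/stats/stats.jsp?n=Jarrett%2520Hoffpauir&pos=&sid=milb&t=p_pbp&pid=459440";

// Fetch the whole page as one string (needs allow_url_fopen enabled in php.ini)
$get_the_information = file_get_contents($url_collect_information_from);
if ($get_the_information === false) {
    die("Could not fetch $url_collect_information_from\n");
}

// Split the page on spaces: each piece becomes one array element
$collect_information_in_array = explode(' ', $get_the_information);

print_r($collect_information_in_array);
?>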
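Splitting the raw HTML on spaces leaves tags mixed into almost every element. A minimal sketch of an alternative, using PHP's bundled DOM extension to pull each table row out as an array of cell texts. It assumes the stats are present in the fetched HTML as ordinary table markup; `extract_table_rows` is a hypothetical helper name.

```php
<?php
// Minimal sketch: parse the HTML instead of exploding it on spaces.
// Requires ext-dom (bundled with most PHP builds). Only useful if the table
// markup is present in the HTML returned by the server.

// Return every <tr> as an array of trimmed cell texts (<th> and <td>).
function extract_table_rows(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate invalid real-world HTML quietly
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//tr') as $tr) {
        $cells = [];
        foreach ($xpath->query('./th | ./td', $tr) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Usage: $rows = extract_table_rows($get_the_information);
// each $rows[$i] can then be inserted into MySQL as one record.
```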


Red,

 

See, if you compare the page in a browser with what you get back using curl, they're quite different. Judging by the source code, milb.com relies heavily on JavaScript; the actual stats and tables don't appear in the source code at all.

 

The only thing you see in the source code itself is this (presumably it's what prints the stats):

<script type="text/javascript">writeData();</script>

 

 

So what I'm thinking is that this:

<script type="text/javascript">writeData();</script>

is what produces the stats table you see in the browser.

 

Not sure how I'm going to work around that...
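One possible first step, sketched under assumptions: PHP does not run JavaScript, so the data behind writeData() has to come from something the browser loads, either inline script code or an external .js file (or a request that code makes). The hypothetical helper `list_scripts` below lists every script in the fetched HTML so you can look for where writeData() is defined; whether the underlying data can then be fetched directly has not been verified for this site.

```php
<?php
// Minimal sketch: list external script URLs and inline script code in a page.
// It does NOT execute any JavaScript; it only helps locate where the data comes from.
// Requires ext-dom.

function list_scripts(string $html): array
{
    $found = ['external' => [], 'inline' => []];
    if (trim($html) === '') {
        return $found; // DOMDocument::loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate invalid real-world HTML quietly
    $doc->loadHTML($html);
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('script') as $script) {
        $src = $script->getAttribute('src'); // '' when the attribute is absent
        if ($src !== '') {
            $found['external'][] = $src; // a .js file that may define writeData()
        } elseif (trim($script->textContent) !== '') {
            $found['inline'][] = trim($script->textContent);
        }
    }
    return $found;
}

// Usage: print_r(list_scripts($get_the_information));
```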

 

 

 

Edit: also, about the copyright issue: I think MLB has conceded that players and stats cannot be copyrighted, so do I actually need permission (if I'm only using it for a non-profit site)?


