loren646 Posted April 30, 2013
I'm sure this is possible. I want to go to a website, enter some values, and then copy the data that the search returns. I'm not even sure where to start.
requinix Posted April 30, 2013
Generally, yes. What site, and what are you grabbing?
loren646 (Author) Posted April 30, 2013 (edited April 30, 2013 by loren646)
http://a810-bisweb.nyc.gov/bisweb/bispi00.jsp
Example (row 1 of the property search):
Select "manhattan"
Input "22" and "west 11 st"
Click "GO"
Then click "complaints", click each complaint, and copy the disposition, e.g. "01/26/2004 - A9 - ECB & BUILDINGS VIOLATIONS SERVED".
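Once a disposition line like the one quoted above has been copied out of the page, it can be split into its parts. This is only a sketch: it assumes every disposition follows the "date - code - description" layout of that single example, and the function name parseDisposition and the array keys are made up for illustration.

```php
<?php
// Split a disposition line such as
// "01/26/2004 - A9 - ECB & BUILDINGS VIOLATIONS SERVED"
// into date, code, and description.
// Assumption: the "MM/DD/YYYY - CODE - TEXT" layout holds for every line.
function parseDisposition(string $line): ?array
{
    $pattern = '#^\s*(\d{2}/\d{2}/\d{4})\s*-\s*([A-Z0-9]+)\s*-\s*(.+?)\s*$#';
    if (!preg_match($pattern, $line, $m)) {
        return null; // The line does not look like a disposition.
    }

    // Normalise the US-style date to ISO format; fall back to the raw text.
    $date = DateTimeImmutable::createFromFormat('!m/d/Y', $m[1]);

    return [
        'date'        => $date ? $date->format('Y-m-d') : $m[1],
        'code'        => $m[2],
        // Scraped text may contain entities such as &amp;, so decode them.
        'description' => html_entity_decode($m[3]),
    ];
}

$row = parseDisposition('01/26/2004 - A9 - ECB & BUILDINGS VIOLATIONS SERVED');
print_r($row);
```

Returning null for lines that do not match makes it easy to skip unexpected rows instead of storing garbage.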
requinix Posted April 30, 2013
Okay, I can't find anything which prohibits you from doing this. What data are you trying to get and in what form?
akphidelt2007 Posted April 30, 2013
It's actually much easier than it seems. Look up file_get_contents. Beyond that, you need to know regex and how to manipulate the URL to get the right contents. For example, I did a project where I scraped all of ESPN's baseball data for the past decade, and that was simply a matter of changing the date in the URL and parsing ESPN's structure.
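"Changing the date in the URL" could look roughly like the sketch below. The base URL and the Ymd date format are the ones used in the code posted further down this thread; the function name scoreboardUrls and the date range are just illustrative, and the URL itself may well have changed since then.

```php
<?php
// Build one scoreboard URL per day in an inclusive date range.
// Assumption: the site takes the date as a Ymd query parameter,
// as in the scoreboard URL shown later in this thread.
function scoreboardUrls(string $start, string $end): array
{
    $base = 'http://scores.espn.go.com/mlb/scoreboard?date=';
    $urls = [];

    $day  = new DateTimeImmutable($start);
    $last = new DateTimeImmutable($end);

    while ($day <= $last) {
        $urls[] = $base . $day->format('Ymd');
        $day = $day->modify('+1 day');
    }

    return $urls;
}

$urls = scoreboardUrls('2013-04-29', '2013-05-01');
print_r($urls);
```

Each URL in the list can then be passed to file_get_contents, ideally with a short pause between requests so the site is not hammered.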
cob05 Posted May 1, 2013
Quote (akphidelt2007): "Like I did a project for some guys where I scraped all of ESPN's baseball data for the past decade and that was simply just changing the date on the URL and parsing ESPN's structure."
I'm looking to do something like that for football (NFL). You don't happen to have some sample code you could share, do you?
loren646 (Author) Posted May 1, 2013
Quote (requinix): "What data are you trying to get and in what form?"
Just text data. I can put it in either a MySQL database or Excel; it doesn't matter. I just want to automate it rather than do it manually.
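For the Excel route, one low-effort option is to write the scraped rows to a CSV file, which Excel opens directly. This is a minimal sketch, not the only way to do it: the function name writeCsv, the column names, and the sample row are invented for illustration.

```php
<?php
// Write a header plus data rows to a CSV file and return the row count.
// Assumption: each row is a flat array of scalar values.
function writeCsv(string $path, array $header, array $rows): int
{
    $fh = fopen($path, 'w');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }

    // Separator, enclosure, and escape are passed explicitly so the
    // behaviour does not depend on version-specific defaults.
    fputcsv($fh, $header, ',', '"', '\\');

    $count = 0;
    foreach ($rows as $row) {
        fputcsv($fh, $row, ',', '"', '\\');
        $count++;
    }

    fclose($fh);
    return $count;
}

$path = tempnam(sys_get_temp_dir(), 'disp');
$written = writeCsv(
    $path,
    ['date', 'code', 'description'],
    [['2004-01-26', 'A9', 'ECB & BUILDINGS VIOLATIONS SERVED']]
);
echo "Wrote $written row(s) to $path\n";
```

If the data later needs to go into MySQL instead, the same rows can be inserted with prepared statements, so nothing about the scraping step has to change.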
loren646 (Author) Posted May 1, 2013
Quote (akphidelt2007): "It's actually much easier than it seems. Just look up file_get_contents."
Thanks. I'm going to do some reading up on this right now.
akphidelt2007 Posted May 1, 2013
Quote (cob05): "I'm looking to do something like that for football (NFL), you don't happen to have some sample code you could share do you?"
Getting the contents is the easy part; it's the parsing that takes time. This was for MLB. It worked perfectly for me, but that was two years ago, and there are probably a lot of efficiencies you could add. To save time I'll just post the simple code. It fetches the data for each individual game.

// Plug in the date you want, or the first date of a loop over many dates
$date = '2013-05-01';
$unix = strtotime($date);
$espnDate = date('Ymd', $unix);
$url = 'http://scores.espn.go.com/mlb/scoreboard?date=' . $espnDate;

// Here's how easy it is to get the page
$handle = file_get_contents($url);
$str = htmlentities($handle);

// Extract the game IDs from the scoreboard page for that date
$pattern = '/(\d+)-gameDetails/';
preg_match_all($pattern, $str, $gameIDs);

// $gameIDs[1] now holds the ID of each game; loop over them and repeat the same process per game
foreach ($gameIDs[1] as $id) {
    $url = 'http://scores.espn.go.com/mlb/boxscore?gameId=' . $id;
    $handle = file_get_contents($url);
    $str = htmlentities($handle);
    // From here it's a mess of regex to parse the HTML, break out the data, and store it in a database
}
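As an alternative to the "mess of regex", the ID extraction can also be done with PHP's DOM extension. To be clear, this is a different technique from the one in the post above, and it rests on an assumption: that the IDs appear as id attributes of the form "digits-gameDetails". The post only shows that this text occurs somewhere in the page, so the sample markup and the function name extractGameIds are hypothetical.

```php
<?php
// Pull game IDs out of id="<digits>-gameDetails" attributes using DOM + XPath.
// Assumption: the IDs live in id attributes; the real markup may differ.
function extractGameIds(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $ids = [];
    foreach ($xpath->query('//*[contains(@id, "-gameDetails")]') as $node) {
        // Keep only attributes that are exactly "<digits>-gameDetails".
        if (preg_match('/^(\d+)-gameDetails$/', $node->getAttribute('id'), $m)) {
            $ids[] = $m[1];
        }
    }

    return array_values(array_unique($ids));
}

// Hypothetical sample markup, not copied from the real site.
$sample = '<div id="330501101-gameDetails">A</div>'
        . '<div id="330501102-gameDetails">B</div>'
        . '<div id="other">C</div>';

print_r(extractGameIds($sample));
```

Working on the parsed document rather than on the raw string tends to survive small layout changes better, although either approach breaks if the site restructures its pages.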