http://a810-bisweb.nyc.gov/bisweb/bispi00.jsp

 

Example (using row 1 of the property search):

 

Select "manhattan" 

Input "22" "west 11 st" 

Click "GO"

 

----

 

Click "complaints"

 

Then click each complaint and copy the disposition, e.g. "01/26/2004 - A9 - ECB & BUILDINGS VIOLATIONS SERVED".

Edited by loren646
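
Once a disposition line like the one above has been copied or scraped, it can be split into its parts before it goes into a database or spreadsheet. The following is only a sketch: parseDisposition is a hypothetical helper, and it assumes every disposition follows the "MM/DD/YYYY - CODE - DESCRIPTION" shape of the single example in this post.

```php
<?php
// Hypothetical helper (not part of any library): split one disposition line
// into an ISO date, a short code, and a free-text description.
function parseDisposition(string $line): ?array
{
    // Assumed shape: MM/DD/YYYY - CODE - DESCRIPTION
    $pattern = '~^\s*(\d{2})/(\d{2})/(\d{4})\s*-\s*([A-Z0-9]+)\s*-\s*(.+?)\s*$~';
    if (!preg_match($pattern, $line, $m)) {
        return null; // the line does not look like a disposition
    }
    return [
        'date'        => sprintf('%s-%s-%s', $m[3], $m[1], $m[2]), // YYYY-MM-DD
        'code'        => $m[4],
        'description' => $m[5],
    ];
}

$row = parseDisposition('01/26/2004 - A9 - ECB & BUILDINGS VIOLATIONS SERVED');
$bad = parseDisposition('no disposition here');
```

Returning null for lines that do not match makes it easy to log and skip anything that deviates from the assumed format.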

It's actually much easier than it seems. Just look up file_get_contents. The only things you need to know are regex and how to manipulate the URL to get the correct contents. For example, I did a project for some guys where I scraped all of ESPN's baseball data for the past decade, and that was simply a matter of changing the date in the URL and parsing ESPN's page structure.
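
To illustrate the "changing the date in the URL" idea, here is a minimal sketch that builds one URL per day in a date range. scoreboardUrls is a hypothetical helper name, and the base URL is the scoreboard address that appears in the code later in this thread; the site's URL scheme may well have changed since.

```php
<?php
// Hypothetical helper: build one URL per day for an inclusive date range.
function scoreboardUrls(string $start, string $end, string $base): array
{
    $period = new DatePeriod(
        new DateTimeImmutable($start),
        new DateInterval('P1D'),
        (new DateTimeImmutable($end))->modify('+1 day') // DatePeriod excludes the end date
    );

    $urls = [];
    foreach ($period as $day) {
        $urls[] = $base . $day->format('Ymd'); // e.g. ...?date=20130501
    }
    return $urls;
}

$urls = scoreboardUrls(
    '2013-05-01',
    '2013-05-03',
    'http://scores.espn.go.com/mlb/scoreboard?date='
);
```

Each entry in $urls can then be passed to file_get_contents, ideally with a short pause between requests.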

I'm looking to do something like that for football (NFL). You don't happen to have some sample code you could share, do you?

Okay, I can't find anything which prohibits you from doing this.

 

What data are you trying to get and in what form?

 

Just text data. I can put it in either a MySQL database or Excel; it doesn't matter. I just want to automate it rather than do it manually.
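
For the Excel route, one simple option is to write the scraped rows to a CSV file, which Excel opens directly. This is a sketch only: writeCsv, the column names, and the sample row are assumptions made for illustration, with the row modelled on the disposition example at the top of the thread.

```php
<?php
// Hypothetical rows, shaped like whatever your parser ends up producing.
$rows = [
    ['2004-01-26', 'A9', 'ECB & BUILDINGS VIOLATIONS SERVED'],
];

// Hypothetical helper: write a header plus data rows to a CSV file.
// Returns the number of data rows written.
function writeCsv(string $path, array $header, array $rows): int
{
    $out = fopen($path, 'w');
    if ($out === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }
    // Delimiter, enclosure, and escape are passed explicitly for clarity.
    fputcsv($out, $header, ',', '"', '\\');
    foreach ($rows as $row) {
        fputcsv($out, $row, ',', '"', '\\');
    }
    fclose($out);
    return count($rows);
}

$path = sys_get_temp_dir() . '/complaints.csv';
$written = writeCsv($path, ['date', 'code', 'description'], $rows);
```

The same rows could instead be inserted into MySQL with prepared statements; CSV is just the least setup.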

Thanks. I'm going to do some reading up on this right now. 

Getting the contents is the easy part; it's the parsing that takes some time. This was for MLB. It worked perfectly for me, but that was two years ago, and I know there are probably a lot of efficiencies you could add to it. To save time I'll just post the simple code, which gets the data for each individual game.

 

 

//plug in the date you want info for, or the start date if you loop over many dates
$date = '2013-05-01';
$unix = strtotime($date);
 
$espnDate = date('Ymd', $unix);
 
$url = 'http://scores.espn.go.com/mlb/scoreboard?date='.$espnDate;
 
//here's how easy it is to get the file (file_get_contents returns false on failure)
$handle = file_get_contents($url);
if ($handle === false) {
   die('Could not fetch '.$url);
}
$str = htmlentities($handle);
 
//extract the game ids from the scoreboard page for that date
$pattern = '/(\d+)-gameDetails/';
preg_match_all($pattern, $str, $gameIDs);
 
//now you have the id of each game, so loop through them and repeat the same process
foreach (array_unique($gameIDs[1]) as $id)
{
   $url = 'http://scores.espn.go.com/mlb/boxscore?gameId='.$id;
  
   $handle = file_get_contents($url);
   if ($handle === false) {
      continue; //skip any game that fails to download
   }
   $str = htmlentities($handle);
 
   //now you have a mess of regex to parse the actual html to break up the actual data and store in a database
}
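
As an alternative to the "mess of regex" step, PHP's built-in DOMDocument and DOMXPath classes can pull cells out of a table. The sketch below is not the approach used in the code above. The markup, the "linescore" class name, and the extractRows helper are all made up for illustration, so the XPath query would need adjusting to the real page. It also has to run on the raw HTML, not on the htmlentities() output.

```php
<?php
// Hypothetical markup standing in for a fetched box score page.
$html = '<table class="linescore">'
      . '<tr><th>Team</th><th>R</th><th>H</th><th>E</th></tr>'
      . '<tr><td>NYY</td><td>5</td><td>9</td><td>0</td></tr>'
      . '<tr><td>BOS</td><td>3</td><td>7</td><td>1</td></tr>'
      . '</table>';

// Hypothetical helper: return the text of every <td> in each data row
// of the table with the given class attribute.
function extractRows(string $html, string $tableClass): array
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    // tr[td] skips the header row, which only contains <th> cells.
    foreach ($xpath->query("//table[@class='$tableClass']//tr[td]") as $tr) {
        $cells = [];
        foreach ($xpath->query('td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        $rows[] = $cells;
    }
    return $rows;
}

$rows = extractRows($html, 'linescore');
```

A DOM query tends to survive small markup changes better than a regex, though either approach breaks when the site is redesigned.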
 