Web scraping blocked

SammyP · July 19, 2007

I have been using some PHP code to get some football scores from a website for a while, but it has stopped working.

I can see the site myself, and the source code is as before.

When my PHP code tries to read it though, the page is just one line. I assume this is intentional by the site. (I'm not sure why, as the results aren't theirs, and there are plenty of other sites with them.)

Anyway, I am simply getting them from another place now, but I am wondering how they do this, and if it is avoidable.

GingerRobot · July 19, 2007

A lot of websites are available in different formats. For instance, you can view a lot of websites from a mobile phone, but they will look completely different from how they look in an ordinary browser. The content a website serves therefore depends on the information it can gather from the user.

If you were using something like file_get_contents, I think I would be right in saying that almost no information is passed to the site; by default it does not even send a User-Agent header. This often results in a vastly cut-down version of the site being retrieved. Usually you are better off using cURL, which lets you set many parts of the request, such as the user agent. This usually helps you get the content you require.

It could be something completely different that caused your problem, but it's a possibility.
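
As a rough illustration of the cURL suggestion, here is a minimal sketch of a request that sends a browser-like user agent. The function names, the URL and the user-agent string in the usage note are made up for the example, and there is no guarantee that any particular site responds differently to it.

```php
<?php
// Hypothetical helper: builds the cURL options for a browser-like request.
function build_scrape_options(string $url, string $userAgent): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_TIMEOUT        => 15,    // seconds
    ];
}

// Hypothetical helper: fetches the page, returning null on failure.
function fetch_page(string $url, string $userAgent): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, build_scrape_options($url, $userAgent));
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}
```

Usage would look something like `fetch_page('http://example.com/scores', 'Mozilla/5.0 (compatible; ExampleBot)')`, where both arguments are placeholders. Comparing the result with what file_get_contents returns for the same URL shows whether the user agent makes any difference.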

chigley · July 19, 2007

It's more likely that they don't want you viewing their site remotely and using their content on your site. Do you have their permission? If it used to work and now it doesn't, they've probably blocked your server's IP!

SammyP · July 19, 2007

No I don't have their permission, and I am happily using another site now. Don't worry that I'm doing anything I shouldn't, I'm not. I just want the results of football matches, and I can type them in from the paper or anywhere, but that requires me to be at my computer all weekend, which I'm not.

I am just curious now about their methods. I do a lot of web scraping, and I like to know how these things work.

They might simply have blocked the IP address I suppose. That will be no fun, as the cURL functions won't work either. I am going to test them anyway. Thanks for that advice GingerRobot, I hadn't heard of those functions before. Will let you know how it goes.
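
For anyone wanting to tell the two explanations in this thread apart, one rough approach is to request the page twice, once with no user agent and once with a browser-like one, and compare the results. The sketch below is only a heuristic under assumptions: the helper names, the "twice as long" threshold and the labels are invented for illustration, and a site can block scrapers in ways this will not detect.

```php
<?php
// Hypothetical helper: fetches a URL and reports the status code and body length.
function probe(string $url, ?string $userAgent): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    if ($userAgent !== null) {
        curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    }
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return ['status' => $status, 'length' => $body === false ? 0 : strlen($body)];
}

// Hypothetical heuristic: compares a bare request with a browser-like one.
function guess_block_type(array $bare, array $browserLike): string
{
    // A much longer page for the browser-like request suggests user-agent filtering.
    if ($browserLike['status'] === 200 && $browserLike['length'] > 2 * $bare['length']) {
        return 'content varies by user agent';
    }
    // 403 Forbidden for both requests means the user agent is not what matters.
    if ($bare['status'] === 403 && $browserLike['status'] === 403) {
        return 'refused regardless of user agent (possibly an IP block)';
    }
    return 'inconclusive';
}
```

If both requests come back with the same one-line page, the user agent is probably not the deciding factor, which is consistent with an IP block but does not prove it.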

SammyP · July 19, 2007

Didn't work, but I am not too worried. And learning about the cURL functions was worth it anyway.

Thanks.
