
Web scraping blocked


SammyP


I have been using some PHP code to get some football scores from a website for a while, but it has stopped working.

 

I can see the site myself, and the source code is as before.

 

When my PHP code tries to read it though, the page is just one line. I assume this is intentional by the site. (I'm not sure why, as the results aren't theirs, and there are plenty of other sites with them.)

 

Anyway, I am simply getting them from another place now, but I am wondering how they do this, and if it is avoidable.

 

 


A lot of websites are available in different formats. For instance, you can view a lot of websites from a mobile phone, but they will look completely different from how they would look in your ordinary browser. The content a website serves therefore depends on the information it can gather about the user.

 

If you were using something like file_get_contents, I think I'm right in saying that, by default, very little information is passed to the site: PHP sends no User-Agent header unless you configure one. This often results in a vastly cut-down version of the site being retrieved. You are usually better off using cURL, which lets you set details of the request such as the user agent. That often gets you the content you're after.
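As a rough sketch of the cURL approach (not tested against the site in question), something like the following sets a user agent on the request. The function names build_curl_options and fetch_page are made up for illustration; the CURLOPT_* constants and curl_* calls are PHP's standard cURL extension, which is assumed to be installed.

```php
<?php
// Hypothetical helper: collects the cURL options for a browser-like GET request.
function build_curl_options(string $url, string $userAgent): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,       // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,       // follow HTTP redirects
        CURLOPT_USERAGENT      => $userAgent, // sent as the User-Agent header
        CURLOPT_TIMEOUT        => 10,         // give up after 10 seconds
    ];
}

// Hypothetical helper: fetches a page, returning null if the request fails.
function fetch_page(string $url, string $userAgent): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, build_curl_options($url, $userAgent));
    $body = curl_exec($ch);

    return $body === false ? null : $body;
}
```

You would call it with the page address and whatever user agent string you want to present, for example fetch_page('https://example.com/scores', 'Mozilla/5.0 (compatible; ScoreFetcher/1.0)'), where both values are placeholders. If the site is filtering on something other than the user agent, this won't help.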

 

It could be something completely different that caused your problems, but it's a possibility.


It's more likely that they don't want you fetching their site remotely and using their content on your own site. Do you have their permission? If it used to work and now it doesn't, they've probably blocked your server's IP address.
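Nobody outside that site can say what they actually do, but as a guess at the simplest version, a server-side check might look like the sketch below. The function name should_block and the blocklist are invented for illustration; a real site would read the values from $_SERVER['REMOTE_ADDR'] and $_SERVER['HTTP_USER_AGENT'].

```php
<?php
// Hypothetical check a site might run before deciding what to serve.
// Returns true when the request should get the cut-down (or no) page.
function should_block(string $ip, string $userAgent, array $blockedIps): bool
{
    // Refuse addresses on an explicit blocklist.
    if (in_array($ip, $blockedIps, true)) {
        return true;
    }

    // Treat a missing or blank User-Agent as a script rather than a browser.
    if (trim($userAgent) === '') {
        return true;
    }

    return false;
}
```

An IP block like the first check cannot be avoided by changing request headers, whereas the second check stops firing as soon as the client sends any user agent at all.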


No, I don't have their permission, and I am happily using another site now. Don't worry, I'm not doing anything I shouldn't. I just want the results of football matches. I could type them in from the paper or anywhere else, but that would require me to be at my computer all weekend, which I'm not.

 

I am just curious now about their methods. I do a lot of web scraping, and I like to know how these things work.

 

They might simply have blocked the IP address, I suppose. That would be no fun, as the cURL functions won't get around it either. I am going to test them anyway. Thanks for that advice, GingerRobot, I hadn't heard of those functions before. I will let you know how it goes.

 

 

 

