
Stop websites from crawling my site


watsmyname


Hello,

 

I built a website (PHP) for a music company a few years back. They basically sell songs from their site. Now they've run into a problem: the site has been crawled by abmp3.com, which not only lets users download songs from our site but also shows the full path of each song.

 

I wonder how this happened and how to stop them from crawling our site. Other sites are likely doing the same. Could anybody help me overcome this problem? Maybe some PHP code can do this?

 

Thanks

watsmyname


Well thanks,

 

I have two folders, say "demosongs" and "fullsongs". The demosongs folder contains short or low-quality versions of the songs, which users can access and listen to as a preview. The fullsongs folder contains the full songs. Users only get a download link for a full song after their payment is successful, so they don't know the actual path of the full songs.

 

thanks

 


You need to control access to the files via an interface. Pass the file name, or even better an ID, through a script that checks whether the user has permission based on their purchase history. If the check passes, the script looks up the file and lets the user download it, which can be done using the 'Content-Disposition' header. Search Google for something like "php force file download".
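As a rough illustration of the forced-download part, here is a minimal sketch. The function names `buildDownloadHeaders()` and `sendSongDownload()` are made up for this example, and `audio/mpeg` assumes the files are MP3s; the permission check is assumed to have already happened before these are called.

```php
<?php
// Hypothetical helper: builds the headers that make a browser save a file
// instead of playing it inline.
function buildDownloadHeaders(string $downloadName, int $sizeInBytes): array
{
    // Drop any directory part, then strip characters that could break out of
    // the quoted filename or inject an extra header line.
    $safeName = str_replace(['"', '\\', "\r", "\n"], '', basename($downloadName));

    return [
        'Content-Type: audio/mpeg', // assumption: the songs are MP3 files
        'Content-Disposition: attachment; filename="' . $safeName . '"',
        'Content-Length: ' . $sizeInBytes,
    ];
}

// Hypothetical helper: sends the file at $path to the client as a download.
// Returns false without sending anything if the file is missing or unreadable.
function sendSongDownload(string $path, string $downloadName): bool
{
    if (!is_file($path) || !is_readable($path)) {
        return false;
    }
    foreach (buildDownloadHeaders($downloadName, filesize($path)) as $header) {
        header($header);
    }
    readfile($path); // streams the file contents to the response body
    return true;
}
```

In a download script you would call `sendSongDownload()` only after the purchase check succeeds, then `exit` so nothing else is appended to the response.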

 

After that, I would add a restriction to the Apache configuration to prevent direct access, meaning all requests will need to go through the interface:

 

<Directory "/path/to/songs/directory">
    Deny from all
</Directory>
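
One caveat, assuming a newer server: `Deny from all` is the Apache 2.2 syntax. On Apache 2.4 it only works through mod_access_compat, and the native equivalent is `Require all denied`. A sketch with the same placeholder path:

```apache
<Directory "/path/to/songs/directory">
    # Apache 2.4 (mod_authz_core): refuse every direct HTTP request
    Require all denied
</Directory>
```

PHP scripts can still read the files from disk, since this only blocks requests that arrive over HTTP.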



As an alternative, putting the files outside of the web root (/www/, /htdocs/) would provide the same protection without needing .htaccess.


thanks xyph,

 

So is it Google that makes it possible for them to get my whole server's folder structure, or is there some other way? Before I move on to the solution, I just want to know how they did it (it will obviously be helpful as a precautionary measure for the future).

 

thanks

watsmyname



As an alternative, sticking the files outside of the webroot (/www/, /htdocs/) would provide the same functionality without .htaccess

 

This is the best option. If you have files that should only be available after purchase then you'll want them safely stored out of the web root.
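
A minimal sketch of how a script might locate a file in such a directory. The function name `resolveSongPath()` and the idea of passing in a storage directory are assumptions for illustration, not something from the original site.

```php
<?php
// Hypothetical helper: maps a stored file name to a full path inside a
// storage directory that lives outside the web root.
// Returns null if the name does not resolve to a real file in that directory.
function resolveSongPath(string $storageDir, string $fileName): ?string
{
    $base = realpath($storageDir);
    if ($base === false) {
        return null; // storage directory does not exist
    }
    // basename() drops any directory part, so "../" tricks cannot climb out.
    $candidate = realpath($base . DIRECTORY_SEPARATOR . basename($fileName));
    if ($candidate === false || !is_file($candidate)) {
        return null;
    }
    // realpath() resolves symlinks; make sure the result is still inside $base.
    if (strpos($candidate, $base . DIRECTORY_SEPARATOR) !== 0) {
        return null;
    }
    return $candidate;
}
```

Because the directory is not under the web root, no URL maps to it at all; only a script that calls something like this can reach the files.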

 

Robots.txt is no good because only "good" bots will obey it. Evil bots will simply ignore this.


thanks xyph,

 

directory listing is not allowed!

 

watsmyname

 

It may be that "abmp3.com" have actually purchased a song from the site before, discovered the vulnerability and decided to exploit it. You may even unknowingly have a direct link to the file... It's complete guesswork without us being able to see the site.


Although I have directory listing disabled, can we prevent users from downloading songs with a direct link? For example, can we prevent a user from directly accessing http://www.mysite.com/files/looking_into_eyes.mp3 using .htaccess? If yes, how? As I already stated, the purchaser gets an indirect link after the purchase, and that link forces the download of the song they purchased.


To summarize the points in this thread and add some of my own...

 

1)  The script they used to compromise your site was probably hand-rolled.  I wrote web spiders for years; they're very easy to write.

 

2)  Move your files or secure them with .htaccess so nobody, not even paying customers, can use download links to get to them.

 

3)  Once you've moved them or secured them, rewrite your "download song" landing page so that it accepts a songID, checks whether that song has been purchased by the logged-in user, and then streams the proper song out to the user without directing them to the file itself. 

 

4)  Google has nothing to do with this.  Other MP3 sites have nothing to do with this.  Your site was insecure and followed a predictable pattern, so many people (probably including dozens of your customers) figured out how to get to your files without paying for them.
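
The purchase check in point 3 could look roughly like this. It is only a sketch: `fileForPurchasedSong()` is a made-up name, and the two arrays stand in for whatever database queries the real site would run.

```php
<?php
// Hypothetical lookup: decides which file, if any, a user may download.
// $purchases maps user ID => list of purchased song IDs.
// $catalog maps song ID => stored file name.
// Both arrays are placeholders for real database queries.
function fileForPurchasedSong(int $userId, int $songId, array $purchases, array $catalog): ?string
{
    $owned = $purchases[$userId] ?? [];
    if (!in_array($songId, $owned, true)) {
        return null; // not purchased by this user: refuse the download
    }
    return $catalog[$songId] ?? null; // null if the song ID is unknown
}
```

The landing page would take the song ID from the request and the user ID from the session, and only stream the file (for example with `header()` and `readfile()`) when this returns a file name. The client never sees the real path, so there is no pattern to guess.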

 

 
