
How to build a search engine like www.torrentz.com in PHP?


ankur0101


Hi friends,

I am a PHP student. Last night I came up with an idea: how would I create a search engine like www.torrentz.com?

 

That site doesn't actually host any files; it searches other torrent sites and shows their results.

 

Same with http://rapidshare-search-engine.com/

This website doesn't contain any files itself, but it shows files from rapidshare.com.

 

How can I do that for www.download.com or softpedia.com?

 

Can anybody help me? :)


It looks like those sites already have search engines of their own. If a site doesn't have an API, you would have to figure out some way to spider it and extract the info you want. I'm not sure about the legality of this, though. You may want to just contact the site and see if they're interested in collaborating with you. Maybe you can come to some kind of mutually beneficial arrangement.


Unless you enter into some kind of arrangement with the other site, under which they give you access to a list of files (through FTP or some custom script they make), you have to rely on scraping their site.

 

The first thing you would do is submit the search phrase to their server to get the page that shows the results. You can use curl to submit data to their search form and grab the output. That output is the rendered HTML page, not a nice, tidy list of the files you want (no, it's not that easy). Or, if keywords can be sent via the GET method, you can use file_get_contents with a dynamically generated URL string.
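To make the two options concrete, here is a rough sketch of both. The URL http://www.example.com/search and the field name q are made up for illustration; you would need to look at the target site's actual search form to find the real ones. The function names are mine too. Note that file_get_contents on a URL only works if allow_url_fopen is enabled, and the curl version needs the curl extension.

```php
<?php
// Hypothetical search endpoint; replace with the real form action of the target site.
const SEARCH_URL = 'http://www.example.com/search';

// Build a GET search URL with the keywords safely URL-encoded.
// The parameter name "q" is an assumption.
function buildSearchUrl(string $baseUrl, string $keywords): string
{
    return $baseUrl . '?' . http_build_query(['q' => $keywords]);
}

// Option 1: keywords sent via GET, fetched with file_get_contents.
// Returns the raw HTML of the results page, or null on failure.
function fetchByGet(string $baseUrl, string $keywords): ?string
{
    $html = @file_get_contents(buildSearchUrl($baseUrl, $keywords));
    return $html === false ? null : $html;
}

// Option 2: keywords submitted to the search form via POST, using curl.
// Returns the raw HTML of the results page, or null on failure.
function fetchByPost(string $formUrl, string $keywords): ?string
{
    $ch = curl_init($formUrl);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query(['q' => $keywords]),
        CURLOPT_RETURNTRANSFER => true,  // return the page instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects to the results page
        CURLOPT_TIMEOUT        => 15,
    ]);
    $html = curl_exec($ch);
    return is_string($html) ? $html : null;
}
```

You would then call something like fetchByGet(SEARCH_URL, 'ubuntu iso') and pass the returned HTML on to the extraction step.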

 

Either way, you are then going to have to use regex to extract the list of files from the page so you can then present them to your user.
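As a minimal sketch of that extraction step, assume (purely for illustration) that each result on the page looks like `<a class="result" href="...">name</a>`. The real markup will be different for every site, so the pattern below is a placeholder you would rewrite after viewing the page source. The function name is also made up.

```php
<?php
// Pull (name, url) pairs out of a results page.
// ASSUMPTION: each hit is an <a class="result" href="..."> link; adjust for the real site.
function extractResults(string $html): array
{
    $pattern = '~<a\s+class="result"\s+href="([^"]+)"[^>]*>(.*?)</a>~is';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $results = [];
    foreach ($matches as $m) {
        $results[] = [
            // Decode entities such as &amp; so the URL is usable as-is.
            'url'  => html_entity_decode($m[1], ENT_QUOTES, 'UTF-8'),
            // Drop any nested tags (e.g. <b>) from the link text.
            'name' => trim(html_entity_decode(strip_tags($m[2]), ENT_QUOTES, 'UTF-8')),
        ];
    }
    return $results;
}

// Made-up sample of what a fetched results page might contain.
$sample = '<ul><li><a class="result" href="/file/1?a=1&amp;b=2">Ubuntu <b>ISO</b></a></li>'
        . '<li><a class="result" href="/file/2">Fedora ISO</a></li></ul>';

foreach (extractResults($sample) as $row) {
    echo $row['name'], ' -> ', $row['url'], PHP_EOL;
}
```

When you print these values on your own page, run them through htmlspecialchars first, since they come from someone else's site.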

 

Oh, and btw, though it isn't necessarily illegal (it can still break a site's terms of use), most sites frown upon being scraped like this. If you want to avoid potentially getting your server's IP address banned on their site, I suggest contacting them and at least getting their permission to scrape their pages. If they say it's okay, they may even throw together something that allows you to skip the b.s. of having to regex the page.


I always wondered about the legality of scraping... thanks, crayon!

 

I don't like the idea of screen scraping because it seems you have to constantly monitor a site for changes to its layout so that you can rewrite your regexes whenever it changes. It seems like if you're going to end up giving the site more traffic, they would want to cooperate with you.
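One way to take some of the pain out of that monitoring might be a small check that tells you when your pattern has stopped matching a page that should always have results. This is only a sketch: the pattern, the sample markup, and the function name are all made up, and a miss can also just mean the query had no hits, so it works best with a search you know always returns something.

```php
<?php
// ASSUMPTION: same hypothetical result markup as a scraper would target.
const RESULT_PATTERN = '~<a\s+class="result"\s+href="[^"]+"~i';

// True if the expected result markup is still present in the page.
function patternStillMatches(string $html, string $pattern): bool
{
    return preg_match($pattern, $html) === 1;
}

// Made-up page for a search that should always return results.
$canaryHtml = '<div id="hits"><a class="result" href="/file/1">Ubuntu ISO</a></div>';

if (!patternStillMatches($canaryHtml, RESULT_PATTERN)) {
    // In a real setup you might email yourself here instead.
    error_log('Scraper pattern found nothing; the site layout may have changed.');
}
```

Run from cron against a live fetch, this at least lets you find out about a layout change before your users do.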

 

Just out of curiosity, what would you do if your IP was banned? I guess you could use a proxy, but a lot of those seem unreliable.

