ankur0101 Posted January 24, 2009

Hi friends, I am a PHP student. Last night I came up with an idea: how do I create a search engine like www.torrentz.com? That site doesn't host any files itself; it searches for files on other torrent sites and shows the results. It's the same with http://rapidshare-search-engine.com/: this website doesn't contain any files, but it shows files from rapidshare.com. How can I do that for www.download.com or softpedia.com? Can anybody help me? :)
rubing Posted January 24, 2009

It looks like those sites already have search engines on them. If a site doesn't have an API, you would have to figure out some way to spider it and extract the info you want. I'm not sure about the legality of this, though. You may want to just contact the site and see if they're interested in collaborating with you. Maybe you can come to some kind of mutually beneficial arrangement.
.josh Posted January 24, 2009

Unless you enter into some kind of arrangement with the other site, in which they give you access to a list of files (through FTP or some custom script they make), you have to rely on scraping their site.

The first thing you would do is submit the search phrase to their server to get the page that shows the results. You can use curl to submit data to their search form and grab the results output (the resulting rendered HTML page, not a nice and tidy list of files you want. No, it's not that easy). Or, if keywords can be sent via the GET method, you can use file_get_contents with a dynamically generated URL string. Either way, you are then going to have to use regex to extract the list of files from the page so you can present them to your user.

Oh, and btw, though it's not illegal, most sites frown upon being scraped like this. If you want to avoid potentially getting your server's IP address banned on their site, I suggest contacting them and at least getting their permission to scrape their pages. If they say it's okay, they may even throw together something that allows you to skip the b.s. of having to regex the page.
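To make the GET route above a bit more concrete, here is a rough sketch of the two steps (build the URL, then regex the results out of the returned HTML). Everything site-specific in it is an assumption for illustration: the `q` parameter name, the `class="result"` markup, and the helper names `build_search_url`, `extract_results` and `search_site` are made up, so you would have to adapt them to whatever the target site's search form and result page really look like.

```php
<?php
// Hypothetical helper: builds the search URL for a site that accepts
// keywords via GET. The parameter name "q" is an assumption; look at the
// target site's search form to find the real one.
function build_search_url(string $baseUrl, string $keywords, string $param = 'q'): string
{
    // http_build_query() URL-encodes the keywords for us.
    return $baseUrl . '?' . http_build_query([$param => $keywords]);
}

// Pulls the link target and link text out of the result page.
// Assumption: results are marked up as <a class="result" href="...">name</a>.
// A real site will differ, and the pattern breaks whenever the layout changes.
function extract_results(string $html): array
{
    $pattern = '~<a\s+class="result"\s+href="([^"]+)"[^>]*>(.*?)</a>~is';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $results = [];
    foreach ($matches as $m) {
        $results[] = [
            'url'  => html_entity_decode($m[1]),   // turn &amp; back into &
            'name' => trim(strip_tags($m[2])),     // drop inner tags like <b>
        ];
    }
    return $results;
}

// Fetches the results page and returns the extracted list.
// Returns an empty array if the request fails.
function search_site(string $baseUrl, string $keywords): array
{
    $html = @file_get_contents(build_search_url($baseUrl, $keywords));
    return $html === false ? [] : extract_results($html);
}
```

You would call it as something like `search_site('http://example.com/search', 'open office')` and loop over the returned array to print your own result list. If the site only accepts POST, the fetch step would have to go through curl instead, but the extraction part stays the same.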
rubing Posted January 24, 2009

I always wondered about the legality issue of scraping... thanks crayon! I don't like the idea of screen scraping b/c it seems you have to constantly monitor a site for changes to its layout, so that you can rewrite your regexes whenever the site changes. It seems like if you're gonna end up giving the site more traffic, they would want to cooperate with you. Just out of curiosity, what would you do if your IP was banned? I guess you could use a proxy, but it seems a lot of those are unreliable.