hno Posted January 17, 2010
Hi, I want to make a search engine similar to Google or Yahoo, but only similar. As a first step, I want it to detect web sites and index them. How is this possible? How can it detect web addresses? Please help me, it is for my thesis. Thanks.
Link to comment: https://forums.phpfreaks.com/topic/188781-how-to-build-a-search-engine/
ignace Posted January 17, 2010
http://www.amazon.com/Understanding-Search-Engines-Mathematical-Environments/dp/0898714370
teamatomic Posted January 17, 2010
What do you mean by "detect"? If you mean finding new domains, you don't. You can visit sites where people submit new domains and scrape them, or work with domains that are submitted to your own engine.

If someone searches your engine for a domain you don't have indexed, or someone submits a URL, then your bot, slurper, scraper, crawler or whatever you want to call it goes to the site and checks whether it's alive. If it is, it grabs the robots.txt file and goes from there. Some bots only index the root of the site; those are mostly called "directories". Others follow links while respecting the disallows in the robots.txt file; these are mostly termed "search engines". What you do is index the root, then follow links and index them as you go along.

If you want to do it like Google, you match the keywords against the page headings (<h1>, <h2>, in succession), and then match the text of the page against the <h> tags and keywords to come up with a ranking. Then, like Google, you use secret incantations, voodoo dolls and bat blood drippings to decide the final rankings.

As to the code: use cURL or wget, or even just file() or file_get_contents(). From there on it's up to you how complex you want your code to be and how you want to break up the work. As to the indexing itself: whatever scheme you come up with that works. Start simple and develop from there.

One thing. The more complex you make your script, the deeper you go into a site, and the more work your script does, the sooner you will need a dedicated server (or several). Your host will quickly tire of the demands your script puts on a shared server, and your script will not be able to function properly under the restrictions a shared server sets. Not to mention that, if you are serious about running a search engine, you will quickly run out of disk space and bandwidth. What comes into your site also counts against your bandwidth.

HTH
Teamatomic
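A rough sketch of two of the steps described above: honouring the disallows in robots.txt, and ranking a page by matching keywords against its headings and then its body text. It is written in Python rather than with the PHP functions mentioned, and it takes robots.txt and the page as plain strings, so nothing is fetched. The bot name `thesisbot`, the function names, and the heading weight of 3 are assumptions made up for illustration; this is nowhere near how Google actually ranks.

```python
import re
from html.parser import HTMLParser
from typing import Iterable
from urllib import robotparser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style"}  # their contents are not visible page text


def allowed_by_robots(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if the robots.txt text lets `agent` fetch `url`."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


class HeadingTextParser(HTMLParser):
    """Split the words of a page into heading words and body words."""

    def __init__(self) -> None:
        super().__init__()
        self._heading_depth = 0
        self._skip_depth = 0
        self.heading_words = []
        self.body_words = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._heading_depth += 1
        elif tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._heading_depth:
            self._heading_depth -= 1
        elif tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return
        words = re.findall(r"[a-z0-9]+", data.lower())
        if self._heading_depth:
            self.heading_words.extend(words)
        else:
            self.body_words.extend(words)


def score_page(html: str, keywords: Iterable[str], heading_weight: int = 3) -> int:
    """Count keyword hits, with hits inside <h1>..<h6> weighted more heavily.

    The weight of 3 is an arbitrary assumption, not a known ranking factor.
    """
    parser = HeadingTextParser()
    parser.feed(html)
    parser.close()
    wanted = {k.lower() for k in keywords}
    heading_hits = sum(1 for w in parser.heading_words if w in wanted)
    body_hits = sum(1 for w in parser.body_words if w in wanted)
    return heading_weight * heading_hits + body_hits


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    page = "<h1>PHP search engine</h1><p>Build a search engine in PHP.</p>"
    if allowed_by_robots(robots, "thesisbot", "http://example.com/index.html"):
        print(score_page(page, ["search", "engine"]))
```

In a real crawler the two strings would come from whatever fetches the pages (cURL, wget, file_get_contents() and so on), and the score would be stored alongside the URL in the index.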
hno Posted January 18, 2010
Thanks for your help, but I still have a question: how does Google index web sites? How does it recognize new web sites? That is my problem. Thanks.
oni-kun Posted January 18, 2010
It finds a new site if:
a) Your site is registered; it will appear on a newly registered domain list (usually the case).
b) Your site is linked to by any known site; the spider will crawl that site and discover yours.
c) Your site is on a shared host (more often the case), and Google crawls the IP and finds your site.
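Point b) is the one a small thesis crawler can reproduce most easily: parse the links on a page you already know, and note any host you have not seen before. A minimal sketch in Python, assuming the page has already been downloaded into a string; `LinkParser`, `discover_new_hosts` and the example hosts are hypothetical names for illustration only.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect absolute URLs from the <a href="..."> tags of one page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def discover_new_hosts(page_url: str, html: str, known_hosts: set) -> list:
    """Return hosts linked from the page that are not in `known_hosts`."""
    parser = LinkParser(page_url)
    parser.feed(html)
    parser.close()
    new_hosts = []
    for link in parser.links:
        parsed = urlparse(link)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar links
        host = parsed.netloc.lower()
        if host and host not in known_hosts and host not in new_hosts:
            new_hosts.append(host)
    return new_hosts


if __name__ == "__main__":
    html = (
        '<a href="/about.html">About</a> '
        '<a href="http://new-site.example/">A new site</a> '
        '<a href="mailto:someone@new-site.example">Mail</a>'
    )
    print(discover_new_hosts("http://known.example/index.html", html, {"known.example"}))
```

Each host that comes back would be queued for its own visit, which is how a crawl that starts from a handful of known sites keeps growing.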
greatstar00 Posted January 18, 2010
"Your site is registered, it will come onto a newly registered domain list (usually the case)"
How do I find this list?
hno Posted January 20, 2010
Thanks for your help. Now, where can I find the list of new domains that have been submitted?
redarrow Posted January 20, 2010
Good question, but due to competition you cannot see who has just submitted their info to Google to be listed in their search engine. There is no list of newly added domain names for anybody except Google staff, and there is no information on domain names waiting to be added via Google.

Building a search engine (a proper one) takes years and costs thousands or millions to create, especially the infrastructure and network behind it. Scripts that scrape info from Google are not real search engines. Real search engines are massive, very massive. The code for indexing web sites in a search engine is massive too; you have to be able to walk and talk MySQL...