kotty5 Posted November 12, 2013

Hello, I made a search engine and now I am trying to find an open-source spider for it. I have a database (managed through phpMyAdmin) holding around 200 URLs, descriptions, titles and keywords, and now I want to connect it to a spider to add more results to it.

Link to comment: https://forums.phpfreaks.com/topic/283850-web-crawler-for-search-engine/
dalecosp Posted November 12, 2013 (edited)

http://en.wikipedia.org/wiki/Web_crawler#Open-source_crawlers

But, really, you made a search engine but no spider? I should think the spider is the main thing! (Maybe I should go read the "History of Google" again...)
QuickOldCar Posted November 18, 2013

I can probably help you out with this. There is a lot more to it than meets the eye; it seems like a simple thing, but there are many unforeseen obstacles.

There are a few open-source ones out there you can use: http://www.sphider.eu/, http://cuab.de/, https://code.google.com/p/phpspider/, and tons more if you look for them.

I used Sphider quite a few years ago, and it seemed pretty good compared to a lot of the others I tried, but I wanted a lot more control and wanted to do it all differently from how they do it. Their system is that you add sites to a crawl list, and it keeps hitting those sites looking for any new data. That is fine if you want a search limited to the sites you select.

I wrote a few of my own crawlers/scrapers/spiders, or whatever someone wants to call them. I have a few ways to add new sites and links: manual submission, pulling URLs from lists or a database, through my web crawler, or with my site or page scraper.

I started out about 5 years ago and did it more like Google does; it even looked similar to them. But after doing it for a while and seeing how long it takes to scrape entire sites, I changed it all around to pull in more data faster and more simply. I can still scrape entire sites if I want to.

Basically, I hit a URL, grab any information I want from it, and scrape all the links from the page, which get stored in my links search. The site itself does not get indexed into specific categories or tags, but the information about that site does get stored. I use a full-text search to sort my results in the website index, and use sphinxsearch to handle my links results.

The toughest part of all this is not getting the information but displaying it in a timely manner. Once you get to a million+ results, you will quickly see what I mean. That's why you have to make sure you index the database so it can return results faster. It is also better to fetch and return exactly what you need.
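To make the "hit a URL, grab information, scrape the links" step a bit more concrete, here is a rough sketch in Python using only the standard library. It is not my actual crawler code; the names `LinkExtractor` and `scrape_page` are made up for illustration, and it assumes you have already fetched the page HTML yourself. It only shows the parsing part: pulling the title and turning every `<a href>` into an absolute, de-duplicated URL you could store.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the page title and absolute http(s) links from one HTML page.

    Hypothetical helper for illustration only.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL and drop "#fragment".
                absolute, _fragment = urldefrag(urljoin(self.base_url, href))
                # Keep only web links (skips mailto:, javascript:, etc.) and avoid duplicates.
                if absolute.startswith(("http://", "https://")) and absolute not in self.links:
                    self.links.append(absolute)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def scrape_page(base_url, html):
    """Return (title, links) for already-downloaded HTML."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.links
```

Each link that comes back would then go into whatever table or queue you use for URLs still to be visited. A real crawler also needs fetching, robots.txt handling, rate limiting and error handling, none of which is shown here.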

You can check out my search engine and website index through the link in my signature. If you have any questions, just ask. I could probably write a novel about search engines and indexing.
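On the indexing advice in the post above, here is a small sketch of the idea, under loud assumptions: it uses SQLite (through Python's built-in `sqlite3`) as a stand-in for whatever database you run, and the `pages` table, its `domain` column and the index name `idx_pages_domain` are invented for the example. It shows two things: adding an index on the column you filter by, and selecting only the columns and number of rows you actually display.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical `pages` table and fake rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE pages (
               id INTEGER PRIMARY KEY,
               domain TEXT NOT NULL,
               url TEXT NOT NULL UNIQUE,
               title TEXT,
               description TEXT,
               keywords TEXT
           )"""
    )
    # Index the column used in WHERE so lookups do not scan the whole table.
    conn.execute("CREATE INDEX idx_pages_domain ON pages (domain)")
    rows = [
        (f"site{i % 50}.example", f"http://site{i % 50}.example/page{i}",
         f"Page {i}", "description text", "some, keywords")
        for i in range(1000)
    ]
    conn.executemany(
        "INSERT INTO pages (domain, url, title, description, keywords) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def find_by_domain(conn, domain, limit=10):
    """Fetch only the columns being displayed, and only `limit` rows."""
    return conn.execute(
        "SELECT url, title FROM pages WHERE domain = ? ORDER BY id LIMIT ?",
        (domain, limit),
    ).fetchall()


def query_plan(conn, domain):
    """Return SQLite's plan text so you can check that the index is used."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT url, title FROM pages WHERE domain = ?",
        (domain,),
    ).fetchall()
    return " ".join(row[-1] for row in rows)
```

MySQL has an `EXPLAIN` statement for the same kind of check. Keyword search over titles and descriptions is a different problem from this exact-match lookup and is where a full-text index or a dedicated engine comes in.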