chaseman Posted January 28, 2011
OK, this is not strictly a coding help question, but there was no other place to post it since there's no general PHP discussion section. Would it be a disadvantage to use PHP for coding a search engine? I keep reading that Python (which Google uses) is faster than PHP, and Microsoft probably uses ASP.NET for bing.com. Imagine a search engine that searches only for mp3 music files. Would PHP be a disadvantage in terms of speed and efficiency? To avoid confusion: I'm talking about search engines that search the whole web, not a local site.
QuickOldCar Posted January 28, 2011
I believe they use Python as their base code along with AJAX, most likely to resolve URLs better, since curl cannot follow all JavaScript or AJAX links. To follow every link in an AJAX page you need to render the page somehow, just as a browser does. But I think they also use PHP, with MySQL or SQLite as their databases. You could look into doing it with Python and Cassandra as the database: http://cassandra.apache.org/ That's what Digg, Facebook and a few others use. Here's a fairly decent open source PHP search engine for you to try: http://www.sphider.eu/
QuickOldCar Posted January 28, 2011
You may also find this "make your own python search engine" source code and example interesting: http://www.zackgrossbart.com/hackito/search-engine-python/
chaseman Posted January 29, 2011 (Author)
Thanks for your help, mate. I'm still wondering, though, whether using PHP for the main part would put someone at a disadvantage compared with using a language like Python. What would you say?
QuickOldCar Posted January 29, 2011
Well, I use PHP, but I do it a bit differently from normal search engines and made it an index. What I think you want to do is break it up: use Python for the crawler in the background to fetch links, store them however you like, and use anything you want for the display. You can even change it later if you don't like it. I messed around a lot with this. The biggest things to worry about are being able to connect to more sites without being refused, being able to recover their links along with titles, and your database not puking. I've found that using curl helps a lot with resolving URLs and with secure connections. Once you have the URL you can do whatever you want with it; I would probably suggest just using DOM XML. You'll get 99% of the links. There are probably trillions of links out there, so it's not like you need to get every single one of them.
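A minimal sketch of the "recover their links along with titles" step, in Python since that is what the post suggests for the crawler. It only parses HTML that has already been fetched; the names `LinkTitleParser` and `extract_links` are made up for this illustration, and it uses the standard library's `html.parser` rather than curl or DOM XML.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkTitleParser(HTMLParser):
    """Hypothetical helper: collects absolute links and the title of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL and drop #fragments.
                absolute, _fragment = urldefrag(urljoin(self.base_url, href))
                # Keep only web links (skips mailto:, javascript:, ...) and avoid duplicates.
                if absolute.startswith(("http://", "https://")) and absolute not in self.links:
                    self.links.append(absolute)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_links(html, base_url):
    """Return (title, [absolute links]) for an already-downloaded HTML string."""
    parser = LinkTitleParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.links
```

Links that are only created by JavaScript will not show up here, which is the limitation mentioned earlier in the thread; fetching, politeness delays and retrying refused connections are left out on purpose.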
The Little Guy Posted January 29, 2011
From what I have heard and read, PHP and Python are not the fastest languages, but that doesn't matter when making a search engine. The most important part of a search engine is your database. You need a speedy database, such as MongoDB, Cassandra, HBase, Hypertable, or, depending on your server, queries and indexes, MySQL. You can use the Sphinx engine or a highly optimized MyISAM database. If MySQL is used right, it can search surprisingly fast.
chaseman Posted January 29, 2011 (Author)
Thanks for the info. So Python is not considered one of the fastest languages? I'm constantly reading how fast Python is and how much faster it is than PHP; I thought that's why Google uses it in the first place. As far as the database goes, it makes sense, because that's where the data gets accessed from at the end of the day. So I'm reading you guys' posts as: the speed of PHP is secondary, and there are much bigger problems to worry about, like using the right database.
dreamwest Posted January 30, 2011
I have a video database with 500 million rows and get matches in < 0.0006 secs, purely because it's an inverted index. Boolean search was slowing down to 2 secs even with full indexing. The inversion is faster because you're only searching the dictionary and already have the total count stored there, so you can also avoid counting groups of data, which saves another second or so. Other than that, PHP can compress your frontend output by 70% or more, so a 40KB page can be around 6KB by the time you serve it. Then you can flush it with @ini_set('implicit_flush',1); which helps with slow user connections and larger docs.
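A rough illustration of the inverted-index idea described above, written in Python for brevity. It is a toy in-memory sketch, not the poster's actual setup: the function names and sample data are made up. Each term maps to the list of documents containing it, so a lookup touches only the dictionary, and the match count for a term is simply the length of its list.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """docs is {doc_id: text}; returns {term: sorted list of doc_ids}."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    # Intersect starting from the shortest posting list to keep the work small.
    lists = sorted((index.get(t, []) for t in terms), key=len)
    result = set(lists[0])
    for ids in lists[1:]:
        result &= set(ids)
    return sorted(result)


def match_count(index, term):
    """The total for one term is already there: the length of its posting list."""
    return len(index.get(term.lower(), []))


# Made-up sample data for illustration only.
videos = {
    1: "Funny cat video",
    2: "Cat and dog compilation",
    3: "Dog training video",
}
index = build_inverted_index(videos)
```

With this data, `search(index, "cat video")` finds only document 1, and `match_count(index, "video")` needs no scan of the documents at all.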
QuickOldCar Posted January 30, 2011
Some information on inverted indexes: http://rosettacode.org/wiki/Inverted_Index You should look into Sphinx as well: http://sphinxsearch.com/
The Little Guy Posted January 30, 2011
I have been searching Google for inverted indexes, but can't find much. Does anyone have a PHP/MySQL example?
QuickOldCar Posted January 30, 2011
http://code.google.com/p/inverted-index/
The Little Guy Posted January 30, 2011
I have looked at that. It is a little confusing, and there is no documentation for it.
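For anyone looking for a relational version, here is one possible table layout for an inverted index. It is a sketch under assumptions, not the linked project's design: it uses Python with SQLite as a stand-in for PHP/MySQL, and the table and function names (`terms`, `postings`, `add_document`, `lookup`) are invented for this example. The `terms` table plays the role of the dictionary and keeps a per-term document count, so totals never have to be computed at query time.

```python
import re
import sqlite3

# Hypothetical schema: one row per distinct term, one row per (term, document) pair.
SCHEMA = """
CREATE TABLE terms (
    id        INTEGER PRIMARY KEY,
    term      TEXT NOT NULL UNIQUE,
    doc_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE postings (
    term_id INTEGER NOT NULL REFERENCES terms(id),
    doc_id  INTEGER NOT NULL,
    PRIMARY KEY (term_id, doc_id)
);
"""


def open_index():
    """Create an empty in-memory index (SQLite stands in for MySQL here)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_document(conn, doc_id, text):
    """Index each distinct term of the document and keep doc_count up to date."""
    for term in set(re.findall(r"[a-z0-9]+", text.lower())):
        conn.execute("INSERT OR IGNORE INTO terms (term) VALUES (?)", (term,))
        term_id = conn.execute(
            "SELECT id FROM terms WHERE term = ?", (term,)
        ).fetchone()[0]
        cur = conn.execute(
            "INSERT OR IGNORE INTO postings (term_id, doc_id) VALUES (?, ?)",
            (term_id, doc_id),
        )
        # Only bump the stored count when the posting is actually new.
        if cur.rowcount == 1:
            conn.execute(
                "UPDATE terms SET doc_count = doc_count + 1 WHERE id = ?", (term_id,)
            )
    conn.commit()


def lookup(conn, term):
    """Return (stored document count, sorted doc ids) for a single term."""
    row = conn.execute(
        "SELECT id, doc_count FROM terms WHERE term = ?", (term.lower(),)
    ).fetchone()
    if row is None:
        return 0, []
    term_id, doc_count = row
    ids = [r[0] for r in conn.execute(
        "SELECT doc_id FROM postings WHERE term_id = ? ORDER BY doc_id", (term_id,)
    )]
    return doc_count, ids
```

The same two-table layout should carry over to MySQL with `INSERT IGNORE` in place of `INSERT OR IGNORE`; ranking, stemming and multi-term queries are left out to keep the sketch short.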