OK, this is not strictly a coding help question, but there was no other place to post it since there's no general PHP discussion section.

I'm wondering: would it be a disadvantage to use PHP for coding a search engine?

I keep reading that Python (which Google uses) is faster than PHP, and Microsoft probably uses ASP.NET for bing.com.

Imagine a search engine that searches only for MP3 music files. Would PHP be a disadvantage in terms of speed and efficiency?

To avoid confusion: I'm talking about search engines that search the whole web, not a local area.

https://forums.phpfreaks.com/topic/225977-search-engine-in-php/

I believe they use Python as their base code along with AJAX, most likely to resolve URLs better, since curl cannot follow all JavaScript or AJAX links.

To follow all the links in AJAX you need to render the page somehow, just as a browser does.

But I think they also use PHP, with MySQL or SQLite as their databases.

You could look into doing it with Python and Cassandra as the database:
http://cassandra.apache.org/

That's what Digg, Facebook and a few others use.

Here's a fairly decent open source PHP search engine for you to try:
http://www.sphider.eu/

Well, I use PHP, but I do it a bit differently than normal search engines and made it an index.

What I think you want to do is break it up: use Python for the crawler in the background to fetch links, store them however you like, and use anything you want for the display. You can even change it later if you don't like it.

I messed around a lot with this. The biggest things to worry about are being able to connect to more sites without being refused, being able to recover their links along with titles, and your database not puking.

I've found that using curl can help a lot for resolving the URLs and handling secure connections.

Once you have the URL you can do whatever you want with it. I would probably suggest just using DOM XML. You'll get 99% of the links. There are probably multi-trillions of links out there, so it's not like you need to get every single one of them.
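As a rough illustration of the "recover their links along with titles" step, here is a minimal Python sketch that uses only the standard library's HTML parser rather than DOM XML. The class name, function name, sample markup and example.* hosts are all made up for illustration, and fetching the page (with curl or anything else) is left out entirely.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkTitleParser(HTMLParser):
    """Collects absolute http(s) links and the page title from one HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative URLs against the page URL and drop "#fragment".
                absolute, _fragment = urldefrag(urljoin(self.base_url, href))
                # Skip javascript:, mailto: and similar links, and skip duplicates.
                if absolute.startswith(("http://", "https://")) and absolute not in self.links:
                    self.links.append(absolute)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_links(html, base_url):
    """Return (title, links) for an already downloaded page."""
    parser = LinkTitleParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.links


# Hypothetical page content, standing in for whatever the crawler downloaded.
SAMPLE = """<html><head><title> Free MP3s </title></head><body>
<a href="/songs/a.mp3">A</a>
<a href="http://other.example/b.mp3#top">B</a>
<a href="javascript:void(0)">JS only</a>
<a href="/songs/a.mp3">A again</a>
</body></html>"""

title, links = extract_links(SAMPLE, "http://music.example/index.html")
print(title)
print(links)
```

Links that only exist after JavaScript runs will not show up here, which matches the point made earlier in the thread about needing to render the page to follow AJAX links.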

From what I have heard and read, PHP and Python are not the fastest languages, but that doesn't matter much when making a search engine. The most important part of a search engine is your database. You need a speedy one, such as MongoDB, Cassandra, HBase or Hypertable, or, depending on your server, queries and indexes, MySQL. You can use the Sphinx engine or a highly optimized MyISAM database. If MySQL is used right it can search surprisingly fast.

Thanks for the info. So Python is not considered one of the fastest languages? I'm constantly reading how fast Python is and how much faster it is than PHP. I thought that's why Google uses it in the first place.

As far as the database goes, it makes sense, because that's where the data gets accessed from at the end of the day.

So I'm taking your posts to mean that the speed of PHP is secondary, and there are much bigger problems to worry about, like using the right database.

I have a video database that has 500 million rows, and I get matches in < 0.0006 secs purely because it is inverted. Boolean search was slowing down to 2 secs even with full indexing.

 

The reason the inversion works faster is that you're only searching the dictionary and already have the count total there. That way you can also avoid counting groups of data, which saves another second or so.
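To make the idea concrete, here is a small in-memory Python sketch of an inverted index. It is only one reading of "inverted": the poster's real schema is not shown, and the function names and sample rows below are invented for illustration. Each word in the dictionary maps to the set of row ids containing it, so a lookup touches the dictionary instead of scanning rows, and the match count for a word is just the size of its set.

```python
from collections import defaultdict


def build_inverted_index(rows):
    """Map each lowercase word to the set of row ids whose title contains it."""
    index = defaultdict(set)
    for row_id, title in rows.items():
        for word in title.lower().split():
            index[word].add(row_id)
    return index


def search(index, query):
    """Return ids of rows containing every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Intersect the smallest posting sets first to keep intermediate results small.
    postings = sorted((index.get(w, set()) for w in words), key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return result


# Hypothetical rows standing in for the video table.
rows = {
    1: "funny cat video",
    2: "cat plays piano",
    3: "dog video",
    4: "funny dog and cat",
}

index = build_inverted_index(rows)
print(sorted(search(index, "funny cat")))  # ids of rows with both words
print(len(index["cat"]))                   # count comes straight from the dictionary
```

In a real database the dictionary and posting lists would live in indexed tables, with the per-word count stored alongside each word, but the lookup pattern is the same.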

 

Other than that, PHP can compress your front-end output by 70% or more, so a 40KB page will be around 6KB by the time you serve it. Then you can flush it with @ini_set('implicit_flush',1); which helps with slow user connection speeds and larger docs.
