anyone using lucene solr search engine server?

emehrkay · February 9, 2007

http://lucene.apache.org/solr/

looks very interesting, but it seems like they just take your sql database data and put it in xml files for this server to index and search. but a lot of people are talking about using it.

what do you guys think?

anyone using it?

effigy · February 9, 2007

...but it seems like they just take your sql database data and put it in xml files...what do you guys think?

I haven't read much about it, or used it, but I wanted to mention that "just" putting your data into XML can be a huge advantage when you bring XSLT into the picture. Its speed is impressive.

ober · February 9, 2007

Faster than a normal DBMS? I hardly see how.

effigy · February 9, 2007

It depends on what you're looking for. Databases are 2-dimensional and may require many joins just to get a fragment of information. In an XML document, after it's loaded into memory, everything is laid out in paths which make retrieval quick and powerful. A few months back I was working with a (I think) 15MB XML file. It took around 0.2 seconds to pass it through an XSLT stylesheet and give me an entirely new structure. I can verify these numbers on Monday.

emehrkay · February 10, 2007

the thing i dont understand with this approach is the duplication of your data. you can choose what to index, but what if you want the system to seach your whole db? that's essentially two db's you're running. this software must be something, digg is going to use it for their search. the waybackmachine uses it too

effigy · February 12, 2007

I can verify these numbers on Monday.

The test numbers I remembered were older and more impressive than my tests this morning. Here they are from the project's current state:

XSLT Processor: Xalan C++ version 1.1.0

Input file: 8.5MB

Stylesheet: 7KB

Result file: 4MB

Stylesheet parse time: 20 milliseconds

XML parse time: 6,230 milliseconds

Transformation time: 9,310 milliseconds

If the XML file was already loaded into memory, then you're down to a 9 second transformation time. That doesn't seem fast, but it's gutting an 8.5MB file, running calculations, looking around the relationship tree, calling recursive templates, and then outputting a very different 4MB file.

Yes, that still may not sound like a selling point, but the PHP and/or Perl I'd have to write to parse this file would be more complex to write and maintain, and would take longer to run. Also, this isn't the best example since the application you posted seems to be indexing and searching, not creating the entire book like I am.

the thing i dont understand with this approach is the duplication of your data. you can choose what to index, but what if you want the system to seach your whole db? that's essentially two db's you're running.

I think this comes down to balance. Databases are very important and powerful, but why put more stress on them if you don't need to? It depends on how static something is: if you have a page that doesn't need to be 100% real-time, why not set up a cron job to create the "dynamic" page and only hit the database that one time? If your data needs to be 100% real-time, then, yes, I don't see how doubling data can be beneficial.

Sign In

anyone using lucene solr search engine server?

Recommended Posts

emehrkay

Link to comment

Share on other sites

effigy

Link to comment

Share on other sites

ober

Link to comment

Share on other sites

effigy

Link to comment

Share on other sites

emehrkay

Link to comment

Share on other sites

effigy

Link to comment

Share on other sites

Archived

Browse

Activity

Important Information