emehrkay Posted February 9, 2007
http://lucene.apache.org/solr/ looks very interesting, but it seems like they just take your SQL database data and put it in XML files for this server to index and search. A lot of people are talking about using it, though. What do you guys think? Anyone using it?
effigy Posted February 9, 2007
"...but it seems like they just take your sql database data and put it in xml files...what do you guys think?"
I haven't read much about it or used it, but I wanted to mention that "just" putting your data into XML can be a huge advantage when you bring XSLT into the picture. XSLT's speed is impressive.
ober Posted February 9, 2007
Faster than a normal DBMS? I hardly see how.
effigy Posted February 9, 2007
It depends on what you're looking for. Database tables are two-dimensional, and it may take many joins just to get a fragment of information. Once an XML document is loaded into memory, everything is laid out in paths, which makes retrieval quick and powerful. A few months back I was working with a (I think) 15MB XML file. It took around 0.2 seconds to pass it through an XSLT stylesheet and give me an entirely new structure. I can verify these numbers on Monday.
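To make the "laid out in paths" point concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The sample document, its element names, and the `chapter_titles` helper are all made up for illustration; this is plain path lookup on an in-memory tree, not XSLT and not anything from the project described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are invented.
XML = """
<library>
  <book id="b1">
    <title>First Book</title>
    <chapter n="1"><title>Intro</title></chapter>
    <chapter n="2"><title>Details</title></chapter>
  </book>
  <book id="b2">
    <title>Second Book</title>
    <chapter n="1"><title>Overview</title></chapter>
  </book>
</library>
"""

# Parse once; after this the whole tree lives in memory.
root = ET.fromstring(XML)


def chapter_titles(tree_root, book_id):
    """Return the chapter titles of one book using a single path expression."""
    path = f"./book[@id='{book_id}']/chapter/title"
    return [node.text for node in tree_root.findall(path)]


titles = chapter_titles(root, "b1")
print(titles)  # ['Intro', 'Details']
```

In a relational schema the same lookup would probably be a join between a books table and a chapters table; here it is one path expression over the nested structure.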
emehrkay Posted February 10, 2007
The thing I don't understand with this approach is the duplication of your data. You can choose what to index, but what if you want the system to search your whole DB? That's essentially two DBs you're running. This software must be something, though: Digg is going to use it for their search, and the Wayback Machine uses it too.
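As a rough illustration of "choosing what to index", the sketch below copies only selected columns from a database row into a Solr XML update message (`<add><doc><field name="...">`). The `posts` table, its columns, and the `rows_to_solr_add` helper are hypothetical, the field names would have to match whatever the Solr schema defines, and actually sending the message to the server is left out.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_solr_add(rows, fields):
    """Build a Solr <add> update message containing only the chosen fields."""
    add = ET.Element("add")
    for row in rows:
        doc = ET.SubElement(add, "doc")
        for name in fields:
            field = ET.SubElement(doc, "field", {"name": name})
            field.text = str(row[name])
    return ET.tostring(add, encoding="unicode")


# Hypothetical table standing in for the site's real database.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by name
conn.execute(
    "CREATE TABLE posts (id INTEGER, title TEXT, body TEXT, author_email TEXT)"
)
conn.execute(
    "INSERT INTO posts VALUES (1, 'Hello', 'First post', 'a@example.com')"
)
rows = conn.execute("SELECT * FROM posts").fetchall()

# Only id, title and body are copied; author_email stays in the database.
payload = rows_to_solr_add(rows, ["id", "title", "body"])
print(payload)
```

Only the searchable text is duplicated this way; everything else stays in the database, which is still the source of truth.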
effigy Posted February 12, 2007
"I can verify these numbers on Monday."
The test numbers I remembered were older and more impressive than my tests this morning. Here they are from the project's current state:

XSLT Processor: Xalan C++ version 1.1.0
Input file: 8.5MB
Stylesheet: 7KB
Result file: 4MB
Stylesheet parse time: 20 milliseconds
XML parse time: 6,230 milliseconds
Transformation time: 9,310 milliseconds

If the XML file were already loaded into memory, you'd be down to the transformation time alone, a bit over 9 seconds. That doesn't seem fast, but it's gutting an 8.5MB file, running calculations, looking around the relationship tree, calling recursive templates, and then outputting a very different 4MB file. Yes, that still may not sound like a selling point, but the PHP and/or Perl I'd have to write to parse this file would be more complex to write and maintain, and would take longer to run. Also, this isn't the best example, since the application you posted seems to be indexing and searching, not creating an entire book like I am.

"the thing i dont understand with this approach is the duplication of your data. you can choose what to index, but what if you want the system to search your whole db? that's essentially two db's you're running."
I think this comes down to balance. Databases are very important and powerful, but why put more stress on them if you don't need to? It depends on how static something is: if you have a page that doesn't need to be 100% real-time, why not set up a cron job to create the "dynamic" page and only hit the database that one time? If your data needs to be 100% real-time, then, yes, I don't see how duplicating the data can be beneficial.
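The cron-job idea could look something like the sketch below: a script queries the database once, renders the page, and writes it out as a static file for the web server to serve. The `articles` table, the output file name, and both helper functions are assumptions for illustration, and an in-memory SQLite database stands in for the real one.

```python
import sqlite3
import tempfile
from html import escape
from pathlib import Path


def render_page(conn):
    """Query the database once and return the finished HTML as a string."""
    rows = conn.execute("SELECT title FROM articles ORDER BY id").fetchall()
    items = "\n".join(f"  <li>{escape(title)}</li>" for (title,) in rows)
    return f"<ul>\n{items}\n</ul>\n"


def publish(conn, target):
    """Write the rendered page to disk so it can be served as a static file."""
    # Write under a temporary name, then rename, so visitors never see
    # a half-written page.
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_text(render_page(conn), encoding="utf-8")
    tmp.replace(target)


# Hypothetical data set standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO articles (title) VALUES (?)",
    [("Solr & Lucene",), ("XSLT notes",)],
)

out = Path(tempfile.mkdtemp()) / "articles.html"
publish(conn, out)
print(out.read_text(encoding="utf-8"))
```

Scheduled from cron at whatever interval the page can tolerate being stale, the database gets hit once per run instead of once per page view.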