NotionCommotion Posted May 18, 2023

I am working on a SQL-backed document management API where documents will often reside in one of several categories (e.g. subject, type, the project which created it). I am looking into different ways to implement search functionality, such as:

- Standard WHERE with LIKE and maybe regex.
- Full-text queries (I happen to be using PostgreSQL).
- 3rd party semantic search libraries such as https://github.com/neuml/txtai
- 3rd party machine learning libraries such as https://php-ml.readthedocs.io/en/latest/
- API calls to some 3rd party web service.
- Other?

Is there any approach which is best for most applications? Or, as I expect, does the "best" approach depend on the actual application requirements? If so, can you please share your decision criteria?

Link to comment: https://forums.phpfreaks.com/topic/316325-current-search-functionality-strategies/
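To make the first two options in the list concrete, here is a minimal sketch of how the same search term might be turned into either an ILIKE query or a PostgreSQL full-text query. The `documents` table and its `id`, `title`, and `body` columns are hypothetical, as is the `build_search_query` helper; the function only builds the SQL and parameters (in the `%s` placeholder style used by common PostgreSQL drivers) and does not connect to a database.

```python
from typing import Tuple


def build_search_query(term: str, mode: str = "like") -> Tuple[str, tuple]:
    """Return (sql, params) for a hypothetical `documents` table.

    mode="like"     -> case-insensitive substring match on the title
    mode="fulltext" -> PostgreSQL full-text match on title and body
    """
    if mode == "like":
        # Escape LIKE wildcards so user input such as "50%" matches literally.
        # Backslash is PostgreSQL's default LIKE escape character.
        escaped = (
            term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
        )
        sql = "SELECT id, title FROM documents WHERE title ILIKE %s"
        return sql, (f"%{escaped}%",)

    if mode == "fulltext":
        # to_tsvector / plainto_tsquery / @@ are built-in PostgreSQL
        # full-text search functions and operators.
        sql = (
            "SELECT id, title FROM documents "
            "WHERE to_tsvector('english', title || ' ' || coalesce(body, '')) "
            "@@ plainto_tsquery('english', %s)"
        )
        return sql, (term,)

    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    print(build_search_query("annual report", "like"))
    print(build_search_query("annual report", "fulltext"))
```

Roughly speaking, the ILIKE form finds the exact substring anywhere in the title, while the full-text form matches on normalized words (so word order and simple inflections matter less), which is one of the trade-offs to weigh against the application's requirements.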
kicken Posted May 18, 2023

Are you trying to search the contents of the documents, or just the metadata (name, category, etc)?

I've never looked into them much, but there are dedicated search servers you could look at as yet another alternative (Apache Solr being one example I've heard of).
NotionCommotion (Author) Posted May 18, 2023

28 minutes ago, kicken said:
"Are you trying to search the contents of the documents, or just the metadata (name, category, etc)? I've never looked into them much, but there are dedicated search servers you could look at as yet another alternative (Apache Solr being one example I've heard of)."

Metadata will definitely be searched, and content will "maybe" be searched for some document types, but I'm not sure yet.

Per your link, "Solr is the popular, blazing-fast, open source enterprise search platform built on Apache Lucene", and it appears that Apache Lucene is "just?" a Java library used for full-text search of documents. If I only need full-text searching, do you think a dedicated server is necessary, or should I just use PostgreSQL's? Also, I'm not sure yet whether txtai, which I referenced, offers much more than that, and I really haven't spent much time learning about full-text searching.

I'm trying to understand how the app database and searching with a dedicated server work together. It's my understanding that I send whatever I wish to search for later to Solr/Lucene/txtai, which in turn indexes it so it can later be searched. It seems like the same metadata would then be in both my SQL DB and the search engine's DB, which seems weird.
kicken Posted May 18, 2023

The extent of my searching experience is pretty much just WHERE clauses targeting specific columns, sometimes using LIKE for substring matches. I haven't really had a need for anything more complex than that yet.

I was mostly just pointing out that dedicated search servers exist and might be worth checking out. Maybe they would be useful, maybe they won't.

As far as I understand, you would have to set up some kind of sync so that if your metadata changes in your DB, you update the search index as well, so in a way you do have the data duplicated. There's a cost-benefit analysis to be made there for your needs.
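The sync described above can be sketched roughly as follows. This is not how Solr, Lucene, or txtai are actually called; `ToySearchIndex` is a hypothetical in-memory stand-in for a search server, and the `database` dict stands in for the SQL table. The point is only the shape of the flow: every write to the database is followed by an update of the index, and searches return ids that are then looked up in the database.

```python
import re
from collections import defaultdict


class ToySearchIndex:
    """Hypothetical stand-in for a search server: token -> set of doc ids."""

    def __init__(self):
        self._postings = defaultdict(set)
        self._tokens_by_doc = {}

    @staticmethod
    def _tokenize(text):
        # Lowercase and split on anything that is not a letter or digit.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def upsert(self, doc_id, text):
        # Remove stale tokens first so old metadata stops matching.
        self.delete(doc_id)
        tokens = self._tokenize(text)
        self._tokens_by_doc[doc_id] = tokens
        for token in tokens:
            self._postings[token].add(doc_id)

    def delete(self, doc_id):
        for token in self._tokens_by_doc.pop(doc_id, set()):
            self._postings[token].discard(doc_id)

    def search(self, query):
        # AND semantics: a document must contain every query token.
        result = None
        for token in self._tokenize(query):
            ids = self._postings.get(token, set())
            result = set(ids) if result is None else result & ids
        return result or set()


database = {}             # stand-in for the SQL table: id -> row
index = ToySearchIndex()  # stand-in for the external search server


def save_document(doc_id, title, category):
    """Write to the 'database', then mirror the searchable text to the index."""
    database[doc_id] = {"title": title, "category": category}
    index.upsert(doc_id, f"{title} {category}")


def find_documents(query):
    """Search the index for ids, then load the rows from the 'database'."""
    return [database[doc_id] for doc_id in sorted(index.search(query))]


if __name__ == "__main__":
    save_document(1, "Quarterly budget", "finance")
    save_document(1, "Quarterly forecast", "finance")  # metadata changed
    print(find_documents("budget"))    # the stale title no longer matches
    print(find_documents("forecast"))
```

The duplication is visible here: the title and category live in `database` and, in tokenized form, in `index`. The database stays the source of truth, and the index is a derived copy that has to be refreshed on every change, which is the cost side of the cost-benefit analysis.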