I am working on a SQL-backed document management API where documents will often reside in one of several categories (e.g. subject, type, the project which created it, etc.).

I am looking into different ways to implement search functionality such as...

  1. Standard WHERE with LIKE and maybe regex.
  2. Full-Text queries (I happen to be using PostgreSQL)
  3. 3rd party Semantic search libraries such as https://github.com/neuml/txtai
  4. 3rd party machine learning libraries such as https://php-ml.readthedocs.io/en/latest/
  5. API calls to some 3rd party webservice.
  6. Other?

Is there any approach which is best for most applications? Or, as I expect, does the "best" approach depend on the actual application requirements? If so, can you please share your decision criteria?

Are you trying to search the contents of the documents, or just the metadata (name, category, etc)? 

I've never looked into them much, but there are dedicated search servers you could look at as yet another alternative (Apache Solr being one example I've heard of).

Metadata will definitely be searched, and content will "maybe" be searched for some document types, but I'm not sure yet.

Per your link, "Solr is the popular, blazing-fast, open source enterprise search platform built on Apache Lucene", and it appears that Apache Lucene is "just?" a Java library used for full-text search of documents. If I only need full-text searching, do you think a dedicated server is necessary, or should I just use PostgreSQL's? Also, I'm not sure yet whether the txtai library I referenced does much more, and I really haven't spent much time learning about full-text searching.

I'm trying to understand how the app database and searching with a dedicated server work together. It's my understanding that I send whatever I wish to search later to Solr/Lucene/txtai, which in turn indexes it so it can later be searched. It seems like the same metadata would then be in both my SQL DB and the search engine's DB, which seems weird.

The extent of my searching experience is pretty much just WHERE clauses targeting specific columns, sometimes using LIKE for substring matches.  Haven't really had a need for anything more complex than that yet. 
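For reference, the column-targeted WHERE/LIKE approach can be sketched roughly as below. This is only an illustration under assumptions: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, and the `documents` table, its columns, and the `search_metadata` helper are all hypothetical names, not anything from the original poster's schema.

```python
import sqlite3

def search_metadata(conn, category=None, name_contains=None):
    """Filter documents by exact category and/or a substring of the name.

    Hypothetical helper: the table and column names are assumptions.
    """
    sql = "SELECT id, name FROM documents WHERE 1=1"
    params = []
    if category is not None:
        sql += " AND category = ?"
        params.append(category)
    if name_contains is not None:
        # Escape LIKE wildcards so user input is matched literally.
        escaped = (name_contains.replace("\\", "\\\\")
                   .replace("%", "\\%")
                   .replace("_", "\\_"))
        sql += " AND name LIKE ? ESCAPE '\\'"
        params.append(f"%{escaped}%")
    sql += " ORDER BY id"
    return conn.execute(sql, params).fetchall()

# Small in-memory demo data set (made up for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, name TEXT, category TEXT)"
)
conn.executemany(
    "INSERT INTO documents (id, name, category) VALUES (?, ?, ?)",
    [
        (1, "Pump spec rev A", "specification"),
        (2, "Pump test report", "report"),
        (3, "Valve spec", "specification"),
        (4, "100%_final", "report"),
    ],
)

by_category = search_metadata(conn, category="specification")
by_name = search_metadata(conn, name_contains="pump")
```

Bound parameters keep user input out of the SQL string, and escaping `%` and `_` keeps a search for a literal percent sign from matching every row. Note that SQLite's LIKE is case-insensitive for ASCII by default, whereas PostgreSQL would need ILIKE for the same behaviour.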

I was mostly just pointing out that there are dedicated search servers around that might be worth checking out. Maybe they would be useful, maybe they won't. As far as I understand, you would have to set up some kind of sync so that if your metadata changes in your DB you update the search index as well, so in a way you do have the data duplicated. There's a cost-benefit analysis to be made there for your needs.
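To make the duplication and sync idea concrete, here is a minimal sketch, assuming the SQL database stays the source of truth and every write is mirrored into the search index. The `TinySearchIndex` class is a toy in-memory inverted index standing in for a real search server such as Solr; it is not Solr's API. The `documents` table, `save_document`, and `find_documents` are hypothetical names, and `sqlite3` again stands in for PostgreSQL.

```python
import re
import sqlite3
from collections import defaultdict

class TinySearchIndex:
    """Toy inverted index standing in for a dedicated search server."""

    def __init__(self):
        self.postings = defaultdict(set)  # token -> ids of docs containing it
        self.doc_tokens = {}              # doc id -> tokens currently indexed

    def upsert(self, doc_id, text):
        # Remove any stale entry first so renamed documents do not linger.
        self.delete(doc_id)
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        for token in tokens:
            self.postings[token].add(doc_id)
        self.doc_tokens[doc_id] = tokens

    def delete(self, doc_id):
        for token in self.doc_tokens.pop(doc_id, set()):
            self.postings[token].discard(doc_id)

    def search(self, query):
        """Return sorted ids of documents containing every query token."""
        tokens = re.findall(r"[a-z0-9]+", query.lower())
        if not tokens:
            return []
        sets = [self.postings.get(token, set()) for token in tokens]
        return sorted(set.intersection(*(set(s) for s in sets)))

def save_document(conn, index, doc_id, name, category):
    """Write to the database first, then mirror the metadata to the index."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO documents (id, name, category) VALUES (?, ?, ?)",
            (doc_id, name, category),
        )
    index.upsert(doc_id, f"{name} {category}")

def find_documents(conn, index, query):
    """Ask the index for matching ids, then load the rows from the database."""
    ids = index.search(query)
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    return conn.execute(
        f"SELECT id, name, category FROM documents "
        f"WHERE id IN ({placeholders}) ORDER BY id",
        ids,
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, name TEXT, category TEXT)"
)
index = TinySearchIndex()
save_document(conn, index, 1, "Pump test report", "report")
save_document(conn, index, 2, "Valve spec", "specification")
results = find_documents(conn, index, "pump report")
```

The index only hands back ids and the rows themselves still come from the database, so the duplicated copy is limited to the searchable text. A real setup would also need to handle the case where the database write succeeds but the index update fails, for example by re-indexing from the database periodically.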
