Hello everyone,

This is my first question on this board. The only thing I know about PHP is that it has a framework to do everything in the world. So here is my question.

My organization archives the past 5 years of emails (stored in Outlook .pst format). Time and again, users (and sometimes auditors) want to search old emails, so I want to implement a search system over the historical emails.

The email archive is terribly, terribly large (10TB+), so we need a very robust and scalable infrastructure that can search data of this volume.

Another requirement is around security. When I search for emails, I should be able to search only what I am allowed to see. (I should not be able to search someone else's old mail unless I belong to the auditor role.)

Please don't get angry if this is a FAQ. I am ready to RTFM, if you tell me which M to R.

  • 2 weeks later...

Accessing the data will be the harder part (I've not dealt with .pst before), but for searching and indexing check out http://www.sphinxsearch.com/. Depending on how the .pst files are made, you may have to build a program to parse them into a more usable form.

With 10TB+ of data this would need a lot of planning, but maybe use a disk-based folder structure, giving each folder an id (linked to a user id stored in a database table). Then in Sphinx you can store the "keywords" for each email ("subject - from - to"), and before a search you filter by the current user's id.

That gives you a smaller index to search (quicker), and you will also know where to look for the data (useful folder structure).
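
To make the idea a bit more concrete, here is a rough sketch of the "filter by user id before searching" step. It is plain Python with an in-memory list, not PHP and not Sphinx itself, so treat it only as an illustration of the logic. Everything in it is an assumption made up for the example: the `EmailRecord` fields, the `user_<id>` folder naming, and the `is_auditor` flag standing in for the auditor role.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class EmailRecord:
    """One parsed email; the field names are hypothetical."""
    email_id: int
    owner_id: int      # id of the user whose mailbox the message came from
    subject: str
    sender: str
    recipients: str


def folder_for(archive_root: str, owner_id: int) -> PurePosixPath:
    """Map a user id to that user's folder in the archive (assumed layout)."""
    return PurePosixPath(archive_root) / f"user_{owner_id}"


def search(records, query, user_id, is_auditor=False):
    """Return the records the user may see whose keywords contain the query."""
    needle = query.lower()
    hits = []
    for rec in records:
        # Access filter first: non-auditors only ever see their own mail.
        if not is_auditor and rec.owner_id != user_id:
            continue
        # The "keywords" are subject, from and to, as suggested above.
        keywords = f"{rec.subject} {rec.sender} {rec.recipients}".lower()
        if needle in keywords:
            hits.append(rec)
    return hits


# Sample data, invented for the example.
records = [
    EmailRecord(1, 10, "Budget 2009", "alice@example.com", "bob@example.com"),
    EmailRecord(2, 20, "Budget review", "carol@example.com", "dave@example.com"),
]

own_hits = search(records, "budget", user_id=10)
audit_hits = search(records, "budget", user_id=10, is_auditor=True)
```

In a real setup the owner id would be an attribute stored in the search index, and the filter would be applied by the search engine rather than in a loop, but the order of operations is the same: restrict to what the user may see, then match keywords.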

 

It sounds like you have an interesting project on your hands. If you need any help, drop me a PM.
