abhishekphp6 Posted August 15, 2009

Hello everyone,

This is my first question on this board. The only thing I know about PHP is that it has a framework to do everything in the world. So here is my question.

My organization archives the past 5 years of emails (stored in Outlook .pst format). However, time and again users (and sometimes auditors) want to search old emails, so I want to implement a search system for the historical emails.

The email archive is terribly, terribly large (10TB+), so we need a very robust and scalable infrastructure that can search data of this volume.

Another requirement is security. When I search for emails, I should be able to search only what I am allowed to see. (I should not be able to search someone else's old mail unless I belong to the auditor role.)

Please don't get angry if this is a FAQ. I am ready to RTFM, if you tell me which M to R.
markwillis82 Posted August 23, 2009

Accessing the data will be the harder part (I've not dealt with .pst before), but for searching and indexing check out http://www.sphinxsearch.com/. Depending on how the .pst files are made, you may have to build a program to parse them into a more usable form.

With 10TB+ of data, this would need a lot of planning, but maybe use a disk-based folder structure, giving each folder an id (linked to a user id stored in a database table). Then in Sphinx you can store the "keywords" for each email ("subject - from - to"), and before a search you filter by the current user's id. That gives you a smaller index to search (quicker), and you will also know where to look for the data (useful folder structure).

It sounds like you have an interesting project on your hands. If you need any help, drop me a PM.
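
To make the filter-by-user idea above a bit more concrete, here is a minimal sketch in Python. It is not Sphinx and does not use the Sphinx API; it is a toy in-memory keyword index that only illustrates the shape of the approach: keywords from "subject - from - to" are indexed per owner, a normal user searches only their own slice, and an auditor searches every slice. The class name `KeywordIndex`, the integer ids, and the `is_auditor` flag are all assumptions made up for this illustration, and nothing here addresses parsing .pst files or scaling to 10TB+.

```python
from collections import defaultdict


class KeywordIndex:
    """Toy per-owner inverted index (hypothetical, for illustration only)."""

    def __init__(self):
        # owner_id -> keyword -> set of email ids owned by that user
        self._postings = defaultdict(lambda: defaultdict(set))

    def add(self, email_id, owner_id, subject, sender, recipient):
        # Index only the "subject - from - to" keywords, as suggested above.
        text = f"{subject} {sender} {recipient}"
        for token in text.lower().split():
            self._postings[owner_id][token].add(email_id)

    def search(self, query, user_id, is_auditor=False):
        tokens = query.lower().split()
        if not tokens:
            return []
        # Restrict to the caller's own slice unless they hold the auditor role.
        owners = list(self._postings) if is_auditor else [user_id]
        hits = set()
        for owner in owners:
            postings = self._postings.get(owner, {})
            # Every query word must match within the same email.
            hits |= set.intersection(*(postings.get(t, set()) for t in tokens))
        return sorted(hits)


index = KeywordIndex()
index.add(1, 10, "Q3 budget review", "alice@example.com", "bob@example.com")
index.add(2, 20, "Budget forecast", "carol@example.com", "dave@example.com")
index.add(3, 10, "Lunch plans", "alice@example.com", "erin@example.com")

print(index.search("budget", user_id=10))                   # [1]
print(index.search("budget", user_id=20))                   # [2]
print(index.search("budget", user_id=99, is_auditor=True))  # [1, 2]
```

Keeping one slice per owner is what makes the ordinary search both smaller and safe by construction: a non-auditor query never touches another user's postings, rather than fetching everything and discarding rows afterwards.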