We were provided PSTs from an email archiving system in which each original message is wrapped as an attachment to a parent journal email. We are looking for suggestions on how to extract the original messages or remove the parent journal emails.
I can suggest a couple of options:
The journal PSTs are ingested as they are. You then tap into the backend database that houses the ingested data and tweak the records with a script to adjust attachment ranges so that each envelope message becomes a separate, distinct message and the payload becomes its own attachment family. This should allow the eDiscovery or review tool to deduplicate the actual messages inside the envelopes. While you are at it, you could also have the script tag the envelope messages so that they can easily be hidden from view.
If you have a mixture of regular data and journal PSTs, it would be helpful to keep them separate. Otherwise, you would have to account for different possibilities when scanning the database.
I have done this successfully in the past and I would recommend this approach as it doesn’t require any initial processing and results in fewer changes to the original data. Needless to say, you would want to have a proof of concept and end-to-end testing before you go through the trouble of ingesting everything.
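To make the first option more concrete, here is a minimal sketch of the kind of adjustment script I mean. It is only an illustration under stated assumptions: the `docs` table, its columns (`doc_id`, `parent_id`, `family_id`, `is_envelope`), and the convention that a family ID equals the ID of its top-level item are all hypothetical, and a real review platform will have its own schema. It also assumes the database holds journal data only, so every top-level record is an envelope.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in for the review tool's backend (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs ("
        " doc_id INTEGER PRIMARY KEY,"
        " parent_id INTEGER,"
        " family_id INTEGER,"
        " is_envelope INTEGER DEFAULT 0)"
    )
    # 1 = journal envelope, 2 = original message (payload), 3 = payload's own attachment
    conn.executemany(
        "INSERT INTO docs (doc_id, parent_id, family_id) VALUES (?, ?, ?)",
        [(1, None, 1), (2, 1, 1), (3, 2, 1)],
    )
    return conn


def descendants(conn, root_id):
    """Return root_id plus every record nested beneath it."""
    found, queue = [], [root_id]
    while queue:
        current = queue.pop()
        found.append(current)
        children = conn.execute(
            "SELECT doc_id FROM docs WHERE parent_id = ?", (current,)
        ).fetchall()
        queue.extend(row[0] for row in children)
    return found


def split_envelopes(conn):
    """Tag each envelope and promote its payload to a separate family."""
    # Assumption: journal-only database, so every top-level record is an envelope.
    envelope_ids = [
        row[0]
        for row in conn.execute(
            "SELECT doc_id FROM docs WHERE parent_id IS NULL"
        ).fetchall()
    ]
    for envelope_id in envelope_ids:
        # Tag the envelope so it can be hidden from view later.
        conn.execute(
            "UPDATE docs SET is_envelope = 1 WHERE doc_id = ?", (envelope_id,)
        )
        payload_ids = [
            row[0]
            for row in conn.execute(
                "SELECT doc_id FROM docs WHERE parent_id = ?", (envelope_id,)
            ).fetchall()
        ]
        for payload_id in payload_ids:
            # Move the payload and everything under it into a new family.
            members = descendants(conn, payload_id)
            conn.executemany(
                "UPDATE docs SET family_id = ? WHERE doc_id = ?",
                [(payload_id, member) for member in members],
            )
            # Detach the payload from the envelope so it becomes top-level.
            conn.execute(
                "UPDATE docs SET parent_id = NULL WHERE doc_id = ?", (payload_id,)
            )
    conn.commit()
    return len(envelope_ids)
```

Running `split_envelopes(build_demo_db())` on the three demo rows leaves the envelope alone in its family and tagged, while the payload and its attachment form a second family. In practice you would run something like this against a copy of the database first.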
You could also have the incoming PSTs scanned programmatically and new PSTs created without the envelope messages. Special care needs to be taken to make sure critical metadata doesn’t change, every item is accounted for, etc. You would also need to evaluate if there is any value in the envelope messages—that is, if they need to be tied to their children so that an ingested child message could be traced back to the parent envelope message if needed.
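If you do decide the envelopes need to stay traceable, one way is to write a crosswalk while pre-processing. The sketch below is hypothetical throughout: it assumes your extraction code can hand over each envelope as a dict with an `envelope_id` and a list of raw child message bytes under `children`, and it uses a SHA-256 of those bytes purely as an illustrative fingerprint. Reading the PSTs themselves is left to whatever library you already use.

```python
import csv
import hashlib

CROSSWALK_FIELDS = ["envelope_id", "child_index", "child_sha256"]


def build_crosswalk(envelopes):
    """Return one row per extracted child, tying it back to its envelope."""
    rows = []
    for envelope in envelopes:
        for index, child_bytes in enumerate(envelope["children"]):
            rows.append(
                {
                    "envelope_id": envelope["envelope_id"],
                    "child_index": index,
                    # Fingerprint of the child as extracted (illustrative only).
                    "child_sha256": hashlib.sha256(child_bytes).hexdigest(),
                }
            )
    return rows


def write_crosswalk(rows, handle):
    """Write the crosswalk rows as CSV to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=CROSSWALK_FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```

The resulting CSV can be kept alongside the new PSTs so that an ingested child can be matched to its parent envelope later without keeping the envelopes in the review set.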
What a fast response. Thanks! We have a programmatic pre-processing solution utilizing Dmitry Streblechenko’s Redemption tool. However, given that it is custom code, we were wondering whether there was a better option, such as “ACME corp” de-journaling software. We hadn’t thought of the post-processing option you propose. It is an interesting approach that we will definitely pursue.
I am not aware of any specialized de-journaling software, but hopefully others in the Community with experience will chime in.
I recall one law firm dealing with this by using a generic IT tool designed to dump attachments from messages in a PST. If memory serves, the tool was dumping the attachments of each PST item in the PST’s original folder structure. So, the end result was folders full of MSGs that represent the child messages. There was no logging, batch processing, or special metadata handling, though. In your custom Redemption solution, you could scan the PSTs and generate logs showing before and after counts as well as any exceptions encountered.
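As a rough illustration of the before-and-after logging, here is a small sketch. It does not touch Redemption or PSTs at all. It assumes your own code can produce two plain lists: `envelopes` as `(envelope_id, expected_payload_count)` pairs from the scan, and `extracted` as `(envelope_id, child_id)` pairs from the output. Those names and shapes are my assumptions, not anything Redemption provides.

```python
from collections import Counter


def reconcile(envelopes, extracted):
    """Compare scanned envelopes against extracted children and list mismatches."""
    extracted_per_envelope = Counter(env_id for env_id, _child_id in extracted)
    exceptions = []
    for env_id, expected in envelopes:
        actual = extracted_per_envelope.get(env_id, 0)
        if actual != expected:
            # Record every envelope whose output count does not match the scan.
            exceptions.append(
                {"envelope_id": env_id, "expected": expected, "extracted": actual}
            )
    return {
        "envelopes_scanned": len(envelopes),
        "payloads_expected": sum(expected for _env_id, expected in envelopes),
        "payloads_extracted": len(extracted),
        "exceptions": exceptions,
    }
```

A clean run would show matching expected and extracted totals and an empty exceptions list. Anything else points to envelopes that need a manual look.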
Regardless of the tool you use, if you go with the pre-processing option, I would scrutinize the before (i.e., journal PST ingested as is) and after (i.e., pre-processed PST ingested) in your target review tool as well as in a forensic tool to make sure the pre-processing is not stomping on metadata.
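For the before-and-after comparison, a simple field-by-field diff of exported metadata can catch most surprises. The sketch below assumes you can export both ingestions to dicts keyed by some stable identifier, such as the Internet Message-ID. The field names in `FIELDS` are placeholders, so substitute whatever your review or forensic tool exports.

```python
# Placeholder field names; replace with the columns your tool exports.
FIELDS = ("sent", "received", "from", "to", "subject")


def diff_metadata(before, after, fields=FIELDS):
    """Return (key, field, before_value, after_value) for every discrepancy."""
    findings = []
    for key in sorted(before.keys() | after.keys()):
        if key not in after:
            # Message was in the as-is ingestion but not in the pre-processed one.
            findings.append((key, "missing_after", None, None))
            continue
        if key not in before:
            # Message appears only in the pre-processed ingestion.
            findings.append((key, "missing_before", None, None))
            continue
        for field in fields:
            old_value = before[key].get(field)
            new_value = after[key].get(field)
            if old_value != new_value:
                findings.append((key, field, old_value, new_value))
    return findings
```

An empty result means the compared fields survived pre-processing unchanged for every message present in both sets. It says nothing about fields you did not include, so choose the list deliberately.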