Email Collection From Multiple Computers

caseyl4n6 · June 13, 2022, 5:32pm

I have to come up with a plan to collect emails from many computers (maybe 30 to 40) for an onsite inspection, limited to a date range. I am not allowed to make full forensic images. Does anyone have suggestions? Could I use robocopy, a batch file, or something like that to go through each computer and copy .eml, .msg, .ost, and .pst files to an HDD, for example?

agungor · June 14, 2022, 6:49pm

There are a few key considerations when collecting local emails the way you describe. To make them concrete, let’s say you were asked to collect emails between 1/1/2019 and 12/31/2019.

Identification of Email Sources

How are you identifying which file types contain email data? If you go by file extensions such as “*.eml, *.msg, *.ost, *.pst, *.mbox”, files without an extension or with an unexpected extension can easily fall through the cracks. Ideally, you would want your process to verify file types based on the binary contents of the files. The downside is that this often takes considerably longer than simply scanning the file system for certain file extensions.
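As a rough illustration of content-based identification, here is a minimal Python sketch that checks the first bytes of a file against a few well-known signatures. The function names and return labels are made up for this example. Two caveats: the OLE compound file signature is shared by .msg files and older Office documents, so a hit only marks a candidate, and .eml files are plain text with no fixed signature, so they would need a different check (e.g., looking for RFC 5322 headers).

```python
from typing import Optional

# Well-known leading byte signatures (hypothetical labels on the right).
PFF_MAGIC = b"!BDN"                             # Outlook PST/OST containers
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")   # OLE compound file (.msg, but also .doc/.xls)
MBOX_PREFIX = b"From "                          # mbox files start with a "From " separator line


def sniff_email_type(header: bytes) -> Optional[str]:
    """Classify a file from its first bytes; None means no known signature."""
    if header.startswith(PFF_MAGIC):
        return "pst/ost"
    if header.startswith(OLE_MAGIC):
        return "ole-compound"   # candidate .msg, needs further inspection
    if header.startswith(MBOX_PREFIX):
        return "mbox"           # heuristic only; plain text can start this way too
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read only the first 16 bytes so the scan stays cheap per file."""
    with open(path, "rb") as fh:
        return sniff_email_type(fh.read(16))
```

Reading a handful of bytes per file is slower than a pure extension scan because every file must be opened, which is the trade-off described above.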

Date Filtering Email Containers

Let’s say you are running a date filter at the file system level to identify files with certain extensions that fall into your 1/1/2019 - 12/31/2019 date range based on their file system creation and last modification timestamps. Something along the lines of:

(Created >= 1/1/2019 AND Created <= 12/31/2019) OR (Modified >= 1/1/2019 AND Modified <= 12/31/2019)

What happens in this scenario?

Path	File System Date Created	File System Date Last Modified
X:\Backups\RE_Unusual_Test_Data.eml	4/13/2019	4/13/2019
X:\Exported_Messages\RE_Daily_Summary.msg	6/21/2019	8/17/2019
X:\Email_Data\jdoe_example.com.ost	5/1/2018	6/13/2022

The email container, jdoe_example.com.ost, has neither a file system creation timestamp nor a last modification timestamp within the target date range. But should it be excluded from the acquisition? Since it was created in 2018 and last modified in 2022, it could very well contain emails within the target date range.
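To make the gap concrete, the sketch below runs the filter above against the three sample files, then compares it with one possible alternative: keeping any file whose created-to-modified lifespan overlaps the target range. The function names are hypothetical, and the overlap test is only a heuristic, since file system timestamps can be reset when files are copied or restored.

```python
from datetime import date

START, END = date(2019, 1, 1), date(2019, 12, 31)

# (path, file system created, file system last modified) from the table above
FILES = [
    (r"X:\Backups\RE_Unusual_Test_Data.eml", date(2019, 4, 13), date(2019, 4, 13)),
    (r"X:\Exported_Messages\RE_Daily_Summary.msg", date(2019, 6, 21), date(2019, 8, 17)),
    (r"X:\Email_Data\jdoe_example.com.ost", date(2018, 5, 1), date(2022, 6, 13)),
]


def in_range(d: date) -> bool:
    return START <= d <= END


def naive_match(created: date, modified: date) -> bool:
    """The filter as written: either timestamp falls inside the range."""
    return in_range(created) or in_range(modified)


def lifespan_overlaps(created: date, modified: date) -> bool:
    """Alternative: the file existed at some point during the range."""
    earliest, latest = min(created, modified), max(created, modified)
    return earliest <= END and latest >= START


naive_hits = [p for p, c, m in FILES if naive_match(c, m)]
overlap_hits = [p for p, c, m in FILES if lifespan_overlaps(c, m)]

print("naive filter:  ", naive_hits)
print("overlap filter:", overlap_hits)
```

The naive filter returns only the .eml and .msg files, while the overlap filter also returns the .ost container.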

Date Filtering Individual Emails

Similarly, it is essential to consider how the date filter should apply to an individual email.

Which timestamps should be taken into consideration? (e.g., internal creation date, internal last modification date, file system MAC times, origination date, delivery date, attachment timestamps, any other timestamps)
Do you consider only the top-level email’s timestamps, or those of child items as well?
For instance:

Sample_Email.eml (All dates in 2022)
  |-> Attached_Email.eml (All dates in 2021)
       |-> PDF_Attachment_of_Attached_Email.pdf (Last modification date in 2019)

Does the above email family fall into the 1/1/2019 - 12/31/2019 date range because it contains an attachment whose last modification timestamp falls within the target date range?
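The answer depends on the policy you choose, which a small model can make explicit. The following sketch represents the family above as a tree and compares two hypothetical policies: matching on the top-level item only versus matching when any family member has a timestamp in range. The class, the function names, and the exact dates are assumptions for illustration; only the years come from the example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Iterator, List


@dataclass
class Item:
    name: str
    dates: List[date]                      # whichever timestamps you decide to honor
    children: List["Item"] = field(default_factory=list)


def walk(item: Item) -> Iterator[Item]:
    """Yield the item and all of its descendants, depth first."""
    yield item
    for child in item.children:
        yield from walk(child)


def in_range(item: Item, start: date, end: date) -> bool:
    return any(start <= d <= end for d in item.dates)


def top_level_match(root: Item, start: date, end: date) -> bool:
    """Policy A: only the parent email's own timestamps count."""
    return in_range(root, start, end)


def any_member_match(root: Item, start: date, end: date) -> bool:
    """Policy B: the whole family is responsive if any member is in range."""
    return any(in_range(i, start, end) for i in walk(root))


# Illustrative dates; the source only states the years.
pdf = Item("PDF_Attachment_of_Attached_Email.pdf", [date(2019, 5, 20)])
attached = Item("Attached_Email.eml", [date(2021, 7, 15)], [pdf])
family = Item("Sample_Email.eml", [date(2022, 3, 1)], [attached])

START, END = date(2019, 1, 1), date(2019, 12, 31)
print(top_level_match(family, START, END))   # policy A result
print(any_member_match(family, START, END))  # policy B result
```

Under policy A the family is excluded; under policy B it is included solely because of the PDF's 2019 modification date.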

File System Permissions

Are you targeting a file system (e.g., NTFS) where file system permissions can potentially prevent you from accessing all email data within all of the user profiles on each endpoint during a live acquisition? If so, you would want to find a mechanism to bypass such restrictions.

Level of Email Collection

When you identify an email container as potentially responsive, do you acquire it as a whole, or do you collect the specific emails within it that fall inside your target date range?
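For a sense of what message-level collection involves, here is a minimal sketch using Python's standard mailbox module to copy only in-range messages out of an mbox container. It assumes the filter keys on the Date header alone and treats missing timezones as UTC, both of which are policy choices rather than givens. The function name is hypothetical, and proprietary containers such as PST/OST would need a dedicated parser instead.

```python
import mailbox
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def extract_in_range(src_path: str, dst_path: str,
                     start: datetime, end: datetime) -> int:
    """Copy messages whose Date header falls in [start, end] to a new mbox.

    start and end must be timezone-aware. Returns the number copied.
    """
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    copied = 0
    try:
        for msg in src:
            raw = msg["Date"]
            if raw is None:
                continue  # undated messages need their own policy decision
            try:
                sent = parsedate_to_datetime(raw)
            except (TypeError, ValueError):
                continue  # unparseable date; log and review in practice
            if sent.tzinfo is None:
                sent = sent.replace(tzinfo=timezone.utc)  # assumption: treat as UTC
            if start <= sent <= end:
                dst.add(msg)
                copied += 1
        dst.flush()
    finally:
        src.close()
        dst.close()
    return copied
```

Extracting individual messages produces a smaller collection but alters the evidence relative to the original container, so it is worth recording which approach was used and why.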

TL;DR

In my view, addressing these considerations ahead of time is much more productive than running into them after your preservation efforts (e.g., during forensic examination, document review, or ESI productions).

Once the preservation requirements are nailed down, you can look into potential tools that would fit your proposed workflow. If some of this detail is not needed, perhaps an agent-based acquisition over the network could help you cut to the chase very quickly. For instance, you could triage endpoints with Velociraptor, which also supports identifying files via their binary signatures using the magic plugin.

On the other hand, if all of the above (and perhaps more) are applicable, your preservation efforts may require some individual attention at the endpoint level and perhaps some pre-acquisition parsing/processing.