Good point, Sean. It is common to run into X.500 and IMCEA-encapsulated addresses when dealing with Exchange. For the details, look at how the LegacyExchangeDN attribute is used in Exchange routing.
It is often not viable to perform any normalization during forensic preservation, as you would be modifying the original evidence. De-duplication and threading are therefore issues for your downstream tools to handle during eDiscovery. A few thoughts on how this could be accomplished:
In some cases, you will find that the contact associated with a message has both its X.500 address and its SMTP address available, in the PR_EMAIL_ADDRESS_W and PR_SMTP_ADDRESS_W MAPI properties, respectively. When this is the case, the processing tool can (perhaps optionally) favor the SMTP address when calculating the email hash, which normalizes the email addresses.
When both addresses are not available, my suggestion would be to feed the tool a mapping of X.500 addresses to SMTP addresses to facilitate the normalization. I am not aware of any off-the-shelf eDiscovery tools that do this at the moment.
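To make the two address suggestions above a bit more concrete, here is a rough Python sketch of how a downstream tool might pick the address it hashes. The function names, the shape of the mapping, and the choice to compare X.500 addresses case-insensitively and to lowercase SMTP addresses are all my own assumptions for illustration, not the behavior of any particular product. It works only on values already extracted from the message, so the preserved evidence is not touched.

```python
from typing import Dict, Mapping, Optional


def build_x500_map(pairs: Mapping[str, str]) -> Dict[str, str]:
    """Index a user-supplied X.500 -> SMTP mapping by upper-cased X.500 address.

    Assumption: comparing X.500 addresses case-insensitively is acceptable.
    """
    return {x500.strip().upper(): smtp.strip().lower() for x500, smtp in pairs.items()}


def address_for_hashing(
    email_address: Optional[str],
    smtp_address: Optional[str],
    x500_map: Mapping[str, str],
) -> Optional[str]:
    """Return the address a tool could feed into its de-duplication hash.

    email_address -- value taken from PR_EMAIL_ADDRESS_W (may be X.500)
    smtp_address  -- value taken from PR_SMTP_ADDRESS_W, if present
    x500_map      -- output of build_x500_map()
    """
    # Case 1: the SMTP address is stored with the message, so favor it.
    if smtp_address and smtp_address.strip():
        return smtp_address.strip().lower()

    if not email_address or not email_address.strip():
        return None

    # Case 2: only the X.500 address is available, so consult the mapping.
    key = email_address.strip().upper()
    mapped = x500_map.get(key)
    if mapped is not None:
        return mapped

    # Case 3: no mapping entry; fall back to the case-folded original value.
    return key


# Hypothetical example data.
x500_map = build_x500_map(
    {"/o=ExampleOrg/ou=First Administrative Group/cn=Recipients/cn=jdoe": "John.Doe@example.com"}
)
resolved = address_for_hashing(
    "/O=EXAMPLEORG/OU=FIRST ADMINISTRATIVE GROUP/CN=RECIPIENTS/CN=JDOE", None, x500_map
)
```

With a mapping like this, the same sender hashes identically whether a given copy of the message carries the X.500 form or the SMTP form.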
The conversation index (PR_CONVERSATION_INDEX MAPI property) is a good choice for threading Exchange emails. It not only identifies the header message, but also gives you the origination date of the header message, the position of your message in the thread, and the time differences among the messages.
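As a sketch of how a tool could use the conversation index for threading, the Python below splits the raw property value into its 22-byte header (one reserved byte, five time bytes, and a 16-byte GUID) and the 5-byte child blocks that each reply appends. The class and function names are hypothetical, and the sketch deliberately does not decode the header date or the per-reply time deltas; it only recovers a thread key, the parent's index, and the depth of the message in the thread.

```python
from dataclasses import dataclass
from typing import Optional

HEADER_LEN = 22  # 1 reserved byte + 5 time bytes + 16-byte GUID
CHILD_LEN = 5    # one block appended per reply or forward


@dataclass(frozen=True)
class ConversationPosition:
    thread_key: bytes        # header bytes, shared by every message in the thread
    guid: bytes              # 16-byte GUID portion of the header
    depth: int               # 0 for the header message, 1 for a direct reply, ...
    parent: Optional[bytes]  # conversation index of the parent, None at depth 0


def parse_conversation_index(raw: bytes) -> ConversationPosition:
    """Split a PR_CONVERSATION_INDEX value into thread key, parent, and depth."""
    if len(raw) < HEADER_LEN:
        raise ValueError("conversation index is shorter than the 22-byte header")
    tail = len(raw) - HEADER_LEN
    if tail % CHILD_LEN != 0:
        raise ValueError("trailing bytes are not a whole number of 5-byte child blocks")
    depth = tail // CHILD_LEN
    # Dropping the last child block yields the index of the message replied to.
    parent = raw[:-CHILD_LEN] if depth > 0 else None
    return ConversationPosition(
        thread_key=raw[:HEADER_LEN],
        guid=raw[6:HEADER_LEN],
        depth=depth,
        parent=parent,
    )


# Synthetic values for illustration only (not taken from a real message).
root_index = bytes([0x01]) + bytes(5) + bytes(range(16))
reply_index = root_index + bytes([0, 0, 0, 0, 1])
reply = parse_conversation_index(reply_index)
```

Grouping messages by thread_key gives you the threads, and following parent from each message back toward depth 0 reconstructs the reply tree.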
On a related note, Gmail / G Suite messages have a somewhat similar threadId attribute. FEC captures this during acquisitions and includes it in its output (Downloaded Items log). You could use threadId to thread Google emails in much the same way you would use the conversation index to thread Exchange emails.
I would be very curious to hear the experiences of others in this area as well!