Best tool for PDF Forensics

nnn · September 7, 2022, 8:20pm

Hello,
Looking for a good PDF tool to analyze who authored a PDF file and what modifications were made to an e-filing acknowledgement over a span of a few days …

agungor · September 7, 2022, 8:47pm

I find PDF Stream Dumper by David Zimmer and PDF CanOpener quite useful in PDF analysis.

A powerful text editor is also a must, and 010 Editor has a binary template for the PDF format by Didier Stevens and Christian Mehlmauer, which makes it a good option for reviewing PDF structures.

PDF Stream Dumper

PDF CanOpener

010 Editor — PDF Binary Template
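On the modification question: when a PDF is edited and saved incrementally, the writer typically appends the changed objects, a new cross-reference section, a new trailer and another %%EOF marker to the end of the file, leaving the earlier revision intact in front of it. Below is a minimal Python sketch (not part of any tool above; split_revisions and dump_revisions are hypothetical helper names) that cuts the file after each %%EOF so each candidate earlier revision can be opened or compared on its own. Treat the result as a lead, not proof: linearized ("fast web view") files commonly contain two %%EOF markers without any edit, a full rewrite leaves no earlier revision behind, and an embedded file or uncompressed stream can happen to contain the bytes %%EOF.

```python
import re
from pathlib import Path
from typing import List

# "%%EOF", plus the end-of-line that usually follows it, closes one revision.
EOF_MARKER = re.compile(rb"%%EOF(?:\r\n|\r|\n)?")


def split_revisions(data: bytes) -> List[bytes]:
    """Return the prefix of the file ending at each %%EOF marker.

    The last element is the file as last saved; earlier elements are candidate
    earlier revisions left behind by incremental updates (or by linearization).
    """
    return [data[: m.end()] for m in EOF_MARKER.finditer(data)]


def dump_revisions(pdf_path: str, out_dir: str) -> int:
    """Hypothetical helper: write each candidate revision as rev_1.pdf, rev_2.pdf, ..."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    revisions = split_revisions(Path(pdf_path).read_bytes())
    for i, rev in enumerate(revisions, start=1):
        (out / f"rev_{i}.pdf").write_bytes(rev)
    return len(revisions)
```

Opening rev_1.pdf next to the final file (or comparing their ExifTool output) can show what a later save changed.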

Didier Stevens also has a few Python scripts that target PDF documents here:
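For a first triage pass, pdfid.py from Didier Stevens' PDF Tools counts PDF keywords such as obj, stream, /JS, /OpenAction and %%EOF. The sketch below only illustrates that idea with a similar keyword list and is not pdfid itself (count_keywords and KEYWORDS are hypothetical names): unlike pdfid, it does not normalize hex-escaped names such as /J#53, and it only sees bytes outside compressed streams, so anything packed into an /ObjStm is invisible to it.

```python
import re
from typing import Dict

KEYWORDS = [b"obj", b"endobj", b"stream", b"endstream", b"xref", b"trailer",
            b"startxref", b"/Page", b"/Encrypt", b"/ObjStm", b"/JS",
            b"/JavaScript", b"/AA", b"/OpenAction", b"/AcroForm",
            b"/JBIG2Decode", b"/RichMedia", b"/Launch", b"/EmbeddedFile",
            b"/XFA", b"%%EOF"]


def _pattern(kw: bytes) -> "re.Pattern[bytes]":
    # Names (/X) and %%EOF may follow any byte; bare keywords must not be glued
    # to a preceding letter, so "stream" is not counted inside "endstream".
    prefix = b"" if kw[:1] in (b"/", b"%") else rb"(?<![A-Za-z])"
    # A following letter or digit would make it a different token (/Page vs /Pages).
    return re.compile(prefix + re.escape(kw) + rb"(?![A-Za-z0-9])")


def count_keywords(data: bytes) -> Dict[str, int]:
    """Count each keyword in the raw file bytes."""
    return {kw.decode(): len(_pattern(kw).findall(data)) for kw in KEYWORDS}
```

Typical use would be count_keywords(Path("sample.pdf").read_bytes()); a %%EOF count above 1, or any /JS, /OpenAction or /Launch, is a cue to look closer.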

2024 Update:

This shell script can also be useful for automating some examination tasks on PDFs:

I hope other examiners in the Community have some additional recommendations for you.

jbalfour · September 9, 2022, 8:57am

I second PDF Stream Dumper and Didier Stevens’ work. Stream Dumper doesn’t show or deal with image masks. I’ve tried JPedal without success. Has anyone been able to extract an original JPEG image from multiple image masks in a PDF?
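On the image-mask question: in a PDF, an image's transparency can live in a separate image XObject referenced through /SMask (a soft mask) or /Mask (a stencil mask, or a color-key array), and an image drawn with /ImageMask true is itself a stencil mask. A first step is to inventory the image objects and see which ones point to which masks. The Python sketch below does that with regular expressions over the raw file bytes; it is a shortcut under stated assumptions, not a real PDF parser (list_images, save_jpegs and ImageInfo are hypothetical names). It relies on image XObjects being stream objects, which cannot sit inside object streams, so their dictionaries are visible in the raw file. It does not resolve indirect /Length values or nested arrays in /ColorSpace, and it does not decrypt encrypted files. Streams whose only filter is /DCTDecode are stored as complete JPEG data, so save_jpegs writes those out unchanged.

```python
import re
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional

# An indirect object header, then everything up to whichever keyword comes first:
# "stream" (a stream object; every image XObject is one) or "endobj".
OBJ = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\b(stream|endobj)\b", re.S)


@dataclass
class ImageInfo:
    ref: str                   # "obj gen", e.g. "12 0"
    width: Optional[str]
    height: Optional[str]
    filters: Optional[str]     # e.g. "/DCTDecode" or "[/FlateDecode /DCTDecode]"
    colorspace: Optional[str]  # a name, an indirect reference, or a flat array
    smask: Optional[str]       # soft-mask reference, if any
    mask: Optional[str]        # explicit mask reference or color-key array, if any
    stencil: bool              # /ImageMask true: this image is itself a stencil mask
    raw: bytes                 # stream bytes as stored (still encoded)


def _field(pattern: bytes, d: bytes) -> Optional[str]:
    m = re.search(pattern, d)
    return m.group(1).decode("latin-1") if m else None


def _strip_eol(b: bytes) -> bytes:
    # One EOL before "endstream" belongs to the syntax, not to the data.
    if b.endswith(b"\r\n"):
        return b[:-2]
    return b[:-1] if b.endswith((b"\n", b"\r")) else b


def list_images(data: bytes) -> List[ImageInfo]:
    images, pos = [], 0
    while (m := OBJ.search(data, pos)) is not None:
        d, pos = m.group(3), m.end()
        if m.group(4) != b"stream":
            continue
        # Data starts after the EOL that follows "stream"; its end is approximated
        # by the next "endstream" rather than /Length, which may be indirect.
        start = pos + (2 if data[pos:pos + 2] == b"\r\n" else 1)
        end = data.find(b"endstream", start)
        if end == -1:
            break
        pos = end + len(b"endstream")
        if not re.search(rb"/Subtype\s*/Image\b", d):
            continue
        images.append(ImageInfo(
            ref=f"{int(m.group(1))} {int(m.group(2))}",
            width=_field(rb"/Width\s+(\d+)", d),
            height=_field(rb"/Height\s+(\d+)", d),
            filters=_field(rb"/Filter\s*(/[A-Za-z0-9]+|\[[^\]]*\])", d),
            colorspace=_field(rb"/ColorSpace\s*(/[A-Za-z0-9]+|\d+\s+\d+\s+R|\[[^\]]*\])", d),
            smask=_field(rb"/SMask\s*(\d+\s+\d+\s+R)", d),
            mask=_field(rb"/Mask\s*(\d+\s+\d+\s+R|\[[^\]]*\])", d),
            stencil=re.search(rb"/ImageMask\s+true\b", d) is not None,
            raw=_strip_eol(data[start:end]),
        ))
    return images


def save_jpegs(images: List[ImageInfo], out_dir: str) -> List[Path]:
    """Write streams whose only filter is /DCTDecode; those bytes are a JPEG file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for img in images:
        if img.filters and img.filters.strip("[] ") == "/DCTDecode":
            path = out / f"obj_{img.ref.replace(' ', '_')}.jpg"
            path.write_bytes(img.raw)
            saved.append(path)
    return saved
```

Comparing the filters, /ColorSpace and /SMask values of the same picture across two copies of a document may hint at whether they went through different software.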

Greg · September 12, 2022, 4:35am

I haven’t had to do this sort of forensics outside a CTF. What’s the real-world use case here? What are you seeing?

gmitchell · September 14, 2022, 10:51am

Because many PDFs come in via email, I start with Metaspike FEI. It lets me see attachments (often other emails) and extract them from the EMLs to look for PDFs.
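For a quick scripted check outside FEI, Python's standard email package can walk an .eml, including messages attached inside it, and pull out anything that looks like a PDF. A minimal sketch, assuming the .eml is a standard MIME message (pdf_attachments is a hypothetical helper name):

```python
from email import policy
from email.parser import BytesParser
from typing import List, Tuple


def pdf_attachments(eml_bytes: bytes) -> List[Tuple[str, bytes]]:
    """Return (filename, content) for every PDF-looking part of an .eml message,
    including parts of messages attached inside it (message/rfc822)."""
    msg = BytesParser(policy=policy.default).parsebytes(eml_bytes)
    found = []
    # walk() is depth-first over all parts and also descends into attached messages.
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers (including attached messages) have no payload of their own
        payload = part.get_payload(decode=True) or b""  # undoes base64 / quoted-printable
        name = part.get_filename() or ""
        looks_like_pdf = (
            part.get_content_type() == "application/pdf"
            or name.lower().endswith(".pdf")
            or b"%PDF-" in payload[:1024]  # the %PDF- header may not be at byte 0
        )
        if looks_like_pdf and payload:
            found.append((name or "unnamed.pdf", payload))
    return found
```

Filenames come straight from the MIME headers, so sanitize them before using them as paths when writing the extracted files to disk.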

Then I use PdfWalker, which comes with Lenny Zeltser’s REMnux (REMnux® | SANS Institute), to pull PDFs apart and save embedded JFIFs.
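To carve JPEG/JFIF data out of a saved stream or any other blob independently of PdfWalker, searching for FF D8 and then the next FF D9 is tempting but can stop early, for example at the end of an EXIF thumbnail embedded in the main image. The sketch below (carve_jpegs and jpeg_end are hypothetical helper names) walks the JPEG segment structure instead: it skips each length-prefixed segment whole and, inside scan data, ignores stuffed FF 00 bytes and FF D0 to FF D7 restart markers. It works on raw bytes only, so a JPEG inside a Flate-compressed stream has to be decompressed first.

```python
from typing import List, Optional


def jpeg_end(data: bytes, start: int) -> Optional[int]:
    """Walk JPEG segments from an SOI (FF D8) at `start`; return the offset just
    past the matching EOI (FF D9), or None if the structure does not hold up."""
    i, n = start + 2, len(data)
    while i + 1 < n:
        marker = data[i + 1]
        if data[i] != 0xFF or marker == 0x00:
            return None                       # lost sync: not a well-formed JPEG
        if marker == 0xFF:                    # fill byte before a marker
            i += 1
        elif marker == 0xD9:                  # EOI
            return i + 2
        elif 0xD0 <= marker <= 0xD7 or marker == 0x01:
            i += 2                            # standalone markers carry no length
        else:
            if i + 4 > n:
                return None
            seg_len = int.from_bytes(data[i + 2:i + 4], "big")
            if seg_len < 2:
                return None
            i += 2 + seg_len                  # skip the whole segment, e.g. an EXIF thumbnail
            if marker == 0xDA:                # SOS: skip entropy-coded scan data
                # In scan data FF is stuffed as FF 00 and FF D0..D7 are restart
                # markers; any other FF xx starts the next real marker.
                while i + 1 < n and not (data[i] == 0xFF and data[i + 1] != 0x00
                                         and not 0xD0 <= data[i + 1] <= 0xD7):
                    i += 1
    return None


def carve_jpegs(data: bytes) -> List[bytes]:
    """Return every structurally complete JPEG found in `data`."""
    found, pos = [], 0
    while (s := data.find(b"\xff\xd8\xff", pos)) != -1:
        end = jpeg_end(data, s)
        if end is None:
            pos = s + 1                       # false start; keep searching
        else:
            found.append(data[s:end])
            pos = end                         # do not re-carve data inside this JPEG
    return found
```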

Phil Harvey’s ExifTool (https://exiftool.org/) gives useful information on the various extracted files, including a good view of the producing software, dates, etc. For PDFs, the most stable dates are CreateDate and ModifyDate. Exporting the results as CSV and transposing the rows and columns in Excel gives a human-readable summary of the metadata.
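If you would rather script the transpose than do it in Excel, ExifTool's -csv option writes one row per file with SourceFile in the first column, e.g. `exiftool -csv *.pdf > meta.csv`. A minimal Python sketch (transpose_csv is a hypothetical helper name) that flips that into one row per tag and one column per file, so CreateDate and ModifyDate for several files sit side by side:

```python
import csv
import io


def transpose_csv(text: str) -> str:
    """Swap rows and columns of CSV text (e.g. `exiftool -csv` output), so each
    tag becomes a row and each file becomes a column."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ""
    width = max(len(r) for r in rows)
    # Pad short rows so zip() does not silently drop trailing columns.
    padded = [r + [""] * (width - len(r)) for r in rows]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(zip(*padded))
    return out.getvalue()
```

Assuming the CSV was written as UTF-8, something like Path("meta_by_tag.csv").write_text(transpose_csv(Path("meta.csv").read_text(encoding="utf-8")), encoding="utf-8") produces the transposed file.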

Good hunting!

jbalfour · September 17, 2022, 6:55am

Hi. Thanks. I’ve requested a wet copy, so the need has likely passed, but out of interest: I’m seeing different color spaces in PDFs of the same ID document. I can only extract parts of the whole image via Stream Dumper. I am seeing artifacts in the pieces, but I think that is due to the image mask compression.
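On the differing color spaces: a JPEG (DCTDecode) image carries its own frame header, so comparing it across the two PDFs is a useful cross-check against the /ColorSpace entry in each PDF's image dictionary. The sketch below (jpeg_frame_info is a hypothetical name) reads the SOF segment for size, bit depth, component count and coding process, plus the transform byte of an Adobe APP14 segment if one appears first. The color_guess field is only a heuristic: 1 component is usually gray, 3 usually YCbCr (RGB when the Adobe transform is 0), and 4 usually CMYK (YCCK when the transform is 2).

```python
from typing import Optional

# SOF markers are C0-CF except C4 (DHT), C8 (reserved) and CC (DAC).
SOF_PROCESS = {0xC0: "baseline", 0xC1: "extended sequential",
               0xC2: "progressive", 0xC3: "lossless"}
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}


def jpeg_frame_info(jpg: bytes) -> Optional[dict]:
    """Read the frame header (SOF) of a JPEG, plus any Adobe APP14 transform flag."""
    if not jpg.startswith(b"\xff\xd8"):
        return None
    i, adobe_transform = 2, None
    while i + 4 <= len(jpg):
        marker = jpg[i + 1]
        if jpg[i] != 0xFF:
            return None
        if marker == 0xFF:                    # fill byte
            i += 1
            continue
        if 0xD0 <= marker <= 0xD7 or marker == 0x01:
            i += 2
            continue
        if marker in (0xD9, 0xDA):            # EOI or scan data reached without a SOF
            return None
        seg_len = int.from_bytes(jpg[i + 2:i + 4], "big")
        if seg_len < 2:
            return None
        body = jpg[i + 4:i + 2 + seg_len]
        # APP14 "Adobe": 5-byte id, version, flags0, flags1, then the transform byte.
        if marker == 0xEE and body.startswith(b"Adobe") and len(body) >= 12:
            adobe_transform = body[11]
        if marker in SOF_MARKERS:
            if len(body) < 6:
                return None
            n = body[5]
            if n == 1:
                guess = "gray"
            elif n == 3:
                guess = "RGB" if adobe_transform == 0 else "YCbCr"
            elif n == 4:
                guess = "YCCK" if adobe_transform == 2 else "CMYK"
            else:
                guess = "unknown"
            return {
                "process": SOF_PROCESS.get(marker, "other"),
                "precision": body[0],
                "height": int.from_bytes(body[1:3], "big"),
                "width": int.from_bytes(body[3:5], "big"),
                "components": n,
                "adobe_transform": adobe_transform,
                "color_guess": guess,
            }
        i += 2 + seg_len
    return None
```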

BrunoFischer · October 7, 2022, 9:40am

Hello.

Look at this tool: PDF Tools | Didier Stevens

Or this, for Microsoft documents: GitHub - decalage2/oletools: oletools - python tools to analyze MS OLE2 files (Structured Storage, Compound File Binary Format) and MS Office documents, for malware analysis, forensics and debugging.

Greets