Hello,
I'm looking for a good PDF tool to determine who authored a PDF file and what modifications were made to an e-filing acknowledgement over a span of a few days …
I find PDF Stream Dumper by David Zimmer and PDF CanOpener quite useful in PDF analysis.
A powerful text editor is also a must, and 010 Editor has a binary template for the PDF format by Didier Stevens and Christian Mehlmauer, which makes it a good option for reviewing PDF structures.
010 Editor — PDF Binary Template
Didier Stevens also has a few Python scripts that target PDF documents here:
2023 Update:
As of version 2.1.11, Forensic Email Intelligence has powerful PDF analysis capabilities, including the ability to review multiple metadata streams and extract timing information from PDF components, such as annotations. Details and video walkthrough:
2024 Update:
This shell script can also be useful for automating some examination tasks on PDFs:
I hope other examiners in the Community have some additional recommendations for you.
I second Stream Dumper and Didier Stevens's work. Stream Dumper doesn't show or deal with image masks. I've tried JPedal without result. Has anyone been able to extract an original JPEG image from multiple image masks in a PDF?
I haven't had to do this sort of forensics outside a CTF. What's the real-world use case here? What are you seeing?
Because many PDFs come in via email I start with Metaspike FEI. It lets me see attachments (often other emails) and extract the attachments, looking for PDFs, from the EMLs.
Then I use PdfWalker, which comes in Lenny Zeltser's REMnux (REMnux® | SANS Institute), to pull PDFs apart and save embedded JFIFs.
Phil Harvey's ExifTool (https://exiftool.org/) gives useful info on the various extracted files and a good view of producing software, dates, etc. For PDFs, the most stable dates are CreateDate and ModifyDate. A CSV output with rows and columns transposed in Excel gives a human-readable summary of the metadata.
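If you'd rather not do the row/column swap by hand in Excel, here is a minimal Python sketch of the same idea. It assumes the input is shaped like ExifTool's CSV output (one row per file, one column per tag); the function name `transpose_csv` and the sample values are made up for illustration and do not come from a real case.

```python
import csv
import io


def transpose_csv(text: str) -> str:
    """Swap rows and columns of CSV text: one row per tag, one column per file."""
    rows = list(csv.reader(io.StringIO(text)))
    width = max((len(r) for r in rows), default=0)
    # Pad short rows so zip() does not silently drop trailing columns.
    padded = [r + [""] * (width - len(r)) for r in rows]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(zip(*padded))
    return out.getvalue()


# Hypothetical sample in the shape of ExifTool CSV output (values are invented).
sample = (
    "SourceFile,CreateDate,ModifyDate,Producer\n"
    "a.pdf,2024:01:02 10:00:00,2024:01:05 09:30:00,ExampleWriter\n"
    "b.pdf,2024:01:03 11:00:00,2024:01:03 11:00:00,ExampleWriter\n"
)

if __name__ == "__main__":
    print(transpose_csv(sample), end="")
```

The input could come from something like `exiftool -csv -r extracted_folder > metadata.csv` (the folder name is a placeholder). With one column per file, it is easier to spot at a glance where CreateDate and ModifyDate differ.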
Good hunting!
Hi. Thanks. I've requested a wet copy, so the need has likely gone, but out of interest: I'm seeing different color spaces in PDFs of the same ID document. I can only extract parts of the whole image via Stream Dumper. I am seeing artifacts in the pieces but think that is due to the image mask compression.
I know this is a rather old thread. But check out my PDFRecon.
Thanks for sharing @Rasmus_Riis_Kristens! We aim to keep this thread up-to-date and appreciate your contribution.