The above-mentioned screencasts teach the viewer how to use his PDF tools. In a nutshell:
- It is important to note that objects can be compressed and encoded using filters (such as /FlateDecode). This technique can be used to obfuscate the contents of a malicious file and hide them from antivirus engines. pdf-parser.py can undo these filters through specific command-line flags.
- Another interesting feature is the normalization of names: the PDF standard allows the characters of a name to be written as their hexadecimal equivalent (for example, /JavaScript can also be written as /J#61vaScript). This trick can also help evade antivirus engines that do not understand the PDF language.
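To make the filter idea concrete, here is a minimal Python sketch of undoing the most common filter, /FlateDecode, which is plain zlib compression. This is not pdf-parser.py's code: the function name `flate_decoded_streams` is hypothetical, and the sketch assumes a stream dictionary without nested dictionaries, /FlateDecode as the only filter (no chains, no /DecodeParms predictors) and a direct integer /Length. Anything else is skipped.

```python
import re
import zlib

# "<< ... >> stream EOL"; [^<>]* deliberately rejects nested dictionaries (simplification).
STREAM_HEADER = re.compile(rb"<<([^<>]*)>>\s*stream\r?\n")
# A direct "/Length N"; an indirect "/Length N 0 R" does not match.
LENGTH = re.compile(rb"/Length\s+(\d+)\s*(?=/|$)")


def flate_decoded_streams(pdf_bytes):
    """Return the decompressed bytes of every /FlateDecode stream (hypothetical helper)."""
    decoded = []
    for header in STREAM_HEADER.finditer(pdf_bytes):
        dictionary = header.group(1)
        length = LENGTH.search(dictionary)
        if b"/FlateDecode" not in dictionary or length is None:
            continue  # other filters or an indirect /Length are out of scope here
        start = header.end()  # stream data begins right after the EOL following "stream"
        raw = pdf_bytes[start:start + int(length.group(1))]
        decoded.append(zlib.decompress(raw))  # /FlateDecode data is a zlib stream
    return decoded


# Usage: a tiny hand-built object whose JavaScript is hidden by compression.
payload = b"app.alert('hidden');"
body = zlib.compress(payload)
pdf = (b"%PDF-1.4\n1 0 obj\n<< /Length " + str(len(body)).encode()
       + b" /Filter /FlateDecode >>\nstream\n" + body + b"\nendstream\nendobj\n")
print(flate_decoded_streams(pdf))
```

Slicing by /Length rather than searching for `endstream` avoids cutting the data short if the compressed bytes happen to end with a carriage return.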
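Name normalization can be illustrated with a minimal Python sketch that replaces each `#xx` escape inside a name token with the character it encodes. The helper names `normalize_name` and `normalize_names` are hypothetical, and the tokenizer is deliberately simplified: it assumes text input and does not tell real names apart from slashes that appear inside string literals or comments.

```python
import re

HEX_ESCAPE = re.compile(r"#([0-9A-Fa-f]{2})")
# A name is "/" followed by regular characters (no whitespace or PDF delimiters).
NAME_TOKEN = re.compile(r"/[^\s/<>\[\]()%{}]+")


def normalize_name(name):
    """Decode every #xx escape in a single PDF name, e.g. /J#61vaScript -> /JavaScript."""
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name)


def normalize_names(text):
    """Apply normalize_name to every name token in a chunk of PDF text."""
    return NAME_TOKEN.sub(lambda m: normalize_name(m.group(0)), text)


obfuscated = "<< /S /J#61vaScript /JS (app.alert(1)) >>"
print("/JavaScript" in obfuscated)                   # a naive string match misses it
print("/JavaScript" in normalize_names(obfuscated))  # found after normalization
```

With such a normalization step in front of it, a scanner looking for the literal /JavaScript would also catch the escaped variants.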
I have also found this old post that Didier wrote in 2008 explaining how a PDF file is structured. If I am not mistaken, the example used in the first exercise is a simplified version of the one in that blog post.