Deep Inside Malicious PDF
People share documents all the time, and most attacks today are client-side attacks that target applications running on the user's or employee's OS. From a single file, an attacker can compromise a large network. PDF is one of the most commonly shared file formats, and because PDFs can include active content and are passed within the enterprise and across networks, they make an attractive attack vector. In this article, we will analyze ways to catch malicious PDF files.
When we check the PDF files on our PC or laptop, we may use an antivirus scanner, but these days that might not be good enough to detect a malicious PDF that contains shellcode: the attacker usually encrypts or obfuscates the content to bypass the scanner and often targets a zero-day vulnerability in Adobe Reader, or a known vulnerability in a version that has not been updated. Figure 1 shows how PDF vulnerabilities are rising every year.
Before we start to analyze malicious PDFs, we will take a brief look at the PDF structure so we can understand how the shellcode works and where it is located.
PDF components
A PDF document contains four main parts: a one-line header, the body, the cross-reference table, and the trailer.
PDF Header
The first line of the PDF shows the PDF format version and gives you the basic information about the file; for example, “%PDF-1.4” means that the file follows version 1.4 of the PDF format.
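As a rough illustration of reading the header, the sketch below extracts the version from raw file bytes. The function name pdf_version is hypothetical, and the choice to search the first 1024 bytes (instead of requiring the header at offset 0) is an assumption made because some readers tolerate leading junk, which attackers can abuse.

```python
import re

# Header pattern: "%PDF-" followed by major.minor version digits.
HEADER_RE = re.compile(rb"%PDF-(\d+)\.(\d+)")

def pdf_version(data: bytes):
    """Return (major, minor) from the PDF header, or None if absent.

    Hypothetical helper for illustration only.
    """
    # Assumption: look in the first 1024 bytes rather than only at offset 0.
    match = HEADER_RE.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Called on the bytes of a file that starts with %PDF-1.4, this returns the tuple (1, 4); a file without a header yields None, which is itself worth a closer look.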
PDF Body
The body of the PDF file consists of the objects that compose the contents of the document. These objects include fonts, images, annotations, and text streams, and the author can also include invisible objects or elements. These objects can interact with PDF features such as animation and security features. Numeric values in the body come in two types: integers and real numbers.
The Cross-Reference Table (xref table)
The cross-reference table lists every object in the file together with its byte offset, so a reader can jump directly to any object without parsing the whole file. When a user updates the PDF, the cross-reference table is updated as well.
The Trailer
The trailer tells the reader where to find the cross-reference table and the document's root object, and the file always ends with %%EOF to mark the end of the PDF. A PDF reader starts from the trailer and follows these references to locate the rest of the document.
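To make the relationship between the trailer and the cross-reference table concrete, here is a minimal sketch that reads the offset stored after the startxref keyword at the end of a file. The name find_startxref and the 2048-byte tail window are assumptions for illustration; a robust parser would handle many more edge cases.

```python
import re

def find_startxref(data: bytes):
    """Return the byte offset recorded after the last 'startxref', or None.

    Hypothetical helper for illustration only.
    """
    # The trailer sits at the end of the file, so only the tail is examined
    # (the 2048-byte window is an arbitrary assumption).
    tail = data[-2048:]
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    if not matches:
        return None
    # An incrementally updated file can have several trailers; use the last.
    return int(matches[-1])
```

The returned number is the position in the file where the cross-reference section begins. If the bytes at that offset do not start a cross-reference section, the file is damaged or has been tampered with.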
Malicious PDF through Metasploit
Now that we have taken a tour of the PDF file format and what it contains, we will install an old version of Adobe Reader (9.4.6, or 10 through 10.1.1) that is vulnerable to the Adobe U3D Memory Corruption Vulnerability.
This exploit exists in the Metasploit Framework, so we are going to create the malicious PDF and analyze it on the Kali Linux distribution. Start by opening a terminal and typing msfconsole (Figure 2). As shown in the figure, we are going to set some Metasploit variables to be sure that everything is working fine.
*After choosing the exploit, we choose the payload that will execute on the remote target during exploitation and open a Meterpreter session.
*We set LHOST to our own IP address, which we can view by typing ifconfig in a new terminal.
*Finally, we type exploit to create the PDF file with the configuration we set before.
The file has been saved in /root/.msf4/local.
We are going to move the file to the desktop to make it easier to reference when typing its path in the terminal:
root@kali:~# cd /root/.msf4/local
root@kali:~/.msf4/local# mv msf.pdf /root/Desktop
PDFid
Now we are going to use pdfid to see which elements, objects, and JavaScript the PDF contains, and whether there is something interesting to analyze (Figure 3).
The PDF has only one page, which may be normal. There are several JavaScript objects inside, which is very suspicious. There is also an OpenAction object, which will execute this malicious JavaScript when the document is opened.
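The kind of triage described above can be approximated with a short sketch that counts suspicious names in the raw bytes of a file. This is not how pdfid itself is implemented: the keyword list and the function name count_keywords are assumptions, and unlike pdfid this sketch does not handle names obfuscated with #xx hex escapes or content hidden inside compressed streams.

```python
import re

# Names that commonly deserve a closer look (an illustrative selection).
KEYWORDS = (b"/JS", b"/JavaScript", b"/OpenAction", b"/AcroForm", b"/AA")

def count_keywords(data: bytes):
    """Count each keyword as a whole PDF name in the raw file bytes.

    Hypothetical helper for illustration only.
    """
    counts = {}
    for kw in KEYWORDS:
        # Require a non-alphanumeric character (or end of data) after the
        # keyword so that /JS does not also match a longer name.
        pattern = re.escape(kw) + rb"(?![A-Za-z0-9])"
        counts[kw.decode()] = len(re.findall(pattern, data))
    return counts
```

A non-zero count for /JavaScript or /JS together with /OpenAction or /AA is the combination to watch for, because it means script code can run without any user interaction beyond opening the file.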
So we are going to use peepdf.
Peepdf
Peepdf is a powerful Python tool for PDF analysis. It provides all the components that security researchers need for PDF analysis in one place, without having to use many separate tools. It supports encryption, object streams, shellcode emulation, and JavaScript analysis. For malicious PDFs, it shows potential vulnerabilities and suspicious elements, and it offers a powerful interactive console, PDF obfuscation (for bypassing AVs), hexadecimal/ASCII decoding, and hex search (Figure 4).
Analysis
To start the analysis, go to the directory of the PDF file and run /usr/bin/peepdf -f msf.pdf.
We use the -f option to force the tool to ignore errors and continue (Figure 5).
This is the default output, but we already see some interesting things. The first is the highlighted one: object 15 contains JavaScript code. We also have object 4, which contains two elements that trigger execution (/AcroForm and /OpenAction). The last is /U3D, which peepdf flags as a known vulnerability. We will now explore these objects in an interactive console, which we get by typing /usr/bin/peepdf -i msf.pdf (Figure 6).
The tree command shows the logical structure of the file. We start by exploring object 4 (/AcroForm) (Figure 7).
As we see in the figure, typing object 4 gives us another object to explore. So far we have not seen any important information or anything suspicious, except object 2 (the XFA array), which gives us the element <fjdklsaj fodpsaj fopjdsio> and does not seem to contain anything special.
Let’s move on to the other object (/OpenAction) (Figure 8).
Now we can see the JavaScript code that will be executed when the PDF file is opened.
The other part of the JavaScript code is lightly obfuscated, for example by writing some variables in hex, and in this code we can see heap spraying with shellcode plus some padding bytes. Attackers typically use Unicode escapes to encode their shellcode and then use the unescape function to translate the Unicode representation into binary content. Now we are sure that it is definitely a malicious PDF (Figure 9).
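To see what the unescape step produces, the following sketch converts a string of %uXXXX and %XX escapes into the bytes that would end up in memory. The function name unescape_to_bytes is hypothetical, and the sketch assumes the usual case where each character is stored as a 16-bit little-endian value; real samples may add further layers of obfuscation before this step.

```python
import re

ESCAPE_RE = re.compile(r"%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})")

def unescape_to_bytes(text: str) -> bytes:
    """Decode %uXXXX / %XX escapes and return the resulting raw bytes.

    Hypothetical helper for illustration only.
    """
    def repl(match):
        # Either group holds the hex digits of one character code.
        return chr(int(match.group(1) or match.group(2), 16))

    decoded = ESCAPE_RE.sub(repl, text)
    # Assumption: characters are stored as 16-bit little-endian code units,
    # so %u4142 becomes the byte 0x42 followed by 0x41.
    return decoded.encode("utf-16-le", "surrogatepass")
```

Running the result through a disassembler or a shellcode emulator is the usual next step. A long run of identical filler values such as %u9090 decodes to repeated 0x90 bytes, which is the x86 NOP instruction commonly used as padding in heap sprays.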
Defend
We defend our network against this type of malicious file with strong e-mail and web filters, an IPS, and application control: disable JavaScript, disable PDF rendering in browsers, and block PDF readers from accessing the file system and network resources. Overall security awareness matters as well.
Conclusion
We’ve taken a tour of the PDF file format structure and what it contains. We’ve seen how to detect a malicious PDF, how to locate suspicious objects and display the JavaScript code, and finally how to defend our network.