How to Reverse the Code
Revealing secrets is always an appealing topic for any audience, but reverse engineering is also a critical skill for programmers. Very few information security professionals, incident response analysts and vulnerability researchers have the ability to reverse binaries efficiently. Those who can will find themselves near the top of their professional field (Infosec Institute).
It is like finding a needle on a dark night. Not everyone can be good at decompiling or reversing code. This article gives a roadmap for reversing code with tools, but bear in mind that reverse engineering requires skills and techniques beyond the tools themselves.
Software reverse engineering means different things to different people, and how you reverse a piece of software depends on the software itself. It can be defined as unpacking the packed, disassembling the assembled or decompiling the compiled piece of code that we call software. Depending on the motive, some people also call it auditing the binary or malware analysis.
Approach: Different Reversing Approaches.
There are many different approaches for reversing, and choosing the right one depends on the target
program, the platform on which it runs and on which it was developed, and what kind of information you’re looking to extract. Generally speaking, there are two fundamental reversing methodologies: offline analysis and live analysis.
Offline Code Analysis (Dead-Listing)
Offline analysis of code means that you take a binary executable and use a disassembler or a decompiler to convert it into a human-readable form.
Reversing is then performed by manually reading and analysing parts of that output.
Offline code analysis is a powerful approach because it provides a good outline of the program and makes it easy to search for specific functions that are of interest.
The downside of offline code analysis is usually that a better understanding of the code is required
(compared to live analysis) because you can’t see the data that the program deals with and how it flows. You must guess what type of data the code deals with and how it flows based on the code. Offline analysis is typically a more advanced approach to reversing.
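A very small taste of offline analysis is pulling readable strings out of a binary before opening it in a disassembler. The sketch below is only an illustration: the function name extract_strings, the four-character minimum and the sample bytes are assumptions made for this example, not part of any tool mentioned in this article.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    # 0x20-0x7e is the printable ASCII range (space through tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Hypothetical bytes standing in for the contents of an executable file.
sample = b"\x00\x01MZ\x90\x00GetProcAddress\x00\xff\xfeLoadLibraryA\x00ab\x00"

for text in extract_strings(sample):
    print(text)
```

For a real file you would pass the result of open(path, "rb").read() instead of the sample. Imported API names and error messages recovered this way often hint at which functions are worth reading first.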
There are some cases (particularly cracking-related) where offline code analysis is not possible. This typically happens when programs are “packed”, so that the code is encrypted or compressed and is only unpacked at runtime. In such cases only live code analysis is possible.
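A common heuristic for guessing whether code is packed is to measure its byte entropy: compressed or encrypted data tends to sit close to 8 bits per byte, while ordinary code and text sit noticeably lower. The following is a minimal sketch of that idea. The function names and the 7.0 threshold are assumptions chosen for illustration, and a high score is only a hint, never proof.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def looks_packed(data: bytes, threshold: float = 7.0) -> bool:
    """Heuristic only: the threshold is an assumed cut-off, not a standard."""
    return shannon_entropy(data) >= threshold


uniform = bytes(range(256)) * 4   # every byte value equally likely
repetitive = b"A" * 1024          # a single repeated byte

print(looks_packed(uniform), looks_packed(repetitive))
```

In practice the measurement is usually taken per section of the executable rather than over the whole file, since a packed program typically has a small plain loader next to a large high-entropy blob.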
Live Code Analysis
Live analysis involves the same conversion of code into a human-readable form, but instead of just reading the converted code statically, you run it in a debugger and observe its behaviour on a live system.
This provides far more information because you can observe the program’s internal data and how it affects the flow of the code. You can see what individual variables contain and what happens when the program reads or modifies that data.
Generally, it is said that live analysis is the better approach for beginners because it provides a lot more data to work with. The section on “Need for Tools” discusses tools that can be used for live code analysis.
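The idea of watching data while code runs can be illustrated without a native debugger. The sketch below is only an analogy: it uses Python's sys.settrace to record the local variables of a small Python function line by line, much as a debugger lets you inspect registers and memory while single-stepping. The checksum function and the tracer are hypothetical and exist only for this example.

```python
import sys


def checksum(values):
    """Toy function standing in for the code under analysis."""
    total = 0
    for v in values:
        total = (total * 31 + v) % 997
    return total


observed = []  # (line number, copy of local variables) for each executed line


def tracer(frame, event, arg):
    # Only follow frames that belong to the function we care about.
    if frame.f_code.co_name != "checksum":
        return None
    if event == "line":
        observed.append((frame.f_lineno, dict(frame.f_locals)))
    return tracer


previous = sys.gettrace()
sys.settrace(tracer)
try:
    result = checksum([3, 5, 8])
finally:
    sys.settrace(previous)  # always restore whatever tracer was active before

print(result)
for lineno, local_vars in observed:
    print(lineno, local_vars)
```

Each recorded snapshot shows how total changes from one step to the next, which is exactly the kind of information that has to be inferred by reading alone in offline analysis.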
Need for Tools
Which tool to select depends on the piece of software you’re trying to reverse. There are many tools available on the internet, but the key tools are IDA Pro and OllyDbg. IDA Pro is a wonderful tool with a large number of features; it can be used as a debugger as well as a disassembler.
OllyDbg, on the other hand, is an assembler-level analysing debugger for Microsoft® Windows®. Its emphasis on binary code analysis makes it particularly useful in cases where the source is unavailable.
Highlights of IDA Pro Functionalities
In my opinion IDA Pro is the most powerful and most widely used tool in reverse engineering. Its features are too numerous to list here, so I will highlight only the key one:
Adding Dynamic Analysis to IDA
In addition to being a disassembler, IDA is also a powerful and versatile debugger. It supports multiple debugging targets and can handle remote applications via a “remote debugging server”.
Powerful Cross-platform Debugging:
• Instant debugging, no need to wait for the analysis to be complete to start a debug session.
• Easy connection to both local and remote processes.
• Support for 64-bit systems and new connection possibilities.
Highlights of OllyDbg Functionalities
• It debugs multithreaded applications.
• Attaches to running programs.
• Configurable disassembler supports both MASM and IDEAL formats.
• MMX, 3DNow! and SSE data types and instructions, including Athlon extensions.
• It recognizes complex code constructs, like call to jump to procedure.
• Decodes calls to more than 1900 standard API and 400 C functions.
High Level Reverse Engineering Methodology
According to Information Risk Management PLC, high-level reverse engineering can be divided into three quick steps. This methodology is the culmination of existing tools and techniques within the IT security research community, and it presents ways to identify process operation at a higher level of abstraction than traditional binary reversing.
In this methodological approach, attention is on the application’s DLLs and the functions they implement. Following this approach, the researcher is free to explore and take any further steps as desired.
When analysing this way, the researcher can focus attention on functions that appear more “interesting” from an information security point of view.
A Practical Example
A practical example of working with this methodology is explained below.
• Functionality Explored: Microsoft Fingerprint Reader (manufactured by Digital Persona)
• Tools Required: Universal Hooker (uhooker by Core Security Technologies), Interactive Disassembler (IDA) and the OllyDbg debugger.
It is assumed that the reader is familiar with these tools; further information on how to use them can be obtained from the vendors’ websites. IDA and OllyDbg were introduced above. Uhooker is a tool for intercepting the execution of programs. It enables the user to intercept calls to API functions inside a DLL, and also to arbitrary addresses within the executable file in memory. Uhooker builds on the idea that the function handling the hook is the one with knowledge about the parameter types of the function it is hooking. It is implemented as an OllyDbg plug-in, which takes care of function hooking using software breakpoints.
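The hooking idea can be sketched in a few lines of plain Python. This is not Uhooker's API and it does not use software breakpoints; it only illustrates the principle that a hook handler sits in front of a target function and interprets the parameters because it knows their layout. All names here (hook, log_enroll, enroll_fingerprint) are hypothetical.

```python
import functools


def hook(handler):
    """Decorator factory: run handler(name, args, kwargs) before the target."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            handler(func.__name__, args, kwargs)   # intercept the call first
            return func(*args, **kwargs)           # then let it proceed
        return wrapper
    return decorate


calls = []  # record of every intercepted call


def log_enroll(name, args, kwargs):
    # The handler knows the hooked function takes (user, template),
    # so it can unpack and interpret the arguments itself.
    user, template = args
    calls.append((name, user, len(template)))


@hook(log_enroll)
def enroll_fingerprint(user, template):
    """Hypothetical target function standing in for a DLL export."""
    return f"{user}:{len(template)}"


print(enroll_fingerprint("alice", b"\x01\x02\x03"))
print(calls)
```

The target function behaves exactly as before from the caller's point of view, while the handler quietly builds a log of who called it and with what data.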
Phase 1: Identify Relevant Components
This first phase demands an investigation of the core components of the target, in this case the Microsoft Fingerprint Reader. A number of methods can be applied to identify its core components at this level. The obvious starting point is the device drivers that are used. On Windows, the operating system itself provides a good deal of information on device drivers and their location on the system; it is only a matter of knowing where to look, as shown in Figure 5.
Here we can identify the different DLLs and device drivers that are used to control the device. This serves as a good starting point for our high-level understanding of the device and the system operation.
Typically, the next step is to examine how the system interacts with the underlying operating system. Again, a number of tools exist for this purpose. Well-known Sysinternals tools such as Regmon, Filemon and Process Explorer provide a great deal of scope for exploring process interaction with the registry, the file system and other processes respectively. Knowledge of DLL mapping, highlighted at the beginning, is essential here (refer to 003 – DLL Mapping).
Note
Findings from this step should be documented by the researcher as they will form the basis of later phases. In the above example the following table presents some of the findings (Table 1).
The minor information leakages in the filenames can be very useful for identifying the functionality of the system, and in this case DPHost.exe looks like the core process. We proceed by attaching the debugger to this process. OllyDbg’s Executable Modules window lists all executable modules currently loaded by the debugged process. Figure 6 is an example of this.
Phase 2: Identifying Relevant Component Functions
This phase analyses the components identified in the previous phase to dig out function-level information. We will again need the help of various tools. Here, we are interested in identifying the named and exported functions and the virtual memory addresses for the specified DLL files. DLL Export View can be used for this, as presented in Figure 7.
Phase 3: High Level Functional Analysis
This is the high-level analysis of the function code, which you should be able to obtain in the form of assembly language. In my view OllyDbg is the best tool for this. It is entirely GUI-driven, and a single click can put the disassembled machine code in front of you. However, you must be experienced with assembly language to make use of it.
A quick snapshot of the functional analysis, taken from OllyDbg, is presented in Figure 8.
Next Steps
You can extend your study further to parameter analysis of functions, variable analysis, and then input validation and boundary checks. To do so, however, you should be proficient in crash analysis (see 005 – Crash Analysis). This analysis forms the basis for vulnerability analysis, which results in the identification of loopholes in the software code.
Conclusion
Reverse engineering is a critical skill, and this article just highlights the steps, approach and a high-level methodology of how to kick off reverse engineering of the software code. Remember that all code was created by a brain, and only a brain can decode it; tools are the hands on the typewriter.