
Hacking - Best of Reverse Engineering - Part 22

Next Generation of Automated Malware Analysis and Detection

In the last ten years, malicious software – malware – has become increasingly sophisticated,
both in terms of how it is used and what it can do. This rapid evolution of malware is
essentially a cyber “arms race” run by organizations with geopolitical agendas and profit
motives. The resulting losses for victims have run to billions of dollars.

The global move to digitize personal and sensitive information, as well as to computerize and interconnect critical infrastructure, has far outpaced the capabilities of the security measures that have been put into place. As a result, cyber criminals can act with near impunity as they break into networks to steal data and hijack resources. It is difficult to stop their criminal malware and nearly impossible to track them down after an attack has been perpetrated.

What we see is that today’s network defenses are aggressively evaded by malware that is even moderately advanced. Why is this? In order to answer this question, we first have to define advanced malware. The table below describes four key characteristics to explore in classifying malware.

[Table 1. Four key characteristics used to classify malware]

Based on these characteristics, we can now profile specific malware. The following chart illustrates the characteristics that separate today’s advanced malware from conventional malware (Figure 1).

If we look at an example like Operation Aurora, we see stealthy malware attacking a previously unknown vulnerability in Internet Explorer. Further, the criminals behind Aurora targeted a well-defined set of organizations and had a clear goal: the theft of email archives and other information. When it comes to the definitions of advanced malware, Aurora clearly meets all the criteria.

The scary part is that Aurora is not the most advanced example of today’s malware. Stuxnet and Zeus
showcase the continued refinement of malware tactics, leveraging multiple zero-day vulnerabilities and evolving over time.

For many organizations, IT security is made up of layers of firewalls, intrusion prevention systems (IPS) and antivirus software, deployed both in network gateways and on desktops. Today, there are many variations of these technologies, including cloud-based alternatives. So why do today’s defenses fail when confronted with advanced malware, zero-day, and targeted APT attacks? The short answer is that they rely on insufficient malware analysis methods.

Automated malware analysis – various approaches

Every protection solution present in our networks uses some method of automated malware analysis. These methods are designed to detect, classify and sometimes prevent malware. Of course, one can ask about the role of malware researchers. For the sake of this article I focus on automated systems, without forgetting the role of malware researchers and their difficult, strenuous work!

A common categorization of automated malware analysis technologies is depicted in Figure 2.

[Figure 2. Categorization of automated malware analysis technologies]

The most important differentiator between static and dynamic approaches is prior knowledge of a particular threat.

Static methods rely on previous knowledge of an attack, while dynamic approaches try to determine whether the protected resources are under attack without any previous experience of it.

Here are some examples of specific countermeasure products which leverage various malware analysis methods (Table 2).

[Table 2. Examples of countermeasure products and the malware analysis methods they leverage]

Signatures and heuristics

The most popular method of malware detection is static analysis based on signatures. By signatures one should understand patterns such as file hashes, regex definitions, SNORT rules, and proprietary formats developed by security vendors. But not only those: the definition of signatures also covers all types of lists (whitelists, blacklists, URL categories) as well as static policies that define what is blocked and what is allowed based on specific parameters of traffic, processes, applications, etc. It is a really broad set of ways of describing exactly what we are looking for.
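As a minimal illustration of how such signatures are applied, the sketch below combines a hash blacklist with byte-pattern matching. The "database" is a hypothetical toy, not any vendor's format; the hash and pattern use the harmless EICAR antivirus test string so the example stays safe:

```python
import hashlib
import re

# Hypothetical signature database: known-bad SHA-1 hashes plus byte patterns.
HASH_BLACKLIST = {
    "3395856ce81f2b7382dee72602f798b642f14140",  # SHA-1 of the EICAR test file
}
BYTE_PATTERNS = [
    re.compile(rb"X5O!P%@AP\[4\\PZX54\(P\^\)7CC\)7\}\$"),  # EICAR prefix
]

def scan(data: bytes) -> bool:
    """Return True if the blob matches any static signature."""
    if hashlib.sha1(data).hexdigest() in HASH_BLACKLIST:
        return True
    return any(p.search(data) for p in BYTE_PATTERNS)
```

Real engines compile many thousands of such patterns into automata, or into hardware, but the matching logic is conceptually this simple, which is exactly why signatures are cheap to produce and fast to run.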

The popularity of signatures results from:

• their simplicity – creating a SHA-1 hash of known malware takes little effort (although discovering the malware in the first place may take hours or days), and it is relatively easy to accelerate analysis by implementing pattern matching in hardware

• accuracy – we get a detailed description of what we are looking for

• long history of the technology development

• broad range of implementations in various types of security solutions.

Signatures are present in network protection layers, in the cloud, and at endpoints. Signs of the limitations of signatures were observed quite some time ago, though. The exponential growth in the number of threats, and their evolving use of more sophisticated evasion techniques, has created a huge challenge for products based on signatures alone.

Some vendors have tried to close the coverage gap outlined above by layering on heuristics-based filtering. Heuristics are essentially “educated guesses” based on behaviors or statistical correlations. They require fine-tuning to account for specific circumstances and to reduce error rates (or to increase confidence levels, statistically speaking).

Examples of heuristics are reputation services, host intrusion prevention based on vulnerability descriptions, static analysis of suspicious files, network anomaly detection, etc. Even though heuristics tend to be a good approach, they have multiple limitations and usually carry a high probability of false positives.
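A toy version of such a heuristic engine makes the tuning trade-off tangible. The traits, weights, and threshold below are purely illustrative assumptions, not values from any real product:

```python
# Hypothetical heuristic scorer: each observed trait adds (or subtracts)
# weight, and the verdict threshold is what gets fine-tuned in practice.
WEIGHTS = {
    "packed_executable": 0.4,    # high entropy suggests a packer
    "writes_autorun_key": 0.3,   # persistence behavior
    "contacts_new_domain": 0.2,  # domain with no reputation history
    "signed_binary": -0.5,       # valid code signature lowers suspicion
}

def heuristic_verdict(traits, threshold=0.5):
    """Sum the weights of observed traits and compare to the threshold."""
    score = sum(WEIGHTS.get(t, 0.0) for t in traits)
    return ("suspicious" if score >= threshold else "benign"), round(score, 2)
```

Raising the threshold reduces false positives but lets more malware through; lowering it does the opposite. That is precisely the fine-tuning problem described above, and no choice of threshold escapes it.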

Let’s forget the limitations of heuristics for a while – even so, we have to admit that heuristics are by nature still very close to the signature approach. Both technologies assume previous knowledge of the attack or vulnerability. Without that knowledge we cannot write rules for the heuristics engine. We need to get a malware sample and the details of the vulnerability, analyse them (usually manually, by a malware researcher) and finally produce a “description” of the threat to be distributed among security products. Less knowledge means more guessing, and that approach quickly leads to the dead end of an unacceptable number of false positives.

The following chart depicts the categories and interrelationships between various static analysis methods used by today’s malware network defense alternatives (Figure 3).

[Figure 3. Categories and interrelationships of static analysis methods used by today’s network malware defenses]

Heuristics are not enough, by themselves or even when layered with signature-based or list-based techniques. Because advanced malware shares some characteristics common to all modern software, heuristic developers face a fundamental trade-off: to trigger on (positively identify) the growing variety of malware code, they must create broader sets of heuristics that will, by definition, increasingly encompass benign “good” software code as well.

Discrete objects analysis

It is by comparing the malware characteristics and the available malware defense mechanisms that the shortcomings become clear. As shown in the chart below advanced malware operates at the top of the malware chart, while the current generation of defenses operates at the bottom. Signature-based mechanisms react to known attacks and fail against unknown and stealthy attacks. Further, reputation, heuristics, and other correlating techniques cannot guard against targeted attacks, because, given the nature of these attacks, there is no existing data to correlate (Figure 4).

Quite simply, we are using outdated, conventional defenses to guard against cutting-edge, innovative malware. In order to respond to the growth of attacks and their complexity, another approach came into play some time ago.

It is known as sandboxing; for the sake of this article it is called “discrete objects analysis”.

The challenge addressed by this technique is as follows: assume we have no details about a particular malware sample; how can we determine, in an automated way, whether it is malicious? Discrete object analysis responds by running the sample in a controlled environment in order to observe and detect its behavior. Based on that behavior, the system can classify the object as malicious or not. It looks promising, and in fact it is. However, one should be aware of various constraints and challenges of this technique:

• the problem of getting the right, most interesting sample to analyze – we first have to determine what is more suspicious and what is less, at least in order to balance the resources of our system and allow as much real-time response as possible. Second, how do we obtain the sample from real network connections and submit it properly for analysis? This requires at least some network awareness and real-time traffic filtering in place; sandboxes usually lack an efficient, automated way of obtaining samples from the real network

• virtualization of the analysis environment – is it a constraint of the system or an advantage? Both. Virtualization allows more efficient use of the hardware platform and simplifies management of the analysis process: a virtual machine can be quickly and easily created, run and stopped. However, as sandboxes usually leverage off-the-shelf hypervisors, it is impossible to incorporate malware analysis into them and observe the malware’s behavior from the “hardware” perspective. And it really matters, especially as we face malware that does everything it can to hide from analysis and from detection by any other process running in the operating system. We also lose control over the malware’s attempts to recognize the type of environment and evade detection using system-dependent functions; we observe many advanced attacks doing this nowadays. If the sample recognizes a known virtual environment, it changes its behavior and hides the real nature of the attack, and thus is not detected as malicious

• it cannot analyze ANY file type – and the problem is not only a missing application needed to open a given file. The most important concern is well-known file types that have been obfuscated to avoid recognition and opening. From the discrete object analysis perspective, such files cannot be reliably determined to be malicious or not, which causes false negatives: malware goes undetected. Unfortunately, obfuscation of malware files is a broadly used technique among advanced threats nowadays, and it really impacts the usability of such detection methods
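Stripped to its essence, the classification step of a sandbox reduces to matching an observed behavior trace against rules. The sketch below assumes the trace has already been captured inside the controlled environment; the event names and rule patterns are hypothetical illustrations:

```python
# Toy behavior classifier over an event trace captured in the sandbox.
# Each rule is an ordered subsequence of events considered malicious together.
SUSPICIOUS_SEQUENCES = [
    ("drop_executable", "set_autorun", "outbound_connect"),  # install + callback
    ("open_process", "write_process_memory"),                # code injection
]

def contains_subsequence(trace, pattern):
    """True if the pattern's events occur in the trace in order
    (not necessarily adjacently)."""
    it = iter(trace)
    return all(event in it for event in pattern)

def classify(trace):
    for pattern in SUSPICIOUS_SEQUENCES:
        if contains_subsequence(trace, pattern):
            return "malicious"
    return "benign"
```

Note that this only sees what the sample chose to do during observation, which is why environment-aware malware that stays quiet inside a recognizable VM defeats the whole approach.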

So how do we address the challenges of discrete object analysis and enable an efficient method of protection against modern malware? To answer this question, let’s return to the roots of advanced malware.

Operation Aurora – father of advanced threats

Most readers of this article are aware of the Operation Aurora attack, one of the most famous attacks detected in the last few years. Detailed descriptions of the Aurora attack are available on the Internet. Aurora was detected at the end of 2009, and its details were disclosed at the beginning of 2010. Since then, public awareness of so-called Advanced Persistent Threats (APT), or Targeted Persistent Threats (TPT), has risen.

Surprisingly or not, variations of the original Aurora attack are still in use, are very popular, and remain very challenging to discover. The characteristics of the Aurora attack, including its attack stages and exploitation through obfuscated JavaScript, define advanced malware today.

Anatomy of the attack

The anatomy of advanced persistent threats varies just as widely as the victims they target. However,
cybersecurity experts researching APTs over the past five years have unveiled a fairly consistent attack life cycle consisting of five distinct stages:

• Stage 1: Initial intrusion through system exploitation

• Stage 2: Malware is installed on compromised system

• Stage 3: Outbound connection (callback) is initiated

• Stage 4: Attacker spreads laterally

• Stage 5: Compromised data is extracted

The most effective methods to discover and prevent an attack focus on stages 1-3. The later stages raise further challenges, such as encryption of the extracted data and the scale of investigation needed when malware exists on multiple hosts.
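As a trivial sketch of stage-oriented correlation, detection events can be reduced to the earliest life-cycle stage they evidence; catching the attack at a lower stage number means a simpler response. The event names and the mapping are hypothetical:

```python
# Hypothetical mapping from detection event types to APT life-cycle stages 1-3.
STAGE_OF_EVENT = {
    "exploit_detected": 1,    # stage 1: initial intrusion
    "binary_dropped": 2,      # stage 2: malware installation
    "callback_observed": 3,   # stage 3: outbound CnC connection
}

def earliest_stage(events):
    """Return the earliest attack stage evidenced by the events,
    or None if no stage indicator is present."""
    stages = [STAGE_OF_EVENT[e] for e in events if e in STAGE_OF_EVENT]
    return min(stages) if stages else None
```

A system that only ever reports stage 3 has, by definition, already let the endpoint be compromised; that is the argument for exploit-stage detection made in the next section.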

Exploitation

System exploitation is the first stage of an APT attack, compromising a system in the targeted organization. By successfully detecting a system exploitation attempt while it is underway, identification and mitigation of the APT attack become much more straightforward. If your malware analysis system cannot detect the initial system exploitation, mitigating the attack becomes more complicated: the attacker has now successfully compromised the endpoint and can disrupt endpoint security measures and hide his actions as malware spreads within the network and calls back out of it. System exploits are typically delivered through the Web (remote exploit) or through email (local exploit) as an attachment. The exploit code compromises the vulnerable OS or application, enabling the attacker to run code, such as connect-back shellcode that calls back to CnC servers and downloads more malware, moving the attack to its second stage. In the case of the Aurora attack, the exploit was based on obfuscated JavaScript that leveraged an IE 6 vulnerability.

Malware installation

Once a victim system is exploited, arbitrary code is executed, enabling malware to be installed on the compromised system. In the case of the Aurora attack, and of many attacks today, the downloaded malware is obfuscated. Even when it uses just a XOR function, deobfuscation requires knowledge of the algorithm and the keys used to evade file recognition. In a real attack scenario, deobfuscation is typically initiated by the exploit itself, which emphasizes even more the importance of exploit detection.
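Since XOR is its own inverse, single-byte XOR obfuscation falls immediately once the key is known, and a short key can even be brute-forced against a known plaintext marker. A minimal sketch (the “MZ” marker assumes the hidden payload is a Windows executable; real samples use longer keys and layered schemes):

```python
def xor_decode(data, key):
    """Undo single-byte XOR obfuscation (XOR is its own inverse)."""
    return bytes(b ^ key for b in data)

def brute_force_xor(data, marker):
    """Try all 256 single-byte keys; return the first key whose decoded
    output contains a known plaintext marker, or None if none matches."""
    for key in range(256):
        if marker in xor_decode(data, key):
            return key
    return None
```

This is why a sandbox that watches the exploit perform the deobfuscation itself is in a much stronger position than a static scanner staring at the obfuscated bytes.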

Callbacks

The malware installed during the prior stage often contains a remote administration tool, or RAT. Once up and running, the RAT “phones home” by initiating an outbound connection (callback) between the infected computer and a CnC server operated by the APT threat actor. Such callbacks are often made over widely allowed protocols like HTTP, thus bypassing firewalls. Once the RAT has successfully connected to the CnC server, the attacker has full control over the compromised host. Future instructions from the attacker are conveyed to the RAT in one of two ways: either the CnC server connects to the RAT, or vice versa. The latter is usually preferred, as a host initiating an external connection from within the network is far less suspicious. Figure 5 and Figure 6 depict details of the behavior analysis of the Aurora attack in an automated malware analysis system.
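One protocol-side characteristic often used to spot such callbacks is timing: a RAT beaconing home on a fixed interval produces far more regular traffic than a human user. A simplified detector over connection timestamps (the jitter threshold and minimum sample count are illustrative assumptions, not tuned values):

```python
from statistics import mean, pstdev

def looks_like_beaconing(timestamps, max_jitter=0.1):
    """Flag an outbound connection series whose inter-arrival times are
    suspiciously regular: coefficient of variation below max_jitter."""
    if len(timestamps) < 4:
        return False  # too few connections to judge
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    avg = mean(gaps)
    return avg > 0 and (pstdev(gaps) / avg) < max_jitter
```

Real systems combine timing with payload and protocol fingerprints, since malware authors add random sleep jitter precisely to defeat checks like this one.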

[Figure 5 and Figure 6. Behavior analysis of the Aurora attack in an automated malware analysis system]

Following the output of the automated analysis system, we can identify the stages of the attack from the initial exploitation onwards. How is the system able to detect and correlate information from the various stages of an attack? The answer lies in Next Generation Threat Protection tools, which bring automated malware analysis to a higher level of efficiency and accuracy.

Next generation of automated malware analysis and detection

The next generation of automated malware analysis (so-called Next Generation Threat Protection, or NGTP) was developed to overcome the problems of discrete object analysis. It targets modern malware without using signatures. The key differentiators of NGTP are described in Figure 7.

[Figure 7. Key differentiators of Next Generation Threat Protection]

Aggressive packet capturing

Direct access to network traffic allows the automated analysis system to perform aggressive packet capturing, deep packet inspection and traffic recognition. From the collected packets, the system reassembles sessions and provides them to the further steps of the analysis.
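Conceptually, combining captured packets into sessions means grouping them by flow key and concatenating payloads in capture order. A simplified sketch, where packet dicts stand in for real decoded headers and reordering/retransmission handling is omitted:

```python
from collections import defaultdict

def group_sessions(packets):
    """Group captured packets into flows keyed by the 5-tuple
    (src IP, src port, dst IP, dst port, protocol), concatenating
    payload bytes in capture order."""
    flows = defaultdict(bytearray)
    for pkt in packets:
        key = (pkt["src"], pkt["sport"], pkt["dst"], pkt["dport"], pkt["proto"])
        flows[key] += pkt["payload"]
    return dict(flows)
```

The reassembled session, not the individual packet, is what gets replayed in the virtual environment in the next step.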

Proprietary virtual environment

Multiple virtual machines run on top of a proprietary hypervisor designed to analyze malware behavior from the “hardware” perspective in real time. This minimizes the risk of the malware behaving “abnormally” after discovering a virtual environment, and it also increases the accuracy of recognizing zero-day attacks, which may use new methods of hiding their presence in a breached system.

Analysis of attack stages as opposed to discrete object analysis

The sessions collected during the aggressive packet capturing phase are replayed in the virtual environment. As a result, the analysis engine can follow all stages of the attack: from exploit detection, through malware payload download and start-up, to recognition of callback attempts. In short, the whole attack, not only a discrete object, is executed in an instrumented environment, allowing analysis from the same perspective as a “real user” opening a connection and downloading content. It also becomes possible to analyse an obfuscated malicious file, as it is unhidden by the exploit phase in the same way as it would be on a real host.

Discovery of callbacks

In addition to analyzing attack attempts, the system leverages aggressive packet capturing and deep inspection to filter outbound communications across multiple protocols. This complements the attack analysis by discovering hosts that are already infected. Callbacks are identified as malicious based on the unique characteristics of the communication protocols employed, rather than just the destination IP or domain name.

[Figure 8. Main components of a Next Generation malware analysis system]

Offer a Cohesive View of Protocols and Threat Vectors

To effectively combat next-generation threats, NGTP has the intelligence to assess threats across vectors, including Web and email. This is possible through real-time analysis of URLs, email attachments, binaries transiting multiple protocols, and Web objects. It is a critical requirement for guarding against spear phishing.

Yield Timely, Actionable Malware Intelligence and Threat Forensics

Once malicious code has been analyzed in detail, the gathered information can be fully leveraged to identify infections of particular hosts and to share knowledge about the new threat (Figure 8).

The above diagram depicts the main components of a Next Generation malware analysis system. One can quickly see that the new approach extends discrete object analysis by adding session replay, direct collection of traffic from the protected network, and an instrumented environment based on a proprietary hypervisor. It should be pointed out that almost all kinds of dynamic malware analysis are focused on specific incidents related to advanced malware techniques; they complement existing legacy protection systems instead of replacing them. We are all aware of the limitations of static analysis; however, signature-based solutions still play their role of filtering out volume-based, already-known attacks.

Conclusion

The common approach of malware detection systems based on static analysis leveraging signatures has led to their collective collapse underneath the avalanche of vulnerabilities and exploit techniques. It is clear that the threat landscape will continue to change at a rapid pace, in ways we cannot dream of, just as we cannot dream of all the ways technology will be used in the future. Malware analysis and protection against attacks is a never-ending game of cat and mouse. Thanks to the evolution of malware analysis systems and a better understanding of modern threats, we are much better equipped to chase the mouse successfully. Next Generation Threat Protection systems are already available on the market, bringing sophisticated tools for malware detection and prevention to every organization. I treat the deployment of NGTP solutions as the next step in the evolution of security systems, like other important extensions that happened in the past. And it is a really important step to take in order to be prepared for modern attacks and avoid becoming the next victim.
