Detecting Attacks and Threats in Elastic Cloud Infrastructures: the Case of Side-channel Attacks
Cloud computing adoption is rising fast. Flexibility, pay-per-use pricing and on-demand resources, together with the promise of lower ownership costs, make for a very attractive value proposition.
Virtualization is just a part of Cloud Computing, which leverages virtualization in order to provide
resources in an on-demand and pay-per-use fashion.
According to the official NIST definition, Cloud Computing can be classified into three main models:
• IaaS (Infrastructure as a Service): users have access to on-demand virtual machines and storage which are provided from large shared data centers;
• PaaS (Platform as a Service): what is offered by IaaS plus a ready-to-use development environment;
• SaaS (Software as a Service): on-demand software hosted on a remote server.
In this article we will focus on the main security concerns arising in IaaS environments, in particular multi-tenancy.
Multi-tenancy
The wide adoption of Cloud technologies has brought enormous advantages but also several new security concerns. One of the main causes of these new threats is multi-tenancy. Cloud Computing makes common new scenarios in which different users’ applications and data share the same physical host.
Because of co-residency, an attacker has a new way to access the victim’s data: he can leverage the co-residency factor and infer the victim’s data by observing the activity of a shared component (e.g. the processor cache) on the physical host.
This new class of attacks is called “side-channel attacks”. While side-channel attacks have been developed for years, for example in the smart card field, their application to cloud computing infrastructures has only just begun.
Problem Statement
Side-channel attacks
Before performing a side-channel attack, the attacker needs to create a virtual machine and determine whether it is running on the same physical host as the victim. To do so, he has to perform a so-called “co-residency check”. The authors of [9] defined three possible ways to detect co-residency:
• matching Dom0 IP address: if the Dom0 IP address of two machines is equal, then co-residency has been achieved;
• measuring the round-trip time: if the round-trip time between two machines is very small, there is a high chance of co-residency;
• observing IP addresses: if two IP addresses are close enough, there is a high chance of co-residency.
On Amazon EC2, check #1 proved to be the most effective with a false-positive rate of zero. This means that check #1 is sufficient to determine co-residency.
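The three checks listed above can be expressed as simple predicates. The following is only a minimal sketch: the function name, the round-trip-time threshold and the notion of "close" IP addresses are assumptions chosen for illustration, not values taken from [9].

```python
from ipaddress import IPv4Address


def coresidency_checks(dom0_a, dom0_b, rtt_ms, ip_a, ip_b,
                       rtt_threshold_ms=0.5, max_ip_distance=8):
    """Evaluate the three co-residency heuristics for two machines.

    rtt_threshold_ms and max_ip_distance are illustrative assumptions.
    Returns one boolean per check.
    """
    # Numerical distance between the two (internal) IPv4 addresses.
    distance = abs(int(IPv4Address(ip_a)) - int(IPv4Address(ip_b)))
    return {
        "same_dom0": dom0_a == dom0_b,          # check #1
        "small_rtt": rtt_ms < rtt_threshold_ms,  # check #2
        "close_ip": distance <= max_ip_distance,  # check #3
    }


if __name__ == "__main__":
    print(coresidency_checks("10.1.0.1", "10.1.0.1", 0.2,
                             "10.2.3.4", "10.2.3.9"))
```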
If the result of the co-residency check is positive, the attacker can go ahead with the side-channel attack. Many techniques belonging to this class have been proposed in the context of Cloud Computing infrastructures but, until now, just one has been implemented and actually demonstrated [8]. This technique can be classified as an “access-driven” attack, since it retrieves information belonging to another user by constantly observing (and accessing) a shared physical component, the processor cache.
However, even though this kind of attack has been widely treated in the literature, performing it is very hard and there are several challenges to take into account:
• the attacker-VM needs to frequently monitor the status of the processor cache, so it needs to be scheduled often enough to make the observation granularity as fine as possible;
• the attacker-VM has to clean the results and reduce the noise introduced by hardware and software sources;
• last but not least, the attacker-VM has to be able to detect when the target VM is not running anymore on the same physical host.
The strategy adopted by the authors [8] to extract the wanted information can be summarized as follows:
• PRIME: the attacker fills the processor cache;
• IDLE: the attacker waits for a pseudo-random interval. During this interval the target VM is supposed to access the cache and thus change the content of some blocks;
• PROBE: when the attacker resumes the execution, it refills the cache in order to learn the activity of the target VM on the cache.
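The PRIME, IDLE and PROBE steps above can be illustrated with a toy model. The sketch below simulates a tiny direct-mapped cache in pure Python; it is not the technique of [8] itself, which works on real hardware and infers evictions from access latency. The class and function names are hypothetical, and a simple hit/miss flag stands in for the timing measurement.

```python
class ToyCache:
    """Direct-mapped toy cache: each set remembers who owns its single line."""

    def __init__(self, num_sets):
        self.owner = [None] * num_sets

    def access(self, who, set_index):
        """Load a line into the set; return True on a hit, False on a miss."""
        hit = self.owner[set_index] == who
        self.owner[set_index] = who
        return hit


def prime(cache):
    """PRIME: the attacker fills every cache set with its own lines."""
    for s in range(len(cache.owner)):
        cache.access("attacker", s)


def probe(cache):
    """PROBE: refill the cache; sets that miss were touched by someone else."""
    return [s for s in range(len(cache.owner))
            if not cache.access("attacker", s)]


cache = ToyCache(num_sets=8)
prime(cache)
# IDLE: while the attacker waits, the victim touches sets 2 and 5.
for s in (2, 5):
    cache.access("victim", s)
evicted = probe(cache)
print(evicted)
```

In this model the list returned by `probe` is the raw observation of one round; repeating the cycle many times yields the sequence of measurements that the analysis phase works on.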
Once the attacker has collected a sufficiently high number of measurements, it can finally analyze these measurements and infer the encryption key used by the target VM.
During the analysis, the measurements collected during the previous phase are converted to basic operations (operations performed during the execution of the target VM). This phase is called “cache pattern classifier”. Of course, doing this requires knowing in advance the algorithm being executed on the target VM.
After classifying each operation, a Hidden Markov Model is used to remove noise, and heuristics based on knowledge of the algorithm are applied to reconstruct the possible execution paths. Finally, the attacker obtains a set of candidate encryption keys. This set consists of a few thousand keys, so the attacker can simply try each of them in a brute-force attack.
The existence of such an attack shows that traditional security systems such as intrusion detection systems, antivirus software and firewalls are no longer effective in today’s virtualized, multi-tenant cloud infrastructures. In order to protect users against cloud threats, a new approach is required and new techniques need to be developed.
Solution
In this section we will describe in detail all the characteristics of our solution and the rationale behind it. At a high level of abstraction, the architecture we propose is composed of three main components: a cloud provider, Elastic Detector and a SIEM.
The cloud provider provides an API which allows Elastic Detector to retrieve information on the status of the infrastructure and events which are useful for the detection of threats and attacks.
Elastic Detector operates as an intermediary and its duty is to handle elasticity and make it transparent to the SIEM system. Indeed, existing SIEM solutions do not take into account the elasticity of modern cloud infrastructures.
Thanks to Elastic Detector, the SIEM system can work as usual to analyze and correlate logs.
The final goal of such an architecture is to catch security-relevant events in order to detect threats and
ongoing attacks.
Elastic Detector
Cloud Computing brought, along with several benefits, a set of problems that changed the way we handle security and need to be addressed in order to meet security needs:
• Lack of visibility. IaaS is more dynamic than classical infrastructures, since servers, network and storage are launched for temporary usage or automatically. This makes it difficult to keep track of the availability of each server, network and storage as well as their security status.
• Security degradation over time. Modifications to an IaaS environment, such as starting new services, tests and starting new machines, generally reduce the level of protection of a system over time, which increases the risk of external and internal attacks.
• Manual configuration errors. Today, due to the complexity and dynamic nature of cloud computing infrastructures, security in such environments can no longer be handled manually.
• New attack vectors and threats. The capabilities and the flexibility of IaaS also bring new threats, such as the nefarious use of resources by malicious insiders or threats related to virtualization and API technologies.
Moreover, Cloud Computing also brings a new way to build and manage IT infrastructures. Compared to traditional approaches, cloud technologies such as APIs allow infrastructures to become highly elastic. This means that the set of active virtual machines, the storage and the network topology can change very fast. As a consequence, the security system also needs to react to these changes as fast as possible and adapt its configuration to the new scenario.
Due to the highly elastic and easy-to-use nature of Cloud Computing technologies, it is getting easier for attackers to find vulnerable or improperly protected resources. As evidence, researchers [2] managed to retrieve a large amount of private data from public Amazon EC2 AMIs. The same kind of vulnerability has been found in Amazon S3 buckets by Rapid7 staff [3].
Our proposed solution to fulfill these requirements can be summarized as “virtual machine cloning”. By cloning the virtual machine to be analyzed, we can perform very deep and intrusive security checks without impacting the performance of the applications in production. This way, the vulnerability assessment process can run smoothly and in a completely automatic way. It is worth pointing out that this solution is feasible thanks to the features of Cloud Computing. Indeed, in a traditional environment (e.g. on-premises data centers), it would not be possible to clone a machine on the fly and destroy the clone after performing all the required security checks.
From a practical point of view, cloning is the easiest solution to deploy. Indeed, no software (e.g. agent) has to be installed on virtual machines or within your cloud infrastructure. Every action can be performed by taking advantage of the API provided by the IaaS provider.
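The clone, check and destroy cycle described above can be sketched using nothing but the provider API. The example below is an illustration and not Elastic Detector's actual implementation: it assumes `ec2` is a boto3 EC2 client (`boto3.client("ec2")`), while `scan_via_clone`, `run_checks` and the default instance type are hypothetical names and values introduced for the example.

```python
def scan_via_clone(ec2, instance_id, run_checks, instance_type="t3.micro"):
    """Clone instance_id, run the checks on the clone, then destroy the clone.

    ec2        -- assumed to be a boto3 EC2 client
    run_checks -- hypothetical callback taking the clone's instance ID
    """
    # Snapshot the production machine without rebooting it.
    image_id = ec2.create_image(InstanceId=instance_id,
                                Name=f"clone-of-{instance_id}",
                                NoReboot=True)["ImageId"]
    ec2.get_waiter("image_available").wait(ImageIds=[image_id])
    clone_id = None
    try:
        # Start a short-lived copy and run the intrusive checks on it.
        reservation = ec2.run_instances(ImageId=image_id,
                                        InstanceType=instance_type,
                                        MinCount=1, MaxCount=1)
        clone_id = reservation["Instances"][0]["InstanceId"]
        ec2.get_waiter("instance_running").wait(InstanceIds=[clone_id])
        return run_checks(clone_id)
    finally:
        # Always clean up, even if the checks raise an exception.
        if clone_id is not None:
            ec2.terminate_instances(InstanceIds=[clone_id])
        ec2.deregister_image(ImageId=image_id)
```

Passing the client in as a parameter keeps the function easy to exercise with a stub. Note that deregistering an image does not delete its underlying snapshots, so a complete cleanup would need an additional step.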
Cloning is also cost-effective, since the cost of an additional virtual machine for a short period of time is very low within IaaS infrastructures. Moreover, as the whole analysis is performed on a different machine, cloning avoids the risks of breaking applications and losing data.
Furthermore, new elastic and pay-per-use infrastructures bring higher percentages of stopped servers. These “dormant” servers constitute potential threats to the infrastructure, as acknowledged by the Cloud Security Alliance. While stopped, the servers are not monitored by agents or agentless solutions and they are not patched. They become weak links of your infrastructure when started. That is why we propose to test dormant servers and raise alerts when vulnerabilities are found in them.
Auto-checks should be automatically set in order to monitor your IT cloud infrastructure. This is mandatory on a continuously changing infrastructure. Therefore, while your IT infrastructure evolves to answer your business needs, the right security checks are automatically set.
Our vision is that only a fully automated approach to security can cope with the elastic nature of new cloud infrastructures and their new threats.
SIEM
Detecting attacks in distributed environments, such as cloud infrastructures, often requires the capability of analyzing logs and correlating several events from different sources. Nowadays, the best approach to threat detection in distributed environments is to employ a SIEM (Security Information and Event Management) system.
A SIEM system is much more than an analyzer of logs. A SIEM system takes care of several aspects which can be summarized as follows:
• log and context data collection;
• normalization and categorization;
• correlation;
• notification/alerting;
• prioritization;
• dashboards and visualization;
• reporting and report delivery.
A SIEM system works with different kinds of data coming from different sources within one or more networks. Since 2006 [6], when Cloud Computing first appeared, a decentralization process has been under way and several companies have decided to move their data and infrastructures to the cloud. The latest trend analyses [7] show that this tendency is not going to stop, so in the next few years the number of companies adopting cloud computing technologies will be even higher than it is now. Due to the remote nature of cloud computing, several new security concerns arise and companies are worried about the protection of their data and infrastructures.
In this scenario, SIEM plays a very important role and needs to effectively operate in new cloud networks. Unfortunately, adopting SIEM systems in elastic cloud infrastructures is not an easy task. Indeed, SIEM systems were designed to operate in traditional (static) environments. Therefore, we need to adapt existing solutions in order to deal with elasticity. This means that a modern SIEM system should have the capability of automatically detecting the virtual machines running in the cloud, performing security checks on them and logging any security-relevant activity.
Nowadays, a system administrator needs to reconfigure his own SIEM system every time there is a
change (e.g. a new virtual machine starts running) in the cloud infrastructure. Configuring a SIEM
system is a very difficult task which requires the experience and skills of a security expert. Therefore, in highly dynamic infrastructures, reconfiguring the SIEM system every time there is a change in the cloud infrastructure would not be practical.
A possible alternative could be to install agents on each virtual machine running in the cloud, but this approach has various drawbacks, especially in terms of performance.
Elastic Detector, which is our flagship product, can be an easy and effective solution to make the deployment of a SIEM for the cloud much faster. Moreover, Elastic Detector could easily aggregate all the data retrieved by performing auto-checks in the cloud network and forward it to the SIEM system, where it can be properly processed together with data collected locally.
From a practical point of view, integrating Elastic Detector with any SIEM system would not require any complex integration platform. In particular, we analyzed the integration of Elastic Detector with OSSIM [4], an open-source and widely adopted system for SIEM. One of the benefits brought by OSSIM is the possibility of easily developing and integrating custom components (plugins).
Architecture
As we mentioned above, we want to take advantage of the capabilities of a SIEM system in order to correlate logs and detect a potential side-channel attack. In our experiments, we employed OSSIM [10], a well-known and open-source (with a commercial extension) SIEM system which is easily expandable with custom plugins (Figure 7).
Our strategy consists of forwarding all the security-relevant logs to OSSIM, where correlation can take place. To collect and forward logs, we used Elastic Detector [11], which is our flagship product. Thanks to Elastic Detector, we can smoothly and automatically detect changes in the cloud infrastructure (e.g. a virtual machine has been launched or stopped) and communicate these changes to OSSIM by forwarding NAGIOS logs. Elastic Detector employs NAGIOS to perform automatic checks on the cloud infrastructure. This way, every change in the status of the user’s virtual machines is automatically detected and logged. OSSIM does not provide any remote API; however, every OSSIM installation includes an RSyslog server, which can receive logs from remote machines. For this reason, the best way to forward our logs is to configure NAGIOS to send them to a remote server, namely the one on which OSSIM is running.
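To illustrate what forwarding a log line to a remote syslog server involves, the sketch below uses Python's standard `logging.handlers.SysLogHandler`. This is a stand-in for the NAGIOS/RSyslog configuration described above, not the configuration we actually deployed; the function name, the logger name and the `nagios3:` prefix are assumptions made for the example.

```python
import logging
import logging.handlers


def make_forwarder(host, port=514):
    """Return a logger that sends each record to a remote syslog server.

    host/port identify the machine running the syslog daemon (for OSSIM,
    its RSyslog server). UDP is used, which is SysLogHandler's default.
    """
    logger = logging.getLogger("elastic-detector-demo")
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(address=(host, port))
    # Prefix records the way the sample NAGIOS lines in this article look.
    handler.setFormatter(logging.Formatter("nagios3: %(message)s"))
    logger.addHandler(handler)
    return logger
```

A call such as `make_forwarder("ossim.example.org").info("SERVICE NOTIFICATION: ...")` would then deliver the message to the remote server; the hostname here is a placeholder.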
Detailed Description
In order to test our solution, we developed a Python script which makes use of standard AWS APIs to emulate a side-channel attack. In particular, this script emulates the first phase of a side-channel attack, which is called “Placement”. During this phase, the attacker launches a large number of virtual machines until one of them runs on the same physical host as the victim. It is worth pointing out that this is not a simulation of the real scenario: it is exactly what an attacker would do.
We think that our approach is the best one in terms of security, since the potential attacker is stopped before performing the side-channel attack. Indeed, while the attacker creates and destroys virtual machines, logs are collected, analyzed and correlated, so the attack can be detected before it takes place.
Once logs have been delivered to OSSIM, thanks to our custom plugin, we can parse and convert them into security-relevant events. After that, we are interested in correlating two kinds of events, the creation and the termination of a virtual machine.
In order to parse logs generated by Elastic Detector, we needed to define a plugin and, as part of this plugin, a regular expression which allows us to extract the specific information we need from a given log line. Below we can see a typical log line generated when a virtual machine is launched and the regular expression used to capture the event and extract the meaningful information.
Log:
Aug 19 15:51:32 debian-secludit nagios3: SERVICE NOTIFICATION: event@551;72-us-east-1;722;notify-service-by-cloutomate;Found new Instance: i-f0ad689c
Regular expression:
^(?P<date>\w{3}\s\d{1,2}\s\d\d:\d\d:\d\d)\sdebian-secludit\snagios3:\sSERVICE\sNOTIFICATION:\sevent@\d{3}\;(?P<account>\d{2,3})-(?P<region>\w{2}-\w{4,9}-\d)\;\d{3}\;notify-service-by-cloutomate\;Found\snew\sInstance:\s(?P<instanceid>i-[a-z,0-9]{8})$
where 72 is the account identifier of the user who launched the virtual machine, us-east-1 is the region in which the virtual machine is running and i-f0ad689c is the virtual machine ID.
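The extraction can be checked outside OSSIM with Python's `re` module, which supports the same `(?P<name>...)` named groups. In the sketch below the pattern and the log line are the ones given above, each written as a single string; the variable names are ours.

```python
import re

# The regular expression from the text, joined into a single raw string.
PATTERN = re.compile(
    r"^(?P<date>\w{3}\s\d{1,2}\s\d\d:\d\d:\d\d)\sdebian-secludit\snagios3:"
    r"\sSERVICE\sNOTIFICATION:\sevent@\d{3}\;(?P<account>\d{2,3})-"
    r"(?P<region>\w{2}-\w{4,9}-\d)\;\d{3}\;notify-service-by-cloutomate\;"
    r"Found\snew\sInstance:\s(?P<instanceid>i-[a-z,0-9]{8})$"
)

# The sample log line from the text.
LINE = ("Aug 19 15:51:32 debian-secludit nagios3: SERVICE NOTIFICATION: "
        "event@551;72-us-east-1;722;notify-service-by-cloutomate;"
        "Found new Instance: i-f0ad689c")

match = PATTERN.match(LINE)
fields = match.groupdict() if match else {}
print(fields)
```

Each named group becomes a key of `fields`, which is the same information the OSSIM plugin maps onto the fields of the generated event.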
Thanks to this information, we can proceed to the next stage, that is, correlating these events and determining whether a user (the potential attacker) is performing a side-channel attack.
In Listing 1 we can have a closer look at the code we wrote for the plugin.
Listing 1. The code of the plugin
;; Elastic Detector
;; plugin_id: 9001
;; type: detector
;;
[DEFAULT]
plugin_id=9001
[config]
type=detector
enable=yes
source=log
location=/var/log/nagios3/nagios.log
create_file=false
process=
start=no
stop=no
startup=
shutdown=
[elastic-detector-found-new-instance]
#Aug 19 15:51:32 debian-secludit nagios3: SERVICE NOTIFICATION: event@551;72-us-east-1;722;UNKNOWN;notify-service-by-cloutomate;Found new Instance: i-f0ad698c
event_type=event
regexp="^(?P<date>\w{3}\s\d{1,2}\s\d\d:\d\d:\d\d)\sdebian-secludit\snagios3:\sSERVICE\sNOTIFICATION:\sevent@\d{3}\;(?P<account>\d{2,3})-(?P<region>\w{2}-\w{4,9}-\d)\;\d{3}\;UNKNOWN\;notify-service-by-cloutomate\;Found\snew\sInstance:\s(?P<instanceid>i-[a-z,0-9]{8})$"
date={normalize_date($date)}
#sensor={resolv($sensor)}
plugin_sid=1
#src_ip={$src}
userdata1={$region}
userdata2={$instanceid}
userdata3={$account}
[elastic-detector-instance-not-running]
#Aug 20 15:50:26 debian-secludit nagios3: SERVICE NOTIFICATION: event@551;72-us-east-1;722;UNKNOWN;notify-service-by-cloutomate;Instance Terminated: i-7824db12
event_type=event
regexp="^(?P<date>\w{3}\s\d{1,2}\s\d\d:\d\d:\d\d)\sdebian-secludit\snagios3:\sSERVICE\sNOTIFICATION:\sevent@\d{3}\;(?P<account>\d{2,3})-(?P<region>\w{2}-\w{4,9}-\d)\;\d{3}\;UNKNOWN\;notify-service-by-cloutomate\;Instance\sTerminated:\s(?P<instanceid>i-[a-z,0-9]{8})$"
date={normalize_date($date)}
#sensor={resolv($sensor)}
plugin_sid=2
#src_ip={$src}
userdata1={$region}
userdata2={$instanceid}
userdata3={$account}
As we can see above, defining a plugin means defining a set of events and a set of regular expressions that convert log lines into events.
Any SIEM system is based on correlation rules. In OSSIM, correlation rules define a set of conditions that, if met, raise an alarm. In our experiment, we defined a simple and effective correlation rule, which works as follows.
When a new virtual machine is created, OSSIM starts evaluating the correlation rule. If, in the following 60 seconds, 9 more virtual machines are created in the same region by the same user, the evaluation proceeds to the next level. Finally, if within 15 minutes (an approximation of the time required to create a virtual machine and perform a co-residency check) all those virtual machines are terminated, an alarm is raised. At this stage, several countermeasures could be taken. For instance, a notification or an email could be sent to the security administrator, or a program or script could be executed. As an example, a realistic reaction to such an alarm could be the temporary revocation of the user’s account.
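The logic of this rule can be sketched independently of OSSIM. The function below processes a batch of already-parsed events rather than a live stream, and it is not OSSIM's directive syntax. The names are hypothetical, and it assumes that the 15-minute window starts at the tenth creation, which the description above leaves open.

```python
from collections import defaultdict

BURST_SIZE = 10            # the first virtual machine plus 9 more
BURST_WINDOW = 60          # seconds allowed for the burst of creations
TERMINATE_WINDOW = 15 * 60  # seconds allowed for all terminations


def detect_placement(events):
    """Return the (account, region) pairs that trigger the rule.

    events: iterable of (timestamp_s, kind, account, region, instance_id),
    where kind is "created" or "terminated".
    """
    created = defaultdict(list)  # (account, region) -> [(ts, instance_id)]
    terminated = {}              # instance_id -> termination timestamp
    for ts, kind, account, region, iid in events:
        if kind == "created":
            created[(account, region)].append((ts, iid))
        elif kind == "terminated":
            terminated[iid] = ts

    alarms = []
    for key, launches in created.items():
        launches.sort()
        # Slide a window of BURST_SIZE consecutive launches.
        for i in range(len(launches) - BURST_SIZE + 1):
            burst = launches[i:i + BURST_SIZE]
            first_ts, last_ts = burst[0][0], burst[-1][0]
            if last_ts - first_ts > BURST_WINDOW:
                continue  # level 1 not satisfied
            deadline = last_ts + TERMINATE_WINDOW
            # Level 2: every machine of the burst is gone by the deadline.
            if all(terminated.get(iid, float("inf")) <= deadline
                   for _, iid in burst):
                alarms.append(key)
                break
    return alarms
```

Feeding it ten "created" events for account 72 in us-east-1 within one minute, followed by the ten matching "terminated" events a few minutes later, yields a single alarm for that account and region.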
In our experiments, we successfully managed to detect several potential side-channel attacks during the placement phase. Potential attacks are detected as soon as virtual machines are terminated so that the proper countermeasures can be taken before the attacker can try to steal any information.
A comprehensive DEMO of the above-mentioned solution is available at http://youtu.be/3NacQOksyJo.
Discussion
Our proposed solution does not require any change in the way local information is collected. The main advantage of our approach is that the system administrator no longer needs to reconfigure the SIEM system every time there is a change in the public cloud infrastructure. Instead, Elastic Detector automatically detects changes in the cloud infrastructure and its configuration. This way, event information and logs can be collected according to the current configuration and settings. The only thing the administrator has to do is provide his API credentials so that Elastic Detector can connect to the cloud.
When a relevant event occurs or a log file needs to be forwarded to the SIEM system, Elastic Detector communicates with it and delivers the required information transparently to the user. Once the SIEM system has received this information from Elastic Detector, it can process the events that occurred and perform the typical SIEM actions: normalization, correlation, notification and alerting, prioritization, visualization and reporting.
Furthermore, our solution does not even need the collaboration of the hypervisor, so it is fully compatible with any existing platform. This is the main reason why we decided to detect side-channel attacks during the placement phase. Indeed, detecting a side-channel attack during the following phases would require the collaboration of other components of the infrastructure such as routers (co-residency check) and hypervisors (side-channel attack), resulting in a more complex integration.
The main benefits of our strategy are:
• Full Automation. Keeping operating costs under control means being able to automate the security
management by eliminating the majority of manual setup, security monitoring, and corrective actions.
• Agentless. The performance footprint of agents on servers and potential conflicts with applications are sources of problems. Moreover, agents are OS dependent and have vulnerabilities as well. Through the virtualization layer, and using APIs such as VMware vShield or Amazon EC2 security groups, security solutions can analyze resource information and enforce security with no agents.
• Comprehensive Security Assessment. The traditional layered approach, where each security component takes care of a specific layer such as the network, is not enough. For this reason, there is a need for tools that tackle the new security challenges brought by IaaS, such as multi-tenancy and side-channel attacks.
• No Lock-in. In some scenarios, it is important to have the ability to use different IaaS offerings for
reliability and flexibility. However, this should not compromise the effectiveness of security tools and the ability to have a full visibility of the security of your infrastructure.
Conclusion
In this article we have presented a new class of attacks, the so-called side-channel attacks, and a prototype for protecting cloud users from these attacks. The main reason why we decided to focus on this use case is that nowadays there is no solution to detect side-channel attacks in cloud environments.
Deploying our solution would be extremely easy and safe for a cloud provider, since the provider would only need to give Elastic Detector access to a very small subset of events (virtual machine creation and termination). Also, this solution would be completely transparent to the user.
Furthermore, we also provided a solid architecture for adapting existing SIEM systems to cloud infrastructures. In this particular case we have presented the integration of our main product, Elastic Detector, with OSSIM, an open-source and widely adopted SIEM solution.
Cloud infrastructures have the advantage of being highly elastic. Unfortunately, their elasticity also brings several security concerns. Among these concerns, from a SIEM point of view, is the need for the system administrator to constantly monitor the public cloud infrastructure in order to detect changes and reconfigure the SIEM system accordingly. In most cases, this approach is infeasible, since configuring a SIEM system can be very complex and take a long time.
We strongly believe that our proposed solution can help system administrators and IT professionals to easily and safely manage their infrastructures without having the hassle of re-configuring their SIEM every time there is a change in the public cloud.
Elastic Detector is an optimal and smart solution to overcome this serious issue by delegating the
monitoring activity of remote virtual machines to an external, automatic and elastic tool. In addition,
deploying Elastic Detector does not require any change in the infrastructure and no software agent has to be installed on virtual machines.
Cloud computing adoption is rising fast. Flexibility, pay-per-use and available resources ondemand
with the promise of lower ownership costs are a very attractive value proposition.
Virtualization is just a part of Cloud Computing, which leverages virtualization in order to provide
resources in an on-demand and pay-per-use fashion.
According to the official NIST definition, Cloud Computing can be classified into three main models:
• IaaS (Infrastructure as a Service): users have access to on-demand virtual machines and storage which are provided from large shared data centers;
• PaaS (Platform as a Service): what is offered by IaaS plus ready-to-use development environment;
• SaaS (Software as a Service): on-demand software hosted on a remote server.
In this article we will focus on the main security concerns arising in IaaS environment, in particular multitenancy.
Multi-tenancy
The wide adoption of Cloud technologies has brought enormous advantages but also several new security concerns. One of the main causes for these new threats is multi-tenancy. Cloud Computing widespreads new scenarios in which users’ applications and data share the same physical host.
Because of co-residency, an attacker has a new way to access the victim’s data: he can leverage the co-residency factor and infer victim’s data by observing the activity of a shared component (e.g. the processor cache) on the physical host.
This new class of attacks is called “Side-channel attacks”. While side-channel attacks have been developed for years, for example on the smartcard field, their application to cloud computing infrastructures is just starting.
Problem Statement
Side-channel attacks
Before performing a side-channel attack, the attacker needs to create a virtual machine and determine if it is running on the same physical host of the victim. In order to do so, he has to perform the so called “coresidency”. The authors of [9] defined three possible ways to detect co-residency:
• matching Dom0 IP address: if the Dom0 IP address of two machines is equal, then co-residency has been achieved;
• measuring the round-trip time: if the round-trip time between two machines is very small, there is a high chance of co-residency;
• observing IP addresses: if two IP addresses are close enough, there is a high chance of co-residency.
On Amazon EC2, check #1 proved to be the most effective with a false-positive rate of zero. This means that check #1 is sufficient to determine co-residency.
If the result of the co-residency check is positive, the attacker can go ahead with the side-channel attack. Many techniques belonging to this class have been shown in the context of Cloud Computing Infrastructures, but until now, just one has been implemented and actually proved [8]. This technique can be classified as an “access-driven” attack since it manages to retrieve a given information which belongs to another user by constantly observing (and accessing) the activity of a shared physical component, the processor cache.
However, even if this kind of attacks has been widely treated in the literature, performing such an attack is very hard and there are several challenges to take into account:
• the attacker-VM needs to frequently monitor the status of the processor cache, so it needs to go in
execution often enough in order to make the observation granularity as fine-grained as possible;
• the attacker-VM has to clean the results and reduce the noise introduced by hardware and software
sources;
• last but not least, the attacker-VM has to be able to detect when the target VM is not running anymore on the same physical host.
The strategy adopted by the authors [8] to extract the wanted information can be summarized as follows:
• PRIME: the attacker fills the processor cache;
• IDLE: the attacker waits for a pseudo-random interval. During this interval the target VM is supposed to access the cache and thus change the content of some blocks;
• PROBE: when the attacker resumes the execution, it refills the cache in order to learn the activity of the target VM on the cache.
Once the attacker has collected a sufficiently high number of measurements, it can finally analyze these measurements and infer the encryption key used by the target VM.
During the analysis, the measurements achieved during the previous phase are converted to basic operations (operations performed during the execution of the target VM). This phase is called “cache pattern classifier”. Of course, doing this requires to know in advance the algorithm which is being executed on the target VM.
After classifying each operation, a special Markov Model (Hidden Markov Model) is used to remove
noises and apply some heuristics (based on knowledge of the algorithm) in order to reconstruct the possible execution paths. Finally, the attacker obtains a set containing all the possible encryption keys. This set is composed by few thousands of keys, so the attacker can use these keys to perform a brute-force attack.
The existence of such an attack is the proof that traditional security systems such as intrusion detection systems, antivirus, firewalls, etc. are not effective anymore in today’s virtualized and multi-tenant environment, that is the Cloud Infrastructure. In order to protect users against Cloud threats, a new approach is required and new techniques need to be developed.
Solution
In this section we will describe in detail all the characteristics of our solution and the rationale behind it. At a high level of abstraction, the architecture we propose is composed by three main components: a cloud provider, Elastic Detector and a SIEM.
The cloud provider provides an API which allows Elastic Detector to retrieve information on the status of the infrastructure and events which are useful for the detection of threats and attacks.
Elastic Detector operates as an intermediary and its duty is to handle elasticity and make it transparent to the SIEM system. Indeed, existing SIEM solutions do not take into account the elasticity of modern cloud infrastructures.
Thanks to Elastic Detector, the SIEM system can work as usual to analyze and correlate logs.
The final goal of such an architecture is to catch security-relevant events in order to detect threats and
ongoing attacks.
Elastic Detector
Cloud Computing brought, along with several benefits, a set of problems that changed the way we handle security and need to be addressed in order to meet security needs:
• Lack of visibility. IaaS is more dynamic than classical infrastructures, since servers, network and storage are launched for temporary usage or automatically. This makes it difficult to keep track of the availability of each server, network and storage as well as their security status.
• Security degradation over time. Modifications to an IaaS environment, such as starting new services, tests and starting new machines, generally reduce the level of protection of a system over time, which increases the risk of external and internal attacks.
• Manual configuration errors. Due to the complexity and dynamic nature of cloud computing infrastructures, security in such environments can no longer be handled manually.
• New attack vectors and threats. The capabilities and flexibility of IaaS also bring new threats, such as the nefarious use of resources by malicious insiders, or threats related to virtualization and API technologies.
Moreover, Cloud Computing also brings a new way to build and manage IT infrastructures. Compared to traditional approaches, cloud technologies such as APIs allow infrastructures to become highly elastic. This means that the set of active virtual machines, the storage and the network topology can change very fast. As a consequence, the security system also needs to react to these changes as fast as possible and adapt its configuration to the new scenario.
Due to the highly elastic and easy-to-use nature of Cloud Computing technologies, it’s getting easier for attackers to find vulnerable or improperly protected resources. As evidence, researchers [2] managed to retrieve a large amount of private data from public Amazon EC2 AMIs. The same kind of vulnerability has been found in Amazon S3 buckets by Rapid7 staff [3].
Our proposed solution to fulfill these requirements can be summarized as “virtual machine cloning”. By cloning the virtual machine to be analyzed, we can perform very deep and intrusive security checks without impacting the performance of the applications in production. This way, the vulnerability assessment process can run smoothly and in a completely automatic way. It is worth pointing out that this solution is feasible thanks to the features of Cloud Computing. Indeed, in a traditional environment (e.g. on-premises data centers), it would not be possible to clone a machine on the fly and destroy the clone after performing all the required security checks.
From a practical point of view, cloning is the easiest solution to deploy. Indeed, no software (e.g. agent) has to be installed on virtual machines or within your cloud infrastructure. Every action can be performed by taking advantage of the API provided by the IaaS provider.
Cloning is also cost-effective since the cost of an additional virtual machine for a short period of time is very low within IaaS infrastructures. Moreover, as the whole analysis is performed on a different machine, cloning avoids the risks of breaking applications and losing data.
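The clone, check and destroy cycle described above can be sketched as follows. This is only an illustration under stated assumptions: FakeProvider is a hypothetical in-memory stand-in for an IaaS API (it is not a real SDK and not Elastic Detector's code), and the check function is a placeholder. The point of the sketch is the try/finally structure, which guarantees that the temporary clone is removed even if a check fails.

```python
class FakeProvider:
    """Hypothetical in-memory stand-in for an IaaS API (not a real SDK)."""

    def __init__(self):
        self.instances = {"vm-prod": "running"}
        self._counter = 0

    def clone(self, instance_id):
        # Create a copy of the instance and return the identifier of the clone.
        self._counter += 1
        clone_id = f"{instance_id}-clone-{self._counter}"
        self.instances[clone_id] = "running"
        return clone_id

    def destroy(self, instance_id):
        del self.instances[instance_id]


def assess_with_clone(provider, instance_id, checks):
    """Run every check against a temporary clone and always destroy the clone."""
    clone_id = provider.clone(instance_id)
    try:
        # The production instance is never touched; only the clone is inspected.
        return {check.__name__: check(clone_id) for check in checks}
    finally:
        provider.destroy(clone_id)


def placeholder_check(clone_id):
    # Placeholder for a deep, intrusive security check run on the clone.
    return "ok"


provider = FakeProvider()
print(assess_with_clone(provider, "vm-prod", [placeholder_check]))
print(sorted(provider.instances))
```

After the call, only the production instance remains in the provider's inventory, which mirrors the short-lived nature of the clone.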
Furthermore, new elastic and pay-per-use infrastructures bring higher percentages of stopped servers. These “dormant” servers constitute potential threats to the infrastructure, as acknowledged by the Cloud Security Alliance. While stopped, the servers are not monitored by agents or agentless solutions and they are not patched. They become weak links of your infrastructure when started. That’s why we propose to test your dormant servers and raise alerts when vulnerabilities are found.
Auto-checks should be automatically set in order to monitor your IT cloud infrastructure. This is mandatory on a continuously changing infrastructure. Therefore, while your IT infrastructure evolves to answer your business needs, the right security checks are automatically set.
Our vision is that only a fully automated approach to security can cope with the elastic nature of new cloud infrastructures and their new threats.
SIEM
Detecting attacks in distributed environments, such as cloud infrastructures, often requires the capability of analyzing logs and correlating several events from different sources. Nowadays, the best approach to threat detection in distributed environments is to employ a SIEM (Security Information and Event Management) system.
A SIEM system is much more than an analyzer of logs. A SIEM system takes care of several aspects which can be summarized as follows:
• log and context data collection;
• normalization and categorization;
• correlation;
• notification/alerting;
• prioritization;
• dashboards and visualization;
• reporting and report delivery.
A SIEM system works with different kinds of data coming from different sources within one or more networks. Starting from 2006 [6], when Cloud Computing appeared for the first time, a decentralization process has started and several companies have decided to move their own data and infrastructures to the cloud. Recent trend analyses [7] show that this tendency is not going to stop, so in the next few years the number of companies adopting cloud computing technologies will be even higher than now. Due to the remote nature of cloud computing, several new security concerns arise and companies are worried about the protection of their own data and infrastructures.
In this scenario, SIEM plays a very important role and needs to effectively operate in new cloud networks. Unfortunately, adopting SIEM systems in elastic cloud infrastructures is not an easy task. Indeed, SIEM systems were designed to operate in traditional (static) environments. Therefore, we need to adapt existing solutions in order to deal with elasticity. This means that a modern SIEM system should have the capability of automatically detecting the virtual machines running in the cloud, performing security checks on them and logging any security-relevant activity.
Nowadays, a system administrator needs to reconfigure his own SIEM system every time there is a
change (e.g. a new virtual machine starts running) in the cloud infrastructure. Configuring a SIEM
system is a very difficult task which requires the experience and skills of a security expert. Therefore, in highly dynamic infrastructures, reconfiguring the SIEM system every time there is a change in the cloud infrastructure would not be practical.
A possible alternative could be to install agents on each virtual machine running in the cloud, but this approach has several drawbacks, especially in terms of performance.
Elastic Detector, which is our flagship product, can be an easy and effective solution to make the deployment of a SIEM for the cloud much faster. Moreover, Elastic Detector could easily aggregate all the data retrieved by performing auto-checks in the cloud network and forward it to the SIEM system, where it can be properly processed together with data collected locally.
From a practical point of view, integrating Elastic Detector with any SIEM system would not require any complex integration platform. In particular, we analyzed the integration of Elastic Detector with OSSIM [4], an open-source and widely adopted system for SIEM. One of the benefits brought by OSSIM is the possibility of easily developing and integrating custom components (plugins).
Architecture
As we mentioned above, we want to take advantage of the capabilities of a SIEM system in order to correlate logs and detect a potential side-channel attack. In our experiments, we employed OSSIM [10], a well-known and open-source (with a commercial extension) SIEM system which is easily expandable with custom plugins (Figure 7).
Our strategy consists of forwarding all the security-relevant logs to OSSIM, where correlation can take place. In order to collect and forward logs, we used Elastic Detector [11], which is our flagship product. Thanks to Elastic Detector, we can smoothly and automatically detect changes (e.g. a virtual machine has been launched/stopped) in the cloud infrastructure and communicate these changes to OSSIM by forwarding NAGIOS logs. Elastic Detector employs NAGIOS for performing automatic checks on the cloud infrastructure. This way, every change concerning the status of the user’s virtual machines is automatically detected and logged. OSSIM does not provide any remote API; however, every OSSIM installation includes an RSyslog server, which can receive logs from remote machines. For this reason, the best way to forward our logs is to configure NAGIOS to send them to a remote server, namely the server on which OSSIM is running.
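To make the change-detection step more concrete, here is a minimal sketch of how two successive snapshots of the running instances could be compared. It is an assumption about the general technique, not Elastic Detector's actual implementation: the function names are hypothetical, and the message texts simply mirror the log samples shown later in this article.

```python
def diff_instances(previous, current):
    """Compare two collections of instance IDs.

    Returns (started, stopped): IDs present only in the current snapshot
    and IDs present only in the previous one, each as a sorted list.
    """
    previous, current = set(previous), set(current)
    return sorted(current - previous), sorted(previous - current)


def to_messages(started, stopped):
    """Render the changes as the message texts seen in the NAGIOS log samples."""
    return (
        [f"Found new Instance: {instance_id}" for instance_id in started]
        + [f"Instance Terminated: {instance_id}" for instance_id in stopped]
    )


# Two hypothetical snapshots taken by consecutive automatic checks.
before = {"i-7824db12", "i-0a1b2c3d"}
after = {"i-0a1b2c3d", "i-f0ad689c"}

started, stopped = diff_instances(before, after)
for message in to_messages(started, stopped):
    print(message)
```

In a real deployment the snapshots would come from the cloud provider's API and the messages would be written to the NAGIOS log that is forwarded to OSSIM.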
Detailed Description
In order to test our solution, we developed a Python script which makes use of AWS standard APIs to emulate a side-channel attack. In particular, this script emulates the first phase of a side-channel attack, which is called “Placement”. During this phase, the attacker launches a large number of virtual machines until he gets one virtual machine running on the same physical host as the victim. It is worth pointing out that this is not a simulation of the real scenario: it is exactly what an attacker would do.
We think that our approach is the best one in terms of security, since the potential attacker is stopped before performing the side-channel attack. Indeed, while the attacker creates and destroys virtual machines, logs are collected, analyzed and correlated so the attack can be detected before it takes place.
Once logs have been delivered to OSSIM, thanks to our custom plugin, we can parse and convert them into security-relevant events. After that, we are interested in correlating two kinds of events, the creation and the termination of a virtual machine.
In order to parse logs generated by Elastic Detector, we needed to define a plugin and, as part of this plugin, a regular expression which allows us to extract the specific information we need from a given log line. Below we can see a typical log line generated when a virtual machine is launched and the regular expression used to capture the event and extract the meaningful information.
Log:
Aug 19 15:51:32 debian-secludit nagios3: SERVICE NOTIFICATION: event@551;72-us-east-1;722;notify-service-by-cloutomate;Found new Instance: i-f0ad689c
Regular expression:
^(?P<date>\w{3}\s\d{1,2}\s\d\d:\d\d:\d\d)\sdebian-secludit\snagios3:\sSERVICE\sNOTIFICATION:\sevent@\d{3}\;(?P<account>\d{2,3})-(?P<region>\w{2}-\w{4,9}-\d)\;\d{3}\;notify-service-by-cloutomate\;Found\snew\sInstance:\s(?P<instanceid>i-[a-z0-9]{8})$
where 72 is the account identifier of the user who launched the virtual machine, us-east-1 is the region in which the virtual machine is running and i-f0ad689c is the virtual machine ID.
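As a quick check of the expression above, the following sketch applies it to the sample log line with Python's standard re module. The helper name parse_new_instance is hypothetical, and the character class for the instance ID is written as [a-z0-9] (an assumption that simply excludes the comma). The unnecessary backslashes before the semicolons are dropped, which does not change what the pattern matches.

```python
import re

# Pattern from the text, split over several raw strings for readability.
# Assumption: the instance ID class is [a-z0-9], i.e. without the comma.
NEW_INSTANCE_RE = re.compile(
    r"^(?P<date>\w{3}\s\d{1,2}\s\d\d:\d\d:\d\d)\sdebian-secludit\snagios3:"
    r"\sSERVICE\sNOTIFICATION:\sevent@\d{3};"
    r"(?P<account>\d{2,3})-(?P<region>\w{2}-\w{4,9}-\d);\d{3};"
    r"notify-service-by-cloutomate;"
    r"Found\snew\sInstance:\s(?P<instanceid>i-[a-z0-9]{8})$"
)

LOG_LINE = (
    "Aug 19 15:51:32 debian-secludit nagios3: SERVICE NOTIFICATION: "
    "event@551;72-us-east-1;722;notify-service-by-cloutomate;"
    "Found new Instance: i-f0ad689c"
)


def parse_new_instance(line):
    """Return the captured fields as a dict, or None if the line does not match."""
    match = NEW_INSTANCE_RE.match(line)
    return match.groupdict() if match else None


fields = parse_new_instance(LOG_LINE)
print(fields["account"], fields["region"], fields["instanceid"])
```

The three printed values are the ones the plugin later stores as user data for the correlation stage.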
Thanks to this information, we can proceed to the next stage, that is correlating these events and determining if a user (the potential attacker) is performing a side-channel attack.
In Listing 1 we can have a closer look at the code we wrote for the plugin.
Listing 1. The code of the plugin
;; Elastic Detector
;; plugin_id: 9001
;; type: detector
;;
[DEFAULT]
plugin_id=9001
[config]
type=detector
enable=yes
source=log
location=/var/log/nagios3/nagios.log
create_file=false
process=
start=no
stop=no
startup=
shutdown=
[elastic-detector-found-new-instance]
#Aug 19 15:51:32 debian-secludit nagios3: SERVICE NOTIFICATION: event@551;72-us-east-1;722;UNKNOWN;notify-service-by-cloutomate;Found new Instance: i-f0ad698c
event_type=event
regexp="^(?P<date>\w{3}\s\d{1,2}\s\d\d:\d\d:\d\d)\sdebian-secludit\snagios3:\sSERVICE\sNOTIFICATION:\sevent@\d{3}\;(?P<account>\d{2,3})-(?P<region>\w{2}-\w{4,9}-\d)\;\d{3}\;UNKNOWN\;notify-service-by-cloutomate\;Found\snew\sInstance:\s(?P<instanceid>i-[a-z0-9]{8})$"
date={normalize_date($date)}
#sensor={resolv($sensor)}
plugin_sid=1
#src_ip={$src}
userdata1={$region}
userdata2={$instanceid}
userdata3={$account}
[elastic-detector-instance-not-running]
#Aug 20 15:50:26 debian-secludit nagios3: SERVICE NOTIFICATION: event@551;72-us-east-1;722;UNKNOWN;notify-service-by-cloutomate;Instance Terminated: i-7824db12
event_type=event
regexp="^(?P<date>\w{3}\s\d{1,2}\s\d\d:\d\d:\d\d)\sdebian-secludit\snagios3:\sSERVICE\sNOTIFICATION:\sevent@\d{3}\;(?P<account>\d{2,3})-(?P<region>\w{2}-\w{4,9}-\d)\;\d{3}\;UNKNOWN\;notify-service-by-cloutomate\;Instance\sTerminated:\s(?P<instanceid>i-[a-z0-9]{8})$"
date={normalize_date($date)}
#sensor={resolv($sensor)}
plugin_sid=2
#src_ip={$src}
userdata1={$region}
userdata2={$instanceid}
userdata3={$account}
As we can see above, defining a plugin means defining a set of events and a set of regular expressions that convert a log line into an event.
Any SIEM system is based on correlation rules. In OSSIM, correlation rules define a set of conditions that, if met, can raise an alarm. In our experiment, we defined a simple and effective correlation rule, which works as follows.
When a new virtual machine is created, OSSIM starts the evaluation of the correlation rule. If in the
following 60 seconds, 9 more virtual machines are created in the same region, by the same user, then the evaluation proceeds to the next level. Finally, if within 15 minutes (approximation of the time required to create a virtual machine and perform a co-residency check) all those virtual machines are terminated, then an alarm is raised. At this stage, several countermeasures could be taken. For instance, a notification or an email could be sent to the security administrator or a program/script could be executed. As an example, a realistic reaction to such an alarm could be the temporary revocation of the user’s account.
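The logic of this rule can be sketched in a few lines of Python. This is an illustrative re-implementation under stated assumptions, not the OSSIM directive itself: the Event record and the detect_placement function are hypothetical names, times are plain seconds, and the 15-minute window is assumed to start when the tenth creation is observed.

```python
from collections import namedtuple

# Hypothetical event record; "time" is seconds since an arbitrary epoch.
Event = namedtuple("Event", ["time", "kind", "account", "region", "instance_id"])

BURST_WINDOW = 60              # seconds after the first creation
BURST_SIZE = 10                # the first creation plus 9 more
TERMINATION_WINDOW = 15 * 60   # seconds allowed for all terminations


def detect_placement(events):
    """Return one (account, region, instance_ids) tuple per raised alarm."""
    created = sorted((e for e in events if e.kind == "created"), key=lambda e: e.time)
    terminated_at = {e.instance_id: e.time for e in events if e.kind == "terminated"}
    alarms = []
    reported = set()  # instance IDs that already belong to an alarm

    for first in created:
        if first.instance_id in reported:
            continue
        # Level 1: creations by the same user, in the same region, within the window.
        burst = [
            e for e in created
            if e.account == first.account
            and e.region == first.region
            and first.time <= e.time <= first.time + BURST_WINDOW
            and e.instance_id not in reported
        ][:BURST_SIZE]
        if len(burst) < BURST_SIZE:
            continue
        # Level 2: every instance of the burst is terminated before the deadline.
        deadline = burst[-1].time + TERMINATION_WINDOW
        ids = [e.instance_id for e in burst]
        if all(terminated_at.get(i, deadline + 1) <= deadline for i in ids):
            alarms.append((first.account, first.region, ids))
            reported.update(ids)
    return alarms


# Ten launches within 45 seconds, all terminated about five minutes later.
launches = [Event(i * 5, "created", "72", "us-east-1", f"i-{i:08x}") for i in range(10)]
kills = [Event(300 + i, "terminated", "72", "us-east-1", f"i-{i:08x}") for i in range(10)]
print(len(detect_placement(launches + kills)), "alarm(s) raised")
```

The thresholds are kept as named constants because they are the natural tuning knobs: a provider would likely adjust them to the typical launch patterns of its legitimate users.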
In our experiments, we successfully managed to detect several potential side-channel attacks during the placement phase. Potential attacks are detected as soon as virtual machines are terminated so that the proper countermeasures can be taken before the attacker can try to steal any information.
A comprehensive DEMO of the above-mentioned solution is available at http://youtu.be/3NacQOksyJo.
Discussion
Our proposed solution does not require any change in the way local information is collected. The main advantage of our approach is that the system administrator no longer needs to re-configure the SIEM system every time there is a change in the public cloud infrastructure. Indeed, Elastic Detector automatically detects changes in the cloud infrastructure and its configuration. This way, event information and logs can be collected according to the current configuration and settings. The only thing the administrator has to do is to provide his API credentials in order to enable Elastic Detector to connect to the cloud.
When a relevant event occurs or a log file needs to be forwarded to the SIEM system, Elastic Detector can communicate with it and deliver the required information in a way that is transparent to the user. Once the SIEM system has received this information from Elastic Detector, it can process the events that occurred and perform the typical SIEM actions: normalization, correlation, notification and alerting, prioritization, visualization and reporting.
Furthermore, our solution does not even need the collaboration of the hypervisor, so it is fully compatible with any existing platform. This is the main reason why we decided to detect side-channel attacks during the placement phase. Indeed, detecting a side-channel attack during the following phases would require the collaboration of other components of the infrastructure such as routers (co-residency check) and hypervisors (side-channel attack), resulting in a more complex integration.
The main benefits of our strategy are:
• Full Automation. Keeping operating costs under control means being able to automate the security
management by eliminating the majority of manual setup, security monitoring, and corrective actions.
• Agentless. The performance footprint of agents on servers and potential conflicts with applications are sources of problems. Moreover, agents are OS dependent and have vulnerabilities as well. Through the virtualization layer, and using APIs such as VMware vShield or Amazon EC2 security groups, security solutions can analyze resource information and enforce security with no agents.
• Comprehensive Security Assessment. The traditional layered approach, where each security component takes care of a specific layer such as the network, is not enough. For this reason, there is a need for tools that tackle the new security challenges brought by IaaS, such as multi-tenancy and side-channel attacks.
• No Lock-in. In some scenarios, it is important to have the ability to use different IaaS offerings for
reliability and flexibility. However, this should not compromise the effectiveness of security tools and the ability to have a full visibility of the security of your infrastructure.
Conclusion
In this article we have presented a new class of attacks, the so-called side-channel attacks, and a prototype for protecting cloud users from these attacks. The main reason why we decided to focus on this use case is that, to our knowledge, there is currently no solution to detect side-channel attacks in the cloud environment.
Deploying our solution would be extremely easy and safe for a cloud provider since it should just allow Elastic Detector to access a very small subset of events (virtual machine creation/termination). Also, this solution would be completely transparent to the user.
Furthermore, we also provided a solid architecture for adapting existing SIEM systems to cloud infrastructures. In this particular case we have presented the integration of our main product, Elastic Detector, and OSSIM, an open-source and widely adopted solution for SIEM.
Cloud infrastructures have the advantage of being highly elastic. Unfortunately, their elasticity also brings several security concerns. Among these concerns, from a SIEM point of view, there is the necessity for the system administrator to constantly monitor the public cloud infrastructure in order to detect changes and re-configure the SIEM system accordingly. In most cases, this approach is infeasible since configuring a SIEM system can be very complex and take a long time.
We strongly believe that our proposed solution can help system administrators and IT professionals to easily and safely manage their infrastructures without having the hassle of re-configuring their SIEM every time there is a change in the public cloud.
Elastic Detector is an optimal and smart solution to overcome this serious issue by delegating the
monitoring activity of remote virtual machines to an external, automatic and elastic tool. In addition,
deploying Elastic Detector does not require any change in the infrastructure and no software agent has to be installed on virtual machines.