Many of the network attacks gaining attention today are advanced persistent threats (APTs) that aim to maintain access for long-term data exfiltration. Cloud infrastructure provides a new avenue of defense against these persistent attacks. The ability to plan the destruction and re-creation of virtual machines in the cloud can dramatically reduce the time an attacker has in your infrastructure. Netflix and other companies have been pioneering this practice for several years. Organizations migrating to the cloud should consider ways to use the unique benefits of virtual machine management to reduce their risk from APT attacks.
Advanced Persistent Threats Are Not Going Away
Attacks against the Office of Personnel Management (OPM), Target, Saudi Aramco, Sony, and others are proving that APTs are becoming the new normal for nation-state and organized crime cyber-attacks. For purposes of this discussion, we’ll define an advanced persistent threat as “a network attack in which an unauthorized person [or group] gains access to a network and stays there undetected for a long period of time.”[1] APTs can be difficult to detect, often requiring deep investigation to recognize all of the systems and files that the APT has compromised. An APT’s goal is to maintain persistence on the network. To that end, APTs can implement many back-door communication channels on different systems in the network, making these attacks difficult to remediate. As a result, organizations must make a near-term decision between eliminating the threat immediately and allowing an attacker to remain active in the network so defenders can determine the full extent of the compromise. Regardless of the near-term decision, recovery from an APT attack typically requires complete rebuilds of compromised systems after the details of the attack are understood and the attackers are purged from the network.[2]
The Cloud Is Changing More Than Just Finances
Most public and private organizations move to the cloud for rapid scalability (think the Internal Revenue Service during tax season) and to reduce management costs. For applications that are optimized to take advantage of cloud capabilities, cloud migration can create significant savings. Cloud computing also allows for quick, automated creation and destruction of virtual machines (VMs). This capability provides interesting possibilities for cybersecurity defense in the cloud.
Take a Page from Wildfire Management—Use the Cloud to Reduce Risk
Every year, many homes and countless acres of wildlife habitat across the western United States are devastated by wildfires. These fires are difficult to predict (many are started by lightning strikes or careless campers) and hard to control. To limit the damage, federal, state, and local groups spend part of the off-season conducting “prescribed burns.”[3] These burns create natural fire breaks and destroy dead trees and brush, reducing the amount of fuel available to a fire. This advance work helps limit a wildfire’s spread during the dry season.
In a similar way, regularly culling your cloud VMs removes the footholds intruders rely on to remain persistent in the environment (the dead wood) and hampers their ability to spread malware to other VMs (the fire break). If idle hands are the devil’s workshop, then idle, persistent VMs in our cloud environments are a hacker’s dream. Frustrating attackers should always be a priority.
Stop Treating Your Servers (VMs) Like Children!
There was a time when we bought a rack-mounted server and used it for four or more years to run everything from e-mail to FTP. With the power and cost efficiencies of the cloud, these servers are now virtual machines that can be spun up or destroyed in minutes. As a result, we need to stop treating our servers as though they were precious, unique, and irreplaceable children. In the new world of the cloud, VMs are like trees. Just as wildland fire workers cull some of the trees to reduce the risk of wildfires, culling VMs in the cloud can prevent attackers from remaining persistently in the network or moving effectively within your environment. In many cases, APT attacks involve downloading software to a penetrated system to establish persistence in the network. Random or scheduled VM culling shortens the lifespan of infected VMs (often to less than 12 hours), frustrating attackers’ attempts at persistence within the network. As long as your cloud-based system is designed to load balance automatically and operate in a modular way without hard-coded server connections, applications will remain responsive even during constant attack. Companies like Netflix and Facebook have already proven that this type of cloud architecture is both possible and amazingly scalable.
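The scheduled culling described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `VM` record, the `select_for_cull` function, and the `min_in_service` floor are hypothetical names invented for this example, and no real cloud provider API is called. The 12-hour limit comes from the lifespan mentioned above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class VM:
    """Hypothetical record for one virtual machine in a load-balanced pool."""
    vm_id: str
    launched_at: datetime


def select_for_cull(pool, now, max_age=timedelta(hours=12), min_in_service=2):
    """Return the VMs to destroy this cycle, oldest first.

    A VM qualifies once it has been running longer than max_age. The
    number culled in one cycle is capped so that at least min_in_service
    VMs keep serving traffic while replacements are spun up.
    """
    expired = sorted(
        (vm for vm in pool if now - vm.launched_at > max_age),
        key=lambda vm: vm.launched_at,
    )
    budget = max(len(pool) - min_in_service, 0)
    return expired[:budget]


# Illustrative pool: two VMs are past the 12-hour limit, one is fresh.
now = datetime(2016, 1, 1, 12, 0, tzinfo=timezone.utc)
pool = [
    VM("web-a", now - timedelta(hours=20)),
    VM("web-b", now - timedelta(hours=13)),
    VM("web-c", now - timedelta(hours=3)),
]
doomed = select_for_cull(pool, now)
print([vm.vm_id for vm in doomed])  # prints ['web-a']
```

Only the oldest VM is culled in this cycle because the pool would otherwise drop below two serving instances; `web-b` would be picked up on the next pass once a replacement is in service.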
Designing for Constant Destruction Isn’t New…
The concept of “designing for constant destruction” isn’t really new; it’s an extension of the recovery-oriented computing[4] concepts that many academic institutions and private-sector companies[5] have been espousing for years. Netflix was one of the earlier adopters of these ideas in the cloud. In 2011, Netflix began experimenting with cloud applications that were inherently fault-tolerant in the face of constant attack or random failure. They named the test tool the “Chaos Monkey” because it was designed to create random component failures throughout the application and network. Netflix’s follow-up work in this space produced the “Simian Army,” a set of tools that automate the creation of ongoing, random failures, from added latency to the loss of an entire Amazon availability zone.[6]
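The idea of injecting random component failures can be illustrated with a short sketch. It is not Netflix’s implementation; the `pick_victims` function, the group names, and the rule that single-instance groups are skipped are all assumptions made for this example, and the actual termination call to a cloud provider is left out.

```python
import random


def pick_victims(groups, probability, rng):
    """Pick at most one instance per group to terminate.

    groups maps a group name to a list of instance IDs. Each group is hit
    with the given probability; groups with fewer than two instances are
    skipped so the experiment never removes the last copy of a service.
    """
    victims = {}
    for name, instances in sorted(groups.items()):
        if len(instances) < 2:
            continue
        if rng.random() < probability:
            victims[name] = rng.choice(instances)
    return victims


# Hypothetical instance groups behind a load balancer.
groups = {
    "api": ["api-1", "api-2", "api-3"],
    "web": ["web-1", "web-2"],
    "batch": ["batch-1"],
}
rng = random.Random(42)  # seeded only so the sketch is repeatable
victims = pick_victims(groups, probability=1.0, rng=rng)
print(victims)
```

With a probability of 1.0 every eligible group loses one instance per run; a production schedule would presumably use a much lower value and confine runs to hours when engineers are available to respond.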
Cloud Can Make It Better but Nothing Is Perfect…
The cloud provides many significant advantages over previous system and network architectures, but it is not without its weaknesses. Constant destruction can make it more difficult for an attacker to stay in your network, but it also makes it critical to protect your VM master images. Attackers know that if they compromise your master images, their malicious content will automatically propagate into every new VM that is spun up. This creates network persistence by using your own design against you.
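One way to guard a master image is to compare its cryptographic digest against a known-good value before any VM is built from it. The sketch below assumes the image is available as a local file and that the known-good SHA-256 digest is stored somewhere the attacker cannot modify; the function names and the file name `master.img` are hypothetical.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_is_trusted(path, expected_hex):
    """Return True only if the image digest matches the known-good value."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())


# Demonstration with a stand-in file rather than a real VM image.
with tempfile.TemporaryDirectory() as workdir:
    image_path = os.path.join(workdir, "master.img")
    with open(image_path, "wb") as handle:
        handle.write(b"golden image contents")
    known_good = hashlib.sha256(b"golden image contents").hexdigest()
    untouched_ok = image_is_trusted(image_path, known_good)

    # Simulate an attacker appending an implant to the image.
    with open(image_path, "ab") as handle:
        handle.write(b"implant")
    tampered_ok = image_is_trusted(image_path, known_good)

print(untouched_ok, tampered_ok)  # prints True False
```

A digest check of this kind only helps if the expected value is kept outside the environment being protected; an attacker who can rewrite both the image and its recorded digest defeats it.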
Authentication is still an effective attack vector in the cloud. While deleting and rebuilding VMs can be quickly and effectively managed in the cloud, the same can’t be said of user and administrative accounts. If legitimate accounts are compromised or their privileges escalated, attackers using those accounts can steal sensitive data or take other malicious actions in your cloud environment.
These weaknesses aren’t showstoppers, but they must be considered and managed to effectively reduce risk in a cloud environment. As Elon Musk said, “It’s OK to have your eggs in one basket as long as you control what happens to that basket.”[7]
That’s Nice… but I’m Not Netflix…
Cloud technology can make the creation of fault-tolerant application designs more achievable than ever before, but what if your business isn’t like Netflix? Most federal agencies are initially moving to Infrastructure as a Service (IaaS) cloud infrastructures in an effort to reduce their infrastructure operations costs. This is consistent with federal goals like OMB’s “cloud first” directive and datacenter consolidation mandates, but most agencies have yet to redesign or recode their applications to take advantage of the full capabilities of the cloud. Deciding where redesign makes sense is a critical part of the analysis, since not all systems can or should be redesigned or recoded to take full advantage of cloud capabilities.
Organizations should first look to design new systems to take advantage of cloud capabilities (including constant destruction/interruption cycles) wherever possible and where cloud capabilities can provide significant mission/business advantages. Once new systems have been designed to take advantage of these new capabilities in the cloud, planning and budgeting for the redesign of existing applications should begin in earnest. Prioritization for redesign/recoding activities should be given to systems with the greatest mission impact.
Systems that are not readily migrated should be considered for process or design modifications that let them use whatever cloud capabilities they can. In some cases this might mean creating standardized VM configurations so that system components can be replaced with clean VMs during periods of low system usage (think replacing web servers or application servers on some recurring basis). In other cases it may mean redesigning portions of the system to improve redundancy and load balancing so the system can handle different types of failures or recover more quickly when failures do occur.
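For a system that cannot tolerate random culling, the recurring replacement described above might be planned as follows. This is a rough sketch under stated assumptions: the hourly request counts are invented, and `quietest_hour`, `replacement_batches`, and `max_unavailable` are hypothetical names chosen for this example.

```python
def quietest_hour(requests_by_hour):
    """Return the hour with the fewest requests; ties go to the earliest hour."""
    return min(sorted(requests_by_hour), key=lambda hour: requests_by_hour[hour])


def replacement_batches(vm_ids, max_unavailable):
    """Split the VMs into batches so only max_unavailable are offline at once."""
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1")
    return [
        vm_ids[start:start + max_unavailable]
        for start in range(0, len(vm_ids), max_unavailable)
    ]


# Invented traffic profile: hour of day -> request count.
requests_by_hour = {0: 120, 3: 40, 4: 40, 12: 900}
web_servers = ["web-1", "web-2", "web-3", "web-4", "web-5"]

window = quietest_hour(requests_by_hour)
batches = replacement_batches(web_servers, max_unavailable=2)
print(window)   # prints 3
print(batches)  # prints [['web-1', 'web-2'], ['web-3', 'web-4'], ['web-5']]
```

Each batch would be replaced with clean VMs and confirmed healthy before the next batch starts, so most of the tier keeps serving traffic throughout the maintenance window.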
Finally, there will always be a need for contingency planning exercises for systems, whether they are in the cloud or not. Murphy’s Law[8] will always apply, but with the capabilities the cloud now provides, we can make our attackers’ lives more difficult and our systems more fault-tolerant.
1. See http://searchsecurity.techtarget.com/definition/advanced-persistent-threat-APT
2. See http://www.infoworld.com/article/2624454/intrusion-detection/protect-your-network-security–how-to-get-rid-of-advanced-persistent-threats.html?page=2
3. See https://www.nps.gov/fire/wildland-fire/what-we-do/wildfires-prescribed-fires-and-fuels.cfm
4. See http://roc.cs.berkeley.edu/
5. See https://www.mitre.org/capabilities/cybersecurity/resiliency
6. See http://techblog.netflix.com/2011/07/netflix-simian-army.html
7. See http://www.inc.com/larry-kim/50-innovation-amp;-success-quotes-from-spacex-founder-elon-musk.html
8. See http://www.murphys-laws.com/murphy/murphy-true.html