Cloud IAM Lessons From The Capital One Breach

Cloud infrastructure is the foundation of more companies than ever. As with any foundation, a single crack can cause significant damage to everything built on it. One such crack is a trusted identity with unnecessary or excessive privileges.

A “trusted identity” is usually associated with people: employees, contractors or other insiders. But identity in the cloud is no longer just about humans. The proliferation of modern infrastructure, driven by accelerating automation and innovation, has led to a rapid rise in machine identities such as service accounts, bots, API keys, servers, applications and cloud resources.

In the case of Capital One, the identity that ultimately played a key role in the breach was a machine identity in the form of an EC2 instance. Because the EC2 instance had an over-provisioned identity and access management (IAM) role attached to it (as many do), once its credentials were compromised, so was every privilege assigned to that role.

That’s why it’s so important to implement least privilege policies for all identities, but especially for machine identities. Humans adapt their behavior to changing circumstances, but machine identities are designed to behave predictably, so any deviation in their behavior could indicate misuse of privileged credentials.

Hard Lessons from Capital One

As laid out in an FBI indictment, the Capital One hacker accessed an EC2 instance (the identity) via a misconfigured firewall and gained the ability to assume a role on the machine. That role had the privileges to enumerate and download over 100 million customer records. Without those privileges, the damage would have been marginal.

How did the EC2 instance become over-provisioned? Why did the assumed role have so many high-risk privileges? It is pretty clear that Capital One’s authorization model failed. The reason for this is twofold:

1. The Fear of Under-Provisioning

Roles are typically created with a broad set of privileges inferred from a job description or function within the organization. Because it is almost impossible to predict which privileges an identity will actually need, most enterprises err on the side of over-provisioning for fear of hurting productivity. The problem is compounded when organizations change but seldom update their roles: the temptation is always to add a little more to an existing role rather than redesign it. That is how we end up with grossly over-provisioned machine and human identities.

2. One to Many

When one role is assigned to many identities (e.g. multiple EC2 instances), it almost always implies over-provisioning. Each identity needs only a few privileges to perform its day-to-day job, but every time a new identity is assigned the role, new privileges are added to accommodate its function, and every other identity holding the role inherits them. Because roles are seldom monitored or reviewed, privileges are rarely removed. Without continuous oversight, a role becomes massively over-provisioned over time.

The delta between the privileges that identities need to successfully perform their day-to-day jobs and the privileges they are granted is what we refer to as an avoidable risk.
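As a rough illustration, that delta can be treated as a set difference between what a role grants and what its identity actually exercises. The sketch below is only that: the action names follow AWS's naming, but the granted and used sets are invented for the example, and avoidable_risk is a hypothetical helper, not part of any AWS tooling.

```python
# Invented example data: which actions a role grants, and which ones the
# identity was actually observed using.
granted = {
    "s3:GetObject",
    "s3:PutObject",
    "s3:ListBucket",
    "s3:ListAllMyBuckets",
    "s3:DeleteObject",
}
used = {"s3:GetObject", "s3:PutObject"}


def avoidable_risk(granted, used):
    """Return the unused privileges and their share of everything granted."""
    unused = sorted(granted - used)
    share = len(unused) / len(granted) if granted else 0.0
    return unused, share


unused, share = avoidable_risk(granted, used)
print(f"{len(unused)} of {len(granted)} granted privileges are unused ({share:.0%})")
for action in unused:
    print("  candidate for removal:", action)
```

In practice the "used" set would come from activity logs rather than a hand-written literal, and the unused privileges would be reviewed before removal rather than dropped automatically.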

This is not just a Capital One problem. Companies around the world are vulnerable to the same threats, but we’ll use this example to illustrate how companies are exposing their cloud infrastructure to excessive risk from insider threats.

So what can we learn from Capital One and similar incidents?

1. Every machine identity in your environment needs to be carefully examined to assess its potential risk to your organization. How many high-risk privileges are assigned to the identity? Which privileges has it used over the last 90 days? What resources is it accessing? Has it performed an unusual action on a new resource?

It’s also not enough to assess once and move on; the assessment must be continuous. If at any time there is evidence that a machine identity is over-provisioned or is displaying unusual behavior, take immediate action to prune or right-size its privileges. Continuously implementing and enforcing the principle of least privilege across your cloud environment is the best way to keep it safe.

2. When designing roles, don’t make assumptions about what an identity might need to perform its day-to-day job. Look for actual data, such as the identity’s activity history, to support your decisions. If an identity has not used a privilege for 90 days, there is a good chance it doesn’t need it. In the Capital One case, two of the commands that led to the exfiltration of data had never been used by the machine identity (the EC2 instance). A data-driven approach gives security and IT teams a far better picture of how to set up and maintain least privilege policies.

3. When creating a new role for an existing identity, provision the set of privileges based on past usage. For a new identity, start either with the privileges a similar identity actually uses or with a minimal set, and adjust gradually. Review and right-size the privileges continuously.
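The three lessons above can be sketched as one small review loop: summarize what an identity holds and does, flag privileges that have sat idle for 90 days, and derive a replacement policy from observed usage. This is a minimal sketch under stated assumptions, not a production tool. The Event record, the HIGH_RISK list, the bucket name and the one-day "recent" window are all hypothetical; only the policy layout (Version, Statement, Effect, Action, Resource) follows the standard IAM JSON policy format.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed classification; each organization would maintain its own list.
HIGH_RISK = {"s3:GetObject", "s3:ListAllMyBuckets", "s3:DeleteObject"}


@dataclass(frozen=True)
class Event:
    """One logged action by a machine identity (hypothetical log record)."""
    action: str
    resource: str
    when: datetime


def assess(granted, events, now, window_days=90, recent_days=1):
    """Lesson 1: summarize grants and activity, and flag novel behavior."""
    window_start = now - timedelta(days=window_days)
    recent_start = now - timedelta(days=recent_days)
    in_window = [e for e in events if e.when >= window_start]
    baseline = {(e.action, e.resource) for e in in_window if e.when < recent_start}
    recent = {(e.action, e.resource) for e in in_window if e.when >= recent_start}
    return {
        "high_risk_granted": sorted(granted & HIGH_RISK),
        "used_in_window": sorted({e.action for e in in_window}),
        "novel_activity": sorted(recent - baseline),
    }


def stale_privileges(granted, events, now, max_idle_days=90):
    """Lesson 2: privileges never used, or not used within max_idle_days."""
    cutoff = now - timedelta(days=max_idle_days)
    used_recently = {e.action for e in events if e.when >= cutoff}
    return sorted(granted - used_recently)


def policy_from_usage(events, now, window_days=90):
    """Lesson 3: allow only the (action, resource) pairs seen in the window."""
    cutoff = now - timedelta(days=window_days)
    actions_by_resource = defaultdict(set)
    for e in events:
        if e.when >= cutoff:
            actions_by_resource[e.resource].add(e.action)
    statements = [
        {"Effect": "Allow", "Action": sorted(actions_by_resource[r]), "Resource": r}
        for r in sorted(actions_by_resource)
    ]
    return {"Version": "2012-10-17", "Statement": statements}


now = datetime(2024, 1, 31)  # arbitrary reference date for the example
logs = "arn:aws:s3:::example-app-logs"  # hypothetical bucket
granted = {"s3:GetObject", "s3:PutObject", "s3:ListBucket", "s3:ListAllMyBuckets"}
events = [
    Event("s3:PutObject", logs + "/*", now - timedelta(days=30)),
    Event("s3:PutObject", logs + "/*", now - timedelta(days=2)),
    Event("s3:ListBucket", logs, now - timedelta(days=10)),
    Event("s3:GetObject", logs + "/*", now - timedelta(days=200)),
]

report = assess(granted, events, now)
stale = stale_privileges(granted, events, now)
policy = policy_from_usage(events, now)

# A first-ever bucket enumeration a few hours ago shows up as novel activity.
suspicious = Event("s3:ListAllMyBuckets", "*", now - timedelta(hours=3))
alert = assess(granted, events + [suspicious], now)["novel_activity"]
```

One design point worth noting: the usage-derived policy is built from the baseline events only. Activity that has just been flagged as novel should be investigated first, otherwise an attacker's actions would be folded into the "right-sized" role.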

For example, with this approach, the EC2 instance would not have had unfettered access to the S3 buckets, which would have prevented the hacker from downloading so many files at once. Even if she had been able to download a few files, the magnitude of the incident would have been significantly smaller. To contain privilege creep with roles, we need to either break roles down to a minimal common set of privileges or create a unique role for each identity based on past usage.
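Breaking a shared role down to a minimal common set can be sketched as follows: the common core is the intersection of what every identity actually uses, and whatever remains becomes a small per-identity role. The instance names and usage sets are invented for illustration, and split_role is a hypothetical helper.

```python
def split_role(usage_by_identity):
    """Split one shared role into a common core plus per-identity extras."""
    usage_sets = [set(actions) for actions in usage_by_identity.values()]
    common = set.intersection(*usage_sets) if usage_sets else set()
    extras = {
        name: sorted(set(actions) - common)
        for name, actions in usage_by_identity.items()
    }
    return sorted(common), extras


# Invented usage data for three instances that currently share one role.
usage_by_identity = {
    "web-1": {"s3:GetObject", "logs:PutLogEvents"},
    "web-2": {"s3:GetObject", "logs:PutLogEvents", "sqs:SendMessage"},
    "batch-1": {"s3:GetObject", "s3:PutObject", "logs:PutLogEvents"},
}

common, extras = split_role(usage_by_identity)
print("common role:", common)
for name, actions in extras.items():
    print(f"{name} additionally needs:", actions)
```

With this split, adding a new identity no longer widens the privileges of every other identity: its extra needs stay in its own role, and the common role only shrinks or stays the same.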

Following these best practices will help improve your cloud security posture and significantly limit the blast radius of a breach such as the one Capital One experienced. Prevention is paramount, especially with cloud infrastructure, and it all starts with properly managing identity activity.