Adding Guardrails To A Cloud Account After The Fact

May 11, 2022 TH Author

The Challenge

With a brand-new account, your initial configuration sets the tone. With existing accounts, the challenge is twofold.

The first is that the team working with that account is already used to operating under the existing configuration. They’ve been doing it this way for a while and things are working, so there’s little motivation to change.

The second challenge is on the technical side. Can these guardrails be implemented without breaking anything inside the active account? What level of testing will be required? How much work is involved overall?

Boiling it down, this is a security feature request that needs to be prioritized. How can we approach this challenge?

Getting The Team Onboard

Everyone wants their systems to be more secure. But security is just one of the pillars of building well in the cloud. When faced with deploying a new feature that directly helps customers or deploying security guardrails that may help in the future, it’s hard to argue against the customer.

That’s completely understandable and one of the key reasons the centralized security monitoring structure is so hard to put in place in an environment that is already working.

The story usually proceeds like this:

Security determines they need visibility into every account now
Security decrees from up high that this work must be done immediately for “compliance” reasons
A few teams comply grumpily, others dig their heels in and slow down the work

No one likes being told they must drop their work and do something different that doesn’t directly advance their goals. This is squarely on the security team’s shoulders. They need to adjust their approach.

Until they do, let’s look at this from your team’s point of view. How can centralized security monitoring and audit help you meet your goals?

As much as auditing sounds scary, it’s really just having someone double check your work. If you’re able to get feedback (preferably automated) that your workloads are configured in a strong manner, isn’t that a positive thing?

Similarly, while centralized monitoring always has challenges with context, having another team looking for security issues can add a layer of assurance that your team hasn’t missed anything.

Additionally, centralized monitoring can have added benefits like spotting larger patterns that aren’t visible with only one account’s data.

Evidently, there are positives for your team. They just aren’t as direct or impactful as you may want…which is fine as long as the cost or effort to implement isn’t too high.

That leads to the technical implementation of these guardrails and the associated risks.

Digging Up Roots

The first step:

The root account is locked down, using multi-factor authentication, and not used for anything but the initial configuration of the account (AWS, Microsoft Azure, Google Cloud Platform™)

This is probably the trickiest step to back away from. If you’ve used the root account to create resources or run workloads in your account, you may have to re-launch them with a less privileged account or re-assign ownership.

The good news? Most cloud resources don’t have ownership assigned to a user but to the account. That means any account with sufficient permissions should be able to maintain or remove those resources.

Backing away from root ownership is more an exercise in reducing permissions than in changing ownership. There is still potential for downtime here, but the risk of those elevated privileges usually justifies treating this work as a high priority.

The one area that might be a “gotcha” is if someone is using the root account credentials on their workstation or has them embedded somewhere else like a deployment server.

Use the API call audit tool available in each of the big three clouds to find that access if it does exist.
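As a rough illustration of that search on the AWS side, here is a minimal sketch that filters exported CloudTrail-style records for root activity. It assumes the events have already been exported and parsed into Python dictionaries; the sample records are made up for the example, and Azure and Google use different field names in their own logs.

```python
from typing import Iterable


def find_root_activity(events: Iterable[dict]) -> list:
    """Summarize every event that was made with root credentials.

    CloudTrail marks root usage with userIdentity.type == "Root".
    """
    hits = []
    for event in events:
        identity = event.get("userIdentity", {})
        if identity.get("type") != "Root":
            continue
        hits.append({
            "time": event.get("eventTime"),
            "action": event.get("eventName"),
            # The source IP and user agent hint at where the credentials
            # live: a workstation, a deployment server, and so on.
            "source_ip": event.get("sourceIPAddress"),
            "user_agent": event.get("userAgent"),
        })
    return hits


# Hypothetical sample data, not taken from a real account.
sample_events = [
    {
        "eventTime": "2022-05-01T10:00:00Z",
        "eventName": "DescribeInstances",
        "sourceIPAddress": "198.51.100.7",
        "userAgent": "aws-cli",
        "userIdentity": {"type": "IAMUser", "userName": "deploy"},
    },
    {
        "eventTime": "2022-05-02T08:30:00Z",
        "eventName": "CreateAccessKey",
        "sourceIPAddress": "203.0.113.10",
        "userAgent": "console.amazonaws.com",
        "userIdentity": {"type": "Root"},
    },
]

root_hits = find_root_activity(sample_events)
for hit in root_hits:
    print(hit["time"], hit["action"], hit["source_ip"])
```

Any hit from an unexpected IP address or a non-console user agent is a good place to start looking for embedded credentials.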

Estimated time to resolve? An hour.

Level of effort? High due to log searches required and possible permission changes.

Return on investment? Very high. Root accounts are the keys to the kingdom and should be protected at all costs.

API Call Auditing

Of course, in order to check the API call logs, those logs have to be enabled.

The good news is that for most accounts, those logs have been enabled by default since the account was created. That’s true for Azure, Google, and AWS.

But each of the clouds does have an exception (or three) that might apply here. There was a time when API calls were either not logged by default or used a different system.

With Azure, “Classic” resources may or may not log to the activity log. For Google, some services use the activity logs and not the newer audit logs. In AWS, older accounts simply didn’t have AWS CloudTrail enabled and weren’t logging those calls in any form.

For older accounts, taking a few minutes to enable this logging is a smart move.

The configuration is minimal and essentially boils down to providing a place to store the logs. This should not impact any production resources or result in any downtime.

The only downside is the possible costs associated with storing the logs. Though, again, all of the clouds have ways to easily reduce that cost over time.
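In AWS, one common way to do that is a storage lifecycle rule on the bucket holding the logs. The sketch below only builds the rule as a dictionary in the shape that boto3's put_bucket_lifecycle_configuration call accepts. The function name and the 90-day and 365-day values are assumptions for illustration, not recommendations, and your retention requirements may differ.

```python
def build_log_lifecycle(archive_after_days: int = 90,
                        expire_after_days: int = 365,
                        prefix: str = "AWSLogs/") -> dict:
    """Build an S3 lifecycle configuration for a log bucket.

    Logs move to colder storage after `archive_after_days` and are
    deleted after `expire_after_days`. Both values are assumptions.
    """
    if expire_after_days <= archive_after_days:
        raise ValueError("logs must be archived before they expire")
    return {
        "Rules": [
            {
                "ID": "archive-then-expire-api-logs",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": archive_after_days, "StorageClass": "GLACIER"}
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


lifecycle = build_log_lifecycle()
print(lifecycle["Rules"][0]["Transitions"])
```

The resulting dictionary would be passed as the LifecycleConfiguration argument. Azure and Google offer comparable lifecycle features with their own configuration formats.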

Estimated time to resolve? Five minutes.

Level of effort? Minimal. These features are probably already on.

Return on investment? High. These logs are a fantastic source of troubleshooting information for any operational issue (including security).

You Spent What?

Billing alerts are something that should be enabled on all cloud accounts by default. The CSPs won’t enable them by default because what I am willing to spend on the account hosting my personal website is significantly different from what I’m willing to spend on my workload supporting paying customers.

That means it’s up to you to set up billing alerts that match your risk tolerance.

Again, the good news here is that this is a non-breaking change. These alerts don’t stop resources in your accounts. They simply highlight spending that might be higher than you expect.

Ask any team out there: it’s always better to get a notification early in the month that something is off than a bill that is thousands and thousands of dollars higher than you expect.

A simple billing alert can help avoid that disaster and alert you to any suspiciously high charges due to an attack like crypto mining. There’s no reason not to apply these to your account immediately. It’s a few minutes that could save you thousands.
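The logic behind an early warning is simple enough to sketch. The example below is a hypothetical illustration, not any provider's alerting API: it projects month-end spend from the month-to-date total using a straight-line estimate and compares it to a budget you choose. The function names and status strings are made up for this example.

```python
import calendar
from datetime import date
from typing import Optional


def projected_month_end_spend(spend_to_date: float, today: date) -> float:
    """Straight-line projection: average daily spend times days in the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / today.day * days_in_month


def billing_status(spend_to_date: float, monthly_budget: float,
                   today: date) -> Optional[str]:
    """Return an alert status, or None when spending looks normal."""
    if spend_to_date >= monthly_budget:
        return "over_budget"
    if projected_month_end_spend(spend_to_date, today) >= monthly_budget:
        # Still under budget today, but on pace to exceed it.
        return "projected_over"
    return None


# $220 spent by May 11 projects to $620 for the month, above a $500 budget.
status = billing_status(220.0, 500.0, date(2022, 5, 11))
print(status)
```

The managed budget and alerting features in each cloud do this work for you. The point of the sketch is that the projected alert fires well before the actual overage shows up on a bill.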

Estimated time to resolve? Ten minutes.

Level of effort? Moderate. You have to decide not only where to send the alerts but what to do if you receive one.

Return on investment? High. It doesn’t take a lot of searching to find horror stories of very large and very unexpected cloud spending bills.

Centralized Visibility

This is the step that typically meets with the most pushback. The truly interesting part of that is the reason for the pushback. This step is usually fought against because of the idea of someone looking over your shoulder.

The technical side of this step is relatively simple. The centralized accounts need to already be set up and then be given a role in your accounts that has read-only access.

This means there won’t be any production impact and this setup should be completely automated. The centralized teams should be able to provide a cloud-specific script that sets up the needed permissions.
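To give a sense of what such a script sets up in AWS, here is a minimal sketch that builds the trust policy for a cross-account role. The account ID is a placeholder and the function name is hypothetical. The script your centralized team actually provides may look quite different, and Azure and Google have their own equivalents.

```python
import json


def build_central_trust_policy(central_account_id: str) -> dict:
    """Trust policy letting the central security account assume a role here.

    This only controls who may assume the role. What the role can do is
    decided separately by the (read-only) permissions attached to it.
    """
    if not (central_account_id.isdigit() and len(central_account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{central_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Placeholder account ID for illustration only.
trust_policy = build_central_trust_policy("111122223333")
print(json.dumps(trust_policy, indent=2))
```

The role created with this trust policy would then be given read-only permissions, for example through one of the read-only managed policies AWS provides.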

The true issue here is the relationship between your team and the centralized services. This can be tricky waters.

Estimated time to resolve? Five minutes.

Level of effort? Minimal. This should be completely scripted and have zero production impact.

Return on investment? Low for your team. High for the overall organization. The idea behind centralized security and audit accounts is to get a handle on the overall risk the organization faces. This is one you take for the team.

Organizing Access Control Permissions

Despite the high level of pushback in the previous step, this recommendation is by far the hardest to pull off.

For some reason, permissions almost always gradually drift towards “administrator” levels.

It’s often little changes here and there over time, and before you know it, a resource needlessly has full administrator access to your cloud account. That’s why you need to regularly review and maintain the permissions in your cloud account.
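Part of that review can be automated. As a minimal sketch for AWS-style policy documents, the check below flags any policy with a statement that allows every action on every resource. It is deliberately narrow: it assumes the policy has already been loaded as a dictionary, and it ignores conditions, NotAction, and service-wide wildcards such as "iam:*", which can be just as dangerous.

```python
def _as_list(value) -> list:
    """IAM allows a single string or a list for Action and Resource."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def grants_full_admin(policy: dict) -> bool:
    """True if any statement allows all actions on all resources."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a lone statement may not be in a list
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        if ("*" in _as_list(statement.get("Action"))
                and "*" in _as_list(statement.get("Resource"))):
            return True
    return False


# Hypothetical policies for illustration.
drifted = {"Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}]}
scoped = {"Statement": [{"Effect": "Allow",
                         "Action": ["s3:GetObject"],
                         "Resource": "arn:aws:s3:::example-bucket/*"}]}

print(grants_full_admin(drifted), grants_full_admin(scoped))
```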

Remember, the goal is to manage these permissions using a higher-level abstraction. Creating policies or roles for various tasks is a great first step.

There’s a lot of information out there to help get you started, including each cloud provider’s own documentation on access control.

Unfortunately, the tooling that would help you monitor which permissions are actually being used isn’t nearly as mature as I’d like to see. Leading the way is the AWS IAM Access Analyzer, which I’m hoping other clouds will copy.

It should be very simple to find out which permissions assigned have never been used. Sadly, it still takes a lot of effort.
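At its core, the comparison is a set difference between what has been granted and what shows up in the API call logs. The sketch below is a simplified illustration under stated assumptions: the granted actions and the observed actions have already been collected into sets (which is the hard part in practice), and wildcards are matched with Python's fnmatch rather than a full policy evaluation.

```python
from fnmatch import fnmatchcase


def unused_permissions(granted: set, used: set) -> list:
    """Return granted action patterns that no observed API call matched.

    Action names are compared case-insensitively, and granted entries may
    contain wildcards such as "ec2:*".
    """
    used_lower = {action.lower() for action in used}
    unused = []
    for pattern in granted:
        matched = any(fnmatchcase(action, pattern.lower())
                      for action in used_lower)
        if not matched:
            unused.append(pattern)
    return sorted(unused)


# Hypothetical data: what a role is allowed to do versus what the logs show.
granted_actions = {"s3:GetObject", "s3:PutObject", "ec2:*", "iam:*"}
observed_actions = {"s3:GetObject", "ec2:DescribeInstances"}

candidates = unused_permissions(granted_actions, observed_actions)
print(candidates)
```

Anything in the result is only a candidate for removal. A permission that is used once a quarter, or only during incident response, will look unused in a short log window.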

Estimated time to resolve? Ongoing.

Level of effort? Hard. This is a complicated and constant activity and if you remove a critical permission, the consequences could be dire.

Return on investment? High. Almost all the public security breaches in the cloud stem from misconfigured permissions. This is the top security issue by far.

What’s Next?

We have gone through each of the sample checklist ideas and determined the level of effort required to implement them along with a ballpark return. Check out this relevant article on how to set up guardrails to avoid cloud misconfigurations to continue to build the foundation of great architecture.
