A Guide to the Well-Architected Framework

February 16, 2022 TH Author

Not so easy, huh? Luckily, Microsoft Azure and AWS have published several white papers on the Well-Architected Framework that explain cloud architectural design principles and can help guide you through the process. For example, in the case of an Amazon S3 bucket, you need to remember to disallow public read access, ensure logging is enabled, use customer-provided keys to ensure encryption is on, and so on.
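To make the S3 example concrete, here is a minimal sketch of those three checks run against a bucket description held in a plain dictionary. The field names (`public_read`, `logging_enabled`, `encryption`) are hypothetical and chosen only for this illustration; they are not the AWS API, and a real audit would read these settings from the cloud provider.

```python
from typing import Dict, List


def audit_bucket(bucket: Dict[str, object]) -> List[str]:
    """Return a list of findings for one bucket description.

    The dictionary keys are assumptions made for this example only.
    """
    findings = []
    if bucket.get("public_read", False):
        findings.append("public read access is allowed")
    if not bucket.get("logging_enabled", False):
        findings.append("access logging is disabled")
    if bucket.get("encryption") is None:
        findings.append("encryption is not configured")
    return findings


# Hypothetical bucket: encrypted, but publicly readable and not logged.
example = {"name": "reports", "public_read": True,
           "logging_enabled": False, "encryption": "customer-provided"}
for finding in audit_bucket(example):
    print(f"{example['name']}: {finding}")
```

Even this toy version shows why the list gets long quickly: every service adds its own set of settings to remember.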

With so many cloud services and resources, it can be a lot to remember what to do and what configurations should be there. However, as you can see from the links to the articles on infrastructure configuration, Trend Micro has lots of information about what should be done to build cloud architecture to best practice levels. The Trend Micro Cloud One™ – Conformity Knowledge Base contains 1,000 best practice articles to help you understand each cloud best practice, how to audit, and how to remediate the misconfiguration.

Cloud infrastructure misconfiguration automation
Automation is an essential step to minimize the risk of a breach, always scanning and providing feedback to stay ahead of the hackers. For anyone building in the cloud, having an automated tool that continuously scans your cloud infrastructure for misconfigurations is a thing of beauty, as it can ensure you are always complying with those 1,000 best practices without the heavy lifting. If you would like to be relieved from manually checking for adherence to well-architected design principles, sign up for a free trial of Conformity. Or, if you’d like to see how well-architected your infrastructure is, check out the free guided public cloud risk self-assessment to get personalized results in minutes.

The Six Pillars of a Well-Architected Framework
Conformity and its Knowledge Base are based on the AWS and Azure Well-Architected Frameworks, which are defined by six pillars:

Operational excellence: focus on running and monitoring systems
Security: focus on protecting information and systems
Reliability: focus on ensuring a workload performs as it should
Performance efficiency: focus on efficient use of IT
Cost optimization: focus on avoiding unnecessary costs
Sustainability: focus on environmental impacts

Each of these pillars has its own set of design principles, which are useful for evaluating your architecture and determining whether it will allow you to scale over time.

Operational Excellence Pillar

Starting with the Operational Excellence pillar, creating the most effective and efficient cloud infrastructure is a natural goal. So, when creating or changing the infrastructure, it is critical to follow the path of best practices outlined in the AWS Operational Excellence pillar.

The Operational Excellence pillar focuses on two business objectives:

1. Running workloads in the most efficient way possible.
2. Understanding your efficiency to be able to improve processes and procedures on an ongoing basis.

The five design principles within the Operational Excellence pillar
To achieve these objectives, there are five critical design principles that can be utilized:

Perform operations as code, so you can apply engineering principles to your entire cloud environment. Applications, infrastructure, and so on, can all be defined as code and updated as code.
Make frequent, small, reversible changes, as opposed to large changes that make it difficult to determine the cause of a failure if one occurs. This also requires development and operations teams to be prepared to reverse the most recent change in the event of a failure.
Refine operations procedures frequently by reviewing them with the entire team to ensure everyone is familiar with them and determine if they can be updated.
Anticipate failure to ensure that the sources of future failures are found and removed. A pre-mortem exercise should be conducted to determine how things can go wrong, so the team is prepared.
Learn from all operational failures and share the lessons across all teams. This allows teams to evolve and continue to improve procedures and skills.
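The "frequent, small, reversible changes" principle can be sketched in a few lines. This is an illustration under stated assumptions, not a deployment tool: the configuration is a plain dictionary, and `apply_reversible`, `set_replicas_to_zero`, and `has_capacity` are hypothetical names invented for the example.

```python
import copy
from typing import Callable, Dict

Config = Dict[str, object]


def apply_reversible(config: Config,
                     change: Callable[[Config], None],
                     healthy: Callable[[Config], bool]) -> Config:
    """Apply one small change; return the previous config if the check fails."""
    previous = copy.deepcopy(config)   # snapshot to roll back to
    candidate = copy.deepcopy(config)  # the change is tried on a copy
    change(candidate)
    if healthy(candidate):
        return candidate
    return previous


def set_replicas_to_zero(cfg: Config) -> None:
    # A deliberately bad change, used to demonstrate the rollback path.
    cfg["replicas"] = 0


def has_capacity(cfg: Config) -> bool:
    # Hypothetical health check: at least one replica must remain.
    return int(cfg["replicas"]) >= 1


current = {"service": "web", "replicas": 3}
current = apply_reversible(current, set_replicas_to_zero, has_capacity)
print(current)
```

Because only one small change is applied at a time, a failed health check points directly at the change that caused it, and the rollback is simply the previous snapshot.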

CI/CD is good, but to ensure operational excellence, there must be proper controls on the process and procedures for building and deploying software, which include a plan for failure. It is always best to plan for the worst, and hope for the best, so if there is a failure, we will be ready for it.

Security Pillar

With data storage and processing in the cloud, especially in today’s regulatory environment, it is critical to ensure we build security into our environment from the beginning.

The seven design principles within the Security pillar
There are several critical design principles that strengthen our ability to keep our data and business secure. Here are the seven recommended by the Security pillar:

Implement a strong identity foundation to control access using core security concepts, such as the principle of least privilege and separation of duties.
Enable traceability through logging and metrics throughout the cloud infrastructure. It is only with logs that we know what has happened.
Apply security at all layers throughout the entire cloud infrastructure using multiple security controls with defense in depth. This applies to compute, network, and storage.
Automate security best practices to help scale rapidly and securely in the cloud. Utilizing controls managed as code in version-controlled templates makes it easier to scale securely.
Always protect data in transit and at rest, using appropriate controls based on sensitivity. These controls include access control, tokenization, encryption, and so on.
Keep people away from data to reduce the chance of mishandling, modification, or human error.
Prepare for security events by having incident response plans and teams in place. Incidents will occur and it is essential to ensure that a business is prepared.
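As one example of automating a security best practice, the sketch below scans a policy document for statements that clash with the principle of least privilege. The document follows the general shape of an AWS IAM JSON policy (`Statement`, `Effect`, `Action`, `Resource`), but the rule itself, "a wildcard action on every resource is too broad", is a simplification chosen for this example, and `overly_broad_statements` is a hypothetical helper.

```python
from typing import Dict, List


def as_list(value) -> list:
    """Policy fields may hold a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def overly_broad_statements(policy: Dict) -> List[Dict]:
    """Return Allow statements that grant a wildcard action on all resources."""
    broad = []
    for stmt in as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        actions = as_list(stmt.get("Action", []))
        resources = as_list(stmt.get("Resource", []))
        wildcard_action = any(a == "*" or a.endswith(":*") for a in actions)
        if wildcard_action and "*" in resources:
            broad.append(stmt)
    return broad


policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::reports/*"},
        {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
    ],
}
for stmt in overly_broad_statements(policy):
    print("too broad:", stmt["Action"], "on", stmt["Resource"])
```

Keeping a check like this next to version-controlled templates means the rule runs on every change rather than relying on someone remembering to review it.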

Five areas to configure in the cloud to help achieve a well-architected infrastructure
Several security tools enable us to fulfill the design principles above. AWS has broken security into five areas that we should configure in the cloud:

Identity and access management (IAM), which involves the establishment of identities and permissions for humans and machines. It is critical to manage this through the life cycle of the identity.
Detection of an attack. The challenge most businesses face is detecting attacks. An incident does not have to be malicious to be costly; it could simply be a user making a mistake. Enabling logging features and delivering those logs to the SIEM is essential. Once the SIEM detects that something bad has happened, alerts should be sent out.
Infrastructure protection of the network and the compute resources is critical. This is done through a variety of tools and mechanisms that are either infrastructure tools or code protection, such as virtual private clouds (VPCs), code review, vulnerability assessments, gateways, firewalls, load balancers, hardening, code signing, and more.
Data protection in transit and at rest is critical. This is primarily done with IAM and encryption. Most discussions of encryption review what algorithms are used and what the key size is. The most important piece to discuss, in relation to encryption, is where the key is and who has control over it. It is also important to be able to determine the authenticity of the public key certificates.
Incident response is the ability to respond immediately and effectively when an adverse event occurs. The saying goes “failing to plan is planning to fail”. If we do not have incident responses planned and practiced, an incident could destroy the business.
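The detection area can be illustrated with a toy version of what a SIEM rule does: read log events, count suspicious ones, and raise an alert past a threshold. The event format (`user`, `outcome`) and the threshold of three are assumptions made for this sketch; they do not come from any particular logging service.

```python
from collections import Counter
from typing import Dict, Iterable, List


def detect_repeated_failures(events: Iterable[Dict[str, str]],
                             threshold: int = 3) -> List[str]:
    """Return users with at least `threshold` denied requests, sorted by name."""
    failures = Counter(e["user"] for e in events if e["outcome"] == "denied")
    return sorted(user for user, count in failures.items() if count >= threshold)


# Hypothetical log events, as they might arrive from an access log.
events = [
    {"user": "alice", "outcome": "denied"},
    {"user": "bob", "outcome": "denied"},
    {"user": "alice", "outcome": "denied"},
    {"user": "alice", "outcome": "allowed"},
    {"user": "alice", "outcome": "denied"},
]
for user in detect_repeated_failures(events):
    print(f"ALERT: repeated denied requests for {user}")
```

Note that the rule cannot tell an attacker from a user who keeps mistyping something, which is the point made above: either way, someone should be alerted.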

What is essential to remember is that security of a cloud ecosystem is a split responsibility. AWS and Azure have defined where responsibility lies with them versus where it lies with the consumer. It is good to review the AWS and/or Azure shared responsibility models to ensure you are upholding your end of the deal.

Reliability Pillar

Reliability is important to think about for any IT-related system. IT must provide the services users and customers need, when they need them. This involves understanding the level of availability that your business requires from any given system.

The five design principles within the Reliability pillar
When it comes to the Reliability pillar, just like the others, AWS has defined critical design principles:

Automatically recover from failure. Depending on the business needs, it might be essential that there are automated recovery controls in place, as the time it takes a human to intervene may be longer than a business can tolerate.
Test recovery procedures. Backing up the data from an Amazon S3 bucket is a good first step, but the process is not complete until the restoration procedure is verified. If the data cannot be restored, then it has not been successfully backed up.
Scale horizontally to increase aggregate workload availability as an alternate way to envision a cloud infrastructure. If the business is using a virtual machine with a large amount of CPU capacity to handle all user requests, you may want to consider breaking it down into multiple, smaller virtual machines that are load balanced. That way, if a machine fails, the impact is not a denial of service, and if planned well, the users may never know there was a failure at all.
Stop guessing capacity. Careful planning and capacity management are critical to the reliability of an IT environment, and may just save you money where you are spending on unnecessary capacity.
Manage change and automation so alterations to the cloud do not interfere with the reliability of the infrastructure. Change management is core to ITIL. Changes should not be made unless they are planned, documented, tested, and approved. There must also be a backup plan for if/when a change breaks your environment.
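The "test recovery procedures" principle lends itself to a small sketch: a backup only counts once a restored copy has been compared against the original. The example below uses local temporary files as a stand-in for a bucket and its backup location, which is an assumption made to keep it self-contained; `backup_is_restorable` and the file names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file's contents so two copies can be compared."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def backup_is_restorable(source: Path, backup_dir: Path, restore_dir: Path) -> bool:
    """Back up a file, restore it elsewhere, and verify the restored copy."""
    backup = backup_dir / source.name
    shutil.copy2(source, backup)       # the backup step
    restored = restore_dir / source.name
    shutil.copy2(backup, restored)     # the restore step
    return sha256_of(restored) == sha256_of(source)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "backup").mkdir()
    (root / "restore").mkdir()
    data = root / "orders.csv"
    data.write_text("id,total\n1,9.99\n")
    verified = backup_is_restorable(data, root / "backup", root / "restore")
    print("restore verified:", verified)
```

Running a check like this on a schedule turns "we have backups" into "we have restores", which is the claim the business actually depends on.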

With availability being at the core of this pillar, it is good to understand its definition. AWS defines availability as:
