AWS Certified Solutions Architect - Professional (SAP-C02)
Comprehensive Study Guide
Complete Learning Path for Certification Success
Overview
This study guide provides a structured learning path from fundamentals to exam readiness for the AWS Certified Solutions Architect - Professional (SAP-C02) certification. Designed for complete novices, it teaches all concepts progressively while focusing exclusively on exam-relevant content. Extensive diagrams and visual aids are integrated throughout to enhance understanding and retention.
Target Audience: Complete beginners with little to no AWS experience who need to learn everything from scratch.
Study Approach: Self-sufficient textbook replacement - you should NOT need external resources to understand concepts. Everything is explained from first principles with extensive examples and visual diagrams.
This certification assumes you understand certain foundational concepts. This chapter builds that foundation from scratch, assuming you're a complete novice. If you're already familiar with AWS basics, you can skim this chapter, but don't skip it entirely - it establishes the mental models used throughout the guide.
Prerequisites Checklist
Before diving into professional-level architecture, you should understand:
Basic Cloud Computing Concepts - What cloud computing is and why organizations use it
Networking Fundamentals - IP addresses, subnets, routing, DNS basics
High Availability Concepts - Redundancy, failover, disaster recovery basics
Basic AWS Services - EC2, S3, VPC, IAM at a conceptual level
If you're missing any: Don't worry! This chapter will teach you everything you need. Take your time and work through each section carefully.
Section 1: Cloud Computing Fundamentals
What is Cloud Computing?
Simple Definition: Cloud computing means using computers, storage, and services over the internet instead of owning and maintaining your own physical servers.
Why it exists: Traditionally, companies had to buy servers, set up data centers, hire staff to maintain them, and predict future capacity needs years in advance. This was expensive, inflexible, and risky. Cloud computing solves these problems by letting you rent computing resources on-demand, paying only for what you use.
Real-world analogy: Think of cloud computing like electricity. You don't build your own power plant - you plug into the grid and pay for what you use. Similarly, you don't build your own data center - you use AWS's infrastructure and pay for what you consume.
How it works (Detailed step-by-step):
AWS builds massive data centers around the world with thousands of servers, storage systems, and networking equipment. These facilities have redundant power, cooling, security, and internet connections.
AWS virtualizes the hardware using software that divides physical servers into many virtual machines (VMs). This means one physical server can run dozens of isolated virtual servers for different customers.
You request resources through AWS's web interface or APIs. For example, you might request "I need 2 virtual servers with 4GB RAM each, running Linux, in the US East region."
AWS provisions your resources in seconds. The virtualization software carves out the requested resources from available physical hardware and makes them available to you.
You use the resources to run your applications, store data, or provide services to your customers. You access everything over the internet.
AWS meters your usage and bills you based on what you consume - compute hours, storage gigabytes, data transfer, etc.
You can scale up or down instantly. Need more servers? Request them. Don't need them anymore? Shut them down and stop paying.
Why this matters for the exam: The SAP-C02 exam tests your ability to design solutions that leverage cloud advantages - elasticity, pay-per-use, global reach, and managed services. Understanding WHY cloud exists helps you make better architectural decisions.
Cloud Service Models
There are three main service models in cloud computing. Understanding these helps you choose the right AWS services for different scenarios.
Infrastructure as a Service (IaaS)
What it is: You rent virtual servers, storage, and networking. You're responsible for everything else - operating system, applications, data, security configurations.
When to use it: You need full control over the operating system, have legacy applications with specific requirements, or need configurations that managed platforms don't support.
Real-world analogy: Renting an unfurnished apartment. You get the space and utilities, but you install and maintain everything inside.
Platform as a Service (PaaS)
What it is: The provider manages the infrastructure, operating system, and runtime; you bring only your application code and data (e.g., AWS Elastic Beanstalk, Amazon RDS).
Software as a Service (SaaS)
What it is: The provider runs the complete application; you simply use it. Choose SaaS when:
You need standard business applications without customization
You want zero infrastructure or platform management
You need quick deployment with minimal setup
Real-world analogy: Staying in a hotel. Everything is provided and managed. You just show up and use the services.
✅ Must Know: The exam frequently tests whether you understand when to use IaaS vs PaaS. Generally, PaaS is preferred for operational efficiency, but IaaS is needed when you require specific control or have legacy application requirements.
AWS Global Infrastructure
Understanding AWS's physical infrastructure is critical for designing resilient, performant, and compliant solutions.
Regions
What it is: A Region is a physical geographic area where AWS has multiple data centers. Each Region is completely independent and isolated from other Regions.
Why it exists:
Data sovereignty: Some countries require data to stay within their borders
Latency: Placing resources closer to users reduces response times
Disaster recovery: Regions are far apart, so a natural disaster in one won't affect others
Service availability: New AWS services often launch in specific Regions first
How it works:
AWS selects geographic locations based on customer demand, connectivity, and regulatory requirements
Each Region has a unique identifier (e.g., us-east-1, eu-west-1, ap-southeast-2)
Regions are connected by AWS's private global network backbone
You explicitly choose which Region(s) to deploy resources in
Data doesn't leave a Region unless you explicitly configure replication or transfer
Examples of Regions:
us-east-1 (N. Virginia): Oldest and largest AWS Region, most services available
eu-west-1 (Ireland): Primary European Region for many customers
sa-east-1 (São Paulo): Serves South American customers
When to choose a Region:
✅ Choose based on: User location (latency), compliance requirements, service availability, cost
❌ Don't choose based on: Random selection, always using us-east-1 by default
✅ Must Know: As of 2025, AWS has 30+ Regions globally. Not all services are available in all Regions. Always verify service availability in your target Region.
Availability Zones (AZs)
What it is: An Availability Zone is one or more discrete data centers within a Region, each with redundant power, networking, and connectivity.
Why it exists: To provide high availability and fault tolerance within a Region. If one AZ fails (power outage, network issue, natural disaster), applications in other AZs continue running.
Real-world analogy: Think of a Region as a city, and Availability Zones as different neighborhoods in that city. Each neighborhood has its own power grid and infrastructure. If one neighborhood loses power, the others keep running.
How it works (Detailed):
Each Region has multiple AZs (minimum 3, typically 3-6)
AZs are physically separated by meaningful distances (miles/kilometers apart)
Each AZ has independent power sources, cooling, and physical security
AZs are connected by high-speed, low-latency private fiber networks
You deploy resources across multiple AZs for redundancy
AWS automatically handles failover for many managed services
Architecture Example:
Region: us-east-1
├── AZ: us-east-1a (Data Center Group 1)
├── AZ: us-east-1b (Data Center Group 2)
├── AZ: us-east-1c (Data Center Group 3)
├── AZ: us-east-1d (Data Center Group 4)
├── AZ: us-east-1e (Data Center Group 5)
└── AZ: us-east-1f (Data Center Group 6)
Detailed Example 1: Multi-AZ Web Application
Imagine you're running an e-commerce website. Here's how Multi-AZ deployment works:
Setup: You deploy web servers in us-east-1a, us-east-1b, and us-east-1c
Normal Operation: A load balancer distributes traffic across all three AZs. Each AZ handles roughly 33% of requests.
Failure Scenario: At 2 PM, us-east-1a experiences a power failure
Automatic Response:
The load balancer detects failed health checks from us-east-1a servers
Within 30 seconds, it stops sending traffic to us-east-1a
Traffic is redistributed to us-east-1b and us-east-1c (now 50% each)
Users experience no downtime - they're automatically routed to healthy AZs
Recovery: When us-east-1a comes back online, the load balancer automatically includes it again
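The failover behavior above can be sketched as a toy Python model. The AZ names and percentages mirror the example; this is an illustration of the traffic-redistribution logic, not how an actual Elastic Load Balancer is implemented.

```python
# Toy model of a load balancer spreading traffic across healthy AZs.
# Illustration only - a real ELB uses health checks and connection-level routing.

def distribute_traffic(healthy_azs, total_percent=100):
    """Split traffic evenly across the AZs that pass health checks."""
    share = total_percent / len(healthy_azs)
    return {az: share for az in healthy_azs}

azs = ["us-east-1a", "us-east-1b", "us-east-1c"]

normal = distribute_traffic(azs)
# each AZ handles roughly 33.3% of requests

# us-east-1a fails its health checks and is removed from rotation
after_failure = distribute_traffic([az for az in azs if az != "us-east-1a"])
# us-east-1b and us-east-1c now handle 50% each - users see no downtime
```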
Detailed Example 2: Multi-AZ Database
For a database using Amazon RDS Multi-AZ:
Setup: Primary database in us-east-1a, synchronous standby in us-east-1b
Normal Operation: All reads and writes go to the primary. Every transaction is synchronously replicated to the standby (happens in milliseconds).
Failure Scenario: The primary database instance fails
Automatic Failover:
RDS detects the failure within 60 seconds
Promotes the standby in us-east-1b to primary
Updates DNS to point to the new primary
Total downtime: 1-2 minutes
Zero data loss (synchronous replication)
Recovery: RDS automatically creates a new standby in another AZ
✅ Must Know:
Each AZ is identified by a letter suffix (a, b, c, etc.)
AZ identifiers are mapped randomly per AWS account (your us-east-1a might be different from another account's us-east-1a)
Always deploy across at least 2 AZs for high availability
Some services (like RDS Multi-AZ) automatically handle AZ failover
💡 Tip: For production workloads, use at least 3 AZs. This allows you to maintain availability even if one AZ fails and another is undergoing maintenance.
Edge Locations and CloudFront
What it is: Edge Locations are AWS data centers specifically designed for content delivery. They're part of Amazon CloudFront, AWS's Content Delivery Network (CDN).
Why it exists: To deliver content (web pages, videos, files) to users with low latency by caching content geographically close to them.
How it works:
You upload your content (website, images, videos) to an origin server (like S3 or EC2)
You configure CloudFront to distribute this content
CloudFront copies your content to Edge Locations around the world
When a user requests content, CloudFront serves it from the nearest Edge Location
If content isn't cached, CloudFront fetches it from the origin, caches it, and serves it
Real-world analogy: Think of Edge Locations like local convenience stores. Instead of driving to a distant warehouse (origin server) every time you need milk, you go to the nearby store that stocks popular items. The store occasionally restocks from the warehouse, but most purchases are served locally.
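The cache-miss/cache-hit flow above can be sketched in a few lines of Python. The paths and contents are made up for illustration; the point is that the origin is contacted only once per object.

```python
# Minimal sketch of edge caching: serve from the edge cache when possible,
# otherwise fetch from the origin once and cache the result. Illustrative only.

ORIGIN = {"/index.html": "<html>...</html>", "/logo.png": "png-bytes"}

edge_cache = {}      # contents held at one Edge Location
origin_fetches = 0   # round trips back to the origin server

def get_from_edge(path):
    global origin_fetches
    if path not in edge_cache:        # cache miss: one trip to the origin
        origin_fetches += 1
        edge_cache[path] = ORIGIN[path]
    return edge_cache[path]           # cache hit: served locally, low latency

get_from_edge("/index.html")   # first Tokyo user: miss, fetched from origin
get_from_edge("/index.html")   # later Tokyo users: hit, served from the edge
```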
Detailed Example: Global Website Delivery
Scenario: You have a website hosted on EC2 in us-east-1, but users worldwide access it.
Without CloudFront:
User in Tokyo requests your website
Request travels across the internet to us-east-1 (Virginia)
Round-trip time: 200-300ms
Every image, CSS file, JavaScript file requires a separate round trip
Total page load: 3-5 seconds
With CloudFront:
User in Tokyo requests your website
Request goes to nearest Edge Location (Tokyo)
Edge Location has cached content: serves immediately (10-20ms)
Edge Location doesn't have content: fetches from us-east-1 once, caches it, serves it
Subsequent requests from Tokyo users: served from cache
Total page load: 0.5-1 second
✅ Must Know:
AWS has 400+ Edge Locations globally (more than Regions)
Edge Locations primarily serve cached content - you can't deploy full applications there (though Lambda@Edge can run lightweight functions at Edge Locations)
CloudFront is the primary service using Edge Locations
Other services using Edge Locations: Route 53 (DNS), AWS WAF (web firewall), Lambda@Edge
🎯 Exam Focus: Questions often test whether you understand when to use CloudFront for performance optimization, especially for global user bases.
Section 2: Networking Fundamentals
Understanding networking is absolutely critical for the SAP-C02 exam. Domain 1 (26% of the exam) heavily focuses on network architecture. This section builds your networking foundation from scratch.
IP Addresses and CIDR Notation
What it is: An IP address is a unique identifier for a device on a network, like a phone number for computers. CIDR (Classless Inter-Domain Routing) is a way to specify ranges of IP addresses.
Why it exists: Networks need a way to identify and route traffic to specific devices. CIDR provides a flexible way to allocate IP addresses efficiently.
Real-world analogy: Think of IP addresses like street addresses. Just as "123 Main Street" uniquely identifies a house, "10.0.1.5" uniquely identifies a computer on a network.
How IP Addresses Work:
An IPv4 address consists of 4 numbers (0-255) separated by dots:
Example: 192.168.1.10
Each number is called an "octet" (8 bits)
Total: 32 bits (4 octets × 8 bits)
CIDR Notation Explained:
CIDR notation looks like: 10.0.0.0/16
The /16 is the "prefix length" - it tells you how many bits are fixed
Remaining bits can vary, defining the range of addresses
Detailed Example 1: Understanding /16
10.0.0.0/16 means:
First 16 bits are fixed: 10.0
Last 16 bits can vary: 0.0 to 255.255
Address range: 10.0.0.0 to 10.0.255.255
Total addresses: 2^16 = 65,536 addresses
Detailed Example 2: Understanding /24
192.168.1.0/24 means:
First 24 bits are fixed: 192.168.1
Last 8 bits can vary: 0 to 255
Address range: 192.168.1.0 to 192.168.1.255
Total addresses: 2^8 = 256 addresses
Detailed Example 3: Understanding /28
10.0.1.0/28 means:
First 28 bits are fixed
Last 4 bits can vary: 0 to 15
Address range: 10.0.1.0 to 10.0.1.15
Total addresses: 2^4 = 16 addresses
Common CIDR Blocks (Memorize These):
CIDR   Addresses    Typical Use
/32    1            Single host
/28    16           Very small subnet
/24    256          Small subnet (common)
/20    4,096        Medium subnet
/16    65,536       Large network
/8     16,777,216   Huge network
✅ Must Know:
Smaller prefix (like /16) = MORE addresses
Larger prefix (like /28) = FEWER addresses
AWS VPCs can be /16 to /28
AWS subnets can be /16 to /28
💡 Tip: Quick calculation - if CIDR is /X, you have 2^(32-X) addresses. For /24: 2^(32-24) = 2^8 = 256 addresses.
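You can verify the CIDR math with Python's standard ipaddress module, which computes ranges and address counts exactly as described above:

```python
# CIDR math with the standard library - matches the /16, /24, /28 examples above.
import ipaddress

net = ipaddress.ip_network("10.0.0.0/16")
print(net.num_addresses)     # 65536 = 2**(32-16)
print(net[0], net[-1])       # 10.0.0.0 10.0.255.255 (full range of the block)

print(ipaddress.ip_network("192.168.1.0/24").num_addresses)  # 256
print(ipaddress.ip_network("10.0.1.0/28").num_addresses)     # 16
```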
Private vs Public IP Addresses
What it is: IP addresses are divided into "private" (used internally) and "public" (used on the internet).
Why it exists: The internet has a limited number of IPv4 addresses (about 4 billion). Private addresses allow organizations to use the same address ranges internally without conflicts, while public addresses are globally unique.
Private IP Ranges (RFC 1918):
10.0.0.0/8 (10.0.0.0 to 10.255.255.255) - 16 million addresses
172.16.0.0/12 (172.16.0.0 to 172.31.255.255) - 1 million addresses
192.168.0.0/16 (192.168.0.0 to 192.168.255.255) - 65,536 addresses
Public IP Addresses:
All other IPv4 addresses
Globally unique and routable on the internet
Must be assigned by your ISP or cloud provider
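A quick way to check whether an address is private is to test it against the three RFC 1918 ranges listed above. The example IPs are the ones used in this section:

```python
# Classify an IP as RFC 1918 private or public, using the stdlib.
import ipaddress

RFC1918 = [ipaddress.ip_network(cidr) for cidr in
           ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def is_rfc1918(ip):
    """True if the address falls in one of the three private ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in RFC1918)

print(is_rfc1918("10.0.1.50"))     # True  - private, used inside a VPC
print(is_rfc1918("172.31.0.1"))    # True  - private (172.16.0.0/12 range)
print(is_rfc1918("54.123.45.67"))  # False - public, internet-routable
```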
How They Work Together:
Internal Communication: Devices use private IPs to talk to each other within your network
Internet Access: A NAT (Network Address Translation) device translates private IPs to public IPs
Incoming Traffic: Public IPs are used to reach your services from the internet
Detailed Example: Web Server Architecture
Scenario: You're running a web application on AWS.
Setup:
Web server has private IP: 10.0.1.50
Web server has public IP: 54.123.45.67 (assigned by AWS)
Database has private IP: 10.0.2.100 (no public IP)
User Access Flow:
User types www.example.com in browser
DNS resolves to public IP: 54.123.45.67
User's browser connects to 54.123.45.67
AWS routes traffic to web server's private IP: 10.0.1.50
Web server processes request
Database Access Flow:
Web server needs data from database
Web server connects to database's private IP: 10.0.2.100
Traffic stays within AWS's private network (fast, secure)
Database responds to web server
Database is NOT accessible from internet (no public IP)
✅ Must Know:
AWS VPCs always use private IP ranges
Public IPs are optional and assigned separately
Databases and internal services should NEVER have public IPs (security best practice)
Use NAT Gateway for private instances to access internet
Subnets
What it is: A subnet is a subdivision of a network. It's a range of IP addresses within a larger network.
Why it exists: Subnets allow you to organize and secure your network by grouping related resources together and controlling traffic between groups.
Real-world analogy: Think of a subnet like a floor in an office building. The building (VPC) has multiple floors (subnets). Each floor has its own set of offices (IP addresses). You can control who can move between floors (routing and security).
How Subnets Work in AWS:
You create a VPC with a CIDR block (e.g., 10.0.0.0/16)
You divide the VPC into subnets (e.g., 10.0.1.0/24, 10.0.2.0/24)
Each subnet exists in ONE Availability Zone
You launch resources (EC2, RDS, etc.) into specific subnets
You control traffic between subnets using route tables and security groups
Detailed Example: Three-Tier Application
Scenario: You're designing a web application with web servers, application servers, and databases.
VPC: 10.0.0.0/16 (65,536 addresses)
Subnets:
Public Subnet 1 (us-east-1a): 10.0.1.0/24 (256 addresses)
Web servers that need internet access
Has route to Internet Gateway
Public Subnet 2 (us-east-1b): 10.0.2.0/24 (256 addresses)
Web servers in a second AZ for redundancy
Has route to Internet Gateway
Private Subnet 1 (us-east-1a): 10.0.11.0/24 (256 addresses)
Application and database servers; no direct internet route (may have NAT Gateway for outbound access)
Private Subnet 2 (us-east-1b): 10.0.12.0/24 (256 addresses)
Same role as Private Subnet 1, in a second AZ
✅ Must Know:
Each subnet is in exactly ONE Availability Zone
Subnets cannot span multiple AZs
Always create subnets in multiple AZs for high availability
💡 Tip: Use a consistent IP addressing scheme. For example:
10.0.1.x - Public subnets in AZ-a
10.0.2.x - Public subnets in AZ-b
10.0.11.x - Private subnets in AZ-a
10.0.12.x - Private subnets in AZ-b
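Carving a VPC into subnets like this is straightforward with the ipaddress module; the CIDR blocks below match the addressing scheme suggested in the tip:

```python
# Carving a /16 VPC into /24 subnets - same scheme as the tip above.
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = list(vpc.subnets(new_prefix=24))   # all possible /24 blocks

print(len(subnets))      # 256 subnets of 256 addresses each
print(subnets[1])        # 10.0.1.0/24  - e.g. public subnet in AZ-a
print(subnets[11])       # 10.0.11.0/24 - e.g. private subnet in AZ-a

# every subnet must sit inside the VPC's CIDR block
print(subnets[1].subnet_of(vpc))   # True
```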
Routing
What it is: Routing is the process of determining how network traffic gets from source to destination. Route tables contain rules (routes) that direct traffic.
Why it exists: Networks need to know where to send packets. Without routing, traffic wouldn't know how to reach its destination.
Real-world analogy: Routing is like GPS directions. When you want to go somewhere, GPS tells you which roads to take. Similarly, route tables tell network packets which path to take.
How Route Tables Work:
Every subnet has a route table (either custom or main)
Route table contains routes - rules that say "if destination is X, send to Y"
Most specific route wins - if multiple routes match, the one with longest prefix is used
Local route is automatic - traffic within VPC is always routed locally
Route Table Structure:
Destination   Target    Meaning
10.0.0.0/16   local     Traffic to any IP in VPC stays in VPC
0.0.0.0/0     igw-xxx   All other traffic goes to Internet Gateway
Detailed Example 1: Public Subnet Route Table
Destination Target Explanation
10.0.0.0/16 local VPC internal traffic
0.0.0.0/0 igw-12345678 Internet traffic
How it works:
Packet to 10.0.1.50: Matches 10.0.0.0/16 → stays in VPC (local)
Packet to 8.8.8.8: Matches 0.0.0.0/0 → goes to Internet Gateway
Packet to 54.123.45.67: Matches 0.0.0.0/0 → goes to Internet Gateway
Detailed Example 2: Private Subnet Route Table
Destination Target Explanation
10.0.0.0/16 local VPC internal traffic
0.0.0.0/0 nat-87654321 Internet traffic via NAT
How it works:
Packet to 10.0.2.100: Matches 10.0.0.0/16 → stays in VPC (local)
Packet to 8.8.8.8: Matches 0.0.0.0/0 → goes to NAT Gateway
NAT Gateway translates private IP to public IP and forwards to Internet Gateway
Response comes back through NAT Gateway, translated back to private IP
Detailed Example 3: Database Subnet Route Table
Destination Target Explanation
10.0.0.0/16 local VPC internal traffic only
How it works:
Packet to 10.0.1.50: Matches 10.0.0.0/16 → stays in VPC (local)
Packet to 8.8.8.8: No matching route → dropped (no internet access)
This is intentional for security - databases shouldn't access internet
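The "most specific route wins" rule in the three examples above can be implemented directly as a longest-prefix-match lookup. The route tables mirror the examples; the gateway IDs are the illustrative ones used in this section:

```python
# Longest-prefix-match route lookup: the most specific matching route wins.
import ipaddress

public_routes = {
    "10.0.0.0/16": "local",
    "0.0.0.0/0": "igw-12345678",
}

def lookup(route_table, destination_ip):
    """Return the target of the most specific matching route, or None (dropped)."""
    addr = ipaddress.ip_address(destination_ip)
    best = None
    for cidr, target in route_table.items():
        net = ipaddress.ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, target)
    return best[1] if best else None

print(lookup(public_routes, "10.0.1.50"))  # local (both routes match; /16 is longer)
print(lookup(public_routes, "8.8.8.8"))    # igw-12345678

db_routes = {"10.0.0.0/16": "local"}       # isolated database subnet
print(lookup(db_routes, "8.8.8.8"))        # None - packet dropped, no internet route
```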
✅ Must Know:
0.0.0.0/0 means "all IP addresses" (default route)
Local route is automatically added and cannot be deleted
Most specific route wins (longest prefix match)
Public subnet = route to Internet Gateway (igw-xxx)
Private subnet = route to NAT Gateway (nat-xxx) for internet access
Isolated subnet = no route to internet at all
🎯 Exam Focus: Questions often test whether you understand the difference between public and private subnets based on their route tables, not just their names.
Internet Gateway and NAT Gateway
These are critical components for internet connectivity in AWS VPCs.
Internet Gateway (IGW)
What it is: An Internet Gateway is a horizontally scaled, redundant, and highly available VPC component that allows communication between your VPC and the internet.
Why it exists: VPCs are isolated by default. An Internet Gateway provides a target for internet-routable traffic and performs NAT for instances with public IP addresses.
How it works:
You create an Internet Gateway
You attach it to your VPC (one IGW per VPC)
You add a route in your subnet's route table pointing 0.0.0.0/0 to the IGW
You assign public IPs to instances in that subnet
IGW performs 1:1 NAT between private and public IPs
Detailed Example: Web Server Internet Access
Setup:
VPC: 10.0.0.0/16
Public Subnet: 10.0.1.0/24
Web Server: Private IP 10.0.1.50, Public IP 54.123.45.67
Internet Gateway: igw-12345678
Route Table: 0.0.0.0/0 → igw-12345678
Outbound Traffic (Web Server → Internet):
Web server sends packet: Source 10.0.1.50, Destination 8.8.8.8
Route table matches 0.0.0.0/0 and sends the packet to the Internet Gateway
IGW translates the source to the public IP: Source 54.123.45.67, Destination 8.8.8.8
Inbound Response (Internet → Web Server):
Response arrives at the IGW: Source 8.8.8.8, Destination 54.123.45.67
IGW translates the destination back to the private IP
Packet delivered to web server: Source 8.8.8.8, Destination 10.0.1.50
✅ Must Know:
One Internet Gateway per VPC
IGW is highly available (AWS managed)
IGW performs 1:1 NAT for instances with public IPs
No bandwidth limits or availability risks
Free (no charges for IGW itself)
NAT Gateway
What it is: A NAT (Network Address Translation) Gateway allows instances in private subnets to access the internet while preventing the internet from initiating connections to those instances.
Why it exists: Private instances (like application servers or batch processing servers) sometimes need to download updates, access external APIs, or send data to external services, but they shouldn't be directly accessible from the internet for security reasons.
Real-world analogy: Think of a NAT Gateway like a security guard at a gated community. Residents (private instances) can leave to go shopping (access internet), but random people from outside (internet) can't come in uninvited.
How it works:
You create a NAT Gateway in a PUBLIC subnet
You assign an Elastic IP (public IP) to the NAT Gateway
You add a route in PRIVATE subnet's route table: 0.0.0.0/0 → NAT Gateway
Private instances send internet-bound traffic to NAT Gateway
NAT Gateway translates source IP to its own public IP and forwards to Internet Gateway
Responses come back to NAT Gateway, which forwards to original private instance
Detailed Example: Application Server Downloading Updates
Setup:
VPC: 10.0.0.0/16
Public Subnet: 10.0.1.0/24 (has Internet Gateway route; contains the NAT Gateway with an Elastic IP)
Private Subnet: 10.0.11.0/24 (routes 0.0.0.0/0 to the NAT Gateway; contains the application server)
How it flows: The application server requests OS updates from the internet. The NAT Gateway rewrites the packet's source IP to its Elastic IP and forwards it through the Internet Gateway; responses return through the NAT Gateway to the application server. The server itself is never reachable from the internet.
Alternative - NAT Instance: an EC2 instance you manage yourself. It can be cheaper for low traffic, but it's a legacy approach - you own the patching, scaling, and failover that a NAT Gateway handles automatically.
⚠️ Warning: Common mistake - putting NAT Gateway in private subnet. It MUST be in public subnet with Internet Gateway route.
🎯 Exam Focus: Questions often test whether you understand:
NAT Gateway must be in public subnet
Need one NAT Gateway per AZ for high availability
NAT Gateway allows outbound only, not inbound
Cost optimization: Single NAT Gateway vs one per AZ
Section 3: Security Fundamentals
Security is woven throughout the SAP-C02 exam. Understanding these fundamentals is essential for every domain.
Authentication vs Authorization
What they are:
Authentication: Proving who you are (identity verification)
Authorization: Determining what you're allowed to do (permission checking)
Why they exist: Systems need to verify identity before granting access, then control what authenticated users can do.
Real-world analogy:
Authentication: Showing your ID at airport security (proving you're you)
Authorization: Your boarding pass determines which plane you can board (what you can access)
How they work together:
User provides credentials (username/password, access keys, etc.)
System authenticates: "Is this really who they claim to be?"
If authenticated, system checks authorization: "What is this user allowed to do?"
System grants or denies access based on permissions
Detailed Example: AWS Console Login
Authentication Phase:
You navigate to AWS Console
You enter email and password
AWS verifies credentials against IAM database
If MFA enabled, you provide second factor (code from app)
AWS confirms: "Yes, this is really user John Smith"
Authorization Phase:
AWS checks IAM policies attached to your user
You try to launch an EC2 instance
AWS checks: "Does John Smith have ec2:RunInstances permission?"
If yes: Instance launches
If no: "Access Denied" error
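The two-phase flow above can be sketched as a toy Python model. Real systems hash passwords, require MFA, and evaluate IAM policy documents; this sketch only shows the ordering - identity first, permissions second:

```python
# Toy authenticate-then-authorize flow. The user record and permission
# strings are illustrative, not a real IAM data model.
USERS = {
    "john": {"password": "hunter2", "permissions": {"ec2:RunInstances"}},
}

def authenticate(username, password):
    """WHO you are: verify the claimed identity against stored credentials."""
    user = USERS.get(username)
    return user is not None and user["password"] == password

def authorize(username, action):
    """WHAT you can do: checked only after authentication succeeds."""
    return action in USERS[username]["permissions"]

if authenticate("john", "hunter2"):                # identity verified
    print(authorize("john", "ec2:RunInstances"))   # True  - instance launches
    print(authorize("john", "iam:DeleteUser"))     # False - "Access Denied"
```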
✅ Must Know:
Authentication = WHO you are
Authorization = WHAT you can do
In AWS, IAM handles both
Always authenticate first, then authorize
Encryption Basics
What it is: Encryption transforms readable data (plaintext) into unreadable data (ciphertext) using a mathematical algorithm and a key.
Why it exists: To protect data confidentiality. Even if someone intercepts encrypted data, they can't read it without the decryption key.
Real-world analogy: Encryption is like a locked safe. The data is inside the safe (encrypted), and only someone with the key can open it and read the contents.
Encryption at Rest
What it is: Encrypting data while it's stored (on disk, in database, in S3, etc.).
Why it exists: To protect data if physical storage is stolen or accessed by unauthorized parties.
How it works:
You write data to storage
Encryption software intercepts the write
Data is encrypted using a key
Encrypted data is written to disk
When reading, data is automatically decrypted using the key
Detailed Example: S3 Bucket Encryption
Setup:
S3 bucket with encryption enabled
Encryption key managed by AWS KMS
You upload a file: customer_data.csv
Upload Process:
You upload customer_data.csv (plaintext)
S3 receives the file
S3 requests encryption key from KMS
KMS provides data encryption key (DEK)
S3 encrypts file using DEK
S3 stores encrypted file on disk
S3 stores encrypted DEK with the file
Download Process:
You request customer_data.csv
S3 retrieves encrypted file and encrypted DEK
S3 sends encrypted DEK to KMS
KMS decrypts DEK (you must have permission)
S3 uses DEK to decrypt file
S3 sends plaintext file to you
If Disk is Stolen:
Thief has encrypted file
Thief has encrypted DEK
Thief CANNOT decrypt DEK (needs KMS access)
Thief CANNOT read file (needs DEK)
Data remains protected
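The envelope-encryption key hierarchy above (a data key encrypts the object, a master key encrypts the data key) can be simulated in a few lines. The XOR "cipher" here is NOT real cryptography - AWS uses AES via KMS - it only demonstrates why a stolen disk is useless without KMS access:

```python
# Toy envelope encryption mirroring the S3 + KMS flow above.
# The XOR keystream cipher is a stand-in for AES; do NOT use it for real data.
import hashlib
import secrets

def toy_cipher(data: bytes, key: bytes) -> bytes:
    """XOR with a keystream derived from the key (symmetric: same call decrypts)."""
    stream = hashlib.sha256(key).digest()
    while len(stream) < len(data):
        stream += hashlib.sha256(stream).digest()
    return bytes(a ^ b for a, b in zip(data, stream))

master_key = secrets.token_bytes(32)   # lives only inside "KMS", never on disk
dek = secrets.token_bytes(32)          # per-object data encryption key

ciphertext = toy_cipher(b"customer_data.csv contents", dek)  # S3 encrypts the object
encrypted_dek = toy_cipher(dek, master_key)                  # KMS encrypts the DEK
# S3 stores ciphertext + encrypted_dek together. A thief with the disk has
# neither the plaintext DEK nor the master key, so the data stays unreadable.

recovered_dek = toy_cipher(encrypted_dek, master_key)   # KMS decrypts the DEK on read
plaintext = toy_cipher(ciphertext, recovered_dek)       # S3 decrypts the object
```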
✅ Must Know:
Encryption at rest protects stored data
AWS KMS manages encryption keys
Most AWS services support encryption at rest
Encryption/decryption is transparent to applications
Encryption in Transit
What it is: Encrypting data while it's moving across networks (between client and server, between services, etc.).
Why it exists: To protect data from interception during transmission. Without encryption, network traffic can be captured and read.
How it works:
Client and server establish encrypted connection (TLS/SSL)
They exchange encryption keys securely
All data sent is encrypted before transmission
Receiving side decrypts data
Connection remains encrypted for entire session
Detailed Example: HTTPS Website Connection
Setup:
Web server with SSL/TLS certificate
User accessing website from browser
Connection Process:
User types https://example.com in browser
Browser connects to server on port 443 (HTTPS)
Server sends SSL certificate (contains public key)
Browser verifies certificate is valid and trusted
Browser generates session key
Browser encrypts session key with server's public key
Server decrypts session key with its private key
Both sides now have shared session key
Data Transfer:
User submits form with credit card number
Browser encrypts data with session key
Encrypted data travels across internet
Even if intercepted, attacker sees gibberish
Server receives encrypted data
Server decrypts with session key
Server processes plaintext credit card number
Without HTTPS (HTTP):
User submits form with credit card number
Data travels in plaintext
Anyone on network path can read it
Credit card number exposed
✅ Must Know:
Encryption in transit protects data during transmission
TLS/SSL is the standard protocol
HTTPS = HTTP + TLS/SSL
AWS services support encryption in transit
Always use HTTPS for sensitive data
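In practice, encryption in transit is mostly handled by your TLS library. Python's standard ssl module, for example, builds a context that verifies server certificates and hostnames by default - the checks that make the HTTPS handshake above trustworthy:

```python
# A default TLS client context: certificate and hostname verification are on,
# so connections to servers with invalid certificates fail instead of
# silently sending plaintext.
import ssl

ctx = ssl.create_default_context()

print(ctx.verify_mode == ssl.CERT_REQUIRED)   # True - server cert must validate
print(ctx.check_hostname)                     # True - cert must match hostname

# Typical use (not executed here - requires a network connection):
#   with socket.create_connection(("example.com", 443)) as sock:
#       with ctx.wrap_socket(sock, server_hostname="example.com") as tls:
#           ...all data sent over `tls` is encrypted in transit...
```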
💡 Tip: Remember the difference:
At Rest: Data sitting on disk (like a parked car)
In Transit: Data moving across network (like a car driving)
Principle of Least Privilege
What it is: Granting users and services only the minimum permissions they need to perform their job, nothing more.
Why it exists: To minimize damage if credentials are compromised. If an account only has limited permissions, an attacker who steals those credentials can only do limited damage.
Real-world analogy: A hotel housekeeper gets a key that opens guest rooms but not the safe or manager's office. If the key is lost, the damage is limited to guest rooms, not the entire hotel.
How it works:
Identify what actions a user/service needs to perform
Grant ONLY those specific permissions
Deny everything else by default
Regularly review and remove unused permissions
Use temporary credentials when possible
Detailed Example 1: Application Server Permissions
Scenario: An application server needs to read objects from a single S3 bucket - nothing more. Grant only s3:GetObject on that bucket instead of full S3 access. If the server's credentials leak, the attacker can read one bucket, not your entire account.
Best practice: Use IAM Access Analyzer to identify overly permissive policies.
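A least-privilege policy for an application server that only reads from one S3 bucket might look like the following. The bucket name is hypothetical; the policy grammar (Version, Statement, Effect, Action, Resource) is standard IAM:

```python
# A least-privilege IAM policy expressed as a Python dict: read-only access
# to a single (hypothetical) bucket. Everything not explicitly allowed is
# denied by default.
import json

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],                       # reads only - no writes, no deletes
            "Resource": "arn:aws:s3:::example-app-bucket/*",  # this bucket only
        }
    ],
}

print(json.dumps(policy, indent=2))
```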
🎯 Exam Focus: Questions often present scenarios where you must choose the most restrictive policy that still allows required functionality.
⚠️ Warning: Common mistake - granting broad permissions "just to make it work." Always take time to identify exact permissions needed.
Section 4: High Availability and Resilience Fundamentals
High availability and resilience are core themes throughout the SAP-C02 exam, especially in Domain 1 (Task 1.3) and Domain 2 (Task 2.2).
What is High Availability?
What it is: High availability means a system remains operational and accessible even when components fail. It's measured as a percentage of uptime.
Why it exists: Systems fail - hardware breaks, software crashes, networks disconnect, data centers lose power. High availability ensures business continuity despite these failures.
Real-world analogy: Think of high availability like having spare tires in your car. If one tire goes flat, you can replace it and keep driving. The journey continues despite the failure.
Availability Percentages:
Availability                      Downtime per Year   Downtime per Month   Downtime per Week
99% (Two nines)                   3.65 days           7.2 hours            1.68 hours
99.9% (Three nines)               8.76 hours          43.2 minutes         10.1 minutes
99.95% (Three and a half nines)   4.38 hours          21.6 minutes         5.04 minutes
99.99% (Four nines)               52.56 minutes       4.32 minutes         1.01 minutes
99.999% (Five nines)              5.26 minutes        25.9 seconds         6.05 seconds
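The downtime figures follow directly from the availability percentage - a quick Python check of the "nines" math:

```python
# Downtime math behind the "nines": allowed downtime is simply the
# unavailable fraction of the period.
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600

def downtime_minutes_per_year(availability_percent):
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR

print(round(downtime_minutes_per_year(99.9) / 60, 2))   # 8.76 hours  (three nines)
print(round(downtime_minutes_per_year(99.99), 2))       # 52.56 minutes (four nines)
print(round(downtime_minutes_per_year(99.999), 2))      # 5.26 minutes (five nines)
```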
How to achieve high availability:
Eliminate single points of failure: Every component should have a backup
Detect failures quickly: Monitor health and respond automatically
Fail over automatically: Switch to backup without manual intervention
Distribute across failure domains: Use multiple AZs, Regions
Design for failure: Assume everything will fail eventually
Detailed Example: Highly Available Web Application
Scenario: E-commerce website that must stay online 99.99% of the time (52 minutes downtime per year).
Architecture:
Load Balancer in multiple AZs
Web servers in 3 Availability Zones
Application servers in 3 Availability Zones
Database with Multi-AZ failover
Auto Scaling to replace failed instances
Normal Operation:
Load balancer distributes traffic across all web servers
Each web server can handle requests independently
If one server gets 100 requests/second, others can take over if it fails
Failure Scenario 1: Single Web Server Fails:
Web server in AZ-A crashes at 2:00 PM
Load balancer detects failure via health checks (30 seconds)
Load balancer stops sending traffic to failed server
Traffic redistributed to healthy servers in AZ-B and AZ-C
Auto Scaling detects missing capacity
Auto Scaling launches replacement server in AZ-A (2 minutes)
New server passes health checks and receives traffic
Total impact: ~30 seconds until traffic is rerouted, a few minutes of reduced capacity while the replacement launches, zero downtime
Failure Scenario 2: Entire Availability Zone Fails:
AZ-A loses power at 3:00 PM
All servers in AZ-A become unreachable
Load balancer detects failures (30 seconds)
Load balancer redirects ALL traffic to AZ-B and AZ-C
Servers in AZ-B and AZ-C handle increased load
Auto Scaling launches additional servers in AZ-B and AZ-C
Database automatically fails over from AZ-A to AZ-B (1-2 minutes)
Total impact: 1-2 minutes of degraded performance (database writes pause briefly during failover), but the site stays reachable
Why this achieves 99.99%:
No single point of failure
Automatic detection and failover
Multiple AZs provide redundancy
Auto Scaling maintains capacity
Downtime limited to failover time (1-2 minutes)
✅ Must Know:
99.9% = 8.76 hours downtime per year (acceptable for many apps)
99.99% = 52 minutes downtime per year (required for critical apps)
99.999% = 5 minutes downtime per year (very expensive to achieve)
Multi-AZ deployment is minimum for high availability
Multi-Region deployment for highest availability
Redundancy
What it is: Having backup components that can take over when primary components fail.
Why it exists: Single components fail. Redundancy ensures service continues when failures occur.
Types of Redundancy:
Active-Active: All components handle traffic simultaneously
Example: Multiple web servers behind load balancer
Benefit: Full capacity always available, instant failover
Cost: Higher (paying for all components)
Active-Passive: Primary handles traffic, backup stands by
Example: RDS Multi-AZ - the standby instance is promoted if the primary fails
Benefit: Simpler than active-active; the standby is kept in sync automatically
Cost: You pay for standby capacity that serves no traffic until failover
Fault Tolerance vs. High Availability
Fault Tolerance: System continues operating WITHOUT ANY DEGRADATION when components fail.
Example: RAID array - if one disk fails, system continues at full speed
Expensive: Requires duplicating every component
Zero downtime, zero performance impact
High Availability: System continues operating with MINIMAL DEGRADATION when components fail.
Example: Multi-AZ deployment - if one AZ fails, system continues with reduced capacity
Cost-effective: Shared resources with automatic failover
Brief downtime (seconds to minutes) during failover
Comparison:

| Aspect | Fault Tolerance | High Availability |
| --- | --- | --- |
| Downtime | Zero | Seconds to minutes |
| Performance Impact | None | Possible degradation |
| Cost | Very high | Moderate |
| Complexity | High | Moderate |
| Use Case | Mission-critical systems | Most applications |
| AWS Example | S3 (11 9's durability) | RDS Multi-AZ |
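The cost gap between these two columns comes from how redundancy multiplies availability. Under the standard (simplifying) assumption of independent failures, a system of redundant components is down only when all of them are down at once - a quick sketch:

```python
def combined_availability(*component_availabilities: float) -> float:
    """Availability of redundant components in parallel, assuming independent
    failures: the system is down only when every component is down at once."""
    prob_all_down = 1.0
    for a in component_availabilities:
        prob_all_down *= (1 - a)
    return 1 - prob_all_down

# Two 99% web servers behind a load balancer -> 99.99% combined
print(round(combined_availability(0.99, 0.99), 4))  # 0.9999
```

This is why two modestly reliable servers across AZs can meet a "four nines" target that a single server never could.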
✅ Must Know:
Most applications need high availability, not fault tolerance
Fault tolerance is expensive and complex
Exam questions usually ask for high availability solutions
S3 is fault tolerant (automatically handles failures)
EC2 is not fault tolerant (you must design for HA)
Disaster Recovery Concepts
What it is: Disaster recovery (DR) is the process of restoring systems and data after a catastrophic failure.
Why it exists: Despite high availability measures, disasters can still occur - entire data centers can fail, regions can become unavailable, data can be corrupted or deleted.
Key Metrics:
RTO (Recovery Time Objective): How long can you be down?
Example: "We can tolerate 4 hours of downtime"
Determines DR strategy and cost
RPO (Recovery Point Objective): How much data can you lose?
Example: "We can lose maximum 1 hour of data"
Determines backup frequency
Detailed Example: Understanding RTO and RPO
Scenario: Online banking application
Business Requirements:
RTO: 1 hour (bank can be offline maximum 1 hour)
RPO: 5 minutes (can lose maximum 5 minutes of transactions)
What this means:
If disaster occurs at 2:00 PM, system must be back online by 3:00 PM
Data must be restored to at least 1:55 PM state
Any transactions between 1:55 PM and 2:00 PM may be lost
Architecture to meet requirements:
For RTO (1 hour):
Warm standby in another Region
Pre-deployed infrastructure ready to scale up
Automated failover procedures
Regular DR drills to ensure 1-hour recovery
For RPO (5 minutes):
Database replication every 5 minutes
Transaction logs backed up continuously
Point-in-time recovery capability
Can restore to any point within last 5 minutes
Cost Implications:
Tighter RTO = More expensive (need standby resources)
Tighter RPO = More expensive (more frequent backups/replication)
RTO of 1 hour vs 24 hours: can mean roughly an order-of-magnitude cost difference (warm standby vs backup-and-restore)
RPO of 5 minutes vs 24 hours: several times the cost (continuous replication vs daily backups)
✅ Must Know:
RTO = Time to recover (how long down)
RPO = Data loss tolerance (how much data lost)
Tighter RTO/RPO = Higher cost
Business requirements drive RTO/RPO
DR strategy must meet both RTO and RPO
💡 Tip: Remember the difference:
RTO: "How long until we're back?" (TIME)
RPO: "How much data can we lose?" (DATA)
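Both metrics can be checked mechanically: RPO compares the disaster time against the last recoverable copy of the data, while RTO compares it against the time service is restored. A hedged sketch using the banking example's numbers (the dates are hypothetical):

```python
from datetime import datetime, timedelta

def meets_rpo(last_backup: datetime, disaster: datetime, rpo: timedelta) -> bool:
    """Data loss = time since the last recoverable copy; must not exceed RPO."""
    return disaster - last_backup <= rpo

def meets_rto(disaster: datetime, restored: datetime, rto: timedelta) -> bool:
    """Downtime = time until service is restored; must not exceed RTO."""
    return restored - disaster <= rto

disaster = datetime(2024, 1, 1, 14, 0)  # disaster strikes at 2:00 PM
# Last replication at 1:56 PM -> 4 minutes of potential loss, within the 5-minute RPO
print(meets_rpo(datetime(2024, 1, 1, 13, 56), disaster, timedelta(minutes=5)))  # True
# Restored at 3:30 PM -> 90 minutes down, violates the 1-hour RTO
print(meets_rto(disaster, datetime(2024, 1, 1, 15, 30), timedelta(hours=1)))    # False
```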
🎯 Exam Focus: Questions often give you RTO/RPO requirements and ask you to choose the appropriate DR strategy.
Section 5: Mental Model - How AWS Works
This section ties everything together into a cohesive mental model. Understanding this will help you make better architectural decisions throughout the exam.
The AWS Mental Model
📊 AWS Global Infrastructure Overview:
graph TB
subgraph "AWS Global Infrastructure"
subgraph "Region: us-east-1 (N. Virginia)"
subgraph "AZ: us-east-1a"
DC1A[Data Center 1]
DC1B[Data Center 2]
end
subgraph "AZ: us-east-1b"
DC2A[Data Center 3]
DC2B[Data Center 4]
end
subgraph "AZ: us-east-1c"
DC3A[Data Center 5]
DC3B[Data Center 6]
end
end
subgraph "Region: eu-west-1 (Ireland)"
subgraph "AZ: eu-west-1a"
DC4[Data Centers]
end
subgraph "AZ: eu-west-1b"
DC5[Data Centers]
end
subgraph "AZ: eu-west-1c"
DC6[Data Centers]
end
end
subgraph "Edge Locations (400+)"
EDGE1[Tokyo Edge]
EDGE2[London Edge]
EDGE3[Sydney Edge]
EDGE4[São Paulo Edge]
end
end
DC1A -.High-Speed Network.-> DC2A
DC2A -.High-Speed Network.-> DC3A
DC1A -.High-Speed Network.-> DC3A
DC4 -.High-Speed Network.-> DC5
DC5 -.High-Speed Network.-> DC6
DC1A -.AWS Backbone.-> DC4
DC2A -.AWS Backbone.-> DC5
EDGE1 -.Content Delivery.-> DC1A
EDGE2 -.Content Delivery.-> DC4
EDGE3 -.Content Delivery.-> DC1A
EDGE4 -.Content Delivery.-> DC4
style DC1A fill:#c8e6c9
style DC2A fill:#c8e6c9
style DC3A fill:#c8e6c9
style DC4 fill:#fff3e0
style DC5 fill:#fff3e0
style DC6 fill:#fff3e0
style EDGE1 fill:#e1f5fe
style EDGE2 fill:#e1f5fe
style EDGE3 fill:#e1f5fe
style EDGE4 fill:#e1f5fe
This diagram shows the hierarchical structure of AWS's global infrastructure. At the highest level, AWS operates in multiple geographic Regions (shown here are us-east-1 in Virginia and eu-west-1 in Ireland, but there are 30+ Regions globally). Each Region is completely independent and isolated from other Regions, which provides fault isolation - a disaster in one Region doesn't affect others.
Within each Region, you see multiple Availability Zones (AZs). The diagram shows three AZs per Region for clarity; in reality us-east-1 has six AZs (us-east-1a through us-east-1f), while eu-west-1 has three. Each AZ consists of one or more discrete data centers with redundant power, networking, and connectivity. The green boxes (DC1A, DC1B, etc.) represent individual data center facilities. AZs within a Region are connected by high-speed, low-latency private fiber networks (shown as dotted lines), allowing you to replicate data and fail over between AZs quickly.
The orange boxes represent data centers in the eu-west-1 Region. Notice the "AWS Backbone" connections between Regions - this is AWS's private global network that connects all Regions together. This network is separate from the public internet and provides faster, more reliable connectivity for cross-region replication and data transfer.
The blue boxes at the bottom represent Edge Locations, which are part of AWS's Content Delivery Network (CloudFront). There are 400+ Edge Locations globally, far more than Regions. Edge Locations cache content close to end users for low-latency delivery. The dotted lines show how Edge Locations connect back to origin Regions to fetch content that isn't cached.
Key Takeaways from this Diagram:
Regions are isolated: Failure in one Region doesn't affect others
AZs provide redundancy: Deploy across multiple AZs for high availability
Edge Locations improve performance: Cache content close to users
AWS Backbone is private: Cross-region traffic doesn't use public internet
Hierarchical structure: Region → AZ → Data Center → Your Resources
Complete VPC Architecture
📊 VPC Architecture with Multi-AZ Deployment:
graph TB
INTERNET[Internet]
subgraph "VPC: 10.0.0.0/16"
IGW[Internet Gateway]
subgraph "Availability Zone A"
subgraph "Public Subnet: 10.0.1.0/24"
WEB1[Web Server<br/>10.0.1.10<br/>Public: 54.1.2.3]
NAT1[NAT Gateway<br/>10.0.1.50<br/>EIP: 54.1.2.100]
end
subgraph "Private Subnet: 10.0.11.0/24"
APP1[App Server<br/>10.0.11.20<br/>No Public IP]
end
subgraph "Database Subnet: 10.0.21.0/24"
DB1[Database<br/>10.0.21.30<br/>No Public IP]
end
end
subgraph "Availability Zone B"
subgraph "Public Subnet: 10.0.2.0/24"
WEB2[Web Server<br/>10.0.2.10<br/>Public: 54.1.2.4]
NAT2[NAT Gateway<br/>10.0.2.50<br/>EIP: 54.1.2.101]
end
subgraph "Private Subnet: 10.0.12.0/24"
APP2[App Server<br/>10.0.12.20<br/>No Public IP]
end
subgraph "Database Subnet: 10.0.22.0/24"
DB2[Database<br/>10.0.22.30<br/>No Public IP]
end
end
end
INTERNET <-->|Public Traffic| IGW
IGW <--> WEB1
IGW <--> WEB2
IGW <--> NAT1
IGW <--> NAT2
WEB1 <--> APP1
WEB2 <--> APP2
APP1 <--> DB1
APP2 <--> DB2
APP1 -.Outbound Only.-> NAT1
APP2 -.Outbound Only.-> NAT2
DB1 <-.Replication.-> DB2
style INTERNET fill:#ffebee
style IGW fill:#e1f5fe
style WEB1 fill:#c8e6c9
style WEB2 fill:#c8e6c9
style NAT1 fill:#fff3e0
style NAT2 fill:#fff3e0
style APP1 fill:#f3e5f5
style APP2 fill:#f3e5f5
style DB1 fill:#e8f5e9
style DB2 fill:#e8f5e9
This diagram shows a complete, production-ready VPC architecture following AWS best practices. Let's walk through each component and understand why it's designed this way.
VPC Structure (10.0.0.0/16): The entire VPC uses the 10.0.0.0/16 CIDR block, giving us 65,536 IP addresses to work with. This is a private IP range that's not routable on the public internet. The VPC spans multiple Availability Zones (AZ-A and AZ-B) for high availability.
Internet Gateway (Blue): The Internet Gateway (IGW) is the entry/exit point for internet traffic. It's attached to the VPC and provides a target for internet-routable traffic. The IGW is highly available by design - AWS manages its redundancy. All public subnets route their internet-bound traffic (0.0.0.0/0) to this IGW.
Public Subnets (Green - Web Servers): Public subnets (10.0.1.0/24 in AZ-A and 10.0.2.0/24 in AZ-B) contain resources that need to be directly accessible from the internet. The web servers (WEB1 and WEB2) each have TWO IP addresses: a private IP (10.0.1.10 and 10.0.2.10) for internal VPC communication, and a public IP (54.1.2.3 and 54.1.2.4) for internet access. The Internet Gateway performs 1:1 NAT between these addresses. These subnets are "public" because their route table has a route sending 0.0.0.0/0 traffic to the IGW.
NAT Gateways (Orange): Each public subnet also contains a NAT Gateway (NAT1 and NAT2). These are critical for allowing private subnet resources to access the internet for updates, API calls, etc., while preventing inbound connections from the internet. Each NAT Gateway has an Elastic IP (EIP) - a static public IP address. Notice we have ONE NAT Gateway per AZ - this is important for high availability. If we only had one NAT Gateway and its AZ failed, private subnets in other AZs couldn't access the internet.
Private Subnets (Purple - Application Servers): Private subnets (10.0.11.0/24 in AZ-A and 10.0.12.0/24 in AZ-B) contain application servers that don't need direct internet access. APP1 and APP2 have ONLY private IPs (10.0.11.20 and 10.0.12.20) - no public IPs. Their route table sends internet-bound traffic (0.0.0.0/0) to their AZ's NAT Gateway, not directly to the IGW. This means they can initiate outbound connections (like downloading updates) but cannot receive inbound connections from the internet.
Database Subnets (Light Green - Databases): Database subnets (10.0.21.0/24 in AZ-A and 10.0.22.0/24 in AZ-B) are the most isolated. DB1 and DB2 have ONLY private IPs and their route tables have NO route to the internet at all - not even through NAT Gateway. They can only communicate within the VPC. This is the most secure configuration for databases. The dotted line between DB1 and DB2 represents database replication for high availability.
Traffic Flows:
Internet → Web Server: User request comes from internet, hits IGW, IGW translates public IP to private IP, traffic reaches web server
Web Server → App Server: Web server sends request to app server's private IP, stays within VPC (fast, secure)
App Server → Database: App server queries database using private IP, stays within VPC
App Server → Internet: App server needs to call external API, sends to NAT Gateway, NAT translates to its EIP, forwards to IGW, reaches internet
Database Replication: DB1 and DB2 replicate data using private IPs within VPC
Why This Design:
Security: Databases have no internet access, app servers have outbound only, only web servers are publicly accessible
High Availability: Resources in both AZs, if one AZ fails, other continues
Performance: Internal traffic uses private IPs (low latency, no internet routing)
Cost Trade-off: NAT Gateways incur hourly and per-GB charges, but this is the price of keeping private subnets unexposed
Scalability: Can add more subnets and AZs as needed
Key Takeaways:
Three-tier architecture: Web (public), App (private), Database (isolated)
Multi-AZ deployment: Every tier has resources in multiple AZs
Defense in depth: Multiple layers of security (public/private subnets, security groups, NACLs)
NAT Gateway per AZ: Ensures high availability for outbound internet access
No public IPs for internal resources: Databases and app servers are not internet-accessible
This architecture is the foundation for most AWS applications and appears frequently in exam scenarios.
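The subnet behaviors described above all come down to route tables, where the most specific (longest-prefix) matching route wins. A simplified model of the private subnet's route table, using hypothetical route targets from the diagram:

```python
import ipaddress

# Simplified model of the private subnet's route table (hypothetical values
# taken from the diagram). As in a real VPC route table, the most specific
# (longest-prefix) matching route wins.
PRIVATE_SUBNET_ROUTES = {
    "10.0.0.0/16": "local",        # traffic within the VPC stays local
    "0.0.0.0/0": "nat-gateway-a",  # everything else exits via the AZ's NAT Gateway
}

def route_lookup(dest_ip: str, routes: dict) -> str:
    """Return the target of the most specific route matching dest_ip."""
    dest = ipaddress.ip_address(dest_ip)
    matches = [(net, target) for net, target in routes.items()
               if dest in ipaddress.ip_network(net)]
    return max(matches, key=lambda m: ipaddress.ip_network(m[0]).prefixlen)[1]

print(route_lookup("10.0.21.30", PRIVATE_SUBNET_ROUTES))    # local - database stays in VPC
print(route_lookup("93.184.216.34", PRIVATE_SUBNET_ROUTES)) # nat-gateway-a - external API call
```

A database subnet's route table would simply omit the 0.0.0.0/0 entry, which is exactly why it has no path to the internet.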
Section 6: AWS Service Categories Overview
Understanding how AWS services are categorized helps you choose the right service for each scenario. This section provides a high-level overview - detailed coverage comes in domain chapters.
Compute Services
Purpose: Run application code and workloads.
Key Services:
Amazon EC2 (Elastic Compute Cloud)
Virtual servers in the cloud
Full control over OS and configuration
Use when: You need specific OS, custom software, or full control
AWS Lambda
Serverless compute - run code without managing servers
Pay only when code runs
Use when: Event-driven workloads, microservices, short-running tasks
Amazon ECS/EKS (Container Services)
Run Docker containers
ECS = AWS-native, EKS = Kubernetes
Use when: Containerized applications, microservices architecture
AWS Elastic Beanstalk
Platform-as-a-Service (PaaS)
Deploy code, AWS manages infrastructure
Use when: Want to focus on code, not infrastructure
Next Steps: You're now ready to dive into Domain 1 (Design Solutions for Organizational Complexity). Open file 02_domain_1_organizational_complexity to continue.
š” Tip: Keep this fundamentals chapter bookmarked. You'll reference these concepts throughout your study.
Chapter 1: Design Solutions for Organizational Complexity
Domain Weight: 26% of exam (highest weight)
Chapter Overview
This domain focuses on designing AWS solutions for large, complex organizations with multiple accounts, teams, and requirements. You'll learn how to architect network connectivity across complex environments, implement security controls at scale, design resilient architectures, manage multi-account structures, and optimize costs across the organization.
What you'll learn:
Architect network connectivity strategies for multi-VPC and hybrid environments
Prescribe security controls for enterprise-scale deployments
Design reliable and resilient architectures with appropriate RTO/RPO
Design and manage multi-account AWS environments
Determine cost optimization and visibility strategies
Time to complete: 12-15 hours (this is the largest domain)
Prerequisites: Chapter 0 (Fundamentals) - especially networking and security sections
Exam Weight: 26% (approximately 17 questions on the actual exam)
Task 1.1: Architect Network Connectivity Strategies
This task covers designing network architectures for complex organizations with multiple VPCs, hybrid cloud requirements, and global presence.
Introduction
The problem: Organizations grow complex over time. They have multiple VPCs for different applications, teams, or environments. They have on-premises data centers that need to connect to AWS. They have users in multiple geographic locations. They need to segment networks for security and compliance. Traditional point-to-point connections don't scale.
The solution: AWS provides multiple services for network connectivity - VPC Peering, Transit Gateway, PrivateLink, Direct Connect, VPN, and Route 53 Resolver. Each solves specific connectivity challenges. The key is understanding when to use each service and how to combine them for optimal architecture.
Why it's tested: Network architecture is fundamental to every AWS solution. Poor network design leads to security vulnerabilities, performance issues, high costs, and operational complexity. As a Solutions Architect Professional, you must design scalable, secure, and cost-effective network architectures.
Core Concepts
VPC Peering
What it is: VPC Peering creates a direct network connection between two VPCs, allowing resources in each VPC to communicate using private IP addresses as if they were in the same network.
Why it exists: Organizations often have multiple VPCs for different purposes (production, development, different applications, different teams). VPC Peering allows these VPCs to communicate securely without going through the internet.
Real-world analogy: Think of VPC Peering like building a private bridge between two islands. Instead of taking a boat through public waters (internet), you can drive directly across the bridge (private connection).
How it works (Detailed step-by-step):
You create a peering connection request from VPC-A to VPC-B. This can be in the same account or different accounts, same region or different regions.
The owner of VPC-B accepts the peering request. Until accepted, no traffic can flow.
AWS establishes a network connection between the VPCs. This connection uses AWS's private network backbone, not the public internet.
You update route tables in both VPCs. In VPC-A, you add a route: "To reach VPC-B's CIDR (10.1.0.0/16), send traffic to the peering connection." In VPC-B, you add the reverse route.
You update security groups to allow traffic from the peer VPC's CIDR block. Security groups are stateful, so you only need to allow inbound rules.
Traffic flows directly between VPCs using private IP addresses. No NAT, no internet gateway, no public IPs needed.
AWS handles the routing automatically once route tables are configured. Traffic takes the most direct path through AWS's network.
Detailed Example 1: Development and Production VPC Peering
Scenario: You have a production VPC and a development VPC. Developers need to access a shared database in production for testing, but you want to keep the environments separate.
Setup:
Production VPC: 10.0.0.0/16 (us-east-1)
Development VPC: 10.1.0.0/16 (us-east-1)
Production Database: 10.0.50.100
Development App Server: 10.1.10.50
Implementation Steps:
Create Peering Connection:
From Development VPC, create peering request to Production VPC; the Production VPC owner accepts it
Update route tables in both VPCs, and allow inbound port 3306 from 10.1.0.0/16 on the database security group
Detailed Example 2: Cross-Region Peering for Analytics
Scenario: An analytics server in a separate VPC (10.2.0.0/16, eu-west-1) needs to query the production database (10.0.0.0/16, us-east-1).
Security Groups:
Application database security group: Allow port 3306 from 10.2.0.0/16
Analytics server security group: Allow outbound to 10.0.0.0/16
Data Transfer:
Analytics server queries database across regions
Traffic uses AWS global backbone (not internet)
Latency: ~80-100ms (us-east-1 to eu-west-1)
Data transfer charges apply: $0.02/GB
Considerations:
Cross-region peering incurs data transfer charges
Higher latency than same-region peering
Useful for disaster recovery, global applications
Encryption in transit automatically enabled
Detailed Example 3: Multiple VPC Peering (Hub-and-Spoke)
Scenario: You have 4 VPCs - 1 shared services VPC and 3 application VPCs. All applications need to access shared services (Active Directory, monitoring, logging).
Setup:
Shared Services VPC: 10.0.0.0/16 (hub)
App1 VPC: 10.1.0.0/16 (spoke)
App2 VPC: 10.2.0.0/16 (spoke)
App3 VPC: 10.3.0.0/16 (spoke)
Implementation:
Create Peering Connections:
Shared Services ↔ App1: pcx-111
Shared Services ↔ App2: pcx-222
Shared Services ↔ App3: pcx-333
Total: 3 peering connections
Update Route Tables:
Shared Services VPC: Routes to 10.1.0.0/16, 10.2.0.0/16, 10.3.0.0/16
App1 VPC: Route to 10.0.0.0/16 only
App2 VPC: Route to 10.0.0.0/16 only
App3 VPC: Route to 10.0.0.0/16 only
Traffic Patterns:
App1 can reach Shared Services ✅
App2 can reach Shared Services ✅
App3 can reach Shared Services ✅
App1 CANNOT reach App2 ❌ (no direct peering)
App2 CANNOT reach App3 ❌ (no direct peering)
Important Limitation: VPC Peering is NOT transitive. Even though App1 peers with Shared Services, and App2 peers with Shared Services, App1 and App2 cannot communicate through Shared Services. If you need full mesh connectivity, you'd need to peer every VPC with every other VPC (N*(N-1)/2 connections for N VPCs).
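The N*(N-1)/2 formula is worth internalizing, because it is the reason VPC Peering stops scaling. A quick sketch:

```python
def full_mesh_peering(n_vpcs: int) -> int:
    """Peering connections for full mesh connectivity:
    every VPC must be paired directly with every other VPC."""
    return n_vpcs * (n_vpcs - 1) // 2

print(full_mesh_peering(4))   # 6  - vs. only 3 connections in the hub-and-spoke above
print(full_mesh_peering(10))  # 45
print(full_mesh_peering(50))  # 1225 - clearly unmanageable; use Transit Gateway
```

The count grows quadratically, which is the core argument for Transit Gateway in the next section.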
✅ Must Know (Critical Facts):
VPC Peering is NOT transitive: If VPC-A peers with VPC-B, and VPC-B peers with VPC-C, VPC-A cannot reach VPC-C through VPC-B. You must create a direct peering connection between VPC-A and VPC-C.
CIDR blocks cannot overlap: You cannot peer VPCs with overlapping IP ranges. If VPC-A is 10.0.0.0/16 and VPC-B is 10.0.0.0/16, peering will fail. Plan your IP addressing carefully.
One peering connection per VPC pair: You can only have one active peering connection between any two VPCs. You cannot create multiple peering connections for redundancy.
Peering connection quota: The default is 50 active peering connections per VPC, increasable to a hard maximum of 125. Either way, VPC Peering doesn't scale to hundreds of VPCs.
Cross-region peering supported: You can peer VPCs in different regions, but data transfer charges apply ($0.02/GB).
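The CIDR-overlap rule is easy to check before you attempt a peering connection. Python's stdlib `ipaddress` module does the math directly (an illustrative pre-check, not an AWS API call):

```python
import ipaddress

def cidrs_overlap(cidr_a: str, cidr_b: str) -> bool:
    """VPC Peering fails if the two VPCs' CIDR blocks overlap."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))

print(cidrs_overlap("10.0.0.0/16", "10.1.0.0/16"))   # False - peering is possible
print(cidrs_overlap("10.0.0.0/16", "10.0.128.0/17")) # True - peering will fail
```

Running this kind of check during IP address planning is much cheaper than discovering the conflict after VPCs are in production.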
When to use (Comprehensive):
✅ Use when: You have a small number of VPCs (2-10) that need to communicate
Example: Production and development VPCs, or application and shared services VPCs
Reason: Simple to set up, no additional cost (same region), low latency
✅ Use when: You need the lowest possible latency between VPCs
Example: Real-time data processing between VPCs
Reason: Direct connection through AWS backbone, no intermediate hops
✅ Use when: You want to avoid data transfer charges (same region)
Example: Frequent data synchronization between VPCs in same region
Reason: No charges for data transfer within same region via peering
❌ Don't use when: You have many VPCs (10+) that need full mesh connectivity
Reason: Full mesh requires N*(N-1)/2 connections; use Transit Gateway instead
Connections:
Builds on VPC Fundamentals (Chapter 0) by: Extending VPC connectivity beyond a single VPC
Often used with PrivateLink (Task 1.1) to: Provide service-level connectivity instead of network-level
Troubleshooting Common Issues:
Issue 1: Peering connection created but traffic not flowing
Solution: Check route tables in both VPCs - routes must point to peering connection
Solution: Check security groups - must allow traffic from peer VPC CIDR
Solution: Check NACLs - must allow traffic (often forgotten)
Issue 2: Cannot create peering connection
Solution: Check for CIDR overlap - VPCs must have non-overlapping IP ranges
Solution: Check peering connection limit - maximum 125 per VPC
Solution: Verify IAM permissions - need ec2:CreateVpcPeeringConnection permission
AWS Transit Gateway
What it is: AWS Transit Gateway is a network transit hub that connects VPCs, on-premises networks, and remote offices through a single gateway. It acts as a cloud router, simplifying network architecture and enabling transitive routing.
Why it exists: As organizations grow, they accumulate many VPCs (10, 20, 50+). Using VPC Peering for full mesh connectivity becomes unmanageable - 50 VPCs would require 1,225 peering connections. Transit Gateway solves this by providing a central hub where all networks connect once, and the hub handles routing between them.
Real-world analogy: Think of Transit Gateway like a major airport hub. Instead of having direct flights between every pair of cities (like VPC Peering), all flights go through the hub airport. You fly from City A to the hub, then from the hub to City B. The hub handles all the routing complexity.
How it works (Detailed step-by-step):
You create a Transit Gateway in a region. It's a highly available, scalable service managed by AWS.
You attach VPCs to the Transit Gateway. Each VPC gets a "Transit Gateway attachment" which connects it to the hub.
You attach on-premises networks via VPN or Direct Connect. These also become attachments to the Transit Gateway.
You configure route tables in the Transit Gateway. These control which attachments can communicate with which other attachments.
You update VPC route tables to send traffic destined for other networks to the Transit Gateway attachment.
Transit Gateway routes traffic between attachments based on its route tables. It supports transitive routing - VPC-A can reach VPC-B through the Transit Gateway, and VPC-B can reach on-premises through the same Transit Gateway.
Traffic flows through the hub. All inter-VPC and hybrid traffic goes through Transit Gateway, which provides centralized routing, monitoring, and control.
This diagram illustrates a complete Transit Gateway deployment serving as the central networking hub for an organization. Let's examine each component and understand how they interact.
Transit Gateway (Blue, Center): The Transit Gateway (tgw-12345) sits at the center as the network hub. It's a regional service that's highly available across multiple Availability Zones automatically. Think of it as a virtual router that AWS manages for you. It has its own route tables (separate from VPC route tables) that control traffic flow between attachments.
VPC Attachments (Colored Boxes, Top): Four VPCs are attached to the Transit Gateway, each representing a different environment or application. VPC 1 (green) is Production with CIDR 10.0.0.0/16, VPC 2 (orange) is Development with 10.1.0.0/16, VPC 3 (purple) is Shared Services with 10.2.0.0/16, and VPC 4 (light orange) is Analytics with 10.3.0.0/16. Each VPC connects to the Transit Gateway through a "Transit Gateway attachment" - this is a logical connection that appears as an elastic network interface in the VPC. The bidirectional arrows show that traffic can flow in both directions.
Key Benefit - Transitive Routing: Unlike VPC Peering, Transit Gateway supports transitive routing. This means VPC 1 can communicate with VPC 2 through the Transit Gateway, VPC 2 can communicate with VPC 3, and VPC 1 can also communicate with VPC 3 - all through the same hub. You don't need direct connections between every VPC pair. With 4 VPCs, you only need 4 attachments (one per VPC) instead of 6 peering connections for full mesh.
On-Premises Connectivity (Red, Bottom Left): The diagram shows a corporate data center (192.168.0.0/16) connecting to AWS through two methods: VPN Connection and Direct Connect. Both terminate at the Transit Gateway. The VPN provides encrypted connectivity over the internet (good for backup or low-bandwidth needs), while Direct Connect provides a dedicated, high-bandwidth connection (good for primary connectivity). Having both provides redundancy - if Direct Connect fails, traffic automatically fails over to VPN.
Hybrid Cloud Routing: Here's where Transit Gateway really shines. The on-premises data center can reach ALL four VPCs through a single connection to the Transit Gateway. Without Transit Gateway, you'd need separate VPN or Direct Connect connections to each VPC, or complex routing through a "transit VPC." Transit Gateway simplifies this dramatically.
Inter-Region Connectivity (Blue, Bottom Right): The dotted line shows Transit Gateway Peering to another Transit Gateway in eu-west-1. This allows VPCs in us-east-1 to communicate with VPCs in eu-west-1 through the Transit Gateway hub. This is useful for global applications, disaster recovery, or multi-region architectures. Transit Gateway Peering uses AWS's global backbone network, not the public internet.
Traffic Flow Example: Let's trace a request from VPC 1 (Production) to the on-premises data center:
Application in VPC 1 sends packet to 192.168.1.100 (on-premises server)
VPC 1's route table sends the packet to its Transit Gateway attachment
The Transit Gateway route table matches 192.168.0.0/16 and forwards the packet to on-premises via Direct Connect (primary) or VPN (backup)
On-premises server receives packet
Response follows reverse path back to VPC 1
Centralized Management: All routing decisions happen at the Transit Gateway. You can implement network segmentation by using multiple Transit Gateway route tables. For example, you might have a "Production" route table that allows Production VPC to reach Shared Services and on-premises, but NOT Development. Development gets its own route table that allows it to reach Shared Services but NOT Production or on-premises. This provides security isolation while maintaining connectivity where needed.
Scalability: This architecture scales easily. Need to add VPC 5, 6, 7? Just attach them to the Transit Gateway and update route tables. Need to add a second data center? Attach it to the Transit Gateway. Need to connect to a partner network? Attach it. The hub-and-spoke model scales to thousands of attachments.
Detailed Example 1: Enterprise Multi-VPC Architecture
Scenario: Large enterprise with 25 VPCs across different business units, plus 3 on-premises data centers. They need full mesh connectivity between all VPCs and hybrid connectivity to all data centers.
Implementation:
Create a Transit Gateway and attach all 25 VPCs plus the 3 data center connections (28 attachments total)
Enable route propagation for automatic route updates
Update VPC route tables:
Single route: 0.0.0.0/0 → Transit Gateway (or specific routes for other VPCs and on-premises)
Test connectivity: Any VPC can reach any other VPC and any data center
Benefits Realized:
375 connections reduced to 28 attachments (93% reduction)
Centralized routing and monitoring
Easy to add new VPCs or data centers
Consistent security policies
Simplified troubleshooting
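The "375 connections reduced to 28 attachments" figure can be verified: a full mesh of 25 VPCs, plus a link from each VPC to each of the 3 data centers, versus one Transit Gateway attachment per network:

```python
def point_to_point_links(n_vpcs: int, n_datacenters: int) -> int:
    """Without a hub: full VPC mesh plus a link from every VPC to every data center."""
    return n_vpcs * (n_vpcs - 1) // 2 + n_vpcs * n_datacenters

def tgw_attachments(n_vpcs: int, n_datacenters: int) -> int:
    """With Transit Gateway: one attachment per VPC and per data center connection."""
    return n_vpcs + n_datacenters

print(point_to_point_links(25, 3))  # 375  (300 peering links + 75 hybrid links)
print(tgw_attachments(25, 3))       # 28
```

Attachment count grows linearly with the number of networks, while point-to-point links grow quadratically - that is the whole scalability argument in two functions.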
Detailed Example 2: Network Segmentation with Multiple Route Tables
Scenario: Organization needs to isolate Production, Development, and Shared Services environments while allowing specific connectivity patterns.
Requirements:
Production VPCs can reach Shared Services and on-premises
Development VPCs can reach Shared Services only (NOT Production or on-premises)
Shared Services can reach everything
On-premises can reach Production and Shared Services (NOT Development)
Setup:
Transit Gateway with 3 route tables: Production-RT, Development-RT, SharedServices-RT
5 Production VPCs
3 Development VPCs
1 Shared Services VPC
1 On-premises attachment
Route Table Configuration:
Production-RT (associated with Production VPC attachments):
Routes to: Other Production VPCs, Shared Services VPC, On-premises
Does NOT route to: Development VPCs
Development-RT (associated with Development VPC attachments):
Routes to: Other Development VPCs, Shared Services VPC
Does NOT route to: Production VPCs, On-premises
SharedServices-RT (associated with Shared Services VPC attachment):
Routes to: All Production VPCs, All Development VPCs, On-premises
Can reach everything (provides shared services to all)
On-Premises-RT (associated with on-premises attachment):
Routes to: All Production VPCs, Shared Services VPC
Does NOT route to: Development VPCs
Traffic Flow Examples:
Production VPC → Shared Services: ✅ Allowed
Production-RT has route to Shared Services
Traffic flows through Transit Gateway
Production VPC → Development VPC: ❌ Blocked
Production-RT has no route to Development VPCs
Traffic is dropped at Transit Gateway
Development VPC → On-premises: ❌ Blocked
Development-RT has no route to on-premises
Prevents developers from accessing production data
Shared Services → Production VPC: ✅ Allowed
SharedServices-RT has routes to all Production VPCs
Monitoring and logging services can reach production
Security Benefits:
Network-level isolation between environments
Prevents accidental or malicious access from Development to Production
Centralized enforcement of connectivity policies
Audit trail of all routing decisions
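The segmentation logic above is simple enough to model: each attachment is associated with exactly one Transit Gateway route table, and a flow is allowed only if the source's route table holds a route to the destination. A sketch with hypothetical names taken from this example:

```python
# Simplified model of Transit Gateway segmentation (hypothetical route table
# and attachment names from the example above). A flow is allowed only if the
# source attachment's route table contains a route to the destination.
ROUTE_TABLES = {
    "Production-RT":     {"production", "shared-services", "on-premises"},
    "Development-RT":    {"development", "shared-services"},
    "SharedServices-RT": {"production", "development", "shared-services", "on-premises"},
    "OnPremises-RT":     {"production", "shared-services"},
}
ASSOCIATION = {  # which route table each attachment is associated with
    "production": "Production-RT",
    "development": "Development-RT",
    "shared-services": "SharedServices-RT",
    "on-premises": "OnPremises-RT",
}

def flow_allowed(src: str, dst: str) -> bool:
    return dst in ROUTE_TABLES[ASSOCIATION[src]]

print(flow_allowed("production", "shared-services"))  # True
print(flow_allowed("production", "development"))      # False - no route, traffic dropped
print(flow_allowed("development", "on-premises"))     # False - devs can't reach prod data
```

Note the asymmetry is possible by design: which flows work is decided per source route table, not per link.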
Detailed Example 3: Multi-Region Architecture with Transit Gateway Peering
Scenario: Global application with primary region in us-east-1 and disaster recovery region in eu-west-1. Need connectivity between regions for data replication and failover.
Transit Gateway Peering: $0.05/GB for data transfer
Lower than internet-based transfer
Higher than same-region Transit Gateway ($0.02/GB)
✅ Must Know (Critical Facts):
Transit Gateway supports transitive routing: Unlike VPC Peering, you can route through Transit Gateway to reach other networks. This is the key advantage.
Maximum 5,000 attachments per Transit Gateway: Scales to thousands of VPCs and connections.
Supports multiple route tables: Use different route tables for network segmentation and security isolation.
Regional service: Each Transit Gateway is regional, but you can peer Transit Gateways across regions.
Bandwidth: Up to 50 Gbps per VPC attachment; VPN attachments are limited to 1.25 Gbps per tunnel.
Pricing: $0.05/hour per attachment + $0.02/GB data processed (same region). Cross-region peering: $0.05/GB.
When to use (Comprehensive):
✅ Use when: You have many VPCs (10+) that need to communicate
Example: Enterprise with 50 VPCs across multiple business units
Reason: Scales better than VPC Peering, centralized management
✅ Use when: You need transitive routing
Example: VPCs need to reach on-premises through a central hub
Reason: VPC Peering cannot provide this; Transit Gateway routes between any attachments
❌ Don't use when: Data processing charges dominate costs for high-volume traffic
Solution: Review traffic patterns - ensure traffic that should stay local isn't going through Transit Gateway
Solution: Consider VPC Peering for high-volume, low-latency connections between specific VPCs
Solution: Use VPC endpoints for AWS services to avoid Transit Gateway data processing charges
AWS PrivateLink
What it is: AWS PrivateLink provides private connectivity between VPCs, AWS services, and on-premises networks without exposing traffic to the public internet. It enables you to access services as if they were in your own VPC.
Why it exists: Sometimes you don't need full network-level connectivity (like VPC Peering or Transit Gateway provides). You just need to access a specific service - maybe an API, a database endpoint, or an AWS service. PrivateLink provides service-level connectivity without opening up entire networks to each other.
Real-world analogy: Think of PrivateLink like a private phone line between two offices. Instead of connecting the entire office networks together (VPC Peering), you just have a dedicated line for specific communication. The rest of the networks remain isolated.
How it works (Detailed step-by-step):
Service provider creates an endpoint service (also called VPC Endpoint Service). This exposes their application or service through a Network Load Balancer.
Service consumer creates a VPC endpoint (also called Interface Endpoint) in their VPC. This creates an elastic network interface (ENI) with a private IP address.
AWS establishes a private connection between the consumer's VPC endpoint and the provider's endpoint service. This connection uses AWS's private network, not the internet.
Consumer accesses the service using the private IP address of the VPC endpoint. Traffic never leaves AWS's network.
Provider's service receives requests through the Network Load Balancer, processes them, and sends responses back through the same private connection.
No route table changes needed in most cases. The VPC endpoint appears as a local resource in the consumer's VPC.
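The consumer side of steps 2-4 can be sketched with boto3. To keep the example runnable without AWS credentials, the function below only builds the keyword arguments you would pass to `ec2.create_vpc_endpoint`; every ID and the service name are hypothetical placeholders.

```python
# Build the parameters for creating an Interface VPC Endpoint (consumer side).
# Passing them to boto3's ec2.create_vpc_endpoint would perform the real call.

def interface_endpoint_params(vpc_id, service_name, subnet_ids, security_group_ids):
    return {
        "VpcEndpointType": "Interface",          # creates one ENI per subnet
        "VpcId": vpc_id,
        "ServiceName": service_name,             # provider's endpoint service name
        "SubnetIds": subnet_ids,                 # where the ENIs are placed
        "SecurityGroupIds": security_group_ids,  # controls who can reach the ENIs
        "PrivateDnsEnabled": True,               # service DNS resolves to private IPs
    }

params = interface_endpoint_params(
    vpc_id="vpc-0abc1234",                                          # placeholder
    service_name="com.amazonaws.vpce.us-east-1.vpce-svc-0example",  # placeholder
    subnet_ids=["subnet-0aaa", "subnet-0bbb"],
    security_group_ids=["sg-0ccc"],
)
# boto3.client("ec2").create_vpc_endpoint(**params)  # real call, needs credentials
print(params["VpcEndpointType"])  # Interface
```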
Detailed Example 1: SaaS Application Access via PrivateLink
Scenario: Your company uses a third-party SaaS application for customer analytics. The SaaS provider offers PrivateLink connectivity. You want to access their API from your VPCs without going over the internet.
Provider deploys their API behind a Network Load Balancer
Provider creates VPC Endpoint Service
Provider shares service name with you
Your Setup:
Create Interface VPC Endpoint in your VPC
Specify provider's service name
Select subnets where endpoint should be created (10.0.10.0/24)
Select security group (allow HTTPS from your app servers)
AWS Creates ENI:
Elastic Network Interface created in your subnet
Private IP assigned: 10.0.10.100
DNS name created: vpce-abc123.execute-api.us-east-1.vpce.amazonaws.com
Your Application Accesses Service:
App server makes HTTPS request to vpce-abc123.execute-api.us-east-1.vpce.amazonaws.com
DNS resolves to 10.0.10.100 (private IP in your VPC)
Traffic goes to VPC endpoint ENI
AWS routes traffic privately to provider's service
Provider's API processes request and responds
Response comes back through same private path
Benefits:
No internet exposure (traffic stays on AWS network)
Lower latency (direct private connection)
Better security (no public IPs needed)
Simplified network architecture (no NAT Gateway needed for this traffic)
Provider can't see your VPC structure (only sees requests)
Detailed Example 2: Accessing AWS Services via Gateway VPC Endpoints
Scenario: You have EC2 instances in private subnets that need to access Amazon S3 and DynamoDB. You want to avoid sending traffic through NAT Gateway (costs money and adds latency). Note: strictly speaking, Gateway Endpoints are VPC endpoints but not PrivateLink (only Interface Endpoints use PrivateLink); they're covered here because they solve the same private-access problem.
Setup:
VPC: 10.0.0.0/16
Private Subnet: 10.0.10.0/24 (no internet route)
EC2 Instances: Need to access S3 and DynamoDB
Implementation:
Create Gateway VPC Endpoint for S3:
Type: Gateway Endpoint (free, no ENI)
Service: com.amazonaws.us-east-1.s3
Route table: Automatically adds route for S3 prefix list → VPC endpoint
Create Gateway VPC Endpoint for DynamoDB:
Type: Gateway Endpoint (free, no ENI)
Service: com.amazonaws.us-east-1.dynamodb
Route table: Automatically adds route for DynamoDB prefix list → VPC endpoint
EC2 Instances Access Services:
Instance makes request to S3: aws s3 ls s3://my-bucket
Route table directs S3 traffic to Gateway Endpoint
Traffic goes directly to S3 via AWS private network
No NAT Gateway, no internet gateway, no public IPs
Same for DynamoDB requests
Cost Savings:
Without VPC Endpoints: Traffic goes through NAT Gateway
NAT Gateway: $0.045/hour + $0.045/GB processed
For 1TB/month: ~$33 (730 hours × $0.045) + $45 (data) ≈ $78/month
With VPC Endpoints: Gateway Endpoints are free
Cost: $0/month
Savings: ~$78/month per NAT Gateway
Performance Benefits:
Lower latency (direct connection, no NAT hop)
Higher throughput (no NAT Gateway bandwidth limits)
More reliable (no NAT Gateway as potential failure point)
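The cost comparison above can be turned into a quick calculator. Rates are the list prices quoted in the text; 730 hours approximates one month:

```python
# Monthly cost: NAT Gateway ($0.045/hour + $0.045/GB) vs a free Gateway Endpoint.

def nat_gateway_monthly_cost(gb_processed, hours=730):
    return round(hours * 0.045 + gb_processed * 0.045, 2)

def gateway_endpoint_monthly_cost(gb_processed):
    # S3/DynamoDB Gateway Endpoints have no hourly or data processing charge.
    return 0.0

monthly_nat = nat_gateway_monthly_cost(1000)  # 1 TB/month through NAT
savings = monthly_nat - gateway_endpoint_monthly_cost(1000)
print(monthly_nat)  # 77.85
print(savings)      # 77.85
```

Note the hourly charge dominates at low volumes - the endpoint saves money even if you move very little data.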
Detailed Example 3: Multi-Account Service Sharing
Scenario: You have a central shared services account with a REST API that multiple application accounts need to access. You want to provide private access without VPC Peering.
Setup:
Shared Services Account: API behind Network Load Balancer
Application Account 1: VPC 10.1.0.0/16
Application Account 2: VPC 10.2.0.0/16
Application Account 3: VPC 10.3.0.0/16
Implementation:
Shared Services Account (Service Provider):
Deploy API application
Create Network Load Balancer in front of API
Create VPC Endpoint Service
Configure acceptance: Require acceptance for connections (security)
Whitelist: Add Application Account IDs to allowed principals
Application Accounts (Service Consumers):
Each account creates Interface VPC Endpoint
Specify Shared Services' endpoint service name
Request connection
Shared Services Accepts Connections:
Review connection requests
Accept requests from known accounts
Reject unknown or unauthorized requests
Applications Access API:
Each application uses VPC endpoint DNS name
Traffic flows privately through PrivateLink
No VPC Peering needed
No Transit Gateway needed
Each account's network remains isolated
Security Benefits:
Network isolation maintained (no full VPC connectivity)
Service provider controls who can connect
Service consumer controls which subnets have access
All traffic encrypted in transit
No internet exposure
✅ Must Know (Critical Facts):
Two types of VPC Endpoints: Gateway Endpoints (S3, DynamoDB, free) and Interface Endpoints (all other services, charged)
Gateway Endpoints are free: No hourly charge, no data processing charge. Always use for S3 and DynamoDB.
Interface Endpoints cost money: $0.01/hour per AZ + $0.01/GB processed. Consider cost vs NAT Gateway.
PrivateLink uses ENIs: Interface Endpoints create elastic network interfaces in your subnets with private IPs.
DNS resolution: VPC endpoints have DNS names that resolve to private IPs in your VPC.
Security groups apply: Interface Endpoints have security groups that control access.
When to use (Comprehensive):
✅ Use when: You need to access AWS services from private subnets
Example: EC2 instances accessing S3, DynamoDB, or other AWS services
Reason: Avoid NAT Gateway costs, improve security and performance
✅ Use when: You need service-level connectivity, not network-level
Example: Accessing a specific API or service in another VPC
Reason: More secure than VPC Peering (doesn't expose entire network)
✅ Use when: You're providing a service to multiple customers/accounts
Example: SaaS provider offering private connectivity to customers
Reason: Scalable, secure, doesn't require VPC Peering with each customer
✅ Use when: You need to access third-party SaaS applications privately
Example: Accessing Salesforce, Datadog, or other SaaS via PrivateLink
Reason: Better security and performance than internet-based access
❌ Don't use when: You need full network connectivity between VPCs
Problem: PrivateLink is service-level, not network-level
Better solution: Use VPC Peering or Transit Gateway
❌ Don't use when: Cost is primary concern and NAT Gateway is cheaper
Problem: Interface Endpoints cost $0.01/hour per AZ + data processing
Better solution: Calculate costs - for low traffic, NAT Gateway might be cheaper
Limitations & Constraints:
Interface Endpoints: 50 per VPC by default (soft limit, can be increased)
Gateway Endpoints: Only for S3 and DynamoDB
Regional service: VPC endpoints are regional, can't access services in other regions
DNS resolution: Must enable DNS resolution in VPC for endpoint DNS names to work
Security groups: Interface Endpoints require security group configuration
Endpoint policies: Can restrict which resources/actions are accessible through endpoint
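The "endpoint policies" constraint above can be made concrete with a sketch. Attached to an S3 Gateway Endpoint, a policy like the following (bucket name hypothetical) restricts all traffic through the endpoint to a single bucket:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowOnlyMyBucket",
      "Effect": "Allow",
      "Principal": "*",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ]
    }
  ]
}
```

Requests to any other bucket through this endpoint are denied, even if the caller's IAM permissions would otherwise allow them.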
💡 Tips for Understanding:
Gateway Endpoints (S3, DynamoDB) = Free, use route tables
Interface Endpoints (everything else) = Paid, use ENIs
Always use VPC endpoints for S3/DynamoDB from private subnets (free and faster)
⚠️ Common Mistakes & Misconceptions:
Mistake 1: Thinking PrivateLink provides full network connectivity
Why it's wrong: PrivateLink is service-level, not network-level
Correct understanding: Use PrivateLink for specific services, VPC Peering/Transit Gateway for full network connectivity
Mistake 2: Not using Gateway Endpoints for S3 and DynamoDB
Why it's wrong: Wastes money on NAT Gateway for traffic that could be free
Correct understanding: Always create Gateway Endpoints for S3 and DynamoDB in VPCs with private subnets
Mistake 3: Forgetting to configure security groups for Interface Endpoints
Why it's wrong: Traffic will be blocked even though endpoint is created
Correct understanding: Interface Endpoints need security groups that allow traffic from your resources
🔗 Connections to Other Topics:
Relates to VPC Peering (Task 1.1) because: PrivateLink provides service-level alternative to network-level peering
Builds on VPC Fundamentals (Chapter 0) by: Extending VPC connectivity to services without internet exposure
Often used with Multi-Account Strategy (Task 1.4) to: Share services across accounts securely
AWS Direct Connect
What it is: AWS Direct Connect is a dedicated network connection from your on-premises data center to AWS. It provides a private, high-bandwidth, low-latency connection that doesn't use the public internet.
Why it exists: Internet connections are unpredictable - latency varies, bandwidth is shared, and security is a concern. For enterprises with significant AWS usage or strict requirements, Direct Connect provides consistent network performance and enhanced security.
Real-world analogy: Think of Direct Connect like having a private highway between your office and AWS. Instead of driving on public roads with traffic (internet), you have a dedicated lane that's always fast and reliable.
How it works (Detailed):
You order a Direct Connect connection through AWS Console. You choose a Direct Connect location (AWS facility) and connection speed (1 Gbps, 10 Gbps, or 100 Gbps for dedicated connections).
AWS provisions a port at the Direct Connect location. This is a physical network port in an AWS-managed facility.
You establish physical connectivity from your data center to the Direct Connect location. This is typically done through a telecommunications provider (cross-connect).
You create a Virtual Interface (VIF) on the Direct Connect connection. VIFs are logical connections that carry traffic:
Private VIF: Connects to VPCs via Virtual Private Gateway or Transit Gateway
Public VIF: Connects to AWS public services (S3, DynamoDB, etc.)
Transit VIF: Connects to Transit Gateway for multi-VPC access
You configure BGP (Border Gateway Protocol) to exchange routes between your network and AWS.
Traffic flows over the dedicated connection. Your on-premises resources can access AWS resources with consistent performance.
Detailed Example: Enterprise Hybrid Cloud with Direct Connect
Scenario: Large enterprise with on-premises data center needs to migrate 500TB of data to AWS and maintain ongoing hybrid connectivity for applications.
Requirements:
High bandwidth (10 Gbps)
Low latency (<10ms)
Consistent performance
Access to multiple VPCs
Redundancy for high availability
Implementation:
Order Direct Connect:
Location: Equinix DC2 (near your data center)
Speed: 10 Gbps
AWS provisions port
Establish Physical Connection:
Contract with telecom provider for cross-connect
Provider runs fiber from your data center to Equinix DC2
Provider connects to AWS port
Physical layer established
Create Transit VIF:
Connect Direct Connect to Transit Gateway
Configure BGP: Advertise on-premises routes (192.168.0.0/16) to AWS
AWS advertises VPC routes (10.0.0.0/8) to on-premises
BGP session established
Configure Transit Gateway:
Attach Direct Connect via Transit VIF
Attach 20 VPCs to Transit Gateway
Configure route tables for on-premises access
Test Connectivity:
From on-premises server, ping VPC resources
Latency: 5-8ms (excellent)
Bandwidth: 10 Gbps available
Jitter: <1ms (very stable)
Begin Data Migration:
Transfer 500TB over Direct Connect
Speed: ~10 Gbps sustained
Time: ~5 days (vs months over internet)
No internet bandwidth costs
Benefits Realized:
Consistent performance (no internet variability)
Lower latency (direct connection)
Cost savings (Direct Connect data transfer out rates are lower than internet rates)
Enhanced security (private connection)
Reliable for production workloads
Detailed Example: Direct Connect with Failover to VPN
Scenario: Company needs high availability for hybrid connectivity. Primary connection via Direct Connect, backup via VPN.
Setup:
Primary: Direct Connect (10 Gbps)
Backup: Site-to-Site VPN (1.25 Gbps max)
On-premises: 192.168.0.0/16
AWS VPCs: 10.0.0.0/8 (via Transit Gateway)
Implementation:
Configure Direct Connect (Primary):
Create Direct Connect connection
Create Transit VIF to Transit Gateway
BGP: Advertise routes without prepending (short AS path, so Direct Connect is preferred)
Configure VPN (Backup):
Create Site-to-Site VPN to Transit Gateway
BGP: Advertise same routes with longer AS path (lower preference)
VPN tunnels established over internet
BGP Configuration:
Direct Connect routes: AS path length 1 (preferred)
VPN routes: AS path length 3 (backup)
AWS prefers shorter AS path (Direct Connect)
Normal Operation:
All traffic flows over Direct Connect
VPN tunnels stay up but carry no traffic
Monitoring shows Direct Connect active
Failover Scenario:
Direct Connect fails (fiber cut, equipment failure)
BGP detects failure (30-90 seconds)
AWS removes Direct Connect routes
VPN routes become active
Traffic automatically fails over to VPN
Downtime: 30-90 seconds (BGP convergence time)
Recovery:
Direct Connect restored
BGP re-establishes
Direct Connect routes preferred again
Traffic fails back to Direct Connect
High Availability Achieved:
Primary path: Direct Connect (high bandwidth, low latency)
Backup path: VPN (lower bandwidth, higher latency, but available)
Automatic failover (no manual intervention)
Minimal downtime during failures
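The failover behavior above can be modeled as BGP's shortest-AS-path preference. This sketch ignores the many other best-path tie-breakers real BGP applies:

```python
# Among routes for the same prefix, prefer the shortest AS path; withdrawn
# routes (e.g. a failed Direct Connect) are ignored entirely.

def best_route(routes):
    """routes: list of (name, as_path_length, available) tuples."""
    candidates = [r for r in routes if r[2]]       # drop withdrawn routes
    return min(candidates, key=lambda r: r[1])[0]  # shortest AS path wins

# Normal operation: Direct Connect advertises AS path length 1, VPN length 3.
print(best_route([("direct-connect", 1, True), ("vpn", 3, True)]))   # direct-connect

# Fiber cut: the Direct Connect route is withdrawn, the VPN route takes over.
print(best_route([("direct-connect", 1, False), ("vpn", 3, True)]))  # vpn
```

Failback is the same logic in reverse: once the Direct Connect route reappears, its shorter AS path wins again automatically.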
✅ Must Know:
Direct Connect is NOT encrypted by default: You must use VPN over Direct Connect or application-level encryption for security
Speeds: 1 Gbps, 10 Gbps, or 100 Gbps dedicated; sub-1 Gbps via hosted connections
Pricing: Port hours + data transfer out (data transfer in is free)
✅ Use for: Hybrid applications with frequent AWS communication
❌ Don't use for temporary or low-bandwidth needs (use VPN instead)
AWS Site-to-Site VPN
What it is: AWS Site-to-Site VPN creates an encrypted connection between your on-premises network and AWS over the internet. It's a quick, cost-effective way to establish hybrid connectivity.
Why it exists: Not every organization needs Direct Connect's bandwidth and cost. VPN provides secure hybrid connectivity using existing internet connections, with fast setup and lower cost.
Real-world analogy: VPN is like using a secure tunnel through public roads. You're still using the internet (public roads), but your traffic is encrypted and protected (tunnel).
How it works:
You create a Customer Gateway in AWS, representing your on-premises VPN device
You create a Virtual Private Gateway (VGW) and attach it to your VPC, or use Transit Gateway
You create a Site-to-Site VPN connection between Customer Gateway and VGW/Transit Gateway
AWS provisions two VPN tunnels for redundancy (active/active or active/passive)
You configure your on-premises VPN device with tunnel parameters
IPsec tunnels establish over the internet
Traffic flows encrypted through the tunnels
Detailed Example: Quick Hybrid Connectivity with VPN
Scenario: Startup needs to connect on-premises office to AWS quickly for development and testing.
Requirements:
Fast setup (days, not weeks)
Low cost
Secure connectivity
Moderate bandwidth (100-200 Mbps)
Implementation:
On-Premises Setup:
VPN device: Cisco ASA, pfSense, or AWS-compatible router
Performance characteristics:
Bandwidth: Max 1.25 Gbps per tunnel (internet dependent)
Latency: Variable (depends on internet path)
Reliability: Depends on internet connection quality
✅ Must Know:
Two tunnels for redundancy: AWS always provisions two tunnels
Encrypted by default: IPsec encryption included
Maximum 1.25 Gbps per tunnel: Bandwidth limited
Pricing: $0.05/hour per connection + data transfer
Setup time: Minutes to hours (vs weeks for Direct Connect)
When to use:
✅ Quick setup needed (days, not weeks)
✅ Lower bandwidth requirements (<1 Gbps)
✅ Cost-sensitive deployments
✅ Backup for Direct Connect
❌ Don't use for high-bandwidth, latency-sensitive applications (use Direct Connect)
🎯 Exam Focus: Questions often ask you to choose between Direct Connect and VPN based on requirements (bandwidth, latency, cost, setup time).
Task 1.1 Summary
Key Takeaways:
VPC Peering: Direct connection between two VPCs, NOT transitive, use for 2-10 VPCs
Transit Gateway: Central hub for many VPCs, supports transitive routing, use for 10+ VPCs
PrivateLink: Service-level connectivity, use for accessing specific services privately
Direct Connect: Dedicated connection, high bandwidth, consistent performance, use for enterprise hybrid cloud
Site-to-Site VPN: Encrypted over internet, quick setup, lower cost, use for moderate bandwidth needs
Decision Framework:
Requirement → Solution
2-5 VPCs need to communicate → VPC Peering
10+ VPCs need to communicate → Transit Gateway
Access specific service in another VPC → PrivateLink
High bandwidth to on-premises (>1 Gbps) → Direct Connect
Quick hybrid connectivity (<1 Gbps) → Site-to-Site VPN
Access AWS services from private subnets → VPC Endpoints (PrivateLink)
Task 1.2: Prescribe Security Controls
This task covers implementing security controls at scale for enterprise AWS environments, including IAM, encryption, monitoring, and compliance.
Introduction
The problem: Enterprise security is complex. You have hundreds of AWS accounts, thousands of users, sensitive data across multiple services, compliance requirements (HIPAA, PCI-DSS, GDPR), and sophisticated threats. Traditional perimeter security doesn't work in the cloud. You need defense in depth, least privilege access, encryption everywhere, and continuous monitoring.
The solution: AWS provides comprehensive security services - IAM for access control, KMS for encryption, CloudTrail for auditing, Security Hub for centralized monitoring, GuardDuty for threat detection. The key is implementing these services correctly and at scale.
Why it's tested: Security is the #1 priority in AWS. Poor security leads to data breaches, compliance violations, and business disruption. As a Solutions Architect Professional, you must design secure architectures that protect data, control access, and meet compliance requirements.
Core Concepts
IAM (Identity and Access Management)
What it is: IAM controls who can access AWS resources (authentication) and what they can do (authorization). It's the foundation of AWS security.
Why it exists: Without access control, anyone could access your AWS resources. IAM ensures only authorized users and services can perform specific actions on specific resources.
Key Components:
Users: Individual people with AWS Console or API access
Groups: Collections of users with shared permissions
Roles: Temporary credentials for services or federated users
Policies: JSON documents defining permissions
How IAM Works (Detailed):
Principal (user, role, or service) makes a request to AWS
AWS authenticates the principal (verifies identity)
AWS evaluates policies attached to the principal
AWS checks resource policies (if any) on the target resource
AWS applies permission boundaries (if configured)
AWS makes decision: Allow or Deny
If allowed, action is performed; if denied, error returned
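Steps 3-7 boil down to three rules that come up constantly on the exam: explicit deny always wins, an explicit allow is required, and the default is deny. A minimal sketch of that decision (ignoring resources, conditions, and permission boundaries):

```python
# Simplified IAM decision: gather the Effects of statements matching the
# requested action, then apply deny-overrides-allow with implicit default deny.

def evaluate(policies, action):
    matched = [stmt["Effect"]
               for policy in policies
               for stmt in policy["Statement"]
               if action in stmt["Action"]]
    if "Deny" in matched:
        return "Deny"    # explicit deny always wins
    if "Allow" in matched:
        return "Allow"
    return "Deny"        # nothing matched: implicit default deny

policies = [
    {"Statement": [{"Effect": "Allow", "Action": ["s3:GetObject"]}]},
    {"Statement": [{"Effect": "Deny",  "Action": ["s3:DeleteBucket"]}]},
]
print(evaluate(policies, "s3:GetObject"))      # Allow
print(evaluate(policies, "s3:DeleteBucket"))   # Deny (explicit)
print(evaluate(policies, "ec2:RunInstances"))  # Deny (implicit - no matching allow)
```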
Detailed Example: Least Privilege IAM Policy
Scenario: Developer needs to deploy Lambda functions and read CloudWatch logs, but shouldn't access production databases or delete resources.
Only Lambda functions starting with "dev-" can be modified
Only CloudWatch logs for dev Lambda functions can be read
Can only pass specific execution role to Lambda
Cannot delete resources, access production, or modify IAM
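A policy meeting these requirements could be sketched as follows (the account ID, region, role name, and exact action lists are illustrative placeholders, not a definitive policy):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ManageDevLambdaOnly",
      "Effect": "Allow",
      "Action": [
        "lambda:CreateFunction",
        "lambda:UpdateFunctionCode",
        "lambda:UpdateFunctionConfiguration",
        "lambda:GetFunction"
      ],
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:dev-*"
    },
    {
      "Sid": "ReadDevLambdaLogs",
      "Effect": "Allow",
      "Action": [
        "logs:DescribeLogStreams",
        "logs:GetLogEvents",
        "logs:FilterLogEvents"
      ],
      "Resource": "arn:aws:logs:us-east-1:123456789012:log-group:/aws/lambda/dev-*:*"
    },
    {
      "Sid": "PassOnlyDevExecutionRole",
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::123456789012:role/dev-lambda-execution-role"
    }
  ]
}
```

Everything not listed (deleting functions, touching production resources, modifying IAM) falls under the implicit default deny.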
✅ Must Know:
Explicit Deny always wins: If any policy denies an action, it's denied regardless of allows
Default is Deny: If no policy explicitly allows an action, it's denied
Least Privilege: Grant minimum permissions needed, nothing more
Use Roles, not Users: For applications and services, always use IAM roles
MFA for sensitive operations: Require multi-factor authentication for critical actions
AWS KMS (Key Management Service)
What it is: KMS manages encryption keys used to encrypt data at rest and in transit. It provides centralized key management with audit trails.
Why it exists: Encryption is essential for data protection, but managing encryption keys is complex. KMS handles key generation, rotation, access control, and auditing.
How KMS Works:
You create a KMS key (formerly called Customer Master Key)
You define key policy controlling who can use the key
Service requests encryption (e.g., S3, RDS, EBS)
KMS generates data encryption key (DEK) using your KMS key
Service encrypts data with DEK
Service stores encrypted DEK with the data
For decryption, service sends encrypted DEK to KMS
KMS decrypts DEK (if caller has permission) and returns it
Service decrypts data with DEK
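The envelope-encryption flow in steps 4-9 can be demonstrated end to end. The XOR "cipher" below is a deliberately toy stand-in so the example runs with only the standard library - real KMS uses AES-256 inside hardware security modules, never anything like this:

```python
import hashlib
import secrets

def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream (toy only - NOT real crypto).

    XOR is symmetric, so applying the function twice decrypts.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

master_key = secrets.token_bytes(32)   # the KMS key: never leaves "KMS"
data_key = secrets.token_bytes(32)     # fresh data encryption key (DEK) per object
plaintext = b"sensitive customer record"

ciphertext = toy_cipher(data_key, plaintext)    # service encrypts data with DEK
wrapped_key = toy_cipher(master_key, data_key)  # "KMS" encrypts DEK with master key
stored = (ciphertext, wrapped_key)              # stored together, e.g. in S3

# Decryption: KMS unwraps the DEK (after a permission check), service decrypts.
recovered_key = toy_cipher(master_key, wrapped_key)
print(toy_cipher(recovered_key, ciphertext))  # b'sensitive customer record'
```

The design point: only the tiny data key ever crosses the KMS API, so large objects never do, and revoking access to the master key revokes access to every data key it wrapped.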
Detailed Example: S3 Encryption with KMS
Scenario: Store sensitive customer data in S3 with encryption, ensuring only authorized applications can decrypt.
Implementation:
Create KMS Key:
Key type: Symmetric
Key policy: Allow S3 service and specific IAM roles
Enable automatic key rotation (yearly)
Configure S3 Bucket:
Enable default encryption: SSE-KMS
Specify KMS key ARN
All new objects automatically encrypted
Upload Object:
Application uploads file to S3
S3 requests data key from KMS
KMS generates 256-bit AES key
S3 encrypts file with data key
S3 encrypts data key with KMS key
S3 stores encrypted file + encrypted data key
Download Object:
Application requests file from S3
S3 retrieves encrypted file + encrypted data key
S3 sends encrypted data key to KMS
KMS checks if caller has kms:Decrypt permission
If authorized, KMS decrypts data key and returns it
S3 decrypts file with data key
S3 returns plaintext file to application
Security Benefits:
Data encrypted at rest (protects against disk theft)
Centralized key management (one place to control access)
Audit trail (CloudTrail logs all KMS API calls)
Key rotation (automatic yearly rotation)
Access control (key policy + IAM policies)
✅ Must Know:
KMS keys never leave KMS: Keys are stored in hardware security modules (HSMs)
Envelope encryption: Data encrypted with data key, data key encrypted with KMS key
Key policies: Control who can use keys (separate from IAM policies)
Automatic rotation: Enable for yearly key rotation (old keys still work for decryption)
Regional service: KMS keys are regional by default, so you must create keys in each region (multi-Region keys are the exception, replicating key material across regions)
AWS CloudTrail
What it is: CloudTrail records all API calls made in your AWS account, providing a complete audit trail of who did what, when, and from where.
Why it exists: For security, compliance, and troubleshooting, you need to know what actions were taken in your AWS account. CloudTrail provides this visibility.
How CloudTrail Works:
User or service makes API call (e.g., create EC2 instance)
CloudTrail captures event with details: who, what, when, where, result
CloudTrail writes event to S3 (encrypted, immutable)
CloudTrail sends to CloudWatch Logs (optional, for real-time monitoring)
CloudTrail sends to EventBridge (optional, for automated responses)
Detailed Example: Security Incident Investigation
Scenario: Production database was deleted. Need to find who did it and when.
Investigation Steps:
Query CloudTrail:
Event: DeleteDBInstance
Time: 2024-10-08 14:32:15 UTC
User: arn:aws:iam::123456789012:user/john.smith
Source IP: 203.0.113.45
Result: Success
Analyze Context:
Check if IP is expected (company VPN range)
Check if user should have delete permissions
Check if MFA was used (should be required for delete)
Check for other suspicious activity from same user
Take Action:
Disable user's access immediately
Restore database from backup
Review IAM policies (why did user have delete permission?)
Implement MFA requirement for destructive operations
Set up CloudWatch alarm for future delete operations
Prevention:
Require MFA for sensitive operations
Implement least privilege (users shouldn't have delete permissions)
Set up real-time alerts for critical operations
Regular access reviews
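An automated version of this investigation is straightforward once CloudTrail events are parsed. The sketch below uses simplified, hypothetical sample records and an assumed corporate VPN range:

```python
import ipaddress

DESTRUCTIVE = {"DeleteDBInstance", "DeleteBucket", "TerminateInstances"}
CORPORATE_NET = ipaddress.ip_network("198.51.100.0/24")  # assumed VPN range

def suspicious_events(records):
    """Flag destructive API calls, noting any made from outside the VPN range."""
    flagged = []
    for rec in records:
        if rec["eventName"] in DESTRUCTIVE:
            src = ipaddress.ip_address(rec["sourceIPAddress"])
            flagged.append({**rec, "outsideCorporateNetwork": src not in CORPORATE_NET})
    return flagged

records = [  # simplified CloudTrail-style records (hypothetical data)
    {"eventName": "DeleteDBInstance", "userIdentity": "john.smith",
     "sourceIPAddress": "203.0.113.45"},
    {"eventName": "DescribeInstances", "userIdentity": "jane.doe",
     "sourceIPAddress": "198.51.100.10"},
]
for event in suspicious_events(records):
    print(event["eventName"], event["userIdentity"], event["outsideCorporateNetwork"])
# DeleteDBInstance john.smith True
```

In practice you would wire this kind of filter into EventBridge rather than scanning logs by hand, so the alert fires in near real time.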
✅ Must Know:
CloudTrail is regional: Must enable in each region (or use organization trail)
90-day retention: Events stored 90 days in CloudTrail console (longer in S3)
Immutable logs: Once written to S3, logs cannot be modified (use S3 Object Lock)
Management events vs Data events: Management events (API calls) free, data events (S3 object access) charged
Multi-account: Use AWS Organizations to enable CloudTrail across all accounts
AWS Security Hub
What it is: Security Hub provides a centralized view of security findings from multiple AWS services and third-party tools. It aggregates, organizes, and prioritizes security alerts.
Why it exists: Large AWS environments generate thousands of security findings from GuardDuty, Inspector, Macie, Config, and third-party tools. Security Hub consolidates these into a single dashboard with prioritization.
Automated remediation: Integrate with EventBridge and Lambda for auto-remediation
Multi-account: Master-member model for centralized monitoring
Pricing: $0.0010 per security check per month + $0.00003 per finding ingested
🎯 Exam Focus: Questions often test understanding of which security service to use for specific scenarios (IAM for access control, KMS for encryption, CloudTrail for auditing, Security Hub for centralized monitoring).
Task 1.2 Summary
Key Security Services:
IAM: Access control (who can do what)
KMS: Encryption key management
CloudTrail: API audit logging
Security Hub: Centralized security monitoring
GuardDuty: Threat detection
Security Best Practices:
Implement least privilege access
Enable MFA for sensitive operations
Encrypt data at rest and in transit
Enable CloudTrail in all regions
Use Security Hub for centralized monitoring
Automate security responses with EventBridge + Lambda
Task 1.3: Design Reliable and Resilient Architectures
Key Concepts
RTO and RPO
RTO (Recovery Time Objective): Maximum acceptable downtime
Example: "System must be back online within 4 hours"
Drives DR strategy selection and cost
RPO (Recovery Point Objective): Maximum acceptable data loss
Example: "Can lose maximum 15 minutes of data"
Drives backup frequency and replication strategy
Relationship to DR Strategies:
Strategy | RTO | RPO | Cost | Use Case
Backup & Restore | Hours to days | Hours | Lowest | Non-critical systems
Pilot Light | 10s of minutes | Minutes | Low | Cost-sensitive with moderate RTO
Warm Standby | Minutes | Seconds | Medium | Business-critical applications
Multi-Site Active-Active | Seconds | None | Highest | Mission-critical, zero downtime
Disaster Recovery Strategies
1. Backup and Restore (Lowest Cost, Highest RTO/RPO):
What: Regular backups to S3, restore when needed
RTO: Hours to days (time to provision infrastructure + restore data)
RPO: Hours (backup frequency)
Cost: Very low (only pay for S3 storage)
Use when: Non-critical systems, cost is primary concern
Example: Development environments, internal tools
2. Pilot Light (Low Cost, Moderate RTO/RPO):
What: Minimal infrastructure always running (database replication), scale up during disaster
RTO: 10s of minutes (time to scale up infrastructure)
RPO: Minutes (continuous replication)
Cost: Low (minimal infrastructure + replication)
Use when: Cost-sensitive but need faster recovery than backup/restore
Example: E-commerce site with moderate traffic
3. Warm Standby (Medium Cost, Low RTO/RPO):
What: Scaled-down version of full environment always running, scale up during disaster
RTO: Minutes (time to scale up to full capacity)
RPO: Seconds (real-time replication)
Cost: Medium (running infrastructure at reduced capacity)
Use when: Business-critical applications, can tolerate brief downtime
Multi-AZ database options:
Aurora: Multi-AZ by default, 6 copies across 3 AZs
DynamoDB: Multi-AZ by default, global tables for multi-region
✅ Must Know:
Always deploy across multiple AZs for production
Use Auto Scaling for automatic recovery
RDS Multi-AZ provides automatic failover
Aurora is more resilient than standard RDS
Task 1.4: Design Multi-Account AWS Environment
Key Concepts
AWS Organizations
What it is: Service for centrally managing multiple AWS accounts. Provides consolidated billing, policy-based management, and account organization.
Why it exists: Enterprises have many AWS accounts (50, 100, 500+) for different teams, applications, and environments. Organizations provides centralized management.
Key Features:
Organizational Units (OUs): Group accounts hierarchically
Service Control Policies (SCPs): Set permission guardrails across accounts
Consolidated Billing: Single bill for all accounts, volume discounts
Account Creation: Programmatically create new accounts
Typical OU Structure:
Root
├── Security OU
│   ├── Security Tooling Account
│   └── Log Archive Account
├── Infrastructure OU
│   ├── Network Account
│   └── Shared Services Account
├── Production OU
│   ├── Prod App 1 Account
│   └── Prod App 2 Account
├── Development OU
│   ├── Dev App 1 Account
│   └── Dev App 2 Account
└── Sandbox OU
    ├── Sandbox 1 Account
    └── Sandbox 2 Account
Service Control Policies (SCPs):
Permission boundaries applied to OUs or accounts
Define maximum permissions (what's allowed)
Cannot grant permissions (only restrict)
Applied hierarchically (parent OU policies affect child accounts)
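A common region-restriction SCP illustrating these properties might look like the following sketch (the NotAction list exempts global services and is usually longer in real deployments):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyOutsideApprovedRegions",
      "Effect": "Deny",
      "NotAction": [
        "iam:*",
        "organizations:*",
        "sts:*",
        "support:*"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": ["us-east-1", "us-west-2"]
        }
      }
    }
  ]
}
```

Note this only denies; it grants nothing by itself - identities in the account still need IAM policies that allow their actions within the approved regions.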
Result: Accounts in this OU can only use us-east-1 and us-west-2 regions.
✅ Must Know:
SCPs don't grant permissions, only restrict
SCPs apply to all users and roles in account (including root)
Use OUs to group accounts by function, environment, or team
Consolidated billing provides volume discounts
AWS Control Tower
What it is: Service that automates setup of multi-account AWS environment based on best practices. Provides guardrails, account factory, and dashboard.
Why it exists: Setting up Organizations, OUs, SCPs, logging, and security manually is complex and error-prone. Control Tower automates this with best practices built-in.
High Availability: Always deploy across multiple AZs, use Auto Scaling, implement appropriate DR strategy based on RTO/RPO
Multi-Account Strategy: Use Organizations for centralized management, SCPs for guardrails, Control Tower for automated setup
Cost Optimization: Right-size resources, use Auto Scaling, purchase RIs/Savings Plans for steady workloads, implement tagging strategy
Self-Assessment Checklist
Before moving to Domain 2, ensure you can:
Explain when to use VPC Peering vs Transit Gateway vs PrivateLink
Describe how Direct Connect and VPN differ and when to use each
Design IAM policies following least privilege principle
Explain how KMS envelope encryption works
Choose appropriate DR strategy based on RTO/RPO requirements
Design multi-AZ and multi-region architectures
Explain AWS Organizations structure and SCPs
Describe cross-account access patterns
Choose appropriate EC2 purchasing options based on workload
Implement cost allocation using tags
Practice Questions
Test your Domain 1 knowledge:
Domain 1 Bundle 1: Questions 1-50 (target: 70%+)
Domain 1 Bundle 2: Questions 1-50 (target: 75%+)
Domain 1 Bundle 3: Questions 1-50 (target: 75%+)
If you scored below 70%:
Review sections where you struggled
Focus on ✅ Must Know items
Practice drawing architecture diagrams
Review decision frameworks (when to use which service)
Quick Reference Card
Network Connectivity Decision Matrix:
| Scenario | Solution |
| --- | --- |
| 2-5 VPCs | VPC Peering |
| 10+ VPCs | Transit Gateway |
| Service access | PrivateLink |
| High-bandwidth hybrid | Direct Connect |
| Quick hybrid setup | Site-to-Site VPN |
DR Strategy Selection:
| RTO | RPO | Strategy |
| --- | --- | --- |
| Days | Hours | Backup & Restore |
| Minutes | Minutes | Pilot Light |
| Minutes | Seconds | Warm Standby |
| Seconds | None | Multi-Site Active-Active |
Security Services:
Access Control: IAM
Encryption: KMS
Audit Logging: CloudTrail
Centralized Monitoring: Security Hub
Threat Detection: GuardDuty
Cost Optimization:
Monitoring: Cost Explorer, Budgets
Steady Workloads: RIs, Savings Plans
Variable Workloads: Auto Scaling
Fault-Tolerant: Spot Instances
Storage: S3 Intelligent-Tiering
Next Steps: You've completed Domain 1 (26% of exam). Continue to Domain 2 (Design for New Solutions) in file 03_domain_2_new_solutions.
💡 Tip: Domain 1 is the largest domain. Take a break, review your notes, and practice with Domain 1 bundles before moving forward.
Chapter 2: Design for New Solutions
Domain Weight: 29% of exam (highest weight)
Chapter Overview
This domain focuses on designing new AWS solutions from scratch. You'll learn deployment strategies, business continuity planning, security controls, reliability patterns, performance optimization, and cost optimization for new applications.
What you'll learn:
Design deployment strategies using IaC, CI/CD, and automation
Ensure business continuity with appropriate DR strategies
Determine security controls based on requirements
Design solutions meeting reliability requirements
Meet performance objectives through proper architecture
Optimize costs while meeting solution goals
Time to complete: 12-15 hours (largest domain by weight)
Deployment: Use IaC for repeatable deployments, Blue/Green for zero-downtime, Canary for risk reduction
Business Continuity: Active-Active for lowest RTO, Aurora Global Database for global applications
Security: Implement defense in depth, encrypt data at rest and in transit, monitor continuously
Reliability: Decouple with SQS/SNS, use Auto Scaling, implement multi-AZ deployments
Performance: Cache aggressively (CloudFront, ElastiCache), use Read Replicas for read scaling
Cost: Right-size resources, use S3 Intelligent-Tiering, leverage Graviton instances
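The IaC point above can be made concrete with a minimal CloudFormation template showing the standard top-level sections (a sketch; the parameter and resource names are illustrative):

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: Minimal template illustrating the standard sections

Parameters:
  Environment:
    Type: String
    AllowedValues: [dev, prod]
    Default: dev

Resources:
  AppBucket:                        # logical ID, referenced elsewhere in the template
    Type: AWS::S3::Bucket
    Properties:
      Tags:
        - Key: Environment
          Value: !Ref Environment   # intrinsic function resolving the parameter

Outputs:
  BucketName:
    Value: !Ref AppBucket           # for an S3 bucket, !Ref returns the bucket name
```

Every template follows this shape: Parameters take input at deploy time, Resources declare what to create, and Outputs export values for other stacks or operators.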
Self-Assessment Checklist
Explain CloudFormation template structure
Describe Blue/Green vs Canary deployment
Design multi-region Active-Active architecture
Implement defense in depth security
Choose appropriate caching strategy
Optimize costs using right-sizing and storage classes
Practice Questions
Domain 2 Bundle 1: Questions 1-50 (target: 70%+)
Domain 2 Bundle 2: Questions 1-50 (target: 75%+)
Domain 2 Bundle 3: Questions 1-50 (target: 75%+)
Next Steps: Continue to Domain 3 (Continuous Improvement) in file 04_domain_3_continuous_improvement.
Chapter 3: Continuous Improvement for Existing Solutions
Domain Weight: 25% of exam
Chapter Overview
This domain focuses on improving existing AWS solutions. You'll learn how to enhance operational excellence, security, performance, reliability, and cost efficiency of running systems.
What you'll learn:
Improve operational excellence through automation and monitoring
Reliability: Eliminate SPOFs, test failures, implement multi-AZ/multi-region
Cost: Right-size resources, use Spot/RIs, implement tagging, regular reviews
Self-Assessment Checklist
Set up CloudWatch monitoring and alarms
Implement automated remediation with EventBridge + Lambda
Use Secrets Manager for credential rotation
Identify and eliminate single points of failure
Implement caching for performance improvement
Right-size resources for cost optimization
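The EventBridge + Lambda remediation item above can be sketched as a handler. The event shape below mirrors an AWS Config compliance-change notification, and the actual remediation call is left as a comment because it requires boto3 and IAM permissions; the field names are illustrative assumptions:

```python
def handler(event, context):
    """Remediate a noncompliant resource reported via EventBridge.

    Assumes the event carries the resource ID at detail.resourceId,
    as AWS Config compliance-change events do.
    """
    instance_id = event["detail"]["resourceId"]
    # Real remediation would call AWS here, for example:
    # boto3.client("ec2").stop_instances(InstanceIds=[instance_id])
    return {"remediated": instance_id}

# Local smoke test with a trimmed-down sample event
sample_event = {"detail": {"resourceId": "i-0123456789abcdef0"}}
print(handler(sample_event, None)["remediated"])  # i-0123456789abcdef0
```

The same pattern (rule matches event, Lambda acts) drives most automated-remediation setups; only the event source and the API call in the middle change.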
Practice Questions
Domain 3 Bundle 1: Questions 1-50 (target: 70%+)
Domain 3 Bundle 2: Questions 1-50 (target: 75%+)
Next Steps: Continue to Domain 4 (Migration & Modernization) in file 05_domain_4_migration_modernization.
Chapter 4: Accelerate Workload Migration and Modernization
Domain Weight: 20% of exam
Chapter Overview
This domain focuses on migrating existing workloads to AWS and modernizing applications. You'll learn migration strategies, tools, and modernization patterns.
What you'll learn:
Select workloads for migration
Determine optimal migration approach
Design new architectures for migrated workloads
Identify modernization opportunities
Time to complete: 8-10 hours
Prerequisites: Chapters 0-3
Exam Weight: 20% (approximately 13 questions on the actual exam)
Task 4.1: Select Workloads for Migration
Key Concepts
The 7 Rs of Migration
1. Retire:
Decommission applications no longer needed
Savings: Eliminate costs entirely
Use for: Redundant, unused applications
2. Retain:
Keep on-premises (not ready for migration)
Reasons: Compliance, latency, cost
Use for: Applications requiring on-premises
3. Rehost (Lift and Shift):
Move to AWS without changes
Speed: Fastest migration
Use for: Quick migration, minimal risk
4. Relocate:
Move to AWS with minimal changes (VMware Cloud on AWS)
Speed: Fast, automated
Use for: VMware environments
5. Repurchase:
Replace with SaaS
Example: Exchange → Microsoft 365
Use for: Standard business applications
6. Replatform (Lift, Tinker, and Shift):
Minor optimizations during migration
Example: Self-managed database → RDS
Use for: Gain cloud benefits with minimal changes
7. Refactor/Re-architect:
Redesign application for cloud-native
Example: Monolith ā microservices
Use for: Maximum cloud benefits, long-term
✅ Must Know:
Rehost: Fastest, least benefit
Replatform: Balance of speed and benefit
Refactor: Slowest, most benefit
Choose based on business priorities
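As a memory aid, the decision framework above can be sketched as an ordered lookup. The flag names and checking order are a simplification invented for illustration, not official AWS guidance:

```python
def choose_strategy(unused=False, must_stay_on_prem=False, saas_available=False,
                    vmware_estate=False, full_redesign_planned=False,
                    minor_tuning_ok=False):
    """Pick one of the 7 Rs, checking the cheapest outcomes first."""
    if unused:
        return "Retire"        # nothing to migrate at all
    if must_stay_on_prem:
        return "Retain"        # compliance or latency keeps it on-premises
    if saas_available:
        return "Repurchase"    # replace with SaaS
    if vmware_estate:
        return "Relocate"      # VMware Cloud on AWS
    if full_redesign_planned:
        return "Refactor"      # slowest, most benefit
    if minor_tuning_ok:
        return "Replatform"    # lift, tinker, and shift
    return "Rehost"            # default: fastest lift and shift

print(choose_strategy(minor_tuning_ok=True))  # Replatform
```

Real portfolios are triaged per application, but walking the list in roughly this order is a common way to reason through exam scenarios.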
Migration Assessment
AWS Migration Hub:
Centralized migration tracking
Discover on-premises resources
Track migration progress
Use for: Migration planning and tracking
AWS Application Discovery Service:
Discover on-premises applications
Map dependencies
Collect utilization data
Use for: Migration planning
Migration Evaluator:
Analyze on-premises environment
Create business case for migration
TCO analysis
Use for: Business case development
✅ Must Know:
Discovery before migration
Map dependencies
Create business case
Track migration progress
Task 4.2: Determine Migration Approach
Key Concepts
Data Migration Tools
AWS DataSync:
Automated data transfer
On-premises to AWS (S3, EFS, FSx)
Up to 10 Gbps per agent
Use for: Large-scale file migrations
AWS Transfer Family:
SFTP, FTPS, FTP to S3
Managed service
Existing workflows
Use for: Partner file transfers
AWS Snow Family:
Snowcone: 8 TB, edge computing
Snowball Edge: 80 TB, compute capable
Snowmobile: up to 100 PB per unit, for exabyte-scale migrations
Use for: Offline data transfer, limited bandwidth
Database Migration Service (DMS):
Migrate databases to AWS
Homogeneous (Oracle → Oracle) or heterogeneous (Oracle → Aurora)
Continuous replication
Use for: Database migrations
✅ Must Know:
DataSync for file migrations
Snow Family for offline transfer
DMS for database migrations
Choose based on data size and bandwidth
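"Choose based on data size and bandwidth" comes down to simple arithmetic: if pushing the data over the network would take weeks, the Snow Family wins. A sketch of the estimate (the 80% link-utilization factor is an assumption):

```python
def transfer_days(data_tb, link_gbps, utilization=0.8):
    """Days needed to move data_tb terabytes over a link_gbps connection."""
    bits = data_tb * 1e12 * 8                        # decimal TB -> bits
    seconds = bits / (link_gbps * 1e9 * utilization) # effective throughput
    return seconds / 86400

# 100 TB over a dedicated 1 Gbps link:
print(round(transfer_days(100, 1), 1))  # 11.6 days -> consider Snowball Edge
```

The same calculation at 10 Gbps gives just over a day, which is why high-bandwidth sites rarely need Snow devices for this volume.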
Application Migration Tools
AWS Application Migration Service (MGN):
Automated lift-and-shift
Continuous replication
Minimal downtime
Use for: Server migrations
AWS Server Migration Service (SMS):
Automated VM migration (legacy, use MGN instead)
Incremental replication
Use for: Legacy migrations (prefer MGN)
CloudEndure Migration:
Continuous replication
Automated conversion
Use for: Large-scale migrations (now part of MGN)
✅ Must Know:
MGN for server migrations (preferred)
Continuous replication minimizes downtime
Automated conversion to AWS formats
Test migrations before cutover
Task 4.3: Determine New Architecture
Key Concepts
Compute Modernization
EC2 → Containers:
Package applications in containers
Use ECS or EKS
Benefits: Portability, efficiency, scalability
EC2 → Serverless:
Migrate to Lambda
Event-driven architecture
Benefits: No server management, pay per use
Monolith → Microservices:
Break into smaller services
Independent deployment
Benefits: Scalability, agility, resilience
✅ Must Know:
Containers for portability
Serverless for event-driven workloads
Microservices for scalability
Choose based on application characteristics
Storage Modernization
File Servers → EFS/FSx:
Managed file systems
Elastic capacity
Benefits: No management, high availability
Block Storage → EBS/S3:
EBS for databases, applications
S3 for objects, backups
Benefits: Durability, scalability
Tape Backups → S3 Glacier:
Cloud-based archive
Lower cost than tape
Benefits: Durability, accessibility
✅ Must Know:
EFS for shared file storage
FSx for Windows/Lustre workloads
S3 for object storage
Glacier for archive
Database Modernization
Self-Managed → RDS/Aurora:
Managed database service
Automated backups, patching
Benefits: Reduced management, high availability
Relational → NoSQL:
DynamoDB for key-value
DocumentDB for documents
Benefits: Scale, performance, flexibility
Commercial → Open Source:
Oracle → Aurora PostgreSQL
SQL Server → Aurora MySQL
Benefits: Cost savings, no licensing
✅ Must Know:
RDS/Aurora for managed relational
DynamoDB for NoSQL
Aurora for high performance
Consider licensing costs
Task 4.4: Modernization Opportunities
Key Concepts
Serverless Adoption
Lambda Functions:
Event-driven compute
No server management
Pay per request
Use for: APIs, data processing, automation
API Gateway:
Managed API service
Throttling, caching, authentication
Use for: RESTful APIs, WebSocket APIs
Step Functions:
Orchestrate Lambda functions
Visual workflows
Use for: Complex workflows, long-running processes
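Step Functions workflows are defined in Amazon States Language (JSON). A two-step sketch chaining two Lambda tasks; the function ARNs and state names are placeholders:

```json
{
  "Comment": "Illustrative two-step workflow",
  "StartAt": "ProcessOrder",
  "States": {
    "ProcessOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ProcessOrder",
      "Next": "NotifyCustomer"
    },
    "NotifyCustomer": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:NotifyCustomer",
      "End": true
    }
  }
}
```

Each state's output becomes the next state's input, which is what makes Step Functions a good fit for orchestrating multi-step processes without custom glue code.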