43% Lower AWS Bills

A Cloud Cost Engineering Playbook Built From Real $380K/Year Optimization Engagements

Bhakta Bahadur Thapa

AI Cloud DevOps

January 17, 2026
Tags: AWS, FinOps, Cost Optimization, Cloud Architecture, Terraform

The first thing we do on every cloud cost engagement is run a simple query: list all EC2 instances where average CPU utilization over the last 30 days was under 5%. Last month, this query returned 47 instances for a Series B FinTech. Running cost: $23,400/month. Some had been running since 2021. Nobody knew what they were for. Total waste identified in the first 48 hours of that engagement: $31,800/month.

Cloud sprawl is not a technology problem. It is an organizational problem with a technology solution. Engineers provision resources quickly. Decommissioning is friction. Nobody owns the AWS bill at a granular level. Over time, waste compounds. The average company wastes 32% of its cloud spend (Flexera 2025 State of the Cloud Report). For a company spending $1M/year on cloud, that's $320K a year you're lighting on fire.

Before You Optimize

Never optimize without observability. You need to know what each dollar is buying before you cut it. Resource tagging, cost allocation tags, and AWS Cost Anomaly Detection must be in place before you make changes. Blind cost cutting breaks production.

The 7-Layer Cloud Cost Framework

Layer 1: Zombie Resource Elimination

Idle EC2 instances, unattached EBS volumes, orphaned Elastic IPs, forgotten RDS instances in stopped state (you still pay for storage), and test environments left running over weekends are almost always the fastest wins. Use AWS Trusted Advisor's cost optimization checks and Cost Explorer's rightsizing recommendations to surface these automatically, and a tool like Infracost to put a price on Terraform changes before they ship.

Layer 2: Right-Sizing

Over-provisioning is the default behavior of every engineering team that has ever been paged for an out-of-memory (OOM) kill. Right-sizing requires at least 30 days of CloudWatch metrics, an understanding of peak vs average utilization, and the organizational courage to reduce instance sizes. AWS Compute Optimizer does the analysis. Implementing its recommendations typically yields 15–30% savings.
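The peak-versus-average distinction can be made concrete with a small filter. This is a minimal sketch, not Compute Optimizer's actual logic: `rightsize_check` is a hypothetical helper that reads one CPU utilization sample per line and flags a downsize candidate only when both the average and the peak stay under thresholds. The 20% and 40% defaults are illustrative assumptions.

```shell
#!/bin/bash
# rightsize_check [AVG_LIMIT] [PEAK_LIMIT]
# Reads CPU utilization percentages (one per line) from stdin and prints
# the average, the peak, and a verdict.
# The default thresholds are illustrative assumptions, not AWS guidance.
rightsize_check() {
  local avg_limit="${1:-20}" peak_limit="${2:-40}"
  awk -v avg_limit="$avg_limit" -v peak_limit="$peak_limit" '
    BEGIN { peak = 0 }
    NF { sum += $1; n++; if ($1 + 0 > peak) peak = $1 + 0 }
    END {
      if (n == 0) { print "no-data"; exit }
      avg = sum / n
      # A low average alone is not enough: a high peak means the
      # headroom is actually being used.
      verdict = (avg < avg_limit && peak < peak_limit) ? "downsize-candidate" : "keep"
      printf "avg=%.1f peak=%.1f %s\n", avg, peak, verdict
    }'
}
```

For example, `printf '5\n10\n90\n' | rightsize_check` reports `keep`: the instance looks quiet on average, but the 90% peak says the capacity is needed.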

Layer 3: Savings Plans & Reserved Instances

For stable, predictable workloads (your core application servers, primary databases), Compute Savings Plans provide up to a 66% discount over On-Demand in exchange for a 1- or 3-year commitment. This is not complex. It requires baseline utilization data and a purchasing decision. In practice, most companies delay this because it requires cross-team budget conversations. Don't let that stop you: the savings are enormous.
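The purchasing decision comes down to simple arithmetic. A commitment is billed whether or not it is used, so at discount d it pays off only if the committed capacity is used more than (100 - d)% of the time. The sketch below works through that; the function names and the sample figures are illustrative assumptions, not numbers from any engagement.

```shell
#!/bin/bash
# sp_breakeven DISCOUNT_PCT
# Minimum utilization (%) of the committed capacity at which a
# commitment costs less than paying On-Demand for what you actually use.
sp_breakeven() {
  awk -v d="$1" 'BEGIN { printf "%.0f\n", 100 - d }'
}

# sp_net_saving OD_VALUE DISCOUNT_PCT UTILIZATION_PCT
# OD_VALUE is the monthly On-Demand price of the capacity you commit to.
# Net saving = what the used portion would have cost On-Demand
#              minus the discounted commitment, which is billed regardless.
sp_net_saving() {
  awk -v c="$1" -v d="$2" -v u="$3" \
    'BEGIN { printf "%.2f\n", c * u / 100 - c * (1 - d / 100) }'
}
```

With a hypothetical 30% discount on $10,000/month of On-Demand capacity, `sp_net_saving 10000 30 100` gives a $3,000 monthly saving, while `sp_net_saving 10000 30 60` gives a $1,000 loss. This is why the commitment should be sized to the baseline, not the peak.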

bash
#!/bin/bash
# cloud-cost-audit.sh — A starting point for AWS cost archaeology

# 1. Find idle EC2 instances (< 5% CPU last 30 days)
# Step 1a: list running instances; these are the candidates to check.
echo "=== Running EC2 Instances (idle candidates) ==="
aws ec2 describe-instances \
  --filters Name=instance-state-name,Values=running \
  --query 'Reservations[].Instances[].[InstanceId,InstanceType,Tags[?Key==`Name`].Value|[0]]' \
  --output table

# Step 1b: cross-reference each candidate with CloudWatch; script this for scale.
# i-XXXXXXXXX is a placeholder instance ID. An average under 5 means idle.
# Note: `date -d` is GNU date syntax (Linux).
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-XXXXXXXXX \
  --start-time "$(date -u -d '30 days ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 2592000 \
  --statistics Average \
  --output json | jq '.Datapoints[0].Average'

# 2. Unattached EBS volumes
echo "=== Unattached EBS Volumes ==="
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[*].[VolumeId,Size,CreateTime,Tags[?Key==`Name`].Value|[0]]' \
  --output table

# 3. Orphaned Elastic IPs
echo "=== Unattached Elastic IPs ==="
aws ec2 describe-addresses \
  --query 'Addresses[?AssociationId==null].[PublicIp,AllocationId]' \
  --output table

# Each Elastic IP costs about $3.65/month ($0.005/hour). Sounds small. At scale it adds up.
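The CloudWatch cross-reference in step 1 checks one instance at a time. A minimal sketch of scripting it for scale follows, assuming GNU `date`, a configured AWS CLI, and daily datapoints averaged locally in place of the single 30-day period used above. The function names are hypothetical.

```shell
#!/bin/bash
# mean_of: average the numeric tokens on stdin; prints nothing if there are none
# (so an instance with no datapoints is never reported as idle).
mean_of() {
  awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^[0-9.eE+-]+$/) { sum += $i; n++ } }
       END { if (n) printf "%.2f\n", sum / n }'
}

# is_idle AVG THRESHOLD: succeeds only when AVG is non-empty and below THRESHOLD.
is_idle() {
  awk -v avg="$1" -v limit="$2" 'BEGIN { exit !(avg != "" && avg + 0 < limit + 0) }'
}

# scan_idle_instances [THRESHOLD]: print "instance-id mean-cpu" for every
# running instance whose 30-day mean CPU is under THRESHOLD (default 5).
scan_idle_instances() {
  local threshold="${1:-5}" start end id avg
  start="$(date -u -d '30 days ago' +%Y-%m-%dT%H:%M:%SZ)"
  end="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  for id in $(aws ec2 describe-instances \
      --filters Name=instance-state-name,Values=running \
      --query 'Reservations[].Instances[].InstanceId' --output text); do
    # One datapoint per day; the mean of daily averages approximates
    # the 30-day average.
    avg="$(aws cloudwatch get-metric-statistics \
      --namespace AWS/EC2 --metric-name CPUUtilization \
      --dimensions Name=InstanceId,Value="$id" \
      --start-time "$start" --end-time "$end" \
      --period 86400 --statistics Average \
      --query 'Datapoints[].Average' --output text | mean_of)"
    if is_idle "$avg" "$threshold"; then
      echo "$id $avg"
    fi
  done
}
```

Calling `scan_idle_instances 5` prints the candidates for the current region. Treat the output as a list of questions to ask owners, not a kill list, since low CPU alone does not prove an instance is unused.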

Layer 4: Spot Instances for Non-Critical Workloads

Spot instances offer up to 90% savings off On-Demand pricing for workloads that can tolerate interruption. CI/CD runners, batch processing, dev environments, and data pipeline workers are all excellent Spot candidates. With proper instance diversification and Spot interruption handling, Spot is highly available for these use cases.
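Interruption handling usually means watching for the two-minute notice that EC2 publishes through the instance metadata service. The sketch below polls that endpoint from the Spot instance itself using IMDSv2. `drain_workload` is a hypothetical hook standing in for your own shutdown logic, and the 5-second poll interval is an assumption.

```shell
#!/bin/bash
# Spot interruption watcher sketch (runs on the Spot instance itself).
IMDS="http://169.254.169.254/latest"

# interruption_pending STATUS
# The spot/instance-action path answers 200 only once an interruption
# has been scheduled; before that it answers 404.
interruption_pending() {
  [ "$1" = "200" ]
}

# Hypothetical hook: replace with real drain logic (stop taking jobs,
# checkpoint state, deregister from the load balancer, and so on).
drain_workload() {
  echo "interruption notice received; draining" >&2
}

watch_for_interruption() {
  local token status
  while true; do
    # IMDSv2: fetch a short-lived session token, then query with it.
    token="$(curl -s -X PUT "$IMDS/api/token" \
      -H 'X-aws-ec2-metadata-token-ttl-seconds: 60')"
    status="$(curl -s -o /dev/null -w '%{http_code}' \
      -H "X-aws-ec2-metadata-token: $token" \
      "$IMDS/meta-data/spot/instance-action")"
    if interruption_pending "$status"; then
      drain_workload
      return 0
    fi
    sleep 5
  done
}
```

Run `watch_for_interruption` as a background service on each Spot worker. Whatever the drain hook does has to finish inside the notice window.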

Layer 5: Data Transfer Optimization

AWS charges for data leaving its network, and it also charges for traffic that crosses Availability Zones inside a region. Cross-AZ data transfer costs $0.01/GB in each direction, so every gigabyte that crosses an AZ boundary is billed twice, once on the way out and once on the way in. That is trivial per request and catastrophic at scale. Audit your inter-AZ traffic. Many Kubernetes services use AZ-unaware load balancing, meaning any request might traverse an AZ boundary. Topology-aware routing and AZ-pinning for appropriate workloads typically save 10–20% on networking costs.
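To see how the per-gigabyte rate compounds, here is a small worked calculation. The function name and the traffic volume are illustrative assumptions; the rate defaults to the $0.01/GB figure quoted above and is charged on both sides of the boundary.

```shell
#!/bin/bash
# cross_az_monthly_cost GB_PER_MONTH [RATE_PER_GB]
# Cross-AZ traffic is billed on both sides of the boundary,
# so the effective cost is twice the per-direction rate.
cross_az_monthly_cost() {
  awk -v gb="$1" -v rate="${2:-0.01}" 'BEGIN { printf "%.2f\n", gb * rate * 2 }'
}
```

A hypothetical service mesh pushing 50 TB a month across AZ boundaries gives `cross_az_monthly_cost 50000`, which works out to $1,000 a month for traffic that never leaves the region.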

Layer 6: Storage Lifecycle Policies

S3 Standard costs about $0.023/GB per month. S3 Glacier costs about $0.004/GB per month. Your application logs from 2022 are almost certainly still in S3 Standard because no one set a lifecycle policy. Intelligent-Tiering, lifecycle transitions, and automated expiry of old artifacts (build artifacts, old backups, log archives) typically save 15–25% of storage costs at mid-size companies.
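A lifecycle policy is a short JSON document attached to the bucket. The sketch below writes one that moves objects under a `logs/` prefix to Glacier after 90 days and deletes them after two years. The prefix, the day counts, the rule ID, and the bucket name are all illustrative assumptions to adapt to your own retention requirements.

```shell
#!/bin/bash
# write_lifecycle_policy FILE
# Writes an S3 lifecycle configuration: transition to Glacier at 90 days,
# expire at 730 days. All values here are illustrative assumptions.
write_lifecycle_policy() {
  cat > "$1" <<'EOF'
{
  "Rules": [
    {
      "ID": "archive-then-expire-logs",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 730 }
    }
  ]
}
EOF
}

# apply_lifecycle_policy BUCKET FILE
# Attaches the policy to the bucket (replaces any existing lifecycle rules).
apply_lifecycle_policy() {
  aws s3api put-bucket-lifecycle-configuration \
    --bucket "$1" \
    --lifecycle-configuration "file://$2"
}
```

Typical use would be `write_lifecycle_policy lifecycle.json` followed by `apply_lifecycle_policy my-log-bucket lifecycle.json`, where `my-log-bucket` is a placeholder. Because the call replaces the bucket's existing rules, review what is already there first.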

Layer 7: Cost Governance Culture

The most sustainable cost optimization is cultural. Teams that see their infrastructure costs, that have per-team cost dashboards, and that include infrastructure cost in their engineering metrics spend 30–40% less than teams that don't. The technology is easy. The culture change is hard and essential.
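A per-team cost view can start as a few lines of shell before it becomes a dashboard. The sketch below assumes resources carry a `team` tag that has been activated as a cost allocation tag. The tag key and function names are hypothetical. It pulls one month of spend grouped by that tag from Cost Explorer and ranks teams by share.

```shell
#!/bin/bash
# cost_by_team START END
# Dates are YYYY-MM-DD; END is exclusive. Emits "team$<name><TAB><amount>"
# lines, one per tag value (an empty name means untagged spend).
cost_by_team() {
  aws ce get-cost-and-usage \
    --time-period "Start=$1,End=$2" \
    --granularity MONTHLY \
    --metrics UnblendedCost \
    --group-by Type=TAG,Key=team \
    --query 'ResultsByTime[0].Groups[].[Keys[0], Metrics.UnblendedCost.Amount]' \
    --output text
}

# team_shares: read the lines above on stdin and print
# "<team><TAB><cost><TAB><share>" sorted by descending cost.
team_shares() {
  awk -F'\t' '
    {
      name = $1
      sub(/^team\$/, "", name)          # strip the "team$" key prefix
      if (name == "") name = "untagged"
      cost[name] += $2
      total += $2
    }
    END {
      for (t in cost)
        printf "%s\t%.2f\t%.1f%%\n", t, cost[t], (total ? 100 * cost[t] / total : 0)
    }' | sort -t "$(printf '\t')" -k2,2nr
}
```

Something like `cost_by_team 2026-01-01 2026-02-01 | team_shares` gives each team a number to own. A large `untagged` row is itself a finding: it measures how much of the bill nobody is accountable for.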

Real Numbers

Across our last 8 cloud cost engagements: average waste identified = 38% of monthly spend. Average savings achieved within 90 days = 28%. Average annual value delivered = $240K. Largest single engagement: $1.1M/year savings across a 47-service AWS environment.
