Terraform Best Practices for AWS Infrastructure


After working with Terraform in production environments for several years, I’ve learned that following best practices from the start can save you countless hours of debugging and refactoring later. Here are the essential patterns and practices I recommend for managing AWS infrastructure with Terraform.

State Management: The Foundation

One of the most critical decisions you’ll make is how to manage your Terraform state. Never store state files locally in production environments.

Remote State with S3 and DynamoDB

terraform {
  backend "s3" {
    bucket         = "your-terraform-state-bucket"
    key            = "environments/production/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-locks"
  }
}

Why this matters: Remote state enables team collaboration and provides state locking to prevent concurrent modifications that could corrupt your infrastructure.
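The backend block assumes the bucket and lock table already exist; you typically bootstrap them once, outside the configurations that use them. A minimal sketch (names are placeholders matching the backend above):

```hcl
# Bootstrap resources for remote state -- names are placeholders.
resource "aws_s3_bucket" "terraform_state" {
  bucket = "your-terraform-state-bucket"
}

# Versioning lets you recover earlier state files if one is corrupted
resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-state-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID" # the attribute name Terraform's S3 backend expects

  attribute {
    name = "LockID"
    type = "S"
  }
}
```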

Module Organization: Think in Components

Structure your Terraform code using modules that represent logical infrastructure components:

├── modules/
│   ├── vpc/
│   ├── eks-cluster/
│   ├── rds/
│   └── monitoring/
├── environments/
│   ├── dev/
│   ├── staging/
│   └── production/
└── shared/
    └── data-sources/
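With this layout, each environment composes the modules it needs. A sketch of what environments/production/main.tf might look like (the input values are illustrative, not from the article):

```hcl
# environments/production/main.tf -- illustrative wiring
module "vpc" {
  source = "../../modules/vpc"

  environment     = "production"
  cidr_block      = "10.0.0.0/16"
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  common_tags     = local.common_tags
}
```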

Example VPC Module

# modules/vpc/main.tf
resource "aws_vpc" "main" {
  cidr_block           = var.cidr_block
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = merge(var.common_tags, {
    Name = "${var.environment}-vpc"
  })
}

resource "aws_subnet" "private" {
  count = length(var.private_subnets)
  
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.private_subnets[count.index]
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = merge(var.common_tags, {
    Name = "${var.environment}-private-${count.index + 1}"
    Type = "private"
  })
}
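The subnet resource references an availability-zones data source that isn't shown. One way to complete the module (the variable types are assumptions based on how each input is used above):

```hcl
# modules/vpc/data.tf -- the data source the subnets reference
data "aws_availability_zones" "available" {
  state = "available"
}

# modules/vpc/variables.tf -- assumed declarations for the inputs used above
variable "cidr_block" {
  type = string
}

variable "private_subnets" {
  type = list(string)
}

variable "environment" {
  type = string
}

variable "common_tags" {
  type    = map(string)
  default = {}
}
```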

Variable Management and Validation

Use variable validation to catch configuration errors early:

variable "environment" {
  description = "Environment name"
  type        = string
  
  validation {
    condition = contains(["dev", "staging", "production"], var.environment)
    error_message = "Environment must be dev, staging, or production."
  }
}

variable "instance_types" {
  description = "Allowed EC2 instance types"
  type        = list(string)
  default     = ["t3.micro", "t3.small", "t3.medium"]
  
  validation {
    condition = alltrue([
      for instance_type in var.instance_types :
      can(regex("^t3\\.", instance_type))
    ])
    error_message = "Only t3 instance types are allowed."
  }
}

Resource Tagging Strategy

Implement a consistent tagging strategy across all resources:

locals {
  common_tags = {
    Environment   = var.environment
    Project       = var.project_name
    ManagedBy     = "terraform"
    Owner         = var.team_name
    CostCenter    = var.cost_center
    BackupPolicy  = var.backup_required ? "required" : "not-required"
  }
}

resource "aws_instance" "web" {
  # ... other configuration
  
  tags = merge(local.common_tags, {
    Name = "${var.environment}-web-server"
    Role = "web-server"
  })
}
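On AWS provider v3.38 or later, you can also push the common tags down through the provider's default_tags block, so each resource only declares its own specific tags. A hedged sketch of that alternative:

```hcl
provider "aws" {
  region = "us-east-1"

  # Applied automatically to every taggable resource this provider creates
  default_tags {
    tags = local.common_tags
  }
}

resource "aws_instance" "web" {
  # ... other configuration

  # Only the resource-specific tags remain here
  tags = {
    Name = "${var.environment}-web-server"
    Role = "web-server"
  }
}
```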

Security Best Practices

1. Use Data Sources for AMIs

Instead of hardcoding AMI IDs, use data sources to get the latest images:

data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}
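The data source then feeds the instance definition, so new launches pick up the latest matching image (an illustrative use, not from the article):

```hcl
resource "aws_instance" "app" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t3.micro"

  # ... networking, tags, etc.
}
```

Note that when a newer AMI is published, the data source result changes and Terraform will plan a replacement of the instance on the next apply.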

2. Implement Least Privilege IAM

data "aws_iam_policy_document" "ec2_assume_role" {
  statement {
    actions = ["sts:AssumeRole"]
    
    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}

data "aws_iam_policy_document" "s3_read_only" {
  # Scope each action to the level it actually operates on:
  # ListBucket applies to the bucket, GetObject to the objects in it.
  statement {
    actions   = ["s3:ListBucket"]
    resources = [aws_s3_bucket.app_data.arn]
  }

  statement {
    actions   = ["s3:GetObject"]
    resources = ["${aws_s3_bucket.app_data.arn}/*"]
  }
}
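These policy documents only take effect once attached to a role that EC2 instances can assume. A sketch of the wiring (role and profile names are placeholders):

```hcl
resource "aws_iam_role" "app" {
  name               = "app-ec2-role" # placeholder name
  assume_role_policy = data.aws_iam_policy_document.ec2_assume_role.json
}

resource "aws_iam_role_policy" "s3_read_only" {
  name   = "s3-read-only"
  role   = aws_iam_role.app.id
  policy = data.aws_iam_policy_document.s3_read_only.json
}

# The instance profile is what you reference from aws_instance or a launch template
resource "aws_iam_instance_profile" "app" {
  name = "app-ec2-profile"
  role = aws_iam_role.app.name
}
```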

3. Enable Encryption by Default

resource "aws_s3_bucket" "secure_bucket" {
  bucket = var.bucket_name
}

resource "aws_s3_bucket_server_side_encryption_configuration" "secure_bucket_encryption" {
  bucket = aws_s3_bucket.secure_bucket.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "secure_bucket_pab" {
  bucket = aws_s3_bucket.secure_bucket.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

Environment-Specific Configurations

Use .tfvars files for environment-specific values:

# environments/production/terraform.tfvars
environment = "production"
instance_type = "t3.large"
min_capacity = 3
max_capacity = 10
backup_required = true

# environments/dev/terraform.tfvars
environment = "dev"
instance_type = "t3.micro"
min_capacity = 1
max_capacity = 2
backup_required = false
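Terraform only auto-loads terraform.tfvars from the current working directory, so if you run from the repo root you select the environment's file explicitly (paths mirror the layout above):

```shell
# Select the environment's variable file explicitly
terraform plan  -var-file=environments/production/terraform.tfvars
terraform apply -var-file=environments/production/terraform.tfvars
```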

Lifecycle Management

Use lifecycle rules to prevent accidental resource destruction:

resource "aws_db_instance" "database" {
  # ... configuration
  
  lifecycle {
    prevent_destroy = true
    ignore_changes  = [password]
  }
}

resource "aws_launch_template" "web" {
  # ... configuration
  
  lifecycle {
    create_before_destroy = true
  }
}

Monitoring and Outputs

Always output important resource information:

output "vpc_id" {
  description = "ID of the VPC"
  value       = aws_vpc.main.id
}

output "database_endpoint" {
  description = "RDS instance endpoint"
  value       = aws_db_instance.database.endpoint
  sensitive   = true
}

output "load_balancer_dns" {
  description = "DNS name of the load balancer"
  value       = aws_lb.main.dns_name
}
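Outputs also let other configurations consume these values through the terraform_remote_state data source. A hedged sketch assuming the S3 backend shown earlier (the security group is illustrative):

```hcl
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "your-terraform-state-bucket"
    key    = "environments/production/terraform.tfstate"
    region = "us-east-1"
  }
}

# Reference an exported value from the other configuration
resource "aws_security_group" "app" {
  vpc_id = data.terraform_remote_state.network.outputs.vpc_id
}
```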

Testing and Validation

Implement automated testing for your Terraform configurations:

#!/bin/bash
# validate-terraform.sh
set -euo pipefail

echo "Formatting Terraform files..."
terraform fmt -recursive

echo "Validating Terraform configuration..."
terraform validate

echo "Running security scan..."
tfsec .

echo "Planning deployment..."
# -detailed-exitcode returns 2 when changes are pending, which set -e
# would otherwise treat as a failure
terraform plan -detailed-exitcode || {
  code=$?
  if [ "$code" -eq 2 ]; then
    echo "Plan succeeded: changes pending."
  else
    exit "$code"
  fi
}

Key Takeaways

  1. Always use remote state with state locking
  2. Organize code into reusable modules for consistency
  3. Implement comprehensive tagging for resource management
  4. Use data sources instead of hardcoded values
  5. Apply security best practices from the start
  6. Version your modules and use semantic versioning
  7. Test your configurations before applying to production

These practices have saved me countless hours in production environments and made infrastructure management much more predictable and secure. Start implementing them early in your Terraform journey, and your future self will thank you.


What Terraform best practices have you found most valuable? I’d love to hear about your experiences in the comments or connect with you on LinkedIn.