Bootstrapping Docker Infrastructure With Terraform

  • 2016-02-10

Docker has gained a lot of popularity in recent years. Thanks to the movement towards microservices, we can get Docker infrastructure from all major cloud providers like AWS or Google Cloud.

This is more of a tutorial-style post in which I'm planning to walk you through how we can bootstrap a fully operating Docker infrastructure from scratch in AWS using Terraform. Managing infrastructure by hand is terrible. I'm not going to go into details about why that is, but I believe Terraform is going to be one of those tools that will stick with us for a while, especially once it gets more mature. We're going to use Terraform to build our infrastructure.
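
A quick note before we start: all the snippets below assume an AWS provider configuration along the following lines. The aws_region variable is my own addition for illustration (the original setup doesn't show one), and it's referenced later when we pick an AMI:

provider "aws" {
    region = "${var.aws_region}"
}

variable "aws_region" {
    description = "AWS region to deploy into"
    default = "us-east-1"
}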

Infrastructure Overview

We are going to run a simple Python HTTP server, and we want to make it highly available. Here's a list of things we're going to need:

  • A separate VPC to host our ECS cluster.
  • 2 servers in different availability zones for running Docker containers.
  • A load balancer to shuffle traffic between our 2 servers.
  • An ECS cluster with service and task definitions.
  • Correct permissions and security groups for everything.

The list is not that long; however, deploying everything from scratch requires some trial and error. Once you have it figured out, you can templatize this configuration and reuse it elsewhere, as sketched below.
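
Purely as a sketch of where this can end up (the module path, name, and inputs below are hypothetical, not something we build in this post), the reusable form would look something like:

module "staging_cluster" {
    # Hypothetical wrapper module around everything this post builds.
    source = "./modules/ecs-cluster"
    name = "ecs-staging"
    cidr = "10.0.0.0/16"
    azs = "us-east-1c,us-east-1b"
}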

Creating VPC

Let's start by creating all the VPC resources. Creating a VPC from scratch requires a little bit of work, so I'm going to use a community module for creating the VPC -- tf_aws_vpc.

module "vpc" {
    source = "github.com/terraform-community-modules/tf_aws_vpc"
    name = "ecs-vpc"
    cidr = "10.0.0.0/16"
    public_subnets  = "10.0.101.0/24,10.0.102.0/24"
    azs = "us-east-1c,us-east-1b"
}

We are creating two subnets in two different availability zones. Anything launched in a public subnet will get a public-facing IP address assigned. This is not ideal; however, to use ECS in a private subnet we'd have to create a NAT instance for the ECS agent to be able to reach the ECS endpoints. We're not doing that today.

We also need to create some security groups.

resource "aws_security_group" "allow_all_outbound" {
    name_prefix = "${module.vpc.vpc_id}-"
    description = "Allow all outbound traffic"
    vpc_id = "${module.vpc.vpc_id}"

    egress {
        from_port = 0
        to_port = 0
        protocol = "-1"
        cidr_blocks = ["0.0.0.0/0"]
    }
}

resource "aws_security_group" "allow_all_inbound" {
    name_prefix = "${module.vpc.vpc_id}-"
    description = "Allow all inbound traffic"
    vpc_id = "${module.vpc.vpc_id}"

    ingress {
        from_port = 0
        to_port = 0
        protocol = "-1"
        cidr_blocks = ["0.0.0.0/0"]
    }
}

resource "aws_security_group" "allow_cluster" {
    name_prefix = "${module.vpc.vpc_id}-"
    description = "Allow all traffic within cluster"
    vpc_id = "${module.vpc.vpc_id}"

    ingress {
        from_port = 0
        to_port = 65535
        protocol = "tcp"
        self = true
    }

    egress {
        from_port = 0
        to_port = 65535
        protocol = "tcp"
        self = true
    }
}

resource "aws_security_group" "allow_all_ssh" {
    name_prefix = "${module.vpc.vpc_id}-"
    description = "Allow all inbound SSH traffic"
    vpc_id = "${module.vpc.vpc_id}"

    ingress {
        from_port = 22
        to_port = 22
        protocol = "tcp"
        cidr_blocks = ["0.0.0.0/0"]
    }
}

The SSH group is just for testing, so we can log in to our server; it's not really required. The other ones allow all inbound traffic, all outbound traffic, and all traffic within the cluster. We're going to assign them to our EC2 instances and the ELB.

IAM Roles

Before we move on to the instance configuration, we have to prepare some IAM roles. There are going to be two roles: one for the EC2 instances and another one for the ELB. The EC2 role will have permissions to interact with the ECS cluster, such as registering an instance with the cluster when a server starts up. The ELB is going to load balance our Docker containers, so ECS must be able to (de)register instances with the ELB as services start and stop.

resource "aws_iam_role" "ecs" {
    name = "ecs"
    assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
}

resource "aws_iam_policy_attachment" "ecs_for_ec2" {
    name = "ecs-for-ec2"
    roles = ["${aws_iam_role.ecs.id}"]
    policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"
}

resource "aws_iam_role" "ecs_elb" {
    name = "ecs-elb"
    assume_role_policy = <<EOF
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ecs.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

resource "aws_iam_policy_attachment" "ecs_elb" {
    name = "ecs_elb"
    roles = ["${aws_iam_role.ecs_elb.id}"]
    policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceRole"
}

ECS Setup

Here's our ECS task definition, which starts our Python HTTP server. You can read more about what the parameters mean in the ECS docs. For now, just put it in task-definitions/simple-service.json.

[
  {
    "name": "simple-service",
    "image": "python:2.7",
    "cpu": 0,
    "memory": 128,
    "essential": true,
    "command": [
      "python",
      "-m",
      "SimpleHTTPServer"
    ],
    "portMappings": [
      {
        "containerPort": 8000,
        "hostPort": 8000
      }
    ]
  }
]

An important thing to note here is that we are statically allocating port numbers. This means we can run only one instance of this service per server. We're going to launch multiple servers and use an ELB to balance traffic between them, and ECS comes with built-in support for ELB as a routing layer for Docker containers.

resource "aws_ecs_cluster" "staging" {
    name = "ecs-staging"
}

resource "aws_ecs_task_definition" "simple_service" {
    family = "simple_service"
    container_definitions = "${file("task-definitions/simple-service.json")}"
}

resource "aws_elb" "simple_service_elb" {
    name = "simple-service-elb"
    subnets = ["${split(",", module.vpc.public_subnets)}"]
    connection_draining = true
    cross_zone_load_balancing = true
    security_groups = [
        "${aws_security_group.allow_cluster.id}",
        "${aws_security_group.allow_all_inbound.id}",
        "${aws_security_group.allow_all_outbound.id}"
    ]

    listener {
        instance_port = 8000
        instance_protocol = "http"
        lb_port = 80
        lb_protocol = "http"
    }

    health_check {
        healthy_threshold = 2
        unhealthy_threshold = 10
        target = "HTTP:8000/"
        interval = 5
        timeout = 4
    }
}

resource "aws_ecs_service" "simple_service" {
    name = "simple-service"
    cluster = "${aws_ecs_cluster.staging.id}"
    task_definition = "${aws_ecs_task_definition.simple_service.arn}"
    desired_count = 1
    iam_role = "${aws_iam_role.ecs_elb.arn}"
    depends_on = ["aws_iam_policy_attachment.ecs_elb"]

    load_balancer {
        elb_name = "${aws_elb.simple_service_elb.id}"
        container_name = "simple-service"
        container_port = 8000
    }
}

Our ELB is pointing at the container port that we defined in the task definition and is using our IAM role to make the ECS integration work.

EC2 Cluster Setup

If we were to run terraform now, not much would happen. We are missing a critical piece -- we don't have any servers on which to run our code.

Our EC2 cluster is going to be pretty simple. We have a very simple cloud-init script which tells the ECS agent which cluster to join:

#cloud-config
bootcmd:
 - cloud-init-per instance ecs_config sh -c 'echo "ECS_CLUSTER=${cluster_name}" >> /etc/ecs/ecs.config'

Place this in templates/user_data.

To manage our EC2 cluster, we're going to use an auto scaling group. The auto scaling group will ensure that our service is pretty much always up and running.

resource "template_file" "user_data" {
    template = "${file("templates/user_data")}"
    vars {
        cluster_name = "ecs-staging"
    }
}

resource "aws_iam_instance_profile" "ecs" {
    name = "ecs-profile"
    roles = ["${aws_iam_role.ecs.name}"]
}

resource "aws_launch_configuration" "ecs_cluster" {
    name = "ecs_cluster_conf"
    instance_type = "t2.micro"
    image_id = "${lookup(var.ami, var.aws_region)}"
    iam_instance_profile = "${aws_iam_instance_profile.ecs.id}"
    security_groups = [
        "${aws_security_group.allow_all_ssh.id}",
        "${aws_security_group.allow_all_outbound.id}",
        "${aws_security_group.allow_cluster.id}",
    ]
    user_data = "${template_file.user_data.rendered}"
    key_name = "${aws_key_pair.root.key_name}"
}
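
The launch configuration above references a key pair we haven't defined anywhere yet. Here's a minimal sketch, assuming your SSH public key lives at keys/root.pub (both the resource name and the file path are my choice, so adjust to taste):

resource "aws_key_pair" "root" {
    # Hypothetical: pick any key_name and point public_key at your own key.
    key_name = "root"
    public_key = "${file("keys/root.pub")}"
}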

resource "aws_autoscaling_group" "ecs_cluster" {
    name = "ecs-cluster"
    vpc_zone_identifier = ["${split(",", module.vpc.public_subnets)}"]
    min_size = 0
    max_size = 3
    desired_capacity = 3
    launch_configuration = "${aws_launch_configuration.ecs_cluster.name}"
    health_check_type = "EC2"
}

We're using the standard ECS-optimized AMIs from Amazon.

variable "ami" {
    description = "AWS ECS AMI id"
    default = {
        us-east-1 = "ami-cb2305a1"
        us-west-1 = "ami-bdafdbdd"
        us-west-2 = "ami-ec75908c"
        eu-west-1 = "ami-13f84d60"
        eu-central-1 =  "ami-c3253caf"
        ap-northeast-1 = "ami-e9724c87"
        ap-southeast-1 = "ami-5f31fd3c"
        ap-southeast-2 = "ami-83af8ae0"
    }
}
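
One optional addition that makes testing easier (nothing above requires it): an output exposing the ELB's DNS name, so you know where to point your browser once the apply finishes.

output "elb_dns_name" {
    # Public endpoint of the service; printed after terraform apply.
    value = "${aws_elb.simple_service_elb.dns_name}"
}

With that in place, curl http://$(terraform output elb_dns_name)/ should return the directory listing served by SimpleHTTPServer once the instances have registered as healthy.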

That was Easy

This is all the Terraform code you need to launch a highly available service running on Docker in AWS from scratch. You can literally create a fresh AWS account, set up admin keys, run terraform get to fetch the community VPC module, and then terraform plan and terraform apply to launch this infrastructure in a matter of minutes.
