Automate the Cloud with Terraform, Ansible, and DigitalOcean
Hi. I'm Brian.
— Programmer (http://github.com/napcs)
— Author (http://bphogan.com/publications)
— Musician (http://soundcloud.com/bphogan)
— Teacher
— Technical Editor @ DigitalOcean
The Plan
— Introduce Immutable Infrastructure
— Create a Server with Terraform
— Provision the Server with Ansible
— Add Another Server and a Load Balancer
— Review
Disclosure and Disclaimer
I am using DigitalOcean in this talk. I work for them. They're cool.
Want $10 of credit? https://m.do.co/c/1239feef68ae
Also we're hiring. http://do.co/careers
If you want to argue or make statements, I'll happily engage with you after the talk in exchange for a beer
Rules
— This is based on my personal experience.
— If I go too fast, or I made a mistake, speak up.
— Ask questions any time.
— If you want to argue, buy me a beer later.
Immutable Infrastructure
Changing existing servers in production results in servers that aren't quite the same. This includes security updates! These changes result in problems that are hard to diagnose and reproduce.
Snowflake servers and Configuration Drift
"Each server becomes unique"
— Software updates
— Security patches
— Newer versions installed on some servers
Infrastructure as Code
Rotate machines in and out of service.
— Create processes to create new servers quickly
— Use code to destroy them and replace them when they are out of date.
How?
— Base images (cloud provider)
— Infrastructure Management tools (Terraform)
— Configuration Management tools (Ansible)
A base setup with some things preconfigured. Your cloud provider has them or you can make your own. The more complex your image is, the more testing you'll need to do and the more time it'll take to bring up a new box.
Base Images
— Ready-to-go base OS with user accounts and services
— Barebones.
— Keep it low-maintenance.
Terraform
— Tool to Create and Destroy infrastructure components.
— Uses "providers" to talk to cloud services
— Define resources with code
— Provider to use, image, size, etc
Example Terraform Resource
resource "digitalocean_droplet" "web-1" {
  image  = "ubuntu-16-04-x64"
  name   = "web-1"
  region = "nyc3"
  ...
}
Terraform and DigitalOcean
— DigitalOcean account
— Credit card or payment method hooked up
— SSH Key uploaded to DigitalOcean
— SSH Key fingerprint
— DigitalOcean API Key
Finding your Fingerprint
Getting an API Token
Demo: Create Server with Terraform
— Set up Terraform
— Configure and Install the DigitalOcean provider
— Create a host
Set up Terraform
$ mkdir cloud_tutorial
$ cd cloud_tutorial
$ touch provider.tf
We have two pieces of data we need to inject. Our DO API key and our fingerprint. Set environment variables so you keep sensitive info out of your code and scripts.
Environment Variables
API key
$ echo 'export DO_API_KEY=your_digitalocean_api_token' >> ~/.bashrc
Fingerprint
$ echo 'export SSH_FINGERPRINT=your_ssh_key_fingerprint' >> ~/.bashrc
Make sure they saved!
$ . ~/.bashrc
$ echo $DO_API_KEY
$ echo $SSH_FINGERPRINT
Define a Provider
touch provider.tf
variable "do_api_key" {}
variable "ssh_fingerprint" {}
provider "digitalocean" {
  token = "${var.do_api_key}"
}
Install provider
$ terraform init
Initializing provider plugins...
- Checking for available provider plugins on https://releases.hashicorp.com...
- Downloading plugin for provider "digitalocean" (0.1.3)...
Define a server
touch web-1.tf
resource "digitalocean_droplet" "web-1" {
  image      = "ubuntu-16-04-x64"
  name       = "web-1"
  region     = "nyc3"
  monitoring = true
  size       = "1gb"
  ssh_keys   = [ "${var.ssh_fingerprint}" ]
}
output "web-1-address" {
  value = "${digitalocean_droplet.web-1.ipv4_address}"
}
DigitalOcean's API lets you find the images and sizes available.
Get Images and Sizes from DigitalOcean API
curl -X GET -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DO_API_KEY" \
  "https://api.digitalocean.com/v2/images"
Sizes
curl -X GET -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DO_API_KEY" \
  "https://api.digitalocean.com/v2/sizes"
See what will happen
$ terraform plan \
  -var "do_api_key=${DO_API_KEY}" \
  -var "ssh_fingerprint=${SSH_FINGERPRINT}"
Apply!
$ terraform apply \
  -var "do_api_key=${DO_API_KEY}" \
  -var "ssh_fingerprint=${SSH_FINGERPRINT}"
...
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
Outputs:
web-1-address = 159.89.179.202
Demo
Ansible lets you define how things should be set up on your servers. It's designed to be idempotent, so you can run the same script over and over. Ansible will only change what needs changing. If you have more than one machine, you can run the commands on many machines at once. And you can define roles or use existing roles to add additional functionality.
Provision Server with Ansible
— Idempotent machine setup
— Define how things should be, not necessarily what to do
— Supports parallel execution
— Only needs SSH and Python on target machine
— Supports code reuse through roles
Provision with Ansible
— Create Ansible configuration
— Create a configuration file
— Create an inventory file listing your servers
— Define a "playbook" of tasks
— Run the playbook.
The Inventory
— Lists all the hosts Ansible should work with
— Lets you put them into groups
— Lets you specify per-host or per-group options (keys, users, etc)
A Playbook
---
- hosts: all
  remote_user: deploy
  gather_facts: false
  tasks:
    - name: Update apt cache
      apt: update_cache=yes
      become: true
    - name: Install nginx
      apt:
        name: nginx
        state: installed
      become: true
Demo: Creating a Web Server with Ansible
— Create a deploy user
— Install Nginx
— Upload a Server Block (virtual host)
— Create web directory
— Enable server block
— Upload web page
Ansible connects to your servers using SSH and uses host key checking. When you first log in to a remote machine with SSH, the SSH client asks if you want to add the server to your "known hosts." If you have to rebuild your server, or add a new server, you'll get this prompt when Ansible tries to connect. It's a nice security feature, but for this workflow you should turn it off.

By default, Ansible makes a new SSH connection for each command it runs. This is slow, and as your playbooks get larger, it takes more time. You can tell Ansible to share SSH connections using pipelining. However, this requires your servers to disable requiretty for sudo users. We'll handle both settings in a new ansible.cfg file.
Create ansible.cfg
touch ansible.cfg
[defaults]
host_key_checking = False

[ssh_connection]
pipelining = True
Ansible uses an inventory file to list out the servers. We're going to start with one. First we define a host called web-1 and assign it the IP address of our machine. We need to tell Ansible what private key file we want to use to connect to the server over SSH, and since we'll use the same one for all our servers, we'll create a group called servers. We put the web-1 host in the servers group, and then we create variables for the group. We're using Ubuntu 16.04, which only ships with Python 3. Ansible uses Python 2 by default, so we tell Ansible to use Python 3 for all members of the servers group.
Creating an Inventory
touch inventory
web-1 ansible_host=xx.xx.xx.xx
[servers]
web-1

[servers:vars]
ansible_private_key_file='/Users/your_username/.ssh/id_rsa'
ansible_python_interpreter=/usr/bin/python3
Creating a Playbook
touch playbook.yml
---
- hosts: all
  remote_user: root
Adding a User
— Use the user module to add the user
— Can only use hashed passwords in playbooks
— Get a hashed password
Getting the password with Python
$ pip install passlib
$ python -c "from passlib.hash import sha512_crypt; import getpass; print(sha512_crypt.using(rounds=5000).hash(getpass.getpass()))"
(command taken shamelessly from Ansible docs)
This sets the username to deploy, sets the password, and adds the user to the sudo group. It also sets up the shell. The append option says to add the new group, rather than replacing any existing groups. Finally, we're telling Ansible not to ever change the password on subsequent runs. We want the state to be the same every time. If we need to change the password, we'll provision a new server from scratch and decommission this one.
Task to Create User
The password is d3ploy
  tasks:
    - name: Add deploy user and add to sudoers
      user:
        name: deploy
        password: $6$zsQNYitEkWYJzVYj$/6sa8XlOAbfWAtn2S7ww1ok.w1ipqQ1dfHY1Mlo6f9p/xFsp1sp0N9grxLyN6qMcnlvyx266vbPczJd0EacOC1
        groups: sudo
        append: true
        shell: /bin/bash
        update_password: on_create
Run the playbook
ansible-playbook -i inventory playbook.yml
PLAY [all] *********************************************************************
TASK [Gathering Facts] *********************************************************
ok: [web-1]

TASK [Add deploy user and add to sudoers] **************************************
changed: [web-1]

PLAY RECAP *********************************************************************
web-1 : ok=2 changed=1 unreachable=0 failed=0
On DigitalOcean, once you upload a public key to your account, password logins are disabled for all your users. The root user already gets your public key added, but subsequent users need your public key too. Ansible has a module for uploading your public key to a user.
Add public key auth for user
    - name: add public key for deploy user
      authorized_key:
        user: deploy
        state: present
        key: "{{ lookup('file', '/Users/your_username/.ssh/id_rsa.pub') }}"
Since the user is already there, Ansible won't try creating it again. But it will add the key:
Apply the change to the server
$ ansible-playbook -i inventory playbook.yml
TASK [Add deploy user and add to sudoers] **************************************
ok: [web-1]

TASK [add public key for deploy user] ******************************************
changed: [web-1]

PLAY RECAP *********************************************************************
web-1 : ok=3 changed=1 unreachable=0 failed=0
Adding the Webserver Tasks
— Install package
— Update config file
— Create web directory
— Upload home page
We're creating another section in our file that sets a new remote user. Then we define a new set of tasks, and define a task that uses the apt module. We then add become: true to tell Ansible it should execute the command with sudo access.
Update Cache
- hosts: all
  remote_user: deploy
  gather_facts: false
  tasks:
    - name: Update apt cache
      apt: update_cache=yes
      become: true
In order to use sudo, you have to provide a password. Ansible is non-interactive, so if you try to run the playbook, it will stall and then error out, saying no password was provided. You provide the sudo password by adding the --ask-become-pass flag.
Run Ansible and apply changes
ansible-playbook -i inventory playbook.yml \
  --ask-become-pass
SUDO password:
PLAY [all] *********************************************************************
...
TASK [Update apt cache] ********************************************************
changed: [web-1]
...
Let's install the Nginx web server on our box and set up a new default web site. Once again, use the apt module for this.
Installing Software
    - name: Install nginx
      apt:
        name: nginx
        state: installed
      become: true
Now we'll create the new website directory by using the file module to create /var/www/example.com and make sure it's owned by the deploy user and group. This way we can manage the content in that directory as the deploy user rather than as the root user.
Create the Web Directory
    - name: Create the web directory
      file:
        path: /var/www/example.com
        state: directory
        owner: deploy
        group: deploy
      become: true
We need to remove the default site. Nginx on Ubuntu stores server block configuration files in the /etc/nginx/sites-available directory. When a site is enabled, a symbolic link is created from that folder to /etc/nginx/sites-enabled. To disable a site, you remove the symlink from /etc/nginx/sites-enabled. This makes it easy to enable and disable configurations as needed.
Disabling the default web site
— Web site definitions are in /etc/nginx/sites-available
— Live sites are in /etc/nginx/sites-enabled
— Live sites are symlinks from sites-available to sites-enabled
— Remove the symlink to disable a site.
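The symlink mechanics, simulated by hand in scratch directories (the temp paths are stand-ins for the real /etc/nginx directories, so you can try this without a server):

```shell
# Stand-ins for /etc/nginx/sites-available and /etc/nginx/sites-enabled
avail=$(mktemp -d)
enabled=$(mktemp -d)
touch "$avail/default" "$avail/example.com"

# Enabling a site means symlinking its config into sites-enabled
ln -s "$avail/default" "$enabled/default"
ln -s "$avail/example.com" "$enabled/example.com"

# Disabling the default site just removes the symlink;
# the config file itself stays put in sites-available
rm "$enabled/default"
```

On a real server you'd reload Nginx afterward so it picks up the change, which is what the playbook's handler takes care of.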
We're checking to see if there's no file in the destination. If it's absent, we're good. If it's not, Ansible will remove it.
Task to remove the default site
    - name: Disable `default` site
      file:
        src: /etc/nginx/sites-available/default
        dest: /etc/nginx/sites-enabled/default
        state: absent
      notify: reload nginx
      become: true
The notify directive lets us tell Ansible to fire a handler. A handler is a task that responds to events from other tasks. In this case, we're saying "we've dropped the default Nginx web site configuration, so reload Nginx's configuration to make the changes stick." To make this work, we have to define the handler itself.
Handlers
notify: reload nginx
Defining a Handler
tasks: ...
  handlers:
    - name: reload nginx
      service:
        name: nginx
        state: reloaded
      become: true
Install and Configure nginx
$ ansible-playbook -u deploy -i inventory playbook.yml --ask-become-pass
TASK [Update apt cache] ********************************************************
changed: [web-1]

TASK [Install nginx] ***********************************************************
changed: [web-1]

TASK [Create the web directory] ************************************************
changed: [web-1]

TASK [Disable `default` site] **************************************************
ok: [web-1]
Templates
— Local files we can upload to the server
— Can use variables to change their contents
— Uses the Jinja language
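The site.conf template on the next slide is uploaded as-is; as a hypothetical sketch of the variable feature (the template and the site_name variable are illustrative, not part of this project), a Jinja template interpolates values Ansible passes in:

```
{# hello.html.j2 — hypothetical template; site_name comes from playbook vars #}
<h1>Welcome to {{ site_name }}</h1>
```

A template task that sets `site_name: example.com` in its vars would render this to `<h1>Welcome to example.com</h1>` before placing it on the server.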
Creating the Server Block with a Template
touch site.conf
server {
  listen 80;
  listen [::]:80;

  root /var/www/example.com/;
  index index.html;

  server_name example.com;

  location / {
    try_files $uri $uri/ =404;
  }
}
This task uses the template module, which uploads the template to the location on the server. Templates can have additional processing instructions which we'll look at later. Right now we'll just upload the file as-is.
Upload the file to the server
    - name: Upload the virtual host
      template:
        src: site.conf
        dest: /etc/nginx/sites-available/example.com
      become: true
Enable the new host
    - name: Enable the new virtual host
      file:
        src: /etc/nginx/sites-available/example.com
        dest: /etc/nginx/sites-enabled/example.com
        state: link
      become: true
      notify: reload nginx
Make a home page
touch index.html
<!DOCTYPE html>
<html lang="en-US">
<head>
<meta charset="utf-8">
<title>Welcome</title>
</head>
<body>
<h1>Welcome to my web site</h1>
</body>
</html>
This time we don't use become: true because we want the file owned by the deploy user, and we've already made sure the /var/www/example.com directory is owned by the deploy user.
Upload the file
    - name: Upload the home page
      template:
        src: index.html
        dest: /var/www/example.com/index.html
Deploy the site
ansible-playbook -u deploy -i inventory playbook.yml --ask-become-pass
Roles
— Reusable components
— Tasks
— Templates
— Handlers
— Sharable!
The tasks folder contains the task definitions. The handlers folder contains the definitions for our handlers, and the templates folder holds our template files. Create this structure:
Anatomy of a Role
▾ role_name/
  ▾ handlers/
      main.yml
  ▾ tasks/
      main.yml
  ▾ templates/
      some_template.j2
Create a role for our server
— Create website role
— Move tasks, handlers, and templates out of our playbook
— Add the role to the playbook
Create the Role structure
$ mkdir -p roles/website/{handlers,tasks,templates}
$ touch roles/website/{handlers,tasks}/main.yml
$ mv {index.html,site.conf} roles/website/templates
Move handler into roles/website/handlers/main.yml
---
- name: reload nginx
  service: name=nginx state=reloaded
  become: true
Move tasks into roles/website/tasks/main.yml
---
- name: Update apt cache
  apt: update_cache=yes
  become: true
- name: Install nginx
  apt:
    name: nginx
    state: installed
  become: true
- name: Create the web directory
...
- name: Enable the new virtual host
  file:
    src: /etc/nginx/sites-available/example.com
    dest: /etc/nginx/sites-enabled/example.com
    state: link
  become: true
  notify: reload nginx
Add role to playbook
- hosts: all
  remote_user: deploy
  gather_facts: false
# all other stuff moved into the role
  roles:
    - website
Make sure it still works!
$ ansible-playbook -u deploy -i inventory playbook.yml --ask-become-pass
Demo: Roles
Scaling Out
— Add another host
— Add a load balancer
We'll just clone the web-1 definition using sed, replacing all occurrences of web-1 with web-2.
Create a web-2.tf file
sed -e 's/web-1/web-2/g' web-1.tf > web-2.tf
Create web-2
$ terraform apply \
  -var "do_api_key=${DO_API_KEY}" \
  -var "ssh_fingerprint=${SSH_FINGERPRINT}"
Update Inventory
web-1 ansible_host=xx.xx.xx.xx
web-2 ansible_host=xx.xx.xx.yy

[servers]
web-1
web-2
...
Provision the servers
$ ansible-playbook -u deploy -i inventory playbook.yml \
  --ask-become-pass
Add a Load Balancer
— Floating IP
— Two HAProxy or Nginx instances
— Each instance monitoring the other
— Each instance pointing to web-1 and web-2
OR
— DigitalOcean Load Balancer
We define the forwarding rule and a health check, and then we specify the IDs of the Droplets we want to configure.
Add a DO Load Balancer with Terraform
touch loadbalancer.tf
resource "digitalocean_loadbalancer" "web-lb" {
  name   = "web-lb"
  region = "nyc3"

  forwarding_rule {
    entry_port      = 80
    entry_protocol  = "http"
    target_port     = 80
    target_protocol = "http"
  }

  healthcheck {
    port     = 22
    protocol = "tcp"
  }

  droplet_ids = [
    "${digitalocean_droplet.web-1.id}",
    "${digitalocean_droplet.web-2.id}"
  ]
}
Show Load Balancer IP
loadbalancer.tf
output "web-lb-address" {
  value = "${digitalocean_loadbalancer.web-lb.ip}"
}
Apply!
$ terraform apply \
  -var "do_api_key=${DO_API_KEY}" \
  -var "ssh_fingerprint=${SSH_FINGERPRINT}"
...
Outputs:
web-1-address = xx.xx.xx.xx
web-2-address = xx.xx.xx.yy
web-lb-address = xx.xx.xx.zz
And you have your infrastructure.
Demo
Tear it down
$ terraform destroy \
  -var "do_api_key=${DO_API_KEY}" \
  -var "ssh_fingerprint=${SSH_FINGERPRINT}"
Rebuild
$ terraform apply \
  -var "do_api_key=${DO_API_KEY}" \
  -var "ssh_fingerprint=${SSH_FINGERPRINT}"
Add IPs to inventory... then:
$ ansible-playbook -u deploy -i inventory playbook.yml \
  --ask-become-pass
Going Forward
— Add more .tf files for your infra
— Add them to your loadbalancer.tf file
— Add new IPs to Inventory
— Provision them with Ansible
— Remove old hosts from loadbalancer when you make config changes or need security patches
— Investigate Ansible variables to handle domains, user accounts, passwords, etc.
— Add new IPs to inventory automatically using Terraform's provisioner
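For that last point, a rough sketch of the local-exec provisioner, which runs a command on your own machine when the resource is created (the web-3 resource here is hypothetical; fill in the image, region, size, and ssh_keys as in web-1.tf):

```
resource "digitalocean_droplet" "web-3" {
  # image, region, size, ssh_keys as in web-1.tf ...

  provisioner "local-exec" {
    command = "echo 'web-3 ansible_host=${self.ipv4_address}' >> inventory"
  }
}
```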
Things I learned
— Using other people's Ansible roles is awful
— Build everything from scratch and read the docs
— Ansible module docs are great... if you know what module you need.
— StackOverflow is full of deprecated syntax. Use the Ansible Docs!
— Don't be clever. Be explicit. DRY rule isn't always preferred. Or good.
Questions?
— Slides: https://bphogan.com/presentations/automate2018
— Twitter: @bphogan
Reminder: $10 of DO credit? https://m.do.co/c/1239feef68ae