More play, less book. Ask about the DevOps way of automation and Ansible is sure to come up. It has its bright sides, and some that aren’t as bright, but that’s not what we’re touching on today. In this article, I will try to explain the principles of Ansible role building that you can easily put into practice to build elegant, flexible code that is easy to use and troubleshoot. A real joy to maintain.

I do also want this article to be understandable and simple – so that you won’t need 10 years of experience, nor a Senior job title, to reap the benefits and become better at coding. When I started, such articles helped me very much, and now I am glad to return the favor and help others.

Granulate – do not overdo

The first thing you should keep an eye out for is granulation. Do not copy the same task into every playbook that needs it; create a “boilerplate” role to fulfill the task instead, and make the main role depend on it:

Listing. yum role boilerplate, used as a dependency of centsible.es-elastic

# dep-yum/tasks/main.yml
- name: 'put {{ yum__pkgs | to_json }} into the state {{ yum__state }}'
  yum:
    name: '{{ yum__pkgs }}'
    state: '{{ yum__state }}'
    update_cache: '{{ yum__update_cache }}'

# centsible.es-elastic/meta/main.yml
dependencies:
  - role: 'centsible.dep-yum'
    yum__pkgs: '{{ elastic__pkgs }}'
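
The task in dep-yum also reads yum__state and yum__update_cache, which the dependency declaration does not pass. A minimal sketch of how the boilerplate role could supply them through its own defaults; the file location follows the standard role layout, and the values are assumptions rather than the author's actual settings:

```yaml
# dep-yum/defaults/main.yml (hypothetical values)
# Packages to manage; callers are expected to override this list.
yum__pkgs: []
# Desired package state, passed straight to the yum module.
yum__state: 'present'
# Whether to refresh the yum metadata cache before installing.
yum__update_cache: false
```

With defaults like these in place, the main role only has to pass what differs, which here is the package list.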

However, not every role you’ll need should be set as a dependency – there’s another option, prerequisites: roles attached in the playbook instead of the main role’s meta file. What’s the difference, and when should you use one or the other?

  • A dependency means that the role is crucial to the proper execution of the role that depends on it: you have to use it in order to accomplish the bigger goal of the main role.
  • A prerequisite comes into play when you want something done that is not really tied to the main role; basically, when the task you want to accomplish is outside the main role’s scope.

Listing. centsible.es-elastic-repository and centsible.es-elastic roles used as prerequisites of es-filebeat, es-packetbeat and es-auditbeat roles

- hosts: es-masters
  roles:
    - role: 'centsible.elastic-stack/centsible.es-elastic-repository'
    - role: 'centsible.elastic-stack/centsible.es-elastic'

- hosts: es-masters
  roles:
    - role: 'centsible.elastic-stack/centsible.es-filebeat'
    - role: 'centsible.elastic-stack/centsible.es-auditbeat'
    - role: 'centsible.elastic-stack/centsible.es-packetbeat'

Think of roles like they were large, heavy shopping bags, with a lot of different food products. Carrying them all at once is impossible, or at least, very, very uncomfortable and cumbersome.

So, you split them – one with your lunch goes first, as the lunchtime is near – it’s crucial for you. Then, you can grab the rest – they’re not as important, yet, you still need them for dinner and tomorrow’s menu.

This example is also great for explaining why you shouldn’t over-granulate. Imagine that you split one 20 kg bag into 100 separate small bags. Things got heavier, as bags have some weight too. Comfort? Dropped. And you still have a problem: the bags are even harder to carry, so you cannot take all 100 in one go. If you split the load into two, maybe three or four bags – sure, that would help.

The lesson is: split things to make work easier, but don’t split everything you touch, or you’ll end up with a mini-zoo of partially useful, partially useless roles.

Roles should have one main responsibility. Do one thing, and do it right. Depend on dependencies, give roles their roles.

Make it bend

Easy to change, and easy to bend, make the role handy and bring pain to end

Me, 2020.

Ansible loves Jinja2 – that is a fact. Besides variable templating, which you probably already know, there’s much, much more: filtering variable values, basic testing, and so on.

Check out this example:

Listing. Possible use cases for Jinja filters in Ansible

# Value manipulation
{{ eye_color | mandatory }} - makes setting 'eye_color' required; templating fails if it is undefined
{{ variable | default(256) }} - uses the value 256 if 'variable' is not set
{{ comment_value | quote }} - quotes 'comment_value' so it is safe to pass to a shell command
{{ (color == "Blue") | ternary('Yes', 'No', 'Cannot determine') }} - 'Yes' if color is 'Blue', 'No' if it isn't; the third value is returned only when the tested expression is null, which a comparison never is
{{ the_address | ipv4 }} - returns 'the_address' if it is a valid IPv4 address, false otherwise
{{ '127.0.0.1/24' | ipaddr('address') }} - will provide '127.0.0.1'

To put it simply – Jinja makes playbooks easily reusable and very flexible, when done right.
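
A minimal sketch of two of these filters at work inside a play, assuming a local test run. The playbook name and app_port are hypothetical; only color comes from the listing above:

```yaml
# filters-demo.yml (hypothetical playbook for experimenting locally)
- hosts: localhost
  gather_facts: false
  vars:
    color: 'Blue'
  tasks:
    - name: 'print values shaped by filters'
      debug:
        msg:
          # app_port is never set, so default() supplies 8080
          - 'port: {{ app_port | default(8080) }}'
          # the comparison is true, so ternary() picks the first value
          - 'is blue: {{ (color == "Blue") | ternary("Yes", "No") }}'
```

Changing color in vars, or passing app_port with -e, is a quick way to watch the filters react without touching any real host.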

“The green reed which bends in the wind is stronger than the mighty oak which breaks in a storm.”

Confucius

Keeping things easy to bend makes them harder to break, and Ansible is no exception. Templated mandatory values won’t get lost, input tested via templated checks won’t spoil anything… And there’s also the comfort of getting some values without huge code blocks or workarounds. Definitely worth the little while spent learning how Jinja works.
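
“Input tested via templated checks” could look like the sketch below, placed at the top of a role’s tasks. It uses the assert module; haproxy__frontend_port is borrowed from the haproxy examples later in this article, and the port range is an assumption about what counts as valid input:

```yaml
# tasks/main.yml (sketch) -- fail early when the input is unusable
- name: 'validate role input'
  assert:
    that:
      - haproxy__frontend_port is defined
      - haproxy__frontend_port | int > 0
      - haproxy__frontend_port | int < 65536
    fail_msg: 'haproxy__frontend_port must be a port number between 1 and 65535'
```

A failed assertion stops the play before anything on the host is changed, which is the whole point of checking first.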

Keep secrets… secret

Sometimes, you might run into values you would not want to disclose – API keys, SSH keys, usernames, passwords, authentication keys; maybe even private endpoint addresses… Stuff that should remain obscure, else it would pose a threat to your infrastructure’s security, for example. There are two ways to sort it out in Ansible:

  • Dead men tell no tales (If by a twist of fate you happen to be Jack Sparrow)
  • Use ansible-vault (If you’re not a Caribbean Pirate)

I’ve never captained a ship, unless you count Docker as a container sea vessel – so let’s focus on the vault. The easiest approach is to encrypt whole files, but I’d propose a slightly harder, yet much more… polite way to handle the situation: store secret variable values in a separate file, call them using variable substitution, and let Ansible take care of the rest.

Pro-tip: the vault__ prefix makes things a bit easier to tell apart – if your variable is called shell_root_password, you don’t know for sure whether it comes from the vault or not. Add the prefix, and you always know where to look for the value.

Listing. Values before and after encryption via ansible-vault encrypt < secret file path >

# Before: dummy value, put into a YAML file, e.g. secrets.yml
vault__shell_root_password: asd12345

# After: the same file once encrypted
$ANSIBLE_VAULT;1.1;AES256
33363634633461383735333035303166663332633433306338323332323461343936363864333665
6431353634613433373666373266386231303439376433650a633964323135353363383034303666
63646165373833373839373833663466356562616439356132316533653031353836373338666562
6138663961393437610a616566616436643932643534303263626535383464353063383038373966
31386236653233636466623533363063373236313530323764626265356231316232
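
The “variable substitution” part can be sketched like this: a plain, unencrypted vars file points at the encrypted value, so the rest of the code never touches the vault file directly. The file path is an assumption for illustration:

```yaml
# group_vars/all/main.yml (hypothetical path, stays unencrypted)
# Readable name used by roles; the real value lives in the encrypted secrets.yml.
shell_root_password: '{{ vault__shell_root_password }}'
```

Anyone reading the role sees which values are secrets and where they come from, without being able to read them.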

There are a few ways to attach the secrets to your playbook – we’ll go with the simplest one this time: attaching the secrets as a vars file for all hosts. That will work, and it’s common – but as your project grows, you’ll surely need a better way to handle this: does every host and role need to know the secrets?

You’ll find something more appropriate for larger or more formal environments in the docs, for sure – the keywords are vault IDs and labeling vaults.

Listing. Adding the secret values into the playbook

- hosts: '< all your target hosts >'
  vars_files:
    - '< path to secrets file, can be templated too! >/secrets.yml'

That’s it. Seriously, it’s that easy. One discomfort persists though – my blind guess is that you don’t like providing the password each time you deploy anything.
To fix that, create a file called .vault_pass anywhere you want, put the vault password in it, then tell Ansible where it is, and it will read it automatically when needed:

Listing. Automatic vault decryption setup

# Put this into ansible.cfg - the Ansible config file
[defaults]
vault_password_file = < path to the .vault_pass file >

Make sure it is well guarded, and most importantly – if you are using git, add it to .gitignore or encrypt it with git-secret, otherwise you won’t protect anything!

Treat the password like a password – tell it only to those who need to know it, keep it safe and secure. It’s much easier than protecting multiple variable values, isn’t it?

Do not use vars by defaults

The basic structure of an Ansible role contains two directories with variable sources.

Listing. Variables separation in a role.

├── defaults
│   └── main.yml
└── vars
    └── main.yml

  • vars/main.yml – this is where you set values that you don’t want the playbook operator to tamper with. Putting them here signals that they are the more “constant” ones, probably even essential to the role’s operation, and shouldn’t be changed by a “player” who wants to be sure that everything works as intended.
  • defaults/main.yml – these values are used only when no other source sets the variable. They are pretty much intended to be set or overwritten by the “player”.

And why would that be, what’s the deal? The answer lies in the so-called “variable precedence” – basically, there is an order in which variable values override one another, depending on where they were set.

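Role defaults have the lowest precedence, so almost anything overrides them; role vars sit much higher and beat inventory and play variables, so changing them takes a deliberate, intentional action such as passing extra vars with -e.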
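
A sketch of that split for a haproxy role. The role path, the host group and the values are assumptions for illustration; the variable names follow the haproxy examples used later in this article:

```yaml
# roles/haproxy/defaults/main.yml -- lowest precedence, meant to be overridden
haproxy__frontend_port: 80
---
# roles/haproxy/vars/main.yml -- high precedence, treated as a constant
haproxy__service_name: 'haproxy'
---
# site.yml -- the "player" tunes the default; the vars value stays untouched
- hosts: 'loadbalancers'
  roles:
    - role: 'haproxy'
      haproxy__frontend_port: 8080
```

The three documents above are separate files, shown together only to make the contrast visible.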

Keep an eye on this – maintenance of roles with proper variable segregation is much, much easier.

Ghost in the shell

Ah, yes. The shell module. It is awesome – sometimes, even mandatory – but, when you can, you should use appropriate dedicated Ansible modules instead, as they are much more predictable – those were built with automation in mind.

Using shell may yield various results (it just shoots whatever you put there into /bin/sh – this may even be destructive, watch out!) and is often impractical, as it’s easier to just set a toggle in a module, than to play with pipes, echo stuff, duct-tape output capture, escaping characters…

Listing. Setting timezone via shell command

- name: 'set timezone to {{ your_timezone }}'
  shell: 'timedatectl set-timezone {{ your_timezone }}'

Listing. Setting timezone via timezone module

- name: 'set timezone to -- {{ timezone__zone }}'
  timezone:
    name: '{{ timezone__zone }}'

Want the nail in the perfect spot, just to hang your painting? Use a proper hammer. A sledgehammer won’t do you any good. Moreover, you’ll have to explain why there’s a giant hole in the wall.

Take “OK’s” with a grain of salt

Everybody lies.

Gregory House, House M.D

Trust is expensive – and dangerous. It’s much safer to make sure, that when you get “OK”, it’s really “OK” – systemd does not know everything.

A simple set of smoke tests can be the guarantee you need. Properly implemented, they will of course make the role execution longer, but they also assure you that it is reliable. Ansible handlers fit perfectly here – define the tests as handlers, add a notify statement to the appropriate task, and voilà. Keep in mind that handlers run in the order they are defined in the handlers file, not in the order they are listed in notify.

Listing. A few handler smoke tests made for haproxy role

- name: '(smoke-test) -- service is running'
  systemd:
    name: '{{ haproxy__service_name }}'
    state: 'started'
  register: haproxy__status
  # keep going on failure, so that the next handler can report it
  ignore_errors: true


- name: '(smoke-test) -- report fail'
  fail:
    msg: |
      Service {{ haproxy__service_name }} is not running.
      Output of the systemd module:
      {{ haproxy__status.msg | default('no details available') }}
  when: haproxy__status is failed


- name: '(smoke-test) -- frontend service is listening'
  wait_for:
    port: '{{ haproxy__frontend_port }}'
    delay: 3
    timeout: 5
  ignore_errors: true


- name: '(smoke-test) -- backend service is listening'
  wait_for:
    host: '{{ item.address }}'
    port: '{{ item.port }}'
    delay: 3
    timeout: 5
  with_items: '{{ haproxy__backend_servers }}'
  ignore_errors: true


- name: '(smoke-test) -- frontend service responds with 200 OK'
  uri:
    url: 'http://localhost:{{ haproxy__frontend_port }}'
  ignore_errors: true

Listing. A smoke-test attachment.

- name: 'deploy configuration -- {{ haproxy__conf_dest }}'
  template:
    src: '{{ haproxy__conf_src }}'
    dest: '{{ haproxy__conf_dest }}'
    validate: '{{ haproxy__conf_validate_cmd }}'
  notify:
    - 'restart -- {{ haproxy__service_name }}'
    - '(smoke-test) -- service is running'
    - '(smoke-test) -- report fail'
    - '(smoke-test) -- frontend service is listening'
    - '(smoke-test) -- backend service is listening'
    - '(smoke-test) -- frontend service responds with 200 OK'
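
One way to shorten that notify list, sketched under the assumption that all smoke tests should always run together: give the handlers a shared listen topic and notify the topic once. The topic name 'smoke-test haproxy' is made up for this example, and only one handler is shown:

```yaml
# handlers/main.yml (sketch) -- every handler subscribed to the topic runs when it is notified
- name: '(smoke-test) -- frontend service is listening'
  wait_for:
    port: '{{ haproxy__frontend_port }}'
    delay: 3
    timeout: 5
  listen: 'smoke-test haproxy'

# tasks/main.yml (sketch) -- a single entry now triggers the whole group
- name: 'deploy configuration -- {{ haproxy__conf_dest }}'
  template:
    src: '{{ haproxy__conf_src }}'
    dest: '{{ haproxy__conf_dest }}'
  notify:
    - 'restart -- {{ haproxy__service_name }}'
    - 'smoke-test haproxy'
```

Adding another smoke test later then means adding one handler with the same listen value, with no change to the tasks that notify it.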

Making sure that everything’s alright is always worth the time. Remember – forewarned is forearmed. You’ll often have to predict whether something will or won’t happen – and it’s easier to fix tiny mistakes in development than huge failures in a production environment.

Perfection requires style selection

To avoid confusion, interpretation problems and a lot of headaches, select one code style and keep it. With a common standard, it is easy to catch typos – odd elements stand out and look conspicuous.

Someone will probably read or review the code one day, stumble upon a different style, and start pondering… “Is it not in the set standard because it’s faulty, or maybe it’s monkey-patched and should be rewritten later?” There is no guarantee that you’ll be there to clear things up.

Hence, if you’ve decided on something once, stand with it. Make your code speak for yourself, so you won’t have to.

Use pre-commit, to avoid crying post-commit

I am 100% sure that everyone has their “big brain time” once in a while. You know, that little while in which you become a mastermind and crunch code like a supercomputer.

When the flow of ideas and quantity win, quality usually loses. Later, when the dust settles, you’ve got a lot of code, made up of a crystallized idea, transferred to reality. It’s working… But it’s a huge mess too…

That’s when pre-commit saves the day. To put things short – people tend to forget about rules in the heat of passion. A machine won’t. With pre-commit you can set a list of rules and checks that will be performed on certain actions – for instance, before the commit is created, or just before git push. If the tests fail, nothing goes further, so you won’t be sending those pesky double quotes or invalid tabbing anywhere. It can even correct some issues for you, so that you don’t have to touch anything again. Git hooks to the rescue!
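
A minimal sketch of a configuration file that would produce a few of the checks shown in the output below. The hook ids come from the public pre-commit-hooks repository; the pinned rev and the selection of hooks are assumptions, not the author's actual setup:

```yaml
# .pre-commit-config.yaml (sketch)
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0  # assumption: pin to whichever release you have vetted
    hooks:
      - id: trailing-whitespace
        name: '(code) -- fix trailing whitespace'
      - id: end-of-file-fixer
        name: '(code) -- fix end-of-line'
      - id: check-yaml
        name: '(code) -- verify YAML syntax'
      - id: detect-private-key
        name: '(security) -- detect private keys'
      - id: no-commit-to-branch
        name: '(git) -- forbid committing to master or develop directly'
        args: ['--branch', 'master', '--branch', 'develop']
```

After saving the file in the repository root, pre-commit install registers the git hook, and pre-commit run --all-files runs every check once against the whole repository.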

Listing. A failed test example with pre-commit

(files) -- yaml files has to have yml extension....................(no files to check)Skipped
(pre-commit) -- validate manifest..................................(no files to check)Skipped
(git) -- forbid adding large files.....................................................Passed
(git) -- forbid using submodules.......................................................Passed
(git) -- forbid commiting with merge-conflicts.........................................Passed
(git) -- forbid commiting to master, develop or release directly.......................Passed
(file) -- check if executables have interpreter mentioned..........(no files to check)Skipped
(file) -- verify case-sensitive conflicts..............................................Passed
(file) -- look for empty symlinks..................................(no files to check)Skipped
(code) -- fix trailing whitespace......................................................Passed
(code) -- fix end-of-line..............................................................Failed
hookid: end-of-file-fixer

Files were modified by this hook. Additional output:

Fixing roles/centos/es-elastic/tasks/main.yml

(code) -- check document-string........................................................Passed
(code) -- verify JSON syntax...........................................................Passed
(code) -- verify YAML syntax...........................................................Passed
(code) -- sort and fix requirements-txt................................................Passed
(code) -- fix double-quote-string......................................................Passed
(security) -- detect AWS credentials...................................................Passed
(security) -- detect private keys......................................................Passed
(code) -- fix dependencies order.......................................................Passed
(code) -- check trailing-coma..........................................................Passed
(code) -- forbid tabs to be commited...................................................Passed
(code) -- replace tabs with spaces automatically.......................................Passed
(ansible) -- lint the code.............................................................Passed
(ansible) -- verify that vault files are really encrypted..............................Passed
(shell) -- lint the code...........................................(no files to check)Skipped
(shell) -- format the code.........................................(no files to check)Skipped
(code) -- lint the yaml files..........................................................Passed
[...]

As you can see, the test has failed – I forgot to end the tasks file of the es-elastic role with a newline. pre-commit fixed it for me, and if I run the test again, provided that there are no other issues, the test will pass:

Listing. A passed test example with pre-commit

(files) -- yaml files has to have yml extension....................(no files to check)Skipped
(pre-commit) -- validate manifest..................................(no files to check)Skipped
(git) -- forbid adding large files.....................................................Passed
(git) -- forbid using submodules.......................................................Passed
(git) -- forbid commiting with merge-conflicts.........................................Passed
(git) -- forbid commiting to master, develop or release directly.......................Passed
(file) -- check if executables have interpreter mentioned..........(no files to check)Skipped
(file) -- verify case-sensitive conflicts..............................................Passed
(file) -- look for empty symlinks..................................(no files to check)Skipped
(code) -- fix trailing whitespace......................................................Passed
(code) -- fix end-of-line..............................................................Passed
(code) -- check document-string........................................................Passed
(code) -- verify JSON syntax...........................................................Passed
(code) -- verify YAML syntax...........................................................Passed
(code) -- sort and fix requirements-txt................................................Passed
(code) -- fix double-quote-string......................................................Passed
(security) -- detect AWS credentials...................................................Passed
(security) -- detect private keys......................................................Passed
(code) -- fix dependencies order.......................................................Passed
(code) -- check trailing-coma..........................................................Passed
(code) -- forbid tabs to be commited...................................................Passed
(code) -- replace tabs with spaces automatically.......................................Passed
(ansible) -- lint the code.............................................................Passed
(ansible) -- verify that vault files are really encrypted..............................Passed
(shell) -- lint the code...........................................(no files to check)Skipped
(shell) -- format the code.........................................(no files to check)Skipped
(code) -- lint the yaml files..........................................................Passed
[...]

Details are important, but they consume time. Here’s an option to both have your cookie and eat it – no strings attached. Consider adding pre-commit to your daily code-crunching; you’ll surely love it.

Focus on test, be blessed

There is no way to be 100% sure that nothing will go crazy when you do basically anything. But a solid test session can provide 90% of that assurance, if not more.

Ansible has its own test tool, called molecule. It worked well for a long time, but currently its usability is up for debate, due to quirks that haven’t been worked out or, worse, things that deteriorated from version 2 to 3.

Pros:

  • Somewhat simplified testing – a few things are taken care of.
  • python3 in use – up-to-date is good.
  • Pick your favorite test verifier – there are (or were) a few available.
  • Idempotence testing – you can check if you’ll get the same result every time, which is something you really, really want

Cons:

  • Some things you still have to take care of – want to do something other than running the roles? Write a playbook for that.
  • Python3 in use – some things changed, you’ll have to keep that in mind.
  • Favorite test verifier means either Ansible-driven tests (if you can call simple assertions a test) or testinfra; goss support was cut in version 3.
  • If you decide on testinfra (I am not surprised), you may be forced into bad practices. testaid is no longer maintained; its successor, takeltest, might not be the same thing.
  • docker is the default driver, even though it cannot do everything – to test some roles, you will need a full VM – which could be neatly provided by Vagrant as in molecule v2, but…
  • The Vagrant driver was separated further from the molecule core and isn’t available “by default” – it is a separate plugin instead. You’ll have to tinker with it and with the main config to get it up, and the docs won’t help you.
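
For orientation, a minimal scenario configuration in the shape that molecule 3 scaffolds by default. The platform name and image are assumptions; swap them for whatever your role targets:

```yaml
# molecule/default/molecule.yml (sketch)
dependency:
  name: galaxy
driver:
  name: docker
platforms:
  - name: instance
    image: centos:7  # assumption: any image your role supports
provisioner:
  name: ansible
verifier:
  name: ansible  # or testinfra
```

The driver and verifier entries are exactly where the trade-offs from the list above show up: changing either one means installing and configuring the matching plugin yourself.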

I’ll also try to rank the test verifiers for you:

  • ansible – Primitive assertion testing, done via a verify.yml playbook. Nothing is comfortable, everything is wrong – why would you even do that?
  • goss – Very, very easy to learn and use – you could write a simple test after 5-10 minutes of reading the docs, and it would execute blazing fast. It’s tiny in size. New versions even support templating variables… but it’s not supported by molecule anymore.
  • testinfra – Robust and extremely potent, but much heavier: tests take more time to execute, and they’re not as easy to build. Requires Python know-how, as tests are written in Python. Variable templating is done via testaid/takeltest.
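
For comparison with the goss and testinfra listings that follow, this is roughly what the same package check looks like with the ansible verifier. A sketch only: expected__pkgs is a hypothetical variable, and the package names mirror the placeholders used in the goss listing:

```yaml
# molecule/default/verify.yml (sketch)
- name: Verify
  hosts: all
  gather_facts: false
  vars:
    expected__pkgs:
      - 'package_1'
      - 'package_2'
  tasks:
    - name: 'collect the list of installed packages'
      package_facts:
        manager: auto

    - name: 'assert that every expected package is installed'
      assert:
        that:
          - item in ansible_facts.packages
        fail_msg: '{{ item }} is not installed'
      loop: '{{ expected__pkgs }}'
```

It works, but the amount of scaffolding needed for a single check gives a feel for why this verifier ranks last.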

Listing. Basic goss test, checking if the packages are installed

package:
  package_1:
    installed: true
  package_2:
    installed: true

Listing. Basic testinfra test (with testaid), checking if the packages are installed

# -*- coding: utf-8 -*-
import testaid

# Run the tests against every host of the molecule scenario
testinfra_hosts = testaid.hosts()


def test_packages(host, testvars):
    # apt__pkgs is assumed to be a list of package names,
    # so each package is checked separately
    for pkg_name in testvars['apt__pkgs']:
        assert host.package(pkg_name).is_installed

As you can see, molecule is far from perfect, and there are a lot of difficult choices. See for yourself whether something like that suits you – maybe it’s worth losing some time and nerves to profit from the assurance that your roles are solid and reliable.

I also have one more general Python tip, since you’ll probably be dealing with Python if you want to get molecule running. Use venv – it’s much better than installing everything “as is” on top of everything else and then figuring out why something doesn’t work. It makes it easier to separate different Python versions, manage dependencies, and clean up later when you’re done.

The best defense is to use common sense

The times we’re living in are pretty… wild. There’s a lot to be lost, a lot to be gained… I’ll provide a few tips here to keep you safe, as it’s better to be safe than sorry.

  • Sensitive data, even when protected by ansible-vault, isn’t safe from you – you can disclose it by pure accident. Use no_log to tell Ansible that it shouldn’t show the output of some tasks/commands and keep those “Peeping Toms” away.
  • Debug stuff where it’s safe; validate roles on VMs or by other means of testing before you put them to use. You wouldn’t want to see your empire of infrastructure crumble to dust, would you?
  • Always think before you do – ask yourself, “Is there any risk of casualties if I do it this way?”
  • When something goes wrong, don’t panic – it won’t help you; actually, it only makes things worse. You need to be cold as stone, quickly evaluate the facts, and take action.
  • Don’t mindlessly copy-paste things into Ansible. Learn how a snippet works first, then evaluate whether you want to use it or not. Think of such snippets as a black box whose contents you don’t know. There might be a delicious cake in there… but there might be a hand grenade with the pin pulled, too.
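
The no_log tip from the list above, as a minimal sketch. The deploy user is hypothetical; vault__shell_root_password is the dummy secret from the vault section:

```yaml
# tasks/main.yml (sketch)
- name: 'create the deploy user with a password from the vault'
  user:
    name: 'deploy'
    # password_hash turns the plain-text secret into the hash the user module expects
    password: '{{ vault__shell_root_password | password_hash("sha512") }}'
    # set the password only when the account is first created
    update_password: 'on_create'
  # hide the task's arguments and results from logs and console output
  no_log: true
```

The price is harder debugging, since a failing task hides its details too, so it is worth limiting no_log to the tasks that actually handle secrets.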

The end.

I hope that this article will one day prove useful to you – and I’m glad I could share my knowledge with you.

Good luck with your playbooks. Now you’re past [Gathering Facts], time to [Play] – happy tinkering!
