On building an Ansible training environment on FreeBSD

In the course of the last three years I’ve had the pleasure of doing a lot of teaching, and it really has mostly been a pleasure. I take teaching very seriously and invest quite a bit of energy to bring my love for a particular topic across. It’s strenuous, nay exhausting, at times but worth it.

For the courses I prepare system environments on which the students get to run labs (a.k.a. practical exercises). They create and run Ansible playbooks in Ansible trainings, and they install DNS servers during DNS trainings. In the past I used VirtualBox to create these lab environments because it runs on multiple platforms. I create preconfigured VMs with specific capabilities (e.g. for Ansible a VM contains DHCP and DNS, package repo, pip repo, git server, etc.), and a small shell script unpacks the VirtualBox “appliance” (.ova file) into a number of VMs for each student which they then use as their “data center” (I love saying that). When you think about it, it is a bit of a data center, if a teeny tiny one: until now there were four VMs from and onto which deployments were developed; nothing fancy, but many students need a bit of effort to understand what’s going on.

I give trainings at customer sites, and there are clients who say “we don’t have Unix/Linux workstations; yes, we know Ansible requires Unix, but we want the training on Windows PCs”, and the shell script installer (it cannot practicably be Ansible at this point) becomes useless. The individual steps need to be typed out manually. (I could turn it into a .bat file or whatever, but my motivation for doing that has been low, knowing I would change the whole setup sometime.)

While VirtualBox has served us well in the past, it is slow, a bit brittle on Linux (closing the lid on the laptop often causes breakage the next morning; strangely that happens neither on Windows nor on macOS as far as I’ve been able to ascertain), and in particular it is cumbersome to update anything in that “data center”: I have to bring up the VMs, change whatever requires changing, package up the VMs into an .ova and copy the 8GB onto thumb drives and/or make them available for downloading.

It is high time for a change. What I envision is a laptop which I take to a customer site, power up, plug into a switch, and off we go: users can log in when it’s time to start doing labs and experience a trouble-free Ansible environment. The laptop and its software must be easy to update; in particular, the Ansible installation must be easy to exchange for a newer version just before a training. However, I also need to be able to fall back to a prior version if something goes awry.

(I built an environment for DNS trainings using Fedora Linux and systemd-nspawn after hearing Pieter discuss systemd-nspawn at Loadays. The system, based on Btrfs, has worked well (and still does), but I recall being not amused by some of the pitfalls I ran into (and then overcame). My plan is to redo that based on the work I present here, and I hope to demonstrate why.)

The following diagram depicts what we’re doing, and I intend to describe what the individual bits are and how they interact.

The controller is the Ansible controller; that’s the machine trainees log in to in order to develop and then run playbooks. We give them a choice of editors, from the Standard Unix editor via joe, pine, pico, mg, through to vi and vim. Students create a “system” which requires three machines in their data center; those are the little boxes on the right-hand side. Each student has three.

(Diagram: the new training environment)

revisiting *BSD

Having mostly ignored all things *BSD during the last dozen years, I visited BSDCan and EuroBSDCon this year, and I became enamored once again with the clean simplicity of some of what I saw. (There’s a very balanced comparative introduction to FreeBSD for Linux users I can recommend you read if you want to get into the mood.)

Earlier this year I installed OpenBSD on a laptop, and while it hasn’t been able to replace my Mac, it’s a close second. I considered using OpenBSD with VMM/VMD as experiments were promising, and Mischa runs openbsd.amsterdam successfully with it, but one idea led to the next, and I decided to give FreeBSD a closer look, in particular jails and ZFS.

Oh, and before I forget to mention it, if you want a really sexy-looking graphical UI for FreeBSD, head over to NomadBSD: I gave it a whirl and it looks gorgeous. (As you’ll read in a moment I don’t require or need a graphical workstation for this project, but if I did I would install NomadBSD.)

a few weeks in jail

(With apologies to Michael Lucas for stealing bits from the title of a talk of his.)

I suspected FreeBSD jails would be well suited for my project. Jails are a virtualization of access to file systems, users, processes, and networking. Students require a platform onto which they can SSH via Ansible and install some packages, deploy templates, create users, etc. Why not jails?

I started a bit low-level, first with jail(8) and then ezjail. I then quickly found BastilleBSD which I liked a lot, but because I got stuck with networking, I then tried iocage and have remained with it. (The record should show that my “stuck” with networking was not BastilleBSD’s fault but mine.)

A combination of Lucas’ Jails book, the iocage documentation, and a glance or two at the iocage source got me going.

The main reason for choosing iocage over BastilleBSD for jail creation is iocage’s templates which allow me to build a base jail containing all I need in it and then fire off creation of jails based on that template. (BastilleBSD also has templating – a bit like automation which is applied to the jail after it’s launched. This is practical, but for my purposes iocage templates are faster.)

The Ansible training template is created as follows, with an initial bootstrap shell script:

host# iocage create -r 12.1-RELEASE -n t-ansible0 ip4_addr="jail0|127.0.2.1"
host# cp ansi-bootstrap.sh /zroot/iocage/jails/t-ansible0/root/tmp/
host# iocage start t-ansible0
mach% /tmp/ansi-bootstrap.sh
mach% exit
host# iocage stop t-ansible0
host# iocage set template=yes notes="template for Ansible student machines" t-ansible0

That completes building the template, which takes a minute or two.

In order to then build the three jails for each student I run the mk-ansi-jails.sh shell script which is generated from an Ansible template (see below):

iocage create -t t-ansible0 -n an102 \
                notes="ansible,xx" \
                quota=4G \
                resolver="domain XX.example;nameserver 10.53.1.1" \
                ip4_addr="jail0|10.53.1.102/32"
...

Apropos “stuck with networking”, that happened to me with iocage as well, but I found out what I was doing incorrectly: I had to add jail addresses to PF.

delivering packages

Both for setting up the Ansible template jail and for students who install something during lab work, we need a FreeBSD package repository, and I wanted to keep it as simple as possible. Recall this is on a laptop which will be used to conduct training sessions, so I can afford to cut a few corners in terms of security: at the end of a session, we reset all the jails.

Dan taught me to make a jail for each “system”, so I do that: the jail with the package repository is lovingly called dhl (because it’s a service which delivers packages; ok, sorry), and it has boot=on set so that it is started when iocage starts. dhl basically just runs an nginx-lite service with autoindex on, and I build the package repository proper with a Makefile. (I think I recently confessed: I love make – it’s one of Unix’ hidden gems):

LIST = screen sudo tmux bash wget curl \
       bind-tools moreutils python36 ...

all:
        ASSUME_ALWAYS_YES=yes pkg fetch -U --output . -d $(LIST)
        pkg repo .

That’s it: fetch obtains the packages with their dependencies (-d), and repo creates a repository. All I need to do within the repository directory is ensure there’s a symbolic link from pkg.txz to the latest version so that machines can bootstrap the package system.

The training jails built from the t-ansible0 iocage template use the dhl repository server with this package configuration installed on them: we disable FreeBSD’s repository and add our own. I really like the way this is implemented: no need to muck about with the FreeBSD-provided configuration in /etc/pkg; simply add a new file to /usr/local/etc/pkg/repos/ and therein disable the FreeBSD standard repository:

FreeBSD: { enabled: no }

DHL: { url: http://dhl.ansible.example:80/repo,
           enabled: yes,
           signature_type: none }

Copying packages in this way is likely simplistic to the FreeBSD folk who run Poudriere for building packages, but that would be overkill here; this lightweight copy of just the packages and their dependencies we’ll be requiring for labs is simple and sufficient.

ansible installs ansible

Each student will log in to their own account on the Ansible controller, irrespective of whether it runs on the host or in a dedicated jail, and we set up a dedicated login class for them:

ansible:\
   :nologin=/var/run/ansible-no:\
   :path=~/bin /bin /usr/bin /usr/local/bin /usr/local/ansible/latest/bin:\
   :hushlogin:\
   :tc=standard:

This permits us to have different Ansible installations and use $PATH and a symbolic link called latest to point students to the current version.

$ ls -l /usr/local/ansible
drwxr-xr-x  5 root  wheel   7 Dec  3 18:41 2.7.6
drwxr-xr-x  5 root  wheel   7 Dec  3 18:41 2.9.1
drwxr-xr-x  5 root  wheel   7 Dec 11 13:29 2.9.2
lrwxr-xr-x  1 root  wheel  24 Dec 11 13:30 latest -> /usr/local/ansible/2.9.2

How do those versions get there? Well, we distribute Ansible with Ansible, of course! :-)

vars:
  adir: "/usr/local/ansible"
  aver: "2.9.2"
tasks:
- name: ansible | install ansible from pip into new virtualenv
  pip:
      name: "ansible=={{ aver }}"
      virtualenv: "{{ adir }}/{{ aver }}"
      virtualenv_command: "/usr/local/bin/python3.6 -m venv"

- name: ansible | symlink current version to latest
  file:
      src: "{{ adir }}/{{ aver }}"
      dest: "{{ adir }}/latest"
      state: link
      force: true
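
Falling back to a prior version, which was one of the requirements at the outset, then amounts to re-pointing latest at a directory which is already installed. The following is a minimal Python sketch of that idea and not part of the actual setup: the function name point_latest and the scratch file name .latest.tmp are assumptions made for illustration, and the demonstration runs in a temporary directory rather than in /usr/local/ansible.

```python
import os
import tempfile


def point_latest(adir, version):
    """Re-point adir/latest at adir/version by renaming a fresh symlink into place."""
    target = os.path.join(adir, version)
    if not os.path.isdir(target):
        raise FileNotFoundError(f"no such Ansible installation: {target}")
    tmp = os.path.join(adir, ".latest.tmp")  # hypothetical scratch name
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)
    # Renaming over the old link swaps it in one step, so $PATH never sees a gap
    os.replace(tmp, os.path.join(adir, "latest"))


# Demonstration in a throwaway directory instead of /usr/local/ansible
demo = tempfile.mkdtemp()
for v in ("2.9.1", "2.9.2"):
    os.mkdir(os.path.join(demo, v))
point_latest(demo, "2.9.2")  # upgrade just before a training
point_latest(demo, "2.9.1")  # fall back if something goes awry
print(os.readlink(os.path.join(demo, "latest")))
```

Because the students' login class only ever references latest/bin, nothing else needs to change when the link moves.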

naming things

Students shall each have a subdomain so that when a lab instructs them to create a playbook to “deploy this onto host web” they can all use that short name which will point to their specific Web server jail (one of 0, 1, or 2 in the diagram above).

We all know, don’t we, that naming things is difficult so I had a bit of fun asking on Twitter, and spent the better part of an hour grinning about all manner of suggestions. (I have of course taken note of what certain people said about me. :-) )

One of the ~~least useless~~ good ideas was elements in the periodic table, so I took it a step further and chose those which matched ccTLD names. (It turns out Tony got a list of those years ago.) The result became a small YAML “database” (yeah, I knew you’d like that!):

---
hostnames:
  - app
  - mosq
  - web
domains:
  - { code: "al", elem: "aluminium", cc: "ALBANIA" }
  - { code: "as", elem: "arsenic", cc: "AMERICAN SAMOA" }
  - { code: "au", elem: "aurum", cc: "AUSTRALIA" }
  ...

I use this database within Ansible templates for creating Unbound DNS local data, rules for the PF firewall, for creating the users (al, as, au, …) on the Ansible controller, and for templating out the mk-ansi-jails.sh script which creates student jails:

# {{ ansible_managed }}
# This shell script is GENERATED

{% set ns = namespace(counter = 0, n = 0) %}
{% for d in domains %}
{%    for h in hostnames %}
{%      set nnn = "%d" | format(100 + ns.counter + ns.n) %}
{%      set ip = "10.53.1.%d" | format(100 + ns.counter + ns.n) %}

iocage create -t t-ansible0 -n an{{ nnn }} \
                notes="ansible,{{ d.code|lower }}" \
                quota=4G \
                resolver="domain {{ d.code|upper }}.example;nameserver 10.53.1.1" \
                ip4_addr="jail0|{{ ip }}/32"

{%      set ns.n = ns.n + 1 %}
{%    endfor %}
{%    set ns.counter = ns.counter + 10 %}{# skip to next block of 10 #}
{%    set ns.n = 0 %}

{% endfor %}
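
The arithmetic in that template is easier to follow outside of Jinja. Here is a small Python sketch of the same numbering: every domain gets a block of ten addresses starting at .100, and the hostnames are counted within the block. The function name jail_plan is made up for illustration, and pairing each address with a fully qualified name is an assumption based on the order of the hostnames list; the template itself only emits the jail name and the address.

```python
hostnames = ["app", "mosq", "web"]
domains = ["al", "as", "au"]  # the first three codes from domains.yml


def jail_plan(domains, hostnames, base=100, block=10):
    """Return (jail name, fqdn, ip) tuples in the order the template emits them."""
    plan = []
    for d_index, code in enumerate(domains):
        for h_index, host in enumerate(hostnames):
            # same as 100 + ns.counter + ns.n in the Jinja template
            last_octet = base + d_index * block + h_index
            plan.append((
                f"an{last_octet}",
                f"{host}.{code.upper()}.example",
                f"10.53.1.{last_octet}",
            ))
    return plan


for name, fqdn, ip in jail_plan(domains, hostnames):
    print(name, fqdn, ip)
```

One consequence of stepping by ten inside a single /24 is that the scheme, as sketched, has room for sixteen domains before the last octet exceeds 254.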

domain name stuff

Student jails are created with a resolver configuration which sets domain within the jail:

host# iocage create .... resolver="domain XX.example;nameserver 10.53.1.1"

iocage uses the resolver= setting, replacing semicolons by newlines, to install an /etc/resolv.conf file in the jail when it’s started. Each of the student jails therewith accesses their own domain.
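
To make the transformation concrete, here is a tiny Python sketch of what happens to the value. It only illustrates the described behaviour and is not iocage's own code; the function name is an assumption.

```python
def resolver_to_resolv_conf(resolver):
    """Turn a semicolon-separated resolver value into resolv.conf text."""
    lines = [part.strip() for part in resolver.split(";") if part.strip()]
    return "\n".join(lines) + "\n"


print(resolver_to_resolv_conf("domain XX.example;nameserver 10.53.1.1"), end="")
```

The result is a two-line file: a domain line followed by a nameserver line.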

In the shell accounts, we template out a .profile which sets $LOCALDOMAIN to the same value, e.g. XX.example, so when a student uses Ansible to ssh to web, say, they get a response to the query for web.XX.example. ($LOCALDOMAIN doesn’t make a lot of sense if I set domain in the resolver configuration, but we also use it with figlet(1), one of Unix’ more important utilities, to welcome a user on login.)

(Screenshot: figlet in operation)

Both host and jails point to the same resolver on 10.53.1.1 which is an Unbound configured with local data. (The TXT records are there because I’m bound to forget what the two-letter codes mean.) This we template out from the domains.yml shown above:

# Ansible managed
server:
        local-data:     "app.AL.example. IN A 10.53.1.100"
        local-data:     "app.AL.example. IN TXT 'element: aluminium'"
        local-data:     "app.AL.example. IN TXT 'cc: albania'"
        local-data-ptr: "10.53.1.100 app.AL.example."
        ...

Then come incoming PF port redirections for a lab with a Web server:

rdr pass inet proto tcp from any to any port { 8102 } -> 10.53.1.102 port 80 # AL aluminium
rdr pass inet proto tcp from any to any port { 8112 } -> 10.53.1.112 port 80 # AS arsenic
rdr pass inet proto tcp from any to any port { 8122 } -> 10.53.1.122 port 80 # AU aurum
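
Judging by these three rules, the external port is simply 8000 plus the last octet of the student's Web server jail. A short Python sketch which reproduces the lines under that assumption (the rule is inferred from the examples, and rdr_rule is a hypothetical name):

```python
def rdr_rule(code, elem, web_octet):
    """Build one PF redirection for a student's Web server jail."""
    port = 8000 + web_octet  # assumed convention: 8000 + last octet
    return (
        f"rdr pass inet proto tcp from any to any port {{ {port} }} "
        f"-> 10.53.1.{web_octet} port 80 # {code.upper()} {elem}"
    )


for code, elem, octet in [("al", "aluminium", 102),
                          ("as", "arsenic", 112),
                          ("au", "aurum", 122)]:
    print(rdr_rule(code, elem, octet))
```

A convention like this lets a student work out their own port from their jail's address without having to look it up.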

The domains.yml database is also used to template out the mk-ansi-jails.sh shell script with which to create the student iocage jails (seen earlier), and lastly, I create directories, users, etc. on the Ansible controller jail:

- name: create .profile
  copy:
      content: |
          export LOCALDOMAIN="{{ item.code|upper }}.example."
          figlet $LOCALDOMAIN
      dest: "{{ homeprefix }}/{{ item.code|lower }}/.profile"
      owner: "{{ item.code|lower }}"
      mode: "0755"
  with_items: '{{ domains }}'

You get the picture.

enough resources

The training laptop is a Thinkpad T430 running FreeBSD 12.1 with 16GB of RAM. Before starting I asked around whether people thought that sort of hardware would manage 30 to 40 jails with Python in them; the response was “yeah, probably, it depends” of course. :-) The number 30 comes from 10 participants, each with 3 machines in their “data center”.

I’ve not yet had the Ansible training laptop in production, but being the ~~pessimistic~~ careful type I am, I’ve simulated some load on it, putting much more pressure on the system than I expect it to have.

I created 170 jails (over five times the number I need) and I could likely have done many more. The following output is an almost idle T430 running 171 jails.

last pid: 21220;  load averages:  0.84,  0.89,  0.59  up 1+10:13:20    07:49:27
568 processes: 1 running, 567 sleeping
CPU:  0.1% user,  0.0% nice,  0.2% system,  0.0% interrupt, 99.7% idle
Mem: 645M Active, 1869M Inact, 7613M Wired, 5461M Free
ARC: 3586M Total, 1984M MFU, 980M MRU, 65M Anon, 35M Header, 522M Other
     1856M Compressed, 4297M Uncompressed, 2.32:1 Ratio
Swap: 2048M Total, 2048M Free

I then whipped up a very simple playbook to install a few packages and launch nginx and a BIND name server on each of those jails, using ten forks. The total runtime was 10m42s. During this time the host (i.e. laptop) remained very responsive; if I hadn’t known all those processes were running I wouldn’t have noticed. The instant the playbook finished, the system looked like this:

last pid: 17626;  load averages:  7.52,  6.33,  3.71  up 1+10:30:00    08:06:07
1112 processes:1 running, 1111 sleeping
CPU:  0.1% user,  0.0% nice,  0.2% system,  0.0% interrupt, 99.7% idle
Mem: 4163M Active, 3400M Inact, 65M Laundry, 7321M Wired, 639M Free
ARC: 4407M Total, 2453M MFU, 1210M MRU, 65M Anon, 54M Header, 625M Other
     2338M Compressed, 4536M Uncompressed, 1.94:1 Ratio
Swap: 2048M Total, 2048M Free

After a minute or two, with all the jails still running a copy of BIND and nginx, the load average fell to 0.75.

I decided to spice things up a bit and used Ansible to push a cron entry to all machines which executes once a minute:

#Ansible: keygenerate
* * * * * rm -f /tmp/xxx; ssh-keygen -t rsa -b 4096 -N bla -f /tmp/xxx

I ran the playbook in batches of about 50, and after the second batch, this happened:

last pid: 31375;  load averages: 150.10, 107.97, 59.07  up 1+10:59:32    08:35:39
1385 processes:154 running, 1226 sleeping, 3 stopped, 2 zombie
CPU: 99.4% user,  0.0% nice,  0.6% system,  0.0% interrupt,  0.0% idle
Mem: 4086M Active, 4071M Inact, 64M Laundry, 6921M Wired, 442M Free
ARC: 4035M Total, 2466M MFU, 824M MRU, 65M Anon, 48M Header, 632M Other
     1933M Compressed, 3702M Uncompressed, 1.91:1 Ratio
Swap: 2048M Total, 2048M Free

The machine became a wee bit sluggish, to put it mildly. To be honest, it was unusable. There was but one thing left to do: iocage destroy was too slow, but this killed off all the jails quickly:

jail -r $(jls | grep an | awk '{print$1;}')

After that I removed all the jails:

iocage destroy -f $(iocage list -H | awk 'substr($2, 1, 2) == "an" { print $2; }')

Running 100 SSH key generations, many of them in parallel, on 170 jails is not what this environment is built for, but I was curious. Apart from that experiment, I can say that working on the machine proper I didn’t notice any performance issues, and I don’t expect any trouble at all.

getting users logged in

I originally envisioned a VNET jail onto which trainees would log in, but I think that’s more trouble than it’s worth. For one, I couldn’t get DHCP working with a VNET iocage jail (here again, likely something missing in PF), and for another I cannot use a static IP because the machine must connect to whichever network I’m visiting with the training laptop.

Unix being Unix and a multi-user system, I first decided to let users log in to the host itself: they require neither root permissions nor any special privileges other than a text editor, being able to run Ansible and SSH, so there’s not much that students can kaputt. On the other hand, I hear Dan saying “one jail per service!”, but with jails on shared private IP space … ? I could configure PF to redirect a different port into the Ansible controller jail, but then I hear the Lucas guy yell:

this leads to telling users things like “add 61000 to all your port numbers to find the services for jail 61. Yes, as in 61022, 61443, and so on.”

I must also consider that I’ll have users with Windows only (running PuTTY, MobaXterm, or similar), so for now I’ve got the following options:

1. create training users on the host proper
2. create a controller jail for the users into which SSH clients jump

The users and the Ansible configuration would be identical in both cases (and we create all that is required via Ansible from our “domains.yml” database), but the second case requires a bit of additional work, and it requires special configuration on the client side.

I easily create an SSH jump host configuration in sshd_config on the host:

host# echo training | pw useradd -n detour -c 'SSH Jump' -s /bin/csh -m -h 0 # -w none

host# tail -10 /etc/ssh/sshd_config
Match user detour
    AllowAgentForwarding no
    AllowTcpForwarding yes
    PermitTunnel no
    GatewayPorts no
    X11Forwarding no
    PubkeyAuthentication no
    PermitEmptyPasswords yes
    PasswordAuthentication yes
    ForceCommand echo 'This account is only for ProxyJump'

This handles both new and old versions of “jumping” via a bastion host with the disadvantage of having to specify two passwords (one for the jump host, the other for the target jail). But as students basically do this once a day, I think it’s just a minor inconvenience.

$ ssh -J detour@192.168.1.179 -l xx 10.53.1.20
$ ssh -o ProxyCommand="ssh -W %h:%p detour@192.168.1.179" -l xx 10.53.1.20

(The minor inconvenience can be overcome by creating the user with -w none as shown above, adding nullok to the end of the line ^auth.*required.*pam_unix\.so in /etc/pam.d/sshd, and restarting sshd; the jump host will not prompt for a password for users. But please don’t tell anybody I told you that…)

Now, while that configuration also works well for MobaXterm, the venerable PuTTY doesn’t support jump hosts; SOCKS to the rescue in the form of ss5:

$ grep ss5 /etc/rc.conf
ss5_enable="YES"
ss5_flags="-b 0.0.0.0:1080 -u root"

$ cat /usr/local/etc/ss5/ss5.conf
set SS5_VERBOSE

auth      0.0.0.0/0  - n
permit -  0.0.0.0/0  - 10.53.1.0/24  22 -  - - -

When I configure PuTTY to use the SOCKS4 proxy on port 1080, I see the connection in the ss5 logs:

$ grep 202 /var/log/ss5/ss5.log
[11/Dec/2019:10:43:24 UTC] [15102] 192.168.1.202  "CONNECT" STARTED 0 0 0 (192.168.1.202:50157 -> 10.53.1.102:22)
[11/Dec/2019:10:53:45 UTC] [15102] 192.168.1.202  "CONNECT" TERMINATED 3070 2508 21 (192.168.1.202:50157 -> 10.53.1.102:22)
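
As I read the permit line in ss5.conf, only connections to port 22 on the jail network 10.53.1.0/24 are let through, which is why the logged connection above succeeds. A small Python sketch of that check, purely as an illustration of the rule's destination part (this is not how ss5 is implemented, and the function name permitted is an assumption):

```python
import ipaddress

LAB_NET = ipaddress.ip_network("10.53.1.0/24")  # destination network from ss5.conf
SSH_PORT = 22                                   # destination port from ss5.conf


def permitted(dst_ip, dst_port):
    """True if a CONNECT to dst_ip:dst_port matches the permit line's destination."""
    return ipaddress.ip_address(dst_ip) in LAB_NET and dst_port == SSH_PORT


print(permitted("10.53.1.102", 22))    # the connection seen in the log
print(permitted("10.53.1.102", 80))    # a lab Web server: different port
print(permitted("192.168.1.179", 22))  # an address outside the jail network
```

Under that reading, letting a browser reach the lab Web servers through the proxy would presumably need a further permit line for port 80.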

I could even use ss5 as a SOCKS proxy to get Firefox to speak to the users’ lab jails …

I’ve not decided whether students should “jump” into the Ansible controller jail or whether I should make their lives easier and create their accounts on the host proper. I lean towards the former, because resetting it all is easier at the end of a training, but the latter would be more comfortable for students.

again and again and again

Would I do this again? Gladly. Will I do this again? Yes!

Working with FreeBSD is rewarding: its documentation is very good, the software is solid, and everything makes a very polished impression. It’s not a problem that some of the packaged programs are a bit older (get a port), and neither, for the record, is the fact that a special console driver is not automatically detected.

I much appreciate some of what initially looks almost primitive (e.g. rc.conf) but in fact is inspiring through its simplicity. For example, upon reading the manual for login.conf(5) I appreciate the well thought-out integration in the system and, in particular, that what I read actually functions that way. What I mentioned earlier about overwriting the standard FreeBSD package repository by dropping a file which invalidates it into a separate directory not only makes sense, but makes a lot of things easier. That’s just one example.

Most if not all of the parts in the BSD operating systems are well thought out. As Lucas H. said recently, the software “has a long term feel to it”.

In the past years I’ve had to work with at least three distinct Linux distributions at customer sites. In no particular order these are CentOS and Debian in differing versions, and SUSE Linux or SLES, also in different versions.

I’ve lost count of how many different locations I have to try in order to configure networking or set up DNS resolution. Was it the INI-looking file in /etc/networks/, or was it /etc/sysconfig/something? Oh right, it’s *.yaml here. Or it’s NetworkManager, or this or that, and I fear that the next version of something will again rip out and replace yet another system. I’m sure there are plenty of you out there who know all this by muscle memory for thirtytween distributions, but I don’t and never will, and I can add: I simply don’t want to!

I’ve also, and this to me is much more important, lost count of the number of times I’ve not been able to find what I need in documentation bundled with the Linux distro du jour and instead “googled” for an often incorrect answer.

The BSDs may be somewhat old-fashioned compared to the newest whiz-bang Linux distro, but that suits me: I also am.

Continued…

freebsd, ansible, jail, zfs, and iocage :: 11 Dec 2019

Jan-Piet Mens