Monday, May 15, 2017

Centrally managed Bhyve infrastructure with Ansible, libvirt and pkg-ssh

At work we've been using Bhyve for a while to run non-critical systems. It is a really nice and stable hypervisor, even though we are using the earlier version available in FreeBSD 10.3. This means we lack Windows and VNC support, among other things, but it is not a big deal.

After some iterations of our internal tools, we realised that the installation process was too slow and that we always repeated the same steps. Of course, any good sysadmin will scream "AUTOMATION!" and so did we. Therefore, we started looking for ways to improve our deployments.

We had a look at existing frameworks that manage Bhyve, but none of them had a feature we find really important: a centralised repository of VM images. For instance, SmartOS applies this method successfully: a backend server stores a catalog of VMs and Zones, so new instances can be deployed in a minute at most. This is a game changer if you are really busy in your day-to-day operations.

Since we are not great programmers, we decided to leverage existing tools to achieve the same result, that is, a centralised repository of Bhyve images in our data centers. The following building blocks are used:

  • We write a YAML dictionary to define the parameters needed to create a new VM:
    • VM template (name of the pkg that will be installed  in /bhyve/images)
    • VM name, cpu, memory, domain template, serial console, etc.
  • This dictionary will be kept in the corresponding host_vars definition that configures our Bhyve host server.
  • The Ansible playbook:
    • installs the package named after the VM template (a ZFS snapshot), e.g. pkg install FreeBSD-10.3-RELEASE-ZFS-20G-20170515.
    • uses cat and zfs receive to load the ZFS snapshot in a new volume.
    • calls the libvirt modules to automatically configure and boot the VM.
  • The sysadmin logs in to the new VM and adjusts the hostname and network settings.
  • Run a separate Ansible playbook to configure the new VM as usual.
Once automated, the installation process takes 2 minutes at most, compared with the 30 minutes needed to install a VM manually, and it also allows us to deploy many guests in parallel.
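The playbook steps above boil down to something like the following session. The template name is taken from the text, but the VM name, dataset and domain definition path are made-up examples, and in practice Ansible runs these steps through its modules rather than by hand:

```
# pkg install -y FreeBSD-10.3-RELEASE-ZFS-20G-20170515
# cat /bhyve/images/FreeBSD-10.3-RELEASE-ZFS-20G-20170515.zfs | zfs receive zroot/vms/web01
# virsh define /bhyve/domains/web01.xml
# virsh start web01
```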


Tuesday, January 3, 2017

OpenNTPD, leap seconds and other horror stories

In case you missed it, there was a leap second on December 31, 2016. I don't know about you, but I've read many horror stories about things going terribly wrong after leap seconds, with sysadmins in despair being paged at night. Well, today I am going to share one of those stories with you and I hope it will be terrifying.

Horror story

Like diligent sysadmins, we monitor the ntpd services on our servers (OpenNTPD in our case) and we are alerted if a noticeable clock offset shows up. Of course, in the event of a leap second, all the servers should trigger an alert and the corresponding recovery. The leap second was inserted as 23:59:60 on December 31 and the servers slowly chewed through the difference in around 3 hours.

But... here comes the horror story. Some of the servers didn't recover at all. The graphs showed that the offset was still around -900 ms (an extra second was introduced, therefore we were one second behind). In the end we had to restart openntpd as a quick remediation.

Below you can find the status of one of the servers, for reference.

# ntpctl -s all

4/4 peers valid, clock synced, stratum 3

   wt tl st  next  poll          offset       delay      jitter from pool
    1 10  2 1474s 1502s      -984.909ms     6.266ms     0.175ms from pool
 *  1 10  2  733s 1640s      -984.824ms     1.105ms     0.126ms from pool
    1 10  3  888s 1509s      -984.824ms     6.380ms     0.138ms from pool
    1 10  2 3087s 3098s       105.306ms     6.295ms     0.130ms

You may notice that one of the peers has a positive offset, which doesn't make any sense, because an extra second was introduced as already explained above. I hope you can smell the stink at this moment, because it is quite strong.

Well, digging in the logs I also found the following line:

ntpd[1438]: reply from not synced (alarm), next query 3228s

Yes, openntpd was unhappy with that peer and decided to stop the time synchronisation until the issue was solved. Notice that this is a really bad situation, because we don't control that peer at all. The only option is to restart openntpd, so that the round robin DNS record we configured resolves to a different set of peers.

I decided to do a bit of research and went to the openntpd GitHub repo to read the source code, particularly src/usr.sbin/ntpd/client.c. Here, the NTP packet's status is evaluated against a bit mask to analyse the LI bits (Leap Indicator):

if ((msg.status & LI_ALARM) == LI_ALARM || msg.stratum == 0 ||
    msg.stratum > NTP_MAXSTRATUM)

The name LI_ALARM is self-explanatory. This bitmask check evaluates to true when both bits of the Leap Indicator are set to 1. From the RFC:

LI Leap Indicator (leap): 2-bit integer warning of an impending leap second to be inserted or deleted in the last minute of the current  month with values defined in Figure 9.

0     no warning
1     last minute of the day has 61 seconds
2     last minute of the day has 59 seconds
3     unknown (clock unsynchronized)

At this point, I can claim that the peer was totally broken: it ran for hours (and it may still be broken) with the clock unsynchronized, and it hit us in a chain reaction. Well, one may expect a minimum quality, but these are the risks you must accept if you use services run by others (unless you have a Service Level Agreement signed on paper).

To understand how risky it can be, we can look at the page that describes how to join an ntp pool: only a static IP address and a minimum bandwidth are required, plus a couple of recommendations. Hence the many hobbyists running their own time servers. The pool project runs a monitoring system that can be queried online, and servers with a score lower than 10 are automatically removed from the pool (the peer that hit us had reached -100). This is a good measure, but good luck if a bad server is already active in your ntpd service: it will cause you trouble until you manually restart the service.

Lessons learned

  • Actively monitor the ntp service.
    • Monitor the general status: un/synchronized, stratum, num valid peers, etc
    • Monitor the offset. I do an average of all peers and then apply abs().
  • Plan carefully and search for a reliable ntp source.
    • Does your datacenter offer this service? Can you have an SLA?
    • Avoid the country/region pools, because they may be run by hobbyists and will cost you pain, even if the pool documentation recommends you use them. Perhaps using the ntp servers provided by your OS vendor is safer.
    • Perhaps buy a DCF77 receiver and build your own Stratum 1 server, but you may need an external antenna if the datacenter walls are too thick.
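To make the offset check above concrete, here is a small sketch of such a monitoring helper (an illustration, not our actual check): it averages the peer offsets straight out of ntpctl and prints the absolute value in milliseconds.

```shell
#!/bin/sh
# Average the peer offsets from `ntpctl -s all` output on stdin and
# print abs(average) in milliseconds. The first field ending in "ms"
# on each peer line is the offset column.
ntpctl_offset_ms() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /ms$/) { sub(/ms$/, "", $i); sum += $i; n++; break }
    }
    END {
        if (n == 0) exit 1
        avg = sum / n
        if (avg < 0) avg = -avg
        printf "%.3f\n", avg
    }'
}

# Typical usage (alert when the result crosses a threshold):
#   ntpctl -s all | ntpctl_offset_ms
```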

Friday, October 21, 2016

SSH public key authentication with security tokens

I've been using a Yubikey for two-factor authentication with HOTP for a long time, but this crypto hardware has many more capabilities, like storing certificates (RSA and ECC keys).

The use case I describe below allows us to do SSH public key authentication while keeping the private key stored in the device at all times. This gives an extra layer of security, because the key cannot be extracted and the device locks itself if the PIN is brute-forced.

Formally speaking, many of these crypto keys (commonly in the form of a USB device emulating a card reader) support the Personal Identity Verification (PIV) card interface, which allows ECC/RSA signing/decryption operations with the private key stored in the device (read the NIST SP 800-78 document for more information). This hardware interface, together with the PKCS11 API, allows programs like ssh to perform cryptographic operations with the certificates stored in the device.

One weak point in this scenario is vendor trust, particularly when it comes to the random number generator implemented in the hardware, which can potentially create weak, easy-to-bruteforce keys. This can be minimized if we use the normal ssh tools to generate the ssh keys and then import them into the device. In my case, I have followed this path.

Another downside is that NIST SP 800-78 only defines RSA keys up to 2048 bits. You must take this into consideration because the chip may support bigger keys (e.g. for OpenPGP cards) but the PIV interface is up to 2048 unless NIST updates the standard.

Finally, either PKCS11 or OpenSC (I don't quite remember which) does not support ECC keys. You are out of luck in that case.


Requirements

  • A crypto device that is NIST SP 800-78 compliant, a Yubikey 4 in my case.
  • An RSA key pair created with ssh-keygen(1).
  • Install OpenSC on your computer to get the PKCS11 library and the management tools. There are installers available for almost any platform: Windows, OSX, Linux, BSD, etc.


Procedure

  • Convert the RSA private key into PEM format.
openssl rsa -in ./id_rsa -out id_rsa.pem
  • Load the private key into slot 9a of the device. It will ask for the PIN, which you may have changed (look for 'change-pin' and 'change-puk' in this document). Notice that I've set the 'pin-policy' to once and the 'touch-policy' to never, effectively asking for the PIN only once, when I load the key in the ssh-agent, but you can choose the behaviour that fits you best (e.g. force a touch every time you want to log in via ssh).
yubico-piv-tool -a import-key -s 9a --pin-policy=once --touch-policy=never  -i id_rsa.pem
  • Transform the public key into a format that is understood by the device.
ssh-keygen -e -f ./ -m PKCS8 >
  • Use the public and private keys (the latter inside the device) to generate a self-signed SSL certificate with a 10-year expiration date (just in case), to be imported later into the device. It will ask for your PIN again.
 yubico-piv-tool -a verify -a selfsign-certificate --valid-days 3650  -s 9a -S "/CN=myname/O=ssh/" -i -o 9a-cert.pem
  • Import the generated certificate.
yubico-piv-tool -a verify -a import-certificate -s 9a -i 9a-cert.pem

Using the device together with OpenSSH

In case you don't have the public key at hand (I could skip this step because I generated the key on my PC), you can extract it with ssh-keygen. You have to point it at the pkcs11 shared library, which lives under /Library/OpenSC/lib/pkcs11/ in the case of OSX.

ssh-keygen -D /Library/OpenSC/lib/pkcs11/
ssh-rsa AAAAB....e1

Then you can tell ssh to interact with the device by pointing at this library instead of using a private key stored on your disk, but it is not very convenient because it will ask for your PIN every time.

ssh -I /Library/OpenSC/lib/pkcs11/ myserver
Enter PIN for 'PIV_II (PIV Card Holder pin)':

Loading the key in your ssh-agent is more convenient, because it will only ask for the PIN once (following the pin-policy=once) and you can be sure nobody can abuse it behind your back, because the device must be present at all times. Remember that the private key never leaves the device.

bash-3.2$ ssh-add -s /Library/OpenSC/lib/pkcs11/
Enter passphrase for PKCS#11:
Card added: /Library/OpenSC/lib/pkcs11/ 
bash-3.2$ ssh-add -l
2048 SHA256:random_hash_value /Library/OpenSC/lib/pkcs11/ (RSA)

bash-3.2$ ssh-add -e /Library/OpenSC/lib/pkcs11/
Card removed: /Library/OpenSC/lib/pkcs11/ 
bash-3.2$ ssh-add -l
The agent has no identities. 

Thursday, August 25, 2016

Building a DNS sinkhole in FreeBSD with Unbound and Dnscrypt


There is already lots of literature about DNS sinkholes, and it is a common term in information security. In my case, I wanted to give it a try on FreeBSD 10, but I didn't want to use BIND, since it was removed from the base distribution in favor of Unbound.

The setup will have the following steps:

  • Create a jail where the service will be configured (not explained here because there are lots of examples on the Internet)
  • Install Unbound
  • Basic Unbound configuration
  • Configure Unbound to block DNS queries
  • Choosing block lists available on the Internet
  • Updating the block lists
  • Bonus: use dnscrypt to avoid DNS spoofing
  • Final Unbound configuration file

Configuring our DNS sinkhole

Installing Unbound

I ran my tests on FreeBSD 10.1. Sadly, it ships Unbound 1.4.x, which is quite old and lacks some nice features. In the end, I had to install dns/unbound from the ports tree, which currently installs 1.5.9.

If you are using a more recent FreeBSD release (e.g. FreeBSD 10.3), you will not need to install the port.

The only differences are that you will need to set local_unbound_enable="YES" in /etc/rc.conf instead of unbound_enable="YES", and that the configuration file is located in /etc/unbound/unbound.conf instead of /usr/local/etc/unbound/unbound.conf.

Basic Unbound configuration

First, we have to download the root hints file, which allows our DNS cache to find the DNS root servers.

# fetch -o /usr/local/etc/unbound/root.hints

Then, we edit the unbound.conf.


server:
        #who can use our DNS cache
        access-control: allow

        logfile: "/usr/local/etc/unbound/logs/unbound.log"
        username: unbound
        directory: /usr/local/etc/unbound
        chroot: /usr/local/etc/unbound
        pidfile: /usr/local/etc/unbound/
        verbosity: 1
        root-hints: /usr/local/etc/unbound/root.hints

#remote-control allows us to use the unbound-control
#utility to manage the service from the command line
remote-control:
        control-enable: yes
        control-interface: /usr/local/etc/unbound/local_unbound.ctl
        control-use-cert: no

Please notice that all files are located under /usr/local/etc/unbound/. If you are not using the version provided by the ports tree, the base directory will be /var/unbound/ instead.

The last step is to enable and start the service:

# sysrc unbound_enable="YES"
# service unbound start

With this setup, we have a basic DNS cache configured in our network, and you should now be able to query the DNS server.

# host
Using domain server:
Aliases: has address has address has address has address has address has address has IPv6 address 2404:6800:4003:c02::6

Configure Unbound to block DNS queries

The classic trick in DNS sinkholes is to define authoritative zones in the DNS cache that return a predefined static IP address, making it possible to identify in the logs (or in network devices) when somebody is trying to connect to a blocked domain.

In Unbound this is a bit more difficult, because it is only a basic DNS cache service and lacks some features, but there are ways around that.

unbound.conf(5) has the local-zone directive, which is used to define local DNS zones, but we will "abuse" it by dropping all the queries for the listed domains. For instance, if we want to drop all the DNS queries asking for a given domain (and its subdomains) we need to add the directive:

local-zone: "" inform_deny

This will silently drop the DNS query and write an entry in the log file ("/usr/local/etc/unbound/logs/unbound.log" in our case). From the client's point of view, the query simply times out.

[1472139065] unbound[28162:0] info: inform A IN
[1472139071] unbound[28162:0] info: inform A IN

To keep a tidy configuration, we will not add this big list of local-zone directives to the main configuration file; instead, we will pull it in with the include directive, placed in the server section.

        include: /usr/local/etc/unbound/

Choosing block lists available on the Internet

I am using the following URLs that should be considered safe, with around 23 thousand domains listed.

Updating the block lists

I've written a small shell script that downloads all the lists every night and reloads the Unbound configuration.

Please notice that reloading Unbound will also flush the DNS cache. A good way to do it is:

  • Dump the cache:
unbound-control dump_cache > $cache_file
  • Download the files with fetch(1) and regenerate /usr/local/etc/unbound/
  • Reload the configuration:
unbound-control reload
  • Load the cache dump back into Unbound:
unbound-control load_cache < $cache_file
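A sketch of that nightly script (the helper name and the commented paths are assumptions for illustration; the real include file name and list URLs are the ones discussed in this post):

```shell
#!/bin/sh
# Turn a plain list of domains (one per line, '#' comments allowed)
# into Unbound local-zone directives that drop and log the queries.
to_local_zones() {
    awk '$1 !~ /^#/ && NF { printf "local-zone: \"%s\" inform_deny\n", $1 }'
}

# The nightly cron body then looks roughly like this (paths illustrative):
#   cache_file=/tmp/unbound_cache.dump
#   unbound-control dump_cache > "$cache_file"
#   fetch -o - "$blocklist_url" | to_local_zones > "$blocklist_conf"
#   unbound-control reload
#   unbound-control load_cache < "$cache_file"
```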

Bonus: use dnscrypt to avoid DNS spoofing

Dnscrypt can be used to avoid some common DNS attacks by encrypting and signing the DNS queries. All traffic goes encrypted over port 443, both TCP and UDP.

Of course, other issues remain, like DNS spoofing at the server end and the possible logging.

The client is available in the ports tree under dns/dnscrypt-proxy and it is really easy to configure. We only need two parameters: the ip:port we want to listen on and the server we want to connect to (aka the resolver).

# sysrc  dnscrypt_proxy_enable="YES"
# sysrc dnscrypt_proxy_flags="-a"
# sysrc  dnscrypt_proxy_resolver=""
# service dnscrypt_proxy start

The final step is to configure Unbound to forward all the DNS queries to dnscrypt. This is done in the forward-zone section.

forward-zone:
  name: "."

Final Unbound configuration file

server:
        access-control: allow

        logfile: "/usr/local/etc/unbound/logs/unbound.log"

        username: unbound
        directory: /usr/local/etc/unbound
        chroot: /usr/local/etc/unbound
        pidfile: /usr/local/etc/unbound/
        verbosity: 1

        root-hints: /usr/local/etc/unbound/root.hints
        include: /usr/local/etc/unbound/

remote-control:
       control-enable: yes
       control-interface: /usr/local/etc/unbound/local_unbound.ctl
       control-use-cert: no

forward-zone:
  name: "."

Thursday, August 18, 2016

Unix shells and the lack of basic understanding

It is not news if I say that some major firewall vendors have been owned for years and that the exploits were leaked to the Internet this week. Just search for "Equation Group" and you will be amused.

What pisses me off is one particular privilege escalation for Watchguard firewalls, which seems to be like wizardry for many people. I have even read comments in my Twitter timeline that made no sense at all and were so wrong that I wanted to rip my eyes out.

The vulnerability seems to be located in the restricted CLI, if I am not mistaken, because it doesn't validate the user input when ifconfig is executed.

# ifconfig "$(bash -c 'touch /tmp/test; echo -n eth0')" 

Here comes the thing: many people are claiming that ifconfig is broken because it executes bash commands. I have even read comments saying that echo is also broken.

All these people should quit their jobs and go back to the university to learn how a Unix shell works.

It is a very basic concept: anything inside the double quotes is interpreted by the shell before the command (ifconfig in our example) is executed. Read the Bash Reference Manual for more information.

My guess, because I don't have the source code, is that the CLI was executing the full command via the system() function or similar, without validating the user input. In practice, system() invokes sh(1), which then interprets the code enclosed in the double quotes.

  • # ifconfig "$(bash -c 'touch /tmp/test; echo -n eth0')"
    • system()
      • sh
        • bash
          • touch /tmp/test
          • echo -n eth0
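The whole chain can be reproduced on any Unix box with a harmless stand-in for ifconfig (the marker file name is arbitrary):

```shell
#!/bin/sh
# `true` plays the role of the vulnerable ifconfig binary: it receives a
# single, clean-looking argument ("eth0"), yet the side effect inside the
# command substitution has already run by then.
marker="/tmp/escape-demo.$$"

true "$(sh -c "touch ${marker}; printf eth0")"

# The file exists even though `true` never executed anything:
test -f "${marker}" && echo "side effect executed"
rm -f "${marker}"
```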
TL;DR: we have seen the same mistake happen again and again since the 90s, when script kiddies were abusing badly written CGI scripts. Some people will never learn.

Edit: I originally claimed that the privilege escalation exploit was for Fortinet, but it actually targets Watchguard firewalls. See here.

Monday, March 14, 2016

Backing up ZFS zpools

In this post, I am going to describe the process I followed to configure an offline backup of my home NAS server, to keep the data in a safe place just in case it breaks. This server is running FreeBSD 10.1 and has a single zpool.

The procedure is quite simple and consists of three steps.

  1. Create another zpool somewhere else, where we will synchronise all the data: a USB disk in this example. "raid" is the source zpool and "usbbackups" will be the destination one.
  2. Create a recursive snapshot. This creates a snapshot in all the volumes.
  3. Use zfs send and zfs receive to copy all the data over. In our case, we will execute it periodically via a cron job.

Creating the backup zpool

First, we destroy the partition table and create a GPT one. da1 is our USB disk:
# gpart destroy da1
# gpart create -s gpt da1

Then, we create a freebsd-zfs partition aligned to 4k sectors, labelled "backups".
# gpart add -a 4k -t freebsd-zfs -l backups da1

From now on, we can use our USB disk by pointing to /dev/gpt/backups.

The last step is to create the usbbackups zpool. We use an alternative mount point (/mnt), because after the synchronisation the backed-up ZFS volumes may try to use the same mount points the production volumes are using, and this may mess up our NAS server.

# zpool create -R /mnt  usbbackups  /dev/gpt/backups

Creating the snapshots

I am using the port sysutils/zfsnap to create daily/hourly snapshots of my ZFS volumes so this part is solved quite easily.

I have set up two cron tasks, to create and delete the snapshots respectively. We will do a recursive snapshot of all the volumes at midnight and it will be deleted after 4 weeks.

10      0       *       *       *       root    /usr/local/sbin/zfSnap -a 4w -r raid
0       1       *       *       *       root    /usr/local/sbin/zfSnap -d

The snapshots are named using the date plus the expiration time (used by the script to figure out when the snapshot must be deleted).

As an example: raid@2016-03-13_11.05.10--4w

Synchronizing the data to the backup zpool

We are going to periodically synchronize all our data, snapshots included, to the backup zpool. We will take advantage of the existing cron job that creates daily snapshots and use them as the synchronization points.

First, we have to do the initial synchronization. All the data will be copied, and it will need quite a bit of time to finish depending on the zpool size, the storage speed, etc.

We will use the newest snapshot in the system, and the destination is our USB disk. In case the destination is a network device, you can pipe the data through ssh (the Internet is full of examples).

# zfs send -R raid@2016-03-13_11.05.10--4w | zfs receive -Fdvu usbbackups

The newest snapshot can be found with the command below. It removes the headers from the zfs output, displays only the snapshot name and sorts by name in descending order (the newest one comes first).

# zfs list -H -t snapshot -d 1 -o name -S name raid | egrep "\-\-4w"| head -n 1
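The same selection logic can be checked with canned zfsnap-style names (sample data, not real snapshots); grep -E -e is used here so the pattern's leading dashes are not parsed as options:

```shell
#!/bin/sh
# Reverse-sorting the date-stamped names puts the newest snapshot first;
# the grep keeps only the 4-week snapshots, as in the real command.
newest=$(printf '%s\n' \
    'raid@2016-03-06_11.05.10--4w' \
    'raid@2016-03-13_11.05.10--4w' \
    'raid@2016-03-12_23.00.00--1m' \
    | grep -E -e '--4w' | sort -r | head -n 1)
echo "${newest}"
```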

For the periodic synchronizations we will do an incremental copy. This requires two snapshots that exist in the source zpool, with the older one also existing in the destination zpool.

I have put together the script below to automate the incremental backup once a week via cron. Notice that the -F flag in zfs receive will delete any snapshots in the destination zpool that do not exist in the source one. You can remove this flag and clean up old snapshots by other means if you want a different expiration policy in the destination zpool.


#!/bin/bash
# we import/export the destination zpool because we
# have a weak CPU and the USB disk slows the server down.
zpool import -R /mnt usbbackups

LATEST_RAID=`zfs list -t snapshot -H -o name -d1 -S name raid | egrep "\-\-4w" | head -n 1`
LATEST_USB=`zfs list -t snapshot -H -o name -d1 -S name usbbackups| egrep "\-\-4w" | head -n 1`

#a bit of bash fu to replace the  zpool name in the variable.
zfs send -R -i "${LATEST_USB/usbbackups/raid}" $LATEST_RAID | zfs receive -Fdvu usbbackups

# disable the USB zpool
zpool export usbbackups
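That parameter substitution is a bashism, so the script must run under bash rather than plain sh; with sample names it simply swaps the zpool part of the snapshot name:

```shell
#!/bin/sh
# ${var/pattern/replacement} is bash-specific, hence the explicit bash -c;
# it rewrites the newest USB snapshot name into the matching name on the
# source pool (sample data, not a real snapshot).
bash -c '
    LATEST_USB="usbbackups@2016-03-13_11.05.10--4w"
    echo "${LATEST_USB/usbbackups/raid}"
'
```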

To finish the task, we will run the script every Sunday at 4am:
0       4       *       *       0       root    /root/

Wednesday, January 27, 2016

Running FreeBSD in single user mode with zfs on root

Today I bricked a server (long story, etc ...) and I had to fix it in single user mode. The process is very simple.

  • Use the boot loader to enter in single user mode
  • Press Enter to run /bin/sh
  • mount all the zfs volumes: zfs mount -a
  • Mount the ROOT volume read-write: zfs set readonly=off zroot/ROOT/default
  • Fix what you broke
  • Reboot the server