I am using runwhen together with daemontools to launch and monitor the backup. The service's run script uses runwhen commands to sleep until the next run (every hour) and then launches the backup script. The service runs in a dedicated jail.
The run script listed below uses some runwhen commands (rw-add, rw-match and rw-sleep) to wake up every hour, and setuidgid to run the service as an unprivileged user.
```shell
#!/bin/sh
exec 2>&1
exec setuidgid gitbackup \
    rw-add n d1S now1s \
    rw-match \$now1s ,M=00 wake \
    rw-sleep \$wake \
    /home/gitbackup/update.sh
```
The actual backup script iterates over all the git repos and fetches the changes.
```shell
#!/bin/sh
exec 2>&1
cd /usr/home/gitbackup/backup
echo "===="
date
echo "===="
for repo in `ls -d1 *.git`; do
    cd $repo && /usr/local/bin/git fetch --all
    cd -
done
echo "===="
```
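The fetch loop only updates repositories that already exist in the backup directory; a new repository has to be registered once with a bare mirror clone. A minimal sketch (the `add_repo` helper name and the default path are mine, not part of the original setup):

```shell
#!/bin/sh
# add_repo: one-time step to register a repository with the backup set.
# $1 = clone URL, $2 = backup directory (the default shown is illustrative).
add_repo() {
    url="$1"
    backupdir="${2:-/usr/home/gitbackup/backup}"
    # --mirror creates a bare <name>.git repo whose refs exactly track the
    # remote; the hourly `git fetch --all` then keeps it in sync.
    git -C "$backupdir" clone --mirror "$url"
}
```

After that, the new `<name>.git` directory is picked up automatically by the `ls -d1 *.git` glob on the next hourly run.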
Checking the output log:
```
$ cat /var/service/backups/log/main/current | tai64nlocal
2018-02-05 18:00:00.098641500 ====
2018-02-05 18:00:00.150083500 Mon Feb  5 18:00:00 CET 2018
2018-02-05 18:00:00.180056500 ====
2018-02-05 18:00:00.211689500 Fetching origin
2018-02-05 18:00:01.073738500 From https://github.com/xgarcias/ansible-cmdb-freebsd-template
2018-02-05 18:00:01.073743500  * branch            HEAD       -> FETCH_HEAD
2018-02-05 18:00:01.091577500 Fetching origin
2018-02-05 18:00:02.185366500 From https://github.com/xgarcias/ansible-daemontools
2018-02-05 18:00:02.185371500  * branch            HEAD       -> FETCH_HEAD
2018-02-05 18:00:02.203049500 Fetching origin
2018-02-05 18:00:04.180310500 From https://github.com/xgarcias/ansible-macbook
2018-02-05 18:00:04.180315500  * branch            HEAD       -> FETCH_HEAD
2018-02-05 18:00:04.198104500 Fetching origin
2018-02-05 18:00:06.448429500 From https://github.com/xgarcias/daemontools-dyndns
2018-02-05 18:00:06.448434500  * branch            HEAD       -> FETCH_HEAD
2018-02-05 18:00:06.466266500 Fetching origin
2018-02-05 18:00:08.299785500 From https://github.com/xgarcias/daemontools-poudriere
2018-02-05 18:00:08.299790500  * branch            HEAD       -> FETCH_HEAD
2018-02-05 18:00:08.321755500 Fetching origin
2018-02-05 18:00:09.749956500 From https://github.com/xgarcias/daemontools-unbound-sinkhole
2018-02-05 18:00:09.749961500  * branch            HEAD       -> FETCH_HEAD
2018-02-05 18:00:09.771744500 Fetching origin
2018-02-05 18:00:11.113934500 From https://github.com/xgarcias/elasticsearch-plugin-readonlyrest
2018-02-05 18:00:11.113939500  * branch            HEAD       -> FETCH_HEAD
2018-02-05 18:00:11.135774500 Fetching origin
2018-02-05 18:00:12.703191500 From https://github.com/xgarcias/freebsd_local_ports
2018-02-05 18:00:12.703197500  * branch            HEAD       -> FETCH_HEAD
2018-02-05 18:00:12.724967500 Fetching origin
2018-02-05 18:00:13.583204500 From https://github.com/xgarcias/xgarcias.github.io
2018-02-05 18:00:13.583209500  * branch            HEAD       -> FETCH_HEAD
2018-02-05 18:00:13.601461500 ====
```
Querying ASN/IP records via a non-rate-limited, unauthenticated REST API.
Also, you can use @DuckDuckGo to get the same results with the !Arin and !Ripe bang searches.
> You can also use !Arin and !Ripe bang searches on @DuckDuckGo to quickly lookup IP information
>
> — Greg Bray (@GBrayUT) January 27, 2018
At work we’ve been using Bhyve for a while to run non-critical systems. It is a really nice and stable hypervisor, even though we are using the older version available in FreeBSD 10.3. This means we lack Windows and VNC support, among other things, but it is not a big deal.
After some iterations in our internal tools, we realised that the installation process was too slow and we always repeated the same steps. Of course, any good sysadmin will scream “AUTOMATION!” and so did we. Therefore, we started looking for different ways to improve our deployments.
We had a look at existing frameworks that manage Bhyve, but none of them had a feature that we find really important: having a centralized repository of VM images. For instance, SmartOS applies this method successfully by having a backend server that stores a catalog of VMs and Zones, meaning that new instances can be deployed in a minute at most. This is a game changer if you are really busy in your day-to-day operations.
Since we are not great programmers, we decided to leverage existing tools to achieve the same result: a centralised repository of Bhyve images in our data centers. The following building blocks are used:
- The ZFS snapshot of an existing VM. This will be our VM template.
- A modified version of oneoff-pkg-create to package the ZFS snapshots.
- pkg-ssh and pkg-repo to host a local FreeBSD repo in a FreeBSD jail.
- libvirt to manage our Bhyve VMs.
- The ansible modules virt, virt_net and virt_pool.
- We write a YAML dictionary to define the parameters needed to create a new VM:
- VM template (name of the pkg that will be installed in /bhyve/images)
- VM name, cpu, memory, domain template, serial console, etc.
- This dictionary will be kept in the corresponding host_vars definition that configures our Bhyve host server.
- The Ansible playbook:
- installs the package named after the VM template (ZFS snapshot), e.g. `pkg install FreeBSD-10.3-RELEASE-ZFS-20G-20170515`.
- uses cat and zfs receive to load the ZFS snapshot in a new volume.
- calls the libvirt modules to automatically configure and boot the VM.
- The sysadmin logs in to the new VM and adjusts the hostname and network settings.
- A separate Ansible playbook is run to configure the new VM as usual.
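The `cat` plus `zfs receive` step above can be sketched as a small helper. The image path and dataset names are illustrative, and a `DRYRUN` switch lets you preview the command instead of touching any pool:

```shell
#!/bin/sh
# Sketch of the "load the ZFS snapshot into a new volume" step.
# Names are illustrative; set DRYRUN=1 to only print the command.
deploy_volume() {
    image="$1"      # e.g. /bhyve/images/FreeBSD-10.3-RELEASE-ZFS-20G-20170515
    volume="$2"     # e.g. zroot/vm/newvm
    cmd="cat $image | zfs receive $volume"
    if [ -n "$DRYRUN" ]; then
        echo "$cmd"     # dry run: show what would be executed
    else
        eval "$cmd"     # really stream the template into the new dataset
    fi
}

# usage: DRYRUN=1 deploy_volume /bhyve/images/tpl.zfs zroot/vm/vm1
```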
Once automated, the installation process takes 2 minutes at most, compared with the 30 minutes needed to install a VM manually, and it also allows us to deploy many guests in parallel.
- Sample config for FreeBSD https://people.freebsd.org/~rodrigc/libvirt-bhyve/libvirt-bhyve.html
- bhyve driver for libvirt http://libvirt.org/drvbhyve.html
- virsh examples https://wiki.libvirt.org/page/VM_lifecycle#Creating_a_domain
- migrating VMs w/o shared storage https://hgj.hu/live-migrating-a-virtual-machine-with-libvirt-without-a-shared-storage/
- xml reference http://libvirt.org/formatdomain.html
- Virtual networking https://wiki.libvirt.org/page/VirtualNetworking
In case you missed it, there was a leap second on December 31, 2016. I don’t know about you, but I’ve read many horror stories about things going terribly wrong after leap seconds and sysadmins in despair being paged at night. Well, today I am going to share one of those stories with you, and I hope it will be terrifying.
Like diligent sysadmins, we monitor the ntpd services on our servers (OpenNTPD in our case) and we are alerted if a noticeable clock offset appears. Of course, in the event of a leap second, all the servers should trigger an alert and the corresponding recovery. The leap second was inserted as 23:59:60 on December 31, and the servers slowly chewed through the difference in around 3 hours.
But… here comes the horror story. Some of the servers didn’t recover at all. The graphs showed that the offset was still around -900 ms (an extra second was introduced, therefore we were one second behind). In the end we had to restart openntpd as a quick remediation.
Below you can find the status of one of the servers, for reference.
```
$ ntpctl -s all
4/4 peers valid, clock synced, stratum 3

peer
   wt tl st  next  poll          offset       delay      jitter
184.108.40.206 from pool de.pool.ntp.org
    1 10  2 1474s 1502s      -984.909ms     6.266ms     0.175ms
220.127.116.11 from pool de.pool.ntp.org
 *  1 10  2  733s 1640s      -984.824ms     1.105ms     0.126ms
18.104.22.168 from pool de.pool.ntp.org
    1 10  3  888s 1509s      -984.824ms     6.380ms     0.138ms
22.214.171.124 from pool de.pool.ntp.org
    1 10  2 3087s 3098s       105.306ms     6.295ms     0.130ms
```
You may notice that one of the peers has a positive offset, which doesn’t make any sense because an extra second was inserted, as explained above. I hope you can smell the stink at this point, because it is quite strong.
Well, digging through the logs, I also found the following line:
```
ntpd: reply from 126.96.36.199: not synced (alarm), next query 3228s
```
Yes, openntpd was unhappy with that peer and decided to stop the time synchronisation until the issue was solved. Notice that this is a really bad situation, because we don’t control that peer at all. Since the pool address is a round-robin DNS record, our only option was to restart openntpd and let it pick different peers.
I decided to do a bit of research and went to openntpd’s GitHub repo to read the source code, particularly src/usr.sbin/ntpd/client.c. Here, the NTP packet’s status is evaluated against a bit mask to analyse the LI (Leap Indicator) bits:
```c
(msg.status & LI_ALARM) == LI_ALARM || msg.stratum == 0 ||
    msg.stratum > NTP_MAXSTRATUM
```
The name LI_ALARM is self-explanatory. The comparison evaluates to true when both bits of the Leap Indicator are set to 1. From the RFC:
```
LI Leap Indicator (leap): 2-bit integer warning of an impending leap
second to be inserted or deleted in the last minute of the current
month with values defined in Figure 9.

   0  no warning
   1  last minute of the day has 61 seconds
   2  last minute of the day has 59 seconds
   3  unknown (clock unsynchronized)
```
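The check can be mimicked with a few lines of shell arithmetic. I am assuming the usual mask value of 0xc0 for LI_ALARM (both LI bits set, matching the top two bits of the packet's status byte):

```shell
#!/bin/sh
# The Leap Indicator occupies the two most significant bits of the NTP
# packet's status byte; LI_ALARM (binary 11000000, i.e. 0xc0) corresponds
# to "3 unknown (clock unsynchronized)" from the RFC.
LI_ALARM=$((0xc0))

li_alarm() {
    # succeeds when both Leap Indicator bits are set in the status byte $1
    [ $(( $1 & LI_ALARM )) -eq "$LI_ALARM" ]
}

li_alarm $((0xc4)) && echo "alarm"       # LI=3: peer gets discarded
li_alarm $((0x24)) || echo "no alarm"    # LI=0: healthy peer
```

Note that a peer merely announcing a leap second (LI=1 or LI=2) does not trip the mask; only the "unsynchronized" value does.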
At this point, I can claim that the peer was totally broken: it ran for hours (and it may still be broken now) with its clock unsynchronized, and it hit us in a chain reaction. One may expect a minimum level of quality, but these are the risks you must accept when using services run by others (unless you sign a Service Level Agreement on paper).
To understand how risky it can be, we can look at the page that describes how to join an ntp pool. Only a static IP address and a minimum bandwidth are required, plus a couple of recommendations. Hence the many hobbyists running their own time servers.
ntp.org runs a monitoring system that can be queried online. Servers with a score lower than 10 are automatically removed from the pool (mine had -100), which is a good measure, but good luck if such a server is already active in your ntpd service: it will cause you trouble until you manually restart the service.
- Actively monitor the ntp service.
- Monitor the general status: un/synchronized, stratum, number of valid peers, etc.
- Monitor the offset. I do an average of all peers and then apply abs().
- Plan carefully and search for a reliable ntp source.
- Does your datacenter offer this service? Can you have an SLA?
- Avoid the country/region pools at pool.ntp.org, because they may be run by hobbyists and will cause you pain, even if ntp.org recommends you use them. Perhaps running the ntp servers provided by your OS vendor is safer.
- Perhaps buy a DCF77 receiver and build your own Stratum 1 server, but you may need an external antenna if the datacenter walls are too thick.
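As an illustration of the offset averaging mentioned above, a small helper can compute the mean absolute offset straight from the `ntpctl -s all` output. This is my own sketch; the field positions assume the OpenNTPD output layout shown earlier:

```shell
#!/bin/sh
# Reads `ntpctl -s all` output on stdin and prints the absolute value of
# the average peer offset, in milliseconds. The offset sits in the
# third-from-last column of every stats line (the ones ending in "ms"),
# which also copes with the "*" marker in front of the selected peer.
mean_abs_offset() {
    awk '
        /ms$/ { o = $(NF-2); sub(/ms$/, "", o); sum += o; n++ }
        END   { if (n) { avg = sum / n
                         if (avg < 0) avg = -avg
                         printf "%.3f\n", avg } }'
}

# usage: ntpctl -s all | mean_abs_offset
```

Feeding it the ntpctl output above yields roughly 712 ms, well past any sane alert threshold.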