Using virsh to add a volume (disk) to an existing vm

Categories: English Geeky

I have a second hard disk mounted under /thepool, and I want to make “virtual disks” in there and be able to mount them on any of my virsh-defined virtual machines.

The root filesystem of the VM resides on a different storage pool (uvtool) on a fast SSD, but for bulk storage I don’t want to fill up the SSD with crap.

# Create the storage pool, under "/thepool" where the big disk is mounted
virsh pool-define-as disk-pool dir - - - - "/thepool/libvirt-pool/"
virsh pool-build disk-pool
virsh pool-autostart disk-pool
virsh pool-start disk-pool
# Check it's there
virsh pool-list
# Create a volume inside the pool, qcow2 format
virsh vol-create-as disk-pool juju-zfs-pool.qcow2 64G --format qcow2
# Attach it to the VM
virsh attach-disk juju /thepool/libvirt-pool/juju-zfs-pool.qcow2 vdc --persistent --subdriver=qcow2
# Now inside the vm, /dev/vdc exists and can be formatted/partitioned and mounted as normal

Converting only one stream in a mkv file.

Categories: English Geeky

These mkv files have h.265 hevc video which my media player can’t read, so I’d like to convert only the video stream to h.264, while leaving all other streams (2 audio tracks in aac, 2 subtitle tracks) intact.

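# -c copy keeps every stream (audio and subtitles) as-is; -c:v libx264 then overrides the codec for video only
ffmpeg -i some-x265-video.mkv -map 0 -c copy -c:v libx264 /tmp/x264-version.mkv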

Cross-timezone date calculations using the “date” command

Categories: English Geeky

Working remotely for a timezone-distributed company poses an interesting challenge: that of having to figure out dates and times for people in different timezones. This involves not only the relatively trivial “what time is it now in A_FARAWAY_PLACE”, but “what time, in FARAWAY_PLACE_X, will it be in FARAWAY_PLACE_Z” and other fun things.

There are a handful of websites with handy tools to do these conversions for you. The problem I’ve found is that the web is going to the crapper: these sites often have confusing UIs concocted by some javascript-crazed, CSS-infected webmonkey, and they are often completely swamped and rendered unusable by a rising tide of ads and other aggressive content. Oh, and some won’t let you do anything until you agree to them storing information in cookies in your browser, which they then bafflingly don’t use to store the PREFERENCE you have selected, so like a forgetful vampire they ask you every single time whether you want to accept their silly cookies.

I’ve long known how to use the date command to show the date in a different place/timezone, which is already a huge timesaver:

$ TZ="Asia/Taipei" date
Sat Apr 13 03:25:31 CST 2019

But today I was trying to answer “what time in America/Chicago is 1 PM on Tuesday in Europe/London?”. This is interesting because it’s a conversion between two timezones, neither of which is the one I’m in, of a date/time in the future. So I was checking date’s man page for “how to convert a specific point in time”, when I realized date can do this for you! Right in the man page there’s this example:

Show the local time for 9AM next Friday on the west coast of the US

$ date --date='TZ="America/Los_Angeles" 09:00 next Fri'

so then I combined that with the earlier one to come up with:

$ TZ="America/Chicago" date --date='TZ="Europe/London" 1:00 PM next Tue'
Tue Apr 16 07:00:00 CDT 2019

This combines:

  • TZ environment variable to calculate dates for a specific timezone, not the current one
  • --date parameter to “display time described by STRING, not ‘now’”
  • Descriptive time specifications (1:00 PM next Tuesday – this is a pseudo-human-readable format which is not entirely intuitive – info date has the specifics)
  • TZ support inside the descriptive specification

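And a list of known timezones can be obtained with timedatectl list-timezones. Stick to names from that list: date does not complain about a zone name it doesn’t know, it silently falls back to UTC.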
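The same kind of conversion can be sketched in Python, which is handy when the calculation has to live inside a script. This is only a minimal sketch: it assumes Python 3.9 or later (for the standard zoneinfo module) with the system timezone database installed, the helper name convert_wall_time is made up for illustration, and the zone names are standard tz database names of the kind timedatectl lists.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library since Python 3.9


def convert_wall_time(naive: datetime, from_zone: str, to_zone: str) -> datetime:
    """Interpret a naive wall-clock time in from_zone and express it in to_zone."""
    # Attach the source timezone to the naive value, then convert.
    localized = naive.replace(tzinfo=ZoneInfo(from_zone))
    return localized.astimezone(ZoneInfo(to_zone))


if __name__ == "__main__":
    # 1 PM in London on Tuesday April 16, 2019, as seen from Chicago.
    in_chicago = convert_wall_time(
        datetime(2019, 4, 16, 13, 0), "Europe/London", "America/Chicago")
    # Same layout as the default output of the date command.
    print(in_chicago.strftime("%a %b %d %H:%M:%S %Z %Y"))
```

Unlike date, ZoneInfo raises an exception when given a zone name it doesn’t know, which makes typos much easier to catch.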

SAML development tools

Categories: English Geeky

If you work on a system that needs to authenticate against an external identity provider (IdP), SAML is almost certainly a fact of life. Working on an actual Identity Provider, sometimes the concern is flipped and you need to ensure Service Providers (SPs) can authenticate against your IdP.

I inherited the somewhat clunky django-saml2-idp along with other developers in my team, and we’ve been maintaining it to add new features. If we were doing this today, we’d probably integrate the very complete OneLogin SAML library instead.

Developing with and for our somewhat homegrown SAML library is made easier by a set of developer tools. For example, OneLogin provides a toolbox to slice and dice SAML assertions; you can verify your assertions, extract attributes, see some examples, play with zipping and encoding, all in one place.

Once you have your IdP mostly working, it’s great to have a test SP to connect to it. For this, I’ve used the RSA SAML test Service Provider. You give some details about your IdP, and it will give you a URL that forwards you to the IdP for authentication, then back to the SP, which verifies authentication worked as expected and even shows you the attribute and auth payload received from the IdP.

Once you get things mostly working but need to fine-tune or tweak something (I can never tell between issuer, ACS_URL and audience), the Firefox SAML Tracer extension is absolutely essential. It shows you all requests and responses, which ones contain SAML payloads, and lets you see the actual, decoded and formatted XML payload which makes it a breeze to troubleshoot.

There is an equivalent SAML tracer extension for Chrome but 1) Chrome is crap and 2) the Chrome SAML extension is crap. Use Firefox instead.

Remote display of a KVM virtual machine

Categories: English Geeky

In this case I’m hosting the VM on a fast server and trying to access the display on another system (a laptop).

One way to do it is by simply SSHing with X forwarding and running KVM like so:

qemu-system-x86_64 -boot d -cdrom ubuntu-18.04.2-live-server-amd64.iso -m 8192 -enable-kvm

This by default opens the VM’s display in a window forwarded over X, but it’s quite slow.

Another option is to start the KVM machine in nographic mode and enable a VNC server:

qemu-system-x86_64 -nographic -vnc :5 -boot d -cdrom ubuntu-14.04.6-desktop-amd64.iso -m 8192 -enable-kvm

then on the laptop, use a VNC client to connect to display :5, which listens on TCP port 5905 (5900 plus the display number):

xtightvncviewer thehost.local:5905

KVM bridged to the LAN with DHCP

Categories: English Geeky Uncategorized

The goal here is to instantiate VMs with a br0 interface grabbing an IP from the LAN DHCP, so in turn the VM can instantiate LXD containers whose IP is also exposed to the LAN. That way everything is visible on the same network segment and this makes some experimentation easier.

Host configuration

Some info taken from this URL.

The metal host is running Ubuntu 18.04, which uses netplan. Here’s the netplan.yaml file:

network:
    ethernets:
        enp7s0:
            addresses: []
            dhcp4: no
            dhcp6: no
            optional: true
    bridges:
        br0:
            dhcp4: true
            dhcp6: no
            interfaces:
                - enp7s0
            parameters:
                stp: false
                forward-delay: 0
    version: 2

With this, on boot the system grabs an address from the network’s DHCP service (from my home router) and puts it on the br0 interface (which bridges enp7s0, a Gigabit Ethernet port).

The system also has avahi-daemon installed so I can ssh the-server.local easily.

VM configuration

Next, the VM which I created using uvt-kvm:

# Get a Xenial cloud image
uvt-simplestreams-libvirt --verbose sync release=xenial arch=amd64
# Create/launch a VM
PARAMS='--memory 8192 --disk 32 --cpu 4'
uvt-kvm create the-vm  $PARAMS --bridge br0 --packages avahi-daemon,bridge-utils,haveged --run-script-once setup_network.sh

The setup_network.sh script takes care of setting up the network 🙂 This can more cleanly be done with cloud-init but I’m lazy and wanted something fast.

The script deletes the cloudconfig-created .cfg file, tells cloud-init to NOT reconfigure the network, and drops the config file I actually need in place.

#!/bin/bash

echo "Acquire::http::Proxy \"http://192.168.1.187:3128\"; " >/etc/apt/apt.conf.d/80proxy

# Drop the cloudinit-configured interface
ifdown ens3

# Reconfigure the network...
cat <<EOF >/etc/network/interfaces.d/1-bridge.cfg
auto lo br0

iface lo inet loopback

iface ens3 inet manual

iface br0 inet dhcp
    bridge_ports ens3
    # (interfaces(5) does not support end-of-line comments, so each one gets its own line)
    # disable Spanning Tree Protocol
    bridge_stp off
    # no delay before a port becomes available
    bridge_waitport 0
    # no forwarding delay
    bridge_fd 0
EOF

echo "network: {config: disabled}" > /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg
rm /etc/network/interfaces.d/50-cloud-init.cfg

# Then bring up the new nice bridge
ifup br0

apt-get remove -y snapd && apt-get -y autoremove

The network config in /etc/network/interfaces.d/1-bridge.cfg should look like:

auto lo br0

iface lo inet loopback

iface ens3 inet manual

iface br0 inet dhcp
    bridge_ports ens3
    # disable Spanning Tree Protocol
    bridge_stp off
    # no delay before a port becomes available
    bridge_waitport 0
    # no forwarding delay
    bridge_fd 0

LXD configuration

Finally, install lxd. When asked to configure the lxd bridge, respond “no”, and on the next question you’ll be asked whether to supply an existing bridge. Respond “yes” and specify “br0”.

Now, when an lxd container is instantiated, it’ll by default appear on the same network (the home network!) as the VM and the main host, getting its DHCP from the home router.

When things break

Suddenly the bridge interface stopped working. I checked this to help diagnose it. But that wasn’t it. Turns out, I’d installed Docker on the main host and Docker messes with the firewall configuration by setting iptables -P FORWARD DROP. I just set it back to ACCEPT to get it working.

Bisecting Python unit test errors to find test interdependencies

Categories: English Geeky

Many of our test runs use parallelization to run faster. Sometimes we see test failures which we can’t reproduce locally, because locally we usually run sequentially; and even then, the test ordering seems to be somewhat unpredictable so it’s hard to reproduce the exact test ordering seen in our test runner.

Most of the time these failures are due to unidentified test interdependencies: either test A causes test B to pass (where running test B in isolation would fail), or test A causes B to fail (where running B in isolation would pass). And we have seen more complex scenarios where C passes, A-B-C passes, but A-C fails (because A sets C up for failure, while B would set C up for success). We added some diagnostic output to our test runner so it would show exactly the list of tests each process runs. This way we can copy the list and run it locally, which usually reproduces the failure.

But we needed a tool to then determine exactly which of the tests preceding the failing one was setting up the failure conditions. So I wrote this simple bisecter script, which expects a list of test names, which must contain the faily test “A”, and of course, the name of the faily test “A”. It looks for “A” in the list and will use bisection to determine which of the tests preceding “A” is causing the failure.

As an example, I used it to find a test failure in Ubuntu SSO:

python bisecter.py  test-orders/loadbad1.txt webui.tests.test_decorators.SSOLoginRequiredTestCase.test_account_must_require_two_factor
273 elements in the list, about 8 iterations left
Test causing failure is in second half of given list
137 elements in the list, about 7 iterations left
Test causing failure is in second half of given list
69 elements in the list, about 6 iterations left
Test causing failure is in first half of given list
34 elements in the list, about 5 iterations left
Test causing failure is in second half of given list
17 elements in the list, about 4 iterations left
Test causing failure is in second half of given list
9 elements in the list, about 3 iterations left
Test causing failure is in second half of given list
5 elements in the list, about 2 iterations left
Test causing failure is in second half of given list
3 elements in the list, about 1 iterations left
Test causing failure is in second half of given list
2 elements in the list, about 1 iterations left
Test causing failure is in first half of given list
The test that causes the failure is webui.tests.test_views_account.AccountTemplateTestCase.test_backup_device_warning
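The bisection itself is simple enough to sketch in a few lines of Python. To be clear, this is not the bisecter script mentioned above, just a minimal illustration of the idea. It assumes a single preceding test is responsible for the failure, and the names find_culprit and fails_after are made up: fails_after stands in for whatever actually runs the given tests followed by the failing one (a real version would shell out to the test runner).

```python
from typing import Callable, List, Sequence


def find_culprit(
    ordered_tests: Sequence[str],
    failing_test: str,
    fails_after: Callable[[List[str]], bool],
) -> str:
    """Return the test which, when run before failing_test, makes it fail.

    fails_after(preceding) must run the tests in `preceding`, then
    failing_test, and return True if failing_test failed.
    """
    position = list(ordered_tests).index(failing_test)
    candidates = list(ordered_tests[:position])
    if not candidates or not fails_after(candidates):
        raise ValueError("failure does not reproduce with the preceding tests")

    while len(candidates) > 1:
        middle = len(candidates) // 2
        first_half, second_half = candidates[:middle], candidates[middle:]
        # Keep whichever half still reproduces the failure.
        candidates = first_half if fails_after(first_half) else second_half
    return candidates[0]


if __name__ == "__main__":
    tests = [f"test_{n}" for n in range(10)]

    # Stand-in runner: pretend test_3 leaves state behind that breaks test_8.
    def fake_runner(preceding: List[str]) -> bool:
        return "test_3" in preceding

    print(find_culprit(tests, "test_8", fake_runner))
```

Each iteration halves the candidate list, which is why a list of 273 tests only needs around 8 or 9 runs. The more complex scenarios (where one test sets up the failure and another undoes it) break the single-culprit assumption and would need a smarter search.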

Mocking iterators

Categories: Geeky

A colleague wanted to mock a Journal object which both has callable methods and works as an iterator itself. So it works like this:

j = Journal()
j.log_level(Journal.INFO)
for line in j:
   print(line)

We mocked it like this, to be able to pass an actual list of expected values the function will iterate over:

import mock

mock_journal = mock.Mock()
mock_journal.__next__ = mock.Mock(side_effect=[1,2,3,4])
mock_journal.__iter__ = mock.Mock(return_value=mock_journal)


for i in mock_journal:
    print(i)
# I don't call any methods on mock_journal, but I could,
# and could then assert they were called.

So mock_journal is both a mock proper, where methods can be called (and then asserted on), and an iterator: each step of the loop calls __next__, which returns the next element of its side_effect list and raises StopIteration once the list is exhausted.
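A variation on the same idea, assuming Python 3 where mock ships in the standard library as unittest.mock: MagicMock comes with the magic methods preconfigured, so it’s enough to set the return value of __iter__. The log_level call and the sample lines below are just illustrative stand-ins for the Journal object.

```python
from unittest import mock

mock_journal = mock.MagicMock()
# Whatever __iter__ returns is what a for loop will walk over.
mock_journal.__iter__.return_value = iter(["line one", "line two"])

# Regular methods are still mocked and recorded as usual.
mock_journal.log_level("INFO")
collected = [line for line in mock_journal]

mock_journal.log_level.assert_called_once_with("INFO")
print(collected)
```

Note that the iterator is consumed by the first loop; for code that iterates more than once, set return_value to a fresh iterator before each pass.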

Forcing Python Requests to connect to a specific IP address

Categories: English Geeky Trabajo

Recently I ran into a script which tried to verify the HTTPS connection and response of a specific IP address. The “traditional” way to do this is (assuming I want http://example.com/some/path on IP 1.2.3.4):

    requests.get("http://1.2.3.4/some/path", headers={'Host': 'example.com'})

This is useful if I want to specifically test how 1.2.3.4 is responding; for instance, if example.com is DNS round-robined to several IP addresses and I want to hit one of them specifically.

This also works for https requests if using Python <2.7.9 because older versions don’t do SNI and thus don’t pass the requested hostname as part of the SSL handshake.

However, Python >=2.7.9 and >=3.4.x conveniently added SNI support, breaking this hackish way of connecting to the IP, because the IP address embedded in the URL is passed as part of the SSL handshake, causing errors (mainly, the server returns a 400 Bad Request because the SNI host 1.2.3.4 doesn’t match the one in the HTTP headers example.com).

The “easiest” way to achieve this is to force the IP address at the lowest possible level, namely when we do socket.create_connection. The rest of the “stack” is given the actual hostname. So the sequence is:

  1. Open a socket to 1.2.3.4
  2. SSL wrap this socket using the hostname.
  3. Do the rest of the HTTPS traffic, headers and all over this socket.
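The first two steps can be sketched with nothing but the standard library. This is not the adapter’s code, just a minimal illustration of the sequence; the function name open_tls_connection and its parameters are made up for the example.

```python
import socket
import ssl
from typing import Optional


def open_tls_connection(
    dest_ip: str,
    hostname: str,
    port: int = 443,
    timeout: float = 10.0,
    context: Optional[ssl.SSLContext] = None,
) -> ssl.SSLSocket:
    """Connect to dest_ip, but do the TLS handshake as if talking to hostname."""
    if context is None:
        context = ssl.create_default_context()
    # Step 1: the TCP connection goes to the IP we chose, bypassing DNS.
    raw_socket = socket.create_connection((dest_ip, port), timeout=timeout)
    try:
        # Step 2: SNI and certificate verification use the real hostname.
        return context.wrap_socket(raw_socket, server_hostname=hostname)
    except Exception:
        raw_socket.close()
        raise
```

Step 3 is then ordinary HTTP spoken over the returned socket, with the Host header set to the hostname. Getting Requests to do the equivalent internally is the hard part.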

Unfortunately Requests hides the socket.create_connection call in the deep recesses of urllib3, so the specified chain of classes is needed to propagate the given dest_ip value all the way down the stack.

After wrestling with this for a bit, I wrote a TransportAdapter and accompanying stack of subclasses to be able to pass a specific IP for connection.

Use it like this:

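import requests
# ForcedIPHTTPSAdapter comes from the adapter package described above

session = requests.Session()
session.mount("https://example.com", ForcedIPHTTPSAdapter(dest_ip='1.2.3.4'))
# Requests needs an absolute URL; the mounted adapter takes care of connecting to dest_ip
response = session.get(
    'https://example.com/some/path', headers={'Host': 'example.com'}, verify=False)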

There are a good number of subtleties to how it works, because it messes with the connection stack at all levels. I suggest you read the README to see how to use it in detail and whether it applies to your needs. I even included a complete example script that uses this adapter.

Resources that helped:

http://stackoverflow.com/questions/22609385/python-requests-library-define-specific-dns

https://github.com/RhubarbSin/example-requests-transport-adapter/blob/master/adapter.py

Take me to your leader – Using Juju leadership for cron tasks in a multiunit service

Categories: English Geeky

I’m working on adding some periodic maintenance tasks to a service deployed using Juju. It’s a standard 3-tier web application with a number of Django application server units for load balancing and distribution.

Clearly the maintenance tasks’ most natural place to run is in one of these units, since they have all of the application’s software installed and doing the maintenance is as simple as running a “management command” with the proper environment set up.

A nice property we have by using Juju is that these application server units are just clones of each other. This allows scaling up/down very easily because the units are all treated the same. However, the periodic maintenance stuff introduces an interesting problem, because we want only one of the units to run the maintenance tasks (no need for them to run several times). The maintenance scripts can conceivably be run in all units, even simultaneously (they do proper locking to avoid stepping on each other). And this would perhaps be OK if we only had 2 service units, but what if, as is the case, we have many more? There is still a single database, and hitting it 5-10 times with what is essentially a redundant process sounded like an unacceptable tradeoff for the simplicity of the “just run them on each unit” approach.

We could also implement some sort of duplicate collapsing, perhaps by using something like rabbitmq and celery/celery beat to schedule periodic tasks. I refused to consider this since it seemed like swatting flies with a cannon, given that the first solution coming to mind is a one-line cron job. Why reinvent the wheel?

The feature that ended up solving the problem, thanks to the fine folks in Freenode’s #juju channel, is leadership, a feature which debuted in recent versions of Juju. Essentially, each service has one unit designated as the “leader” and it can be targeted with specific commands, queried by other units (‘ask this to my service’s leader’) and more importantly, unambiguously identified: a unit can determine whether it is the leader, and Juju events are fired when leadership changes, so units can act accordingly. Note that leadership is fluid and can change, so the charm needs to account for these changes. For example, if the existing leader is destroyed or has a charm hook error, it will be “deposed” and a new leader is elected from among the surviving units. Luckily all the details of this are handled by Juju itself, and charms/units need only hook on the leadership events and act accordingly.

So it’s then as easy as having the cron jobs run only on the leader unit, and not on the followers.

The simplistic way of using leadership to ensure only the leader unit performs an action was something like this in the crontab:

* * * * * root if [ "$(juju-run {{ unit_name }} is-leader)" = 'True' ]; then run-maintenance.sh; fi

This uses juju-run with the unit’s name (which is hardcoded in the crontab; this is a detail of how juju-run is used which I don’t love, but it works) to run the is-leader command in the unit. This prints “True” if the executing unit is the leader, and “False” otherwise, so execution is conditional on the current unit being the leader.

Discussing this with my knowledgeable colleagues, a problem was pointed out: juju-run is blocking and could potentially stall if other Juju tasks are being run. This is possibly not a big deal but also not ideal, because we know leadership information changes infrequently and we also have specific events that are fired when it does change.

So instead, they suggested updating the crontab file when leadership changes, and hardcoding leadership status in the file. This way units can decide whether to actually run the command based on locally-available information, which removes the dependency on a blocking call to Juju.

The solution looks like this, when implemented using Ansible integration in the charm. I just added two tasks: One registers a variable holding is-leader output when either the config or leadership changes:

    - name: register leadership data
      tags:
        - config-changed
        - leader-elected
        - leader-settings-changed
      command: is-leader
      register: is_leader

The second one fires on the same events and just uses the registered variable to write the crontabs appropriately. Note that Ansible’s “cron” plugin takes care of ensuring “crupdate” behavior for these crontab entries. Just be mindful if you change the “name” because Ansible uses that as the key to decide whether to update or create anew:

    - name: create maintenance crontabs
      tags:
        - config-changed
        - leader-elected
        - leader-settings-changed
      cron:
        name: "roadmr maintenance - {{item.name}}"
        special_time: "daily"
        job: "IS_LEADER='{{ is_leader.stdout }}'; if [ $IS_LEADER = 'True' ]; then {{ item.command }}; fi"
        cron_file: roadmr-maintenance
        user: "{{ user }}"
      with_items:
        - name: Delete all foos
          command: "delete_foos"
        - name: Update all bars
          command: "update_bars"

A created crontab file (in /etc/cron.d/roadmr-maintenance) looks like this:

# Ansible: roadmr maintenance - Delete all foos
@daily roadmr IS_LEADER='True'; if [ $IS_LEADER = 'True' ]; then delete_foos; fi
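To make the templating explicit, here is a small Python sketch of what ends up on that crontab line. It is only an illustration of the string being built, not what Ansible’s cron module does internally; the function name render_cron_line and its parameters are hypothetical.

```python
def render_cron_line(
    is_leader_stdout: str,
    user: str,
    command: str,
    schedule: str = "@daily",
) -> str:
    """Build a cron.d line that only runs `command` when IS_LEADER is 'True'."""
    # is-leader prints True or False; strip any stray whitespace.
    flag = is_leader_stdout.strip()
    return (
        f"{schedule} {user} IS_LEADER='{flag}'; "
        f"if [ $IS_LEADER = 'True' ]; then {command}; fi"
    )


if __name__ == "__main__":
    # What the leader unit would get, then what every other unit would get.
    print(render_cron_line("True", "roadmr", "delete_foos"))
    print(render_cron_line("False", "roadmr", "delete_foos"))
```

The only thing that differs between units is the hardcoded IS_LEADER value, so the check at run time needs no call to Juju at all.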

A few notes about this. The IS_LEADER variable looks redundant. We could have put it directly in the comparison or simply written the crontab file only in the leader unit, removing it on the other ones. We specifically wanted the crontab to exist in all units and just be conditional on leadership. IS_LEADER makes it super obvious, right there in the crontab, whether the command will run. While redundant, we felt it added clarity.

Save for the actual value of IS_LEADER, the crontab is present and identical in all units. This helps people who log directly into the unit to understand what may be going on in case of trouble. Traditionally people log into the first unit; but what if that happens to not be the leader? If we write the crontab only on the leader and remove from other units, it will not be obvious that there’s a task running somewhere.

Charm Ansible integration magically runs tasks by tags identifying the hook events they should fire on. So by just adding the three tags, these tasks will run, in the order they appear in the playbook, on config-changed, leader-elected and leader-settings-changed events.

The two leader hooks are needed because leader-elected is only fired on the actual leader unit; all the others get leader-settings-changed instead.

Last but not least, don’t forget to also declare the new hooks in your hooks.py file, in the hooks declaration which now looks like this (see last two lines added):

hooks = charmhelpers.contrib.ansible.AnsibleHooks(
    playbook_path='playbook.yaml',
    default_hooks=[
        'config-changed',
        'upgrade-charm',
        'memcached-relation-changed',
        'wsgi-file-relation-changed',
        'website-relation-changed',
        'leader-elected',
        'leader-settings-changed',
    ])

Finally, I’d be remiss not to mention an existing bug in leadership event firing. Because of that, until leadership event functionality is fixed and 100% reliable, I wouldn’t use this technique for tasks which absolutely, positively need to be run without fail or the world will end. Here, I’m just using them for maintenance and it’s not a big deal if runs are missed for a few days. That said, if you need a 100% guarantee that your tasks will run, you’ll definitely want to implement something more robust and failproof than a simple crontab.