The hunt for a kernel bug, part 3: compiling a kernel

Compiling a Linux kernel sounds scary and complicated, but I found out it actually isn’t.

The first thing to do, is to install some prerequisites:

$ sudo apt install --yes asciidoc binutils-dev bison build-essential ccache \
    crash dwarves fakeroot flex git git-core git-doc git-email kernel-package \
    kernel-wedge kexec-tools libelf-dev libncurses5 libncurses5-dev libssl-dev \
    makedumpfile zstd
$ sudo apt-get --yes build-dep linux

Next I cloned the Ubuntu Impish repository. This takes a while…

$ git clone git://kernel.ubuntu.com/ubuntu/ubuntu-impish.git
$ cd ubuntu-impish

Now let’s see which versions are in the repository:

$ git tag --list
Ubuntu-5.11.0-16.17
Ubuntu-5.11.0-18.19+21.10.1
Ubuntu-5.11.0-20.21+21.10.1
Ubuntu-5.13.0-11.11
Ubuntu-5.13.0-12.12
Ubuntu-5.13.0-13.13
Ubuntu-5.13.0-14.14
Ubuntu-5.13.0-15.15
Ubuntu-5.13.0-16.16
Ubuntu-5.13.0-17.17
Ubuntu-5.13.0-18.18
Ubuntu-5.13.0-19.19
Ubuntu-5.13.0-20.20
Ubuntu-5.13.0-21.21
Ubuntu-5.13.0-22.22
Ubuntu-5.13.0-23.23
Ubuntu-5.13.0-24.24
Ubuntu-5.13.0-25.26
Ubuntu-5.13.0-26.27
Ubuntu-5.13.0-27.29
Ubuntu-5.13.0-28.31
Ubuntu-5.13.0-29.32
Ubuntu-5.13.0-30.33
Ubuntu-5.13.0-31.34
Ubuntu-5.13.0-32.35
freeze-20211018
freeze-20211108
freeze-20220131
freeze-20220221
v5.11
v5.13

The two tags that interest me, are Ubuntu-5.13.0-22.22 and Ubuntu-5.13.0-23.23. I’m starting with the former.

git checkout Ubuntu-5.13.0-22.22

First I copy the configuration of the current running kernel to the working directory:

$ cp /boot/config-$(uname --kernel-release) .config

I don’t want or need full debugging. That makes an enormous kernel and it takes twice as long to compile, so I turn debugging off:

$ scripts/config --disable DEBUG_INFO

I need to disable certificate stuff:

$ scripts/config --disable SYSTEM_TRUSTED_KEYS
$ scripts/config --disable SYSTEM_REVOCATION_KEYS

Next: update the kernel config and set all new symbols to their default value.

$ make olddefconfig

Then the most exciting thing can start: actually compiling the kernel!

$ make clean
$ time make --jobs=$(getconf _NPROCESSORS_ONLN) bindeb-pkg \
    LOCALVERSION=-$(git describe --long | tr '[:upper:]' '[:lower:]')
  • time is to see how long the compilation took.
  • getconf _NPROCESSORS_ONLN queries the number of processors on the computer. make will then try to run that many jobs in parallel.
  • bindeb-pkg will create .deb packages in the directory above.
  • LOCALVERSION appends a string to the kernel name.
  • git describe --long shows how far after a tag a certain commit is. In this case: Ubuntu-5.13.0-22.22-0-g3ab15e228151
    • Ubuntu-5.13.0-22.22 is the tag.
    • 0 is how many commits after the tag. In this case it’s the tag itself.
    • 3ab15e228151 is the abbreviated hash of the current commit.
  • tr '[:upper:]' '[:lower:]' is needed because .deb packages can’t contain upper case letters (I found out the hard way).

Now go grab a coffee, tea or chai latte. Compilation took 22 minutes on my computer.

Chai latte

When the compilation is done, there are 3 .deb packages in the directory above:

$ ls -1 ../*.deb
../linux-headers-5.13.19-ubuntu-5.13.0-22.22-0-g3ab15e228151_5.13.19-ubuntu-5.13.0-22.22-0-g3ab15e228151-21_amd64.deb
../linux-image-5.13.19-ubuntu-5.13.0-22.22-0-g3ab15e228151_5.13.19-ubuntu-5.13.0-22.22-0-g3ab15e228151-21_amd64.deb
../linux-libc-dev_5.13.19-ubuntu-5.13.0-22.22-0-g3ab15e228151-21_amd64.deb

Install the linux-headers and the linux-image packages, you don’t need the libc-dev package.

$ sudo dpkg --install \
    ../linux-{headers,image}-*$(git describe --long | tr '[:upper:]' '[:lower:]')*.deb

The kernel is now installed in the /boot directory, and it’s available in the Grub menu after reboot.

$ ls -1 /boot/*$(git describe --long | tr '[:upper:]' '[:lower:]')*
/boot/config-5.13.19-ubuntu-5.13.0-22.22-0-g3ab15e228151
/boot/initrd.img-5.13.19-ubuntu-5.13.0-22.22-0-g3ab15e228151
/boot/System.map-5.13.19-ubuntu-5.13.0-22.22-0-g3ab15e228151
/boot/vmlinuz-5.13.19-ubuntu-5.13.0-22.22-0-g3ab15e228151

Kernel ubuntu-5.13.0-22.22-0-g3ab15e228151 is, for all intents and purposes, the same as kernel 5.13.0-22-generic, so I expected it to be a “good” kernel, and it was.

For kernel Ubuntu-5.13.0-23.23 I did the same thing: starting from the git checkout. I skipped copying and editing the config file, because between minor releases I don’t expect there to be much change. I did run make olddefconfig for good measure, though. As expected, the keyboard and mouse didn’t work with the compiled ...-23 kernel.

Next up: using git bisect to find the exact commit where it went wrong. It’s got to be somewhere between ...-22 and ...-23!

The hunt for a kernel bug, part 2: an easy way to install mainline kernels

As I wrote previously, I’m suspecting a Linux kernel bug somewhere between versions 5.13.0-22 and 5.13.0-23, in the Ubuntu kernels. I wanted to know if the issue only surfaced in Ubuntu-flavored kernels, or also in the upstream (mainline) kernels from kernel.org.

There is an Ubuntu Mainline PPA with all the upstream kernels, but I found it a bit too opaque to use. Fortunately I found the Ubuntu Mainline Kernel Installer (UMKI), a tool for installing the latest Linux kernels on Ubuntu-based distributions.

Ubuntu Mainline Kernel Installer (UMKI)

The UMKI is pretty straightforward. It fetches a list of kernels from the Ubuntu Mainline PPA and a GUI displays available and installed kernels, regardless of how they were installed. It installs the kernel, headers and modules. There is also a CLI client.

To install the UMKI:

sudo add-apt-repository ppa:cappelikan/ppa
sudo apt update
sudo apt install mainline

With that out of the way, there’s the matter of deciding which kernels to try. The “interesting” Ubuntu kernels are 5.13.0-22 and 5.13.0-23, so the mainline kernels I definitely want to test, are around those versions. That means 5.13.0 and 5.13.1. I also want to try the latest 5.13.x kernel, so that’s 5.13.19, and the most recent stable kernel, 5.16.11 (as of 2022-03-01).

To summarize, I have tested these mainline kernels:

  • 5.13.0
  • 5.13.1
  • 5.13.19
  • 5.16.11

The result (after several reboots)? With all of them, my keyboard and mouse worked without a hitch. That means the issue most likely doesn’t occur in (stable) mainline kernels, only in kernels with additional patches from Ubuntu.

Up next: compiling kernels from source.

Lasciate ogne speranza, voi ch’intrate.

Dante Alighieri

The hunt for a kernel bug, part 1

The operating system on my computer is Ubuntu Linux, version 21.10 (Impish Indri). Recently I had an issue that, after a kernel update (and reboot), my USB keyboard and mouse didn’t work any more in the login screen. Huh, that’s unexpected.
The issue was:

  • At the Grub boot menu, the keyboard works: I can use the keys, the numlock led lights up, the LCD of the Logitech G19 displays a logo.
  • At the Ubuntu login screen, the keyboard (and the mouse) went dark: no backlight of the keys, no numlock led, no logo on the display. And the mouse cursor didn’t move on screen.

Must be a problem at my end, I initially thought, because surely, something so essential as input devices wouldn’t break by a simple kernel update? So I did some basic troubleshooting:

  • Have you tried to turn it off and on again?
Have you tried to turn it off and on again?
Have you tried to turn it off and on again?
  • Plug the keyboard in another USB port.
  • Try a different keyboard.
  • Start with the older kernel, which was still in the Grub menu. And indeed, this gave me back control over my input devices!

So if the only thing I changed was the kernel, then maybe it’s a kernel bug after all?

I know that Ubuntu 21.10 uses kernel 5.something, and I know that I use the generic kernels. So which kernels are we talking about, actually?

$ apt-cache show linux-image-5*-generic | grep Package: | sed 's/Package: //g'
linux-image-5.13.0-19-generic
linux-image-5.13.0-20-generic
linux-image-5.13.0-21-generic
linux-image-5.13.0-22-generic
linux-image-5.13.0-23-generic
linux-image-5.13.0-25-generic
linux-image-5.13.0-27-generic
linux-image-5.13.0-28-generic
linux-image-5.13.0-30-generic

9 kernels, that’s not too bad. All of them 5.13.0-XX-generic. So I just installed all the kernels:

$ sudo apt install --yes \
    linux-{image,headers,modules,modules-extra,tools}-5.13.0-*-generic
One Eternity Later

My /boot directory is quite busy now:

$  ls -hl /boot
total 1,2G
drwxr-xr-x  4 root root  12K mrt  1 18:11 .
drwxr-xr-x 20 root root 4,0K mrt  1 18:11 ..
-rw-r--r--  1 root root 252K okt  7 11:09 config-5.13.0-19-generic
-rw-r--r--  1 root root 252K okt 15 15:53 config-5.13.0-20-generic
-rw-r--r--  1 root root 252K okt 19 10:41 config-5.13.0-21-generic
-rw-r--r--  1 root root 252K nov  5 10:21 config-5.13.0-22-generic
-rw-r--r--  1 root root 252K nov 26 12:14 config-5.13.0-23-generic
-rw-r--r--  1 root root 252K jan  7 16:16 config-5.13.0-25-generic
-rw-r--r--  1 root root 252K jan 12 15:43 config-5.13.0-27-generic
-rw-r--r--  1 root root 252K jan 13 18:13 config-5.13.0-28-generic
-rw-r--r--  1 root root 252K feb  4 17:40 config-5.13.0-30-generic
drwx------  4 root root 4,0K jan  1  1970 efi
drwxr-xr-x  5 root root 4,0K mrt  1 18:11 grub
lrwxrwxrwx  1 root root   28 feb 28 04:26 initrd.img -> initrd.img-5.13.0-22-generic
-rw-r--r--  1 root root  40M mrt  1 16:02 initrd.img-5.13.0-19-generic
-rw-r--r--  1 root root  40M mrt  1 17:39 initrd.img-5.13.0-20-generic
-rw-r--r--  1 root root  40M mrt  1 17:38 initrd.img-5.13.0-21-generic
-rw-r--r--  1 root root  40M feb 26 13:55 initrd.img-5.13.0-22-generic
-rw-r--r--  1 root root  40M mrt  1 17:40 initrd.img-5.13.0-23-generic
-rw-r--r--  1 root root  40M mrt  1 17:40 initrd.img-5.13.0-25-generic
-rw-r--r--  1 root root  40M mrt  1 17:41 initrd.img-5.13.0-27-generic
-rw-r--r--  1 root root  40M mrt  1 17:41 initrd.img-5.13.0-28-generic
-rw-r--r--  1 root root  40M mrt  1 17:38 initrd.img-5.13.0-30-generic
-rw-------  1 root root 5,7M okt  7 11:09 System.map-5.13.0-19-generic
-rw-------  1 root root 5,7M okt 15 15:53 System.map-5.13.0-20-generic
-rw-------  1 root root 5,7M okt 19 10:41 System.map-5.13.0-21-generic
-rw-------  1 root root 5,7M nov  5 10:21 System.map-5.13.0-22-generic
-rw-------  1 root root 5,7M nov 26 12:14 System.map-5.13.0-23-generic
-rw-------  1 root root 5,7M jan  7 16:16 System.map-5.13.0-25-generic
-rw-------  1 root root 5,7M jan 12 15:43 System.map-5.13.0-27-generic
-rw-------  1 root root 5,7M jan 13 18:13 System.map-5.13.0-28-generic
-rw-------  1 root root 5,7M feb  4 17:40 System.map-5.13.0-30-generic
lrwxrwxrwx  1 root root   25 feb 28 04:27 vmlinuz -> vmlinuz-5.13.0-22-generic
-rw-------  1 root root 9,8M okt  7 19:37 vmlinuz-5.13.0-19-generic
-rw-------  1 root root 9,8M okt 15 15:56 vmlinuz-5.13.0-20-generic
-rw-------  1 root root 9,8M okt 19 10:43 vmlinuz-5.13.0-21-generic
-rw-------  1 root root 9,8M nov  5 13:51 vmlinuz-5.13.0-22-generic
-rw-------  1 root root 9,8M nov 26 11:52 vmlinuz-5.13.0-23-generic
-rw-------  1 root root 9,8M jan  7 16:19 vmlinuz-5.13.0-25-generic
-rw-------  1 root root 9,8M jan 12 16:19 vmlinuz-5.13.0-27-generic
-rw-------  1 root root 9,8M jan 13 18:10 vmlinuz-5.13.0-28-generic
-rw-------  1 root root 9,8M feb  4 17:46 vmlinuz-5.13.0-30-generic

I tried all these kernels. The last kernel where my input devices still worked, was 5.13.0-22-generic, and the first where they stopped working, was 5.13.0-23-generic. Which leads me to assume that some unintended change was introduced between those two versions, and it hasn’t been fixed since.

For now, I’m telling Ubuntu to keep kernel 5.13.0-22-generic and not upgrade to a more recent version.

$ sudo apt-mark hold linux-image-5.13.0-22-generic
linux-image-5.13.0-22-generic set on hold.

I also want Grub to show me the known working kernel as the default change. To do that, I’ve put this in /etc/default/grub:

GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 5.13.0-22-generic"

followed by sudo update-grub.

I’ll do the following things next, to get to the bottom of this:

A small rant about dependencies (and a promise)

Every now and then I run into some awesome open source project on GitHub, that is written in some cool programming language, and it assumes that the development tools for that language are already installed. My assumption is that they have a specific target audience in mind: an already existing developer community around that specific language. People who already have those tools installed.

The annoying thing is when someone like me, who doesn’t really need to know if a thing is written in Python or Ruby or JavaScript or whatever, tries to follow instructions like these:

$ pip install foo
Command 'pip' not found
$ gem install bar
Command 'gem' not found
$ yarn install baz
Command 'yarn' not found
$ ./configure && make && sudo make install
Command 'make' not found

By now, I already know that I first need to do sudo apt install python3-pip (or the equivalent installation commands for RubyGems, Yarn, build-essential,…). I also understand that, within the context of a specific developer community, this is so obvious that it is often assumed. That being said, I am making a promise:

For every open source project that I will henceforth publish online (on Github or any other code sharing platforms), I promise to do the following things:
(1) Test the installation on at least one clean installed operating system – which will be documented.
(2) Include full installation steps in the documentation, including all frameworks, development tools, etc. that would otherwise be assumed.
(3) Where possible and useful, provide an installation script.

The operating system I’m currently targeting, is Ubuntu, which means I’ll include apt commands. I’m counting on Continuous Integration to help me test on other operating systems that I don’t personally use.

Installing Ubuntu 20.04 LTS on 2011 MacBook Air

My laptop is a 2011 MacBook Air. I’m not a huge Apple fan, it’s just that at the time it had the most interesting hardware features compared to similar laptops. And it’s quite sturdy, so that’s nice.

Over the years I have experimented with installing Linux in parallel to the OS X operating system, but in the end I settled on installing my favorite Linux tools inside OS X using Homebrew, because having two different operating systems on one laptop was Too Much Effort. In recent times Apple has decided, in it’s infinite wisdom (no sarcasm at all *cough*), that it will no longer provide operating system upgrades for older hardware. Okay, then. Lately the laptop had become slow as molasses anyway, so I decided to replace OS X entirely with Ubuntu. No more half measures! I chose 20.04 LTS for the laptop because reasons. 🙂

The laptop was really slow…

According to the Ubuntu Community Help Wiki, all hardware should be supported, except Thunderbolt. I don’t use anything Thunderbolt, so that’s OK for me. The installation was pretty straightforward: I just created a bootable USB stick and powered on the Mac with the Option/Alt (⌥) key pressed. Choose EFI Boot in the Startup Manager, and from there on it’s all a typical Ubuntu installation.

screenshot
Startup Manager

I did not bother with any of the customizations described on the Ubuntu Wiki, because everything worked straight out of the box, and besides, the wiki is terribly outdated anyway.

The end result? I now have a laptop that feels snappy again, and that still gets updates for the operating system and the installed applications. And it’s my familiar Linux. What’s next? I’m thinking about using Ansible to configure the laptop.

To finish, I want to show you my sticker collection on the laptop. There’s still room for a lot more!

sticker collection on my laptop. Photo copyright: me.

Living without email for a month

Remember when my webserver was acting up? Well, I was so fed up with it, that I took a preconfigured Bitnami WordPress image and ran that on AWS. I don’t care how Bitnami configured it, as long as it works.

As a minor detail, postfix/procmail/dovecot were of course not installed or configured. Meh. This annoyed the Mrs. a bit because she didn’t get her newsletters. But I was so fed up with all the technical problems, that I waited a month to do anything about it.

Doing sudo apt-get -y install postfix procmail dovecot-pop3d and copying over the configs from the old server solved that.

Did I miss email during that month? Not at all. People were able to contact met through Twitter, Facebook, Telegram and all the other social networks. And I had an entire month without spam. Wonderful!

The Website Was Down

Captain: What happen?
Mechanic: Somebody set up us the bomb!

So yeah, my blog was off the air for a couple of days. So what happened?

This is what /var/log/nginx/error.log told me:

2016/06/27 08:48:46 [error] 22758#0: *21197
connect() to unix:/var/run/php5-fpm.sock failed (11: Resource temporarily unavailable) while connecting to upstream, client: 194.187.170.206, server: blog.amedee.be, request: "GET /wuala-0 HTTP/1.0", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host:
"amedee.be"

So I asked Doctor Google “connect() to unix:/var/run/php5-fpm.sock failed (11: resource temporarily unavailable)” and got this answer from StackOverflow:

The issue is socket itself, its problems on high-load cases is well-known. Please consider using TCP/IP connection instead of unix socket, for that you need to make these changes:

  • in php-fpm pool configuration replace listen = /var/run/php5-fpm.sock with listen = 127.0.0.1:7777
  • in /etc/nginx/php_location replace fastcgi_pass unix:/var/run/php5-fpm.sock; with fastcgi_pass 127.0.0.1:7777;

followed by a carefull application of

sudo /etc/init.d/php-fpm restart
sudo /etc/init.d/nginx restart

Tl;dr version: don’t use a Unix socket, use an IP socket. For great justice!

I leave you with this classic:

How big is a clean install of Ubuntu Jammy Jellyfish (22.04)?

Because curiosity killed the cat, not because it’s useful! 😀

Start with a clean install in a virtual machine

I start with a simple Vagrantfile:

Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/jammy64"
  config.vm.provision "ansible" do |ansible|
    ansible.playbook = "playbook.yml"
  end
end

This Ansible playbook updates all packages to the latest version and removes unused packages.

- name: Update all packages to the latest version
  hosts: all
  remote_user: ubuntu
  become: yes

  tasks:

  - name: Update apt cache
    apt:
      update_cache: yes
      cache_valid_time: 3600
      force_apt_get: yes

  - name: Upgrade all apt packages
    apt:
      force_apt_get: yes
      upgrade: dist

  - name: Check if a reboot is needed for Ubuntu boxes
    register: reboot_required_file
    stat: path=/var/run/reboot-required get_md5=no

  - name: Reboot the Ubuntu box
    reboot:
      msg: "Reboot initiated by Ansible due to kernel updates"
      connect_timeout: 5
      reboot_timeout: 300
      pre_reboot_delay: 0
      post_reboot_delay: 30
      test_command: uptime
    when: reboot_required_file.stat.exists

  - name: Remove unused packages
    apt:
      autoremove: yes
      purge: yes
      force_apt_get: yes

Then bring up the virtual machine with vagrant up --provision.

Get the installation size

I ssh into the box (vagrant ssh) and run a couple of commands to get some numbers.

Number of installed packages:

$ dpkg-query --show | wc --lines
592

Size of the installed packages:

$ dpkg-query --show --showformat '${Installed-size}\n' | awk '{s+=$1*1024} END {print s}' | numfmt --to=iec-i --format='%.2fB'
1.14GiB

I need to multiply the package size with 1024 because dpkg-query outputs size in kilobytes.

Total size:

$ sudo du --summarize --human-readable --one-file-system /
1.9G	/

Get the installation size using Ansible

Of course, I can also add this to my Ansible playbook, and then I don’t have to ssh into the virtual machine.

  - name: Get the number of installed packages
    shell: dpkg-query --show | wc --lines
    register: package_count
    changed_when: false
    failed_when: false
  - debug: msg="{{ package_count.stdout }}"

  - name: Get the size of installed packages
    shell: >
      dpkg-query --show --showformat '${Installed-size}\n' 
      | awk '{s+=$1*1024} END {print s}' 
      | numfmt --to=iec-i --format='%.2fB'
    register: package_size
    changed_when: false
    failed_when: false
  - debug: msg="{{ package_size.stdout }}"

  - name: Get the disk size with du
    shell: >
      du --summarize --one-file-system /
      | numfmt --to=iec-i --format='%.2fB'
    register: du_used
    changed_when: false
    failed_when: false
  - debug: msg="{{ du_used.stdout }}"

The output is then:

TASK [Get the number of installed packages] ************************************
ok: [default]

TASK [debug] *******************************************************************
ok: [default] => {
    "msg": "592"
}

TASK [Get the size of installed packages] **************************************
ok: [default]

TASK [debug] *******************************************************************
ok: [default] => {
    "msg": "1.14GiB"
}

TASK [Get the disk size with du] ***********************************************
ok: [default]

TASK [debug] *******************************************************************
ok: [default] => {
    "msg": "1.82MiB /"
}