As I wrote previously, I suspect a Linux kernel bug somewhere between versions 5.13.0-22 and 5.13.0-23 of the Ubuntu kernels. I wanted to know whether the issue only surfaces in Ubuntu-flavored kernels, or also in the upstream (mainline) kernels from kernel.org.
There is an Ubuntu Mainline PPA with all the upstream kernels, but I found it a bit too opaque to use. Fortunately, there is also the Ubuntu Mainline Kernel Installer (UMKI), a tool for installing the latest Linux kernels on Ubuntu-based distributions.
Ubuntu Mainline Kernel Installer (UMKI)
The UMKI is pretty straightforward. It fetches a list of kernels from the Ubuntu Mainline PPA, and its GUI displays both available and installed kernels, regardless of how they were installed. It installs the kernel, headers and modules. There is also a CLI client.
With that out of the way, there’s the matter of deciding which kernels to try. The “interesting” Ubuntu kernels are 5.13.0-22 and 5.13.0-23, so the mainline kernels I definitely want to test are the ones around those versions: 5.13.0 and 5.13.1. I also want to try the latest 5.13.x kernel, which is 5.13.19, and the most recent stable kernel, 5.16.11 (as of 2022-03-01).
To summarize, I have tested these mainline kernels:
5.13.0
5.13.1
5.13.19
5.16.11
The result (after several reboots)? With all of them, my keyboard and mouse worked without a hitch. That means the issue most likely doesn’t occur in (stable) mainline kernels, only in kernels with additional patches from Ubuntu.
The operating system on my computer is Ubuntu Linux, version 21.10 (Impish Indri). Recently, after a kernel update (and reboot), my USB keyboard and mouse stopped working at the login screen. Huh, that’s unexpected. This is what I saw:
At the Grub boot menu, the keyboard works: I can use the keys, the Num Lock LED lights up, and the LCD of the Logitech G19 displays a logo.
At the Ubuntu login screen, the keyboard (and the mouse) goes dark: no key backlight, no Num Lock LED, no logo on the display. And the mouse cursor doesn’t move on screen.
Must be a problem on my end, I initially thought, because surely something as essential as input devices wouldn’t break because of a simple kernel update? So I did some basic troubleshooting:
Have you tried to turn it off and on again?
Plug the keyboard into another USB port.
Try a different keyboard.
Start with the older kernel, which was still in the Grub menu. And indeed, this gave me back control over my input devices!
So if the only thing I changed was the kernel, then maybe it’s a kernel bug after all?
I know that Ubuntu 21.10 uses kernel 5.something, and I know that I use the generic kernels. So which kernels are we talking about, actually?
$ apt-cache show linux-image-5*-generic | grep Package: | sed 's/Package: //g'
linux-image-5.13.0-19-generic
linux-image-5.13.0-20-generic
linux-image-5.13.0-21-generic
linux-image-5.13.0-22-generic
linux-image-5.13.0-23-generic
linux-image-5.13.0-25-generic
linux-image-5.13.0-27-generic
linux-image-5.13.0-28-generic
linux-image-5.13.0-30-generic
Nine kernels, that’s not too bad, and all of them are 5.13.0-XX-generic. So I just installed all of them.
I tried all of these kernels. The last kernel where my input devices still worked was 5.13.0-22-generic, and the first where they stopped working was 5.13.0-23-generic. This leads me to assume that some unintended change was introduced between those two versions, and that it hasn’t been fixed since.
For now, I’m telling Ubuntu to hold on to kernel 5.13.0-22-generic.
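Booting all nine kernels one after the other works, but it doesn’t scale. A binary search (the same idea `git bisect` uses on commits) needs far fewer reboots. Below is a minimal sketch, assuming the list is sorted oldest to newest, the first kernel works, the last one is broken, and every kernel after the first broken one stays broken. The `works()` function is a hypothetical stand-in for “reboot into this kernel and try the keyboard”; here it just encodes the result I found by hand.

```python
from typing import Callable, List, Sequence


def find_first_bad(kernels: Sequence[str], works: Callable[[str], bool]) -> str:
    """Binary search for the first kernel on which works() returns False.

    Assumes: kernels are sorted oldest to newest, kernels[0] works,
    kernels[-1] is broken, and no kernel after a broken one is fixed again.
    """
    good, bad = 0, len(kernels) - 1  # invariant: kernels[good] works, kernels[bad] does not
    while bad - good > 1:
        mid = (good + bad) // 2
        if works(kernels[mid]):
            good = mid
        else:
            bad = mid
    return kernels[bad]


# The nine kernels from the apt-cache output.
KERNELS: List[str] = [f"5.13.0-{abi}-generic" for abi in (19, 20, 21, 22, 23, 25, 27, 28, 30)]


def works(kernel: str) -> bool:
    """Hypothetical stand-in for rebooting into `kernel` and testing the keyboard."""
    return int(kernel.split("-")[1]) <= 22  # encodes the result found by hand


if __name__ == "__main__":
    print(find_first_bad(KERNELS, works))
```

With these nine kernels, the search only tests 5.13.0-23, 5.13.0-21 and 5.13.0-22 before concluding that 5.13.0-23-generic is the first broken one.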
$ sudo apt-mark hold linux-image-5.13.0-22-generic
linux-image-5.13.0-22-generic set on hold.
I also want Grub to offer the known working kernel as the default choice. To do that, I’ve put this in /etc/default/grub:
GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 5.13.0-22-generic"
followed by sudo update-grub.
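The `GRUB_DEFAULT` value is a menu path: the submenu title, a `>`, then the entry title. If you want to script that change instead of editing the file by hand, here is a minimal sketch. `set_grub_default` is a hypothetical helper that only rewrites the text; it assumes Ubuntu’s English menu titles (check them in /boot/grub/grub.cfg), and you still have to write the file back as root and run `sudo update-grub`.

```python
import re

GRUB_MENU = "Advanced options for Ubuntu>Ubuntu, with Linux {}"
# Matches an active (uncommented) GRUB_DEFAULT line anywhere in the file.
_DEFAULT_LINE = re.compile(r"^GRUB_DEFAULT=.*$", re.MULTILINE)


def set_grub_default(config: str, kernel: str) -> str:
    """Return a copy of /etc/default/grub text with GRUB_DEFAULT pointing at `kernel`."""
    line = 'GRUB_DEFAULT="' + GRUB_MENU.format(kernel) + '"'
    if _DEFAULT_LINE.search(config):
        # A function as replacement stops re from interpreting backslashes in `line`.
        return _DEFAULT_LINE.sub(lambda _match: line, config)
    # No active GRUB_DEFAULT line yet: append one at the end.
    prefix = config.rstrip("\n")
    return (prefix + "\n" if prefix else "") + line + "\n"


if __name__ == "__main__":
    example = "GRUB_DEFAULT=0\nGRUB_TIMEOUT_STYLE=hidden\nGRUB_TIMEOUT=0\n"
    print(set_grub_default(example, "5.13.0-22-generic"), end="")
```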
I’ll do the following things next, to get to the bottom of this:
Compile kernels from source, to hopefully find the exact change that caused the USB input devices to stop working. git bisect helps a lot in narrowing down the broken commit.
Every now and then I run into an awesome open source project on GitHub, written in some cool programming language, that assumes the development tools for that language are already installed. My assumption is that they have a specific target audience in mind: an existing developer community around that specific language, people who already have those tools installed.
The annoying thing is when someone like me, who doesn’t really need to know if a thing is written in Python or Ruby or JavaScript or whatever, tries to follow instructions like these:
$ pip install foo
Command 'pip' not found
$ gem install bar
Command 'gem' not found
$ yarn add baz
Command 'yarn' not found
$ ./configure && make && sudo make install
Command 'make' not found
By now, I already know that I first need to do sudo apt install python3-pip (or the equivalent installation commands for RubyGems, Yarn, build-essential,…). I also understand that, within the context of a specific developer community, this is so obvious that it is often assumed. That being said, I am making a promise:
For every open source project that I publish online from now on (on GitHub or any other code sharing platform), I promise to do the following things: (1) Test the installation on at least one cleanly installed operating system, which will be documented. (2) Include full installation steps in the documentation, including all frameworks, development tools, etc. that would otherwise be assumed. (3) Where possible and useful, provide an installation script.
The operating system I’m currently targeting is Ubuntu, which means I’ll include apt commands. I’m counting on Continuous Integration to help me test on other operating systems that I don’t personally use.
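As an example of point (3), an installation script can start by checking which tools are missing and telling the user which apt packages to install. A minimal sketch: the `APT_PACKAGES` mapping is an illustrative assumption that only lists pairs I’m sure of on Ubuntu, and falling back to the command name for unknown tools is a guess you would refine per project.

```python
import shutil
from typing import Callable, Dict, List, Optional

# Example mapping from a command to the Ubuntu package that provides it.
# Extend it for your own project (RubyGems, Yarn, ...).
APT_PACKAGES: Dict[str, str] = {
    "pip3": "python3-pip",
    "make": "build-essential",
}


def missing_packages(commands: List[str],
                     which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the apt packages needed for the commands that are not on PATH."""
    missing: List[str] = []
    for cmd in commands:
        if which(cmd) is None:
            pkg = APT_PACKAGES.get(cmd, cmd)  # guess: package named after the command
            if pkg not in missing:
                missing.append(pkg)
    return missing


if __name__ == "__main__":
    needed = missing_packages(["pip3", "make"])
    if needed:
        print("Please run: sudo apt install " + " ".join(needed))
    else:
        print("All prerequisites are installed.")
```

Passing `which` as a parameter keeps the check testable on a machine that already has everything installed.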
My laptop is a 2011 MacBook Air. I’m not a huge Apple fan, it’s just that at the time it had the most interesting hardware features compared to similar laptops. And it’s quite sturdy, so that’s nice.
Over the years I experimented with installing Linux alongside the OS X operating system, but in the end I settled on installing my favorite Linux tools inside OS X using Homebrew, because having two different operating systems on one laptop was Too Much Effort™. Recently, Apple has decided, in its infinite wisdom (no sarcasm at all *cough*), that it will no longer provide operating system upgrades for older hardware. Okay, then. Lately the laptop had become slow as molasses anyway, so I decided to replace OS X entirely with Ubuntu. No more half measures! I chose 20.04 LTS for the laptop because reasons. 🙂
The laptop was really slow…
According to the Ubuntu Community Help Wiki, all hardware should be supported, except Thunderbolt. I don’t use anything Thunderbolt, so that’s OK for me. The installation was pretty straightforward: I just created a bootable USB stick and powered on the Mac with the Option/Alt (⌥) key pressed. Choose EFI Boot in the Startup Manager, and from there on it’s all a typical Ubuntu installation.
Startup Manager
I did not bother with any of the customizations described on the Ubuntu Wiki, because everything worked straight out of the box, and besides, the wiki is terribly outdated anyway.
The end result? I now have a laptop that feels snappy again, and that still gets updates for the operating system and the installed applications. And it’s my familiar Linux. What’s next? I’m thinking about using Ansible to configure the laptop.
To finish, I want to show you my sticker collection on the laptop. There’s still room for a lot more!
sticker collection on my laptop. Photo copyright: me.
Remember when my webserver was acting up? Well, I was so fed up with it, that I took a preconfigured Bitnami WordPress image and ran that on AWS. I don’t care how Bitnami configured it, as long as it works.
As a minor detail, postfix/procmail/dovecot were of course not installed or configured. Meh. This annoyed the Mrs. a bit because she didn’t get her newsletters. But I was so fed up with all the technical problems, that I waited a month to do anything about it.
Doing sudo apt-get -y install postfix procmail dovecot-pop3d and copying over the configs from the old server solved that.
Did I miss email during that month? Not at all. People were able to contact me through Twitter, Facebook, Telegram and all the other social networks. And I had an entire month without spam. Wonderful!
So yeah, my blog was off the air for a couple of days. So what happened?
This is what /var/log/nginx/error.log told me:
2016/06/27 08:48:46 [error] 22758#0: *21197 connect() to unix:/var/run/php5-fpm.sock failed (11: Resource temporarily unavailable) while connecting to upstream, client: 194.187.170.206, server: blog.amedee.be, request: "GET /wuala-0 HTTP/1.0", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "amedee.be"
So I asked Doctor Google “connect() to unix:/var/run/php5-fpm.sock failed (11: resource temporarily unavailable)” and got this answer from StackOverflow:
The issue is socket itself, its problems on high-load cases is well-known. Please consider using TCP/IP connection instead of unix socket, for that you need to make these changes:
in php-fpm pool configuration replace listen = /var/run/php5-fpm.sock with listen = 127.0.0.1:7777
in /etc/nginx/php_location replace fastcgi_pass unix:/var/run/php5-fpm.sock; with fastcgi_pass 127.0.0.1:7777;
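After making that switch (and restarting php-fpm and nginx), it’s worth checking that php-fpm really listens where nginx expects it. A minimal sketch: it only checks that something accepts a connection on the Unix socket path or the `127.0.0.1:7777` TCP address from the answer above; it does not speak the FastCGI protocol.

```python
import socket
from typing import Tuple, Union

# Either a Unix socket path or a (host, port) pair.
Address = Union[str, Tuple[str, int]]


def can_connect(address: Address, timeout: float = 2.0) -> bool:
    """Return True if a stream connection to `address` can be opened."""
    family = socket.AF_UNIX if isinstance(address, str) else socket.AF_INET
    with socket.socket(family, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(address)
        except OSError:  # refused, missing socket file, timeout, ...
            return False
        return True


if __name__ == "__main__":
    print("unix socket:", can_connect("/var/run/php5-fpm.sock"))
    print("tcp 7777:   ", can_connect(("127.0.0.1", 7777)))
```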
Because curiosity killed the cat, not because it’s useful! 😀
Start with a clean install in a virtual machine
I start with a simple Vagrantfile:
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/jammy64"
  config.vm.provision "ansible" do |ansible|
    ansible.playbook = "playbook.yml"
  end
end
This Ansible playbook updates all packages to the latest version and removes unused packages.
- name: Update all packages to the latest version
  hosts: all
  remote_user: ubuntu
  become: yes
  tasks:
    - name: Update apt cache
      apt:
        update_cache: yes
        cache_valid_time: 3600
        force_apt_get: yes
    - name: Upgrade all apt packages
      apt:
        force_apt_get: yes
        upgrade: dist
    - name: Check if a reboot is needed for Ubuntu boxes
      register: reboot_required_file
      stat: path=/var/run/reboot-required get_md5=no
    - name: Reboot the Ubuntu box
      reboot:
        msg: "Reboot initiated by Ansible due to kernel updates"
        connect_timeout: 5
        reboot_timeout: 300
        pre_reboot_delay: 0
        post_reboot_delay: 30
        test_command: uptime
      when: reboot_required_file.stat.exists
    - name: Remove unused packages
      apt:
        autoremove: yes
        purge: yes
        force_apt_get: yes
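The reboot task hinges on a single file: the playbook reboots only when /var/run/reboot-required exists. To see by hand inside the VM why a reboot was triggered, here is a minimal sketch of the same check. It assumes that, as on the Ubuntu systems I’ve seen, the packages that asked for the reboot are listed in a companion file, reboot-required.pkgs; if that file isn’t there, the list is simply empty.

```python
from pathlib import Path
from typing import List, Tuple


def reboot_required(run_dir: str = "/var/run") -> Tuple[bool, List[str]]:
    """Mirror the playbook's check: does <run_dir>/reboot-required exist?

    Also return the package names listed in <run_dir>/reboot-required.pkgs, if any.
    """
    base = Path(run_dir)
    pkgs_file = base / "reboot-required.pkgs"
    packages: List[str] = []
    if pkgs_file.exists():
        packages = [line.strip() for line in pkgs_file.read_text().splitlines() if line.strip()]
    return (base / "reboot-required").exists(), packages


if __name__ == "__main__":
    needed, packages = reboot_required()
    print("Reboot required" if needed else "No reboot needed", packages)
```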
Then bring up the virtual machine with vagrant up --provision.
Get the installation size
I ssh into the box (vagrant ssh) and run a couple of commands to get some numbers.
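One possible way to get such a number (not necessarily the commands I ran) is to add up the Installed-Size field of every installed package in dpkg’s database. A minimal sketch: Installed-Size is the package’s own estimate in KiB, so this ignores caches, logs and other files created after installation.

```python
from pathlib import Path
from typing import Dict


def installed_size_kib(status_text: str) -> int:
    """Sum the Installed-Size fields (KiB) of installed packages in dpkg status text."""
    total = 0
    for stanza in status_text.split("\n\n"):  # one stanza per package
        fields: Dict[str, str] = {}
        for line in stanza.splitlines():
            # Continuation lines start with a space; only top-level fields matter here.
            if ":" in line and not line.startswith(" "):
                key, _, value = line.partition(":")
                fields[key] = value.strip()
        # "install ok installed" counts; e.g. "deinstall ok config-files" does not.
        if fields.get("Status", "").endswith(" installed") and "Installed-Size" in fields:
            total += int(fields["Installed-Size"])
    return total


if __name__ == "__main__":
    status = Path("/var/lib/dpkg/status")
    if status.exists():
        kib = installed_size_kib(status.read_text(encoding="utf-8"))
        print(f"{kib} KiB ({kib / 1024:.1f} MiB) in installed packages")
```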
Sometimes a person has to do something special, like taking a screenshot on a device that runs Linux, but not X. Oink? According to StackExchange I should use fbgrab or fbdump, but in this particular case that isn’t possible, because reasons.
In this particular case, there is an application that sends images directly to the framebuffer. Well, everything is a file under Linux, so I took a peek at what was actually inside that framebuffer device:
$ cp /dev/fb0 /tmp/framebuffer.data $ head -c 64 /tmp/framebuffer.data kkk�kkk�kkk�kkk�kkk�kkk�kkk�kkk�kkk�kkk�kkk�kkk�kkk�kkk�kkk�kkk�
YIKES!!! Although… Hmm, that looked suspiciously regular, always in groups of 4 bytes. “k” has ASCII value 107, or 6B in hexadecimal, and #6B6B6B is a shade of grey. I had no idea yet what that “�” meant, but I knew I was onto something!
I then copied framebuffer.data to a PC with Gimp installed. (insert reference to Contact)
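If the data really is 4 bytes per pixel, it can also be turned into an image file that Gimp opens directly. A minimal sketch under stated assumptions: the byte order is B, G, R, X (a common layout for 32 bpp Linux framebuffers, but not guaranteed), the fourth byte (the mysterious “�”) is ignored, and you know the width, height and bytes per row. On the device, /sys/class/graphics/fb0/virtual_size and /sys/class/graphics/fb0/stride usually report those. The output is a binary PPM (P6) file.

```python
import sys
from pathlib import Path


def framebuffer_to_ppm(raw: bytes, width: int, height: int, stride: int = 0) -> bytes:
    """Convert a 32 bpp framebuffer dump to a binary PPM (P6) image.

    Assumes each pixel is 4 bytes in B, G, R, X order; the 4th byte is ignored.
    `stride` is the number of bytes per row (defaults to width * 4, no padding).
    """
    stride = stride or width * 4
    out = bytearray(f"P6\n{width} {height}\n255\n".encode("ascii"))
    for y in range(height):
        row = raw[y * stride : y * stride + width * 4]
        for x in range(width):
            b, g, r = row[4 * x], row[4 * x + 1], row[4 * x + 2]
            out += bytes((r, g, b))  # PPM wants R, G, B
    return bytes(out)


if __name__ == "__main__":
    # Hypothetical usage: python3 fb2ppm.py framebuffer.data 1024 768
    if len(sys.argv) == 4:
        data = Path(sys.argv[1]).read_bytes()
        ppm = framebuffer_to_ppm(data, int(sys.argv[2]), int(sys.argv[3]))
        Path("framebuffer.ppm").write_bytes(ppm)
```

If the colours come out wrong (red and blue swapped), the byte order assumption is off and the three channels need to be reordered.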