Onzin, onzin, allemaal onzin!

🧟‍♂️ Resurrecting a Dead Commit from the GitHub Graveyard

There comes a time in every developer’s life when you just know a certain commit existed. You remember its hash: deadbeef1234. You remember what it did. You know it was important. And yet, when you go looking for it…

💥 fatal: unable to read tree <deadbeef1234>

Great. Git has ghosted you.

That was me today. All I had was a lonely commit hash. The branch that once pointed to it? Deleted. The local clone that once had it? Gone in a heroic but ill-fated attempt to save disk space. And GitHub? Pretending like it never happened. Typical.

🪦 Act I: The Naïve Clone

“Let’s just clone the repo and check out the commit,” I thought. Spoiler alert: that’s not how Git works.

git clone --no-checkout https://github.com/user/repo.git
cd repo
git fetch --all
git checkout deadbeef1234

🧨 fatal: unable to read tree 'deadbeef1234'

Thanks Git. Very cool. Apparently, if no ref points to a commit, GitHub doesn’t hand it out with the rest of the toys. It’s like showing up to a party and being told your friend never existed.

🧪 Act II: The Desperate fsck

Surely it’s still in there somewhere? Let’s dig through the guts.

git fsck --full --unreachable

Nope. Nothing but the digital equivalent of lint and old bubblegum wrappers.

🕵️ Act III: The Final Trick

Then I stumbled across a lesser-known Git dark art:

git fetch origin deadbeef1234

And lo and behold, GitHub replied with a shrug and handed it over like, “Oh, that commit? Why didn’t you just say so?”

Suddenly the commit was in my local repo, fresh as ever, ready to be inspected, praised, and perhaps even resurrected into a new branch:

git checkout -b zombie-branch deadbeef1234

Mission accomplished. The dead walk again.
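
One catch worth spelling out: when you fetch by hash, Git expects the fully spelled object name (40 hex characters for SHA-1), because the remote does not resolve abbreviations like deadbeef1234. Here is a minimal sketch that checks the hash before asking the remote. The helper names is_full_sha and resurrect_commit are hypothetical, not Git built-ins:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; neither is part of Git.

# Succeeds only for a full 40-character hexadecimal SHA-1 object name.
is_full_sha() {
  [[ "$1" =~ ^[0-9a-fA-F]{40}$ ]]
}

# Fetch an unreferenced commit by hash and pin it to a new branch.
# Usage: resurrect_commit <full-sha> <new-branch> [remote]
resurrect_commit() {
  local sha="$1" branch="$2" remote="${3:-origin}"
  if ! is_full_sha "$sha"; then
    echo "error: '$sha' is not a full 40-character hash" >&2
    return 1
  fi
  git fetch "$remote" "$sha" && git branch "$branch" "$sha"
}
```

Inside a clone, resurrect_commit <full-hash> zombie-branch would do the fetch and create the branch in one go, assuming the remote still has the object.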


☠️ Moral of the Story

If you’re ever trying to recover a commit from a deleted branch on GitHub:

  1. Cloning alone won’t save you.
  2. git fetch origin <commit> is your secret weapon. Use the full 40-character hash; the remote won’t resolve an abbreviated one.
  3. If GitHub has completely deleted the commit from its history, you’re out of luck unless:
    • You have an old local clone
    • Someone forked the repo and kept it
    • CI logs or PR diffs include your precious bits

Otherwise, it’s digital dust.


🧛 Bonus Tip

Once you’ve resurrected that commit, create a branch immediately. Unreferenced commits are Git’s version of vampires: left in the shadows, they eventually get pruned by garbage collection (git gc) and disappear without a trace.

git checkout -b safe-now deadbeef1234

And there you have it. One undead commit, safely reanimated.

🧹 Tidying Up After Myself: Automatically Deleting Old GitHub Issues

At some point, I had to admit it: I’ve turned GitHub Issues into a glorified chart gallery.

Let me explain.

Over on my amedee/ansible-servers repository, I have a workflow called workflow-metrics.yml, which runs after every pipeline. It uses yykamei/github-workflows-metrics to generate beautiful charts that show how long my CI pipeline takes to run. Those charts are then posted into a GitHub Issue—one per run.

It’s neat. It’s visual. It’s entirely unnecessary to keep them forever.

The thing is: every time the workflow runs, it creates a new issue and closes the old one. So naturally, I end up with a long, trailing graveyard of “CI Metrics” issues that serve no purpose once they’re a few weeks old.

Cue the digital broom. 🧹


Enter cleanup-closed-issues.yml

To avoid hoarding useless closed issues like some kind of GitHub raccoon, I created a scheduled workflow that runs every Monday at 3:00 AM UTC and deletes the cruft:

schedule:
  - cron: '0 3 * * 1' # Every Monday at 03:00 UTC

This workflow:

  • Keeps at least 6 closed issues (just in case I want to peek at recent metrics).
  • Keeps issues that were closed less than 30 days ago.
  • Deletes everything else—quietly, efficiently, and without breaking a sweat.

It’s also configurable when triggered manually, with inputs for dry_run, days_to_keep, and min_issues_to_keep. So I can preview deletions before committing them, or tweak the retention period as needed.
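
The retention rule is easy to try out on made-up data before pointing anything at a real repository. This is a rough sketch, not the workflow itself: select_issues_to_delete and sample_issues are hypothetical names, and the input is plain "number epoch" lines instead of GitHub’s JSON:

```shell
#!/usr/bin/env bash
# Sketch of the retention rule on made-up data; no GitHub API calls involved.
# Input lines: "<issue number> <closed-at epoch seconds>".

# Print the numbers of issues that are outside the newest $min_keep
# and were closed before $threshold.
select_issues_to_delete() {
  local threshold="$1" min_keep="$2"
  # Sort newest-closed first, then skip the first $min_keep lines.
  sort -k2,2nr |
    awk -v t="$threshold" -v keep="$min_keep" \
      'NR > keep && $2 < t { print $1 }'
}

# Hypothetical sample: three stale issues, two recent ones.
sample_issues() {
  printf '%s\n' \
    "101 1000" \
    "102 2000" \
    "103 3000" \
    "104 9000" \
    "105 9500"
}

# Threshold 5000, always keep the 3 newest: 105, 104 and 103 survive.
sample_issues | select_issues_to_delete 5000 3
```

With these numbers the function prints 102 and 101: issue 103 is old enough to delete, but it is protected because it is among the three newest.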


📂 Complete Source Code for the Cleanup Workflow

name: 🧹 Cleanup Closed Issues

on:
  schedule:
    - cron: '0 3 * * 1' # Runs every Monday at 03:00 UTC
  workflow_dispatch:
    inputs:
      dry_run:
        description: "Enable dry run mode (preview deletions, no actual delete)"
        required: false
        default: "false"
        type: choice
        options:
          - "true"
          - "false"
      days_to_keep:
        description: "Number of days to retain closed issues"
        required: false
        default: "30"
        type: string
      min_issues_to_keep:
        description: "Minimum number of closed issues to keep"
        required: false
        default: "6"
        type: string

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

permissions:
  issues: write

jobs:
  cleanup:
    runs-on: ubuntu-latest

    steps:
      - name: Install GitHub CLI
        run: sudo apt-get install --yes gh

      - name: Delete old closed issues
        env:
          GH_TOKEN: ${{ secrets.GH_FINEGRAINED_PAT }}
          DRY_RUN: ${{ github.event.inputs.dry_run || 'false' }}
          DAYS_TO_KEEP: ${{ github.event.inputs.days_to_keep || '30' }}
          MIN_ISSUES_TO_KEEP: ${{ github.event.inputs.min_issues_to_keep || '6' }}
          REPO: ${{ github.repository }}
        run: |
          NOW=$(date -u +%s)
          THRESHOLD_DATE=$(date -u -d "${DAYS_TO_KEEP} days ago" +%s)
          echo "Only consider issues older than ${THRESHOLD_DATE}"

          echo "::group::Checking GitHub API Rate Limits..."
          RATE_LIMIT=$(gh api /rate_limit --jq '.rate.remaining')
          echo "Remaining API requests: ${RATE_LIMIT}"
          if [[ "${RATE_LIMIT}" -lt 10 ]]; then
            echo "⚠️ Low API limit detected. Sleeping for a while..."
            sleep 60
          fi
          echo "::endgroup::"

          echo "Fetching ALL closed issues from ${REPO}..."
          CLOSED_ISSUES=$(gh issue list --repo "${REPO}" --state closed --limit 1000 --json number,closedAt)

          if [ "${CLOSED_ISSUES}" = "[]" ]; then
            echo "✅ No closed issues found. Exiting."
            exit 0
          fi

          # Sort newest-closed first, always keep the first $limit of those,
          # then select whatever remains that was closed before the threshold.
          ISSUES_TO_DELETE=$(echo "${CLOSED_ISSUES}" | jq -r \
            --argjson now "${NOW}" \
            --argjson limit "${MIN_ISSUES_TO_KEEP}" \
            --argjson threshold "${THRESHOLD_DATE}" '
              sort_by(.closedAt) | reverse | .[$limit:]
              | map(select(
                  (.closedAt | type == "string") and
                  ((.closedAt | fromdateiso8601) < $threshold)
                ))
              | .[].number
            ' || echo "")

          if [ -z "${ISSUES_TO_DELETE}" ]; then
            echo "✅ No issues to delete. Exiting."
            exit 0
          fi

          echo "::group::Issues to delete:"
          echo "${ISSUES_TO_DELETE}"
          echo "::endgroup::"

          if [ "${DRY_RUN}" = "true" ]; then
            echo "🛑 DRY RUN ENABLED: Issues will NOT be deleted."
            exit 0
          fi

          echo "⏳ Deleting issues..."
          echo "${ISSUES_TO_DELETE}" \
            | xargs -I {} -P 5 gh issue delete "{}" --repo "${REPO}" --yes

          DELETED_COUNT=$(echo "${ISSUES_TO_DELETE}" | wc -l)
          REMAINING_ISSUES=$(gh issue list --repo "${REPO}" --state closed --limit 100 | wc -l)

          echo "::group::✅ Issue cleanup completed!"
          echo "📌 Deleted Issues: ${DELETED_COUNT}"
          echo "📌 Remaining Closed Issues: ${REMAINING_ISSUES}"
          echo "::endgroup::"

          {
            echo "### 🗑️ GitHub Issue Cleanup Summary"
            echo "- **Deleted Issues**: ${DELETED_COUNT}"
            echo "- **Remaining Closed Issues**: ${REMAINING_ISSUES}"
          } >> "$GITHUB_STEP_SUMMARY"


🛠️ Technical Design Choices Behind the Cleanup Workflow

Cleaning up old GitHub issues may seem trivial, but doing it well requires a few careful decisions. Here’s why I built the workflow the way I did:

Why GitHub CLI (gh)?

While I could have used raw REST API calls or GraphQL, the GitHub CLI (gh) provides a nice balance of power and simplicity:

  • It handles authentication and pagination under the hood.
  • Supports JSON output and filtering directly with --json and --jq.
  • Provides convenient commands like gh issue list and gh issue delete that make the script readable.
  • Comes pre-installed on GitHub runners or can be installed easily.

Example fetching closed issues:

gh issue list --repo "$REPO" --state closed --limit 1000 --json number,closedAt

No messy headers or tokens, just straightforward commands.

Filtering with jq

I use jq to:

  • Retain a minimum number of issues to keep (min_issues_to_keep).
  • Keep issues closed more recently than the retention period (days_to_keep).
  • Parse and compare issue closed timestamps with precision.
  • Leave pull requests out of the picture: gh issue list returns issues only, so no extra filter is needed.

The jq filter looks like this:

jq -r --argjson now "$NOW" --argjson limit "$MIN_ISSUES_TO_KEEP" --argjson threshold "$THRESHOLD_DATE" '
  # Newest-closed first; skip the $limit most recent, whatever their age
  sort_by(.closedAt) | reverse | .[$limit:]
  | map(select(
      (.closedAt | type == "string") and
      ((.closedAt | fromdateiso8601) < $threshold)
    ))
  | .[].number
'

Secure Authentication with Fine-Grained PAT

Because deleting issues is a destructive operation, the workflow uses a Fine-Grained Personal Access Token (PAT) with the narrowest possible scopes:

  • Issues: Read and Write
  • Limited to the repository in question

The token is securely stored as a GitHub Secret (GH_FINEGRAINED_PAT).

Note: Pull requests are never touched. gh issue list returns only issues, so no pull request number ever reaches gh issue delete.

Dry Run for Safety

Before deleting anything, I can run the workflow in dry_run mode to preview what would be deleted:

inputs:
  dry_run:
    description: "Enable dry run mode (preview deletions, no actual delete)"
    default: "false"

This lets me double-check without risking accidental data loss.

Parallel Deletion

Deletion happens in parallel to speed things up:

echo "$ISSUES_TO_DELETE" | xargs -I {} -P 5 gh issue delete "{}" --repo "$REPO" --yes

Up to 5 deletions run concurrently — handy when cleaning dozens of old issues.
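
To get a feel for the fan-out without deleting anything, the same xargs pattern can be tried with a harmless command. In this sketch echo stands in for gh issue delete, and delete_in_parallel is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: fan out one command per issue number, at most 5 at a time.
# "echo" stands in for the real "gh issue delete", so nothing is deleted.

delete_in_parallel() {
  # -r: run nothing on empty input; -I {}: one command per input line.
  xargs -r -I {} -P 5 echo "would delete issue #{}"
}

# Parallel jobs may finish in any order, hence the sort.
printf '%s\n' 101 102 103 | delete_in_parallel | sort
```

The -r flag is an addition of mine: without it, GNU xargs runs the command once even when the input is empty.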

User-Friendly Output

The workflow uses GitHub Actions’ logging groups and step summaries to give a clean, collapsible UI:

echo "::group::Issues to delete:"
echo "$ISSUES_TO_DELETE"
echo "::endgroup::"

And a markdown summary is generated for quick reference in the Actions UI.


Why Bother?

I’m not deleting old issues because of disk space or API limits — GitHub doesn’t charge for that. It’s about:

  • Reducing clutter so my issue list stays manageable.
  • Making it easier to find recent, relevant information.
  • Automating maintenance to free my brain for other things.
  • Keeping my tooling neat and tidy, which is its own kind of joy.

Steal It, Adapt It, Use It

If you’re generating temporary issues or ephemeral data in GitHub Issues, consider using a cleanup workflow like this one.

It’s simple, secure, and effective.

Because sometimes, good housekeeping is the best feature.


🧼✨ Happy coding (and cleaning)!

📦 Auto-growing disks in Vagrant: because 10 GB is never enough

Have you ever fired up a Vagrant VM, provisioned a project, pulled some Docker images, ran a build… and ran out of disk space halfway through? Welcome to my world. Apparently, the default disk size in Vagrant is tiny—and while you can specify a bigger virtual disk, Ubuntu won’t magically use the extra space. You need to resize the partition, the physical volume, the logical volume, and the filesystem. Every. Single. Time.

Enough of that nonsense.

🛠 The setup

Here’s the relevant part of my Vagrantfile:

Vagrant.configure(2) do |config|
  config.vm.box = 'boxen/ubuntu-24.04'
  config.vm.disk :disk, size: '20GB', primary: true

  config.vm.provision 'shell', path: 'resize_disk.sh'
end

This makes sure the disk is large enough and automatically resized by the resize_disk.sh script at first boot.

✨ The script

#!/bin/bash
set -euo pipefail
LOGFILE="/var/log/resize_disk.log"
exec > >(tee -a "$LOGFILE") 2>&1
echo "[$(date)] Starting disk resize process..."

REQUIRED_TOOLS=("parted" "pvresize" "lvresize" "lvdisplay" "grep" "awk")
for tool in "${REQUIRED_TOOLS[@]}"; do
  if ! command -v "$tool" &>/dev/null; then
    echo "[$(date)] ERROR: Required tool '$tool' is missing. Exiting."
    exit 1
  fi
done

# Read current and total partition size (in sectors)
parted_output=$(parted --script /dev/sda unit s print || true)
read -r PARTITION_SIZE TOTAL_SIZE < <(echo "$parted_output" | awk '
  / 3 / {part = $4}
  /^Disk \/dev\/sda:/ {total = $3}
  END {print part, total}
')

# Trim 's' suffix
PARTITION_SIZE_NUM="${PARTITION_SIZE%s}"
TOTAL_SIZE_NUM="${TOTAL_SIZE%s}"

if [[ "$PARTITION_SIZE_NUM" -lt "$TOTAL_SIZE_NUM" ]]; then
  echo "[$(date)] Resizing partition /dev/sda3..."
  parted --fix --script /dev/sda resizepart 3 100%
else
  echo "[$(date)] Partition /dev/sda3 is already at full size. Skipping."
fi

if [[ "$(pvresize --test /dev/sda3 2>&1)" != *"successfully resized"* ]]; then
  echo "[$(date)] Resizing physical volume..."
  pvresize /dev/sda3
else
  echo "[$(date)] Physical volume is already resized. Skipping."
fi

# Count the unallocated physical extents left in the volume group
FREE_EXTENTS=$(vgs --noheadings -o vg_free_count ubuntu-vg | tr -d '[:space:]')

if ((FREE_EXTENTS > 0)); then
  echo "[$(date)] Resizing logical volume..."
  lvresize -rl +100%FREE /dev/ubuntu-vg/ubuntu-lv
else
  echo "[$(date)] Logical volume is already fully extended. Skipping."
fi

💡 Highlights

  • ✅ Uses parted with --script to avoid prompts.
  • ✅ Automatically fixes GPT mismatch warnings with --fix.
  • ✅ Checks the volume group for free space before extending the logical volume.
  • ✅ Extends the partition, PV, and LV only when needed.
  • ✅ Logs everything to /var/log/resize_disk.log.
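
If the awk parsing step looks cryptic, here it is in isolation, run against a made-up parted listing rather than a real disk. It is a slightly stricter variant of the one in the script (it matches on the first column), and the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: what the awk step pulls out of `parted --script /dev/sda unit s print`.
# The listing below is a made-up sample, not output from a real box.

sample_parted_output() {
  cat <<'EOF'
Model: ATA VBOX HARDDISK (scsi)
Disk /dev/sda: 41943040s
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start     End        Size       File system  Name  Flags
 1      2048s     4095s      2048s                         bios_grub
 2      4096s     4198399s   4194304s   ext4
 3      4198400s  20969471s  16771072s
EOF
}

# Print "<size of partition 3> <total disk size>", both in sectors.
partition_and_disk_size() {
  awk '
    $1 == "3"           { part  = $4 }
    /^Disk \/dev\/sda:/ { total = $3 }
    END                 { print part, total }
  '
}

read -r part total < <(sample_parted_output | partition_and_disk_size)
echo "partition 3: ${part%s} sectors, disk: ${total%s} sectors"
```

In this sample the partition covers well under half of the 20 GB disk, which is exactly the situation the resize step is meant to fix.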

🚨 Gotchas

  • Your disk must already use LVM. This script assumes you’re resizing /dev/ubuntu-vg/ubuntu-lv, the default for Ubuntu server installs.
  • You must use a Vagrant box that supports VirtualBox’s disk resizing—thankfully, boxen/ubuntu-24.04 does.
  • If your LVM setup is different, you’ll need to adapt device paths.

🔁 Automation FTW

Calling this script as a provisioner means I never have to think about disk space again during development. One less yak to shave.

Feel free to steal this setup, adapt it to your team, or improve it and send me a patch. Or better yet—don’t wait until your filesystem runs out of space at 3 AM.

🧪 GitHub Actions and Environment Variables: Static vs. Dynamic Smackdown

Let’s talk about environment variables in GitHub Actions — those little gremlins that either make your CI/CD run silky smooth or throw a wrench in your perfectly crafted YAML.

If you’ve ever squinted at your pipeline and wondered, “Where the heck should I declare this ANSIBLE_CONFIG thing so it doesn’t vanish into the void between steps?”, you’re not alone. I’ve been there. I’ve screamed at $GITHUB_ENV. I’ve misused export. I’ve over-engineered echo. But fear not, dear reader: I’ve distilled it down so you don’t have to.

In this post, we’ll look at the right ways (and a few less right ways) to set environment variables — and more importantly, when to use static vs dynamic approaches.


🧊 Static Variables: Set It and Forget It

Got a variable like ANSIBLE_STDOUT_CALLBACK=yaml that’s the same every time? Congratulations, you’ve got yourself a static variable! These are the boring, predictable, low-maintenance types that make your CI life a dream.

✅ Best Practice: Job-Level env

If your variable is static and used across multiple steps, this is the cleanest, classiest, and least shouty way to do it:

jobs:
  my-job:
    runs-on: ubuntu-latest
    env:
      ANSIBLE_CONFIG: ansible.cfg
      ANSIBLE_STDOUT_CALLBACK: yaml
    steps:
      - name: Use env vars
        run: echo "ANSIBLE_CONFIG is $ANSIBLE_CONFIG"

Why it rocks:

  • 👀 Super readable
  • 📦 Available in every step of the job
  • 🧼 Keeps your YAML clean — no extra echo commands, no nonsense

Unless you have a very specific reason not to, this should be your default.


🎩 Dynamic Variables: Born to Be Wild

Now what if your variables aren’t so chill? Maybe you calculate something in one step and need to pass it to another — a file path, a version number, an API token from a secret backend ritual…

That’s when you reach for the slightly more… creative option:

🔧 $GITHUB_ENV to the rescue

- name: Set dynamic environment vars
  run: |
    echo "BUILD_DATE=$(date +%F)" >> $GITHUB_ENV
    echo "RELEASE_TAG=v1.$(date +%s)" >> $GITHUB_ENV

- name: Use them later
  run: echo "Tag: $RELEASE_TAG built on $BUILD_DATE"

What it does:

  • Persists the variables across steps
  • Works well when values are calculated during the run
  • Makes you feel powerful

🪄 Fancy Bonus: Heredoc Style

If you like your YAML with a side of Bash wizardry:

- name: Set vars with heredoc
  run: |
    cat <<EOF >> $GITHUB_ENV
    FOO=bar
    BAZ=qux
    EOF

Because sometimes, you just want to feel fancy.


😵‍💫 What Not to Do (Unless You Really Mean It)

- name: Set env with export
  run: |
    export FOO=bar
    echo "FOO is $FOO"

This only works within that step. The minute your pipeline moves on, FOO is gone. Poof. Into the void. If that’s what you want, fine. If not, don’t say I didn’t warn you.
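
The difference is easy to reproduce on your own machine. The sketch below is a local simulation, not how the runner is actually implemented: it assumes simple single-line KEY=VALUE entries, models each step as a subshell, and load_github_env is a hypothetical stand-in for what the runner does between steps:

```shell
#!/usr/bin/env bash
# Local simulation, assuming simple single-line KEY=VALUE entries.
# On a real runner, GitHub sets GITHUB_ENV and loads the file between steps.
unset FOO RELEASE_TAG
GITHUB_ENV="$(mktemp)"

# "Step 1": modelled as a subshell, because every step gets its own shell.
(
  # A plain export dies together with this shell.
  export FOO=bar
  # A line appended to the file survives the step.
  echo "RELEASE_TAG=v1.2.3" >> "$GITHUB_ENV"
)

# Between steps: read the file and export every entry (hypothetical helper).
load_github_env() {
  local key value
  while IFS='=' read -r key value; do
    export "$key=$value"
  done < "$GITHUB_ENV"
}
load_github_env

# "Step 2": only the value written to the file is still around.
echo "FOO=${FOO:-<unset>} RELEASE_TAG=${RELEASE_TAG:-<unset>}"
```

Running it prints FOO=<unset> RELEASE_TAG=v1.2.3, which is the whole story in one line.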


🧠 TL;DR – The Cheat Sheet

| Scenario | Best Method |
| --- | --- |
| Static variable used in all steps | env at the job level ✅ |
| Static variable used in one step | env at the step level |
| Dynamic value needed across steps | $GITHUB_ENV ✅ |
| Dynamic value only needed in one step | export (but don’t overdo it) |
| Need to show off with Bash skills | cat <<EOF >> $GITHUB_ENV 😎 |

🧪 My Use Case: Ansible FTW

In my setup, I wanted to use:

ANSIBLE_CONFIG=ansible.cfg
ANSIBLE_STDOUT_CALLBACK=yaml

These are rock-solid, boringly consistent values. So instead of writing this in every step:

- name: Set env
  run: |
    echo "ANSIBLE_CONFIG=ansible.cfg" >> $GITHUB_ENV

I now do this:

jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      ANSIBLE_CONFIG: ansible.cfg
      ANSIBLE_STDOUT_CALLBACK: yaml
    steps:
      ...

Cleaner. Simpler. One less thing to trip over when I’m debugging at 2am.


💬 Final Thoughts

Environment variables in GitHub Actions aren’t hard — once you know the rules of the game. Use env for the boring stuff. Use $GITHUB_ENV when you need a little dynamism. And remember: if you’re writing export in step after step, something probably smells.

Got questions? Did I miss a clever trick? Want to tell me my heredoc formatting is ugly? Hit me up in the comments or toot at me on Mastodon.


✍️ Posted by Amedee, who loves YAML almost as much as dancing polskas.
💥 Because good CI is like a good dance: smooth, elegant, and nobody falls flat on their face.
🎻 Scheduled to go live on 20 August, just as Boombalfestival kicks off. Because why not celebrate great workflows and great dances at the same time?

Safer Commands with argv in Ansible: Pros, Cons, and Real Examples

When using Ansible to automate tasks, the command module is your bread and butter for executing system commands. But did you know that there’s a safer, cleaner, and more predictable way to pass arguments? Meet argv—an alternative to writing commands as strings.

In this post, I’ll explore the pros and cons of using argv, and I’ll walk through several real-world examples tailored to web servers and mail servers.


Why Use argv Instead of a Command String?

✅ Pros

  • Avoids Shell Parsing Issues: Each argument is passed exactly as intended, with no surprises from quoting or spaces.
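  • More Secure: The command module never starts a shell, and with argv Ansible doesn’t even split a command string, so a value with spaces or quotes can’t turn into extra arguments.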
  • Clearer Syntax: Every argument is explicitly defined, improving readability.
  • Predictable: Behavior is consistent across different platforms and setups.

❌ Cons

  • No Shell Features: You can’t use pipes (|), redirection (>), or environment variables like $HOME.
  • More Verbose: Every argument must be a separate list item. It’s explicit, but more to type.
  • Not for Shell Built-ins: Commands like cd, export, or echo with redirection won’t work.

Real-World Examples

Let’s apply this to actual use cases.

🔧 Restarting Nginx with argv

- name: Restart Nginx using argv
  hosts: amedee.be
  become: yes
  tasks:
    - name: Restart Nginx
      ansible.builtin.command:
        argv:
          - systemctl
          - restart
          - nginx

📬 Check Mail Queue on a Mail-in-a-Box Server

- name: Check Postfix mail queue using argv
  hosts: box.vangasse.eu
  become: yes
  tasks:
    - name: Get mail queue status
      ansible.builtin.command:
        argv:
          - mailq
      register: mail_queue

    - name: Show queue
      ansible.builtin.debug:
        msg: "{{ mail_queue.stdout_lines }}"

🗃️ Back Up WordPress Database

- name: Backup WordPress database using argv
  hosts: amedee.be
  become: yes
  vars:
    db_user: wordpress_user
    db_password: wordpress_password
    db_name: wordpress_db
  tasks:
    - name: Dump database
      ansible.builtin.command:
        argv:
          - mysqldump
          - -u
          - "{{ db_user }}"
          - -p{{ db_password }}
          - "{{ db_name }}"
          - --result-file=/root/wordpress_backup.sql

⚠️ Avoid exposing credentials directly; use Ansible Vault instead.


Using argv with Interpolation

Ansible lets you use Jinja2-style variables ({{ }}) inside argv items.

🔄 Restart a Dynamic Service

- name: Restart a service using argv and variable
  hosts: localhost
  become: yes
  vars:
    service_name: nginx
  tasks:
    - name: Restart
      ansible.builtin.command:
        argv:
          - systemctl
          - restart
          - "{{ service_name }}"

🕒 Timestamped Backups

- name: Timestamped DB backup
  hosts: localhost
  become: yes
  vars:
    db_user: wordpress_user
    db_password: wordpress_password
    db_name: wordpress_db
  tasks:
    - name: Dump with timestamp
      ansible.builtin.command:
        argv:
          - mysqldump
          - -u
          - "{{ db_user }}"
          - -p{{ db_password }}
          - "{{ db_name }}"
          - --result-file=/root/wordpress_backup_{{ ansible_date_time.iso8601 }}.sql

🧩 Dynamic Argument Lists

Avoid join(' '), which collapses the list into a single string.

❌ Wrong:

argv:
  - ls
  - "{{ args_list | join(' ') }}"  # BAD: becomes one long string

✅ Correct:

argv: "{{ ['ls'] + args_list }}"

Or if the length is known:

argv:
  - ls
  - "{{ args_list[0] }}"
  - "{{ args_list[1] }}"
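
The same trap exists in plain Bash, which makes it easy to see what the receiving program gets. This is a shell analogy, not Ansible code, and count_args is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Shell analogy (not Ansible code): one joined string versus separate arguments.
count_args() { echo "$#"; }

args_list=(-l -a "/tmp/my dir")

# Joined with spaces: the program receives ONE argument, "-l -a /tmp/my dir".
count_args "${args_list[*]}"

# Separate items, like the entries of an argv list: THREE arguments.
count_args "${args_list[@]}"
```

The first call prints 1 and the second prints 3. A program like ls would treat the joined version as a single, very odd option string.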

📣 Interpolation Inside Strings

- name: Greet with hostname
  hosts: localhost
  tasks:
    - name: Print message
      ansible.builtin.command:
        argv:
          - echo
          - "Hello, {{ ansible_facts['hostname'] }}!"


When to Use argv

✅ Commands with complex quoting or multiple arguments
✅ Tasks requiring safety and predictability
✅ Scripts or binaries that take arguments, but not full shell expressions

When to Avoid argv

❌ When you need pipes, redirection, or shell expansion
❌ When you’re calling shell built-ins


Final Thoughts

Using argv in Ansible may feel a bit verbose, but it offers precision and security that traditional string commands lack. When you need reliable, cross-platform automation that avoids the quirks of shell parsing, argv is the better choice.

Prefer safety? Choose argv.
Need shell magic? Use the shell module.

Have a favorite argv trick or horror story? Drop it in the comments below.

🎣 The Curious Case of the Beg Bounty Bait, or: Licence to Phish

Not every day do I get an email from a very serious security researcher, clearly a man on a mission to save the internet — one vague, copy-pasted email at a time.

Here’s the message I received:

From: Peter Hooks <peterhooks007@gmail.com>
Subject: Security Vulnerability Disclosure

Hi Team,

I’ve identified security vulnerabilities in your app that may put users at risk. I’d like to report these responsibly and help ensure they are resolved quickly.

Please advise on your disclosure protocol, or share details if you have a Bug Bounty program in place.

Looking forward to your reply.

Best regards,
Peter Hooks

Right. Let’s unpack this.


🧯 “Your App”: What App?

I’m not a company. I’m not a startup. I’m not even a garage-based stealth tech bro.
I run a personal WordPress blog. That’s it.

There is no “app.” There are no “users at risk” (unless you count me, and I’m already beyond saving).


🕵️‍♂️ The Anatomy of a Beg Bounty Email

This little email ticks all the classic marks of what the security community affectionately calls a beg bounty — someone scanning random domains, finding trivial or non-issues, and fishing for a payout.

Want to see how common this is? Check out:


📮 My (Admittedly Snarky) Reply

I couldn’t resist. Here’s the reply I sent:

Hi Peter,

Thanks for your email and your keen interest in my “app”. Spoiler alert: there isn’t one. Just a humble personal blog here.

Your message hits all the classic marks of a beg bounty reconnaissance email:

  • ✅ Generic “Hi Team” greeting, because who needs names?
  • ✅ Vague claims of “security vulnerabilities” with zero specifics
  • ✅ Polite inquiry about a bug bounty program (spoiler: none here, James)
  • ✅ No proof, no details, just good old-fashioned mystery
  • ✅ Friendly tone crafted to reel in easy targets
  • ✅ Email address proudly featuring “007”: very covert ops of you

Bravo. You almost had me convinced.

I’ll be featuring this charming little interaction in a blog post soon — starring you, of course. If you ever feel like upgrading from vague templates to actual evidence, I’m all ears. Until then, happy fishing!

Cheers,
Amedee


😢 No Reply

Sadly, Peter didn’t write back.

No scathing rebuttal.
No actual vulnerabilities.
No awkward attempt at pivoting.
Just… silence.


#crying
#missionfailed


🛡️ A Note for Fellow Nerds

If you’ve got a domain name, no matter how small, there’s a good chance you’ll get emails like this.

Here’s how to handle them:

  • Stay calm — most of these are low-effort probes.
  • Don’t pay — you owe nothing to random strangers on the internet.
  • Don’t panic — vague threats are just that: vague.
  • Do check your stuff occasionally for actual issues.
  • Bonus: write a blog post about it and enjoy the catharsis.

For more context on this phenomenon, don’t miss:


🧵 tl;dr

If your “security researcher”:

  • doesn’t say what they found,
  • doesn’t mention your actual domain or service,
  • asks for a bug bounty up front,
  • signs with a Gmail address ending in 007

…it’s probably not the start of a beautiful friendship.


Got a similar email? Want help crafting a reply that’s equally professional and petty?
Feel free to drop a comment or reach out — I’ll even throw in a checklist.

Until then: stay patched, stay skeptical, and stay snarky. 😎

Creating 10 000 Random Files & Analyzing Their Size Distribution: Because Why Not? 🧐💾

Ever wondered what it’s like to unleash 10 000 tiny little data beasts on your hard drive? No? Well, buckle up anyway, because today we’re diving into the curious world of random file generation, and then nerding out by calculating their size distribution. Spoiler alert: it’s less fun than it sounds. 😏

Step 1: Let’s Make Some Files… Lots of Them

Our goal? Generate 10 000 files filled with random data. But not just any random sizes — we want a mean file size of roughly 68 KB and a median of about 2 KB. Sounds like a math puzzle? That’s because it kind of is.

If you just pick file sizes uniformly at random, you’ll end up with a median close to the mean — which is boring. We want a skewed distribution, where most files are small, but some are big enough to bring that average up.

The Magic Trick: Log-normal Distribution 🎩✨

Enter the log-normal distribution, a nifty way to generate lots of small numbers and a few big ones — just like real life. Using Python’s NumPy library, we generate these sizes and feed them to good old /dev/urandom to fill our files with pure randomness.
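
For a log-normal distribution the median is exp(mu) and the mean is exp(mu + sigma²/2), so both parameters follow directly from the two targets. A small sketch of that arithmetic with awk; lognormal_params is a hypothetical helper name, and 68 KB is taken as 68 × 1024 bytes:

```shell
#!/usr/bin/env bash
# Sketch: derive log-normal parameters from a target median and mean.
#   median = exp(mu)              =>  mu    = ln(median)
#   mean   = exp(mu + sigma^2/2)  =>  sigma = sqrt(2 * (ln(mean) - mu))
lognormal_params() {
  awk -v median="$1" -v mean="$2" 'BEGIN {
    mu = log(median)
    sigma = sqrt(2 * (log(mean) - mu))
    printf "mu=%.2f sigma=%.2f\n", mu, sigma
  }'
}

# Targets from the text, in bytes: median 2 KB, mean 68 KB.
lognormal_params 2048 69632
```

This prints mu=7.62 sigma=2.66. With a sigma that large the tail is heavy, so the sample mean of a single run can land well away from the target.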

Here’s the Bash script that does the heavy lifting:

#!/bin/bash

# Directory to store the random files
output_dir="random_files"
mkdir -p "$output_dir"

# Total number of files to create
file_count=10000

# Log-normal distribution parameters
mean_log=7.62    # mu = ln(2048): median = exp(mu), about 2 KB
stddev_log=2.66  # sigma: mean = exp(mu + sigma^2 / 2), about 68 KB

# Function to generate random numbers based on log-normal distribution
generate_random_size() {
    python3 -c "import numpy as np; print(int(np.random.lognormal($mean_log, $stddev_log)))"
}

# Create files with random data
for i in $(seq 1 $file_count); do
    file_size=$(generate_random_size)
    file_path="$output_dir/file_$i.bin"
    head -c "$file_size" /dev/urandom > "$file_path"
    echo "Generated file $i with size $file_size bytes."
done

echo "Done. Files saved in $output_dir."

Easy enough, right? This creates a directory random_files and fills it with 10 000 files, most of them small but a few wildly bigger. Don’t blame me if your disk space takes a little hit! 💥

Step 2: Crunching Numbers: The File Size Distribution 📊

Okay, you’ve got the files. Now, what can we learn from their sizes? Let’s find out the:

  • Mean size: The average size across all files.
  • Median size: The middle value when sizes are sorted — because averages can lie.
  • Distribution breakdown: How many tiny files vs. giant files.

Here’s a handy Bash script that reads file sizes and spits out these stats with a bit of flair:

#!/bin/bash

# Input directory (default to "random_files" if not provided)
directory="${1:-random_files}"

# Check if directory exists
if [ ! -d "$directory" ]; then
    echo "Directory $directory does not exist."
    exit 1
fi

# Array to store file sizes
file_sizes=($(find "$directory" -type f -exec stat -c%s {} \;))

# Check if there are files in the directory
if [ ${#file_sizes[@]} -eq 0 ]; then
    echo "No files found in the directory $directory."
    exit 1
fi

# Calculate mean
total_size=0
for size in "${file_sizes[@]}"; do
    total_size=$((total_size + size))
done
mean=$((total_size / ${#file_sizes[@]}))

# Calculate median
sorted_sizes=($(printf '%s\n' "${file_sizes[@]}" | sort -n))
mid=$(( ${#sorted_sizes[@]} / 2 ))
if (( ${#sorted_sizes[@]} % 2 == 0 )); then
    median=$(( (sorted_sizes[mid-1] + sorted_sizes[mid]) / 2 ))
else
    median=${sorted_sizes[mid]}
fi

# Display file size distribution
echo "File size distribution in directory $directory:"
echo "---------------------------------------------"
echo "Number of files: ${#file_sizes[@]}"
echo "Mean size: $mean bytes"
echo "Median size: $median bytes"

# Display detailed size distribution (optional)
echo
echo "Detailed distribution (size ranges):"
awk '{
    if ($1 < 1024) bins["< 1 KB"]++;
    else if ($1 < 10240) bins["1 KB - 10 KB"]++;
    else if ($1 < 102400) bins["10 KB - 100 KB"]++;
    else bins[">= 100 KB"]++;
} END {
    for (range in bins) printf "%-15s: %d\n", range, bins[range];
}' <(printf '%s\n' "${file_sizes[@]}")

Run it, and voilà: instant nerd satisfaction.

Example Output:

File size distribution in directory random_files:
---------------------------------------------
Number of files: 10000
Mean size: 68987 bytes
Median size: 2048 bytes

Detailed distribution (size ranges):
< 1 KB         : 1234
1 KB - 10 KB   : 5678
10 KB - 100 KB : 2890
>= 100 KB      : 198

Why Should You Care? 🤷‍♀️

Besides the obvious geek cred, generating files like this can help:

  • Test backup systems — can they handle weird file size distributions?
  • Stress-test storage or network performance with real-world-like data.
  • Understand your data patterns if you’re building apps that deal with files.

Wrapping Up: Big Files, Small Files, and the Chaos In Between

So there you have it. Ten thousand random files later, and we’ve peeked behind the curtain to understand their size story. It’s a bit like hosting a party and then figuring out who ate how many snacks. 🍿

Try this yourself! Tweak the distribution parameters, generate files, crunch the numbers — and impress your friends with your mad scripting skills. Or at least have a fun weekend project that makes you sound way smarter than you actually are.

Happy hacking! 🔥

How I Tamed Duplicity’s Buggy Versions and Made Sure I Always Have a Backup 🛡️💾

If you’re running Mail-in-a-Box like me, you might rely on Duplicity to handle backups quietly in the background. It’s a great tool, until it isn’t. Recently, I ran into some frustrating issues caused by buggy Duplicity versions. Here’s the story, a useful discussion from the Mail-in-a-Box forums, and a neat trick I use to keep fallback versions handy. Spoiler: it involves an APT hook and some smart file copying! 🚀


The Problem with Duplicity Versions

Duplicity 3.0.1 and 3.0.5 have been reported to cause backup failures, a real headache when you depend on them to protect your data. The Mail-in-a-Box forum post “Something is wrong with the backup” dives into these issues in great detail. Users reported mysterious backup failures and eventually traced them back to specific Duplicity releases.

Here’s the catch: those problematic versions sometimes sneak in during automatic updates. By the time you realize something’s wrong, you might already have upgraded to a buggy release. 😩


Pinning Problematic Versions with APT Preferences

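One way to stop apt from installing those broken versions is to use APT pinning. Here’s an example file I created in /etc/apt/preferences.d/pin_duplicity.pref (note the .d: apt reads preference fragments from the preferences.d directory, while /etc/apt/preferences itself is a single file):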

Explanation: Duplicity version 3.0.1* has a bug and should not be installed
Package: duplicity
Pin: version 3.0.1*
Pin-Priority: -1

Explanation: Duplicity version 3.0.5* has a bug and should not be installed
Package: duplicity
Pin: version 3.0.5*
Pin-Priority: -1

This tells apt to refuse to install these specific buggy versions. Sounds great, right? Except — it often comes too late. You could already have updated to a broken version before adding the pin.

Also, since Duplicity is installed from a PPA, older versions vanish quickly as new releases push them out. This makes rolling back to a known good version a pain. 😤
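
As the list of bad releases grows, the stanzas in the pin file get repetitive. The following is a minimal sketch of a generator, not something apt provides: `make_pin_stanzas` is a hypothetical helper name, and its output would still have to be saved into the pin file by hand.

```bash
#!/bin/bash
# Hypothetical helper: print one "never install" pin stanza per version pattern.
# Usage: make_pin_stanzas <package> <version-pattern>...
make_pin_stanzas() {
    local package="$1"; shift
    local pattern
    for pattern in "$@"; do
        printf 'Explanation: %s version %s has a bug and should not be installed\n' "$package" "$pattern"
        printf 'Package: %s\n' "$package"
        printf 'Pin: version %s\n' "$pattern"
        # A negative priority tells apt never to install a matching version.
        # The trailing blank line separates stanzas.
        printf 'Pin-Priority: -1\n\n'
    done
}

make_pin_stanzas duplicity '3.0.1*' '3.0.5*'
```

After saving the output, `apt-cache policy duplicity` should show whether the pin is being picked up.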


My Solution: Backing Up Known Good Duplicity .deb Files Automatically

To fix this, I created an APT hook that runs after every dpkg invocation, whether or not Duplicity is involved. It copies any Duplicity .deb files from apt’s archive cache, and from /root if I’m installing manually, into a safe backup folder.

Here’s the script, saved as /usr/local/bin/apt-backup-duplicity.sh:

#!/bin/bash
set -x

mkdir -p /var/backups/debs/duplicity

cp -vn /var/cache/apt/archives/duplicity_*.deb /var/backups/debs/duplicity/ 2>/dev/null || true
cp -vn /root/duplicity_*.deb /var/backups/debs/duplicity/ 2>/dev/null || true

And here’s the APT hook configuration I put in /etc/apt/apt.conf.d/99backup-duplicity-debs to run this script automatically after DPKG operations:

DPkg::Post-Invoke { "/usr/local/bin/apt-backup-duplicity.sh"; };
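
Before wiring a script like this into apt, it can help to try the copy logic on scratch directories. Below is a minimal sketch of the same idea with the paths passed as arguments; the function name `backup_debs` and its argument order are illustrative assumptions, not part of the original script.

```bash
#!/bin/bash
# Hypothetical variant of the hook script with parameterised paths.
# Usage: backup_debs <destination-dir> <source-dir>...
backup_debs() {
    local dest="$1"; shift
    local src deb
    mkdir -p "$dest" || return 1
    for src in "$@"; do
        for deb in "$src"/duplicity_*.deb; do
            # An unmatched glob stays literal, so skip anything that is not a file
            [ -f "$deb" ] || continue
            # Never overwrite a .deb that was already saved
            [ -e "$dest/${deb##*/}" ] || cp "$deb" "$dest"/
        done
    done
    return 0
}

# Equivalent of the original script's two copy commands:
# backup_debs /var/backups/debs/duplicity /var/cache/apt/archives /root
```

The explicit `return 0` matters in a hook, since a failing Post-Invoke command makes apt report an error.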

Use apt-mark hold to Lock a Working Duplicity Version 🔒

Even with pinning and local .deb backups, there’s one more layer of protection I recommend: freezing a known-good version with apt-mark hold.

Once you’ve confirmed that your current version of Duplicity works reliably, run:

sudo apt-mark hold duplicity

This tells apt not to upgrade Duplicity, even if a newer version becomes available. It’s a great way to avoid accidentally replacing your working setup with something buggy during routine updates.

🧠 Pro Tip: I only unhold and upgrade Duplicity manually after checking the Mail-in-a-Box forum for reports that a newer version is safe.

When you’re ready to upgrade, do this:

sudo apt-mark unhold duplicity
sudo apt update
sudo apt install duplicity

If everything still works fine, you can apt-mark hold it again to freeze the new version.


How to Use Your Backup Versions to Roll Back

If a new Duplicity version breaks your backups, you can easily reinstall a known-good .deb file from your backup folder:

sudo apt install --reinstall /var/backups/debs/duplicity/duplicity_<version>.deb

Replace <version> with the version string from the filename you want to roll back to. Because the .deb files were saved right after each update, you always have access to older stable versions, even if the PPA has moved on.
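
To see which versions are available to roll back to, something like the sketch below could list them. It assumes the usual `duplicity_<version>_<arch>.deb` naming; `list_saved_versions` is a hypothetical helper, and GNU `sort -V` only approximates Debian's version ordering.

```bash
#!/bin/bash
# Hypothetical helper: list the versions of the saved Duplicity packages, oldest first.
list_saved_versions() {
    local dir="${1:-/var/backups/debs/duplicity}"
    local deb name
    for deb in "$dir"/duplicity_*.deb; do
        [ -f "$deb" ] || continue          # unmatched glob: nothing saved yet
        name="${deb##*/}"                  # strip the directory
        name="${name#duplicity_}"          # strip the package name
        printf '%s\n' "${name%_*.deb}"     # strip the _<arch>.deb suffix
    done | sort -V                         # version-aware sort (GNU coreutils)
}

list_saved_versions "$@"
```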


Final Thoughts

While pinning bad versions helps, having a local stash of known-good packages is a game changer. Add apt-mark hold on top of that, and you have a rock-solid defense against regressions. 🪨✨

It’s a small extra step but pays off hugely when things go sideways. Plus, it’s totally automated with the APT hook, so you don’t have to remember to save anything manually. 🎉

If you run Mail-in-a-Box or rely on Duplicity in any critical backup workflow, I highly recommend setting up this safety net.

Stay safe and backed up! 🛡️✨

🧱 Let’s Get Hard (Links): Deduplicating My Linux Filesystem with Hadori

File deduplication isn’t just for massive storage arrays or backup systems—it can be a practical tool for personal or server setups too. In this post, I’ll explain how I use hardlinking to reduce disk usage on my Linux system, which directories are safe (and unsafe) to link, why I’m OK with the trade-offs, and how I automated it with a simple monthly cron job using a neat tool called hadori.


🔗 What Is Hardlinking?

In a traditional filesystem, every file has an inode, which is essentially its real identity—the data on disk. A hard link is a different filename that points to the same inode. That means:

  • The file appears to exist in multiple places.
  • But there’s only one actual copy of the data.
  • Deleting one link doesn’t delete the content, unless it’s the last one.

Compare this to a symlink, which is just a pointer to a path. A hardlink is a pointer to the data.

So if you have 10 identical files scattered across the system, you can replace them with hardlinks, and boom—nine of them stop taking up extra space.
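
The behaviour described above is easy to check in a scratch directory. This is a small demonstration, assuming GNU `stat`; the file names are arbitrary.

```bash
#!/bin/bash
# Two names, one inode: a throwaway demo in a temporary directory.
demo="$(mktemp -d)"
echo "same bytes" > "$demo/original"
ln "$demo/original" "$demo/alias"          # hard link: a second name for the same inode

inode_original="$(stat -c '%i' "$demo/original")"
inode_alias="$(stat -c '%i' "$demo/alias")"
link_count="$(stat -c '%h' "$demo/original")"
echo "inodes: $inode_original and $inode_alias, link count: $link_count"

rm "$demo/original"                        # removes one name, not the data
survivor="$(cat "$demo/alias")"
echo "after rm, alias still contains: $survivor"
```

Both inode numbers are identical, the link count is 2, and the data survives the `rm` because one name still points to it.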


🤔 Why Use Hardlinking?

My servers run a fairly standard Ubuntu install, and like most Linux machines, the root filesystem accumulates a lot of identical binaries and libraries—especially across /bin, /lib, /usr, and /opt.

That’s not a problem… until you’re tight on disk space, or you’re just a curious nerd who enjoys squeezing every last byte.

In my case, I wanted to reduce disk usage safely, without weird side effects.

Hardlinking is a one-time cost with ongoing benefits. It’s not compression. It’s not archival. But it’s efficient and non-invasive.


📁 Which Directories Are Safe to Hardlink?

Hardlinking only works within the same filesystem, and not all directories are good candidates.

✅ Safe directories:

  • /bin, /sbin – system binaries
  • /lib, /lib64 – shared libraries
  • /usr, /usr/bin, /usr/lib, /usr/share, /usr/local – user-space binaries, docs, etc.
  • /opt – optional manually installed software

These contain mostly static files: compiled binaries, libraries, man pages… not something that changes often.

⚠️ Unsafe or risky directories:

  • /etc – configuration files, might change frequently
  • /var, /tmp – logs, spools, caches, session data
  • /home – user files, temporary edits, live data
  • /dev, /proc, /sys – virtual filesystems, do not touch

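Hardlinks are not copy-on-write. If a hardlinked file is modified in place, every linked name sees the change, so you may end up sharing data you didn’t mean to. If a tool instead replaces the file by writing a new one and renaming it over the old name (as dpkg does during package upgrades), the link is broken, the deduplication is lost, and you’re back where you started.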

That’s why I avoid any folders with volatile, user-specific, or auto-generated files.


🧨 Risks and Limitations

Hardlinking is not magic. It comes with sharp edges:

  • One inode, multiple names: All links are equal. Editing one changes the data for all.
  • Backups: Some backup tools don’t preserve hardlinks or treat them inefficiently.
    ➤ Duplicity, which I use, does not preserve hardlinks. It backs up each linked file as a full copy, so hardlinking won’t reduce backup size.
  • Security: Linking files with different permissions or owners can have unexpected results.
  • Limited scope: Only works within the same filesystem (e.g., can’t link / and /mnt if they’re on separate partitions).
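
The "one inode, multiple names" risk depends on how a file gets changed. The sketch below contrasts an in-place write with a replace-by-rename, which is, as far as I know, how dpkg swaps files during upgrades. It assumes GNU `stat`, and the file names are arbitrary.

```bash
#!/bin/bash
# In-place edit versus replace-by-rename on a hardlinked pair.
demo="$(mktemp -d)"
echo "v1" > "$demo/a"
ln "$demo/a" "$demo/b"

# In-place write: the shared inode changes, so both names show the new content.
echo "v2" > "$demo/a"
b_after_inplace="$(cat "$demo/b")"

# Replace-by-rename: write a new file, then move it over the old name.
# "a" now points at a fresh inode; "b" keeps the old data and the link is gone.
echo "v3" > "$demo/a.new"
mv "$demo/a.new" "$demo/a"
b_after_rename="$(cat "$demo/b")"
links_after_rename="$(stat -c '%h' "$demo/b")"

echo "in-place: b=$b_after_inplace; rename: b=$b_after_rename, links=$links_after_rename"
```

The first case is the dangerous one for shared files. The second merely undoes the space saving, which is why a periodic re-run makes sense for directories that only change through package updates.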

In my setup, I accept those risks because:

  • I’m only linking read-only system files.
  • I never link config or user data.
  • I don’t rely on hardlink preservation in backups.
  • I test changes before deploying.

In short: I know what I’m linking, and why.


🔍 What the Critics Say About Hardlinking

Not everyone loves hardlinks, and for good reasons. I’ve read two thoughtful critiques of the practice.

The core arguments:

  • Hardlinks violate expectations about file ownership and identity.
  • They can break assumptions in software that tracks files by name or path.
  • They complicate file deletion logic—deleting one name doesn’t delete the content.
  • They confuse file monitoring and logging tools, since it’s hard to tell if a file is “new” or just another name.
  • They increase the risk of data corruption if accidentally modified in-place by a script that assumes it owns the file.

Why I’m still OK with it:

These concerns are valid—but mostly apply to:

  • Mutable files (e.g., logs, configs, user data)
  • Systems with untrusted users or dynamic scripts
  • Software that relies on inode isolation or path integrity

In contrast, my approach is intentionally narrow and safe:

  • I only deduplicate read-only system files in /bin, /sbin, /lib, /lib64, /usr, and /opt.
  • These are owned by root, and only changed during package updates.
  • I don’t hardlink anything under /home, /etc, /var, or /tmp.
  • I know exactly when the cron job runs and what it targets.

So yes, hardlinks can be dangerous—but only if you use them in the wrong places. In this case, I believe I’m using them correctly and conservatively.


⚔ Does Hardlinking Impact System Performance?

Good news: hardlinks have virtually no impact on system performance in everyday use.

Hardlinks are a native feature of Linux filesystems like ext4 or xfs. The OS treats a hardlinked file just like a normal file:

  • Reading and writing hardlinked files is just as fast as normal files.
  • Permissions, ownership, and access behave identically.
  • Common tools (ls, cat, cp) don’t care whether a file is hardlinked or not.
  • Filesystem caches and memory management work exactly the same.

The only difference is that multiple filenames point to the exact same data.

Things to keep in mind:

  • If you edit a hardlinked file, all links see that change because there’s really just one file.
  • Some tools (backup, disk usage) might treat hardlinked files differently.
  • Debugging or auditing files can be slightly trickier since multiple paths share one inode.

But from a performance standpoint? Your system won’t even notice the difference.


🛠️ Tools for Hardlinking

There are a few tools out there:

  • fdupes – finds duplicates and optionally replaces with hardlinks
  • rdfind – more sophisticated detection
  • hardlink – simple but limited
  • jdupes – high-performance fork of fdupes

📌 About Hadori

From the Debian package description:

This might look like yet another hardlinking tool, but it is the only one which only memorizes one filename per inode. That results in less memory consumption and faster execution compared to its alternatives. Therefore (and because all the other names are already taken) it’s called “Hardlinking DOne RIght”.

Advantages over other tools:

  • Predictability: arguments are scanned in order, each first version is kept
  • Much lower CPU and memory consumption compared to alternatives

This makes hadori especially suited for system-wide deduplication where efficiency and reliability matter.


⏱️ How I Use Hadori

I run hadori once per month with a cron job. Here’s the actual command:

/usr/bin/hadori --verbose /bin /sbin /lib /lib64 /usr /opt

This scans those directories, finds duplicate files, and replaces them with hardlinks when safe.

And here’s the crontab entry I installed in the file /etc/cron.d/hadori:

@monthly root /usr/bin/hadori --verbose /bin /sbin /lib /lib64 /usr /opt

📉 What Are the Results?

After the first run, I saw a noticeable reduction in used disk space, especially in /usr/lib and /usr/share. On my modest VPS, that translated to about 300–500 MB saved—not huge, but non-trivial for a small root partition.
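
For a rough idea of what a run might save before committing to it, duplicate content can be counted by hash. This is only a sketch, not how hadori works internally: `reclaimable_bytes` is a hypothetical helper, it assumes GNU `find`, `stat`, and `sha256sum`, and it ignores ownership and permissions, which a real tool may take into account. Treat the number as an upper bound.

```bash
#!/bin/bash
# Hypothetical helper: estimate bytes that hardlinking identical files could free.
# Usage: reclaimable_bytes <directory>
reclaimable_bytes() {
    local dir="$1" file
    find "$dir" -xdev -type f -size +0 -print0 |
    while IFS= read -r -d '' file; do
        # One line per file: content hash, inode number, size in bytes
        printf '%s %s\n' "$(sha256sum < "$file" | cut -d' ' -f1)" "$(stat -c '%i %s' "$file")"
    done |
    sort -u |   # names that already share an inode collapse into one line
    awk '{ if (seen[$1]++) total += $3 } END { print total + 0 }'
}

# Example: reclaimable_bytes /usr/share
```

Every file gets hashed, so this is slow on a large tree. It only needs to run once, out of curiosity.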

While this doesn’t reduce my backup size (Duplicity doesn’t support hardlinks), it still helps with local disk usage and keeps things a little tidier.

And because the job only runs monthly, it’s not intrusive or performance-heavy.


🧼 Final Thoughts

Hardlinking isn’t something most people need to think about. And frankly, most people probably shouldn’t use it.

But if you:

  • Know what you’re linking
  • Limit it to static, read-only system files
  • Automate it safely and sparingly

…then it can be a smart little optimization.

With a tool like hadori, it’s safe, fast, and efficient. I’ve read the horror stories—and decided that in my case, they don’t apply.


✉️ This post was brought to you by a monthly cron job and the letters i-n-o-d-e.

🔍 How I Accidentally Discovered Power Query

A few weeks ago, I was knee-deep in CSV files. Not the fun kind. These were automatically generated reports from Cisco IronPort, and they weren’t exactly what I’d call analysis-friendly. Think: dozens of columns wide, thousands of rows, with summary data buried in awkward corners.

I was trying to make sense of incoming mail categories—Spam, Clean, Malware—and the numbers that went with them. Naturally, I opened the file in Excel, intending to wrangle the data manually like I usually do. You know: transpose the table, delete some columns, rename a few headers, calculate percentages… the usual grunt work.

But something was different this time. I noticed the “Get & Transform” section in Excel’s Data ribbon. I had clicked it before, but this time I gave it a real shot. I selected “From Text/CSV”, and suddenly I was in a whole new environment: Power Query Editor.


🤯 Wait, What Is Power Query?

For those who haven’t met it yet, Power Query is a powerful tool in Excel (and also in Power BI) that lets you import, clean, transform, and reshape data before it even hits your spreadsheet. It uses a language called M, but you don’t really have to write code—although I quickly did, of course, because I can’t help myself.

In the editor, every transformation step is recorded. You can rename columns, remove rows, change data types, calculate new columns—all through a clean interface. And once you’re done, you just load the result into Excel. Even better: you can refresh it with one click when the source file updates.


🧪 From Curiosity to Control

Back to my IronPort report. I used Power Query to:

  • Transpose the data (turn columns into rows),
  • Remove columns I didn’t need,
  • Rename columns to something meaningful,
  • Convert text values to numbers,
  • Calculate the percentage of each message category relative to the total.

All without touching a single cell in Excel manually. What would have taken 15+ minutes and been error-prone became a repeatable, refreshable process. I even added a “Percent” column that showed something like 53.4%, formatted just the way I wanted.


🤓 The Geeky Bit (Optional)

I quickly opened the Advanced Editor to look at the underlying M code. It was readable! With a bit of trial and error, I started customizing my steps, renaming variables for clarity, and turning a throwaway transformation into a well-documented process.

This was the moment it clicked: Power Query is not just a tool; it’s a pipeline.


💡 Lessons Learned

  • Sometimes it pays to explore what’s already in the software you use every day.
  • Excel is much more powerful than most people realize.
  • Power Query turns tedious cleanup work into something maintainable and even elegant.
  • If you do something in Excel more than once, Power Query is probably the better way.

🎯 What’s Next?

I’m already thinking about integrating this into more of my work. Whether it’s cleaning exported logs, combining reports, or prepping data for dashboards, Power Query is now part of my toolkit.

If you’ve never used it, give it a try. You might accidentally discover your next favorite tool—just like I did.


Have you used Power Query before? Let me know your tips or war stories in the comments!