🚀 Executive Summary

TL;DR: Docker hosts frequently run out of disk space due to accumulated old, untagged, or unused images. This guide provides solutions ranging from immediate manual cleanup using `docker system prune` to robust, automated scheduling with systemd timers to prevent future alerts.

🎯 Key Takeaways

  • Docker’s layered file system retains old, untagged images (dangling images) and unreferenced base images, causing `/var/lib/docker` to consume excessive disk space.
  • The `docker system prune -a -f` command is a powerful, aggressive tool for immediate cleanup, removing stopped containers, unused networks, and all unused Docker images.
  • Automating cleanup with systemd timers (using a service and timer file) is the preferred method over cron jobs, offering robust logging and `Persistent=true` to ensure cleanup runs even if the machine was off during the scheduled time.
  • For high-churn environments like CI/CD runners, `docker image prune -a -f --filter "until=Xh"` provides time-based aggressive cleanup, but must be used with extreme caution on critical production hosts.

Automated Docker Image Cleanup on a Docker Host: What Do You Do?

Tired of disk space alerts from forgotten Docker images? Learn the practical, in-the-trenches methods for automated Docker image cleanup, from the quick one-liner to a robust systemd timer setup.

So Your Docker Host Ran Out of Space Again? Let’s Talk Cleanup.

I still remember the 2:00 AM PagerDuty alert. “CRITICAL: Disk Usage > 95% on ci-runner-03”. I rolled over, grabbed my laptop, and SSH’d in, my eyes blurry. My first thought was log files, always the log files. But `du -sh *` told a different story. The culprit was `/var/lib/docker`. It was holding hundreds of gigabytes of… stuff. Old, untagged, long-forgotten images from a dozen abandoned feature branches. We’ve all been there. You set up a host, things are running great, and then death by a thousand layers. It’s a rite of passage, but it’s one we can easily automate our way out of.

First, Why Does This Even Happen?

Before we jump to the fix, let’s understand the disease. Every time you build a new version of an image (e.g., `my-app:latest`), Docker doesn’t just overwrite the old one. It creates a new image and simply retags `latest` to point to it. The old image is still there, now “untagged,” just sitting there taking up space. These are often called dangling images. Add to that any base images you pulled for a build but are no longer referenced, and you get a bloated host fast. It’s not a bug; it’s a feature of how Docker’s layered file system works, but it requires good hygiene.
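If you want to see the buildup for yourself before cleaning anything, Docker can report it. Here’s a minimal sketch, assuming a POSIX-style shell and a working `docker` CLI. The function name `report_docker_usage` is something I made up for illustration; the two Docker commands inside it are standard.

```shell
#!/usr/bin/env bash
# report_docker_usage is a hypothetical helper name, not part of Docker.
report_docker_usage() {
  # Totals per type: images, containers, local volumes, build cache.
  docker system df

  # Count images that no tag points at any more (the dangling ones).
  dangling_count="$(docker images --filter "dangling=true" --quiet | wc -l | tr -d ' ')"
  echo "dangling images: ${dangling_count}"
}
```

Call `report_docker_usage` before and after a cleanup to see how much of the bloat was dangling images versus everything else.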

The Solutions: From Quick Fix to Permanent Cure

I’ve seen a lot of approaches over the years. Some are elegant, some are brute force. Here are the three main strategies I recommend, depending on your situation.

1. The Quick & Dirty: docker system prune

This is your first line of defense. It’s the command you run when you get that disk space alert and need to fix it right now. Docker has a built-in utility that’s pretty powerful.

To get rid of just the dangling images (the ones without a tag), you can run:

docker image prune

But let’s be real, you usually want to be more aggressive. The real hero is `system prune`.

# This will remove:
#  - all stopped containers
#  - all networks not used by at least one container
#  - all dangling images
#  - all dangling build cache

docker system prune -f

Pro Tip: The `-f` or `--force` flag is your friend here; it skips the “Are you sure?” prompt. If you want to get rid of all unused images (not just dangling ones), add the `-a` flag. But be careful! This will remove any images you might have pulled for later use that aren’t currently running in a container.

# WARNING: This is much more aggressive.
docker system prune -a -f
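Since the usual trigger is a disk alert, one option is to prune only when the disk is actually filling up. The sketch below assumes Docker’s data lives under `/var/lib/docker` and that a POSIX `df` and `awk` are available. The helper names `disk_usage_pct` and `prune_if_over` and the threshold are my own placeholders, not anything Docker provides.

```shell
#!/usr/bin/env bash
# disk_usage_pct and prune_if_over are hypothetical helper names.

# Print the used-space percentage (digits only) of the filesystem holding $1.
disk_usage_pct() {
  df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# prune_if_over THRESHOLD [PATH]
# Runs the conservative prune only when usage is at or above THRESHOLD percent.
prune_if_over() {
  threshold="$1"
  data_path="${2:-/var/lib/docker}"
  used="$(disk_usage_pct "$data_path")"
  if [ "$used" -ge "$threshold" ]; then
    echo "usage ${used}% >= ${threshold}%, pruning"
    docker system prune -f
  else
    echo "usage ${used}% < ${threshold}%, nothing to do"
  fi
}
```

For example, `prune_if_over 85` leaves the host alone until the filesystem reaches 85% used. I kept it on the conservative `docker system prune -f`; swapping in `-a` is a deliberate choice you should make per host.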

This is a great manual fix, but you’re not here for manual fixes. You’re here to solve this problem for good.

2. The Set-and-Forget: Automation with Cron or Systemd

This is the permanent, grown-up solution. We’re going to run the cleanup command on a schedule. You have two solid choices here: the old-school `cron` or the more modern `systemd` timers.

Using a Cron Job

It’s simple and it works. Edit the crontab for the root user (`sudo crontab -e`) and add a line like this:

# Run Docker prune every Sunday at 3:00 AM
0 3 * * 0 /usr/bin/docker system prune -a -f > /dev/null 2>&1

This is reliable, but it has its downsides. Logging is opaque (we’re dumping it to `/dev/null`), and if the machine is off at 3:00 AM, the job just doesn’t run. That’s where systemd timers come in.
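If you stay with cron, you don’t have to throw the output away. Here’s a rough sketch of a wrapper that appends timestamped output to a log file instead. The log path and the function name `run_logged_prune` are assumptions for illustration, so adjust them to your own conventions.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for cron; the log path below is an assumption.
LOG_FILE="${LOG_FILE:-/var/log/docker-prune.log}"

run_logged_prune() {
  {
    echo "=== $(date '+%Y-%m-%dT%H:%M:%S%z') docker system prune start ==="
    # Capture both stdout and stderr so failures are visible in the log.
    docker system prune -a -f 2>&1
    status=$?
    echo "=== exit status: ${status} ==="
  } >> "$LOG_FILE"
  # Hand Docker's exit status back to the caller.
  return "$status"
}
```

Saved as a script with a final line that calls `run_logged_prune`, the crontab entry would point at that script instead of redirecting to `/dev/null`. It still won’t catch up on runs missed while the machine was off.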

Using a Systemd Timer (My Preferred Method)

This is a bit more setup but far more robust. You create two files: a service file to define the job and a timer file to define the schedule.

1. Create the service file: /etc/systemd/system/docker-prune.service

[Unit]
Description=Prune unused Docker data

[Service]
Type=oneshot
ExecStart=/usr/bin/docker system prune -a -f

2. Create the timer file: /etc/systemd/system/docker-prune.timer

[Unit]
Description=Run docker-prune weekly

[Timer]
OnCalendar=weekly
Persistent=true

[Install]
WantedBy=timers.target

The `OnCalendar=weekly` is a nice shorthand, and `Persistent=true` means if the machine was off when it was supposed to run, it will run as soon as it boots up. That’s a huge win.

Now, enable and start the timer:

sudo systemctl enable docker-prune.timer
sudo systemctl start docker-prune.timer

You can check its status with `systemctl list-timers`. Now you have automated, logged, and reliable cleanup.
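To confirm the timer is really doing its job, a quick check like the one below can help. It’s a sketch that assumes the unit names from the two files above; `check_prune_timer` is just a name I picked for the helper.

```shell
#!/usr/bin/env bash
# check_prune_timer is a hypothetical helper name; the unit names match the files above.
check_prune_timer() {
  # When the timer last fired and when it will fire next.
  systemctl list-timers docker-prune.timer --no-pager

  # The last 20 journal lines from the prune service, including reclaimed space.
  journalctl -u docker-prune.service -n 20 --no-pager
}
```

To trigger a run immediately instead of waiting for the schedule, `sudo systemctl start docker-prune.service` starts the service directly. `systemd-analyze calendar weekly` shows exactly when the `weekly` shorthand fires, and if you edit either unit file later, run `sudo systemctl daemon-reload` so systemd picks up the change.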

3. The ‘Scorched Earth’ Approach for CI Runners

Sometimes, especially on ephemeral CI/CD build agents, you need something even more aggressive. These machines build dozens of images a day, and waiting a week to clean up isn’t an option. For these, I use a more tailored filter-based approach.

The goal here is to remove any unused images older than a cutoff, say 72 hours, regardless of whether they are tagged or not. The `prune` command has a great `--filter` flag for this. Note that the `until` filter compares against when the image was created, not when it was last used.

# WARNING: Use with extreme caution. This is for non-critical, high-churn hosts.
# This removes all unused images that were created more than 72 hours ago.
docker image prune -a -f --filter "until=72h"

Seriously, be careful. Running this on a production host like `prod-db-01` could accidentally remove the exact image your app needs for a restart or rollback if no container references it when the prune runs. This is for build agents that can be rebuilt from scratch without a second thought.

You can stick this command in a cron job or systemd timer that runs every night. It’s the ultimate solution for keeping build runners lean and mean.
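For a nightly job, a small wrapper keeps the cutoff in one place. This is a sketch under a few assumptions: `ci_prune` is a hypothetical name, the 72-hour default simply mirrors the example above, and the `docker builder prune` line is my addition for runners whose build cache grows as fast as their images. Drop it if you rely on that cache.

```shell
#!/usr/bin/env bash
# ci_prune is a hypothetical helper name. Intended for disposable CI runners only.
# Usage: ci_prune [HOURS]   (default 72)
ci_prune() {
  hours="${1:-72}"
  # Reject anything that is not a plain non-negative integer.
  case "$hours" in
    ''|*[!0-9]*) echo "usage: ci_prune [HOURS]" >&2; return 2 ;;
  esac

  # Remove unused images (tagged or not) created more than HOURS ago.
  docker image prune -a -f --filter "until=${hours}h"

  # Addition beyond the original command: also trim old build cache.
  docker builder prune -f --filter "until=${hours}h"
}
```

Calling `ci_prune 24` from the nightly cron job or systemd service tightens the window to a day; calling it with no argument keeps the 72-hour cutoff.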

Choosing Your Weapon

So what’s the right call? Here’s how I break it down.

| Method | Best For… | Risk Level |
|---|---|---|
| 1. Manual prune | Emergency cleanup; one-off development machines. | Low to Medium (with `-a`) |
| 2. Systemd/Cron | Most persistent servers (staging, UAT, even production). It’s the standard, reliable practice. | Low |
| 3. Filtered Prune | High-churn, non-critical hosts like CI/CD runners or ephemeral test environments. | High (if used on the wrong host) |

At the end of the day, running out of disk space due to Docker bloat is an entirely preventable problem. Pick a strategy, automate it, and go back to worrying about more important things. Like why that pipeline is still failing.

Darian Vance

Lead Cloud Architect & DevOps Strategist

With over 12 years in system architecture and automation, Darian specializes in simplifying complex cloud infrastructures. An advocate for open-source solutions, he founded TechResolve to provide engineers with actionable, battle-tested troubleshooting guides and robust software alternatives.


🤖 Frequently Asked Questions

❓ What causes Docker hosts to run out of disk space?

Docker hosts run out of disk space because the layered file system retains old, untagged images (dangling images) and unreferenced base images when new versions are built or pulled, leading to `/var/lib/docker` accumulating excessive data.

❓ How do cron jobs and systemd timers compare for automating Docker cleanup?

Cron jobs are simpler but lack robust logging and won’t execute if the machine is off during the scheduled time. Systemd timers, while requiring more setup, offer better logging, `Persistent=true` for missed runs, and `OnCalendar` for flexible scheduling, making them more reliable and robust.

❓ What is a common implementation pitfall when automating Docker image cleanup?

A common pitfall is using aggressive commands like `docker image prune -a -f --filter "until=Xh"` on production hosts. This can accidentally remove images critical for restarts or rollbacks if no container references them when the prune runs, making it suitable only for high-churn, non-critical environments like CI/CD runners.
