🚀 Executive Summary
TL;DR: Servers reporting ‘Disk Full’ despite ‘df -h’ showing free space often suffer from inode exhaustion, where the system lacks metadata pointers for new files. The solution involves using ‘df -i’ to diagnose, then either clearing excessive small files, re-architecting applications to use external data stores like Redis for sessions, or, for specific workloads, reformatting the filesystem with a higher inode density.
🎯 Key Takeaways
- Inode exhaustion, not data block usage, is a common cause of ‘Disk Full’ errors when ‘df -h’ indicates free space.
- The ‘df -i’ command is essential for diagnosing inode usage, revealing the percentage of used inodes on a filesystem.
- Permanent solutions typically involve re-architecting applications to store high volumes of small files (like sessions or cache) in dedicated databases (e.g., Redis or Memcached) rather than directly on the filesystem.
Your disk isn’t really full, your inodes are. A senior DevOps engineer explains why your server lies about free space and provides three real-world fixes to solve inode exhaustion for good.
“My Disk Has Space, But It’s Full?” A DevOps Guide to Inode Hell
It was 2:47 AM on a Tuesday. PagerDuty was screaming about `prod-web-04` being ‘Disk Full’. I SSH’d in, heart pounding, ran the reflexive `df -h` and… nothing. 70% free space. My first thought? “Great, the monitoring agent is broken.” My second thought, after the alerts kept firing and the site started throwing 500 errors, was that I was in for a very, very long night. This wasn’t a space problem; it was a metadata nightmare. A problem I see junior engineers trip over all the time.
So, What’s Actually Happening? The Library Analogy
Before we fix anything, you need to understand why this happens. A Linux filesystem such as `ext4` manages two distinct resources:
- Data Blocks: These are the large chunks where the actual content of your files (the text, the image, the video) is stored. This is what `df -h` (the ‘h’ is for ‘human-readable’) shows you.
- Inodes: Think of these as the index cards in a library’s card catalog. Each file and directory has one. It doesn’t contain the data, but it contains all the metadata about it: who owns it, its permissions, and, most importantly, where its data blocks are located on the disk.
Your server is like a library. You might have plenty of empty shelves (data blocks), but if you run out of index cards (inodes), you can’t add any new books. The system sees it can’t create the metadata for a new file, so it throws a “No space left on device” error, even when there are gigabytes of free space. You can see this discrepancy by comparing the output of `df -h` with `df -i` (the ‘i’ is for ‘inodes’).
# This looks fine...
$ df -h /dev/sda1
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 100G 30G 70G 30% /
# ...but this is the real problem. 100% full!
$ df -i /dev/sda1
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sda1 6553600 6553600 0 100% /
The usual culprit? An application generating millions of tiny files—think session files, cache entries, or job queue logs.
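To catch this before the 2 AM page, the `df -i` check can be scripted. The following is a minimal sketch, not a monitoring agent: the `THRESHOLD` default and the `filter_inode_usage` helper name are assumptions made for illustration, and it assumes mount points without spaces.

```shell
#!/bin/sh
# Print every filesystem whose inode usage is at or above a threshold.
# THRESHOLD is a hypothetical default; tune it to your alerting policy.
THRESHOLD="${THRESHOLD:-90}"

# Reads `df -iP` style output on stdin and prints "<mount point> <IUse%>"
# for each filesystem at or above the limit passed as $1.
# Filesystems that report "-" instead of a percentage are skipped.
filter_inode_usage() {
    awk -v limit="$1" 'NR > 1 && $5 ~ /^[0-9]+%$/ {
        pct = $5
        sub(/%/, "", pct)
        if (pct + 0 >= limit) print $6, $5
    }'
}

df -iP | filter_inode_usage "$THRESHOLD"
```

Run from cron or wrapped in a monitoring check, any output at all is the signal to investigate.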
The Fixes: From Duct Tape to a New Foundation
Okay, theory’s over. The site is down and your boss is sending you question marks on Slack. Let’s get you back online.
Solution 1: The “Stop the Bleeding” Triage
This is the emergency fix. We need to find the directory that’s hoarding all the inodes and clear it out. We don’t have time for elegance; we need results now.
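Run this command to count the entries under each top-level directory. Files, directories, and symlinks each consume an inode, so all of them are counted. It can be slow, but it gets the job done.
# Find the top-level directory using the most inodes (largest count is printed last)
# /proc and /sys are virtual filesystems; ignore their counts
for d in /*; do printf '%s\t%s\n' "$(find "$d" -xdev 2>/dev/null | wc -l)" "$d"; done | sort -n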
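Once the loop above points at a top-level directory, the same question has to be asked one level down. On systems with GNU coreutils 8.22 or newer, `du --inodes` can do that drill-down directly. This is a sketch under that assumption; `top_inode_dirs` and the `TARGET` default are hypothetical names for illustration.

```shell
# List the subdirectories that hold the most inodes, largest first.
# $1 = directory to inspect, $2 = number of rows to show (default 10).
# -x stays on one filesystem; -d 1 reports only the first level down.
top_inode_dirs() {
    du --inodes -x -d 1 "$1" 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# TARGET is a hypothetical starting point; use the directory your loop flagged.
top_inode_dirs "${TARGET:-/var}" 10
```

The first row is the total for the directory itself. Repeat the call on the largest subdirectory until you reach the one holding the millions of files.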
Often, you’ll find a path like `/var/tmp/sessions` or `/var/cache/app_cache` with millions of files. Once you’ve identified the culprit, you can carefully delete the files. BE CAREFUL. Don’t just `rm -rf *` a directory you don’t understand.
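Pro Tip: Running `rm -rf *` in a directory with millions of files typically fails outright: the shell expands `*` into one enormous argument list, and the kernel rejects it with “Argument list too long”. A safer and often faster way is to let `find` do the deleting:
# Example: deleting PHP session files (this removes every match, including live sessions)
find /var/lib/php/sessions -type f -name 'sess_*' -delete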
This will get your server breathing again, but you haven’t fixed the underlying problem. You’ve just applied a bandage.
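Before running a bulk delete like the `find ... -delete` command above, a dry run that counts the matches is cheap insurance. The sketch below adds an age filter so recent sessions survive. The `purge_old_files` function and the `CONFIRM` switch are hypothetical, and `-mmin` is a GNU/BSD `find` extension rather than POSIX.

```shell
# Preview, then optionally delete, files older than a cutoff.
# $1 = directory, $2 = filename pattern, $3 = minimum age in minutes.
purge_old_files() {
    dir=$1
    pattern=$2
    max_age_min=$3

    # Dry run: report how many files match without touching them.
    count=$(find "$dir" -xdev -type f -name "$pattern" -mmin +"$max_age_min" | wc -l)
    echo "Would delete $count files from $dir"

    # Delete only when CONFIRM=yes is set explicitly.
    if [ "${CONFIRM:-no}" = "yes" ]; then
        find "$dir" -xdev -type f -name "$pattern" -mmin +"$max_age_min" -delete
    fi
}

# Dry run against session files untouched for more than 24 hours.
purge_old_files /var/lib/php/sessions 'sess_*' 1440
```

Set `CONFIRM=yes` in the shell and run the same call again once the reported count looks sane.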
Solution 2: The “Permanent” Architectural Fix
The triage bought you time. Now, do the real work. Why is your application creating so many files? This is where you put on your architect hat.
In my 2 AM incident, it was a legacy PHP application storing user sessions as individual files on the disk. With a spike in traffic, it was creating tens of thousands of session files an hour. The fix wasn’t to delete them faster; it was to stop creating them in the first place.
The permanent solution was to reconfigure the application to store sessions in a database better suited for this task, like Redis or Memcached. This involved:
- Spinning up a small ElastiCache for Redis instance.
- Changing the session settings in the application’s `php.ini` file (`session.save_handler` and `session.save_path`) to point session handling to Redis.
- Deploying the change and watching the file count in `/var/lib/php/sessions` drop to zero.
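The configuration step in the list above can be sketched as a small drop-in file. This assumes the phpredis extension is installed, which provides the `redis` session handler. The file name and the Redis endpoint are hypothetical placeholders, not the values from the incident.

```shell
# Write a PHP drop-in that moves session storage to Redis.
# ASSUMPTIONS: the phpredis extension is loaded; conf_file and redis_endpoint
# are hypothetical placeholders to replace with your own values.
conf_file="${PHP_CONF_FILE:-./99-redis-sessions.ini}"
redis_endpoint="${REDIS_ENDPOINT:-tcp://redis.example.internal:6379}"

cat > "$conf_file" <<EOF
; Store sessions in Redis instead of one file per session on disk.
session.save_handler = redis
session.save_path = "$redis_endpoint"
EOF

echo "Wrote $conf_file; copy it into PHP's conf.d directory and reload PHP."
```

After the reload, `php -i | grep session.save` should show the new handler and path.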
This is the right way. Investigate the source, find a better tool for the job, and re-architect that small piece of the system. Don’t let your filesystem be a poor man’s database.
Solution 3: The “Nuke and Pave” Option
Sometimes, you can’t change the application, or the server’s purpose is to handle millions of small files (like a mail server). In this case, the filesystem itself was configured incorrectly for the workload from the start.
This is your nuclear option: reformat the partition with a different inode ratio. When you create an `ext4` filesystem, there’s a default ratio of bytes-per-inode (often 16384). That ratio cannot be changed after the filesystem is created, which is why this fix means reformatting. If you know you’ll be storing mostly small files, you can create a new filesystem with a lower ratio, giving you more inodes for the same amount of disk space.
Warning: This is a destructive operation. It will wipe all data on the target partition. You will need a solid backup and a maintenance window. Do not run this on your production database server `prod-db-01` without a plan.
# Example: Creating a new filesystem with one inode per 4096 bytes
# This gives you 4x the default number of inodes.
# DANGER: THIS WILL DESTROY ALL DATA ON /dev/sdb1
mkfs.ext4 -i 4096 /dev/sdb1
This is a last resort. It requires downtime, data migration, and a deep understanding of your storage needs. But for specific, high-volume, small-file workloads, it’s the only long-term solution.
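The bytes-per-inode arithmetic is worth running before you book the maintenance window. A rough estimate, assuming the inode count is simply the filesystem size divided by the ratio (the `estimate_inodes` helper is hypothetical, and `mkfs.ext4` may round the real figure):

```shell
# Estimate the inode count for a filesystem of a given size.
# $1 = size in GiB, $2 = bytes-per-inode ratio (the value passed to mkfs.ext4 -i).
estimate_inodes() {
    size_gib=$1
    bytes_per_inode=$2
    echo $(( size_gib * 1024 * 1024 * 1024 / bytes_per_inode ))
}

estimate_inodes 100 16384   # default ratio: prints 6553600
estimate_inodes 100 4096    # denser ratio:  prints 26214400
```

The first figure matches the 6553600 inodes in the `df -i` output earlier. The extra inodes are not free: at ext4’s usual 256-byte inode size, 26214400 inodes reserve roughly 6.25 GiB of that 100 GiB for inode tables.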
Choosing Your Path
To make it simple, here’s how I decide which path to take.
| Solution | Best For | Downside |
|---|---|---|
| 1. Triage | Immediate emergencies. Getting the system back online RIGHT NOW. | The problem will come back. It’s a temporary patch, not a fix. |
| 2. Architectural Fix | 99% of cases. An application is misbehaving and using the filesystem as a database. | Requires code/configuration changes and a deeper understanding of the application stack. |
| 3. Nuke and Pave | Specialized servers where storing millions of small files is the intended purpose. | High-risk, requires downtime, and is difficult to undo. Massive overkill for most problems. |
So next time you get that “Disk Full” alert and `df -h` tells you everything is fine, don’t doubt your monitoring. Check your inodes with `df -i`. You’ve just stepped into one of the most classic SysAdmin “gotchas” there is. Now you know how to get out of it.
🤖 Frequently Asked Questions
❓ Why does my server say ‘Disk Full’ when ‘df -h’ shows plenty of free space?
Your server is likely experiencing inode exhaustion. This means it has run out of available inodes (metadata pointers for files), even if data blocks are free. Use ‘df -i’ to confirm inode usage.
❓ How does addressing inode exhaustion compare to simply adding more disk space?
Adding disk space is an indirect fix at best. Growing an ext4 filesystem adds inodes only in proportion to the new space, at the bytes-per-inode ratio fixed when the filesystem was created, so a workload of tiny files will exhaust them again. The real solutions are clearing existing files, re-architecting the application to reduce file creation, or reformatting the filesystem with a higher inode density.
❓ What’s a common implementation pitfall when trying to clear files to fix inode exhaustion?
A common pitfall is using `rm -rf *` in a directory with millions of files: the shell expands `*` into an argument list too long for the kernel to accept, so the command typically fails with “Argument list too long”. A safer and often faster method is to use `find` with the `-delete` option, such as `find /var/lib/php/sessions -type f -name 'sess_*' -delete`.