🚀 Executive Summary
TL;DR: Linux servers can become unresponsive due to inode exhaustion, even with available disk space, as seen when a rogue process created millions of tiny files. This guide provides an automated Python script using psutil and os.statvfs to monitor inode usage across filesystems and send webhook alerts if a predefined threshold is exceeded, preventing such outages.
🎯 Key Takeaways
- Inode exhaustion is a silent server killer that can render a Linux system unresponsive, even with ample disk space, often caused by the creation of numerous small files.
- The solution involves a Python script utilizing the `psutil` library to enumerate disk partitions and `os.statvfs` to retrieve inode statistics for each mounted filesystem.
- The script calculates the percentage of used inodes and sends a formatted alert message to a configurable webhook URL (e.g., Slack, Teams) if the usage surpasses a defined threshold.
- Automation is achieved by scheduling the Python script to run periodically using a cron job, ensuring continuous monitoring and proactive detection of low inode availability.
- It is recommended to ignore temporary or virtual filesystems like `tmpfs`, `devtmpfs`, and `squashfs` in the monitoring process to reduce alert noise and focus on persistent storage.
Alert on Low Inode Availability on Linux Filesystems
Hey there, Darian here. Let’s talk about a silent server killer: inode exhaustion. A few years back, I had a production server go completely unresponsive. The disk space looked fine, maybe 60% full, but no one could write new files, not even the system itself. Turns out, we had run out of inodes because a rogue process had created millions of tiny session files. It was a painful lesson. We now monitor inode usage as religiously as we monitor disk space, and I’m going to show you how to set up an automated alert so you never have to face that kind of outage.
Prerequisites
- Python 3 installed on the target Linux machine.
- A way to receive alerts. In this guide, I’ll use a webhook URL (for Slack, Teams, etc.), but you could adapt it for email.
- Basic familiarity with running Python scripts and scheduling tasks.
The Guide: Setting Up Your Inode Monitor
Step 1: Project Setup and Dependencies
First, you’ll want to set up a dedicated directory for your script. I’ll skip the standard virtualenv creation steps since you probably have your own workflow for that. The key dependency we need is psutil, a fantastic cross-platform library for retrieving information on running processes and system utilization.
You can install it with pip. The script below also uses the requests library for sending webhook alerts, so install both: pip3 install psutil requests.
We’ll also need a way to handle secrets, like our webhook URL. I always use a simple config.env file for this. Just create a file with that name and add your webhook URL to it.
```
# This is the content for your config.env file
WEBHOOK_URL="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
```
Step 2: The Python Monitoring Script
Now, let’s get to the core logic. Create a Python file, maybe call it check_inodes.py. The script will iterate through all disk partitions, check their inode usage, and fire an alert if any of them cross a defined threshold.
Here’s the complete script. I’ll break down what each part does below.
```python
import json
import os
import socket

import psutil
import requests


def load_config():
    """Loads configuration from a config.env file."""
    config = {}
    try:
        with open('config.env', 'r') as f:
            for line in f:
                if '=' in line:
                    key, value = line.strip().split('=', 1)
                    config[key] = value.strip('"')
    except FileNotFoundError:
        print("Error: config.env file not found.")
        return None
    return config


def send_alert(message):
    """Sends a formatted alert to a webhook."""
    config = load_config()
    if not config or 'WEBHOOK_URL' not in config:
        print("Webhook URL not configured. Cannot send alert.")
        return
    webhook_url = config['WEBHOOK_URL']
    headers = {'Content-Type': 'application/json'}
    payload = {'text': message}
    try:
        response = requests.post(webhook_url, headers=headers,
                                 data=json.dumps(payload), timeout=10)
        response.raise_for_status()
        print("Alert sent successfully!")
    except requests.exceptions.RequestException as e:
        print(f"Error sending alert: {e}")


def check_inode_usage(threshold_percent):
    """
    Checks inode usage for all local filesystems and alerts if usage
    exceeds the threshold.
    """
    print(f"Checking inode usage with a threshold of {threshold_percent}%...")
    alerts_to_send = []
    # These are filesystems I typically ignore in production.
    # You might want to customize this list.
    filesystems_to_ignore = {'tmpfs', 'devtmpfs', 'squashfs'}
    hostname = socket.gethostname()  # identify this server in the alert
    for p in psutil.disk_partitions():
        if p.fstype in filesystems_to_ignore:
            continue
        try:
            # os.statvfs is the 'df -i' equivalent: it exposes inode counts
            statvfs = os.statvfs(p.mountpoint)
            total_inodes = statvfs.f_files
            free_inodes = statvfs.f_ffree
            if total_inodes == 0:
                continue  # Skip if the filesystem doesn't report inode counts
            used_inodes = total_inodes - free_inodes
            used_percent = (used_inodes / total_inodes) * 100
            print(f"Filesystem: {p.mountpoint}, Inode Usage: {used_percent:.2f}%")
            if used_percent > threshold_percent:
                message = (f"🚨 *CRITICAL INODE ALERT* 🚨\n"
                           f"Filesystem `{p.mountpoint}` on server `{hostname}` "
                           f"has reached {used_percent:.2f}% inode usage.")
                alerts_to_send.append(message)
        except OSError as e:
            # This can happen for things like CD-ROM drives with no media
            print(f"Could not check {p.mountpoint}: {e}")
    if alerts_to_send:
        send_alert("\n\n".join(alerts_to_send))
    else:
        print("All filesystems are within the inode usage threshold.")


if __name__ == "__main__":
    # I set my threshold to 85%. Adjust as needed.
    INODE_THRESHOLD = 85
    check_inode_usage(INODE_THRESHOLD)
```
Breaking it down:
- `load_config()`: A simple helper to read our `config.env` file. This keeps our secrets out of the main script.
- `send_alert()`: This function takes a message, formats it into a JSON payload, and sends it to the webhook URL from our config. It's built for Slack, but the JSON payload is generic enough for most services.
- `check_inode_usage()`: This is the heart of the script. It uses `psutil.disk_partitions()` to get a list of all mounted filesystems. Then, for each one, it uses `os.statvfs()` to get the inode statistics. We calculate the percentage of used inodes and compare it against our `threshold_percent`. If it's over, we craft an alert message.
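If you want to experiment with the inode math on its own, here is a minimal stdlib-only sketch of the same calculation, no psutil required (the function name is my own, not from the script above):

```python
import os

def inode_usage_percent(mountpoint):
    """Return the percentage of inodes in use on a filesystem."""
    st = os.statvfs(mountpoint)
    if st.f_files == 0:
        # Some filesystems (e.g. btrfs) report no fixed inode count
        return 0.0
    return (st.f_files - st.f_ffree) / st.f_files * 100

print(f"Root filesystem inode usage: {inode_usage_percent('/'):.2f}%")
```

Compare the printed figure against `df -i /` to convince yourself the numbers line up.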
Pro Tip: Notice the `filesystems_to_ignore` set in the script. In my production setups, I always ignore temporary or virtual filesystems like `tmpfs` and `squashfs` because they don't represent persistent storage and can create a lot of noise. You should review the output of `df -i` on your system and customize this list.
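To see which filesystem types are actually mounted on your box before customizing the ignore list, you can parse `/proc/mounts` (the same data `df` reads on Linux). A quick sketch, with the parsing split into a helper of my own naming:

```python
import os

def fstypes_from_mounts(lines):
    """Collect the set of filesystem types from /proc/mounts-style lines."""
    types = set()
    for line in lines:
        parts = line.split()
        if len(parts) >= 3:
            # Fields are: device, mountpoint, fstype, options, dump, pass
            types.add(parts[2])
    return types

if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):  # Linux only
        with open("/proc/mounts") as f:
            print(sorted(fstypes_from_mounts(f)))
```

Anything in the output that isn't backed by persistent storage is a candidate for the ignore set.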
Step 3: Scheduling the Check
An alert is only useful if it’s automated. The classic way to do this on Linux is with cron. You’ll want to add an entry to your crontab to run this script on a regular schedule. I find that running it once or twice a day is usually sufficient unless you’re on a very high-churn system.
A cron entry to run the script every day at 2 AM would look like this. Note the `cd` into the script's directory first; cron jobs start in the user's home directory, and the script looks for `config.env` relative to its working directory:

```
0 2 * * * cd /path/to/your/script && python3 check_inodes.py
```
Make sure the script is executable and the user running the cron job has the necessary permissions to read filesystem stats and access the network for sending the alert.
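A hardening tweak worth considering: since cron runs jobs from the user's home directory by default, a relative `config.env` path can silently fail. One option is to resolve the config file next to the script itself. A sketch (the helper name is my own):

```python
import os

def config_path(filename="config.env"):
    """Resolve the config file next to this script, regardless of cron's cwd."""
    script_dir = os.path.dirname(os.path.abspath(__file__))
    return os.path.join(script_dir, filename)
```

You would then pass `config_path()` to `open()` inside `load_config()` instead of the bare filename.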
Common Pitfalls
Here are a few places where I’ve stumbled in the past, so you can avoid them:
- Incorrect Webhook URL: I've spent more time than I'd like to admit debugging a script only to realize I had a typo in the webhook URL inside my `config.env` file. Double-check it!
- Firewall Rules: The script needs to make an outbound HTTPS request. If your server has strict egress firewall rules, ensure it's allowed to connect to your alerting service (e.g., hooks.slack.com).
- Permissions: The user running the script via cron needs permission to read the Python file and the `config.env` file. A simple permission denied error can stop the whole thing from working, and you won't know until it's too late.
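To rule out the webhook pitfalls before wiring up cron, it helps to fire a one-off test message by hand. Here's a stdlib-only sketch (urllib instead of requests, and the function name is mine) that builds the same JSON payload the script sends:

```python
import json
import urllib.request

def build_test_alert(webhook_url, message="Test alert from check_inodes"):
    """Build a POST request carrying the same JSON payload the script sends."""
    payload = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To actually send it, uncomment and use your real webhook URL:
# with urllib.request.urlopen(build_test_alert("https://hooks.slack.com/services/YOUR/WEBHOOK/URL"), timeout=10) as resp:
#     print(resp.status)
```

If the manual send succeeds but cron-triggered alerts never arrive, the problem is almost certainly permissions or the working directory, not the URL or firewall.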
Conclusion
And there you have it. This isn’t a massive, complex script, but it provides a huge amount of value and peace of mind. By automating inode monitoring, you’re proactively protecting your systems from a tricky, non-obvious failure mode. It’s a perfect example of how a little bit of DevOps automation can save you from a major headache down the road. Stay vigilant!
🤖 Frequently Asked Questions
❓ What problem does this script solve and how does it prevent server outages?
This script solves the problem of silent server outages caused by inode exhaustion, where a filesystem runs out of available inodes for new files despite having free disk space. It prevents outages by proactively monitoring inode usage on Linux filesystems using `psutil` and `os.statvfs`, and sending an alert via a webhook if usage exceeds a configured threshold, allowing administrators to intervene before a critical failure.
❓ How does this inode monitoring solution compare to simply using the ‘df -i’ command?
While `df -i` provides a manual snapshot of inode usage, this solution automates the check across all relevant filesystems, calculates usage percentages, and sends proactive alerts to a specified webhook. It also allows for ignoring specific filesystem types (`tmpfs`, `devtmpfs`, `squashfs`) to reduce noise, making it a robust, continuous, and unattended monitoring solution compared to a one-time command execution.
❓ What are the common pitfalls to avoid when implementing this inode monitoring script?
Common implementation pitfalls include an incorrect webhook URL in the `config.env` file, restrictive egress firewall rules preventing the script from connecting to the alerting service, and insufficient permissions for the user running the cron job to read the Python script, configuration file, or access filesystem statistics. Thoroughly checking these aspects is crucial for successful deployment and operation.