🚀 Executive Summary

TL;DR: Kubernetes storage issues, particularly full Persistent Volume Claims (PVCs), can lead to critical outages. The primary solutions involve leveraging StorageClass volume expansion capabilities via `kubectl patch` for quick fixes or proactively configuring expandable StorageClasses to prevent future emergencies.

🎯 Key Takeaways

  • Kubernetes storage abstracts physical disks using PersistentVolumeClaims (PVCs) as requests, PersistentVolumes (PVs) as fulfillment, and StorageClasses with CSI drivers for provisioning.
  • Live PVC expansion is achievable by patching the PVC to request more space, but only if the associated StorageClass has `allowVolumeExpansion: true`.
  • Proactive storage management involves defining StorageClasses with `allowVolumeExpansion: true` from the outset, simplifying future capacity adjustments.
  • For StorageClasses that do not support expansion, a manual data migration strategy is required, involving scaling down applications, creating a new larger PVC, copying data with a helper pod, and redeploying.

Best way to manage storage in your own k8s?

Tired of Kubernetes storage alerts at 3 AM? Learn the real-world, in-the-trenches strategies for managing and expanding Persistent Volume Claims (PVCs) without losing data or your mind.

So, Your Kubernetes Pod Ran Out of Disk Space. Now What?

I remember it like it was yesterday. 2:17 AM. My phone buzzes with a PagerDuty alert that makes my blood run cold: ‘PrometheusDown’. Then another: ‘GrafanaUnreachable’. Our entire observability stack for the main production cluster was blind, right in the middle of a critical batch processing window. The cause? A tiny, forgotten log file on our Prometheus server had finally tipped the scales, filling its 100GB Persistent Volume Claim (PVC) to 100%. That night, fueled by lukewarm coffee, I learned a lesson that isn’t always clear in the official docs: managing storage in Kubernetes isn’t a “set it and forget it” affair. It’s an active, critical part of keeping the lights on.

The “Why”: Understanding the K8s Storage Disconnect

Before we dive into the fixes, you need to understand why this happens. Kubernetes is brilliant at abstracting infrastructure, and storage is no exception. Here’s the core concept you’re fighting against:

  • A PersistentVolumeClaim (PVC) is a request for storage by a user (or pod). You say, “I need 50Gi of fast storage.”
  • A PersistentVolume (PV) is the fulfillment of that request. It’s a piece of storage in the cluster that has been provisioned by an administrator or dynamically by a StorageClass.
  • The StorageClass and its underlying CSI (Container Storage Interface) driver are the magic that connects the request (PVC) to a real disk (like an AWS EBS volume or a GCE Persistent Disk).
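If you want to see that chain for one of your own claims, here’s a small shell sketch that walks from a PVC to its PV and StorageClass using `kubectl`’s JSONPath output. The `show_storage_chain` function is my own invention, not a kubectl feature. It only reads standard fields (`spec.volumeName`, `spec.storageClassName`, `allowVolumeExpansion`).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of kubectl): walk a PVC to its PV and StorageClass.
# Usage: show_storage_chain <pvc-name> [namespace]
show_storage_chain() {
  local pvc="$1" ns="${2:-default}"
  local pv sc expand=""
  # The PVC spec records the bound PV (volumeName) and the StorageClass used.
  pv=$(kubectl get pvc "$pvc" -n "$ns" -o jsonpath='{.spec.volumeName}')
  sc=$(kubectl get pvc "$pvc" -n "$ns" -o jsonpath='{.spec.storageClassName}')
  if [ -n "$sc" ]; then
    # allowVolumeExpansion is omitted when unset, which means "false".
    expand=$(kubectl get storageclass "$sc" -o jsonpath='{.allowVolumeExpansion}')
  fi
  printf 'PVC %s/%s -> PV %s -> StorageClass %s (allowVolumeExpansion=%s)\n' \
    "$ns" "$pvc" "${pv:-<unbound>}" "${sc:-<none>}" "${expand:-false}"
}
```

Running `show_storage_chain prod-postgres-pvc prod` (the `prod` namespace is an assumption) prints one line ending in the class’s `allowVolumeExpansion` value, which is exactly what a live expansion hinges on.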

The problem is that once that link is made and the pod is running, Kubernetes mostly considers its job done. Your application keeps writing data, the disk fills up, and writes start failing with a “No space left on device” error, which often takes the pod down with it. K8s never grows a volume on its own: even when expansion is allowed, someone (or some automation you’ve built) has to ask for the bigger size. That’s where we, the engineers, come in.
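Since nothing will grow the disk for you, it pays to know how full it is before the pager does. A minimal sketch, assuming the container image ships a `df` binary (distroless images usually don’t). `volume_usage_pct` and the pod, path, and namespace in the example are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how full (in percent) the filesystem at <mount-path>
# is, as seen from inside the pod. Assumes the image ships a POSIX `df`.
# Usage: volume_usage_pct <pod> <mount-path> [namespace]
volume_usage_pct() {
  local pod="$1" path="$2" ns="${3:-default}"
  # `df -P` prints a header line and one data line; field 5 is the use
  # percentage (e.g. "88%"), so strip the % sign and print the number.
  kubectl exec "$pod" -n "$ns" -- df -P "$path" \
    | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example (pod name, path, and namespace are made up):
# pct=$(volume_usage_pct prometheus-0 /prometheus monitoring)
# [ "$pct" -ge 85 ] && echo "WARNING: /prometheus is ${pct}% full"
```

For real alerting you’d rather not poll pods by hand: the kubelet exports `kubelet_volume_stats_used_bytes` and `kubelet_volume_stats_capacity_bytes`, which Prometheus can scrape and alert on.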

The Fixes: From Duct Tape to a New Engine

Depending on your setup and how much time you have, there are a few ways to tackle this. I’ve used all three in different situations.

Solution 1: The Quick and Dirty (Live Expansion)

This is your go-to when production is down and management is breathing down your neck. The goal is to expand the existing PVC without taking the application offline for long. This only works if your StorageClass supports volume expansion.

First, check if your StorageClass allows it:

kubectl get sc your-storage-class-name -o yaml

Look for the line `allowVolumeExpansion: true`. If it’s there, you’re in luck. If it’s missing or set to `false`, you can try patching the StorageClass to enable it (that only helps if the underlying CSI driver actually supports expansion); if that’s not an option, jump to Solution 2 or 3.
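Eyeballing YAML at 2 AM is error-prone, so here’s a hedged sketch that turns the check into a yes/no answer. `sc_allows_expansion` is a hypothetical helper; the commented-out patch shows how you might flip the flag on an existing class.

```shell
#!/usr/bin/env bash
# Hypothetical predicate: succeeds only if the StorageClass has
# allowVolumeExpansion: true (the field is omitted entirely when unset).
sc_allows_expansion() {
  [ "$(kubectl get storageclass "$1" -o jsonpath='{.allowVolumeExpansion}')" = "true" ]
}

# Example: flip the flag on if it's off. This only changes the API object;
# the CSI driver behind the class still has to support expansion.
# if ! sc_allows_expansion your-storage-class-name; then
#   kubectl patch storageclass your-storage-class-name \
#     -p '{"allowVolumeExpansion": true}'
# fi
```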

If expansion is allowed, you can simply patch the PVC to request more space. Let’s say we want to resize `prod-postgres-pvc` from 100Gi to 150Gi:

kubectl patch pvc prod-postgres-pvc -p '{"spec":{"resources":{"requests":{"storage":"150Gi"}}}}'
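The raw patch is easy to fat-finger (a stray `GB` instead of `Gi`, or the wrong namespace). Here’s a small wrapper sketch, assuming you only ever resize in `Mi`/`Gi`/`Ti` units. `resize_pvc` is a made-up name, and Kubernetes itself accepts more quantity formats than this check allows.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around `kubectl patch pvc`: validates the size string
# before sending it and takes an optional namespace.
# Usage: resize_pvc <pvc-name> <new-size> [namespace]
resize_pvc() {
  local pvc="$1" size="$2" ns="${3:-default}"
  # Deliberately strict (an assumption): only whole numbers with Mi/Gi/Ti.
  if ! printf '%s\n' "$size" | grep -Eq '^[0-9]+(Mi|Gi|Ti)$'; then
    echo "resize_pvc: unsupported size '$size' (expected e.g. 150Gi)" >&2
    return 1
  fi
  kubectl patch pvc "$pvc" -n "$ns" \
    -p "{\"spec\":{\"resources\":{\"requests\":{\"storage\":\"$size\"}}}}"
}
```

For example, `resize_pvc prod-postgres-pvc 150Gi prod` (namespace assumed) sends the same patch as above. Asking for less than the current size will still be rejected by the API server, since PVCs can’t shrink.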

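After a few moments, K8s works with your cloud provider’s CSI driver to resize the underlying disk, and the kubelet then grows the filesystem, either while the pod keeps running or the next time it starts, depending on the driver. This is the ideal scenario. Note that it only works in one direction: Kubernetes won’t let you shrink a PVC.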
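Rather than refreshing `kubectl get pvc` by hand, you can poll until the claim reports its new size. This is a sketch: `wait_for_pvc_capacity` is hypothetical, it relies on `status.capacity.storage` being updated once the resize (including the filesystem step) completes, and it does an exact string match, so a provider that rounds sizes up could make it time out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a PVC until status.capacity.storage matches the
# size you asked for, or give up after a timeout (in seconds).
# Usage: wait_for_pvc_capacity <pvc-name> <size> [namespace] [timeout-seconds]
wait_for_pvc_capacity() {
  local pvc="$1" want="$2" ns="${3:-default}" timeout="${4:-300}"
  local waited=0 have=""
  while [ "$waited" -lt "$timeout" ]; do
    # The PVC's reported capacity only changes once the resize has finished.
    have=$(kubectl get pvc "$pvc" -n "$ns" -o jsonpath='{.status.capacity.storage}')
    if [ "$have" = "$want" ]; then
      echo "$pvc now reports $have"
      return 0
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "timed out: $pvc reports ${have:-unknown}, wanted $want" >&2
  return 1
}

# Example: wait_for_pvc_capacity prod-postgres-pvc 150Gi default 600
```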

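Pro Tip: Sometimes, even with CSI expansion, the filesystem the container sees doesn’t grow. First check the PVC’s status: a `FileSystemResizePending` condition usually means the node-side resize hasn’t run yet, and restarting the pod typically finishes it. If that still doesn’t do it, you might have to run a command like `resize2fs /dev/your-device` yourself to make the OS aware of the new space. That needs access to the block device, which most unprivileged containers don’t have, so expect to do it from the node or a privileged pod. It’s a pain, but better than being down.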
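Before reaching for `resize2fs`, it’s worth checking whether Kubernetes is simply waiting on the node-side step. A hedged sketch: `resize_pending` is a hypothetical helper that looks for the `FileSystemResizePending` condition on the claim; the pod name and mount path in the commented example are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the PVC carries the FileSystemResizePending
# condition, i.e. the disk has grown but the node-side filesystem resize hasn't run.
# Usage: resize_pending <pvc-name> [namespace]
resize_pending() {
  local pvc="$1" ns="${2:-default}" types
  # Condition types come back space-separated from this JSONPath expression.
  types=$(kubectl get pvc "$pvc" -n "$ns" -o jsonpath='{.status.conditions[*].type}')
  case " $types " in
    *" FileSystemResizePending "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (pod name and mount path are placeholders):
# if resize_pending prod-postgres-pvc; then
#   kubectl delete pod <your-postgres-pod>   # its controller recreates it
# fi
# kubectl exec <your-postgres-pod> -- df -h <mount-path>
```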

Solution 2: The Right Way (The Proactive Setup)

The best way to fix a problem is to prevent it. This involves setting up your StorageClasses correctly from the beginning. A proper StorageClass is the foundation of sane storage management.

Here’s an example of a good `StorageClass` manifest for AWS using the EBS CSI driver. The key is that `allowVolumeExpansion: true` line.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3-expandable
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
  type: gp3
  fsType: ext4

By making this your default StorageClass, any new PVC that doesn’t name a StorageClass explicitly will be expandable from day one. This turns a late-night emergency into a simple `kubectl patch` command (as seen in Solution 1). This is the setup we have at TechResolve now, and it’s saved us countless headaches.
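Making a class the default is done with the `storageclass.kubernetes.io/is-default-class` annotation. Here’s a sketch that sets it on the new class and clears it on the old one; `set_default_sc` is hypothetical, and `gp2` in the example is just an assumed name for your previous default.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mark <new> as the cluster's default StorageClass and,
# if given, mark <old> as non-default so you don't end up with two defaults.
# Usage: set_default_sc <new-default> [old-default]
set_default_sc() {
  local new="$1" old="${2:-}"
  if [ -n "$old" ]; then
    kubectl patch storageclass "$old" -p \
      '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
  fi
  kubectl patch storageclass "$new" -p \
    '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
}

# Example (gp2 stands in for whatever your current default is):
# set_default_sc ebs-gp3-expandable gp2
```

Afterwards, `kubectl get storageclass` should show `(default)` next to the new class only.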

Solution 3: The ‘Nuclear’ Option (Data Migration)

What if your StorageClass doesn’t support expansion and you can’t change it? This happens with older setups or certain types of storage. In this case, you have to perform a manual data migration. It’s risky, requires downtime, and you should be extremely careful.

Here’s a high-level plan for migrating data from `old-pvc` to `new-larger-pvc`:

1. Scale Down: Scale your application deployment to 0 replicas to ensure no new data is written to the old volume: `kubectl scale deployment/my-app --replicas=0`
2. Create New PVC: Create a new, larger PVC manifest (e.g., `new-pvc.yaml`) and apply it: `kubectl apply -f new-pvc.yaml`
3. Deploy a Helper Pod: Create a temporary “helper” pod (a simple Alpine or Ubuntu image) that mounts both the old PVC and the new PVC at different paths (e.g., `/data-old` and `/data-new`).
4. Copy the Data: `exec` into the helper pod and use a reliable copy tool like `rsync` to move everything.

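kubectl exec -it helper-pod -- /bin/sh
# Inside the pod. Alpine doesn't ship rsync, so install it first
# (on an Ubuntu image, use apt-get instead):
apk add --no-cache rsync
# Trailing slashes copy the *contents* of /data-old into /data-new
rsync -avh --progress /data-old/ /data-new/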
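5. Update & Redeploy: Delete the helper pod first, since a ReadWriteOnce volume it still has mounted can stop your application’s pod from starting on another node. Then update your application’s deployment YAML to reference `new-larger-pvc` instead of the old claim, apply the change, and scale it back up to its original replica count: `kubectl scale deployment/my-app --replicas=1`
6. Cleanup: Once you’ve thoroughly verified the application is working correctly with the new volume, you can safely delete the old PVC and its underlying PV (with a `Delete` reclaim policy, removing the PVC removes the PV and the disk as well).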
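Steps 2 and 3 both boil down to a manifest, so here’s a sketch that prints both for piping into `kubectl apply -f -`. Everything in it is an assumption to adapt: the function names, the `alpine:3` image, the one-day `sleep` that keeps the pod alive for `exec`, and the `ReadWriteOnce` access mode. It also assumes both volumes can attach to the same node, which for zonal disks like EBS means the same availability zone.

```shell
#!/usr/bin/env bash
# Hypothetical manifest generators for the migration steps.
# Pipe the output into `kubectl apply -f -`.

# Step 2: the new, larger claim.
# Usage: new_pvc_manifest <name> <size> <storage-class>
new_pvc_manifest() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: $3
  resources:
    requests:
      storage: $2
EOF
}

# Step 3: a throwaway pod that mounts the old claim read-only at /data-old
# and the new claim at /data-new.
# Usage: helper_pod_manifest <old-pvc> <new-pvc>
helper_pod_manifest() {
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: helper-pod
spec:
  restartPolicy: Never
  containers:
    - name: copy
      image: alpine:3
      # Sleep for a day so there is time to exec in and run rsync.
      command: ["sleep", "86400"]
      volumeMounts:
        - name: old
          mountPath: /data-old
          readOnly: true
        - name: new
          mountPath: /data-new
  volumes:
    - name: old
      persistentVolumeClaim:
        claimName: $1
    - name: new
      persistentVolumeClaim:
        claimName: $2
EOF
}
```

For example, `new_pvc_manifest new-larger-pvc 200Gi ebs-gp3-expandable | kubectl apply -f -` followed by `helper_pod_manifest old-pvc new-larger-pvc | kubectl apply -f -` (size and class are illustrative). Putting the new claim on an expandable class means this should be the last migration that volume ever needs.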

Warning: I call this the nuclear option for a reason. There is a real risk of data loss or corruption if you’re not careful. Always try to snapshot the original volume before you start, if your provider supports it. Measure twice, cut once.
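If your cluster has the CSI snapshot CRDs and snapshot controller installed, that snapshot is just another manifest. A sketch, assuming a `VolumeSnapshotClass` for your driver already exists; `snapshot_manifest` and the class name in the example are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical generator for a pre-migration VolumeSnapshot of a claim.
# Assumes the snapshot.storage.k8s.io/v1 CRDs, the snapshot controller, and
# a VolumeSnapshotClass for your CSI driver are already installed.
# Usage: snapshot_manifest <pvc-name> <volume-snapshot-class> | kubectl apply -f -
snapshot_manifest() {
  cat <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: $1-pre-migration
spec:
  volumeSnapshotClassName: $2
  source:
    persistentVolumeClaimName: $1
EOF
}

# Example (class name is an assumption):
# snapshot_manifest old-pvc csi-aws-vsc | kubectl apply -f -
```

Wait until `kubectl get volumesnapshot old-pvc-pre-migration -o jsonpath='{.status.readyToUse}'` prints `true` before you start the copy.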

At the end of the day, Kubernetes gives you the tools, but it’s on us to build the right processes around them. Start with Solution 2, use Solution 1 for emergencies, and keep Solution 3 in your back pocket for that one legacy system that just won’t cooperate. Good luck out there.

Darian Vance - Lead Cloud Architect


Lead Cloud Architect & DevOps Strategist

With over 12 years in system architecture and automation, Darian specializes in simplifying complex cloud infrastructures. An advocate for open-source solutions, he founded TechResolve to provide engineers with actionable, battle-tested troubleshooting guides and robust software alternatives.


🤖 Frequently Asked Questions

❓ How do I expand an existing Kubernetes Persistent Volume Claim (PVC)?

To expand a PVC, first confirm its StorageClass has `allowVolumeExpansion: true`. Then patch the PVC’s requested storage size, for example `kubectl patch pvc prod-postgres-pvc -p '{"spec":{"resources":{"requests":{"storage":"150Gi"}}}}'`. The CSI driver then resizes the underlying volume.

❓ What are the different strategies for managing and expanding Kubernetes storage?

The strategies include: 1) Quick live expansion by patching PVCs if the StorageClass allows it. 2) Proactive setup by configuring StorageClasses with `allowVolumeExpansion: true` for all new PVCs. 3) The ‘nuclear’ option of manual data migration, involving scaling down, creating a new larger PVC, copying data via a helper pod, and redeploying the application.

❓ What should I do if the filesystem inside a Kubernetes pod doesn’t recognize expanded PVC space?

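After a PVC expansion, the filesystem within the container might not immediately reflect the new space. Check the PVC for a `FileSystemResizePending` condition and restart the pod if it’s there. If the filesystem still hasn’t grown, you may need to run a command like `resize2fs /dev/your-device` (or the equivalent for other filesystems, such as `xfs_growfs` for XFS) to make the operating system aware of the increased volume size; this typically requires privileged access to the block device.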
