🚀 Executive Summary

TL;DR: Graviton EC2 instances can exhibit slow single-thread performance because the Linux scheduler sometimes incorrectly assigns bursty, latency-critical tasks to slower Efficiency-cores (E-cores) instead of Performance-cores (P-cores). This issue is resolved by explicitly pinning such processes to P-cores using methods like systemd’s CPUAffinity directive, ensuring consistent high performance.

🎯 Key Takeaways

  • Modern hybrid CPUs, including Graviton processors, utilize both high-speed Performance-cores (P-cores) for demanding tasks and lower-power Efficiency-cores (E-cores) for background processes.
  • The Linux kernel scheduler can misplace bursty, single-threaded workloads onto E-cores, leading to unexpected high latency and performance degradation for critical applications like Redis.
  • Using `lscpu --extended` is essential to diagnose CPU core topology, identifying the clock speed differences between P-cores and E-cores on an EC2 instance.
  • The `systemd` `CPUAffinity` directive is the recommended permanent, infrastructure-as-code friendly solution to consistently pin specific services to P-cores, surviving reboots and service restarts.


Seeing bizarrely slow single-thread performance on your new Graviton EC2 instances? The culprit is likely the CPU scheduler getting confused by performance and efficiency cores. Here’s how to diagnose the issue and implement a fix.

That Time Our Shiny New Graviton EC2s Were Slower Than My Laptop

I still remember the 2 AM PagerDuty alert. We’d just migrated a critical Redis cache workload from an older x86 `m5.large` to a shiny new ARM-based `m7g.large` instance. The migration was supposed to be a huge cost and performance win. Instead, latency was through the roof. My junior engineer, bless his heart, was frantically checking everything: network, disk I/O, memory. All the CloudWatch graphs looked flat and healthy, yet our application was timing out. He was convinced the ARM architecture was a bust. But I’d been burned by something similar before, and it had nothing to do with ARM being “slow.”

So, What’s Actually Going On? The P-Core vs. E-Core Dilemma

This isn’t an “ARM vs x86” problem. This is a “modern CPUs are complicated” problem. The latest Graviton processors (and many modern Intel chips, for that matter) use a hybrid architecture with two types of cores:

  • Performance-cores (P-cores): These are the beasts. They run at a high clock speed and are designed for heavy, single-threaded tasks.
  • Efficiency-cores (E-cores): These are slower, lower-power cores designed for background tasks and improving multi-core throughput without melting the server.

The Linux kernel scheduler is supposed to be smart enough to place demanding tasks on the P-cores and background noise on the E-cores. But sometimes, especially with a bursty, single-threaded process that idles and then wakes up, the scheduler gets it wrong. It sees a “sleeping” process and parks it on a slow E-core to save power. When that process suddenly needs to do real work, it’s stuck in the slow lane, and your application latency goes to the moon.
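You can catch the scheduler doing this in the act. `ps` reports the processor a task last ran on in its PSR column. A minimal sketch, using this shell's own PID as a stand-in for the service (on a real box you'd swap in something like `$(pgrep -n redis-server)`):

```shell
# Print the PID, the CPU the task last ran on (PSR), and the command name.
# $$ (this shell) stands in for the real service PID here.
ps -o pid,psr,comm -p $$
```

If the PSR value keeps landing on an E-core ID while your latency spikes, you've found your smoking gun.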

You can see this topology for yourself. SSH into your instance and run `lscpu --extended`. You’ll see a list of CPUs and their clock speeds, and the difference will be obvious.

# Example output might show a clear difference in MHz
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAX_MHZ    MIN_MHZ
0   0    0      0    0:0:0:0       yes    3200.0000  1000.0000  <-- P-core
1   0    0      1    1:1:1:0       yes    3200.0000  1000.0000  <-- P-core
...
8   0    0      8    8:8:8:1       yes    2400.0000  800.0000   <-- E-core
9   0    0      9    9:9:9:1       yes    2400.0000  800.0000   <-- E-core

Pro Tip: Never assume a vCPU is a vCPU. Before you roll out a new instance family to production, pop a shell on one and run `lscpu --extended`. Understanding the hardware topology you’re running on can save you a world of pain later.
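Eyeballing the table works for a handful of CPUs, but a tiny awk filter groups cores by clock speed for you. The sketch below is fed the sample topology from above (not real instance data); on a live box you'd pipe `lscpu --extended=CPU,MAXMHZ` into the same awk instead:

```shell
# Group CPU IDs by their max MHz so P-cores and E-cores stand out.
# The heredoc-style sample mirrors the example lscpu output above.
lscpu_sample='CPU MAXMHZ
0 3200.0000
1 3200.0000
8 2400.0000
9 2400.0000'
echo "$lscpu_sample" | awk 'NR>1 { cpus[$2] = cpus[$2] " " $1 }
  END { for (mhz in cpus) print mhz " MHz ->" cpus[mhz] }'
```

Two distinct MHz buckets means you're on a hybrid part and should plan your pinning before the pager goes off.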

The Fixes: From Emergency Patch to Permanent Solution

Okay, so we know the problem. Our critical `redis-server` process is getting stuck on a slow core. How do we fix it? Here are three ways, from the quick-and-dirty to the clean-and-permanent.

1. The Quick Fix: `taskset` to the Rescue

It’s 2 AM, the site is slow, and you just need it to work right now. This is where `taskset` comes in. It’s a command-line utility that lets you manually set the CPU affinity of a running process.

First, find the Process ID (PID) of your slow application:

pgrep redis-server
# Let's say it returns 1234

Next, use taskset to “pin” that process to a known P-core (like CPU 0 from our example above).

# Pin PID 1234 to only run on CPU 0
sudo taskset -pc 0 1234

The `-p` flag is for an existing PID, and `-c` specifies the CPU list. Instantly, your process is moved to the fast lane. The incident is resolved. But remember, this is a temporary fix. If the service restarts, it will forget this affinity, and you’ll be right back where you started.
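Before you close the incident, read the affinity back: `taskset -p` with no new CPU list queries the current mask instead of setting it. A quick sketch against this shell's own PID (in anger you'd use the service PID, 1234 in the example above):

```shell
# taskset -pc <pid> with no CPU list reads the current affinity list
# instead of changing it. $$ stands in for the real service PID.
taskset -pc $$
```

Seeing the expected CPU list echoed back is your confirmation that the pin took effect.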

2. The Permanent Fix: Modify the `systemd` Service

Waking up at 2 AM is no fun. Let’s make this permanent and infrastructure-as-code friendly. Most services on modern Linux are managed by `systemd`. We can edit the service’s unit file to tell it which CPUs it’s allowed to run on every single time it starts.

Find the service file for your application (e.g., `redis.service`). You can use systemctl status redis-server to find its location. Then, edit the file (or better, create an override file).

sudo systemctl edit redis-server.service

In the editor, add the CPUAffinity directive under the [Service] section. You can specify a range of your P-cores.

[Service]
CPUAffinity=0-7

Save the file, then reload the systemd daemon and restart your service to apply the change.

sudo systemctl daemon-reload
sudo systemctl restart redis-server.service

Now, your Redis process will only ever run on the fast P-cores you specified, surviving reboots and service restarts. This is the correct, idempotent way to solve the problem.
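To confirm the directive actually reached the running process, read the kernel's view of the allowed-CPU mask from /proc. The sketch below inspects the current shell via /proc/self; on a real box, substitute the service's PID (e.g. `/proc/$(pgrep -n redis-server)/status`). You can also ask systemd itself with `systemctl show redis-server -p CPUAffinity`.

```shell
# The kernel exposes each task's effective CPU mask in /proc/<pid>/status.
# /proc/self (this shell) stands in for the real service PID here.
grep Cpus_allowed_list /proc/self/status
```

If the Cpus_allowed_list line matches the range you set in the unit file, the pin survived the restart.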

3. The ‘Nuclear’ Option: Disable the E-Cores Entirely

Sometimes you have a complex application with many short-lived processes, and you can’t pin all of them. Or maybe you’re just fed up and want to guarantee nothing ever runs on a slow core again. In that case, you can disable the E-cores at the kernel level.

This is a drastic measure. You’re effectively paying for cores you’re not going to use, but for some latency-critical workloads, it’s a valid trade-off. You can do this by setting a kernel boot parameter.

On an Amazon Linux 2023 or similar system with GRUB, you would edit /etc/default/grub and add maxcpus=N to the GRUB_CMDLINE_LINUX variable, where `N` is the number of P-cores you want to keep active.

# In /etc/default/grub
GRUB_CMDLINE_LINUX="... maxcpus=8"

After editing, you need to update your grub configuration and reboot the instance.

sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot

When the machine comes back up, lscpu will only show the first 8 (P-core) CPUs. The E-cores will be completely offline. The problem is gone, but so is some of your instance’s potential multi-core throughput.
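If a full reboot is off the table, there is a reversible middle ground worth knowing: Linux can take individual CPUs offline at runtime through sysfs. A sketch, with the destructive line left commented out since it needs root and, unlike the boot parameter, does not persist across reboots:

```shell
# List which CPUs are currently online (e.g. "0-15").
cat /sys/devices/system/cpu/online
# To take E-core 8 offline at runtime (root required, not persistent):
# echo 0 | sudo tee /sys/devices/system/cpu/cpu8/online
```

Writing `1` back to the same file brings the core online again, which makes this handy for testing the effect before committing to the GRUB change.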

| Solution | Pros | Cons |
| --- | --- | --- |
| 1. `taskset` | Immediate, great for emergencies, no restart required. | Temporary, doesn’t survive restarts, manual process. |
| 2. `systemd` `CPUAffinity` | Permanent, declarative, survives restarts, can be managed with IaC. | Requires a service restart, specific to `systemd`. |
| 3. Kernel boot params | System-wide, guarantees no process can use E-cores. | Requires a full reboot, wastes CPU resources you pay for, reduces total throughput. |

In the end, we went with solution #2 for our Redis server. It’s the cleanest approach that solves the problem for the specific workload without kneecapping the entire server. The lesson here is that modern hardware is powerful but complex. Don’t just trust the defaults; sometimes you need to get your hands dirty and tell the scheduler exactly what you want.

Darian Vance

Lead Cloud Architect & DevOps Strategist

With over 12 years in system architecture and automation, Darian specializes in simplifying complex cloud infrastructures. An advocate for open-source solutions, he founded TechResolve to provide engineers with actionable, battle-tested troubleshooting guides and robust software alternatives.


🤖 Frequently Asked Questions

❓ Why might my Graviton EC2 instance show slow single-thread performance despite healthy CloudWatch metrics?

This often occurs because the Linux scheduler incorrectly places latency-critical, single-threaded tasks onto slower Efficiency-cores (E-cores) instead of the faster Performance-cores (P-cores), leading to unexpected application latency.

❓ How do the different solutions (`taskset`, `systemd CPUAffinity`, kernel boot parameters) compare for fixing this core affinity issue?

`taskset` provides an immediate, temporary fix for a running process. `systemd CPUAffinity` offers a permanent, declarative, service-specific solution. Kernel boot parameters (e.g., `maxcpus`) provide a system-wide fix by disabling E-cores entirely, but this wastes CPU resources and reduces total throughput.

❓ What is a common implementation pitfall when migrating critical workloads to new EC2 instance families with hybrid cores?

A common pitfall is assuming all vCPUs are equal. Always use `lscpu --extended` to verify the underlying hardware topology and distinguish between P-cores and E-cores before deploying latency-sensitive applications to avoid performance surprises.
