Why You Should Pay Attention to PSI Instead of Just CPU Usage

Asked By BreezyPenguin42 On

I've noticed an interesting difference between two Linux servers. Server A is running at nearly 100% CPU usage while encoding video, yet it has low latency and is processing requests quickly. Server B, on the other hand, is only at 40% CPU but is experiencing API timeouts and lagging SSH connections. This made me realize that CPU graphs alone can be misleading: Server A may look worse based on CPU percentage, but it's simply busy doing useful work, whereas Server B is under pressure, with tasks waiting for CPU time.

It's common to see alerts that trigger when CPU usage exceeds 80% for over five minutes. But CPU percentage alone doesn't tell you whether tasks are stuck; it merely reflects that cores are busy.

Starting with Linux 4.20, there's a feature called Pressure Stall Information (PSI) that provides better insight into how long tasks are stalled waiting on CPU, memory, or I/O. For instance, PSI can show that, over the last 60 seconds, tasks were stalled 5.23% of the time waiting for CPU.

I've switched my observability project to PSI instead of load average, and it significantly reduced false alarms. I'm curious whether anyone here is using PSI in their production alerts.
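For anyone who wants to try this: on kernels with PSI enabled, the numbers live in `/proc/pressure/cpu`, `/proc/pressure/memory`, and `/proc/pressure/io`. Here's a minimal Python sketch for parsing one of those lines; the sample line below is made up to match the 5.23% figure, and the field layout assumes the standard `some avg10=... avg60=... avg300=... total=...` format:

```python
def parse_psi_line(line):
    """Parse one line of /proc/pressure/* output, e.g.
    'some avg10=0.00 avg60=5.23 avg300=1.98 total=123456',
    into (kind, {metric: value}). 'some' means at least one task
    was stalled; 'full' means all non-idle tasks were stalled.
    """
    kind, *fields = line.split()
    stats = {}
    for field in fields:
        key, value = field.split("=")
        stats[key] = float(value)
    return kind, stats

# On a real system you would read the file instead:
#   with open("/proc/pressure/cpu") as f:
#       for line in f:
#           kind, stats = parse_psi_line(line)
kind, stats = parse_psi_line("some avg10=0.00 avg60=5.23 avg300=1.98 total=123456")
print(kind, stats["avg60"])  # some 5.23
```

`avg60=5.23` is exactly the scenario from the question: over the last minute, tasks were stalled waiting for CPU 5.23% of the time, regardless of what the raw utilization percentage says.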

5 Answers

Answered By ScalingSeeker987 On

Why not just scale based on API latency instead?

LatencyLover88 -

Latency won’t always improve with scaling, though. It’s sometimes better to scale before you notice latency issues popping up.

ThroughputThom -

Good point, some tasks might not even depend on API responses, like video encoding.

DatabaseDweller99 -

Exactly, if the bottleneck is in the database, just adding more servers can worsen latency due to more connections.

Answered By PracticalPam On

I'm glad you shared this! I honestly wasn’t aware of PSI before. It seems really useful for alerting, but it definitely requires a deeper look into the overall system behavior.

Answered By AnalyticalAndy On

This example is a bit odd, because Server A is the one that would actually benefit from extra CPU resources. Just scaling up Server B won't help; that looks more like a software problem. So, in relation to your CPU usage point, the anecdote seems a bit off.

DataDrivenDylan -

That's a fair point! Server A maxing out isn’t great, and just looking at CPU stats doesn’t necessarily mean you should scale without considering the workload and the underlying issues.

Answered By BusyBeeBill On

Yeah, we've started tracking CPU, memory, and disk stall metrics in our monitoring. Our time is stretched thin, though, so fixing the foundational problems isn't happening right now.

Answered By TechieTom123 On

PSI is great to include as an additional signal, but sustained 100% CPU utilization doesn't usually end well. If your system is constantly context switching or has I/O threads stalling, a single extra request can push it over the edge into cascading failures. And if a process is timing out while the box sits at only 40% CPU, that points to issues elsewhere: a single-threaded process pegged at 100% of one core, or slow I/O, could be behind those delays. The critical thing is to assess the overall health of the system from multiple signals rather than relying on CPU alone.
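To make the "multiple signals" idea concrete, here's a hedged sketch of an alert condition that combines utilization with the PSI CPU `avg60` value. The thresholds (95%, 60%, 5.0) are hypothetical examples, not recommendations; tune them for your workload:

```python
def should_alert(cpu_util_pct, psi_cpu_avg60):
    """Alert when tasks are actually stalling, not merely when
    cores are busy. High utilization with negligible stall time
    (the Server A pattern) is fine; stall time above ~5% of the
    window, whether the box is saturated or mostly idle (the
    Server B pattern), is not.
    """
    saturated_and_stalling = cpu_util_pct > 95 and psi_cpu_avg60 > 5.0
    stalling_while_idle = cpu_util_pct < 60 and psi_cpu_avg60 > 5.0
    return saturated_and_stalling or stalling_while_idle

print(should_alert(98, 0.3))   # busy but healthy -> False
print(should_alert(40, 12.0))  # Server B pattern -> True
```

The point of the second condition is exactly the original poster's Server B: low utilization plus high stall time is a stronger trouble signal than either number on its own.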

CuriousCoder56 -

I totally agree! I don’t think anyone is saying to ignore CPU usage altogether, but just focusing on that metric can be misleading. PSI gives a clearer picture of task stalling which is definitely more helpful in diagnosing issues.

SysAdminSally77 -

Exactly! If you're running at capacity and seeing heavy context switching, that's already a sign that your system is in trouble. It just proves that all signals need to be observed together.
