I recently ran into a serious fairness problem with Virtual Threads on JDK 25. The bug was not in our own service but in a library we depend on, which performed heavy CPU work where we expected simple I/O. As a result, a few Virtual Threads monopolized their carrier threads and starved thousands of unrelated Virtual Threads. Overall latency spiked and the whole system became sluggish.
This showed how a small oversight in third-party code can degrade the whole system. I suspect it happens fairly often: a task that is meant to be I/O-bound inadvertently becomes CPU-bound because of slow serialization, fallback logic, or an inefficient retry loop. With platform threads the damage tends to stay contained, because the OS scheduler time-slices them and one busy thread cannot hold a core indefinitely. With Virtual Threads the problem spreads more widely, because many of them share a small pool of carrier threads and a CPU-bound Virtual Thread keeps its carrier until it blocks or finishes.
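To make the effect concrete, here is a minimal sketch of the kind of test I have in mind. It assumes JDK 21 or later and the default scheduler, where the number of carriers equals the number of available processors. The class and method names (`CarrierStarvationDemo`, `burnCpu`, `probeDelayMillis`) are made up for illustration and are not from the library in question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

class CarrierStarvationDemo {

    // Busy-spins without ever blocking, so the virtual thread never unmounts
    // from its carrier. Stands in for the unexpected CPU-heavy library call.
    static void burnCpu(long millis) {
        long deadline = System.nanoTime() + millis * 1_000_000L;
        while (System.nanoTime() < deadline) {
            // intentionally empty
        }
    }

    // Starts `hogs` CPU-bound virtual threads, then one trivial "probe"
    // virtual thread, and returns how long the probe waited before it ran.
    static long probeDelayMillis(int hogs, long hogMillis) throws InterruptedException {
        CountDownLatch hogsRunning = new CountDownLatch(hogs);
        List<Thread> hogThreads = new ArrayList<>();
        for (int i = 0; i < hogs; i++) {
            hogThreads.add(Thread.startVirtualThread(() -> {
                hogsRunning.countDown();
                burnCpu(hogMillis);
            }));
        }
        // Wait until every hog has started running on a carrier.
        hogsRunning.await();

        AtomicLong startedAt = new AtomicLong();
        long submittedAt = System.nanoTime();
        Thread probe = Thread.startVirtualThread(() -> startedAt.set(System.nanoTime()));
        probe.join();
        for (Thread t : hogThreads) {
            t.join();
        }
        return (startedAt.get() - submittedAt) / 1_000_000L;
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumption: default parallelism, i.e. one carrier per available processor.
        int carriers = Runtime.getRuntime().availableProcessors();
        long delay = probeDelayMillis(carriers, 500);
        System.out.println("probe waited about " + delay + " ms before running");
    }
}
```

With one hog per carrier, I would expect the probe to wait for roughly the hog duration even though it does almost no work itself. The exact number depends on the machine and on scheduler settings such as `jdk.virtualThreadScheduler.parallelism`.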
I've noticed that Go addresses this with non-cooperative preemption: the runtime preempts long-running goroutines, which acts as a safety net against unexpected CPU-heavy tasks. Are there any discussions or plans in Project Loom to implement similar non-cooperative preemptive scheduling, so that Virtual Threads stay fair when CPU-heavy work shows up unexpectedly?
1 Answer
It's interesting that you bring this up! Just to clarify: Virtual Thread preemption is already non-cooperative. What you're describing is time-sharing. We haven't enabled it because, unlike in Go, we haven't yet seen real-world cases that demonstrate a need for it. The theory is that time-sharing would help only under specific circumstances, and those circumstances appear to be rare.

Got it! I'll prepare a detailed description and a test case to share with the loom-dev team. It may be useful for future improvements.