I'm working on a full-stack Java framework called gzb-one and have been benchmarking its performance inside a virtual machine. The results fluctuate significantly from run to run, and I can't seem to control it. For example, a single-threaded stress test gave 700k+ QPS on one run and dropped to around 600k+ on the next. The numbers have been inconsistent across multiple tests.
My main concern is whether these fluctuations are caused by VM jitter. I'm looking for advice on how to stabilize the results during stress testing, particularly when both the server and the benchmarking tool are running in the same VM. Also, if anyone has access to a bare-metal setup, I would love to get a second opinion on the framework's performance data. You can view my current benchmark report on GitHub for more context.
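To put a number on "inconsistent", one option is to record the QPS of several runs and look at the coefficient of variation (standard deviation divided by mean). Below is a minimal sketch; the class name `RunStats` and the sample values in `main` are hypothetical placeholders, not measurements from my report.

```java
class RunStats {
    // Arithmetic mean of the samples.
    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Sample standard deviation (divides by n - 1); needs at least two samples.
    static double sampleStdDev(double[] xs) {
        double m = mean(xs);
        double sq = 0;
        for (double x : xs) sq += (x - m) * (x - m);
        return Math.sqrt(sq / (xs.length - 1));
    }

    // Relative spread of the runs: standard deviation as a fraction of the mean.
    static double coefficientOfVariation(double[] xs) {
        return sampleStdDev(xs) / mean(xs);
    }

    public static void main(String[] args) {
        // Hypothetical QPS values from five runs, for illustration only.
        double[] qps = {700_000, 600_000, 680_000, 640_000, 710_000};
        System.out.printf("mean = %.0f QPS, CV = %.1f%%%n",
                mean(qps), 100 * coefficientOfVariation(qps));
    }
}
```

Tracking this value across configurations (pinned vs. unpinned CPUs, same VM vs. separate machines) would show whether a change actually reduces the spread rather than just shifting one run.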
2 Answers
Have you thought about renting two separate VMs from a cloud provider? This way, you can isolate the benchmarking environment from the rest of your processes, which might help you get more consistent results.
I often see developers use the Java Microbenchmark Harness (JMH) for performance testing. It's worth checking out! Fluctuations can arise from warm-up or caching effects, and GraalVM might behave differently from the traditional HotSpot VM. Make sure to look into JFR events as well. However, if you feel JMH might be too high-level for your needs, I totally get that.
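As a rough illustration of the JFR suggestion, the sketch below uses the JDK's `jdk.jfr` API to record a few event types around a piece of work and count how many of each occurred. It assumes JDK 11 or later and would have to run inside the server JVM; the class name `JfrProbe`, the helper `countEvents`, and the choice of events are only assumptions for the example.

```java
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

class JfrProbe {
    // Runs the workload under a flight recording and returns event counts by type name.
    static Map<String, Integer> countEvents(Runnable workload) {
        try {
            Path file = Files.createTempFile("bench", ".jfr");
            try (Recording recording = new Recording()) {
                // GC pauses, JIT compilations and safepoints can all perturb throughput.
                recording.enable("jdk.GarbageCollection");
                recording.enable("jdk.Compilation");
                recording.enable("jdk.SafepointBegin");
                recording.start();
                workload.run();
                recording.stop();
                recording.dump(file);
            }
            Map<String, Integer> counts = new TreeMap<>();
            for (RecordedEvent event : RecordingFile.readAllEvents(file)) {
                counts.merge(event.getEventType().getName(), 1, Integer::sum);
            }
            Files.deleteIfExists(file);
            return counts;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder workload that allocates garbage; replace with the code under test.
        Map<String, Integer> counts = countEvents(() -> {
            long sink = 0;
            for (int i = 0; i < 200_000; i++) {
                sink += new byte[1024].length;
            }
            System.out.println("allocated bytes: " + sink);
        });
        System.out.println(counts);
    }
}
```

If the slow runs show noticeably more compilation or GC events than the fast ones, the fluctuation probably comes from inside the JVM rather than from the VM host.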

Thanks for the suggestion!
I did consider JMH, but since I’m benchmarking the whole network stack, I prefer wrk's Pipeline mode for a more genuine ‘black box’ throughput view. I suspect the issue might be more about JVM CPU scheduling rather than warm-up since I've already done a decent amount of warming up.
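One way to check the scheduling suspicion independently of the framework is a small stall probe: spin on `System.nanoTime()` and record the largest gap between two consecutive reads. A thread that does nothing but read the clock should see tiny gaps, so a large one suggests the thread was descheduled (by the guest OS or the hypervisor) or paused by the JVM. This is only a sketch in the spirit of tools like jHiccup; the class name `JitterProbe` and the one-second window are arbitrary choices.

```java
class JitterProbe {
    // Spins for durationMillis and returns the largest gap (in nanoseconds)
    // observed between two consecutive clock reads.
    static long maxGapNanos(long durationMillis) {
        long start = System.nanoTime();
        long budget = durationMillis * 1_000_000L;
        long prev = start;
        long maxGap = 0;
        while (prev - start < budget) {
            long now = System.nanoTime();
            maxGap = Math.max(maxGap, now - prev);
            prev = now;
        }
        return maxGap;
    }

    public static void main(String[] args) {
        // Repeat a few times; the first rounds also include JIT warm-up of the loop itself.
        for (int i = 0; i < 5; i++) {
            System.out.printf("round %d: max stall = %d us%n", i, maxGapNanos(1000) / 1000);
        }
    }
}
```

Running this alongside the benchmark in the same VM, and again on an idle VM, gives a rough picture of how long a thread can be stalled. Stalls in the millisecond range during the slow runs would support the scheduling explanation.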