Troubleshooting an SSL Performance Bottleneck Using JFR
At one point, a single instance of our microservice suddenly showed a dramatic spike in CPU usage:
Simultaneously, we noticed an unusual number of database connections being established:
Interestingly, other instances weren’t showing this behavior at all.
Root Cause Investigation
Given the high number of database connections, our initial hypothesis was that the database might be performing poorly. However, when we examined the SQL statistics during this timeframe, the database performance looked perfectly normal:
The hot SQL queries from our microservice during this period were executing without any noticeable delays. So what could be causing this issue? We considered several possibilities: garbage collection overhead, safepoint operations, or prolonged lock acquisition times (for more details, see the Java Monitoring with JFR Complete Guide). To get to the bottom of this, we captured a JFR dump and analyzed safepoint events, GC activity, and Java Monitor Blocked events.
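As a rough illustration of that kind of analysis, the sketch below reads a JFR dump with the JDK's `jdk.jfr.consumer` API and lists monitor-blocked events above a threshold. The class name `MonitorBlockedReport`, the helper names, and the command-line arguments are made up for this example; `jdk.JavaMonitorEnter` is the event type that JDK Mission Control labels "Java Monitor Blocked". It is not the tooling we actually used, just one way to do the same filtering in code.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedFrame;
import jdk.jfr.consumer.RecordedStackTrace;
import jdk.jfr.consumer.RecordingFile;

public class MonitorBlockedReport {

    // JFR event type displayed as "Java Monitor Blocked".
    static final String MONITOR_ENTER = "jdk.JavaMonitorEnter";

    /** True if the event is a monitor-blocked event lasting at least the threshold. */
    static boolean isLongBlock(String eventTypeName, Duration duration, Duration threshold) {
        return MONITOR_ENTER.equals(eventTypeName) && duration.compareTo(threshold) >= 0;
    }

    /** Returns one summary line per long monitor-blocked event in the dump. */
    static List<String> longMonitorBlocks(Path jfrFile, Duration threshold) throws IOException {
        List<String> lines = new ArrayList<>();
        try (RecordingFile recording = new RecordingFile(jfrFile)) {
            while (recording.hasMoreEvents()) {
                RecordedEvent event = recording.readEvent();
                String typeName = event.getEventType().getName();
                if (isLongBlock(typeName, event.getDuration(), threshold)) {
                    lines.add(event.getDuration().toMillis() + " ms blocked at " + topFrame(event));
                }
            }
        }
        return lines;
    }

    /** Formats the innermost stack frame, or a placeholder if none was recorded. */
    static String topFrame(RecordedEvent event) {
        RecordedStackTrace trace = event.getStackTrace();
        if (trace == null || trace.getFrames().isEmpty()) {
            return "<no stack trace>";
        }
        RecordedFrame frame = trace.getFrames().get(0);
        return frame.getMethod().getType().getName() + "." + frame.getMethod().getName();
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the JFR dump; args[1]: threshold in milliseconds (both hypothetical inputs).
        Path dump = Path.of(args[0]);
        Duration threshold = Duration.ofMillis(Long.parseLong(args[1]));
        longMonitorBlocks(dump, threshold).forEach(System.out::println);
    }
}
```

Run it as, for example, `java MonitorBlockedReport recording.jfr 20`, where the file name and the 20 ms threshold are placeholders.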
Our first stop was examining GC behavior. We found that all garbage collections were Young GC events with acceptable pause times:
Next, we looked at safepoint operations. While we did capture some safepoint events, their pause durations weren’t particularly concerning:
Finally, we investigated Java Monitor Blocked events and found numerous instances of prolonged lock waiting:
The stack trace revealed that threads were getting blocked at void sun.security.provider.SecureRandom.engineNextBytes(byte[]). This pointed us to a classic Java performance issue related to random number generation. Looking at the relevant JDK code (these constants are defined in sun.security.provider.SunEntries, which NativePRNG consults for its seed source):
    // name of the *System* property, takes precedence over PROP_RNDSOURCE
    private static final String PROP_EGD = "java.security.egd";
    // name of the *Security* property
    private static final String PROP_RNDSOURCE = "securerandom.source";

    private static final boolean useLegacyDSA =
        Boolean.parseBoolean(GetPropertyAction.privilegedGetProperty
            ("jdk.security.legacyDSAKeyPairGenerator"));

    static final String URL_DEV_RANDOM = "file:/dev/random";
    static final String URL_DEV_URANDOM = "file:/dev/urandom";
The code involves two different sources for random number seeds: “file:/dev/random” and “file:/dev/urandom”. You can specify which one to use by setting the system property java.security.egd (or the security property securerandom.source, which has lower precedence), with “file:/dev/random” being the default choice.
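To check which seed source a given JVM is configured with, a small sketch like the following can help. It follows the precedence stated in the JDK comments (the `java.security.egd` system property first, then the `securerandom.source` security property); the class and method names are invented for this example, and the JDK's internal lookup may differ in detail.

```java
import java.security.Security;

public class SeedSourceCheck {

    /**
     * Returns the configured seed source: the java.security.egd system
     * property if set, otherwise the securerandom.source security property,
     * otherwise an empty string.
     */
    static String configuredSeedSource() {
        String egd = System.getProperty("java.security.egd", "");
        if (!egd.isEmpty()) {
            return egd;
        }
        String source = Security.getProperty("securerandom.source");
        return source == null ? "" : source;
    }

    public static void main(String[] args) {
        System.out.println("configured seed source: " + configuredSeedSource());
    }
}
```

Running it with and without `-Djava.security.egd=...` on the command line shows which setting wins on your installation.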
Understanding Random Number Generation and the Solution
(Two diagrams appeared here: random number generation prior to Linux 4.8, and from Linux 4.8 onwards.)
Here’s the key insight: when the entropy pool runs low, the default “file:/dev/random” will block and wait, while “file:/dev/urandom” continues operating without blocking. For our use case, “file:/dev/urandom” provides sufficient randomness, so we can resolve this by starting the JVM with -Djava.security.egd=file:/dev/./urandom to use urandom and eliminate the blocking behavior.
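A quick way to sanity-check the setting is to time a few SecureRandom calls with and without the flag. The probe below is a minimal sketch with made-up class and method names. It only measures single-threaded latency, so it will not reproduce the lock contention we saw under load, and on machines with plenty of entropy both runs may look the same.

```java
import java.security.SecureRandom;

public class SecureRandomProbe {

    /** Times a single nextBytes call and returns the elapsed milliseconds. */
    static long timeNextBytesMillis(SecureRandom random, int numBytes) {
        byte[] buffer = new byte[numBytes];
        long start = System.nanoTime();
        random.nextBytes(buffer);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        SecureRandom random = new SecureRandom();
        // Prints null when the flag was not passed on the command line.
        System.out.println("java.security.egd = " + System.getProperty("java.security.egd"));
        System.out.println("algorithm = " + random.getAlgorithm());
        for (int i = 0; i < 5; i++) {
            System.out.println("nextBytes(32) took " + timeNextBytesMillis(random, 32) + " ms");
        }
    }
}
```

For comparison, run it once as `java SecureRandomProbe` and once as `java -Djava.security.egd=file:/dev/./urandom SecureRandomProbe`.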