Troubleshooting

Tackling a Mysterious JVM Safepoint Issue: A Journey from Problem to Solution

·1004 words·5 mins
A deep dive into diagnosing and resolving a production JVM issue where applications would freeze during hourly log synchronization tasks. We explore safepoint analysis, JVM log output blocking, asynchronous logging implementation, and WebFlux optimization to achieve a complete solution.
Solving JVM Safepoint Delays: A Journey from EFS Integration to Async Logging

·970 words·5 mins
An in-depth investigation into mysterious JVM safepoint delays after upgrading to Java 17 and implementing centralized log collection with AWS EFS. We discovered how file I/O blocking during log output can freeze entire JVM processes and solved it using async logging and proper WebFlux implementation.
MySQL Optimizer Statistics: Why Your Queries Choose the Wrong Index

·1600 words·8 mins
A deep dive into MySQL’s InnoDB optimizer statistics and how sampling inaccuracies can lead to poor index selection, causing dramatic performance differences between similar queries. Learn practical solutions to prevent slow SQL queries caused by optimizer misjudgments.
A Peculiar Bug Hunt: When Exceptions Lose Their Voice

·1195 words·6 mins
A deep dive into a production issue where exception logs mysteriously disappeared, leading us through Arthas debugging, Log4j2 internals, and the discovery that an exception’s getMessage() method was itself throwing exceptions because of a version incompatibility between Guava and Guice.
A Hidden Production Issue Discovered Through SQL Optimization

·1101 words·6 mins
When our operations team brought us a complex SQL query that was taking forever to execute, we thought it was just a performance issue. Little did we know that the investigation would uncover a deeply hidden character encoding mismatch that had been silently causing full table scans in our production database.