
🔥🗺️ FLAME GRAPHS: SPOT HOTSPOTS FAST (AND FIX THE RIGHT THING)

· devops,observability

🔸 TL;DR ✅

A flame graph turns thousands of profiler samples into a single picture where width = where the resources go. Learn to read one, and you’ll stop guessing at performance bottlenecks. ⚙️🔎


🔸 WHY FLAME GRAPHS CHANGED PROFILING 🚀

Before flame graphs, many profilers gave you a list of functions + percentages. Useful… but missing the story.

▪️ You could see “Function X = 18% CPU”

▪️ But not why it was called, or which parent path triggered it

▪️ Root-cause analysis became a mental puzzle 🧩

Flame graphs (invented by Brendan Gregg) brought the missing context: the full call stack, visually. 🔥

🔸 THE FOUNDATION: CALL STACKS + SAMPLING 🧠

A flame graph is built from statistical profiling (regular sampling):

▪️ A profiler snapshots the current call stack many times per second

▪️ Identical stacks get aggregated

▪️ The more a function appears in samples, the wider its block

Result: a “big picture” map of where your app spends CPU / allocates / waits. 🗺️
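To make the sampling step concrete, here is a toy sampler in Java. It is a minimal sketch under stated assumptions, not how production profilers work: StackSampler, fold(), sample() and hotLoop() are hypothetical names, and the only real API it relies on is the standard Thread.getStackTrace(). It snapshots one thread every few milliseconds and counts identical stacks in the semicolon-separated "folded stacks" text format that Brendan Gregg's flamegraph.pl takes as input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy sampling profiler (hypothetical): snapshots one thread's call stack at a fixed
// interval and aggregates identical stacks, the raw input a flame graph is drawn from.
// Toy only: on HotSpot, Thread.getStackTrace() observes threads at safepoints, so results are biased.
class StackSampler {

    // Folds a stack root-first into "caller;callee;leaf" (the folded-stacks text format).
    static String fold(StackTraceElement[] frames) {
        // getStackTrace() returns the innermost (currently executing) frame first,
        // so walk the array backwards to start from the root caller.
        List<String> names = new ArrayList<>();
        for (int i = frames.length - 1; i >= 0; i--) {
            names.add(frames[i].getClassName() + "." + frames[i].getMethodName());
        }
        return String.join(";", names);
    }

    // Samples `target` every `intervalMillis` until it terminates; returns folded stack -> count.
    static Map<String, Integer> sample(Thread target, long intervalMillis) {
        Map<String, Integer> counts = new TreeMap<>();
        while (target.isAlive()) {
            StackTraceElement[] frames = target.getStackTrace();
            if (frames.length > 0) {
                counts.merge(fold(frames), 1, Integer::sum); // identical stacks aggregate
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop sampling
                break;
            }
        }
        return counts;
    }

    // Hypothetical workload: most samples should land in hotLoop().
    static double hotLoop(int n) {
        double acc = 0;
        for (int i = 1; i <= n; i++) {
            acc += Math.sqrt(i);
        }
        return acc;
    }

    public static void main(String[] args) {
        Thread worker = new Thread(() -> {
            double total = 0;
            for (int round = 0; round < 50; round++) {
                total += hotLoop(2_000_000);
            }
            System.out.println("work done: " + total);
        }, "worker");
        worker.start();
        sample(worker, 2).forEach((stack, n) -> System.out.println(stack + " " + n));
    }
}
```

Each printed line is `stack count`, one line per distinct stack. One caveat worth knowing: because Thread.getStackTrace() can only observe a thread at a safepoint on HotSpot, a sampler like this over-reports safepoint locations; async-profiler, for example, is designed to avoid that bias.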

🔸 HOW TO READ A FLAME GRAPH (THE RULES) 📏

▪️ Each rectangle = a function (stack frame)

▪️ Y-axis = stack depth (bottom = callers, top = deeper calls / leaves)

▪️ X-axis = aggregated samples (⚠️ not a timeline). In the classic tool, siblings are sorted alphabetically, so left-to-right position carries no meaning

▪️ Width = frequency in samples → usually correlates with cost

▪️ Colors often just help separation (unless your tool says otherwise) 🎨
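The width rules boil down to simple counting. In this sketch (FlameWidths is a hypothetical name and the sample counts are made up), every stack prefix is one box, and its width is the number of samples that pass through it. The result is kept in a TreeMap, so siblings come out alphabetically, echoing the point that horizontal position is not time.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: computes flame-graph box widths from folded stacks ("a;b;c" -> sample count).
class FlameWidths {

    // Each stack prefix is one box; its width is the number of samples passing through it.
    static Map<String, Integer> inclusiveWidths(Map<String, Integer> folded) {
        Map<String, Integer> widths = new TreeMap<>(); // sorted keys: order is alphabetical, not time
        for (Map.Entry<String, Integer> e : folded.entrySet()) {
            StringBuilder path = new StringBuilder();
            for (String frame : e.getKey().split(";")) {
                if (path.length() > 0) {
                    path.append(';');
                }
                path.append(frame);
                widths.merge(path.toString(), e.getValue(), Integer::sum);
            }
        }
        return widths;
    }

    static int totalSamples(Map<String, Integer> folded) {
        return folded.values().stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        // Hypothetical profile: 100 samples across three distinct stacks.
        Map<String, Integer> folded = new TreeMap<>();
        folded.put("main;handle;parse", 20);
        folded.put("main;handle;render", 70);
        folded.put("main;gc", 10);

        int total = totalSamples(folded);
        inclusiveWidths(folded).forEach((path, n) ->
                System.out.printf("%-20s %5.1f%%%n", path, 100.0 * n / total));
    }
}
```

With this made-up profile, main is 100% wide, main;handle 90%, main;handle;render 70%, main;handle;parse 20% and main;gc 10%: a parent is always at least as wide as all its children combined.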

🔸 HOW TO SPOT A REAL ISSUE 👀🔥

Look for hotspots: wide “flames” / broad plateaus, especially near the top.

▪️ A wide top block (nothing stacked above it) = that function itself is burning the resource (high self time)

▪️ A wide parent with a wide, tall stack above it = one expensive execution path dominates

▪️ Many thin towers = usually noise / low impact

👉 The goal: find disproportionate width for what should be “normal work”.
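To tell a wide top block from a merely wide parent, compare self samples (how often a function was the leaf, i.e. actually executing) with its inclusive width. This is a minimal sketch with a hypothetical Hotspots class and invented counts; note that it merges a function's self samples across all call paths, so it behaves like a flat profile of leaves.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper: finds "wide top blocks", i.e. functions that were on top of the
// stack in a large share of samples. Input is folded stacks ("a;b;c" -> sample count).
class Hotspots {

    // Self samples per function: only the leaf (last frame) of each stack gets credit.
    static Map<String, Integer> selfSamples(Map<String, Integer> folded) {
        Map<String, Integer> self = new TreeMap<>();
        for (Map.Entry<String, Integer> e : folded.entrySet()) {
            String stack = e.getKey();
            String leaf = stack.substring(stack.lastIndexOf(';') + 1); // works for one-frame stacks too
            self.merge(leaf, e.getValue(), Integer::sum);
        }
        return self;
    }

    // Functions whose self samples are at least `minPercent` of all samples, widest first.
    static List<String> hotspots(Map<String, Integer> folded, int minPercent) {
        long total = folded.values().stream().mapToLong(Integer::longValue).sum();
        return selfSamples(folded).entrySet().stream()
                .filter(e -> 100L * e.getValue() >= (long) minPercent * total) // integer math, no rounding
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical profile: render() is a wide parent, but escapeHtml() is the wide top block.
        Map<String, Integer> folded = new TreeMap<>();
        folded.put("main;handle;render;escapeHtml", 55);
        folded.put("main;handle;render", 15);
        folded.put("main;handle;parse", 20);
        folded.put("main;gc", 10);
        System.out.println(hotspots(folded, 15));
    }
}
```

Here render() is 70% wide (15 self samples plus 55 in escapeHtml()), yet only 15% of samples end in it. The program prints [escapeHtml, parse, render], ranking escapeHtml() first because it is the wide block at the top.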

🔸 PRACTICAL EXAMPLE (CPU HOT PATH) 🧪

Imagine you see a massive plateau like:

▪️ calculateComplexMatrix()

▪️ with wide children: matrixMultiplication(), vectorDotProduct(), allocateTemporary()

▪️ taking ~40% of the total width

That’s your sign: a big chunk of CPU cycles is spent on that path.

Optimization should target that function or its children. ⚙️
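Zooming into the calculateComplexMatrix() plateau amounts to keeping only the stacks that pass through it and re-rooting them there. The sketch below assumes a hypothetical profile of 1,000 samples built to match the example: the frame names main, handleRequest, parseRequest and renderResponse, the ZoomHotPath class, and all counts are invented. Unlike clicking a single box in a flame graph viewer, zoom() merges every occurrence of the frame regardless of caller.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: "zoom" into one frame by keeping only stacks that pass through it
// and re-rooting them there. Input is folded stacks ("a;b;c" -> sample count).
class ZoomHotPath {

    // Invented profile shaped like the calculateComplexMatrix() example (1,000 samples total).
    static Map<String, Integer> exampleProfile() {
        Map<String, Integer> p = new TreeMap<>();
        String hot = "main;handleRequest;calculateComplexMatrix";
        p.put(hot + ";matrixMultiplication", 180);
        p.put(hot + ";vectorDotProduct", 120);
        p.put(hot + ";allocateTemporary", 80);
        p.put(hot, 20); // self time of calculateComplexMatrix itself
        p.put("main;handleRequest;parseRequest", 250);
        p.put("main;handleRequest;renderResponse", 350);
        return p;
    }

    static Map<String, Integer> zoom(Map<String, Integer> folded, String frame) {
        Map<String, Integer> zoomed = new TreeMap<>();
        for (Map.Entry<String, Integer> e : folded.entrySet()) {
            List<String> frames = Arrays.asList(e.getKey().split(";"));
            int at = frames.indexOf(frame); // outermost occurrence, so recursion is counted once
            if (at >= 0) {
                zoomed.merge(String.join(";", frames.subList(at, frames.size())), e.getValue(), Integer::sum);
            }
        }
        return zoomed;
    }

    // Width of each direct child of the zoomed root; samples ending at the root count as "(self)".
    static Map<String, Integer> childWidths(Map<String, Integer> zoomed) {
        Map<String, Integer> children = new TreeMap<>();
        for (Map.Entry<String, Integer> e : zoomed.entrySet()) {
            String[] frames = e.getKey().split(";");
            children.merge(frames.length > 1 ? frames[1] : "(self)", e.getValue(), Integer::sum);
        }
        return children;
    }

    static int sum(Map<String, Integer> m) {
        return m.values().stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        Map<String, Integer> profile = exampleProfile();
        Map<String, Integer> zoomed = zoom(profile, "calculateComplexMatrix");
        System.out.printf("calculateComplexMatrix: %.1f%% of all samples%n", 100.0 * sum(zoomed) / sum(profile));
        childWidths(zoomed).forEach((child, n) ->
                System.out.printf("  %-22s %4d samples%n", child, n));
    }
}
```

With these invented numbers it reports calculateComplexMatrix at 40.0% of all samples and splits those 400 samples into matrixMultiplication 180, vectorDotProduct 120, allocateTemporary 80 and 20 samples of self time, which suggests which child to open in the editor first.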

🔸 BEST PRACTICES ✅

▪️ Profile in production-like conditions (realistic load + data)

▪️ Start broad, then zoom into a hotspot 🔎

▪️ Treat flame graphs as the compass, then inspect code for the exact fix 🧭

🔸 TAKEAWAYS ✅

▪️ Flame graphs turn profiler noise into actionable hotspots

▪️ Width is your #1 signal

▪️ CPU graphs show “work”; off-CPU graphs show “waiting”

▪️ The best optimizations are the ones you can prove with before/after 🔥
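Proving an optimization means comparing two profiles. Here is a simple sketch in the spirit of differential flame graphs: compute each function's inclusive share of samples before and after, then subtract. ProfileDiff and the numbers in main() are hypothetical; in this made-up scenario the fix removed allocateTemporary() from the hot path.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical before/after check on two folded-stack profiles: how much did each
// function's share of samples change, in percentage points?
class ProfileDiff {

    // Percent of samples whose stack contains the function (inclusive share).
    static Map<String, Double> inclusiveShare(Map<String, Integer> folded) {
        int total = folded.values().stream().mapToInt(Integer::intValue).sum();
        Map<String, Integer> hits = new TreeMap<>();
        for (Map.Entry<String, Integer> e : folded.entrySet()) {
            // A Set so a recursive function is counted once per stack.
            Set<String> frames = new HashSet<>(Arrays.asList(e.getKey().split(";")));
            for (String f : frames) {
                hits.merge(f, e.getValue(), Integer::sum);
            }
        }
        Map<String, Double> share = new TreeMap<>();
        hits.forEach((f, n) -> share.put(f, 100.0 * n / total));
        return share;
    }

    // after% - before% for every function seen in either profile (negative = shrank).
    static Map<String, Double> diff(Map<String, Integer> before, Map<String, Integer> after) {
        Map<String, Double> b = inclusiveShare(before);
        Map<String, Double> a = inclusiveShare(after);
        Set<String> names = new TreeSet<>(b.keySet());
        names.addAll(a.keySet());
        Map<String, Double> delta = new TreeMap<>();
        for (String name : names) {
            delta.put(name, a.getOrDefault(name, 0.0) - b.getOrDefault(name, 0.0));
        }
        return delta;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: the "after" run no longer calls allocateTemporary().
        Map<String, Integer> before = new TreeMap<>();
        before.put("main;calculateComplexMatrix;allocateTemporary", 80);
        before.put("main;calculateComplexMatrix;matrixMultiplication", 120);
        before.put("main;other", 200);

        Map<String, Integer> after = new TreeMap<>();
        after.put("main;calculateComplexMatrix;matrixMultiplication", 120);
        after.put("main;other", 200);

        diff(before, after).forEach((f, d) -> System.out.printf("%-24s %+6.1f pp%n", f, d));
    }
}
```

With these invented numbers, allocateTemporary drops by 20.0 percentage points and calculateComplexMatrix by 12.5. Shares are relative, though: other and matrixMultiplication grow (+12.5 and +7.5 pp) only because the total shrank. For a fixed workload, it is worth also comparing absolute sample counts or wall-clock time before claiming a win.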

#performance #flamegraph #java
