🔸 TL;DR
▪️ Use startupProbe for apps with slow or spiky boot times.
▪️ It delays liveness checks until your app has finished starting, preventing CrashLoopBackOff storms.
▪️ Keep readinessProbe for traffic gating and livenessProbe for hung-process recovery. Each probe has a different job. 💡
🔸 WHAT IT IS
▪️ A probe that tells K8s: “⏳ Don’t kill me yet—I’m still starting.”
▪️ While startupProbe is running, liveness and readiness checks are disabled; once it succeeds, they take over. If it never succeeds within its failure budget, the kubelet restarts the container.
🔸 WHEN TO USE IT
▪️ Heavy frameworks (e.g., Spring Boot with lots of auto-config) or apps warming caches/JIT on cold start.
▪️ Images that run DB migrations or contact external systems on boot.
▪️ Any service that sometimes takes >30–60s to become healthy.
🔸 HOW IT WORKS (MENTAL MODEL)
▪️ startupProbe = boot watchdog (one-time gate).
▪️ readinessProbe = traffic gate (can flap).
▪️ livenessProbe = hang detector (restart if stuck).
🔸 MINIMAL EXAMPLE (HTTP)
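▪️ A minimal sketch; the image, paths, port, and numbers are illustrative placeholders, not a reference config:
```yaml
# Pod spec excerpt: one container carrying all three probes.
containers:
  - name: app
    image: registry.example.com/app:1.0   # placeholder image
    ports:
      - containerPort: 8080
    startupProbe:                # boot watchdog: pauses the other probes until it succeeds
      httpGet:
        path: /healthz/startup   # placeholder path
        port: 8080
      periodSeconds: 5
      failureThreshold: 30       # 30 × 5s = up to 150s allowed for boot
    readinessProbe:              # traffic gate: may flap without restarting the container
      httpGet:
        path: /healthz/ready     # placeholder path
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
    livenessProbe:               # hang detector: restarts the container when it sticks
      httpGet:
        path: /healthz/live      # placeholder path
        port: 8080
      periodSeconds: 10
      timeoutSeconds: 2
      failureThreshold: 3
```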
🔸 TUNING TIPS
▪️ Set startupProbe.failureThreshold × periodSeconds ≈ worst-case boot time (be generous).
▪️ Prefer fast, dependency-light endpoints (no DB calls).
▪️ For TCP-only apps, use tcpSocket; for scripts, use exec (both sketched after this list).
▪️ Keep liveness stricter (fast check, lower thresholds) than readiness.
▪️ Avoid identical endpoints for all three probes—use dedicated health groups if possible.
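▪️ Non-HTTP variants follow the same shape; a sketch with a placeholder port and a hypothetical script:
```yaml
# TCP variant: the probe succeeds as soon as the port accepts a connection.
startupProbe:
  tcpSocket:
    port: 5432              # placeholder port
  periodSeconds: 5
  failureThreshold: 24      # 24 × 5s = 120s startup budget
---
# exec variant: runs a command in the container; exit code 0 means healthy.
startupProbe:
  exec:
    command: ["/bin/sh", "-c", "/opt/app/check-started.sh"]   # hypothetical script
  periodSeconds: 10
  failureThreshold: 12      # 12 × 10s = 120s startup budget
```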
🔸 COMMON PITFALLS
▪️ ❌ Too-short startup window → unnecessary restarts.
▪️ ❌ Skipping readinessProbe → pods receive traffic before they’re ready to serve.
▪️ ❌ Heavy health endpoints (DB or message-queue calls) → probes add load and fail on dependency blips, triggering needless restarts.
▪️ ❌ Forgetting timeoutSeconds: the default is 1s, so a slow network hop can register as a failed check (see the sketch after this list).
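▪️ One way to size the window and timeout explicitly; the numbers are illustrative:
```yaml
startupProbe:
  httpGet:
    path: /healthz/startup   # placeholder path
    port: 8080
  periodSeconds: 5
  failureThreshold: 36       # 36 × 5s = 180s, sized to worst-case boot, not the average
  timeoutSeconds: 3          # default is 1s; set it explicitly so one slow hop isn't read as "down"
```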
🔸 TAKEAWAYS
▪️ startupProbe prevents premature restarts, stabilizing deployments with slow cold starts.
▪️ Combine startup + readiness + liveness for clear, complementary responsibilities.
▪️ Right-sized thresholds = fewer CrashLoopBackOffs, smoother rollouts, happier on-calls. ✅
#Kubernetes #Containers #CloudNative #DevOps #SRE #PlatformEngineering #Observability #SpringBoot #LivenessProbe #ReadinessProbe #StartupProbe