Inline Functions in Golang: How the Go Compiler Secretly Carries Your Code's Performance

Inlining is one of those compiler tricks that feels boring until it saves your hot path from death by a thousand tiny function calls. You write a helper because you are a responsible adult. The compiler looks at it, squints, and sometimes says: “This function is small enough. I will just paste the body where it is called.” No ceremony. No keyword. No Slack announcement. Just quiet performance work happening under the hood. ...

June 6, 2026 · 13 min · Sang Tran

Green Tea GC: How Go 1.26 Redesigned Garbage Collection Around Pages

Go 1.26 ships a new garbage collector called Green Tea. Despite the name, this is not a generational GC; it’s a fundamental redesign of the mark-sweep algorithm that works with memory pages instead of individual objects. The result: a 10-40% reduction in GC CPU time. This post covers how it works, why it’s faster, and what the AVX-512 vector acceleration does.

1. The Problem with the Old Mark Phase

The current GC spends ~90% of its time in marking. The remaining ~10% is sweep. At least 35% of marking time is wasted on memory access stalls: the CPU is waiting for data from RAM. ...

March 30, 2026 · 8 min · Sang Tran

Go GC Internals: How the Garbage Collector Actually Works

Go’s garbage collector is a concurrent, tri-color, mark-and-sweep collector. It runs alongside your application without stopping it for long pauses. This post covers the full picture: the algorithm, the four GC phases, write barriers, the pacer, how the GC cooperates with the scheduler, and what GOGC and GOMEMLIMIT actually control. Based on Go 1.24 runtime.

1. Design Constraints

Go’s GC operates under constraints that are different from Java’s or Python’s: ...

June 19, 2025 · 13 min · Sang Tran

Go Scheduler Internals: How Goroutines Actually Get Executed

Every Go developer uses goroutines. Few understand what happens after go func(). This post walks through the Go scheduler’s internals: the GMP model, how goroutines get picked to run, preemption, syscall handling, and the netpoller. Based on Go 1.24 runtime source.

1. Why Go Needs Its Own Scheduler

OS threads are expensive. Each one costs ~1MB of stack memory and context-switching between them requires a trip to the kernel. If every goroutine mapped 1:1 to an OS thread, a program with 100k goroutines would need 100GB of stack space before it did any real work. ...

January 29, 2024 · 10 min · Sang Tran