Accepted proposal: a goroutine leak profile in the Go standard library
Go 1.27 is getting a goroutine leak detector in runtime/pprof. The proposal was accepted in April.
A few common goroutine leaks
A goroutine leaks when it blocks on a channel or lock that nothing will ever release, so it lingers for the life of the process. I’ve been using uber-go/goleak to catch them in tests.
One is an early return that strands a sender, which I covered in Early return and goroutine leak. It looks like this:
func run(tasks []func() error) error {
	errs := make(chan error) // unbuffered
	var wg sync.WaitGroup
	for _, task := range tasks {
		wg.Go(func() { errs <- task() }) // (1)
	}
	for range tasks {
		if err := <-errs; err != nil {
			return err // (2)
		}
	}
	wg.Wait()
	return nil
}
Here:
- (1) each task sends its result on the unbuffered channel through wg.Go
- (2) the first error returns early, so the tasks still queued to send block forever
Giving errs a buffer big enough for every task, or draining all the results before
returning, keeps the sends from blocking.
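Here’s a minimal sketch of the buffer fix. runBuffered and errBoom are hypothetical names for illustration, and it uses wg.Add/wg.Done with the task passed as an argument instead of wg.Go, so it also builds on toolchains older than Go 1.25. The only real change from run is the channel capacity.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errBoom is a placeholder error for the demo.
var errBoom = errors.New("boom")

// runBuffered is run with the leak fixed: errs has one slot per task, so every
// send completes even after an early return, and every goroutine can exit.
func runBuffered(tasks []func() error) error {
	errs := make(chan error, len(tasks)) // buffered: room for every result
	var wg sync.WaitGroup
	for _, task := range tasks {
		wg.Add(1)
		go func(task func() error) {
			defer wg.Done()
			errs <- task() // never blocks, the buffer always has a free slot
		}(task)
	}
	for range tasks {
		if err := <-errs; err != nil {
			return err // early return is safe now: the other senders still finish
		}
	}
	wg.Wait()
	return nil
}

func main() {
	ok := func() error { return nil }
	fail := func() error { return errBoom }
	fmt.Println(runBuffered([]func() error{ok, fail, fail})) // prints "boom"
}
```

Draining the rest of errs before returning would work too, but it makes the caller wait for every task even after it already has its answer.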
A related leak shows up when you send a request to several replicas and keep only the first answer:
func replicate(replicas []func() string) string {
	results := make(chan string) // unbuffered
	for _, r := range replicas {
		go func() { results <- r() }() // (1)
	}
	return <-results // (2)
}
Here:
- (1) every replica races to send its answer on the unbuffered channel
- (2) the first answer returns, and the slower replicas block forever on their sends
Same as before, a buffer sized for every replica lets the slower ones send and exit.
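A sketch of that fix, with replicateBuffered as a hypothetical name. The empty-slice guard is my addition: with no replicas, nothing would ever send and the receive would block forever.

```go
package main

import (
	"fmt"
	"time"
)

// replicateBuffered keeps only the first answer, like replicate, but the
// channel has one slot per replica so the slower ones can still send and exit.
func replicateBuffered(replicas []func() string) string {
	if len(replicas) == 0 {
		return "" // nothing would ever send; avoid blocking forever
	}
	results := make(chan string, len(replicas))
	for _, r := range replicas {
		go func(r func() string) {
			results <- r() // never blocks: the buffer has room for every answer
		}(r)
	}
	// The first answer wins. Later answers sit in the buffer and are
	// garbage-collected along with the channel.
	return <-results
}

func main() {
	fast := func() string { return "fast" }
	slow := func() string {
		time.Sleep(50 * time.Millisecond)
		return "slow"
	}
	fmt.Println(replicateBuffered([]func() string{slow, fast}))
}
```

The buffer only lets the losers finish and exit. If the replicas do expensive work, cancelling a context once the first answer arrives would also stop them early.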
Another is a forgotten close:
func stream(work []int) {
	out := make(chan int)
	go func() {
		for v := range out { // (1)
			handle(v)
		}
	}()
	for _, v := range work {
		out <- v
	}
	// (2) no close(out)
}
Here:
- (1) the range keeps pulling from out until it’s closed
- (2) stream returns without close(out), so the range never ends and the goroutine leaks
The fix is to close(out) after the last send, which ends the range and lets the goroutine
return.
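A sketch of the fixed version. streamClosed is a hypothetical name, handle is passed in as a parameter so the program is self-contained, and the done channel is an extra I added so the caller can wait until the last value has been handled. close(out) alone is what fixes the leak.

```go
package main

import "fmt"

// streamClosed is stream with the leak fixed: close(out) after the last send
// ends the consumer's range loop so its goroutine returns.
func streamClosed(work []int, handle func(int)) {
	out := make(chan int)
	done := make(chan struct{}) // extra: lets the caller wait for the consumer
	go func() {
		defer close(done)
		for v := range out { // exits once out is closed
			handle(v)
		}
	}()
	for _, v := range work {
		out <- v
	}
	close(out) // the fix: no more values are coming
	<-done     // wait until the consumer has handled the last value
}

func main() {
	sum := 0
	streamClosed([]int{1, 2, 3}, func(v int) { sum += v })
	fmt.Println(sum) // prints 6
}
```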
They’re obvious once you spot them, but easy to let slip past under an early return or once
the surrounding code grows. goleak catches them in tests. In production you’ve got the
regular /debug/pprof/goroutine profile. It shows what each goroutine is blocked on, not
whether it will ever unblock, so you’re guessing which are stuck for good and which are just
idle.
This list is nowhere near exhaustive, and not every leak is in your own code. A dependency, or one of its transitive deps, can leak one too. Uber catalogued the patterns across its Go monorepo.
The stdlib leak profile can now find them
It came out of Uber, the same place as goleak, and was designed by Vlad Saioc and Milind Chabbi. The detection rides on the garbage collector. A goroutine is leaked when it’s blocked on a channel or lock that no runnable goroutine can reach, directly or through another goroutine a runnable one could unblock. Nothing can ever wake it, so the GC flags it.
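To make that rule concrete, here’s a toy model of it: goroutines and channels as plain values, and a loop that grows the set of goroutines something could still wake until it stops changing. This is not how the runtime implements it (the real detection is done by the garbage collector on actual memory); the goroutine type, findLeaks, and the channel names are all made up for illustration.

```go
package main

import "fmt"

// goroutine is a toy stand-in for a real goroutine in this model.
type goroutine struct {
	name    string
	waitsOn string   // channel it is blocked on; "" means runnable
	refs    []string // channels reachable from its stack
}

// findLeaks applies the reachability rule to the toy model: a blocked
// goroutine is leaked unless the channel it waits on is reachable from a
// global, a runnable goroutine, or a goroutine one of those could wake.
func findLeaks(gs []goroutine, globals []string) []string {
	reachable := make(map[string]bool)
	for _, c := range globals {
		reachable[c] = true
	}
	live := make([]bool, len(gs))
	for i, g := range gs {
		live[i] = g.waitsOn == "" // runnable goroutines are the starting set
	}
	// Grow the set of wakeable goroutines until it stops changing.
	for changed := true; changed; {
		changed = false
		for i, g := range gs {
			if live[i] {
				for _, c := range g.refs {
					reachable[c] = true
				}
			}
		}
		for i, g := range gs {
			if !live[i] && reachable[g.waitsOn] {
				live[i] = true
				changed = true
			}
		}
	}
	var leaks []string
	for i, g := range gs {
		if !live[i] {
			leaks = append(leaks, g.name)
		}
	}
	return leaks
}

func main() {
	gs := []goroutine{
		{name: "main", refs: []string{"jobs"}},
		{name: "worker", waitsOn: "jobs", refs: []string{"results"}},
		{name: "collector", waitsOn: "results"},
		{name: "orphan", waitsOn: "errs", refs: []string{"errs"}},
	}
	fmt.Println(findLeaks(gs, nil)) // prints "[orphan]"
}
```

collector is blocked, but worker could send on results once main feeds jobs, so it isn’t flagged. orphan only references its own channel, so nothing can ever reach it.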
It works differently from goleak. goleak doesn’t prove a goroutine is stuck. It just reports
the ones still running that you didn’t tell it to expect. That suits a test, where nothing
should be left running when it ends. A live server is the opposite. Most of its goroutines
are blocked on purpose, waiting for the next request. Run goleak’s Find there and those
healthy ones get reported right next to any real leak.
The profile tells the difference. The GC checks whether anything could ever unblock a goroutine and flags it only when nothing can. So it works on a live process, the kind of in-production detection Uber built. Whatever it flags is stuck for good, not just slow or idle. That’s the no-false-positives guarantee.
The profile ships without goleak’s VerifyNone(t) or VerifyTestMain(m). The test section shows how to roll your own.
The API is tiny. There’s no new type or function, just a profile named goroutineleak. It
ships registered, and the standard pprof tooling reads it like any other profile.
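Because it’s just another name in the pprof registry, you can check for it the same way as for the built-in profiles. A small sketch using pprof.Profiles and pprof.Lookup; profileNames is a hypothetical helper, and whether goroutineleak shows up depends on your toolchain and GOEXPERIMENT setting.

```go
package main

import (
	"fmt"
	"runtime/pprof"
)

// profileNames returns the names of every registered profile, built-in ones
// included. goroutineleak should appear only in builds that provide it.
func profileNames() []string {
	var names []string
	for _, p := range pprof.Profiles() { // sorted by name
		names = append(names, p.Name())
	}
	return names
}

func main() {
	fmt.Println(profileNames())
	if pprof.Lookup("goroutineleak") != nil {
		fmt.Println("goroutineleak is available")
	} else {
		fmt.Println("goroutineleak is not available in this build")
	}
}
```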
You can pull the profile in the usual four ways
Note
Until Go 1.27 the profile is behind a build flag. Run the examples below with
GOEXPERIMENT=goroutineleakprofile, or pprof.Lookup("goroutineleak") returns nil. From
1.27 on, it’s generally available
and the flag is gone.
From your own code
You pull the profile and write it out yourself. Start with debug=0, which dumps a gzipped
protobuf to a file:
func main() {
	f, err := os.Create("leak.pb.gz")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	pprof.Lookup("goroutineleak").WriteTo(f, 0)
}
pprof.Lookup returns the profile, and WriteTo runs the leak-detecting GC cycle before
writing it out. Open the file with go tool pprof, the same as a CPU or heap profile.
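If the same binary may run with or without the experiment, it’s worth guarding the nil case, since calling WriteTo on the nil *Profile that Lookup returns panics. A sketch under that assumption; writeLeakProfile, errNoLeakProfile, and leak are hypothetical names, and the short sleep just gives the leaked goroutine time to block before the profile is taken.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"runtime/pprof"
	"time"
)

// errNoLeakProfile is returned when this build has no goroutineleak profile.
var errNoLeakProfile = errors.New("goroutineleak profile not available; before Go 1.27, build with GOEXPERIMENT=goroutineleakprofile")

// writeLeakProfile writes the debug=0 (gzipped protobuf) form of the
// goroutineleak profile to path, or returns errNoLeakProfile.
func writeLeakProfile(path string) error {
	p := pprof.Lookup("goroutineleak")
	if p == nil {
		return errNoLeakProfile
	}
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	if err := p.WriteTo(f, 0); err != nil {
		f.Close()
		return err
	}
	return f.Close() // surface any error reported at close
}

// leak starts a goroutine that waits on a channel nobody else can reach.
func leak() {
	ch := make(chan int)
	go func() { <-ch }()
}

func main() {
	leak()
	time.Sleep(10 * time.Millisecond) // give the goroutine time to block
	if err := writeLeakProfile("leak.pb.gz"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("wrote leak.pb.gz; open it with go tool pprof")
}
```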
WriteTo’s second argument is the debug level. 1 and 2 give text instead, which you
can send straight to os.Stdout. At debug=1, a signal handler lets kill -USR1 <pid>
dump leaks on demand:
// kill -USR1 <pid> to dump leaks
sig := make(chan os.Signal, 1)
signal.Notify(sig, syscall.SIGUSR1)
go func() {
	for range sig {
		pprof.Lookup("goroutineleak").WriteTo(os.Stdout, 1)
	}
}()
The text points straight at the goroutines that leaked:
goroutineleak profile: total 2
1 @ ...
# 0x... main.leakSend.func1+0x27 formats/main.go:15
1 @ ...
# 0x... main.leakRange.func1+0x33 formats/main.go:21
debug=2 is a full goroutine dump, with the leaked goroutines tagged (leaked):
goroutine 7 [chan send (leaked)]:
main.leakSend.func1()
	formats/main.go:15 +0x28
created by main.leakSend in goroutine 1
	formats/main.go:15 +0x6c

goroutine 8 [chan receive (leaked)]:
main.leakRange.func1()
	formats/main.go:21 +0x34
created by main.leakRange in goroutine 1
	formats/main.go:20 +0x6c
A normal dump reads [chan send] and [chan receive]. The (leaked) suffix is what the
profile adds.
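Since the suffix sits in each goroutine’s header line, pulling just the leaked goroutines out of a debug=2 dump (say, to log them) is a small text filter. A sketch, assuming the layout shown above with goroutines separated by blank lines; leakedStanzas is a hypothetical helper, and the goroutine 1 entry in the sample is made up.

```go
package main

import (
	"fmt"
	"strings"
)

// leakedStanzas splits a debug=2 goroutine dump on blank lines and keeps the
// goroutines whose header carries the "(leaked)" tag, stack trace included.
func leakedStanzas(dump string) []string {
	var out []string
	for _, stanza := range strings.Split(dump, "\n\n") {
		stanza = strings.TrimSpace(stanza)
		header, _, _ := strings.Cut(stanza, "\n")
		if strings.HasPrefix(header, "goroutine ") && strings.Contains(header, "(leaked)") {
			out = append(out, stanza)
		}
	}
	return out
}

func main() {
	dump := "goroutine 1 [running]:\n" +
		"main.main()\n" +
		"\n" +
		"goroutine 7 [chan send (leaked)]:\n" +
		"main.leakSend.func1()\n" +
		"\tformats/main.go:15 +0x28\n"
	for _, s := range leakedStanzas(dump) {
		fmt.Println(s)
	}
}
```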
In a test
One helper runs the detection and returns whatever’s stuck:
func leaked() (string, bool) {
	p := pprof.Lookup("goroutineleak")
	if p == nil {
		return "", false // experiment off, nothing to detect
	}
	var b bytes.Buffer
	p.WriteTo(&b, 1)
	return b.String(), p.Count() > 0
}
For one test, wrap it in a verifyNone you defer:
// verifyNone mirrors goleak.VerifyNone.
func verifyNone(t *testing.T) {
	t.Helper()
	if report, ok := leaked(); ok {
		t.Fatalf("leaked goroutines:\n%s", report)
	}
}
func TestRun(t *testing.T) {
	defer verifyNone(t)
	// ... exercise the code under test ...
}
For the whole suite, write a verifyTestMain and call it from TestMain:
// verifyTestMain mirrors goleak.VerifyTestMain.
func verifyTestMain(m *testing.M) {
	code := m.Run()
	if code == 0 {
		if report, ok := leaked(); ok {
			fmt.Fprintf(os.Stderr, "leaked goroutines:\n%s", report)
			code = 1
		}
	}
	os.Exit(code)
}
func TestMain(m *testing.M) {
	verifyTestMain(m)
}
Over HTTP
Importing net/http/pprof registers it on the default mux. Serve that mux, the nil
handler below, and the endpoint is live with no extra code:
import _ "net/http/pprof" // registers /debug/pprof/goroutineleak on http.DefaultServeMux
log.Fatal(http.ListenAndServe("localhost:6060", nil))
Then read the profile off the endpoint:
$ curl 'localhost:6060/debug/pprof/goroutineleak?debug=1'
goroutineleak profile: total 1
1 @ ...
# 0x... main.main.func1+0x27 server/main.go:13
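The first line of the debug=1 text carries the count, so a health check or cron job can read just that. A sketch, assuming the "goroutineleak profile: total N" header shown above and the localhost:6060 address from the server snippet; leakTotal, parseLeakTotal, and the exit codes are my own choices, not part of any API.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"os"
	"strconv"
	"strings"
)

const leakHeader = "goroutineleak profile: total "

// parseLeakTotal extracts N from a "goroutineleak profile: total N" line.
func parseLeakTotal(line string) (int, error) {
	if !strings.HasPrefix(line, leakHeader) {
		return 0, fmt.Errorf("unexpected header %q", line)
	}
	return strconv.Atoi(strings.TrimSpace(strings.TrimPrefix(line, leakHeader)))
}

// leakTotal fetches a debug=1 goroutineleak profile and returns its total.
func leakTotal(url string) (int, error) {
	resp, err := http.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("GET %s: %s", url, resp.Status)
	}
	sc := bufio.NewScanner(resp.Body)
	if !sc.Scan() {
		if err := sc.Err(); err != nil {
			return 0, err
		}
		return 0, fmt.Errorf("GET %s: empty body", url)
	}
	return parseLeakTotal(sc.Text())
}

func main() {
	const url = "http://localhost:6060/debug/pprof/goroutineleak?debug=1"
	n, err := leakTotal(url)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	if n > 0 {
		fmt.Printf("%d leaked goroutine(s)\n", n)
		os.Exit(1)
	}
	fmt.Println("no leaks")
}
```

A build without the profile should answer with a non-200 status, which this reports as an error rather than as zero leaks.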
With go tool pprof
go tool pprof reads it like any other profile, pointed at that endpoint or a saved
debug=0 dump:
$ go tool pprof -top 'http://localhost:6060/debug/pprof/goroutineleak'
Type: goroutineleak
flat flat% sum% cum cum%
1 100% 100% 1 100% runtime.gopark
0 0% 100% 1 100% main.main.func1
0 0% 100% 1 100% runtime.chansend
What it can’t catch
It won’t catch every leak. The Go 1.27 notes say so too, promising only that it finds a large class of them.
That comes from leaning on reachability. If the channel or lock a stuck goroutine is waiting on is still reachable, through a global or the locals of a running goroutine, the GC counts it as live and leaves the goroutine alone. The leaks it does report are real. A few real ones just slip through.
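A sketch of that blind spot, with hypothetical names: the same leak twice, once on a package-level channel and once on a local one. Both goroutines are stuck for good, but by the reachability rule only the local one should be flagged, since a global keeps the other channel reachable. I haven’t pinned down the exact profile output here, so treat the comment on WriteTo as the expected result, not a guarantee.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
	"time"
)

// pending is a package-level channel. Globals are GC roots, so pending
// stays reachable for the life of the process.
var pending = make(chan int)

// leakViaGlobal parks a goroutine on pending. Nothing ever sends, so it is a
// real leak, but pending is reachable from a global, so the GC treats the
// goroutine as one that could still be woken.
func leakViaGlobal() {
	go func() { <-pending }()
}

// leakViaLocal is the same leak on a local channel. Once it returns, only the
// blocked goroutine refers to ch, which is the case the profile reports.
func leakViaLocal() {
	ch := make(chan int)
	go func() { <-ch }()
}

// stuckAfter reports how many goroutines start left running.
func stuckAfter(start func()) int {
	before := runtime.NumGoroutine()
	start()
	return runtime.NumGoroutine() - before
}

func main() {
	fmt.Println("global:", stuckAfter(leakViaGlobal), "local:", stuckAfter(leakViaLocal))
	time.Sleep(10 * time.Millisecond) // let both goroutines block
	if p := pprof.Lookup("goroutineleak"); p != nil {
		p.WriteTo(os.Stdout, 1) // expected to list only the leakViaLocal goroutine
	}
}
```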
Every snippet here is a runnable program in the example repo. I ran them on the 1.26 toolchain and the profile flagged each leak at the exact line.