Accepted proposal: a goroutine leak profile in the Go standard library

Go 1.27 is getting a goroutine leak detector in runtime/pprof. The proposal was accepted in April.

A few common goroutine leaks

A goroutine leaks when it blocks on a channel or lock that nothing will ever release, so it lingers for the life of the process. I’ve been using uber-go/goleak to catch them in tests.

One is an early return that strands a sender, which I covered in Early return and goroutine leak. It looks like this:

func run(tasks []func() error) error {
    errs := make(chan error) // unbuffered
    var wg sync.WaitGroup
    for _, task := range tasks {
        wg.Go(func() { errs <- task() }) // (1)
    }
    for range tasks {
        if err := <-errs; err != nil {
            return err // (2)
        }
    }
    wg.Wait()
    return nil
}

Here:

  • (1) each task runs in a goroutine started by wg.Go and sends its result on the unbuffered channel
  • (2) the first error makes run return early, so every task that hasn’t sent yet blocks forever on its send

Giving errs a buffer big enough for every task, or draining all the results before returning, keeps the sends from blocking.

A related leak shows up when you send a request to several replicas and keep only the first answer:

func replicate(replicas []func() string) string {
    results := make(chan string) // unbuffered
    for _, r := range replicas {
        go func() { results <- r() }() // (1)
    }
    return <-results // (2)
}

Here:

  • (1) every replica races to send its answer on the unbuffered channel
  • (2) replicate returns the first answer, and the slower replicas block forever on their sends

Same as before, a buffer sized for every replica lets the slower ones send and exit.

Another is a forgotten close:

func stream(work []int) {
    out := make(chan int)
    go func() {
        for v := range out { // (1)
            handle(v)
        }
    }()
    for _, v := range work {
        out <- v
    }
    // (2) no close(out)
}

Here:

  • (1) the range keeps pulling from out until it’s closed
  • (2) stream returns without close(out), so the range never ends and the goroutine leaks

The fix is to close(out) after the last send, which ends the range and lets the goroutine return.
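
As a sketch, the fixed version could look like this. streamClosed is a made-up name, handle is passed in as a parameter only to keep the example self-contained, and the WaitGroup is optional, there just so main can read the result safely:

```go
package main

import (
	"fmt"
	"sync"
)

// streamClosed is a hypothetical fixed variant of stream.
func streamClosed(work []int, handle func(int)) {
	out := make(chan int)
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for v := range out { // ends once out is closed and drained
			handle(v)
		}
	}()
	for _, v := range work {
		out <- v
	}
	close(out) // after the last send: lets the range loop finish
	wg.Wait()  // optional: wait for the consumer to return
}

func main() {
	sum := 0
	streamClosed([]int{1, 2, 3}, func(v int) { sum += v })
	fmt.Println(sum) // prints 6
}
```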

They’re obvious once you spot them, but easy to let slip past under an early return or once the surrounding code grows. goleak catches them in tests. In production you’ve got the regular /debug/pprof/goroutine profile. It shows what each goroutine is blocked on, not whether it will ever unblock, so you’re guessing which are stuck for good and which are just idle.

This list is nowhere near exhaustive, and not every leak is in your own code. A dependency, or one of its transitive deps, can leak one too. Uber catalogued the patterns across its Go monorepo.

The stdlib leak profile can now find them

It came out of Uber, the same place as goleak, and was designed by Vlad Saioc and Milind Chabbi. The detection rides on the garbage collector. A goroutine is leaked when it’s blocked on a channel or lock that no runnable goroutine can reach, directly or through another goroutine a runnable one could unblock. Nothing can ever wake it, so the GC flags it.

It works differently from goleak. goleak doesn’t prove a goroutine is stuck. It just reports the ones still running that you didn’t tell it to expect. That suits a test, where nothing should be left running when it ends. A live server is the opposite. Most of its goroutines are blocked on purpose, waiting for the next request. Run goleak’s Find there and those healthy ones get reported right next to any real leak.

The profile tells the difference. The GC checks whether anything could ever unblock a goroutine and flags it only when nothing can. So it works on a live process, the kind of in-production detection Uber built. Whatever it flags is stuck for good, not just slow or idle. That’s the no-false-positives guarantee.

The profile ships without goleak’s VerifyNone(t) or VerifyTestMain(m). The test section shows how to roll your own.

The API is tiny. There’s no new type or function, just a profile named goroutineleak. It ships registered, and the standard pprof tooling reads it like any other profile.
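
If you want to see what your own binary has registered, pprof.Profiles lists every profile by name. This is a small sketch, and profileNames is a helper I made up. Whether goroutineleak shows up depends on the toolchain and build settings:

```go
package main

import (
	"fmt"
	"runtime/pprof"
)

// profileNames returns the names of all profiles registered in this binary.
func profileNames() []string {
	var names []string
	for _, p := range pprof.Profiles() {
		names = append(names, p.Name())
	}
	return names
}

func main() {
	for _, name := range profileNames() {
		fmt.Println(name)
	}
}
```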

You can pull the profile in the usual four ways

Note

Until Go 1.27 the profile is behind a build flag. Run the examples below with GOEXPERIMENT=goroutineleakprofile, or pprof.Lookup("goroutineleak") returns nil. From 1.27 on, it’s generally available and the flag is gone.

From your own code

You pull the profile and write it out yourself. Start with debug=0, which dumps a gzipped protobuf to a file:

func main() {
    f, err := os.Create("leak.pb.gz")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()
    pprof.Lookup("goroutineleak").WriteTo(f, 0) // debug=0: gzipped protobuf
}

pprof.Lookup returns the profile, and WriteTo runs the leak-detecting GC cycle before writing it out. Open the file with go tool pprof, the same as a CPU or heap profile.

WriteTo’s second argument is the debug level. 1 and 2 give text instead, which you can send straight to os.Stdout. At debug=1, a signal handler lets kill -USR1 <pid> dump leaks on demand:

// kill -USR1 <pid> to dump leaks
sig := make(chan os.Signal, 1)
signal.Notify(sig, syscall.SIGUSR1)
go func() {
    for range sig {
        pprof.Lookup("goroutineleak").WriteTo(os.Stdout, 1)
    }
}()

The text points straight at the goroutines that leaked:

goroutineleak profile: total 2
1 @ ...
#    0x...    main.leakSend.func1+0x27    formats/main.go:15
1 @ ...
#    0x...    main.leakRange.func1+0x33    formats/main.go:21

debug=2 is a full goroutine dump, with the leaked goroutines tagged (leaked):

goroutine 7 [chan send (leaked)]:
main.leakSend.func1()
    formats/main.go:15 +0x28
created by main.leakSend in goroutine 1
    formats/main.go:15 +0x6c

goroutine 8 [chan receive (leaked)]:
main.leakRange.func1()
    formats/main.go:21 +0x34
created by main.leakRange in goroutine 1
    formats/main.go:20 +0x6c

A normal dump reads [chan send] and [chan receive]. The (leaked) suffix is what the profile adds.

In a test

One helper runs the detection and returns whatever’s stuck:

func leaked() (string, bool) {
    p := pprof.Lookup("goroutineleak")
    if p == nil {
        return "", false // experiment off, nothing to detect
    }
    var b bytes.Buffer
    p.WriteTo(&b, 1)
    return b.String(), p.Count() > 0
}

For one test, wrap it in a verifyNone you defer:

// verifyNone mirrors goleak.VerifyNone.
func verifyNone(t *testing.T) {
    t.Helper()
    if report, ok := leaked(); ok {
        t.Fatalf("leaked goroutines:\n%s", report)
    }
}

func TestRun(t *testing.T) {
    defer verifyNone(t)
    // ... exercise the code under test ...
}

For the whole suite, write a verifyTestMain and call it from TestMain:

// verifyTestMain mirrors goleak.VerifyTestMain.
func verifyTestMain(m *testing.M) {
    code := m.Run()
    if code == 0 {
        if report, ok := leaked(); ok {
            fmt.Fprintf(os.Stderr, "leaked goroutines:\n%s", report)
            code = 1
        }
    }
    os.Exit(code)
}

func TestMain(m *testing.M) {
    verifyTestMain(m)
}

Over HTTP

Importing net/http/pprof registers it on the default mux. Serve that mux, the nil handler below, and the endpoint is live with no extra code:

import _ "net/http/pprof" // registers /debug/pprof/goroutineleak on http.DefaultServeMux

http.ListenAndServe("localhost:6060", nil)

Then read the profile off the endpoint:

$ curl 'localhost:6060/debug/pprof/goroutineleak?debug=1'
goroutineleak profile: total 1
1 @ ...
#    0x...    main.main.func1+0x27    server/main.go:13

With go tool pprof

go tool pprof reads it like any other profile, pointed at that endpoint or a saved debug=0 dump:

$ go tool pprof -top 'http://localhost:6060/debug/pprof/goroutineleak'
Type: goroutineleak
      flat  flat%   sum%        cum   cum%
         1   100%   100%          1   100%  runtime.gopark
         0     0%   100%          1   100%  main.main.func1
         0     0%   100%          1   100%  runtime.chansend

What it can’t catch

It won’t catch every leak. The Go 1.27 notes say as much and promise only a large class of them.

That comes from leaning on reachability. If the channel or lock a stuck goroutine is waiting on is still reachable, through a global or the locals of a running goroutine, the GC counts it as live and leaves the goroutine alone. The leaks it does report are real. A few real ones just slip through.

Every snippet here is a runnable program in the example repo. I ran them on the 1.26 toolchain and the profile flagged each leak at the exact line.