A tour of txtar
I ran into txtar today while poking around cmd/go’s testdata directory and got curious
about why every test file looked like it had a tiny diff embedded in it. Turns out it’s a
trivial archive format Russ Cox introduced in 2018, and once I noticed it I started seeing
it everywhere in Go tooling.
A txtar archive looks like this:
Lines up here are the comment.

-- hello.txt --
hello, world

-- nested/foo.go --
package nested

func Foo() string { return "foo" }
Two file markers, two files, a free-text comment up top. That’s the whole format. The package doc says it outright: “There are no possible syntax errors in a txtar archive.”
The format #
The package doc comment is the spec. The rules:
- A marker line is exactly -- FILENAME -- at the start of a line. Three bytes, --<space>, open the marker; three bytes, <space>--, close it.
- Anything before the first marker is the comment.
- File data runs from one marker to the next, or to EOF.
- Whitespace inside the marker is trimmed, so --   foo.go   -- parses as foo.go.
- A missing trailing newline on the final file is treated as if it were there.
The format is text only. The package doc explicitly rules out binary content, file modes,
symlinks, and any escape mechanism for the marker syntax. The escape gap matters when a file
body happens to contain a line that starts with --<space> and ends with <space>--. The parser
treats it as a new marker, and there’s no way to quote it. Reach for tar or zip when any of
that matters.
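The marker rule is small enough to sketch with the standard library alone. This is a rough approximation for illustration, not the package’s own code: markerName is a hypothetical helper, and treating an empty name as "not a marker" is my assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// markerName reports whether line (without its trailing newline) is a file
// marker and, if so, returns the trimmed file name. It is a hypothetical
// helper written for this post, not the package's own function.
func markerName(line string) (name string, ok bool) {
	const start, end = "-- ", " --"
	// The opening and closing sequences must not overlap, so "-- --" is too short.
	if len(line) < len(start)+len(end) ||
		!strings.HasPrefix(line, start) || !strings.HasSuffix(line, end) {
		return "", false
	}
	name = strings.TrimSpace(line[len(start) : len(line)-len(end)])
	// Assumption: a marker with nothing but whitespace inside is not a marker.
	return name, name != ""
}

func main() {
	for _, line := range []string{
		"-- hello.txt --",
		"--   foo.go   --",
		"--foo.go--",
		"-- looks like a marker, but is body text --",
	} {
		name, ok := markerName(line)
		fmt.Printf("%q -> name=%q marker=%v\n", line, name, ok)
	}
}
```

The last input is the escape gap in action: a body line of that shape is indistinguishable from a marker, so it would start a new file.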
It stays this small because it was purpose-built around three goals listed in the package
doc: stay hand-editable, store trees of text files for go command test cases, and diff
cleanly in git history and code reviews. It first landed inside the early vgo modules
prototype in 2018.
Parse it from a string #
The Go API lives in golang.org/x/tools/txtar. Two types and two functions cover the common case:
type Archive struct {
	Comment []byte
	Files   []File
}

type File struct {
	Name string
	Data []byte
}

func Parse(data []byte) *Archive
func ParseFile(file string) (*Archive, error)
Parse doesn’t return an error. The format can’t fail to parse.
package main

import (
	"fmt"

	"golang.org/x/tools/txtar"
)

// The blank lines inside the archive are part of the data: they account for
// the byte counts printed below.
const archive = `Lines up here are the comment.

-- hello.txt --
hello, world

-- nested/foo.go --
package nested

func Foo() string { return "foo" }
`

func main() {
	ar := txtar.Parse([]byte(archive))
	fmt.Printf("comment: %q\n", ar.Comment)
	for _, f := range ar.Files {
		fmt.Printf("%s (%d bytes)\n", f.Name, len(f.Data))
	}
}
comment: "Lines up here are the comment.\n\n"
hello.txt (14 bytes)
nested/foo.go (51 bytes)
Parse returns slices that alias the input. Mutating the bytes you passed in will corrupt
the archive on the next read.
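Here is a minimal sketch of what that aliasing means in practice. It avoids the txtar import on purpose: firstLine is a hypothetical stand-in for any parser that sub-slices its input, which is the behavior described above for Parse.

```go
package main

import (
	"bytes"
	"fmt"
)

// firstLine returns the first line of data as a sub-slice that shares data's
// backing array. It stands in for a parser that slices its input in place.
func firstLine(data []byte) []byte {
	if i := bytes.IndexByte(data, '\n'); i >= 0 {
		return data[:i+1]
	}
	return data
}

func main() {
	input := []byte("hello, world\nsecond line\n")

	aliased := firstLine(input)             // shares memory with input
	copied := bytes.Clone(firstLine(input)) // independent copy (Go 1.20+)

	input[0] = 'J' // mutate the buffer after "parsing"

	fmt.Printf("aliased: %q\n", aliased) // "Jello, world\n"
	fmt.Printf("copied:  %q\n", copied)  // "hello, world\n"
}
```

If the input buffer is going to be reused, cloning either the input before parsing or the Data slices you keep is the simple way out.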
Read it from disk #
ParseFile does what you’d expect:
ar, err := txtar.ParseFile("fixture.txt")
if err != nil {
	log.Fatal(err)
}
Same *Archive, just fed by os.ReadFile instead of a string literal.
Mount it as an fs.FS #
txtar.FS, added in July 2024, hands you a read-only fs.FS view over the archive
without ever touching disk:
fsys, err := txtar.FS(ar)
if err != nil {
	log.Fatal(err)
}
fs.WalkDir(fsys, ".", func(p string, d fs.DirEntry, err error) error {
	if err != nil {
		return err
	}
	fmt.Printf("%s (dir=%v)\n", p, d.IsDir())
	return nil
})
. (dir=true)
hello.txt (dir=false)
nested (dir=true)
nested/foo.go (dir=false)
Anything that takes an fs.FS works against the archive directly. A parser, a template
engine, or a static-site generator reads its fixtures straight from the archive in memory.
You don’t need a tempdir or an extraction step.
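To make "anything that takes an fs.FS" concrete, here is a sketch of a consumer that only knows about the interface. goFiles and demoFS are hypothetical names, and fstest.MapFS is a stand-in for the value txtar.FS returns, used so the example runs with the standard library alone.

```go
package main

import (
	"fmt"
	"io/fs"
	"strings"
	"testing/fstest"
)

// goFiles lists every .go file in fsys. It depends only on the fs.FS
// interface, so it cannot tell an in-memory tree from a real directory.
func goFiles(fsys fs.FS) ([]string, error) {
	var names []string
	err := fs.WalkDir(fsys, ".", func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() && strings.HasSuffix(p, ".go") {
			names = append(names, p)
		}
		return nil
	})
	return names, err
}

// demoFS builds an in-memory tree with the same two files as the archive
// above. In a real test this would be the fs.FS returned by txtar.FS.
func demoFS() fs.FS {
	return fstest.MapFS{
		"hello.txt":     {Data: []byte("hello, world\n")},
		"nested/foo.go": {Data: []byte("package nested\n")},
	}
}

func main() {
	names, err := goFiles(demoFS())
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```

Swapping demoFS() for the archive-backed fs.FS should not require touching goFiles at all, which is the whole appeal.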
Format it back to bytes #
txtar.Format(*Archive) []byte is the inverse of Parse:
ar := &txtar.Archive{
	Comment: []byte("generated\n"),
	Files: []txtar.File{
		{Name: "main.go", Data: []byte("package main\n")},
		{Name: "go.mod", Data: []byte("module example\n")},
	},
}
os.Stdout.Write(txtar.Format(ar))
generated
-- main.go --
package main
-- go.mod --
module example
The package doesn’t ship a write-files-to-disk helper. The canonical pattern is to walk
ar.Files, validate each path stays inside your destination, and write yourself. Go’s
cmd/internal/script package has an ExtractFiles method that does exactly that. The
golang.org/x/exp/cmd/txtar CLI is another option, with txtar -x for extracting and
txtar <path> for archiving a file or directory.
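A rough sketch of that walk-validate-write pattern, under a few assumptions: the file type is a local stand-in that mirrors the Name and Data fields of txtar.File so the example builds without the x/tools module, and targetPath and extract are hypothetical names, not anything the package provides.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// file mirrors the Name and Data fields of txtar.File (a stand-in type).
type file struct {
	Name string
	Data []byte
}

// targetPath maps an archive name onto dir, rejecting names that are empty,
// absolute, or that climb out of dir with "..".
func targetPath(dir, name string) (string, error) {
	local := filepath.FromSlash(name)
	if !filepath.IsLocal(local) { // lexical check only, Go 1.20+
		return "", fmt.Errorf("unsafe path in archive: %q", name)
	}
	return filepath.Join(dir, local), nil
}

// extract writes every entry under dir, creating parent directories as needed.
func extract(dir string, files []file) error {
	for _, f := range files {
		dst, err := targetPath(dir, f.Name)
		if err != nil {
			return err
		}
		if err := os.MkdirAll(filepath.Dir(dst), 0o777); err != nil {
			return err
		}
		if err := os.WriteFile(dst, f.Data, 0o666); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	dir, err := os.MkdirTemp("", "txtar-extract-")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	files := []file{
		{Name: "main.go", Data: []byte("package main\n")},
		{Name: "nested/foo.go", Data: []byte("package nested\n")},
	}
	fmt.Println("extract:", extract(dir, files))
	fmt.Println("escape: ", extract(dir, []file{{Name: "../evil.txt"}}))
}
```

filepath.IsLocal only looks at the path text, so it does not protect against symlinks that already exist inside the destination. Extracting into a fresh temp directory sidesteps that.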
A golden test in one file #
Say you’ve got a function Format(in []byte) []byte that pretty-prints some text format you
care about. JSON, SQL, markdown, whatever. You want to feed it a stack of inputs without
scattering ten-line files all over testdata. One txtar archive per case covers it. The
comment up top says what’s being tested, -- in -- is the input, -- want -- is the
expected output.
testdata/empty_object.txt:
the empty object collapses
-- in --
{
}
-- want --
{}
testdata/nested.txt:
nested objects re-indent
-- in --
{"a":{"b":1}}
-- want --
{
  "a": {
    "b": 1
  }
}
The test globs the directory, parses each archive, and compares Format(in) against want:
func TestFormat(t *testing.T) {
	fixtures, err := filepath.Glob("testdata/*.txt")
	if err != nil {
		t.Fatal(err)
	}
	if len(fixtures) == 0 {
		t.Fatal("no fixtures found in testdata/")
	}
	for _, path := range fixtures {
		t.Run(filepath.Base(path), func(t *testing.T) {
			ar, err := txtar.ParseFile(path)
			if err != nil {
				t.Fatal(err)
			}
			// Index the sections by name so "in" and "want" can be looked up.
			files := map[string][]byte{}
			for _, f := range ar.Files {
				files[f.Name] = f.Data
			}
			got := Format(files["in"])
			if !bytes.Equal(got, files["want"]) {
				t.Errorf("got %q, want %q", got, files["want"])
			}
		})
	}
}
Adding a case is one new file in testdata/. The comment documents intent, and the input
and expected output sit side by side. You skip the per-test setup boilerplate and the
separate golden/ directory that always drifts out of sync.
The same shape works for a multi-file fixture. Add a -- go.mod -- and three -- *.go --
files to one archive and you’ve got a hermetic mini-module to feed your linter, refactorer,
or codegen tool. That’s what cmd/go’s script tests and gopls’s marker tests do, with
hundreds of fixtures each.
When the comment is a script #
The previous example used the archive as static data. cmd/go’s script tests use it
differently: the comment is a sequence of commands to run, and the file entries are the
workspace those commands operate on. rogpeppe/go-internal/testscript is the same engine
packaged as a library you can call from any test.
Say you want a smoke test for tree. You hand it a small project shaped via the file
entries, and assert on the tree it prints. One archive does both jobs:
testdata/tree.txt:
# tree should walk the workspace it was extracted into
exec tree --noreport --charset=utf-8 -I want
cmp stdout want
-- README.md --
# project
-- src/main.go --
package main
-- src/util.go --
package main
-- want --
.
├── README.md
└── src
    ├── main.go
    └── util.go
exec and cmp are testscript commands, not shell: exec runs a process, and cmp compares
its captured stdout against the file named want. testscript materializes every file entry
into a fresh temp directory before the script runs, so all four files above end up on disk
where tree can see them. The -I want flag tells tree to skip the want file itself, since
it’d otherwise show up in the listing.
The Go side is one function:
package tree_test

import (
	"testing"

	"github.com/rogpeppe/go-internal/testscript"
)

func TestTree(t *testing.T) {
	testscript.Run(t, testscript.Params{Dir: "testdata"})
}
testscript.Run globs testdata/*.txt, parses each archive, drops the file entries into a
fresh temp directory, runs the script line by line, and reports a diff if cmp fails.
Adding another case is one more .txt archive under testdata/.
In the wild #
The 900+ .txt files in src/cmd/go/testdata/script/ are txtar archives. The comment up
top is the script the test runs, and the files below are the workspace it runs in. The
README in that directory says “Each script is a text archive.”
Sharing a multi-file snippet on the Go Playground encodes a txtar archive into the share
URL. The code is in playground/txtar.go, added by Brad Fitzpatrick in 2019. The pre-marker
comment is treated as prog.go, which keeps single-file shares backwards compatible.
gopls’s marker tests under gopls/internal/test/marker/testdata/ are txtar files too. One
archive packs the Go source, the golden output, a -- flags -- section, and the gopls
settings into one end-to-end LSP test case.
Russ Cox’s refactoring tool rsc.io/rf keeps each test case as a txtar archive. The comment
is the refactor command, the files are the input, and -- stdout -- plus -- stderr --
carry the expected output.
rsc.io/script and rogpeppe/go-internal/testscript both extract the script engine from
cmd/go so you can run script fixtures in your own packages. Russ covers them in Go
Testing By Example under “Use txtar for multi-file test cases”.
The pattern repeats across most of them. Each archive holds one test case, with the script
or input on top, the workspace below, and a -- name -- section for golden output when you
need one. gopls’s marker tests put everything in named files like flags and
settings.json instead of a top/bottom split.
Why it caught on #
A directory full of fixture files is hard to review and harder to paste into a bug report.
txtar collapses all of it into one plain-text file that diffs cleanly in Gerrit and GitHub
and drops into a chat or an issue. With txtar.FS you don’t need to extract anything to run
a test against it.
The format is small enough that you could implement it in an afternoon. The reason to use the
upstream package is that everyone else in Go tooling already does, so your fixtures are
portable to testscript, rsc.io/script, the Playground, and anything else that adds a
txtar reader later.
Gist
- txtar is -- filename -- markers separating file bodies, with free text on top as a comment. The format is text only, has no possible syntax errors, and uses the package doc comment as its spec.
- The package is golang.org/x/tools/txtar. Parse reads bytes, ParseFile reads a path, Format writes back. txtar.FS mounts an archive as a read-only fs.FS so tests can run without ever touching disk.
- rogpeppe/go-internal/testscript runs an archive as a script: the comment becomes shell-like commands, the file entries become the workspace those commands run against. Use it to drive a CLI test from one fixture per case.
- It’s the file shape behind cmd/go’s 900+ script tests, the Go Playground’s multi-file shares, gopls’s marker tests, and rsc.io/rf. Reach for it when you want one PR-friendly file per test case.