[archives]

Benchmarking Go-lang Map Population


On the job we spend a lot of time using the goavro library to populate avro serialized records from our go-lang programs. The library uses map[string]interface{} as its main data structure. In order to add type safety and encapsulate some encoding specific logic we have built a large set of library functions on top of goavro.

For instance:

...
var m map[string]interface{}
m = DecodeMapFromSomeAvroRecord(record)

//Gets the "id" field out of the m map, and turn it into an int64, error if missing or incorrect type
num, err := GetInt64(m, "id")
...

For the Get functions there is a lot of value being provided by the typed functions. We also provide a group of Set functions:

...
var m map[string]interface{}
m = DecodeMapFromSomeAvroRecord(record)

//Gets the "id" field out of the m map, and turn it into an int64, error if missing or incorrect type
SetInt64(m, "id", int65(89))
...

There are several Set functions that encapsulate important encoding details, but many of them are just passthroughs to a map set. They are provided for uniformity of code and to encourage the use of the setters that are important without remembering specifically what they are. The question is if we are paying a performance penalty for using them?

My intuition is that for the pass through Set functions the should be in-lined by the compiler and thus the answer would be “no”. This sort of question is one of the few that can be investigated cleanly with micro-benchmarks. I wrote:

func BenchmarkMapInit(b *testing.B) {
	for n := 0; n < b.N; n++ {
		m := map[string]interface{}{
			"foo": "bar",
			"moo": int32(7),
			"goo": map[string]interface{}{"boo": "baz"},
		}

		if m["foo"] != "bar" || m["moo"] != int32(7) || m["goo"] == nil {
			b.Errorf("your test is broken: %v", m)
		}
	}
}

func BenchmarkMapSet(b *testing.B) {
	for n := 0; n < b.N; n++ {
		m := make(map[string]interface{})
		m["foo"] = "bar"
		m["moo"] = int32(7)
		m["goo"] = map[string]interface{}{"boo": "baz"}

		if m["foo"] != "bar" || m["moo"] != int32(7) || m["goo"] == nil {
			b.Errorf("your test is broken: %v", m)
		}
	}
}

func BenchmarkMapFunctions(b *testing.B) {
	for n := 0; n < b.N; n++ {
		m := make(map[string]interface{})
		SetString(m, "foo", "bar")
		SetInt32(m, "moo", int32(7))
		SetMap(m, "goo", map[string]interface{}{"boo": "baz"})

		if m["foo"] != "bar" || m["moo"] != int32(7) || m["goo"] == nil {
			b.Errorf("your test is broken: %v", m)
		}
	}
}

func SetString(m map[string]interface{}, field string, value string) {
	m[field] = value
}

func SetInt32(m map[string]interface{}, field string, value int32) {
	m[field] = value
}

func SetMap(m map[string]interface{}, field string, value map[string]interface{}) {
	m[field] = value
}

I then ran the test with the -gcflags -m parameters to see if inlining was happening:

$ go test -run=x -bench=. -gcflags -m .
# github.com/kklipsch/mapbench
./bench_test.go:35:12: inlining call to SetString
./bench_test.go:36:11: inlining call to SetInt32
./bench_test.go:37:9: inlining call to SetMap
...

It was as I assumed, so we should see no noticeable performance differences:

$ go test -run=x -bench=. -memprofile=mem.out -cpuprofile=cpu.out -benchmem .
goos: darwin
goarch: amd64
pkg: github.com/kklipsch/mapbench
BenchmarkMapInit-8               3000000               508 ns/op             672 B/op          4 allocs/op
BenchmarkMapSet-8                3000000               505 ns/op             672 B/op          4 allocs/op
BenchmarkMapFunctions-8          3000000               590 ns/op             692 B/op          6 allocs/op
PASS
ok      github.com/kklipsch/mapbench    17.243s

Except the benchmark shows 2 extra allocations and a time difference for the function based approach vs the non-function passing approach. Looking at the cpu.out profile in pprof gives us the following picture:

mapbench_1.png

In this we can see that the vast majority of cpu time added in on the Setter function test is from the SetString function, and that it is calling runtime.convT2Estring and none of the other paths are. Pecking around the go-lang github repository its not obvious to me what that function is for but there is at least 1 issue indicating there are opportunities for optimizations by changing that function.

Because in our real usages we are unlikely to be using constant values, I thought I’d try to use some variable indirection to see what impact that had on the benchmarks so I added:


var (
	fooKey   = "foo"
	fooValue = "bar"

	mooKey   = "moo"
	mooValue = int32(7)

	gooKey   = "goo"
	gooValue = map[string]interface{}{"boo": "baz"}
)

func BenchmarkMapInitVars(b *testing.B) {
	for n := 0; n < b.N; n++ {
		m := map[string]interface{}{
			fooKey: fooValue,
			mooKey: mooValue,
			gooKey: gooValue,
		}

		if m["foo"] != "bar" || m["moo"] != int32(7) || m["goo"] == nil {
			b.Errorf("your test is broken: %v", m)
		}
	}
}

func BenchmarkMapSetVars(b *testing.B) {
	for n := 0; n < b.N; n++ {
		m := make(map[string]interface{})
		m[fooKey] = fooValue
		m[mooKey] = mooValue
		m[gooKey] = gooValue

		if m["foo"] != "bar" || m["moo"] != int32(7) || m["goo"] == nil {
			b.Errorf("your test is broken: %v", m)
		}
	}
}

func BenchmarkMapFunctionsVars(b *testing.B) {
	for n := 0; n < b.N; n++ {
		m := make(map[string]interface{})
		SetString(m, fooKey, fooValue)
		SetInt32(m, mooKey, mooValue)
		SetMap(m, gooKey, gooValue)

		if m["foo"] != "bar" || m["moo"] != int32(7) || m["goo"] == nil {
			b.Errorf("your test is broken: %v", m)
		}
	}
}

Which output:

$ go test -run=x -bench=. -memprofile=mem.out -cpuprofile=cpu.out -benchmem .
goos: darwin
goarch: amd64
pkg: github.com/kklipsch/mapbench
BenchmarkMapInit-8               3000000               515 ns/op             672 B/op          4 allocs/op
BenchmarkMapSet-8                3000000               507 ns/op             672 B/op          4 allocs/op
BenchmarkMapFunctions-8          3000000               566 ns/op             692 B/op          6 allocs/op
BenchmarkMapInitVars-8           5000000               368 ns/op             356 B/op          4 allocs/op
BenchmarkMapSetVars-8            5000000               366 ns/op             356 B/op          4 allocs/op
BenchmarkMapFunctionsVars-8      5000000               370 ns/op             356 B/op          4 allocs/op
PASS
ok      github.com/kklipsch/mapbench    13.211s

All three ways to populate the maps have parity under the vars situation (and a huge speed up assumedly due to the shared variables not being allocated).

The pprof graph shows the variable based functions not calling the runtime.convT2EString.

mapbench_1.png

For completeness I also did a version of the functions that reused the map, as expected that showed the biggest performance improvement:

$ go test -run=x -bench=. -memprofile=mem.out -cpuprofile=cpu.out -benchmem .
goos: darwin
goarch: amd64
pkg: github.com/kklipsch/mapbench
BenchmarkMapInit-8               3000000               511 ns/op             672 B/op          4 allocs/op
BenchmarkMapSet-8                3000000               504 ns/op             672 B/op          4 allocs/op
BenchmarkMapFunctions-8          3000000               571 ns/op             692 B/op          6 allocs/op
BenchmarkMapInitVars-8           5000000               373 ns/op             356 B/op          4 allocs/op
BenchmarkMapSetVars-8            5000000               368 ns/op             356 B/op          4 allocs/op
BenchmarkMapFunctionsVars-8      5000000               390 ns/op             356 B/op          4 allocs/op
BenchmarkMapSetAlloc-8           5000000               305 ns/op             336 B/op          2 allocs/op
BenchmarkMapFunctionsAlloc-8     5000000               356 ns/op             356 B/op          4 allocs/op
PASS
ok      github.com/kklipsch/mapbench    17.348s

TLDR: Yes the functions do get inlined, but there can be in some cases a performance hit for using them. If you are worried about that hit you likely want to move to reusing maps before you move away from the functions.

You can find the code for this post in my github repository.