Reducing Malloc/Free traffic to cgo
Cgo overhead is a little higher than many are comfortable with (at the time of this writing, a simple call tends to cost 4-6x as much as an equivalent JNI call). Where it really gets you, though, is the data marshalling. Each individual call to malloc or free is another cgo call with 30-50ns of overhead.
This library provides an Allocator interface which can be used to provide alternative allocators to C.malloc and C.free. It also provides a Destroy method, which cleans up any overhead allocated via cgo and makes a best-effort attempt to panic if any memory allocated through the destroyed Allocator has not been freed. This check uses whatever information the allocator in question happens to have available, so it should not be considered definitive.
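As a rough illustration of the idea, here is a minimal sketch of what such an interface and a leak-checking Destroy could look like. The method signatures, `trackingAllocator`, and `destroyPanics` are hypothetical names made up for this example, not the library's actual API, and the stand-in allocator uses Go heap memory instead of cgo so that it runs anywhere.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Allocator is a hypothetical shape for the interface described above.
// The library's real method set and signatures may differ.
type Allocator interface {
	Malloc(size int) unsafe.Pointer
	Free(ptr unsafe.Pointer)
	Destroy()
}

// trackingAllocator is a pure-Go stand-in (no cgo) that remembers every
// live allocation so Destroy can make a best-effort leak check.
type trackingAllocator struct {
	live map[unsafe.Pointer][]byte
}

func newTrackingAllocator() *trackingAllocator {
	return &trackingAllocator{live: make(map[unsafe.Pointer][]byte)}
}

func (a *trackingAllocator) Malloc(size int) unsafe.Pointer {
	if size < 1 {
		size = 1 // keep &buf[0] valid for zero-sized requests
	}
	buf := make([]byte, size)
	ptr := unsafe.Pointer(&buf[0])
	a.live[ptr] = buf // holding buf keeps the memory alive until Free
	return ptr
}

func (a *trackingAllocator) Free(ptr unsafe.Pointer) {
	delete(a.live, ptr)
}

// Destroy panics if anything allocated here was never freed.
func (a *trackingAllocator) Destroy() {
	if n := len(a.live); n > 0 {
		panic(fmt.Sprintf("allocator destroyed with %d live allocation(s)", n))
	}
	a.live = nil
}

// destroyPanics reports whether Destroy panicked, so the demo can continue.
func destroyPanics(a Allocator) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	a.Destroy()
	return false
}

func main() {
	leaky := newTrackingAllocator()
	leaky.Malloc(64) // never freed
	fmt.Println("leaky:", destroyPanics(leaky)) // leaky: true

	clean := newTrackingAllocator()
	p := clean.Malloc(64)
	clean.Free(p)
	fmt.Println("clean:", destroyPanics(clean)) // clean: false
}
```

The check is only as good as the bookkeeping behind it, which is why the text above calls it best-effort.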
More importantly, it provides an allocator, FixedBlockAllocator, which sits on top of another Allocator and allows you to malloc large buffers that are doled out in blocks, amortizing the malloc and free calls across the life of a program.
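To make the amortization concrete, here is a simplified sketch of the fixed-block idea. It is not the library's implementation: `blockPool` and its fields are hypothetical, it hands out Go slices rather than C pointers, and each slab is a Go allocation standing in for one large malloc through the underlying allocator.

```go
package main

import "fmt"

// blockPool is a hypothetical, simplified fixed-block allocator.
// Each slab stands in for one large malloc made through a parent allocator.
type blockPool struct {
	blockSize      int
	blocksPerSlab  int
	slabs          [][]byte // large backing buffers
	free           [][]byte // blocks currently available for reuse
	backingMallocs int      // how many large buffers were requested
}

func newBlockPool(blockSize, blocksPerSlab int) *blockPool {
	return &blockPool{blockSize: blockSize, blocksPerSlab: blocksPerSlab}
}

// grow requests one more slab and carves it into fixed-size blocks.
func (p *blockPool) grow() {
	slab := make([]byte, p.blockSize*p.blocksPerSlab)
	p.slabs = append(p.slabs, slab)
	p.backingMallocs++
	for i := 0; i < p.blocksPerSlab; i++ {
		start := i * p.blockSize
		end := start + p.blockSize
		// Cap each block at blockSize so neighbours cannot be overwritten.
		p.free = append(p.free, slab[start:end:end])
	}
}

// Malloc returns a block of the requested size, or false if the request
// is too large to fit in a single block.
func (p *blockPool) Malloc(size int) ([]byte, bool) {
	if size < 0 || size > p.blockSize {
		return nil, false
	}
	if len(p.free) == 0 {
		p.grow()
	}
	b := p.free[len(p.free)-1]
	p.free = p.free[:len(p.free)-1]
	return b[:size], true
}

// Free returns a block to the pool without touching the backing allocator.
func (p *blockPool) Free(b []byte) {
	p.free = append(p.free, b[:p.blockSize])
}

func main() {
	pool := newBlockPool(64, 128)
	for i := 0; i < 1000; i++ {
		b, _ := pool.Malloc(48)
		pool.Free(b)
	}
	// prints: 1000 malloc/free pairs, 1 backing allocation(s)
	fmt.Printf("1000 malloc/free pairs, %d backing allocation(s)\n", pool.backingMallocs)
}
```

In this sketch, a thousand malloc/free pairs cost a single backing allocation. That is the effect the FixedBlockAllocator is after when the backing allocation is an expensive cgo call.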
DefaultAllocator - calls cgo for Malloc and Free.
FallbackAllocator - accepts a FixedBlockAllocator and one other allocator. If the request fits in the FBA, it uses that; otherwise it mallocs from the other allocator. You can use this to fall back on the default allocator for large requests. You could also chain several to set up a multi-tiered FBA, I suppose.
ArenaAllocator - sits on top of another allocator. Exposes a FreeAll method which frees all memory allocated through the ArenaAllocator. ArenaAllocator is optimized for FreeAll, and ordinary frees have a cost of O(N).
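The layering described in this list can be sketched roughly as follows. Everything here is an assumption made for illustration: the slice-based `Allocator` interface, `countingAllocator` (a stand-in for both the FBA and the cgo-backed default), routing frees by slice length, and the linear scan that makes an individual arena free O(N). The library's real types and internals may look quite different.

```go
package main

import "fmt"

// Allocator is a simplified, hypothetical interface that uses Go slices
// in place of C pointers.
type Allocator interface {
	Malloc(size int) []byte
	Free(b []byte)
}

// countingAllocator stands in for a real backing allocator and just
// counts the calls it receives.
type countingAllocator struct{ mallocs, frees int }

func (c *countingAllocator) Malloc(size int) []byte {
	c.mallocs++
	return make([]byte, size)
}

func (c *countingAllocator) Free(b []byte) { c.frees++ }

// fallbackAllocator sends requests of up to limit bytes to small and
// everything else to large.
type fallbackAllocator struct {
	limit        int
	small, large Allocator
}

func (f *fallbackAllocator) Malloc(size int) []byte {
	if size <= f.limit {
		return f.small.Malloc(size)
	}
	return f.large.Malloc(size)
}

// Free routes by length. This shortcut only works because blocks in this
// sketch keep the length they were requested with.
func (f *fallbackAllocator) Free(b []byte) {
	if len(b) <= f.limit {
		f.small.Free(b)
		return
	}
	f.large.Free(b)
}

// arenaAllocator records every allocation so FreeAll can release them all.
type arenaAllocator struct {
	parent Allocator
	live   [][]byte
}

func (a *arenaAllocator) Malloc(size int) []byte {
	b := a.parent.Malloc(size)
	a.live = append(a.live, b)
	return b
}

// Free scans the live list for the block: O(N) in live allocations.
func (a *arenaAllocator) Free(b []byte) {
	if len(b) == 0 {
		return // empty blocks have no address to compare in this sketch
	}
	for i := range a.live {
		if len(a.live[i]) > 0 && &a.live[i][0] == &b[0] {
			a.parent.Free(b)
			a.live[i] = a.live[len(a.live)-1]
			a.live = a.live[:len(a.live)-1]
			return
		}
	}
}

// FreeAll releases everything still live in a single pass.
func (a *arenaAllocator) FreeAll() {
	for _, b := range a.live {
		a.parent.Free(b)
	}
	a.live = a.live[:0]
}

func main() {
	small, large := &countingAllocator{}, &countingAllocator{}
	fb := &fallbackAllocator{limit: 64, small: small, large: large}
	arena := &arenaAllocator{parent: fb}

	arena.Malloc(16)
	arena.Malloc(32)
	big := arena.Malloc(4096) // too large for the small tier
	arena.Free(big)           // individual free: linear scan
	arena.FreeAll()           // bulk free of the remaining two blocks

	// prints: 2 2 1 1
	fmt.Println(small.mallocs, small.frees, large.mallocs, large.frees)
}
```

Because a fallback is itself an allocator in this sketch, another fallback can be passed as the large tier, which is one way to picture the multi-tiered setup mentioned above.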
Are these thread-safe?
The DefaultAllocator is! And as slow as cgo is, it’s still far faster than any locking mechanism in existence, so if you need thread safety, that’s what you should use.
What’s the performance like?
In terms of memory overhead, it’s kind of bad! I use a lot of maps and slices to track allocated-but-not-freed data. In terms of speed:
    BenchmarkDefaultTemporaryData-16        12792590     94.58 ns/op
    BenchmarkDefaultGrowShrink-16           11286946    104.7 ns/op
    BenchmarkFBATemporaryData-16           123561244      9.714 ns/op
    BenchmarkFBAGrowShrink-16               64682006     34.83 ns/op
    BenchmarkMultilayerTemporaryData-16     72288720     17.06 ns/op
    BenchmarkMultilayerGrowShrink-16        48367983     35.78 ns/op
    BenchmarkArenaTemporaryData-16          40963460     29.24 ns/op