Optimizing Real World Go

donio · on Jan 7, 2013

I have just tried cross-compiling this with GOARCH=arm, and the resulting statically linked executable works nicely on Android. I am sure ps_mem can be made to work too, but this is easier and handier.

(Android has some other ways to get this sort of data too but the more tools the better)

laumars · on Jan 7, 2013

I really wish Google released an SDK for writing fully fledged Android apps in Go.

yareally · on Jan 8, 2013

I've been wishing for this for quite a while. Though if they didn't ditch Dalvik as well, it would be more like syntactic sugar than anything. Native code performance on Android is probably wishful thinking for now.

nteon · on Jan 7, 2013

wow, that's super cool!

alec · on Jan 7, 2013

Completely off-topic to optimizing Go, but you may want to look at smem - it does what you implemented (including looking through a subset of processes) but includes a few more useful measures of memory usage that help in the presence of multiple processes from the same binary - http://www.selenic.com/smem/

rartichoke · on Jan 7, 2013

Nice post, I have a question on one piece of it though.

In your final version of splitSpaces() you compute len(b) - 1 in the condition of the for loop.

Does Go compute len(b) - 1 on every iteration, or is it smart enough to hoist it out of the loop condition at compile time?

mseepgood · on Jan 7, 2013

Go slices (and strings) know their length: http://research.swtch.com/godata So it's not a computation, just a struct member access. And len() is a builtin, not a real function call.
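To make the point concrete, here is a hypothetical splitter in the same spirit (it is not the article's splitSpaces, just a sketch). The loop condition re-evaluates len(b) on every pass, and since that is a read of the length word in the slice header, keeping it in the condition costs next to nothing; len(b) - 1 would only add a subtraction.

```go
package main

import "fmt"

// splitSpaces is a hypothetical stand-in, not the article's function: it
// splits b on runs of ASCII spaces. The condition i < len(b) is re-evaluated
// each iteration, but len on a slice just reads the stored length field.
func splitSpaces(b []byte) [][]byte {
	var fields [][]byte
	start := -1 // start index of the current field, or -1 between fields
	for i := 0; i < len(b); i++ {
		if b[i] == ' ' {
			if start >= 0 {
				fields = append(fields, b[start:i])
				start = -1
			}
		} else if start < 0 {
			start = i
		}
	}
	if start >= 0 {
		fields = append(fields, b[start:])
	}
	return fields
}

func main() {
	for _, f := range splitSpaces([]byte("Pss:      12 kB")) {
		fmt.Printf("%q\n", f)
	}
}
```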

donio · on Jan 7, 2013

Isn't bytes.Fields what you were looking for with splitSpaces?
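For reference, a minimal sketch of pulling the number out of an smaps-style line with the standard library's bytes.Fields. The helper name fieldValue and the sample line are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
)

// fieldValue is a hypothetical helper: it returns the number in an
// smaps-style line such as "Pss:   12 kB" by splitting it with bytes.Fields.
func fieldValue(line []byte) (int, error) {
	fields := bytes.Fields(line) // splits around runs of white space
	if len(fields) < 2 {
		return 0, fmt.Errorf("malformed line %q", line)
	}
	return strconv.Atoi(string(fields[1]))
}

func main() {
	n, err := fieldValue([]byte("Pss:                  12 kB"))
	fmt.Println(n, err)
}
```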

wolf550e · on Jan 7, 2013

http://golang.org/src/pkg/bytes/bytes.go?s=6894:6924#L282

It uses `unicode.IsSpace` which might be slower than necessary for this use case (after all, how likely is the proc filesystem to use \u2000?). If this were the bottleneck and the program wasn't IO bound anyway, I bet someone could hand-code something clever that skipped multiple space and/or tab characters at a time.
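One cheap way to avoid the unicode.IsSpace lookup is bytes.FieldsFunc with an ASCII-only predicate. This is a sketch that assumes /proc only ever separates fields with spaces and tabs; isProcSpace and fieldsASCII are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
)

// isProcSpace is a hypothetical predicate that accepts only the separators
// assumed to appear in /proc files (space and tab), rather than
// consulting unicode.IsSpace.
func isProcSpace(r rune) bool {
	return r == ' ' || r == '\t'
}

// fieldsASCII splits line around runs of spaces and tabs.
func fieldsASCII(line []byte) [][]byte {
	return bytes.FieldsFunc(line, isProcSpace)
}

func main() {
	fmt.Printf("%q\n", fieldsASCII([]byte("Swap:\t        0 kB")))
}
```

bytes.FieldsFunc still decodes UTF-8 and calls the predicate through a function value, so any gain should be measured rather than assumed; current Go releases also give bytes.Fields its own ASCII fast path.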

nteon · on Jan 7, 2013

bytes.IndexByte is clever in that way, using SSE instructions: http://golang.org/src/pkg/bytes/asm_amd64.s

I tried using it (https://github.com/bpowers/psm/commit/55bdd3f51c9c61a9247fec...), but it wasn't very helpful, I think because the lines in /proc/$PID/smaps are relatively short.
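A sketch of how bytes.IndexByte can slot into per-line parsing: it locates the ':' that ends the label, and a plain loop handles the padding and digits. valueAfterColon is a hypothetical helper; on short lines like these most of the work is in the loops, which is consistent with IndexByte not helping much.

```go
package main

import (
	"bytes"
	"fmt"
)

// valueAfterColon is a hypothetical helper: bytes.IndexByte (assembly-backed
// on amd64) finds the ':' that ends the label, then a plain loop skips the
// padding and reads the decimal digits that follow. The bool reports
// whether any digits were found.
func valueAfterColon(line []byte) (int, bool) {
	i := bytes.IndexByte(line, ':')
	if i < 0 {
		return 0, false
	}
	j := i + 1
	for j < len(line) && line[j] == ' ' {
		j++
	}
	start, n := j, 0
	for ; j < len(line) && line[j] >= '0' && line[j] <= '9'; j++ {
		n = n*10 + int(line[j]-'0')
	}
	return n, j > start
}

func main() {
	fmt.Println(valueAfterColon([]byte("Private_Clean:        4 kB")))
}
```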

wolf550e · on Jan 7, 2013

IndexByte is overkill. I meant something like reading four bytes into a register, comparing the value of the register to 0x20202020 and skipping four bytes.
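A minimal sketch of that four-at-a-time skip, assuming the padding is plain ASCII spaces. skipSpaces is a hypothetical name. Byte order does not matter here because all four bytes of 0x20202020 are equal, and recent Go compilers typically turn binary.LittleEndian.Uint32 into a single 32-bit load on amd64.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// fourSpaces is the 32-bit word formed by four ASCII spaces (0x20).
const fourSpaces = 0x20202020

// skipSpaces is a hypothetical helper that returns the index of the first
// non-space byte at or after i. It compares four bytes at a time against
// 0x20202020 while at least four bytes remain, then finishes byte by byte.
func skipSpaces(b []byte, i int) int {
	for i+4 <= len(b) && binary.LittleEndian.Uint32(b[i:]) == fourSpaces {
		i += 4
	}
	for i < len(b) && b[i] == ' ' {
		i++
	}
	return i
}

func main() {
	line := []byte("Pss:                  12 kB")
	fmt.Println(skipSpaces(line, len("Pss:")))
}
```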

I see that you're using ReadLine(); it has to read the input and look for "\n". As the person who wrote GNU grep said, avoid splitting the input into lines.
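As a sketch of what "don't split into lines" can look like in Go: read the whole smaps buffer and jump between matches of "\nPss:" with bytes.Index, parsing only the value after each match. sumPss and the sample data are hypothetical. Searching for "\nPss:" (with the colon) also avoids matching lines such as "SwapPss:".

```go
package main

import (
	"bytes"
	"fmt"
)

// sumPss is a hypothetical sketch: it walks a whole smaps buffer with
// bytes.Index looking for "\nPss:" instead of splitting the input into
// lines, and adds up the kB value that follows each match.
func sumPss(data []byte) int {
	key := []byte("\nPss:")
	total := 0
	for {
		i := bytes.Index(data, key)
		if i < 0 {
			return total
		}
		j := i + len(key)
		for j < len(data) && data[j] == ' ' {
			j++
		}
		n := 0
		for ; j < len(data) && data[j] >= '0' && data[j] <= '9'; j++ {
			n = n*10 + int(data[j]-'0')
		}
		total += n
		data = data[j:] // continue searching after this value
	}
}

func main() {
	sample := []byte("00400000-0040b000 r-xp 00000000 08:01 1 /bin/cat\n" +
		"Rss:                  12 kB\n" +
		"Pss:                   8 kB\n" +
		"7f0000000000-7f0000001000 rw-p 00000000 00:00 0\n" +
		"Pss:                   4 kB\n")
	fmt.Println(sumPss(sample)) // prints 12
}
```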

After looking at: http://lxr.free-electrons.com/source/fs/proc/task_mmu.c#L549

I suggest the following: for future-proofing, first read the lines of the first mapping and verify that:
1. The "Pss", "Private_Clean" and "Swap" lines come in this same order.
2. The numeric values do not begin earlier than byte 17 in each line (i.e. "KernelPageSize: " is still there).

If these assumptions hold, go to the fast-path code. If not, use your existing code as the safe code path (and output a warning that since the /proc/*/smaps format changed, your program's code needs maintenance for performance but probably not for correctness).

In the fast path, do not split into lines. Instead, use Boyer–Moore to look for "\nPss:", "\nPrivate_Clean:" and "\nSwap:" in the input (after Pss, look up Private_Clean; after Private_Clean, look up Swap; after Swap, look up Pss). For each match, skip to byte 17 of the line and fast-skip spaces. Then read digits until "\n" and perform the next string search.

If you verify that not only did the order of the lines not change, but no new lines were added between them, you can hard-code the offsets of "\nPrivate_Clean:" and "\nSwap:" from "\nPss:" and skip looking those up. Then you only need to look up the next "\nPss:" (because the file path is variable length).

nteon · on Jan 8, 2013

excellent suggestion, I will definitely try something like that soon.

There are in fact only two variable-sized lines: the first VMA info line and the last VmFlags line. In the fast path, the middle hunk of map info can be accessed as a single []byte of 392 bytes, with constant offsets for the Pss, Private_* and Swap values.

nteon · on Jan 7, 2013

haha, yes that does seem to do what I wanted. Not sure how I missed it. I will test it, but I imagine that since it uses unicode.IsSpace it will be slightly slower.

nteon · on Jan 7, 2013

I've updated the article to mention this, thanks.

willvarfar · on Jan 7, 2013

Lovely! Thank you for sharing. I hope this gets into the standard packaging so it doesn't die unknown.

cmwelsh · on Jan 7, 2013

The viewport is set incorrectly on my iPhone. I can't seem to zoom out either.

nteon · on Jan 7, 2013

sorry to hear that. I believe I've fixed it, but don't have any iDevices to test with.

cmwelsh · on Jan 7, 2013

It works perfectly now, thanks. Great article.