This blog post is a reflection on our recent experience of porting a reasonably large (~30KLOC) JRuby application to Google Go, covering the many things we liked about the language and ecosystem, and the couple of things I found grating about it.
TL;DR
It was a good experience, Go is generally very nice, and we are all comfortable that this was the correct decision for many reasons. I'd recommend that anyone curious about Golang give it a try; in particular, commit to persisting through the first two weeks, where you will most likely find some of Golang's ideologies grating (like "why is every line an if statement?").
Why Go?
At UpGuard we are pushing towards agentless deployments. Previously, there was an agent installed on every machine, and the agent would just scan the machine it was running on. Agent performance was never a concern because there was never a resource bottleneck. Our future deployments will involve a small number of 'connection managers': agents that SSH into target nodes to scan them remotely. In this scenario, there will be many concurrent SSH connections taking place. The critical problem for us was that the Ruby SSH library is not thread safe when operating on multiple SSH sessions per SSH channel, which we rely on heavily for performance when scanning remote nodes. So we could not achieve our goals using threads within Ruby. One possibility is process separation rather than thread separation, which has many other advantages anyway, but one of the costs of JRuby is the billions of gigs of ram that anything in the JVM requires. If we use process isolation, we must instantiate multiple JVMs, and napkin math yields saddening conclusions quickly. 8 processes at a billion gigs of ram each is 8 billion gigs of ram, which is more than my laptop currently has installed, and is awkward to write down in requirements documents.
So a decision was made to re-write. That's never an easy decision–it means there will be a time when no new business value is being generated as we work to attain feature parity–but now that we've completed the project, we're satisfied that we have made the correct decision. After several months immersed in Golang, I now have a 90% good / 10% bad view of it. Prior to this project I had never used Go before.
Great Aspects
What was great about Go for us? In approximate order of importance:
- It is approachable for the whole team. Competent young programmers can pick up Go without having seen it before and be productive within hours. Go guides you towards writing good code with appropriate error handling. There aren't many bad defaults that result in crap. Little discipline/training/experience is needed to start doing the 'right' thing compared to most other languages I have encountered so far.
- Go handles growing pain well, through both code size and developer count. The code remains 'nice' through crank-the-handle feature development. Large codebases with many developers tend to grow accretively; people very rarely want to pull the rug out and 'fix all the shit.' Go handles both well. The type system and good built-in testing allow one to fix major chunks and have the program keep ticking afterwards.
- It has good libraries for JSON, XML (XML still hasn't died yet unfortunately), SSH, database connectivity, and a variety of other things. The libraries are well designed; they compose very well. For example, something used frequently for us is the Dial function provided by an SSH session (basically ssh -L), which has satisfied a particular need for our application very elegantly. It is easy to then attach net/http, connect to remote databases through the forwarded port, etc.
- Good language level support for concurrency & parallelism. Typed channels built into the language are a wonderful feature. Writing parallelized solutions becomes an interesting design challenge and the implementation goes smoothly. The language quietly helps you to realize your implementation, which usually just works after only 20 attempts. Threading in many other languages is a scary and burdensome experience, where I'm usually left worrying about what I've forgotten and exactly how it is going to explode later. Go is one of the more pleasant languages I have encountered so far for multithreaded code (albeit with two glaring deficiencies... more on this later), and certainly the most pleasant of the mainstream languages that I have used: C/C++/C#/Java/Python/Ruby/Perl/Scala.
- Garbage collection. None of the usual anti-GC arguments apply to our use case (or to 95-99% of software for that matter): our application is not real-time, not a game engine, not running on embedded systems, not a kernel, the primary performance bottleneck will be network i/o wait so using C/C++ will not yield a substantial performance gain anyway.
- Go is adequately mature and stable, and there is an active community continuing to push the libraries forwards; the language doesn't seem to be moving anymore. Google appears to be committed to the continued improvement of golang.
- Simple distributables. Go emits a self-contained binary that requires only libc, libpthread and some other standard things. (Well.... libxml as well for us, and some other database libraries). We can make a complete docker image, with the agent + all prerequisites in less than 20MB.
- The Go tools are all great: gofmt, go test, go yacc. goxc was fun whilst it lasted, but doesn't work when using extra libraries (libxml (WAY TO RUIN EVERYTHING, XML)).
- Bonus points for at least some cool factor, which will hopefully attract talented engineers.
- It's genuinely exciting to create a new cool shiny thing. This kind of excitement yields long term rewards for business value, because it helps to keep developers motivated and enthusiastic in all timeframes. Constantly band-aiding shitty systems is a bizarre torture that management sometimes inflicts upon teams, which I've never quite understood.
My personal, bike-sheddy, highly opinionated (correct) list of nice things about go includes:
1. Strong static type safety is the correct choice for all programs > ~5KLOC.
2. Native > interpreted
3. Good language level support for multi-threading and multi-process IPC.
4. Bonus points for not being JavaScript
5. Bonus points for not being Java
6. Bonus points for not being node.js (see points 1, 2, 3, 4, and 6)
7. Bonus points for avoiding semantic whitespace (and global interpreter locks (and an attitude that a good FFI makes up for language deficiencies (Hi Tim!)))
Go satisfies all of these points. Especially point 6.
Poor Aspects
What did we find bad about it? (Gratuitous rant warning...)
- IDE support is not quite what other languages have. MSVC is still hard to beat. RubyMine was very nice for the older codebase. Java has a lot of excellent tooling (something needs to auto-generate all the repetitious code & interface stubs...). Go isn't in the same league as these languages, but it is a much younger language too. I'm confident that awesome things will appear soon.
- Go's ideologies are grating at first. 2/3 the code is if blocks for any error that might occur, although I'm not sure what kind of application would have more potential failure paths than what we're building. Eventually this isn't annoying anymore, but when coupled with the next issue...
- No -Wshadow. This has caused problems for us many times. Introducing new variables that shadow older ones can be problematic, particularly with functions that return multiple values. It is too easy to refactor something and clobber a pre-existing variable in some inner loop. -Wshadow is a much safer default. Perhaps some controlled de-scoping of a variable would be nice, or an operator that lets a variable persist outside an if statement, like this (hypothetical syntax):
```
if $resultToKeep, err := someOperation(foo); err != nil {
	// error
}
```
resultToKeep remains in scope but err doesn't. The alternative is linear or cascaded if/else statements, which is ugly, and falls apart as soon as you need to add a loop or something. One heuristic I would like to try: small names (tmp, err, ok) are allowed to persist and clobber each other, but variables longer than 3 characters may not clobber. I'll bet that 'i', 'err' and 'ok' are the most common variable names in most codebases.
- Parametric polymorphism. Generics are good. Interfaces are NOT generics. Casting everything through interface{} is dumb. Many times I have wanted generics to create small container types around other arbitrary things. Often it is possible (through great refactoring and no net gain) to achieve what I want with interface{} somehow, but the resultant code is far less clear than generics would have been. Generating Go code from templates is one possibility, but it would require crap like:
```
Blah<T> {
	Data T
}
```
plus, in the build system:
```
make_me_a_Blah_for(int)
make_me_a_Blah_for(string)
make_me_a_Blah_for(ArbitraryComplexTypeThatActuallyExistsInPractice)
```
- Mutable non-range limited slices. This one can be really dangerous, particularly in multithreaded apps. Consider the code fragment:
```go
array := []string{"path", "with", "a", "glob", "**", "in", "it"}
pathBeforeGlob := array[:4]
pathRemaining := array[4:]
recurseThroughGlobs(pathBeforeGlob, pathRemaining)

// somewhere in recurseThroughGlobs:
filesInThisDirectory := ...
for _, file := range filesInThisDirectory {
	if thisDirectoryLooksInteresting(file) {
		recurseThroughGlobs(append(pathBeforeGlob, file), pathRemaining)
		// BANG!
	}
}
```
What actually happens here is that pathBeforeGlob still points into the backing array it was sliced from: its length is 4, but its capacity extends over the rest of that array. append sees spare capacity and writes into it, so appending to pathBeforeGlob overwrites pathRemaining's elements in place.
Shit. Go knows that pathBeforeGlob was range limited somehow. Rather than corrupting the pathRemaining slice, copy-on-append-past-end-of-range would be a much, much safer default.
General Unstructured Complaining
There have been a few decades of language progress since C that were seemingly ignored... no operator overloading (except for string, which is apparently important enough to deserve it). Nil (not null, the spelling has been ...upgraded?) pointers are no better than they were back in C/C++/Java land 20 years ago. Some syntactic sugar to allow list comprehensions (map, filter etc) would be extremely helpful. Tim Sweeney (who is a god of sorts) produced a slide deck in 2006 with one snippet that I found particularly interesting:
"For loops in Unreal:
40% are functional comprehensions
50% are functional folds"
Guess which codebases other than the Unreal engine this is true in? ALL OF THEM. The golang justification seems to be that 5 lines to declare a new list, iterate, perform a transform, and write each result into the new list is not much. And as a one-off or a ten-off, it isn't much. But there are hundreds of these in our (...every) codebase, all adding a paragraph of mess and potential bugs. Various other 'programming good practices' like separating concerns and passing minimal amounts of information do have a tendency to encourage constructs like map & filter. The very dangerous alternative is to just pass big objects around, which results in a more tightly coupled (a.k.a. worse) codebase.
Even without language level support for map/filter/folds, generics would go 99% of the way towards solving this because I could just implement this:
```
// pseudocode: <InputType> and <OutputType> are the type parameters Go lacks
func Map(inputList []<InputType>, fn func(<InputType>) <OutputType>) []<OutputType> {
	outputList := make([]<OutputType>, len(inputList))
	for i, elt := range inputList {
		outputList[i] = fn(elt)
	}
	return outputList
}
```
But I can't do this either.
And for the grand ranting finale.... the **worst** thing about Go (even worse than the corruptible slices): lack of const correctness, or any compiler-enforced immutability beyond basic constants. This was a terrible mistake, especially when trying to use concurrency extensively. When using libraries with some complex types, const is a great way to ensure that a particular call graph will not mutate my data structure. It can force some structural separation in the code between mutable and immutable data.
Slices and maps in Go are basically pass by reference. Yes, passing a reference by value is passing a reference. Slices can introduce pointer aliasing, and maps are not thread safe through mutation. Concurrent access that will mutate a map must be protected with mutexes, or serialized through a dedicated mutator goroutine, with a few channels set up for the goroutines to talk amongst themselves. What if the data is immutable? Perhaps there is a routine to prepare a data structure, then several other threads that act on the data once available. What I would like to do is have call graph A prepare the data structure, then pass it to call graph B with a const qualifier. B can spawn as many goroutines as is appropriate, read as much as it wants, never write, therefore access to the data within call graph B doesn't require mutexes or any controlled mutator routines. This basic pattern (prepare rw, use threaded ro) is very common in our codebase. Now later on, someone unfamiliar with the codebase realises that they can support X feature requests with a few small tweaks to the data structure in call graph B. Without language level const operators there is no way to prevent this from happening except for red tape. In C/C++, call graph B would have a const-qualified reference to the data, and we're done. In Go, we just have to hope that:
- Whoever sees the code will understand it, either by reading comments, asking us, or just understanding how several hundred lines of code work at a glance.
- They will have the skill and discipline to do the right thing (add flags to A, mutate elsewhere, restructure graph B such that mutation is all in one place, etc)
What is actually likely to happen: a competent engineer will get a feature request, then have a thought process that resembles: "This looks like an easy place to add it. Few lines here... ok done. Add a unit test, existing tests pass, cool, I'll post it for review, job done". This is a perfectly reasonable approach, but they have actually introduced a bug. And no, unit tests do not reliably capture subtle race conditions (however, I must admit that I need to play with the Go race detector). Three months later, the section starts crashing more regularly, then we burn a week debugging, decide we need to re-write the section because it has race conditions, and possibly introduce a regression after changing it.
What would be infinitely better:
"Hey I want to do X but your const is being a pain in the arse. What am I supposed to do?"
"Put it there or there instead and I'll be all good."
"Ok rad, thanks."
And then there are no landmines left to step on a few months later. Const makes this sort of thing enforceable, basically inflicting better design upon the codebase through certain restrictions. The long term effect is better code, fewer bugs, and higher overall productivity.
Conclusion
Go is a really nice language, with rich, easily composable, well written libraries. It is easier to write a clean, fast, parallel program in Go than in most other languages, and the crucial thing Go does well is to allow a whole team to write a clean, fast, parallel program with relative ease. The pros described here far outweigh my boo-hoo list of why-can't-Go-be-Rust (but seriously? no const?). I don't see this portion of the product requiring a tech refresh for a long time.
Feel free to flame me in comments, but if you don't like const then you're categorically wrong.