Dynamic Pools in Go

Recently, I wanted to make use of the pool pattern, which is generally pretty simple in Go. Specifically, however, I wanted to be able to dynamically cap the level of concurrency for any given set of tasks submitted to the pool.

To clarify, let’s say we have a pool that consists of N workers. For any given job A, consisting of tasks a_1, a_2, …, a_n, we want no more than k of the n tasks in A to run concurrently, where k ≤ n and k ≤ N. My use case is a system to test HTTP resources. Each job might be a specific set of endpoints. I might want to hit some more “gently” than others, hence the need to dynamically cap the level of concurrency.

Having worked with the Erlang ecosystem a bit, I really like the idea of passing messages between independent “processes”. This is a very natural and fairly simple abstraction.

Go is a little different, though. In Erlang you create a process and then pass around its PID, which can be used to send it messages (like having its address). In Go, a goroutine (which we can think of as a process) is decoupled and independent (although shared mutable state is still possible).

In order to communicate between goroutines, Go makes use of channels, which are like pipes or queues and can be one- or two-way. This means that if you want to spawn a goroutine and then pass it messages, you need to give it a reference to the channel you plan to use.

I’ve included a very simple example below (note that there is a race condition in this code; it doesn’t matter, because the point is to illustrate how channels work). In this case I have made the channel accessible to the goroutine using a closure, but I could also have passed it into the function.

https://gist.github.com/glesica/96c398c83f2648c6eed9
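
If you’d rather not click through to the gist, here is a minimal sketch of the same idea (my own code, not the gist verbatim; I’ve added a `done` channel so this version avoids the race mentioned above):

```go
package main

import "fmt"

func main() {
	// The messages channel is captured by the closure below, so the
	// goroutine can receive on it without it being passed as an argument.
	messages := make(chan string)
	done := make(chan struct{})

	go func() {
		// Receive until the channel is closed, then signal completion.
		for msg := range messages {
			fmt.Println("got:", msg)
		}
		close(done)
	}()

	messages <- "hello"
	messages <- "world"
	close(messages)
	<-done // wait for the goroutine to finish printing
}
```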

This basic pattern can be used to construct a goroutine pool. We can spawn several goroutines that listen on a channel until they receive a task, complete it, send the result back through another channel, then start listening again. They’ll stop listening when the channel is “closed”. We can use a WaitGroup, which is similar to a semaphore, to make sure we don’t move on until all the workers are finished.

https://gist.github.com/glesica/7db5e0308589dbfe149a
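
Roughly, that pool looks something like the sketch below (my own code and names, not the gist itself). Closing the results channel from a separate goroutine, once the WaitGroup count drops to zero, lets main simply range over the results.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	const numWorkers = 4

	tasks := make(chan int)
	results := make(chan int)

	var wg sync.WaitGroup
	wg.Add(numWorkers)

	// Spawn the workers. Each one pulls tasks until the tasks channel
	// is closed, sending each result back through the results channel.
	for i := 0; i < numWorkers; i++ {
		go func() {
			defer wg.Done()
			for t := range tasks {
				results <- t * t // stand-in for real work
			}
		}()
	}

	// Close the results channel once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	// Feed in some tasks, then close the channel so the workers stop.
	go func() {
		for i := 1; i <= 10; i++ {
			tasks <- i
		}
		close(tasks)
	}()

	for r := range results {
		fmt.Println(r)
	}
}
```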

This is great except for the fact that, given a set of tasks as described above, we might execute up to N of them concurrently depending on the workload of our pool. We need a way to group tasks together into what I called “jobs” above.

One solution (there may be others) is to take advantage of the fact that channels in Go are themselves just values, so they can be passed through other channels. Instead of workers that pull jobs from a shared queue, they can pull queues (channels) from a shared queue (channel).

https://gist.github.com/glesica/2c4ddcc5c9e71cba442b

Note that now our jobs channel is a channel of channels of integers. So before we submit the tasks associated with a particular job, we decide how many of the workers may work on these tasks concurrently and we submit the task channel that many times. Then we feed the tasks into the task channel and, at most, that many workers receive our tasks.
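
Here is a rough sketch of that idea with a single job capped at two concurrent workers (again, my own code and names, not necessarily those in the gist):

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	const poolSize = 4

	// Each value received from the jobs channel is itself a channel of tasks.
	jobs := make(chan chan int)
	results := make(chan int)

	var wg sync.WaitGroup
	wg.Add(poolSize)

	for i := 0; i < poolSize; i++ {
		go func() {
			defer wg.Done()
			// Pull task channels off the shared jobs channel and drain
			// each one. A worker that grabs an already-closed task
			// channel simply moves on to the next one.
			for taskCh := range jobs {
				for t := range taskCh {
					results <- t * t // stand-in for real work
				}
			}
		}()
	}

	// Close the results channel once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	go func() {
		// One job with a concurrency cap of 2: submit the task channel
		// twice so that at most two workers can receive from it.
		const limit = 2
		taskCh := make(chan int)
		for i := 0; i < limit; i++ {
			jobs <- taskCh
		}
		for t := 1; t <= 10; t++ {
			taskCh <- t
		}
		close(taskCh)
		close(jobs)
	}()

	for r := range results {
		fmt.Println(r)
	}
}
```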

A couple of caveats are in order. First, it is perfectly possible that fewer than the maximum number of workers will process the tasks if the rest are busy. In this case a worker will grab a task channel that has already been closed and immediately discard it. For this reason, this strategy might not be the best for long-running pools (eventually you could end up with a lot of closed channels in your queue, maybe that causes a problem for you, maybe it doesn’t).

Another thing to note is that each job (group of tasks) now requires its own channel. This might not be great for situations where each job is quite small and there are many jobs.

In any event, you can play around with the code in the gist and see for yourself that it works. Change the “1” on line 26 to a “3” and you should notice that the results come back mixed up instead of in order.

Image credit: Thomas Hawk


Emacs.

The other day I opened up Vim and a bunch of formatting was messed up and things weren’t refreshing properly. Some update had probably broken something. Then I realized that my Vim config was a massive mess (you never realize stuff like that until something breaks).

I’d intended to switch to Emacs eventually; it had been kind of an elaborate dance, but I had always suspected I would end up there. I really like the idea of Lisp, and I think using Emacs is actually one of the better ways to get comfortable with it. Plus, it’s a decent editor, or so I hear.

So now, perhaps sooner than expected, I am an Emacs user (since a couple weeks ago). I’ve got several friends helping me out and providing suggestions, and I’ve already got quite a bit of useful stuff set up. My configuration is on GitHub, because why not?

So far I am quite pleased, but wow, this is going to be a long, interesting journey.

Image credit: XKCD: Real Programmers

Maintenance

I started on a new project last week. I started the way I usually do, with a sketch of how it should work, inputs, outputs, and a general idea of the data flow. Then I did a rough prototype. Progress was pretty fast, partly because the concept is simple, and partly because I wrote something similar for a past employer. After hacking on it over the weekend, I spent yesterday doing “cleanup” to get it ready for actual use. Yesterday afternoon I realized that, despite getting quite a bit done, I felt I had hardly made a dent in my “todo” list for the project.

This made me ponder. I feel as though the speed at which I can complete a given project has fallen since I was a kid. I remember being 13 or so and starting something after dinner, staying up all night, and having a working application in the morning. Has something changed since then? Or is it just my perception or poor memory?

On the one hand, the “quality” of the code I write today is much higher. For instance, when I was a kid I saw no problem with using a text field as the canonical storage for a piece of data. I also remember a lot of deeply nested branching statements and extremely long functions. Code quality certainly explains some of the “slowdown” I perceive.

But there is more to it. It isn’t that I wrote crappy code as a kid because I had no choice. In some cases I actually knew better, and I certainly had no shortage of books from which I could have learned the rest. As I thought about my younger self during my walk to the metro station last night, I realized that the greatest difference between my younger self and my present self is that my younger self didn’t expect to have to maintain any of the code he wrote. Once it was finished, it was finished, and I moved on to the next project (in a way, this reflected the prevailing software release cycle of the time; I’m not sure whether that influenced me).

Today, I expect to have to maintain the code I write. Every time I write a line I unconsciously consider whether I’ve just written a check I’ll be asked to cash later. This means I rewrite more lines than I used to, or take longer to write them the first time (more thinking, less coding). It also means that I test and document more.

Oh well, back to (slowly) writing code.

To Mac or not to Mac

I have been a loyal GNU/Linux user since Ubuntu 5.04 (side rant: I have no idea what stupid animal name it had, and it drives me crazy that people insist on referring to releases by their codenames). Over the years I owned two ThinkPads, a T61 and then later a T430s. I bought ThinkPads because they would “Just Work” with virtually all GNU/Linux distributions.

Recently, however, when it came time for a new laptop I bought a Mac and switched to OSX. I made this choice for three reasons.

First, the quality of the ThinkPad hardware, at least for my purposes, has been falling. You might have noticed, if you’re familiar with ThinkPad model numbers, that I had my T61 for quite a few years, but the T430s is still only one generation old. Why did I replace it so soon? It turned out that if you spill even a tiny amount of liquid (a few drops, caused by dropping a cookie into some milk) in the right place on a 430 series ThinkPad, the trackpad and the TrackPoint device will stop working, permanently. In fact, if you don’t then disable their drivers in the kernel, you can’t even use the keyboard reliably. To me, this is the result of poor design. I had my T61 for so many years because it stood up to the occasional minor accident.

The second reason I bought a Mac, and this might be the most important, is the battery life. Back when I used a desktop computer I didn’t care much about power efficiency. When I started using a laptop, it was such a step up that plugging in everywhere I went didn’t really bother me. But more recently I found myself frustrated that I was basically tied to the nearest outlet everywhere I went. A MacBook is effectively a giant battery with a computer strapped to it, and that’s just fine with me.

Finally, screen quality played a role in my decision. Back when I bought my T61, pretty much all laptops had dim, washed-out screens. But I expected better by the time I bought my T430s. Unfortunately, Lenovo didn’t deliver. Many, many years ago I owned a Toshiba Satellite with a passive matrix display (the kind where the mouse pointer would get “lost”). I didn’t mind because it was a laptop and that was basically the coolest thing in the world. But my eyes aren’t what they once were, and I actually have real work to do now, so fiddling with (and squinting at) a laptop display is no longer on my list of acceptable activities.

I hope to return to GNU/Linux at some point in the future. But until the hardware ecosystem works itself out, I’ll be sticking with a Mac.

Skip Properties When Deserializing POJOs

A commonly requested feature in Jackson is the ability to “skip” certain properties during deserialization, but include them during serialization. This might be desirable in the case of a computed property.

It might also be desirable if certain properties should never be read from a user request for security or other reasons, but may need to be returned to the user as part of a response. This could be true for several field types such as a user ID, an API key, or, as previously mentioned, a computed field.

Since it appears that the feature won’t be making it into the library anytime soon, I found a reasonably clean way of doing it. There may be other ways to achieve this behavior, mind you; I have only recently begun using Jackson (and Java, really, at least for significant projects), so I’m definitely no expert.

https://gist.github.com/glesica/dc5a1e2059fe5fa9a0b2

The RequestJson class may then be used for deserialization without worry and the ResponseJson class may be used to return the extra properties to the user. In some sense, this is the whole point of inheritance. Requests do not include certain properties (like a user ID), but responses do, in addition to the request properties. I was a bit surprised that in several minutes of Googling, I didn’t see this solution suggested.