Never Complains

Occasionally I hear someone compliment a software developer by observing that the individual never complains, even when things get ugly. Now, I realize that sometimes things happen that are beyond our control. I also realize that complaining about things we can’t change, while sometimes cathartic, is almost never materially helpful. I also realize that too much negativity can reduce the effectiveness of a team and its members.

All that being said…

Another way to spell “complaint” is “feedback”, and we know that feedback can be helpful. We deliberately solicit feedback from users (and quite often what we get are actually complaints). So why wouldn’t we encourage feedback (even if it sometimes rises to the level of complaints) from our teammates?

When we encourage one another to keep silent about problems in order not to be seen as “complaining” we can miss opportunities to improve our development processes and team dynamics. We may also miss out on potential process innovations that could improve life for ourselves and our users.

So, rather than encourage silent acceptance of whatever might occur, try to promote constructive complaints (feedback) among your teammates.

Talking about Kotlin

A couple of weeks ago I gave a talk about Kotlin to a Montana Programmers / Missoula GDG meetup. It seemed to go over rather well, so I’m linking the slides here.

I think the most interesting takeaway I had was that despite the fact that Kotlin interfaces with Java pretty much seamlessly, you can’t let them mingle too much or you’ll give up the null-safety you would otherwise get from Kotlin, and you definitely don’t want to give it up.

Privilege for computer scientists

As I was walking home from work a few nights ago I was thinking about privilege. I was also thinking about the AI Google built that beat a human champion at Go. Most algorithms that play Go use some form of the Monte Carlo tree search (MCTS) algorithm. The Google algorithm is no exception, though MCTS is only a relatively small part of it.

I know a little bit about MCTS, having read some of the papers on it and implemented it in school (my AI played Connect Four). MCTS is generally most applicable on game trees with a finite depth. In other words, the games must definitely end at some point. This is not true of games like chess or checkers where, in theory, the game could go on forever if the players repeatedly make neutral moves (like moving two pieces back and forth forever).

The reason for this is that MCTS works by choosing moves at random for both players until the game ends, then recording who won. This process is repeated many times (usually until a computational or time budget is exhausted), at which point the move with the best simulated results is chosen. Obviously, the random games must be guaranteed to come to an end or the algorithm wouldn’t work very well.

MCTS operates based on the density of winning outcomes on a particular branch of the game tree. If move A at a particular point in the game results in a win (when random moves are chosen) 70% of the time, and move B results in a win 45% of the time, the algorithm will choose branch A (in reality it is a little more complicated than this, but the idea is the same).
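To make the "density of winning outcomes" idea concrete, here is a minimal sketch in Python. It is not full MCTS: there is no tree and no smarter selection rule, just random playouts from each candidate move (sometimes called flat Monte Carlo search). The game is a toy version of Nim that I picked only because it is guaranteed to end, and every function name and parameter here is made up for illustration.

```python
import random


def legal_moves(stones):
    """Moves available in a toy Nim game: take 1, 2, or 3 stones."""
    return [n for n in (1, 2, 3) if n <= stones]


def random_playout(stones, player_to_move, rng):
    """Play random moves until no stones remain and return the winner (0 or 1).

    Whoever takes the last stone wins. The pile shrinks on every move,
    so the playout is guaranteed to terminate.
    """
    player = player_to_move
    while True:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return player
        player = 1 - player


def win_rates(stones, player, playouts, rng):
    """Estimate, for each legal move, how often `player` wins random playouts."""
    rates = {}
    for move in legal_moves(stones):
        remaining = stones - move
        if remaining == 0:
            # Taking the last stone wins immediately; no simulation needed.
            rates[move] = 1.0
            continue
        wins = sum(
            random_playout(remaining, 1 - player, rng) == player
            for _ in range(playouts)
        )
        rates[move] = wins / playouts
    return rates


def choose_move(stones, player=0, playouts=2000, seed=0):
    """Pick the move whose branch has the highest density of simulated wins."""
    rng = random.Random(seed)
    rates = win_rates(stones, player, playouts, rng)
    return max(rates, key=rates.get), rates


if __name__ == "__main__":
    move, rates = choose_move(stones=10)
    print("estimated win rates:", rates)
    print("chosen move:", move)
```

With three stones left, for instance, taking all three wins outright, taking two always loses, and taking one wins about half of the random playouts, so the sketch picks the first branch. A real MCTS implementation would also grow a tree and spend more of its budget on the promising branches.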

This is pretty much how “privilege” works in real life. At any given node in the tree representing all the decisions each of us makes in our lives, there is some probability that a particular choice will lead to a good outcome. In other words, each branch has a particular density of good outcomes.

Privilege, then, is when a person has a higher density of good outcomes on all of his or her branches.

For example, I read recently that a child with wealthy parents who does not attend college is over twice as likely to end up wealthy as a child with poor parents who does attend college. So when young people decide whether or not to attend college, those with rich parents face better probable outcomes across the whole range of choices. That, in a nutshell, is privilege.

Grouping Tabs

This evening I wrote and uploaded my first Chrome extension (well, technically I wrote one a few years ago, but I never really finished it).

What does it do? It lets you group related tabs together and keeps them grouped together.

Why would anyone want to do this? At any given moment while I’m at work I’m monitoring at least two or three pull requests. I try to keep their corresponding tabs grouped together for easy access, but inevitably they become lost among the 20-30 other tabs I have open.

I could pin them, but that changes the semantics of the tabs themselves and hides the title (even when the title would otherwise be visible). So I have my email and calendar pinned, because I never close those. But I wanted an intermediate state for things like pull requests. Enter “Pseudo Pins”.

Pseudo Pins allows the user to specify one or more regular expressions, which are then matched against the URLs of the tabs in each window. Tabs matching a given expression are pulled to the left and grouped together. The leftmost tabs then correspond to the first regular expression in the list, and so on rightward. The list of expressions is persisted across browser sessions (and will sync across devices if Chrome is set up to do so).
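The ordering rule is easy to sketch outside the browser. The snippet below is Python rather than the extension's actual code, and it makes a few assumptions the description above doesn't settle: a pattern matches if it is found anywhere in the URL, a tab that matches several patterns goes with the first one, and tabs keep their relative order within a group. The function name and the example URLs are hypothetical.

```python
import re


def pseudo_pin_order(urls, patterns):
    """Return tab URLs reordered so that pattern matches come first.

    Tabs matching the first pattern are leftmost, then tabs matching the
    second pattern, and so on; tabs matching nothing stay at the end.
    """
    compiled = [re.compile(p) for p in patterns]

    def group_index(url):
        # Index of the first pattern that matches (assumption: first match wins).
        for i, rx in enumerate(compiled):
            if rx.search(url):
                return i
        return len(compiled)  # unmatched tabs sort after every group

    # sorted() is stable, so tabs keep their relative order inside each group.
    return sorted(urls, key=group_index)


if __name__ == "__main__":
    tabs = [
        "https://news.example.com/",
        "https://github.com/acme/app/pull/12",
        "https://mail.example.com/",
        "https://github.com/acme/app/pull/7",
        "https://ci.example.com/build/3",
    ]
    patterns = [r"github\.com/.+/pull/\d+", r"ci\.example\.com"]
    for url in pseudo_pin_order(tabs, patterns):
        print(url)
```

Because the sort is stable, two pull request tabs end up next to each other in the same order they were opened, which is the behavior I'd want from something that is supposed to feel like a softer version of pinning.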

The GitHub repo is here if you are interested:

Reproducible Research

According to this article, two economists attempted to reproduce a number of economics papers that were published in top journals. They were unable to do so in most cases, even when they enlisted the original authors to help.

This result didn’t shock me even a little bit. When I wrote my MA thesis in economics I wanted to employ a particular and rather obscure statistical technique. I couldn’t find a single book on statistics or econometrics that contained a full description of the technique; it was apparently rather specific to the sub-sub-field in which I was working.

Over a dozen papers claimed to have used it or otherwise discussed it, but zero contained an actual description of what was done to the data to bring about the result.

I finally found a proper description in a master’s thesis from someone at a university in Sweden (if I recall correctly) whose adviser had apparently just happened to know what was being done to the data and who had actually taken the time to describe it. The thesis was never peer-reviewed (although the student had apparently graduated successfully, so I felt comfortable relying on it). So while I still had to implement it myself and verify my results, at least I knew where to start.

The situation is even worse given that it is fairly rare (in my admittedly limited experience) for authors in economics (or other social sciences) to publish their original datasets and (perhaps even more importantly) the code that they ran to do their analyses. I suspect that many couldn’t even if they wanted to due to a reliance on tools like Excel and SPSS that do not lend themselves to replicability without significant extra effort.

This is not to say that economists are evil or that there is some kind of conspiracy (although some are evil, and there are almost certainly “conspiracies”; the replicability problem just isn’t evidence of them).

Part of the problem, I think, is that many, or even most, economists never learn about tools they could use to do a better job at promoting replicability. Version control (Git, Subversion) and tools like GitHub or self-hosted alternatives (why don’t universities run these for their faculty?) are a great start. Using proper statistical languages and doing 100% of the analysis in code, not in “interactive mode”, would help as well.

However, the real key, in my opinion, is for people to get comfortable working out in the open. I publicly publish virtually every line of non-trivial code I write. A lot of it is complete garbage, but I publish it because there is simply no reason not to do so. I’m writing my computer science thesis entirely in the open, from the very first paragraph.

I do realize, of course, that the stakes for me are very, very low. Academia is not my career, so I don’t worry about getting “scooped”, or about being attacked by a colleague with a vendetta. But that just means that some people might want to employ a self-imposed embargo before releasing their work. Wait until your grant runs out, or until the paper is actually published, and then put everything online (and no, simply dumping a PDF on arXiv doesn’t count; do it right). I honestly believe that every field would be better off if this were the norm rather than the happy exception.

As an aside, for anyone interested, Roger Peng, a biostatistician at Johns Hopkins, has an excellent Coursera course on reproducible research. I watched some of the lectures and it seemed like a great course on an important topic. As a bonus, Dr. Peng is a fantastic lecturer; his courses on the R programming language are also top-notch (and accessible enough for reasonably bright social science students).

Image credit: Janneke Staaks