George Lesica

Uhoh

March 3, 2021 George Lesica

GitHub is great, but it’s always nice to have a backup plan in case something goes wrong. That’s where Uhoh comes in. Uhoh lets you back up your GitHub repos quickly and conveniently using, well, Git.

Uhoh queries the GitHub API for a list of repos (which can be filtered by owner and name), then checks its backup location for a clone. If it finds one, then it runs a git pull. If not, it runs a git clone. Either way, you end up with a backup copy of your repos and a tiny bit less to worry about.
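The clone-or-pull decision described above can be sketched roughly as follows. This is not Uhoh's actual code: the Repo shape, the planBackup function, and the directory layout (one folder per owner/name under the backup root) are all assumptions made for illustration, and the sketch only builds the Git commands rather than running them.

```typescript
// Hypothetical shape for one entry in a repo listing.
interface Repo {
  owner: string;
  name: string;
  cloneUrl: string;
}

// Decide, for each repo, whether to clone it or pull into an existing clone.
// `existing` holds the "owner/name" paths already present in the backup location.
function planBackup(repos: Repo[], existing: Set<string>, root: string): string[] {
  return repos.map((repo) => {
    const key = `${repo.owner}/${repo.name}`;
    const dir = `${root}/${key}`;
    return existing.has(key)
      ? `git -C ${dir} pull`
      : `git clone ${repo.cloneUrl} ${dir}`;
  });
}

// Made-up repos: one already backed up, one not.
const plan = planBackup(
  [
    { owner: "octocat", name: "hello", cloneUrl: "https://example.com/octocat/hello.git" },
    { owner: "octocat", name: "world", cloneUrl: "https://example.com/octocat/world.git" },
  ],
  new Set(["octocat/hello"]),
  "/backups",
);
console.log(plan.join("\n"));
```

Keeping the decision separate from the code that actually runs Git makes the logic easy to check without touching the network or the file system.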

Check out Uhoh on GitHub: https://github.com/TravisWheelerLab/uhoh. The README contains basic instructions for use. See the releases page for pre-compiled downloads.

Explaining Inheritance

November 16, 2020 George Lesica

I like to use metaphors to explain things. Often, they aren’t very good. But once in a while I come up with something that seems to work pretty well.

Inheritance in object-oriented design isn’t the easiest thing to explain, particularly the idea of an “abstract” class. But if we think about classes as literal “buckets” of state, then what’s the equivalent of an abstract class? What about a bucket with a hole in it: a leaky bucket?

A leaky bucket isn’t very useful on its own; you have to plug the hole in one way or another. Depending on how you plug the hole, however, you’ll get different outcomes. Plug the hole with a hose and it can help you water your garden. Plug the hole with a piece of screen or cloth and you’ve built yourself a filter. Plug the hole with a bit of waterproof tape and, well, now the leaky bucket is just a bucket.

Each of these tools requires holding or collecting water, like the shared data and / or behavior present in an abstract class. But each of them requires its own, separate, implementation in order to work properly; this is the specialization of a concrete class.
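For readers who prefer code to buckets, here is one way the metaphor might look in TypeScript. The class and method names are invented for this sketch; the abstract atHole method plays the part of the hole, and each subclass plugs it differently.

```typescript
// The leaky bucket: it knows how to hold water, but not what happens at the hole.
// Writing `new LeakyBucket()` is rejected by the compiler because the class is abstract.
abstract class LeakyBucket {
  protected liters = 0;

  // Shared state and behavior: every bucket collects water the same way.
  fill(amount: number): void {
    this.liters += amount;
  }

  // The "hole": each concrete bucket must decide how to plug it.
  protected abstract atHole(water: number): string;

  // Shared behavior that relies on the plugged hole.
  use(): string {
    return this.atHole(this.liters);
  }
}

class HoseBucket extends LeakyBucket {
  protected atHole(water: number): string {
    return `watered the garden with ${water} liters`;
  }
}

class FilterBucket extends LeakyBucket {
  protected atHole(water: number): string {
    return `filtered ${water} liters`;
  }
}

class TapedBucket extends LeakyBucket {
  protected atHole(water: number): string {
    return `holding ${water} liters`;
  }
}

const buckets: LeakyBucket[] = [new HoseBucket(), new FilterBucket(), new TapedBucket()];
for (const bucket of buckets) {
  bucket.fill(5);
  console.log(bucket.use());
}
```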

Your mileage may vary, but I’ve found this to be a useful metaphor when I explain inheritance to students.

Automatic UML diagrams for Dart

May 25, 2019 (updated June 3, 2019) George Lesica

Most of the code I write at work is in the Dart programming language. Specifically, I’ve worked on some internal tooling that involved the static analysis libraries that ship alongside the language SDK. So when a colleague mused that he would like to be able to automatically generate UML diagrams for some of our libraries, I jumped on it. The result was the DCDG package and tool.

The first step is to install it:

$ pub global activate dcdg

Run it like this:

$ dcdg --help

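You may need to add Dart’s package directory to your path before this will work. Otherwise, you can use pub global run dcdg instead. (On recent Dart SDKs the pub commands are invoked as dart pub, for example dart pub global activate dcdg.)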

To use the tool, navigate to a Dart package on your file system and run dcdg. By default, this will print the PlantUML markup for a pretty standard UML diagram of the package in question. You can either redirect this output to a file (dcdg > output.puml) or use the -o option to specify an output file (dcdg -o output.puml).

To turn the markup into a PNG image, just run the plantuml command on the file you created with DCDG (plantuml output.puml). This will create a file called output.png. You can also pipe the DCDG output directly into PlantUML by passing plantuml the -p option, in which case the PNG image will be written to stdout, so you’ll need to redirect it to a file yourself:

$ dcdg | plantuml -p > output.png

There are a plethora of options (perhaps too many, actually). Let’s take a look at a few of them. First, it can be helpful to see just the public API for a package represented as a UML diagram, particularly if the diagram is to be included in public documentation. This is easily accomplished with the --exported-only flag:

$ dcdg --exported-only | plantuml -p > output.png

This will produce the image shown below when run on the DCDG codebase itself. DCDG doesn’t export very many classes since it is primarily intended to be used as a command line tool.

[Image: diagram produced by `dcdg --exported-only | plantuml -p > output.png`]

Alternatively, someone might want to see a diagram with more detail, particularly if they plan to contribute to the code. Even then, we may want to exclude certain features, such as private class members, from the diagram to reduce clutter. The --exclude-private option does this:

$ dcdg --exclude-private=field,method | plantuml -p > output.png

[Image: diagram produced by `dcdg --exclude-private field,method | plantuml -p > output.png`]

It is also possible to pare the diagram down to include only classes that inherit behavior from a particular class. This is done using the --is-a option (which can be provided more than once). The example below includes only the inheritance tree rooted at the abstract DiagramBuilder class from DCDG.

$ dcdg --is-a DiagramBuilder | plantuml -p > output.png

There are quite a few other options as well. See dcdg --help for a full list. The source code can be found on GitHub; bug reports and pull requests are welcome.

Being null-aware

May 22, 2019 (updated June 3, 2019) George Lesica

Null, as implemented in Java, JavaScript, and other similar languages, is… troublesome. But why? It doesn’t seem so bad to hear it described. If a variable is a container, a null value just means that the container is empty. This is a perfectly natural state of affairs, so what’s the big deal?

There are a couple of ways to look at this issue. The one that speaks to me most clearly is to think about the impact of null values on the types used by our programs and what they mean.

Let’s say we have the following class and function:

class Person {
  String firstName;
  String lastName;
}

void sayHello(Person p) {
  print("hello there, " + p.firstName + "!");
}

The types here represent a contract between the programmer who wrote the function and the programmers who will call the function. The contract says that the caller can pass an instance of Person into the sayHello function and the computer will print a friendly, personalized greeting. If the caller passes anything other than a Person, the function isn’t guaranteed to work right. In most statically typed languages it won’t even compile / run.

Since the function will only ever see instances of Person, there’s nothing wrong with immediately referencing the firstName field. After all, a Person has a firstName; that’s part of what it means to be a Person!

But if our language allows null values, the caller could also pass null into this function. This is where the trouble begins: null doesn’t have a firstName field (even though it is a perfectly legal value in any place where a Person is accepted) and the function above will crash if it receives a value with no firstName field!

Effectively, the function above doesn’t accept a Person; it accepts a Person | null (which can be read “person or null”). This is called a union type (the closely related term sum type is also used). So every type in this hypothetical language is actually itself or null.

But we’re not even finished; it gets worse. In languages that support actual union types, the compiler / interpreter forces the programmer to handle each of the possible types that compose the union. Here, by contrast, the author of the function above was allowed to totally ignore the possibility that the function might receive a null value. A similar function that handles null appropriately is shown below:

void sayHello(Person p) {
  if (p == null) {
    // No name available, so fall back to a generic greeting.
    print("hello there!");
  } else {
    print("hello there, " + p.firstName + "!");
  }
}

The problem with this solution is twofold. First, languages like Java don’t require this sort of defensive programming, and this has a tendency to cause bugs. Second, it’s impossible to tell if the null check is actually necessary. What if the caller has already checked for a null value?

// ...
Person person = getPerson();
if (person == null) {
  // ...
} else {
  sayHello(person);
}
// ...

In this case, the value passed to sayHello will never be null, so one of the checks is unnecessary. But which one should be removed? That’s unclear. To get around this problem, we can define a new type called Person?. We will give the new type the exact same behavior as the current version of Person; that is, we will allow variables of this type to contain null. Then, we will change the meaning of Person to exclude null as a possible value. I’ve summarized the result below.

Person p0 = null; // ERROR
Person p1 = new Person(); // allowed
Person? p2 = null; // allowed
Person? p3 = new Person(); // allowed

Now, when someone calls the sayHello function, they must provide an instance of the Person class, and they may not provide a null value. This means that we can safely revert to the original version of the function without the null check. It also means that any null check, if one is necessary, will have to be performed by the caller.

If we would prefer to allow null values, then we can change the parameter type to Person?. However, since we have now declared that we accept null values, the compiler or interpreter can force us to check for null, just like it would if we declared the type to be Person | null in a language with union types.

void sayHello(Person? p) {
  print(p.firstName); // ERROR - forgot to check for null
}
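TypeScript is one real language that works roughly this way. With its strictNullChecks compiler option enabled (an assumption of this sketch), Person excludes null, and the union Person | null plays the role of Person?. The function name sayHelloMaybe is made up for the example, and the functions return their greeting instead of printing it so the result is easy to inspect.

```typescript
interface Person {
  firstName: string;
  lastName: string;
}

// Accepts only a real Person; with strictNullChecks, passing null is a compile error.
function sayHello(p: Person): string {
  return "hello there, " + p.firstName + "!";
}

// Accepts null, so the compiler insists that we handle it before touching firstName.
function sayHelloMaybe(p: Person | null): string {
  if (p === null) {
    return "hello there!";
  }
  // Here the type has been narrowed from Person | null to Person.
  return sayHello(p);
}

console.log(sayHelloMaybe({ firstName: "Ada", lastName: "Lovelace" }));
console.log(sayHelloMaybe(null));
```

Removing the null check from sayHelloMaybe turns the call to sayHello into a compile-time error, which is exactly the enforcement described above.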

So why does this matter? Programming is mostly a human problem. We are almost always the weakest link. This means that anything we can do to make our code easier to write, easier to read, and easier to use, is likely to prevent bugs today and help us add new features tomorrow. By specifying clearly the contracts implied by our code we can produce better software, more easily.

Why I like static types

May 21, 2019 George Lesica

I generally prefer statically typed languages. I just do. All else being equal, I will choose a statically typed language over a dynamically typed language every time. But why? And why did I qualify that statement?

Well, that’s a little more complicated.

Let’s take a look at a simple function that determines whether or not a value is prime. The version below, written in JavaScript, doesn’t technically contain any type information.

function isPrime(n) {
    let i = 2;
    while (i <= Math.sqrt(n)) {
        if (n % i === 0) {
            return false;
        }
        i++;
    }
    return true;
}

When I say that this code doesn’t “technically” contain any types I mean that there are no types specified in the program text. The parameter n, the local variable i, and the return value could be anything, at least as far as JavaScript is concerned. There are, however, some implied types. For instance, the value we provide for n must be in the domain of the % (modulo) operator.

But, as programmers, we care about more than the language-level semantics of our programs.

The idea of a prime number itself only really makes sense for integers greater than 1. It wouldn’t make sense to ask whether the string “hello” is prime, for example. So while JavaScript doesn’t really care about types, we certainly do, because our problem domain (primality) does.
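To see how little JavaScript cares, here is the same function with the parameter explicitly typed as any, which is TypeScript’s way of opting out of type checking. The name isPrimeUntyped is just for this illustration. Because "hello" converts to NaN and every comparison with NaN is false, the loop never runs and the function happily reports that "hello" is prime.

```typescript
// `any` switches off type checking for n, mimicking plain JavaScript.
function isPrimeUntyped(n: any): boolean {
  let i = 2;
  while (i <= Math.sqrt(n)) {
    if (n % i === 0) {
      return false;
    }
    i++;
  }
  return true;
}

console.log(isPrimeUntyped(9));       // false
console.log(isPrimeUntyped("hello")); // true: Math.sqrt("hello") is NaN, so the loop is skipped
console.log(isPrimeUntyped(1));       // true, even though 1 is not prime
```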

If we rewrite this function in Dart, a language that allows type annotations, we can capture at least some of the semantics of our problem domain within the text of our program.

import 'dart:math'; // provides sqrt

bool isPrime(int n) {
    int i = 2;
    while (i <= sqrt(n)) {
        if (n % i == 0) {
            return false;
        }
        i++;
    }
    return true;
}

Note that it is no longer possible to pass “hello” to this function (well, you can, but the program won’t run). This is helpful because inputs other than integers don’t make sense within the problem domain anyway. So rather than add code to handle such mistakes, we can change the program so that it will refuse to even compile / run.

The point here is that our problem (finding prime numbers) has types, so it makes sense for our program to have (the same) types.

However, you might have noticed that our second function doesn’t actually have “the same” types as our problem. Ideally, we would like to require that n be an integer greater than 1. Unfortunately, we can’t express this idea with Dart, at least not in a way that would be likely to result in a satisfying experience for users of our function.

The types don’t get us quite to where we want to be, but something is better than nothing, at least in my opinion, and we can still use runtime checks to finish the job. With a runtime check we could raise an error with a pretty helpful message. Alternatively, we could decide to just return false for integers that don’t make sense, as in the version below (this is also a nice example of how we can define errors away).

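import 'dart:math'; // provides sqrt

bool isPrime(int n) {
    // Integers below 2 are outside the problem domain, so they are never prime.
    if (n < 2) {
        return false;
    }
    int i = 2;
    while (i <= sqrt(n)) {
        if (n % i == 0) {
            return false;
        }
        i++;
    }
    return true;
}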
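If helpful error messages are preferred over quietly returning false, the runtime checks might look like the sketch below. It is written in TypeScript rather than Dart, and the name isPrimeStrict, the choice of RangeError, and the wording of the messages are all assumptions rather than part of the original function.

```typescript
function isPrimeStrict(n: number): boolean {
  // TypeScript's number type also admits fractions, so check for integers at runtime.
  if (!Number.isInteger(n)) {
    throw new RangeError(`expected an integer, got ${n}`);
  }
  // Reject values outside the problem domain with an explanatory message.
  if (n < 2) {
    throw new RangeError(`primality is only defined for integers greater than 1, got ${n}`);
  }
  // Trial division up to the square root of n.
  for (let i = 2; i * i <= n; i++) {
    if (n % i === 0) {
      return false;
    }
  }
  return true;
}

console.log(isPrimeStrict(13));
try {
  isPrimeStrict(1);
} catch (e) {
  console.log((e as Error).message);
}
```

Whether to throw or to return false is a design choice: throwing surfaces caller mistakes early, while returning false keeps the function total and simpler to call.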

Sometimes, the problem domain itself is more difficult to translate into a program and types can help smooth the way. For example, say we have a function that accepts a URL:

function sendRequest(url) {
  // ...
}

What does a URL look like? Do we need the leading “https://”, or is this a situation where we can use “//” to infer the protocol (like the href attribute on an HTML anchor tag)? Furthermore, how do we verify that what we were handed is a valid URL? That could be a lot of work. We could write a reusable function to validate a URL, but if we’re going to go down that road we might as well make it a type.

void sendRequest(Uri url) {
  // ...
}
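As a rough illustration of the same idea outside Dart, the sketch below uses the standard URL class available in browsers and Node.js. The parseUrl and describeRequest helpers are hypothetical names. The point is that validation happens once, when the value is constructed, and every function that takes a URL afterwards can rely on it.

```typescript
// Returns a URL if the text parses as an absolute URL, otherwise null.
function parseUrl(text: string): URL | null {
  try {
    return new URL(text);
  } catch (_err) {
    return null;
  }
}

// Stand-in for sendRequest: it can assume its argument is a well-formed URL.
function describeRequest(url: URL): string {
  return `${url.protocol}//${url.host}${url.pathname}`;
}

const good = parseUrl("https://example.com/status");
// No protocol and no base URL to infer it from, so this does not parse.
const bad = parseUrl("//example.com/status");

console.log(good ? describeRequest(good) : "invalid");
console.log(bad ? describeRequest(bad) : "invalid");
```

Note that the question about a leading “//” gets a concrete answer here: without a base URL to resolve against, the URL constructor rejects it, so the ambiguity is settled at the boundary instead of inside every function that handles URLs.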

Once again, our problem domain has types, and by introducing those types into our program we can both simplify our code and make it easier for others to use.

Earlier, I implied that I might choose a dynamically typed language under certain circumstances. A programming language is just a tool, an abstraction over the machine to facilitate human interaction. The right tool for a job depends on the job.

Writing a browser extension, for example, is easily done in JavaScript (although today TypeScript is closing the gap). Elixir / Erlang can be a great choice for scalable server applications. Racket makes it easy to create DSLs. There is a seemingly infinite selection of machine learning and data analysis tools based on Python. In these cases, and others, the ecosystem that surrounds a language can be important enough that it outweighs other considerations, such as static types.

At the end of the day, I prefer statically typed languages because they allow me to represent more of my problem domain in the program itself. But I try hard to remember that the right tool for the job isn’t the one I like best, but the one that is most likely to result in a correct, useful piece of software.