Migrated and moved blog

October 7, 2016

For those still following despite little activity: I migrated my blog to GitHub-hosted Jekyll and resurrected the subdomain blog.johanneslink.net. So please delete this RSS feed and add this one instead.

This blog will stay up for a while to keep the comments (which I failed to move over).

On Ordered Sets

April 19, 2016

General purpose OO languages – like Java or C# – come with a library for collection type objects like arrays, lists, and sets. If you throw in a static type system those languages usually differentiate between collection interfaces (List, Queue, Set) and implementing classes (ArrayList, PriorityQueue, HashSet). It’s considered good practice to only expose the interface types in your API and use a suitable implementation under the hood. So far so good.

The Problem

Sometimes, however, when designing an API, you feel there’s an interface missing that would better express the semantics of what you want to communicate. One of the more common examples is an OrderedSet. You want to specify that the collection you return contains any value only once: set semantics. But you also want to tell the user in which order those values should be processed. Here’s a concrete example taken from JUnit 5’s TestIdentifier:

  public class TestIdentifier...
    public Set getTags() {
      return tags;
    }

Tests, which can usually be mapped to a method in a test class, may have tags. Those tags can come from an annotation on the method itself, but also from an annotation at class level or even from a superclass. Since we can imagine tags that contradict each other, e.g. “fast” vs “slow”, a consumer of getTags is interested in knowing the order in which those tags were added – from superclass to class to method itself – so that they can derive tag evaluation priority. In Java there even exists a set implementation that provides a predictable order of values added to a set: LinkedHashSet. All that seems missing is an OrderedSet interface.
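To make the LinkedHashSet remark concrete, here is a minimal sketch. The class TagOrderDemo, the method collectTags and the tag values are made up for illustration and are not JUnit 5 code; the only library behaviour relied upon is that LinkedHashSet iterates in insertion order.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class TagOrderDemo {

    // Hypothetical helper: gathers tags from superclass, class and method, in that order.
    static Set<String> collectTags(List<String> superclassTags,
                                   List<String> classTags,
                                   List<String> methodTags) {
        Set<String> tags = new LinkedHashSet<>();
        tags.addAll(superclassTags);
        tags.addAll(classTags);
        tags.addAll(methodTags);
        return tags; // callers only see Set, so the ordering is invisible in the type
    }

    public static void main(String[] args) {
        Set<String> tags = collectTags(
                Arrays.asList("slow"),
                Arrays.asList("integration"),
                Arrays.asList("fast"));
        System.out.println(tags); // prints [slow, integration, fast]
    }
}
```

The return type is the crux: the order is there at runtime, but nothing in the signature promises it.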

There is no such thing as an unambiguous ordered set

Neither Java nor C# provides an ordered set interface, and there is a very good reason for it: set semantics and ordering semantics do not go well together. Here is why:

  • Set semantics require that any value in a set can be there only once.
  • Order by entry semantics require that a) any value that was added earlier is ordered before any value that was added later and b) any value that was added later is ordered after any value that was added earlier.
  • Now consider the “ordered set” [a, b, c]. What’s the ordering after you re-add value a? Is it still [a, b, c] or should it change to [b, c, a]?

If you want to keep set semantics in place you have to break either part a) or part b) of ordering semantics.
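The dilemma can be reproduced with LinkedHashSet itself. In the sketch below, ReAddDemo and both method names are hypothetical. keepFirstPosition relies on the documented behaviour that re-inserting an existing element leaves the insertion order untouched, while moveToEnd removes the element before adding it again. Both results are valid sets; each one gives up one half of the ordering rule.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class ReAddDemo {

    // Choice 1: the value keeps its original slot, although it was (re-)added last.
    static Set<String> keepFirstPosition(Set<String> ordered, String value) {
        Set<String> result = new LinkedHashSet<>(ordered);
        result.add(value); // no effect on the order if the value is already present
        return result;
    }

    // Choice 2: the value moves behind all others, although it was first added before them.
    static Set<String> moveToEnd(Set<String> ordered, String value) {
        Set<String> result = new LinkedHashSet<>(ordered);
        result.remove(value);
        result.add(value);
        return result;
    }

    public static void main(String[] args) {
        Set<String> abc = new LinkedHashSet<>(Arrays.asList("a", "b", "c"));
        System.out.println(keepFirstPosition(abc, "a")); // prints [a, b, c]
        System.out.println(moveToEnd(abc, "a"));         // prints [b, c, a]
    }
}
```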

What’s the Solution, Then?

The solution is simple but might not feel satisfactory: you have to decide between dropping ordering from your API and keeping duplicate entries. If you can go with the former, you simplify your contract and thereby reduce coupling in your code. If you go with the latter, the client of your code will have to figure out an unambiguous way to deal with duplicate values and their order.

A variation of keeping the duplicates is to introduce a domain specific interface, e.g. PrioritizedTagSet, and define the concrete and unambiguous semantics yourself, but not for all ordered sets, just for your specific use case of it.
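As an illustration only, such a domain-specific interface might look like the sketch below. Everything in it is an assumption: the method names, the implementing class LastDeclarationWinsTagSet, and above all the rule that re-adding a tag moves it to the highest priority. A different project could just as well decide that the first declaration wins; the point is that the interface states the rule explicitly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical domain-specific contract; not part of JUnit 5.
interface PrioritizedTagSet {

    // Adds a tag with the highest priority so far.
    // Re-adding a known tag moves it to the top: the latest declaration wins.
    void add(String tag);

    // Every tag exactly once, highest priority first.
    List<String> byDescendingPriority();
}

class LastDeclarationWinsTagSet implements PrioritizedTagSet {

    private final Set<String> tags = new LinkedHashSet<>();

    @Override
    public void add(String tag) {
        tags.remove(tag); // forget the earlier position, if there was one
        tags.add(tag);
    }

    @Override
    public List<String> byDescendingPriority() {
        List<String> result = new ArrayList<>(tags);
        Collections.reverse(result); // the last added tag comes first
        return result;
    }
}
```

With this rule, adding “slow”, “integration”, “fast” and then “slow” again yields the priority list slow, fast, integration.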

What you should not do is to just have a Set interface and document (in Javadoc or elsewhere) that the underlying implementation will take care of some kind of ordering. That way, you’d be hiding an important part of your contract in documentation only.

What about SortedSet?

Sorting is a different beast, although it sounds similar. Sorting means “ordering a given collection by an external criterion”. In our case we might want to display all tags in alphanumeric “order”. Well, natural languages are a mess when it comes to non-ambiguity.
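A small sketch of the difference, using TreeSet as the SortedSet implementation. SortedTagsDemo and alphabetically are hypothetical names. The order comes from the comparison rule alone, so it does not matter when or how often a tag was added.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.SortedSet;
import java.util.TreeSet;

class SortedTagsDemo {

    // Sorts by the natural ordering of String, regardless of insertion order.
    static SortedSet<String> alphabetically(Collection<String> tags) {
        return new TreeSet<>(tags);
    }

    public static void main(String[] args) {
        System.out.println(alphabetically(Arrays.asList("slow", "integration", "fast", "slow")));
        // prints [fast, integration, slow]

        // Natural String ordering compares char values, so upper case sorts before lower case.
        System.out.println(alphabetically(Arrays.asList("apple", "Zebra")));
        // prints [Zebra, apple]
    }
}
```

The second call hints at why “alphanumeric” is less obvious than it sounds: the criterion has to be chosen, for example by passing a Comparator to the TreeSet constructor.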

TL;DR

Set semantics and order by entry semantics contradict each other. Choose one.

Goodbye, JUnit 5

April 16, 2016

Sometimes, despite your best efforts, a conflict within the team cannot be overcome with goodwill or compromises. Something (or someone) has to give. This time it was me.

I wish Marc, Stefan and Matthias all the best for making JUnit 5 a success. The project deserves to become the future of testing on the JVM.

Update:

I have been asked a couple of times if I’m worried about JUnit 5’s fundamental design: No, I’m not. In terms of design and architecture JUnit 5 is headed in the right direction.

On-site TDD

August 1, 2014

One of the recurring questions in TDD workshops is “How do I test private methods?” My usual answer runs along the following lines: “You don’t. Any private method should be tested through the public interface. If you think the private method is complex enough to deserve its own test(s), extract it to a public place and test it there.” I still think this is the best general answer I can give. However, I recently discovered a set of situations that I handle differently.

Imagine yourself trying to implement some non-trivial solution to a problem. By “non-trivial” I mean that the necessary algorithm is complicated enough that you cannot keep track of all intermediate steps and decision points in your head. You’re tackling the problem step by step or, pardon me, test by test. At some point, there will be one test that forces you to implement at least part of the algorithm. The tests you created will eventually be sufficient to cover the logic, but they are not fine-grained enough to let you grow the solution in tiny, controllable steps.

Enter On-site TDD. This technique runs a few TDD cycles “on-site“ meaning “directly inside the production code“. The goal is to enable finer-grained TDD without the overhead of having to (temporarily) extract an implementation detail. Let’s demonstrate the technique with an example: Our task is to encrypt a text using columnar transposition: You take a String text and an Integer key, split the text into lines of length key and then assemble the text by columns — top-down and left to right. Here is the encryption table for “the battle will start at daybreak“ with key 7. Ignoring all spaces the resulting cipher text is “tltaheayewrbbitralaetltatsdk“:

  t h e b a t t
  l e w i l l s
  t a r t a t d
  a y b r e a k

In order to reduce the usual testing framework noise I’ll go with a simple Groovy script for both test code and production code. I’ll leave it to the astute reader to imagine test classes and production classes.

We start with the trivial case:

assert encrypt('ab', 2) == 'ab'

def encrypt(text, key) {
  text
}

And proceed to an example that requires us to really work with the input text:

assert encrypt('abcdef', 3) == 'adbecf'

which can trivially be fulfilled like this:

def encrypt(text, key) {
  if (text == 'abcdef' && key == 3) {
    return 'adbecf'
  }
  text
}

At this point we have several options:

  • Adding another example and then trying to come up with the implementation all at once. This is called triangulation.
  • Trying to write a real implementation now and tweak it till the tests pass.
  • Evolving the algorithm and tests step by step — inside the production code. This is what I will show here…

We focus on the non-trivial branch and specify the first piece on our way to a working algorithm, which is splitting the text into individual characters:

def encrypt(text, key) {
  if (text == 'abcdef' && key == 3) {
    def chars
    assert chars == ['a', 'b', 'c', 'd', 'e', 'f']
    return 'adbecf'
  }
  text
}

Now our tests will fail, but we can easily fix this:

def encrypt(text, key) {
  if (text == 'abcdef' && key == 3) {
    def chars = text.toList()
    assert chars == ['a', 'b', 'c', 'd', 'e', 'f']
    return 'adbecf'
  }
  text
}

Next, we add an assertion for splitting the chars into lines of length 3:

def encrypt(text, key) {
  if (text == 'abcdef' && key == 3) {
    def chars = text.toList()
    assert chars == ['a', 'b', 'c', 'd', 'e', 'f']
    def lines
    assert lines == [['a', 'b', 'c'], ['d', 'e', 'f']]
    return 'adbecf'
  }
  text
}

And again, fixing the broken test is just a matter of looking up the correct method in Groovy’s Development Kit:

def encrypt(text, key) {
  if (text == 'abcdef' && key == 3) {
    def chars = text.toList()
    assert chars == ['a', 'b', 'c', 'd', 'e', 'f']
    def lines = chars.collate(key)
    assert lines == [['a', 'b', 'c'], ['d', 'e', 'f']]
    return 'adbecf'
  }
  text
}

Let’s speed up a bit. Here come assertion and implementation for converting the lines to columns:

def encrypt(text, key) {
  if (text == 'abcdef' && key == 3) {
    def chars = text.toList()
    assert chars == ['a', 'b', 'c', 'd', 'e', 'f']
    def lines = chars.collate(key)
    assert lines == [['a', 'b', 'c'], ['d', 'e', 'f']]
    def columns = lines.transpose()
    assert columns == [['a', 'd'], ['b', 'e'], ['c', 'f']]
    return 'adbecf'
  }
  text
}

The last but one step is flattening the columns:

def encrypt(text, key) {
  if (text == 'abcdef' && key == 3) {
    def chars = text.toList()
    assert chars == ['a', 'b', 'c', 'd', 'e', 'f']
    def lines = chars.collate(key)
    assert lines == [['a', 'b', 'c'], ['d', 'e', 'f']]
    def columns = lines.transpose()
    assert columns == [['a', 'd'], ['b', 'e'], ['c', 'f']]
    def cryptedChars = columns.flatten()
    assert cryptedChars == ['a', 'd', 'b', 'e', 'c', 'f']
    return 'adbecf'
  }
  text
}

What remains is assembling the letters into a string:

def encrypt(text, key) {
  if (text == 'abcdef' && key == 3) {
    def chars = text.toList()
    assert chars == ['a', 'b', 'c', 'd', 'e', 'f']
    def lines = chars.collate(key)
    assert lines == [['a', 'b', 'c'], ['d', 'e', 'f']]
    def columns = lines.transpose()
    assert columns == [['a', 'd'], ['b', 'e'], ['c', 'f']]
    def cryptedChars = columns.flatten()
    assert cryptedChars == ['a', 'd', 'b', 'e', 'c', 'f']
    def result = cryptedChars.join('')
    assert result == 'adbecf'
    return result
  }
  text
}

Now we can get rid of asserts and the special-case branch:

assert encrypt('ab', 2) == 'ab'
assert encrypt('abcdef', 3) == 'adbecf'

def encrypt(text, key) {
  def chars = text.toList()
  def lines = chars.collate(key)
  def columns = lines.transpose()
  def encryptedChars = columns.flatten()
  def result = encryptedChars.join('')
  return result
}

Et voilà, we arrived at a working algorithm in tiny steps; much tinier than would have been possible by sticking to assertions within the test class only. Of course, you should choose the step size according to your knowledge of language and domain. When in doubt, take a smaller step to stay in full control.
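For readers who work in Java rather than Groovy, the finished function can be approximated as follows. This is a sketch, not a translation of the script above: it reads the columns directly by index instead of building the intermediate lists. The class name ColumnarTransposition is made up. Skipping the missing cells of a shorter last line is an assumption of this sketch, since the examples here only use texts whose length is a multiple of the key.

```java
class ColumnarTransposition {

    // Reads the text column by column, as if it had been written in lines of length key.
    static String encrypt(String text, int key) {
        StringBuilder result = new StringBuilder(text.length());
        for (int column = 0; column < key; column++) {
            // Characters of one column sit exactly key positions apart.
            for (int index = column; index < text.length(); index += key) {
                result.append(text.charAt(index));
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(encrypt("ab", 2));     // prints ab
        System.out.println(encrypt("abcdef", 3)); // prints adbecf
        System.out.println(encrypt("thebattlewillstartatdaybreak", 7));
        // prints tltaheayewrbbitralaetltatsdk
    }
}
```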

Most of the time I am happy with deleting the assertions now that they’ve fulfilled their duty. When I feel they should stick around after all, I will make real tests out of them by extracting the logic into a class of its own and moving the assertions to a test class – an existing or a new one, depending on where I extracted the code to.

One precondition for doing On-site TDD is the ability to write assertions – or something to the same effect – inside your production code without thereby creating a dependency on the test framework. If you cannot do that, there is another way of achieving something similar: First, move the parts of the production code you want to evolve over to your test class. Second, go about implementing your solution in the way I’ve demonstrated above. Last, move the code back to the production class. This is – by the way – what you’re supposed to do when practicing “TDD as if you meant it“.
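In Java, the precondition can be met without any test framework, for example with a small helper that throws AssertionError (the built-in assert statement would also work when assertions are enabled with -ea). The following sketch shows what an intermediate on-site step might look like. OnSiteEncrypt and check are hypothetical names, and the hard-coded return value is deliberately still in place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OnSiteEncrypt {

    // Hypothetical stand-in for a framework assertion: plain Java, no test dependency.
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    static String encrypt(String text, int key) {
        if (text.equals("abcdef") && key == 3) {
            // Step 1: split the text into single characters.
            List<String> chars = new ArrayList<>();
            for (char c : text.toCharArray()) {
                chars.add(String.valueOf(c));
            }
            check(chars.equals(Arrays.asList("a", "b", "c", "d", "e", "f")), "chars");

            // Step 2: cut the characters into lines of length key.
            List<List<String>> lines = new ArrayList<>();
            for (int start = 0; start < chars.size(); start += key) {
                lines.add(chars.subList(start, Math.min(start + key, chars.size())));
            }
            check(lines.equals(Arrays.asList(
                    Arrays.asList("a", "b", "c"),
                    Arrays.asList("d", "e", "f"))), "lines");

            return "adbecf"; // still hard-coded; the next on-site step would build the columns
        }
        return text;
    }
}
```

Once the last step produces the result itself, the check calls and the special-case branch can be deleted, just as in the Groovy version.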

As always, feedback and criticism are more than welcome!

Update 1: REPL

As some of the commenters on Twitter mentioned: When you’re lucky enough to use a language with a decent REPL, most (if not all) of the On-site TDD steps can be done there. When using a REPL with inline evaluations (e.g. Light Table) you might even forgo the assertions completely since you see the values of the temp vars anyway.

Update 2: Outside-In

One of the commenters remarked that On-site TDD looks mostly useful for inside-out (or bottom-up) TDD. So far I have been using it exclusively in inside-out situations. Trying to imagine useful outside-in scenarios is not straightforward – at least not to me. As far as my experiments went, using a dependency was never complicated enough that On-site TDD seemed necessary. But hey, if YOU come up with a good example, PLEASE let me know.

Veterans of TDD: Lasse Koskela

May 26, 2014

I “recorded” episode 6 using email ping pong with Lasse. Since I knew that his company is involved in funding start-ups, the TDD vs learning-what-product-to-build angle was especially interesting to me.


Lasse Koskela has been practicing test-driven development for over a decade. He has written two books on the topic and introduced many teams to the technique. Lasse works as “Principal Generalist” for Reaktor. You can reach him through lasse.koskela@reaktor.fi.

Q: When was your first contact with TDD and what did you think about it at the time?

Lasse: It’s hard to say exactly when the first contact was. I have a goldfish-like memory but I do remember reading up on Extreme Programming around 2001 when I was switching jobs and soon started promoting agile methods within the company so I guess it was around 2001 when I first learned about TDD. It would take a couple of years for me to become really infected, though.

Q: What did eventually convince you that TDD is a worthwhile approach?

Lasse: There’s no one specific thing or moment when that happened but I’ll tell you a story that kind of describes how I got to the point of being convinced: In the early 2000’s I was working on projects at big companies – Fortune 500 type of places – meaning there were usually a lot of people involved, responsibilities were spread around, half of the project staff seemed to be managers, and there was hardly any test automation to speak of, not to mention manual environment setup, deployment, and things like that. Very often, deploying a new version of a web application to a testing environment would go through a third party infrastructure team and a separate testing team might take a couple of days before they would actually get to test what you had built.

In that place I couldn’t afford to deliver code that didn’t work – the slow feedback loop would have cost my sanity – so I started writing unit tests more thoroughly, refactoring code more, and looking at integration and system-level test automation. On one project we even got a non-technical business representative to write executable acceptance tests for us in an Excel spreadsheet, which we would then feed into our homegrown test automation framework.

At some point, being in this mental mode of wanting to write a lot of unit tests, seeing a clear value to creating them and repeatedly executing them, it wasn’t much of a jump to give TDD a try. I had read about the method before and “understood” how it was supposed to work and how it would purportedly help me. So I started doing it. And kept trying. I vaguely remember it taking maybe 6 months before TDD had become second nature but all that time I “knew” that it worked, despite me forgetting and slipping every now and then. I felt like programming test-first made perfect sense.

I credit much of that transition being so unnoticeable to the fact that I was already intimately familiar with unit testing and the kind of designs that make unit testing easy or difficult. I was already bought into the idea that code should be modular and testable. TDD simply made it easier to end up with such designs. At times it felt like magic even though it’s really a very simple dynamic.

Q: What has changed in the way you practice and teach TDD since the early days?

Lasse: Well, for one thing I don’t teach TDD as much as I used to. Nowadays I seem to be doing a small handful of trainings a year whereas in 2005-2006 I remember doing a class every other week.

What’s changed in my practice? Clearly the biggest thing is that I find myself doing less test-first programming than before. That’s not because I wouldn’t find TDD useful but rather because certain system conditions required for test-first programming aren’t in place.

Q: This is closely related to my next question: Are there situations in which you consider TDD not to be the right approach for developing software? If so, what other techniques and approaches would you recommend in those situations?

Lasse: One specific situation, which is why I’ve recently done less test-first programming, is one where you simply don’t know what you want the code to do. For instance, if it’s the first time our team is doing video recording on an Android device I have no idea what APIs the code should call, in which order, what constitutes valid input, etc. I can read all the documentation I want but really the best approach I know of in such a situation is to go ahead and hack something together that seems to work. You might call it a spike, a prototype or exploration but it wasn’t done test-first. And, me working with all these new technologies and APIs, I find myself doing that kind of exploration much more often than back in the day when I stayed firmly within my technical expertise, Java-based web apps and backend systems.

Q: What do you think is TDD’s relevance in today’s world of lean startups, functional and concurrent programming, continuous delivery and mobile devices everywhere?

Lasse: I’ve been pondering that every now and then, especially regarding the startup context. Our company funds early stage startups through our Reaktor Polte arm and I get to advise and consult these startups every now and then. What it boils down to is that whether the code is written test-first or test-last generally isn’t even near my major concern. I guess it’s a bit like agile development in that the first thing to do is to make sure we’re doing the right thing. Once you start feeling comfortable that this is the case, that’s when it starts to make much more sense to invest your energy in doing things right.

Some people say that you don’t need to do TDD if the software you’re working on doesn’t have to work. In a way, for a startup that’s kind of the case. The code doesn’t have to be bulletproof from day one. It doesn’t have to be sustainable. On the other hand, when you’ve found an angle that does generate revenue it becomes increasingly important that the technology is solid and maintainable.

Many thanks, Lasse, for answering my questions!


 

Other episodes of the series: