This presentation is about how to change a team’s attitude towards writing automated tests. The
talk covers the same case study as
Groovy vs Java for Testing (adopting
Spock in MongoDB), but from a process/agile/people perspective rather than as a
technical look at the merits of one language over another.
I sadly do not have a lot of time for questions during the presentation, but
thanks to the wonders of modern technology, I have a list of unanswered
questions which I will attempt to address here.
Is testing to find out your system works? Or is it so you know when your
system is broken?
Excellent question. I would expect that if you have a system that’s in
production (which is probably the large majority of the projects we work on),
we can assume the system is working, for some definition of working.
Automated testing is particularly good at catching when your system stops
doing the things you thought it was doing when you wrote the tests (which
may, or may not, mean the system is genuinely “broken”). This is regression
testing, and automated tests are really good at it.
But testing can also make sure you implement code that behaves the way you
expect, especially if you write the tests first. Automated tests can be used
to determine that your code is complete, according to some pre-agreed
specification (in this case, the automated tests you wrote up front).
So I guess what I’m trying to say is: when you first write the tests, you
have tests that, when they pass, prove the system works (assuming your
tests are testing the right things and are not giving you false positives).
Subsequent passes show that you haven’t broken anything.
At what level do “tests documenting code” actually become useful? And who
is/should the documentation be targeted to?
In the presentation, my case study is the MongoDB Java Driver. Our users
were Java programmers, who were going to be coding using our driver. So in
this example, it makes a lot of sense to document the code using a language
that our users understood. We started with Java, and ended up using Groovy
because it was also understandable for our users and a bit more succinct.
On a previous project we had different types of tests. The unit and system
tests documented what the expected behaviour was at the class or module
level, and were aimed at developers in the team. The acceptance tests were
written in Java, but in a friendly DSL-style way. These were usually
written by a triad of tester, business analyst and developer, and documented
for all of them what the top-level behaviour should be. Our audience here
was fairly technical though, so there was no need to go to the extent of trying
to write English-language-style tests; they were readable enough for a
reasonably techy (but non-programmer) audience. These were not designed to be
read by “the business” - us developers might use
them to answer questions about the behaviour of the system, but they didn’t
document it in a way that just anyone could understand.
These are two different approaches for two different-sized
teams/organisations, with different users. So I guess in summary the answer is
“it depends”. But at the very least, developers on your own team should be
able to read your tests and understand what the expected behaviour of the
How do you become a team champion? I.e. get authority and acceptance that
people listen to you?
In my case, it was just by accident - I happened to care about the tests
being green and also being useful, so I moaned at people until it happened. But
it’s not just about nagging: you get more buy-in if other people see you
doing the right things the right way, and if it’s not too painful for them to
follow your example.
There are going to be things that you care about that you’ll never get other
people to care about, and this will be different from team to team. You have
two choices here - if you care that much, and it bothers you that much, you
have to do it yourself (often on your own time, especially if your boss
doesn’t buy into it). Or, you have to let it go - when it comes to quality,
there are so many things you could care about that it might be more
beneficial to drop one cause and pick another that you can get people to care
For example, I wanted us to use assertThat instead of assertFalse (or
true, or equals, or whatever). I tried to demo the advantages (as I saw
them) of my approach to the team, and tried to push this in code reviews, but
in the end the other developers weren’t sold on the benefits, and
from my point of view the benefits weren’t big enough to force the issue.
Those of us who cared, used assertThat. For the rest, I was just happy
people were writing and maintaining tests.
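One commonly cited advantage of matcher-style assertions is a more descriptive failure message. The sketch below shows the idea in plain Java. Note that `MatcherSketch`, its `Matcher` class, `hasItem`, `assertThat` and `assertTrue` are tiny hand-rolled stand-ins written for this example; they are not the real JUnit or Hamcrest APIs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical, hand-rolled stand-ins; NOT the real JUnit/Hamcrest classes.
class MatcherSketch {
    // A matcher pairs a check with a description of what it expects.
    static final class Matcher<T> {
        final String description;
        final Predicate<T> check;

        Matcher(String description, Predicate<T> check) {
            this.description = description;
            this.check = check;
        }
    }

    static <E> Matcher<List<E>> hasItem(E item) {
        return new Matcher<>("a list containing " + item, list -> list.contains(item));
    }

    // On failure, reports both what was expected and what was actually there.
    static <T> void assertThat(T actual, Matcher<T> matcher) {
        if (!matcher.check.test(actual)) {
            throw new AssertionError("Expected: " + matcher.description + " but was: " + actual);
        }
    }

    // On failure, can only report that a boolean was false.
    static void assertTrue(boolean condition) {
        if (!condition) {
            throw new AssertionError("expected true but was false");
        }
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("ann", "bob");
        try {
            assertTrue(names.contains("cat"));
        } catch (AssertionError e) {
            System.out.println(e.getMessage()); // expected true but was false
        }
        try {
            assertThat(names, hasItem("cat"));
        } catch (AssertionError e) {
            System.out.println(e.getMessage()); // Expected: a list containing cat but was: [ann, bob]
        }
    }
}
```

Whether that extra detail is worth arguing over in code review is, of course, a separate question.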
So, pick your battles. You’ll be surprised at how many people do get on board
with things. I thought implementing checkstyle and setting draconian
formatting standards was going to be a tough battle, but in the end people
were just happy to have any standards, especially when they were enforced
by the build.
Do you report test, style, coverage, etc failures separately? Why?
We didn’t fail on coverage. Enforcing a coverage percentage is a really good
way to end up with crappy tests, like for getters/setters and constructors
(by the way, if there’s enough logic in your constructor that it needs a
test, You’re Doing It Wrong).
Generally, different types of failures are found by different tools, so for
this reason alone they will be reported separately - for example, Checkstyle
will fail the build if the code doesn’t conform to our style standards, CodeNarc
fails it for Groovy style failures, and Gradle will run the tests in a
different task to these two.
What’s actually important, though, is time-to-failure. Checkstyle, for
example, will fail on something silly like curly braces in the wrong place.
You want this to fail within seconds, so you can fix the silly mistake
quickly. Ideally you’d have your IDE (IntelliJ, perhaps) run your checks before
the code even makes it into your CI environment. Compiler errors should, of
course, fail things before you run a test, and short-running tests should fail
before long-running tests. Basically, the easier it is to fix the problem, the
sooner you want to know.
Our build was relatively small and not too complex, so actually we ran all
our types of tests (integration and unit, both Groovy and Java) in a single
task, because this turned out to be much quicker in Gradle (in our case) than
splitting things up into a simple pipeline.
You might have a reason to report stuff separately, but for me it’s much more
important to understand how fast I need to be aware of a particular type of
Sometimes I find myself modifying code design and architecture to enable
testing. How can I avoid damaging design?
This is a great question, and a common one too. The short answer is: in
general, writing code that’s easier to test leads to a cleaner design anyway
(for example, dependency injection at the appropriate places). If you find
you need to rip your design apart to test it, there’s a smell there somewhere
- either your design isn’t following SOLID principles, or you’re trying to
test the wrong things.
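Here is a minimal sketch of dependency injection making code easier to test. The `Session` class and its time-to-live are invented for this example and are not taken from any project mentioned here. The idea is that the class asks an injected `Supplier<Instant>` for the current time instead of calling `Instant.now()` itself, so a test can control "now".

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Hypothetical example class: a session that expires after a fixed time-to-live.
class Session {
    private final Supplier<Instant> now; // injected source of the current time
    private final Instant expiresAt;

    Session(Supplier<Instant> now, Duration timeToLive) {
        this.now = now;
        this.expiresAt = now.get().plus(timeToLive);
    }

    // Expired once the current time reaches the expiry instant.
    boolean isExpired() {
        return !now.get().isBefore(expiresAt);
    }

    public static void main(String[] args) {
        // Production-style wiring: use the real system time.
        Session real = new Session(Instant::now, Duration.ofMinutes(30));
        System.out.println("real session expired? " + real.isExpired());

        // Test-style wiring: a fake "now" that the test can move forward.
        Instant[] fakeNow = { Instant.parse("2015-01-01T00:00:00Z") };
        Session session = new Session(() -> fakeNow[0], Duration.ofMinutes(30));
        fakeNow[0] = fakeNow[0].plus(Duration.ofMinutes(31));
        System.out.println("test session expired? " + session.isExpired());
    }
}
```

The constructor parameter exists because of the test, but it also makes the dependency on time explicit, which is arguably the cleaner design.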
Of course, the common example here is testing private methods - how do you test
these without exposing secrets [1]? I think for me, if it’s important
enough to be tested,
it’s important enough to be exposed in some way - it might belong in some
sort of util or helper (right now I’m not going to go into whether utils or
helpers are, in themselves, a smell), in a smaller class that only
provides this sort of functionality, or simply in a protected method. Or, if
you’re testing with Groovy, you can access private methods anyway so this
becomes a moot point (i.e. your testing framework may be limiting you).
In another story from LMAX, we found we had created methods just for testing. It seemed a
bit wrong to have these methods only available for testing, but later on down
the line, we needed access to many of these methods In Real Life (well, from
our Admin app), so our testing had “found” a missing feature. When we came
to implement it, it was pretty easy as we’d already done most of it for
My co-workers often point to a lack of end-to-end testing as the reason why
a lot of bugs get out to production even though they don’t have many unit
tests or integration tests. What, in your experience, is a good balance
between unit tests, integration tests and end-to-end testing?
Hmm, sounds to me like “lack of tests” is your problem!
How did you go about getting buy in from the team to use Spock?
I cover this in
my other presentation on the topic - the
short version is, I did a week-long spike to investigate whether Spock would
make testing easier for us, showed the pros and cons to the whole team, and
then led by example writing tests that (I thought) were more readable than
what we had before and, probably most importantly, much easier to write than
what we were previously doing. I basically got buy-in by showing how much
easier it was for us to use the tool than even JUnit (which we were all
familiar with). It did help that we were already using Gradle, so we already
had a development dependency on Groovy. It also helped that adding Spock made
no changes to the dependencies of the final Jar, which was very important.
Over time, further buy-in (certainly from management) came when the new tests
started catching more errors - usually regressions in our code or regressions in
the server’s overnight builds. I don’t think it was Spock specifically
that caught more problems - I think it was writing more tests, and
better tests, that caught the issues.
Can we do data-driven style tests in frameworks like JUnit or Cucumber?
I haven’t done it in JUnit myself, although I understand JUnit 4 ships a Parameterized runner for this sort of thing. I
believe someone told me you can do it in
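For anyone who hasn't seen the style, here is a rough sketch of what "data-driven" means, written in plain Java with no test framework at all: one check, run once per row of a table of inputs and expected results. The class name and the `Math.max` rows are made up for illustration. Spock expresses the same idea with a `where:` block.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical example: the same check applied to every row of a data table.
class MaxDataDrivenCheck {
    // Each row holds: first input, second input, expected maximum.
    static final List<int[]> ROWS = Arrays.asList(
            new int[] {1, 3, 3},
            new int[] {7, 4, 7},
            new int[] {0, 0, 0},
            new int[] {-5, -2, -2});

    // Runs the check against every row and returns how many rows failed.
    static int countFailures() {
        int failures = 0;
        for (int[] row : ROWS) {
            int actual = Math.max(row[0], row[1]);
            if (actual != row[2]) {
                failures++;
                System.out.printf("max(%d, %d): expected %d but was %d%n",
                        row[0], row[1], row[2], actual);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println(countFailures() + " failing row(s)");
    }
}
```

Adding a new case is then a one-line change to the table, not a new test method.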
Are there drawbacks to having tests that only run in CI? I.e. I have Java 8
on my machine, but the test requires Java 7.
Yes, definitely - the drawback is Time. You have to commit your code to a
branch that is being checked by CI and wait for CI to finish before you find
In practice, we found very little that was different between Java
7 and 8, for example, but this is a valid concern (otherwise you wouldn’t be
testing a complex matrix of dependencies at all).
In our case, our Java 6 driver used Netty for async capabilities, as the
stuff we were using from Java 7 wasn’t available. This was clearly a
different code path that wasn’t tested by us locally as we were all running
Java 8. Probably more important for us was that we were testing against at least
3 different major versions of the server, which all supported different
features (and had different APIs). I would often find I’d broken the tests
for version 2.2 as I’d only been running them on 2.6, and had either forgotten
to turn off the new tests for the old server versions, or hadn’t realised
the new functionality wouldn’t work there.
So the main drawback is time - it takes a lot longer to find out about these
errors. There are a few ways to get around this:
- Commit often!! And to a branch that’s actually going to be run by CI
- Make your build as fast as possible, so you get failures fast (you
  should be doing this anyway)
- You could set up virtual machines locally or somewhere cloudy to run these
  configurations before committing, but that sounds kinda painful (and to my
  mind defeats a lot of the point of CI).
- I set up Travis on my fork of the project, so I could have that running a
  different version of Java and MongoDB when I committed to my own fork - I’d
  be able to see some errors before they made it into the “real” project.
- If you can, you probably want these specific tests run first so they can
  fail fast. E.g. if you’re running a Java 6 & MongoDB 2.2 configuration on
  CI, run those tests that only work in that environment first. Would
  probably need some Gradle magic, and/or might need you to separate these
  into a different set of folders. The advantage of this approach though is
  if you set up some aliases on your local machine you could sanity check
  just these special cases before checking in. For example, I had aliases to
  start MongoDB versions/configurations from a single command, and to set
  JAVA_HOME to whichever version I wanted.
Do you have any tips for unit tests that pass on dev machines but not on
Jenkins because it’s not as powerful as our own machines? E.g. Synchronous
calls timeout on the Jenkins builds intermittently.
Erk! Yes, not uncommon. No, not really. We had our timeouts set longer than I
would have liked to prevent these sorts of errors, and they still
intermittently failed. You can also set some sort of retry on the test, and
get your build system to re-run those that fail to see if they pass later.
It’s kinda nasty though.
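To make the retry idea concrete, here is a minimal sketch of a retry wrapper in plain Java. The `Retry` class and its `attempt` method are hypothetical names invented for this example, not something from our build. In practice you would more likely lean on whatever rerun support your test framework or build tool offers.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: re-run a flaky action a limited number of times.
class Retry {
    // Calls the action up to maxAttempts times and returns the first successful result.
    // If every attempt throws an Exception, the last one is rethrown.
    // Errors (such as AssertionError) are deliberately not caught or retried.
    static <T> T attempt(int maxAttempts, Callable<T> action) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                System.out.println("Attempt " + i + " of " + maxAttempts + " failed: " + e.getMessage());
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulates a call that times out twice and then succeeds.
        String result = attempt(3, () -> {
            if (++calls[0] < 3) {
                throw new TimeoutException("simulated timeout");
            }
            return "passed";
        });
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

The nastiness is visible even in the sketch: a test that needs three goes to pass is still telling you something, and the retry hides it.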
If you ask where are tests and dev asks if code is correct? And you say yes.
Then dev asks why you’re delaying shipping value, how do you manage that?
These are my opinions:
- Your code is not complete without tests that show me it’s complete.
- Your code might do what you think it’s supposed to do right now, but given
  Shared Code Ownership, anyone can come in and change it at any time. You
  want tests in place to make sure they don’t change it to break what you
  thought it did.
- The tests are not so much to show it works right now; the tests are to
  show it continues to work in future.
- Having automated tests will speed you up in future. You can refactor
  more safely, you can fix bugs and know almost immediately if you broke
  something, and you can read from the test what the author of the code thought
  the code should do, getting you up to speed faster.
- You don’t know you’re shipping value without tests - you’re only shipping
  code (to be honest, you never know if you’re shipping value until much later
  on when you also analyse if people are even using the feature).
Testing almost never slows you down in the long run. Show me the bits of your
code base which are poorly tested, and I bet I can show you the bits of your
code base that frequently have bugs (either because the code is not really
doing what the author thinks, or because subsequent changes break things in
If you say code is hard to understand and dev asks if you seriously don’t
understand the code, how do you explain you mean easy to understand without
thinking rather than ‘can I compile this in my head’?
I have zero problem with saying “I’m too stupid to understand this code, and
I expect you’re much smarter than me for writing it. Can you please write it
in a way so that a less smart person like myself won’t trample all over your
beautiful code at a later date through lack of understanding?”
By definition, code should be easy to understand by someone who’s not the
author. If someone who is not the author says the code is hard to understand,
then the code is hard to understand. This is not negotiable. This is what
code reviews or pair programming should address.
What is effective nagging like? (Whether or not you get what you want)
Mmm, good question. Off the top of my head:
- Don’t make the people who are the target of the nagging feel stupid -
  they’ll get defensive. If necessary, take the burden of “stupidity” on
  yourself. E.g. “I’m just not smart enough to be able to tell if this test is
  failing because the test is bad or because the code is bad. Can you walk me
  through it and help me fix it?”
- Do at least your fair share of the work, if not more. When I wanted to
  get the code to a state where we could fail style errors, I fixed 99% of the
  problems, and delegated the handful of remaining ones that I just didn’t
  have the context to fix. In the face of three errors to fix each, the team
  could hardly say “no” after I’d fixed over 6000.
- Explain why things need to be done. Developers are adults and don’t want
  to be treated like children. Give them a good reason and they’ll follow the
  rules. The few times I didn’t have good reasons, I could not get the team
  to do what I wanted.
- Find carrots and sticks that work. At LMAX, a short e-mail at the start
  of the day summarising the errors that had happened overnight, who seemed to
  be responsible, and whether they looked like real errors or
  intermittencies, was enough to get people to fix their problems [2] -
  they didn’t like to look bad, but they also had enough information to get right
  on it; they didn’t have to wade through all the build info. On occasion,
  when people were ignoring this, I’d turn up to work with bags of chocolate
  that I’d bought with my own money, offering chocolate bars to anyone who
  fixed up the tests. I was random with my carrot offerings so people didn’t
  game the system.
- Give up if it’s not working. If you’ve tried to phrase the “why” in a
  number of ways, if you’ve tried to show examples of the benefits, if you’ve
  tried to work the results you want into a process, but it’s still not
  getting done, just accept the fact that this isn’t working for the team.
  Move on to something else, or find a new angle.
[1] I had a colleague at LMAX who was working with a hypothesis that
All Private Methods Were Evil - they were clearly only sharable within
a single class, so provided no reuse elsewhere, and if you have the same bit
of code being called multiple times from within the same class (but it’s
not valuable elsewhere) then maybe your design is wrong. I’m still
pondering this specific hypothesis 4 years on, and I admit I see its pros
and cons.

[2] This worked so well that this process was automated by
one of the guys and turned into a tool called AutoTrish, which as far as I
know is still used at LMAX. Dave Farley talks about it in some of his
Continuous Delivery talks.
What can you do to help developers a) write tests b) write meaningful tests
and c) write readable tests?
Trisha will talk about her experiences of working in a team that wanted to
build quality into their new software version without a painful overhead -
without a QA / Testing team, without putting in place any formal processes,
without slavishly improving the coverage percentage.
The team had been writing automated tests and running them in a continuous
integration environment, but they were simply writing tests as another tick
box to check, there to verify the developer had done what the developer had
aimed to do. The team needed to move to a model where tests provided more
than this. The tests needed to:
- Demonstrate that the library code was meeting the requirements
- Document in a readable fashion what those requirements were, and what should happen under non-happy-path situations
- Provide enough coverage so a developer could confidently refactor the code
This talk will cover how the team selected a new testing framework (Spock, a
framework written in Groovy that can be used to test JVM code) to aid with
this effort, and how they evaluated whether this tool would meet the team’s
needs. And now, two years after starting to use Spock, Trisha can talk
about how both the tool and the shift in the focus of the purpose of tests
has affected the quality of the code. And, interestingly, the happiness
of the developers.