After my first few months of working as a newly minted software developer, I was feeling good. Results from the garden-variety calculation tool I was writing were finally matching up with customer test data.
There were a few other calculations that needed to be added, but since they were smaller and built atop the functionality of the existing calculation, I didn’t think they would take very long. As development progressed, however, I realized that inaccuracies in the new calculations were often caused by underlying issues in the original.
Every time a problem was traced back to the original calculation, I would have to make changes to the existing code that I thought was already stable. Then I had to run the program with the test input parameters to check that the original calculations were still accurate, run the new calculations to check whether the new code fixed the issue, and finally run any other calculations that may have been affected by a change in the code. It took a lot of time to check the same calculations over and over.
At some point I realized — maybe that’s what unit tests are for.
A brief history of unit testing
Software developers have been testing software for as long as they have been writing code, but the ability to automate software testing appeared around the 1980s. (Testing references provides a handy timeline of software testing.) Suddenly, instead of having to run a program manually against lists of test values, or setting breakpoints in the code and tracing the program’s logic step by step, developers could do what they do best — turn testing into another program.
Automated testing allowed developers to write code to test their programs, giving them the ability to quickly run many tests whenever they wanted, without having to put in much effort each time.
Unit Testing 101
The boundaries between different types of automated testing can get a little blurry, but most developers recognize unit testing and integration testing as two of the main subcategories. While unit tests conceptually break a program into discrete pieces and test each piece in isolation, integration tests make sure that the pieces are working properly together as a whole. Unit tests are generally small and don’t take much time to write because they only test a small section of code.
Joe Eames, a front-end developer and Pluralsight instructor who has taught several courses on unit testing, said that, although unit testing had been possible earlier, it really became popular in the late ’90s and early 2000s.
“A lot of that had to do with signers of the Agile Manifesto and the agile movement itself,” he said. “Unit testing was a big piece of the agile movement, so as [the movement] made its way through the software development world, unit testing came along with it.”
Eames is referring to a set of software development principles that a group of developers collaborated on back in 2001. The developers were frustrated by inefficient software development practices that relied on extensive upfront planning. Instead, these agile developers advocated for a new way of working that would prioritize short development cycles, frequent feedback from customers and multiple iterations to get to the final product.
The idea was that large software projects never go exactly according to plan, so it’s better to adjust each incremental phase of development after receiving feedback from the customer, rather than ending up with a product no one wanted.
Unit testing fit into this philosophy nicely. Large batches of tests could be run quickly, which allowed developers to check that programs were always working as expected and could be shown to customers at a moment’s notice. As developers built out the software, running tests reassured them that each new iteration didn’t break the progress made in previous iterations.
Kent Beck, one of the signers of the Agile Manifesto, even evangelized a software development philosophy of writing code for unit tests before writing code for the actual software — called test-driven development — that is still in use today.
Unit testing works by testing “units” of code independently. What constitutes a unit is determined by the developer and varies based on the language and the program.
Unit tests can be used as part of the design process, to prevent logic mistakes in complex code, and as contracts that “lock in” a particular programmatic behavior. They allow the developer to test out code against different combinations of inputs, guarding against mistakes that are more likely to occur in edge cases. And when existing tests from previous iterations of development break after introducing a new feature, developers have the opportunity to fix the issue before it can present a problem to customers in production.
It’s good practice to integrate tests into a company’s continuous integration process for that reason. Unit testing frameworks make it easy for developers to write their own tests, and are often built into the platform developers already use to write code.
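As a rough illustration of what such a test can look like, here is a minimal sketch using Python’s built-in unittest module. The function average is hypothetical, a stand-in for the kind of calculation described at the top of this article, and the edge cases chosen are assumptions about what would matter for it.

```python
import unittest


def average(values):
    """Hypothetical unit under test: the arithmetic mean of a list of numbers."""
    if not values:
        raise ValueError("average() needs at least one value")
    return sum(values) / len(values)


class AverageTest(unittest.TestCase):
    # Each test checks one combination of inputs in isolation.
    def test_typical_input(self):
        self.assertEqual(average([2, 4, 6]), 4)

    def test_single_value(self):
        self.assertEqual(average([5]), 5)

    def test_negative_values(self):
        self.assertEqual(average([-3, 3]), 0)

    # Edge case: an empty list should be rejected explicitly,
    # not fail with an unrelated division error.
    def test_empty_input_is_rejected(self):
        with self.assertRaises(ValueError):
            average([])
```

Saved to a file, the tests can be run with python -m unittest, and the same command can be run as a step in a continuous integration job.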
Who should be doing the testing?
Even with all the benefits of unit testing, there is a definite downside: You have to spend time actually writing the tests. That’s time that could be spent writing code for the software product, which is ultimately the final judge of value for every company. That’s one of the reasons some companies opt to have dedicated quality assurance (QA) developers who only focus on testing.
For companies that have them, QA developers might be organized into teams that consist only of QA developers, or each development team could have a member who specializes in QA. Having these QA-specific roles not only allows other developers to focus on writing code for the software product, but also lets the company benefit from QA developers’ testing expertise. As in other types of software development, learning how to write tests well can take a lot of time and effort.
“At some level, it’s the same skills that you need to be a good developer,” said Emily Bache, a software consultant specializing in automated testing and agile. “If you’re going to be good at designing software, you need to learn to pay attention to names, to abstractions and to knowing frameworks. It’s all the same skills when you’re writing tests: you need to know the strength of the testing framework, and you also need to pay attention to names, abstractions, loose coupling, high cohesion.”
Other companies make a conscious decision to avoid specialized QA roles.
At Drift, a Boston-based company that provides automated chatbots for marketing and sales teams, developers have always been in charge of writing their own tests.
“We don’t have any QA team, we don’t have QA automation engineers. All of our QA is handled by the engineering team that builds the products and features,” said Freedom Dumlao, chief architect at Drift. “It works really well and we honestly haven’t needed it at all.”
Drift has around 70 developers, who are organized into teams of exactly three developers. Each team is cross-functional, having all the necessary front-end, back-end and database skills for producing software, and each developer writes tests for their own code. Dumlao said an advantage of not using dedicated QA teams was that developers wouldn’t feel complacent about pushing code that could contain bugs.
“By asking engineers to take on full responsibility for the quality of their own code, we remove that whole weird social thing that happens where you’re just throwing crap over the wall,” Dumlao said.
This arrangement also helps keep the balance between writing too many and too few tests, because it’s in the developers’ own interests to not write unnecessary tests while still guarding against production issues, Dumlao said.
“If they ship something that has a critical issue, and that critical issue causes customer pain, then they’re going to be on the hook for fixing it rapidly,” he said. “Since we’re a globally deployed company, that could happen at any time, day or night. So it really does behoove engineers to write tests that ensure that their code works, because that helps them feel comfortable that they’re not going to get a page at 2 a.m. and get the news that their code is breaking.”
How to avoid writing low-value tests
But simply restructuring teams doesn’t mean that individual developers will automatically know how to write good tests. A lot of time could still be wasted writing unnecessary tests.
“It’s actually very easy to write what I call ‘low-value tests,’” Pluralsight instructor Eames said. “It’s a test where you’re less likely to experience the value of the test, yet you definitely experience the cost because you actually have to go through the process of writing the test.”
Eames said that learning to avoid writing low-value tests takes a lot of practice.
“I think one of the first things a person should do is over-test,” Eames said. “That means trying to write way too many tests, tests around a whole bunch of stuff. But while you’re doing that, and through the lifetime of the project, start looking at what kinds of tests are actually bringing you value. If you’re not experiencing the benefits, then you can look at that and say, ‘Well, maybe I shouldn’t spend too much time testing these types of things.’”
Eames believes that the only way to truly become good at writing tests is by first writing too many tests. Unfortunately for time-strapped developers, that can be a time-intensive process. Is there a shortcut?
“Taking the opposite approach I find rarely works,” he said, though programming with a developer who has more experience writing tests can help. “There’s no way to transfer knowledge better and more effectively than pair programming with somebody who’s more experienced. That’s probably the only real shortcut that exists.”
What is test-driven development?
Perhaps Eames’ comfort with writing a lot of tests can be explained by his practice of test-driven development (TDD), an approach introduced by Kent Beck in which unit tests are written before the code they test.
In TDD, developers write unit tests as an important part of the design and coding process — not just as a way to check for bugs after a program is already complete. So tests are written prior to the corresponding code they’re testing.
“The first step is to write the test, and it fails,” software consultant Bache said. Because the code that the test is written for doesn’t exist yet, the test either fails to compile or runs into runtime errors, depending on the language. The next step is to write the code so the criteria of the tests are fulfilled, allowing the tests to succeed.
“So test-driven development is an iterative, incremental process,” Bache said. “You write a test — one test. You make it pass. You refactor a little, if necessary. You write a new test. So you’re always iteratively and incrementally adding to the code and to the test suite.”
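One turn of that loop might look like the sketch below, written in Python with the standard unittest module. The function apply_discount and its percent parameter are hypothetical, invented for illustration; they are not drawn from Bache’s own examples.

```python
import unittest


class ApplyDiscountTest(unittest.TestCase):
    # First test, written before apply_discount exists. Run at that point,
    # it errors out with a NameError, which is the expected failing step.
    def test_ten_percent_off(self):
        self.assertEqual(apply_discount(200, percent=10), 180)

    # Second test, added only after the first one passes.
    def test_discount_over_100_percent_is_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(200, percent=150)


# Written after the tests: just enough code to make them pass.
def apply_discount(price, percent):
    """Hypothetical function: reduce price by the given percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (100 - percent) / 100
```

Writing the first test before the function forces an early decision about the name apply_discount and about passing percent as a keyword argument, before any implementation work has gone into either.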
Usually, the types of tests used in TDD are unit tests, but integration tests can also be used in what is called “double-loop TDD,” where inner-loop iterations consist of writing successful unit tests, and the outer loop is made up of integration tests. The purpose in each case is to think about the software on a higher level, about what the code should do, before the developer is distracted by getting into the weeds of coding.
“So you’re looking for weakness in the code,” Bache said. “If I write the test first, then it forces me to call that method before it exists. Writing the test forces me to decide what to name the method, to decide what arguments to pass it, what to call those parameters. So I get to use the method before it exists, which means it forces me to think about that interface. And if I find this awkward, I can then change that interface before the method exists. That’s absolutely the cheapest time to change your mind.”
Eames agrees. He said that TDD helps reduce the time spent on debugging and code maintenance down the road.
“If I’m going to write a program that’s 1,000 lines long, but I’m also on top of it going to write another 1,500 lines of unit tests,” he said, “I can now write 2,500 lines of code total faster than I can write 1,000 lines that’s just code with no test.”
Tests can sometimes discourage refactoring
Writing unit tests in TDD mostly avoids the problem of creating useless tests. Even when developers delete the unit tests afterward, the testing process will already have served a purpose in design and development. However, simply practicing TDD isn’t enough in itself to ensure that all unit tests you write will be useful in the long term.
“If you’ve written the test so it’s too coupled with the implementation, it can fail when you refactor,” Bache said. “And that’s problematic because one of the reasons for having tests is to help you to refactor. So TDD is not a silver bullet for writing good tests.”
At the energetic, philosophical talk he gave to a meet-up of Go developers in London last month, Go developer Dave Cheney pointed to an even more fundamental problem of tests breaking: when you have to go back and change tests because the code has changed, isn’t that kind of cheating?
Since tests are supposed to check that code is behaving a certain way, when you have to change your tests because you decided to change your code’s behavior, it seems to defeat the purpose of having tests. “They’re not tests at that point, they’re accessories after the fact,” Cheney told attendees at one point.
Cheney’s complaint gets at the heart of the purpose unit testing is supposed to serve. Is the point of testing to help with the process of designing programs, like in TDD? Are tests meant to catch logic errors in complex code you’ve written? Or are they a way to enforce contracts between pieces of software that depend on each other?
The answer, of course, is yes — unit tests can serve any and all of these purposes. But it’s important to clarify which purpose they are serving in each given instance, because otherwise they could cause undesired behaviors.
For tests that primarily enforce contracts, such as those that make sure code changes aren’t breaking calls to APIs, changing them can seem to defeat the purpose. Here’s where the semantics can get a little confusing — the tests that enforce contracts are really more integration-type tests than unit-type tests. Integration tests are sometimes included under the label of “unit tests” because both types are automated tests that can be run using a “unit testing framework.”
For the best experience, Cheney recommends in his talk that developers think carefully about how large a “unit” of code being tested really should be, and that developers focus on testing behavior rather than implementation to avoid making excessive changes to tests.
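The difference between testing behavior and testing implementation can be sketched in a few lines of Python. The ShoppingCart class below is hypothetical and exists only for illustration; it is not taken from Cheney’s talk.

```python
import unittest


class ShoppingCart:
    """Hypothetical class used only for illustration."""

    def __init__(self):
        self._items = []  # internal detail: could later become a dict

    def add(self, name, price):
        self._items.append((name, price))

    def total(self):
        return sum(price for _, price in self._items)


class CartBehaviorTest(unittest.TestCase):
    def test_total_reflects_added_items(self):
        # Uses only the public interface, so it keeps passing
        # if the internal storage is refactored.
        cart = ShoppingCart()
        cart.add("pen", 2)
        cart.add("notebook", 5)
        self.assertEqual(cart.total(), 7)


class CartImplementationTest(unittest.TestCase):
    def test_items_are_stored_as_list_of_tuples(self):
        # Coupled to a private attribute: this breaks if storage changes,
        # even when total() still returns the right answer.
        cart = ShoppingCart()
        cart.add("pen", 2)
        self.assertEqual(cart._items, [("pen", 2)])
```

Both tests pass today. If _items were later changed to a dictionary, only the second would need to be rewritten, even though nothing a caller can observe had changed.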
How to avoid constantly breaking your tests
Dumlao said that broken tests caused by code changes are a common problem for developers. “Usually you run into it when you’re really changing the functionality of your software in a way that is somewhat dramatic,” he said.
Broken tests are not only a roadblock to refactoring. Having to fix old tests every time the code changes can also be frustrating for developers.
“Now, I have heard engineers say, ‘Hey, I just don’t think it’s worth investing in testing this because it’s probably going to change later,’” Dumlao said. “But what if it doesn’t? What if we write this and we don’t change it later, then what? Or what if we write this and we do change it later but we’ve got 100 customers who depend on it? How are we going to know if we’re breaking the experience for them before they get to see it? Unit tests are the best way to do that.”
Dumlao said that tests are more likely to break at the beginning of a software project, when code is changing more frequently. Because agile methodology calls for multiple iterations to get to the final product, tests breaking due to code changing is common. Drift confronts this problem by making use of “tracer bullets” — low-effort software changes that get pushed to production to test out a potential new feature.
“They are scoped small intentionally,” Dumlao said. “For example, my tracer bullet might not have a front-end component to it, you might not be able to see anything. But that’s OK, because I can test the APIs.”
Developers at Drift write tracer bullets without accompanying tests, which allows them to quickly feel out new directions for future iterations. Tracer bullets can reduce the number of dramatic code changes later on by helping developers understand the pros and cons of taking the software in a new direction.
Different programming languages require different testing considerations
Aside from the broader questions of how to use unit tests, there are also considerations specific to the programming languages developers use.
“I absolutely do think that it’s more common for people to use tests for back-end languages than front-end,” said Eames, who decided to specialize in front-end development back in 2009. “I became a front-end engineer on purpose because there wasn’t a lot of unit testing being done on the front-end. My first course on Pluralsight was unit testing JavaScript — that kind of made a little bit of a name for myself and started the current phase of my career.”
When unit testing became popular at the turn of the century, Eames said, most products were written using back-end languages. Even websites, which today are built using a significant amount of front-end languages and frameworks, relied mostly on code rendered on the server. As a result, tools created to meet the increase in interest in testing didn’t support front-end languages.
“If you go to a language like C# or Java, you’re going to find really mature unit testing tools, and you’re going to find several really solid options for unit testing,” Eames said. “But if you go to the front end and look at JavaScript, you really kind of have two. And depending on the framework you’re using, you might only have one good option. We’ve been at the front-end testing really solid since maybe 2009. Still, the tools we have for the front end just don’t seem to have the maturity as tools on the back end.”
Testing frameworks for front-end languages are starting to catch up, but there is another problem: the front end changes very quickly. Innovation there can be measured in months, and every time a new tool or framework is popularized, developers are busy learning the intricacies of the new system.
“One of the last things that people really worry about is how to test it,” Eames said. “So that sort of falls by the wayside.”
Bache said that, for a dynamically typed language like Python (as opposed to statically typed languages such as Java or C#), it was even more important to have tests.
“It’s a good idea to have tests for our code, because otherwise, the first time it gets executed is in production, essentially, and that can be a little bit dangerous,” she said.
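A small sketch shows the kind of mistake she is describing. The function shipping_cost and its pricing rule are hypothetical, made up for illustration. The typo in the express branch is deliberate: a compiler for a statically typed language would typically reject it, but Python only notices when that line actually runs.

```python
def shipping_cost(weight_kg, express=False):
    """Hypothetical function with a bug in a rarely used branch."""
    cost = 5 + 2 * weight_kg
    if express:
        # Deliberate typo: 'cots' is undefined. Python raises NameError
        # here only when this branch is executed.
        cost = cots * 2
    return cost


def test_standard_shipping():
    # The common path works, so the typo can go unnoticed.
    assert shipping_cost(1) == 7


def test_express_shipping_runs():
    # A test that merely executes the express branch exposes the bug
    # before a customer does.
    try:
        shipping_cost(1, express=True)
    except NameError as err:
        print("bug found by test:", err)
    else:
        print("express branch ran without error")


test_standard_shipping()
test_express_shipping_runs()
```

Without the second test, the first execution of the express branch would be whenever a customer first chose express shipping.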
Cheney, who describes Go developers as “a community that sees testing as a first-class activity,” prefers to use the unit testing package that’s built into the standard Go library. “It feels like an anti-pattern to adopt some of the more complicated testing packages that have been ported to Go from other languages; their methodologies reflect those of other language communities and often feel out of place,” Cheney said over email.
Unit testing is a way to make software development more standard and predictable. Today, testing is such a widespread practice, and there are so many tools for automated testing, that it can be easy to mistake writing tests for a mindless endeavor. But that couldn’t be further from the case: writing good tests requires a lot of practice and careful thought.
Different situations call for different types of tests, different languages present their own unique concerns, and even something like a test breaking could be either a sign that it was badly written or that it was doing its job perfectly. Sure, you can automate your tests, but figuring out when to test, what to test and how to test is far from an automatic decision.