Writing tests in HTSJDK

Yossi Farjoun edited this page Apr 16, 2017 · 3 revisions

HTSJDK is a Java library that uses a mixture of Scala w/Scalatest and Java w/TestNG for testing. Tests are run via the gradle test task, which invokes Scalatest, which in turn runs all Scalatest suites and all of HTSJDK's TestNG suites.

Testing w/Java & TestNG

To write a test in Java, create a new class that extends htsjdk.HtsjdkTest. HtsjdkTest is a superclass that bridges TestNG and Scalatest, allowing tests to be written using TestNG constructs (e.g. @Test and Assert.assertEquals()) but be run by Scalatest. HTSJDK contains hundreds of TestNG tests that can serve as examples if you are unfamiliar with TestNG.

Location: Place your Java tests under src/test/java

Testing w/Scala & Scalatest

NOTE: Since HTSJDK is a Java library, not all developers are familiar with Scala. As a result, care should be taken to use features of Scala in a way that is relatively easy for Java developers to read and understand. Specifically, the following should probably be avoided:

  • Defining methods using operator-like names
  • Type gymnastics
  • Overuse of _ as the default parameter name

To create a test using Scalatest, create a Scala class that extends htsjdk.UnitSpec. We use the name UnitSpec as it's customary in Scalatest for unit tests. If you haven't written a Scalatest test before, it may be worth reviewing:

  1. A brief introduction to scala: http://docs.scala-lang.org/tutorials/scala-for-java-programmers.html
  2. A simple test in HTSJDK: https://github.com/samtools/htsjdk/blob/master/src/test/scala/htsjdk/samtools/util/StringUtilTest.scala
  3. The Scalatest documentation on Matchers: http://www.scalatest.org/user_guide/using_matchers

Location: Place your Scala tests under src/test/scala
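As a minimal sketch of how a complete test file might be laid out: the package and class name below are inferred from the path of the StringUtilTest.scala file linked above, and the sketch assumes that extending htsjdk.UnitSpec is what brings the should/in syntax and the matchers into scope.

```scala
// Assumed location: src/test/scala/htsjdk/samtools/util/StringUtilTest.scala
package htsjdk.samtools.util

import htsjdk.UnitSpec

// Assumption: UnitSpec supplies the `should ... in` syntax and matchers.
class StringUtilTest extends UnitSpec {
  // Each `should ... in { }` block registers one test case with the framework.
  "StringUtil.hammingDistance" should "return zero for two empty sequences" in {
    StringUtil.hammingDistance("", "") shouldBe 0
  }
}
```

Because the test class sits in the same package as StringUtil, no import of the class under test is needed.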

Let's take a look at the tests for StringUtil.hammingDistance() in more detail:

  "StringUtil.hammingDistance" should "return zero for two empty sequences" in {
    StringUtil.hammingDistance("", "") shouldBe 0
  }

  Seq(("ATAC", "GCAT", 3), ("ATAGC", "ATAGC", 0)).foreach { case (s1, s2, distance) =>
    it should s"return distance $distance between $s1 and $s2" in {
      StringUtil.hammingDistance(s1, s2) shouldBe distance
    }
  }

  it should "be case sensitive" in {
    StringUtil.hammingDistance("ATAC", "atac") shouldBe 4
  }

  it should "count Ns as matching when computing distance" in {
    StringUtil.hammingDistance("nAGTN", "nAGTN") shouldBe 0
  }

  it should "throw an exception if two strings of different lengths are provided" in {
    an[Exception] shouldBe thrownBy { StringUtil.hammingDistance("", "ABC") }
    an[Exception] shouldBe thrownBy { StringUtil.hammingDistance("Abc", "wxyz") }
  }

The general form of a single test within a suite is:

["ThingBeingTested"|it] should "desired behavior" in {
    test code
}

The first time a class or method is tested, use its name, in quotes, to the left of should. For subsequent tests, use the shorthand it to refer to the last thing tested. This both helps identify where the thing being tested changes, and helps IDEs group tests together when running interactively. To the right of should, define small and specific test cases, so that when a test fails it's immediately apparent what failed. For example, the five test cases above are vastly preferable to a single "StringUtil.hammingDistance" should "work as expected" in { ... } that then contains multiple checks.

Within a test case, you write code that exercises the code under test and then use matchers (should statements) to verify the outputs. The following shows some useful examples, but there are many more options, so take a look at the Scalatest docs:

x shouldBe true // x must == true
x shouldBe 7    // x must == 7
x shouldBe 7.0 +- 0.001  // x must be between 6.999 and 7.001
xs should have length 10 // xs must be a String or array or collection-like object with length 10
xs shouldBe empty        // xs must be a type with an isEmpty method that returns true
xs should contain theSameElementsAs Seq(1,2,3) // xs must be a collection type containing three elements (1,2,3) in any order
xs should contain theSameElementsInOrderAs Seq(1,2,3) // xs must be an ordered collection type containing three elements (1,2,3) in that order

There's also a nice syntax for testing that code throws an exception:

"throw" should "throw an exception or something is seriously broken" in {
   an[SAMFormatException] shouldBe thrownBy { throw new SAMFormatException("Yikes!") }
}

The should statements can be thought of as special syntax that generates a test case and registers it with the testing framework. As a result, and as shown by the second test above, they can be put inside loops to generate many test cases:

  Seq(("ATAC", "GCAT", 3), ("ATAGC", "ATAGC", 0)).foreach { case (s1, s2, distance) =>
    it should s"return distance $distance between $s1 and $s2" in {
      StringUtil.hammingDistance(s1, s2) shouldBe distance
    }
  }

This code generates two test cases, because the foreach body is executed twice, each time being handed a 3-tuple containing a pair of strings (s1 and s2 as we've called them) and the expected distance between them. Scala's string interpolation syntax (s"...") is used to insert the variables into a String, giving each test case a unique name.
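To see which names the loop produces, the same interpolation can be tried in plain Scala without any test framework. This is only an illustrative sketch; the object name GeneratedTestNames is hypothetical and not part of HTSJDK.

```scala
// Hypothetical helper, not part of HTSJDK: shows the names the loop would generate.
object GeneratedTestNames {
  // Same input table as the loop above: (s1, s2, expected distance).
  val cases: Seq[(String, String, Int)] =
    Seq(("ATAC", "GCAT", 3), ("ATAGC", "ATAGC", 0))

  // The s"" interpolator replaces each $variable with its value,
  // so every tuple yields a distinct name.
  def names: Seq[String] = cases.map { case (s1, s2, distance) =>
    s"return distance $distance between $s1 and $s2"
  }

  def main(args: Array[String]): Unit = names.foreach(println)
}
```

Running it prints "return distance 3 between ATAC and GCAT" and "return distance 0 between ATAGC and ATAGC", one name per tuple in the input sequence.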