April 28, 2006
Some of the most hated people in San Francisco must be the meter maids, the DPT people who drive around in golf carts and hand out tickets to anyone who overslept street cleaning or did not have enough quarters for the meter. On some projects, the most hated people are the metric maids, the people who go around and try to sum up a developer’s hard work and intellectual genius in a number between 1 and 10.
Many managers love metrics: “You can’t manage it if you can’t measure it.” I am actually a big proponent of extracting and visualizing information from large code bases or running systems (see Visualizing Dependencies). But when we try to boil the spectrum between good and evil down to a single number, we have to be careful about what this number actually expresses.
A frequently used metric for the quality of code is test coverage, typically expressed as a percentage: the number of executable lines of code hit by test cases divided by the total number of executable lines of code. If I have 80% test coverage, this means that 80 out of every 100 executable lines are being hit by my tests. It is a fairly safe statement to claim that a code module with 80% test coverage is better than one with 20% coverage. However, is one that has 90% really better than one that has 80%? Are two modules with 80% coverage equally “good”? We quickly find that a single number cannot do justice to all that is going on in a few thousand lines of source code. Still, finding an abstraction is nice, so let’s see how far we can get with this metric.
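To make the arithmetic concrete, here is a minimal Python sketch of the ratio described above. The function name and the two sample modules are made up for illustration; real coverage tools collect the hit lines by instrumenting or tracing the code.

```python
def coverage_percent(hit_lines, executable_lines):
    """Percentage of executable lines that at least one test executed."""
    if not executable_lines:
        raise ValueError("module has no executable lines")
    # Only lines that are actually executable can count as hits.
    covered = set(hit_lines) & set(executable_lines)
    return 100.0 * len(covered) / len(executable_lines)


# Two hypothetical 100-line modules, both at 80%, but with different gaps.
executable = set(range(1, 101))
module_a_hits = set(range(1, 81))    # misses lines 81-100 (say, the error handling)
module_b_hits = set(range(21, 101))  # misses lines 1-20 (say, the input validation)

print(coverage_percent(module_a_hits, executable))  # 80.0
print(coverage_percent(module_b_hits, executable))  # 80.0
```

Both modules report the same number, yet the untested parts are entirely different, which is exactly the information the single percentage throws away.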
First we need to qualify the metric with how it is obtained. 80% coverage obtained via fine-grained, automated unit tests probably counts more than 80% coverage obtained via coarse regression tests. We need to keep in mind that code coverage counts every line that was hit by a test, intentionally or unintentionally. Also, the metric makes no statement about whether any test verified that the specific line actually works correctly. It is a little scary to realize that test cases with no assert statements achieve the same coverage as tests with asserts. I have seen a few proposals to remedy this problem:
You might only count lines executed in the class specifically targeted by the unit test. For example, if fooTest exercises 60 out of 100 lines in Foo but also hits 50 lines in Bar, the lines in Bar do not count towards Bar's test coverage because this test was not intended to test Bar. In fact, Bar being hit at all might have been the result of insufficient mocking or stubbing of the test for Foo. Such behavior should not be rewarded.
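The counting rule can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular tool implements it; the data layout and the assumption that Bar has 100 executable lines are mine.

```python
from collections import defaultdict


def targeted_coverage(runs, executable):
    """Coverage per class, counting only hits made by tests that target that class.

    runs: list of (target_class, {class_name: set_of_hit_lines}), one entry per test
    executable: {class_name: number_of_executable_lines}
    """
    counted = defaultdict(set)
    for target, hits in runs:
        # Ignore incidental hits in any class other than the one this test targets.
        counted[target] |= hits.get(target, set())
    return {cls: 100.0 * len(counted[cls]) / total
            for cls, total in executable.items()}


# fooTest from the text: 60 of Foo's 100 lines, plus 50 incidental lines in Bar.
runs = [("Foo", {"Foo": set(range(1, 61)), "Bar": set(range(1, 51))})]
executable = {"Foo": 100, "Bar": 100}  # Bar's size is an assumption

print(targeted_coverage(runs, executable))  # {'Foo': 60.0, 'Bar': 0.0}
```

A conventional tool would have credited Bar with 50% here even though no test was written for it.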
A particularly interesting technique is mutation testing. This class of tools modifies the source code (or byte code) in the class under test to see whether the change breaks any tests. If no tests break as a result of the random change, the mutated line does not count as thoroughly unit tested. Ivan Moore wrote Jester, which performs Java mutation testing. It can produce nice reports that show what was actually changed and whether the change resulted in a failed test.
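The following toy Python sketch shows the principle, and also why tests without asserts give a false sense of safety. It is not how Jester works internally; the function under test, the list of mutations, and all helper names are invented for this example, and the mutations are simple text substitutions rather than random changes.

```python
SOURCE = '''
def is_adult(age):
    return age >= 18
'''

# Hypothetical mutations: (original text, replacement text).
MUTATIONS = [(">=", ">"), (">=", "<"), ("18", "19")]


def load(source):
    """Compile source text and return the is_adult function it defines."""
    namespace = {}
    exec(source, namespace)
    return namespace["is_adult"]


def check_without_asserts(is_adult):
    # Executes every line of is_adult (full coverage) but verifies nothing.
    is_adult(18)
    is_adult(17)


def check_with_asserts(is_adult):
    assert is_adult(18)
    assert not is_adult(17)


def surviving_mutants(check):
    """Return the mutations that the given check fails to detect."""
    survivors = []
    for old, new in MUTATIONS:
        mutant = load(SOURCE.replace(old, new, 1))
        try:
            check(mutant)
        except AssertionError:
            continue  # the check failed, so this mutant was "killed"
        survivors.append((old, new))
    return survivors


print(surviving_mutants(check_without_asserts))  # all three mutations survive
print(surviving_mutants(check_with_asserts))     # []
```

Both checks hit the same lines, so a line-coverage tool scores them identically. Only the mutation run reveals that the first one would never notice a broken comparison.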
A while ago we refactored a pretty miserable chunk of code. It took us more than a day to extract the meaningful concepts, refactor the code and add meaningful unit tests. Being proud of our work, we ran a test coverage tool against the class to give ourselves a pat on the back about how much better our test coverage was. Sadly, the pat on the back was more like a kick in the pants. For one class the test coverage crept up by a mere 2 percentage points, and for the other one it actually decreased! It turned out that the portion of the code that we refactored actually had decent test "coverage" (as determined by the tool), even though it was very poorly written. Rewriting it (and simplifying the logic) reduced the number of lines in this code segment, causing the overall percentage of tested code in the class to decrease because the remaining (still) poorly tested code did not shrink.
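The effect is easy to reproduce with made-up numbers. The line counts below are purely illustrative and are not the figures from our project.

```python
def coverage(segments):
    """Overall coverage percentage for a class, given (covered, total) lines per segment."""
    covered = sum(c for c, _ in segments)
    total = sum(t for _, t in segments)
    return 100.0 * covered / total


# Before: a bloated but well-"covered" segment plus a poorly tested remainder.
before = [(90, 100), (30, 100)]
# After: the refactored segment shrinks to 40 fully tested lines; the rest is unchanged.
after = [(40, 40), (30, 100)]

print(coverage(before))  # 60.0
print(coverage(after))   # 50.0
```

The refactored segment went from 90% to 100% coverage, yet the class as a whole dropped from 60% to 50% because the poorly tested remainder now makes up a larger share of the lines.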
You should also be cautious about which metric carrot you want to dangle in front of the developers. In Managing Software Maniacs, Ken Whitaker already warned that you should "reward, never incent". The biggest danger of incentives is that you might just get what you asked for, but not what you had in mind. If achieving a certain test coverage goal brings a bonus, you have to rely on developers having high enough morals not to come up with a quick scheme to get coverage (such as tests without asserts). If your developers are really that sophisticated and honest, maybe they need no incentive at all to do the right thing. A lot can be said about the dangers of incentives for developers, but it seems that incentives based on metrics are particularly dangerous because a simple number can rarely represent the real intent.
Even with all these caveats metrics can be quite useful. I frequently use metrics in the following situations:
Oh, and I forgot one useful application of metrics: during performance review times they can be a handy weapon. Who can argue with "my code has 95% test coverage"? Just make sure your boss does not read this blog...