Wednesday, 10 September 2014

Going from Acceptance Tests to Code

When wearing my dev hat, I keep getting overruled in discussions by programmers who either don't seem to get what I am saying or don't seem to want to. One theme that comes up time and time again is acceptance testing and how to test. There is often the view that it is the QAs'/BAs' job to specify the acceptance tests and that unit testing is the preserve of the developer, who only gets the acceptance tests once the QAs have finished or, just as bad, develops their code in parallel.

Now, this obviously creates the very silos that agilists always claim were 'bad' in traditional methods. I am not a great believer in 'owning' code, and I also believe that a generalist skill set is better than a specialist one; otherwise you get blockers when people are off on annual leave or sick, or massive contention for those people's time. So this is yet another thing that narks me.

Hence, in the spirit of breaking these barriers, I am going to spend this blog post writing code.
In order to do that, we need a story card. So, let's have the following:

"As a tutor, I want to be able to recall the marks of my top 3 students at the end of the year so that I can put them forward for end-of-year awards."

Simple enough. So using SpecFlow, MSTest, C# and SQL Server Express Edition, how do I turn this into code?

There are many ways to make this happen, including starting with a walking skeleton and pushing the acceptance criteria down through the layers until they reach the database, or elaborating on the criteria enough to develop the example using design by contract.

Acceptance Criteria 

Working with the tutor using agile methods, we pick this ticket up, elaborate on it with them and come up with something fairly reasonable in Gherkin syntax to represent it.

My personal favourite is specification by example. This lets the dev, QA and BA engage the product owner/customer in a role-play session, defining an example at each stage (say, message passing, a website form, a document, a customer service agent, etc.) that can tease out example scenarios, with example data, for each feature the developers are being asked to deliver.

For example, after working through this with the tutor, the end result may be:

Feature: TopFlight Students
 As a tutor,
 In order to find the top 3 performing students,
 I need to retrieve the average student marks for the year 
 And pick the top 3

Scenario: Pick the top 3 performing students by average score
Given I have the following student marks:
 | ID | Surname | Forename | Score |
 | 1  | Joe     | Bloggs   | 55    |
 | 1  | Joe     | Bloggs   | 73    |
 | 2  | Fred    | Barnes   | 61    |
 | 3  | Jane    | Jonas    | 83    |
 | 4  | James   | Jonas    | 85    |
When I press retrieve
Then the result should be as follows:
 | ID | Surname | Forename | AverageScore |
 | 4  | James   | Jonas    | 85           |
 | 3  | Jane    | Jonas    | 83           |
 | 1  | Joe     | Bloggs   | 64           |

Having established the acceptance criteria, we can develop the steps and use stub objects to return the expected values. The result is something like the following skeletal SpecFlow steps file, which exercises the static RetrieveTopFlightStudents method of the TopFlightEngine class:

using System;
using System.Collections.Generic;
using System.Globalization;
using TechTalk.SpecFlow;
using TutorMarks.TopFlight;
using Microsoft.VisualStudio.TestTools.UnitTesting;

namespace TutorMarks.TopFlight.Test.Feature
{
    [Binding]
    public class TopFlightStudents
    {
        // The student scores supplied by the Given step
        private IList<StudentRecord> studentRecords;

        [Given(@"I have the following student marks:")]
        public void GivenIHaveTheFollowingStudentMarks(
                IList<StudentRecord> studentScores)
        {
            studentRecords = studentScores;
        }

        [When(@"I press retrieve")]
        public void WhenIPressRetrieve()
        {
            // Press retrieve (nothing to do yet in this skeleton)
        }

        [Then(@"the result should be as follows:")]
        public void ThenTheResultShouldBeAsFollows(
                IList<StudentRecord> expectedTopFlightScores)
        {
            IList<StudentRecord> actualResults =
                TopFlightEngine.RetrieveTopFlightStudents(studentRecords);

            // Compare the expected row count with what the engine actually returned
            Assert.AreEqual(expectedTopFlightScores.Count, actualResults.Count,
                @"The expected number of results were not returned.");

            for (int index = 0; index < actualResults.Count; index++)
            {
                AssertPropertyEquality("ID", index,
                    expectedTopFlightScores[index].Id,
                    actualResults[index].Id);
                AssertPropertyEquality("Surname", index,
                    expectedTopFlightScores[index].Surname,
                    actualResults[index].Surname);
                AssertPropertyEquality("Forename", index,
                    expectedTopFlightScores[index].Forename,
                    actualResults[index].Forename);
                AssertPropertyEquality("Score", index,
                    expectedTopFlightScores[index].Score,
                    actualResults[index].Score);
            }
        }

        private static void AssertPropertyEquality(
            string fieldName,
            int index,
            object expectedElement,
            object actualElement)
        {
            const string CONST_DIFFERENT_RESULTS =
                @"The property {0} for record {1} is unexpected. Expected {2}, Actual {3}";
            Assert.AreEqual(expectedElement, actualElement,
                CONST_DIFFERENT_RESULTS, new object[]
                { fieldName, index, expectedElement, actualElement });
        }

        // Lets SpecFlow turn a Gherkin table into the IList<StudentRecord>
        // step parameters above. The expected table names its score column
        // "AverageScore", so either heading is accepted.
        [StepArgumentTransformation]
        public IList<StudentRecord> MapTableToStudentRecords(Table source)
        {
            string scoreColumn = source.ContainsColumn("Score") ? "Score" : "AverageScore";
            List<StudentRecord> result = new List<StudentRecord>();
            foreach (TableRow row in source.Rows)
            {
                result.Add(new StudentRecord()
                {
                    Id = int.Parse(row["ID"]),
                    Surname = row["Surname"],
                    Forename = row["Forename"],
                    Score = float.Parse(row[scoreColumn], CultureInfo.InvariantCulture)
                });
            }
            return result;
        }
    }
}

Now, remember, for the sake of the illustration, and of learning why arbitrary test criteria in TDD are a bad thing, look at the bigger picture.

After eventually making the tests go green, the following can be seen (I should have been a poet):

using System.Collections.Generic;

namespace TutorMarks.TopFlight
{
    public class TopFlightEngine
    {
        public static IList<StudentRecord> RetrieveTopFlightStudents(IList<StudentRecord> studentRecords)
        {
            // Deliberately ignores its input: it just returns what the tutor expects to see
            return new List<StudentRecord>
            {
                new StudentRecord() { Id = 4, Surname = "James", Forename = "Jonas", Score = 85 },
                new StudentRecord() { Id = 3, Surname = "Jane", Forename = "Jonas", Score = 83 },
                new StudentRecord() { Id = 1, Surname = "Joe", Forename = "Bloggs", Score = 64 }
            };
        }
    }

    // ... Located in another file
    public class StudentRecord
    {
        public int Id { get; set; }
        public string Surname { get; set; }
        public string Forename { get; set; }
        public float Score { get; set; }
    }
}

What pertinent things do you notice? Correct! It only returns the EXACT averages the tutor expects to see. This is your 'dumb' wireframe/pretotype. One thing it is NOT is a walking skeleton, as that implies a tiny implementation of a piece of functionality that runs through all the connected components of an architecture without having any real substance to it. That is akin to testing with arbitrary data and making sure "...the web bone's connected to the service bone. The service bone's connected to the data bone...", jehee... see what I did there? This precedes even that! It gives you quick feedback and builds from the acceptance criteria to the code from the very beginning of a project.

This SAME pretotype can then be elaborated further with the tutor by adding more example scenarios. For example, it can be established that the 'mean average' (as opposed to the mode or median) is the calculation that produces the expected results.

For each part of the whole (let us call this part a 'unit'), when playing out the scenario with the customer, using these examples, their view would be as follows:
  1. Take the scores for each student (e.g. "Joe Bloggs"), who is identified by a single ID (the number 1)
  2. Add their scores up (55 + 73 = 128), keeping track of the number of scores they have (2 scores)
  3. Divide the total sum of the scores by the number of scores they have (128 / 2 = 64)
  4. This is your average score (so average score = 64 )
So, can you see what we have here? Correct! You have a series of test scenarios that you can use to substantiate the pretotype, which relate to the ORIGINAL acceptance criteria! As a result, everyone can see how the unit level code delivers the acceptance criteria all the way through the process.

Taking each step in turn, we can then deliver the individual steps by developing units which use the examples in the steps as their acceptance criteria. We then go on to deliver a unit testing class which tests for the correct number of results, then the average results, and so on.
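Once the mean average has been agreed, the four steps the tutor walked through can replace the stub. What follows is a minimal sketch rather than the finished engine from this post: TopFlightCalculator and its MeanAverage helper are hypothetical names, and StudentRecord is repeated from the stub so the block compiles on its own.

```csharp
using System.Collections.Generic;
using System.Linq;

// Repeated from the post so this sketch compiles on its own.
public class StudentRecord
{
    public int Id { get; set; }
    public string Surname { get; set; }
    public string Forename { get; set; }
    public float Score { get; set; }
}

// Hypothetical non-stub version of the top-flight logic.
public static class TopFlightCalculator
{
    // Steps 2-4 for a single student: add the scores up, count them,
    // then divide the total by the count to get the mean average.
    public static float MeanAverage(IEnumerable<float> scores)
    {
        float total = 0;
        int count = 0;
        foreach (float score in scores)
        {
            total += score;
            count++;
        }
        return total / count;
    }

    public static IList<StudentRecord> RetrieveTopFlightStudents(
        IEnumerable<StudentRecord> studentRecords, int howMany = 3)
    {
        return studentRecords
            .GroupBy(record => record.Id)              // step 1: one group per student ID
            .Select(group => new StudentRecord
            {
                Id = group.Key,
                Surname = group.First().Surname,
                Forename = group.First().Forename,
                Score = MeanAverage(group.Select(record => record.Score))
            })
            .OrderByDescending(record => record.Score) // best average first
            .Take(howMany)                             // keep the top 3 by default
            .ToList();
    }
}
```

Grouping on the ID rather than the name follows step 1, where a student is identified by a single ID. Ties on the average are not handled here; how they should be broken is the kind of question to take back to the tutor as another example scenario.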

Benefits, Warnings, Tips

One thing that has consistently been a problem in the past is how to align acceptance and unit tests. If you don't have full code coverage at the acceptance test level, you run the risk of giving development too much leeway to create examples which are not aligned with other components of your architecture. That said, given that acceptance tests tend to be slow, you could trade off some low-value or low-risk acceptance specifics, such as exception cases, for unit test coverage in that area, since the 'unit' is typically where such exceptions originate.

Developing unit tests back from the acceptance test examples usually gives you the highest-value cases and secondary scenarios, which automatically aligns your unit tests with the value the stakeholder wants. Build on those to fill out the unit with exception tests, potentially mocking dependencies out if you need to.
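As one example of such an exception case: step 3 of the tutor's calculation divides by the number of scores, so a student with no marks at all is something the acceptance examples never exercise. A minimal sketch, assuming a hypothetical MeanAverageUnit class and a decision to throw rather than return a default (in C#, 0f / 0 would otherwise quietly produce float.NaN):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mean-average unit, hardened for an edge case the
// tutor's examples never covered: a student with no marks at all.
public static class MeanAverageUnit
{
    public static float MeanAverage(IReadOnlyCollection<float> scores)
    {
        if (scores == null)
            throw new ArgumentNullException(nameof(scores));

        // Without this guard, dividing by a count of zero would
        // silently yield float.NaN rather than failing loudly.
        if (scores.Count == 0)
            throw new ArgumentException("A student must have at least one score.", nameof(scores));

        float total = 0;
        foreach (float score in scores)
            total += score;

        return total / scores.Count;
    }
}
```

In an MSTest unit testing class, this is the sort of behaviour you might pin down with the ExpectedException attribute (or Assert.ThrowsException in later MSTest versions). Whether throwing is the right behaviour at all is itself worth confirming with the stakeholder.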

Tip 1: Take care that acceptance tests and unit tests are 'joined up'.
This should happen automatically, but there are many cases where it doesn't. This is why working back from the acceptance criteria examples is the best approach. If the examples are coarser-grained than you need at the unit level, discuss it with the stakeholder.

Tip 2: Know Your Context and Watch Your Coverage!
This is a contentious one. I am of the view that everything should be covered 100% in some form. Acceptance tests really help cover at least 80% of the value. However, edge cases and bugs often arise from unforeseen scenarios. Adding a test for each bug is great, as this fills the gaps, but be aware that it's possible to have acceptance tests covering, say, 80% and unit tests covering 100% of the code, yet still have integration problems resulting from the 'missing' 20% of acceptance tests, or bugs resulting from data you or the stakeholder hadn't thought of. This is why integration tests came about, but if you have 100% acceptance test coverage, you don't need separate integration tests at all, because the acceptance tests already exercise that integration.

This won't always be possible, for example when interfacing with some cloud providers. So know the context of the system and work with it to deliver right up to the cloud service boundary.

Also, don't forget that overusing mocks is a bad thing, and unit testing just the edges around the value covered by acceptance tests (if they're not at 100%) doesn't prove anything credible either, since you can't use the unit tests to isolate an error found by the acceptance tests that way. I've illustrated the problem areas below, since they will need special attention, perhaps in another discussion with the business owner. Note: the green areas include unit test coverage, which has been removed from the diagrams for clarity.

With mocks: acceptance tests in green, plus unit test coverage. Red indicates potential areas for bugs, so pay special attention to them.

Without mocks (or with mocks only at the extremities): again, pay particular attention to the red areas.

One of the difficulties is that the red areas are points where the developers don't necessarily have enough information to go on, i.e. they can't set up their tests without potentially fabricating some test data, and the overall behaviour of the system may or may not be consistent with the information used in the other red area(s). Hence, your test suite can end up using different entities, with partially overlapping fields, to get different results in these two areas of the system.

So make sure that the combination of examples you use makes sense. This might mean checking what's already in the tests so you don't create an absurd scenario, such as using random digits for a credit card number in unit tests for a section of the site you're not interested in, only for someone else to develop a Luhn validation algorithm later and have it break in all these cases. It's not nice to leave cleaning up that sort of mess to your colleagues!
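To make the credit card example concrete: the Luhn check is a mod 10 checksum where, working from the rightmost digit, every second digit is doubled (subtracting 9 when the result exceeds 9) and the total must be divisible by 10. A minimal sketch with a hypothetical Luhn.IsValid helper shows why made-up digits are fragile: "1234567812345678" fails the check, whereas the widely used test number "4111111111111111" passes.

```csharp
// Hypothetical helper illustrating the Luhn (mod 10) checksum.
public static class Luhn
{
    public static bool IsValid(string number)
    {
        if (string.IsNullOrEmpty(number))
            return false;

        int sum = 0;
        bool doubleIt = false; // the rightmost digit is not doubled

        // Walk the digits from right to left.
        for (int i = number.Length - 1; i >= 0; i--)
        {
            char c = number[i];
            if (c < '0' || c > '9')
                return false; // only plain digit strings are accepted here

            int digit = c - '0';
            if (doubleIt)
            {
                digit *= 2;
                if (digit > 9)
                    digit -= 9; // same as adding the two digits of the product
            }

            sum += digit;
            doubleIt = !doubleIt;
        }

        return sum % 10 == 0;
    }
}
```

Sharing one known-valid number like this across the suite, instead of each test inventing its own digits, keeps the examples consistent for whoever adds validation later.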