26. Testing#
We perform testing to increase confidence that our programs work correctly and meet our customers' expectations. We call this type of testing functional testing, as it ensures that the code functions correctly. While it sounds trite, one cannot overstate the importance of software testing. Software and computer systems permeate our daily lives; the correct functionality of those systems is paramount to our safety and well-being. Unless you are reading these words from a dead tree off the grid somewhere, you are either working with a computer system or benefiting from a product (electricity) controlled by a computer system. Software defects not only waste time as we work through the issues, but they can also lead to monetary loss (both direct and indirect) and even loss of life.
Testing directly improves product quality, leading to higher customer satisfaction with systems. For example, imagine using a computer that crashes regularly. How long would you continue to use that system? Testing also decreases product development costs, as finding and fixing defects closer to when they first appear in the system costs less.
Formally, the Guide to the Software Engineering Body of Knowledge [1] defines software testing as follows:
Software testing consists of the dynamic verification that a program provides expected behaviors on a finite set of test cases, suitably selected from the usually infinite execution domain.
The guide then breaks down these italicized words:
Dynamic: We execute the program on selected inputs.
Expected: We observe the expected outputs of the program and decide if the test was successful or not.
Finite: Even the simplest of programs can have so many input values that exhaustive testing is infeasible. As such, we can only perform testing on a subset of the possible tests.
Selected: Selecting different test cases can have different levels of effectiveness. As such, we must seek to create the correct set of test cases to evaluate our programs effectively (minimize time and effort).
Defects will occur in software systems. Our goal is to both minimize their occurrences and reduce the time between when they appear and when we subsequently discover them.
Latent defects can be extremely dangerous - one way they manifest themselves is as zero-day vulnerabilities that could allow others to access systems illicitly.
Yes, you can initially get away with manually testing your software as you write code. (In some ways, this is necessary - build a little, test that those parts work, and build some more. Lather, rinse, repeat.) However, you will eventually face significant maintainability issues. As you make changes, how do you ensure those changes do not break something?
Before the early 2000s, most software and system testing was a manual process. Individuals wrote test scripts that others manually followed. Some developers built unit test cases but then executed those test cases manually. Since then, testing has become predominantly automated (although not universally).
Automated testing offers several advantages:
Cost-savings. These test cases can be run repetitively. Yes, higher upfront costs exist to develop test cases, but that is a one-time expense. Over the lifetime of a project, test cases can be executed thousands of times.
Faster development timeframes. As developers make enhancements to a system, existing test cases can be executed to ensure the existing functionality still works (regression testing).
Immediate feedback
Automation of test case development. For example, fuzzing generates test inputs automatically: https://owasp.org/www-community/Fuzzing
Automated testing is foundational to continuous integration, continuous delivery, and other modern DevOps practices.
Oh … by the way, the automated grading of your program submissions - that’s all unit tests.
This notebook focuses on verification of the code we have been developing - primarily, this involves unit testing of the code we write. Generally speaking, unit tests check that the software produces the correct output based on a specific input. We compare actual and expected results to determine if an error occurred. We consider a “unit” to be any portion of code testable in isolation. We can call functions and execute them, so functions are units. However, we cannot call specific lines within a function, so individual lines are smaller than a unit. As we examine classes shortly, we can test many of the class components in isolation and, thus, treat classes as units for testing.
Two other common levels for functional testing:
system testing examines the behavior of an entire system. This includes external interfaces to other systems as well as the operating environment (e.g., operating system) in which the system resides.
integration testing verifies interactions among software components - how do these units of code work together?
The primary structure of functional tests is relatively consistent. As these tests check that the system produces the correct output for a given input, functional tests follow this outline (a minimal sketch in code appears after the list):
Define inputs
Identify the expected output
Perform any system preparation to execute the test
Execute the test
Get the actual output
Compare the actual output against the expected output to see if the two match.
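As a minimal sketch, this outline maps directly onto code. Here, a trivial add() function stands in for the unit under test (the function and values are purely illustrative):

def add(a, b):
    return a + b

# 1. Define inputs
inputs = (2, 3)
# 2. Identify the expected output
expected = 5
# 3-4. Perform any preparation and execute the test
actual = add(*inputs)  # 5. Get the actual output
# 6. Compare the actual output against the expected output
assert actual == expected, f"expected {expected}, got {actual}"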
This notebook walks through developing test cases to execute on an automated basis. Our overarching goal is two-fold:
Deliver high-quality products to our customers
Reduce costs as much as possible.
We will use Python’s built-in testing framework, unittest. This framework provides capabilities to organize test cases, establish any pre-conditions necessary to perform tests, execute the tests, and remove any artifacts of the testing process. The test cases work by making assertions about the code - does this result match some expected value? An assertion is a statement of fact. As such, if an assertion fails, either our assumption is incorrect or the code produces an incorrect value.
26.1. Case Study: Bond Valuation#
We will implement several functions to compute various bond yield values to produce some code to test.
A bond represents a type of corporate debt. A corporation issues a bond with a fixed principal that will be paid to the bond owner at maturity and fixed interest payments at set times. For instance, a corporation can issue a $1,000 bond with 10% annual interest with a maturity of 10 years. Investors holding on to this bond receive $2,000 over the lifetime (10 payments of $100 plus $1,000 at the end).
One valuation to examine is the bond yield: \(bond~yield = \frac{annual~coupon~payment}{bond~price}\), where \(annual~coupon~payment = face~value \times coupon~rate\).
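For example, a $1,000 face-value bond with a 5% coupon priced at $970 yields \( \frac{1000 \times 0.05}{970} = \frac{50}{970} \approx 0.0515 \).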
We can then define a function to compute the bond yield.
1def compute_bond_yield(bond_price, face_value, coupon_rate):
2 annual_coupon_payment = face_value * coupon_rate
3 bond_yield = annual_coupon_payment / face_value # used face_value instead of bond price
4 return bond_yield
1compute_bond_yield(970,1000,0.05)
0.05
The result looks correct, but is it really? Ideally, we want to ensure that the output matches the expected value. However, if we plug those numbers into a bond yield calculator, we see that the result should be approximately 0.0515.
To help find issues like this, we need to write unit tests. We write these tests first and then implement the function. We know our functions work when they pass the test cases. Writing the test cases first also helps prevent confirmation bias - if we write test cases after implementing, the implementation is still fresh in our minds, and we may overlook its flaws.
To use the unittest module, we will first need to import that module and then write test cases. As with exceptions, we extend a pre-existing class and then add methods to that class. The approach will follow this outline:
import unittest
class TestName(unittest.TestCase):
def setUp(self):
pass
def tearDown(self):
pass
def test_name_1(self):
...
def test_name_2(self):
...
def test_name_n(self):
...
Generally, you will want to identify code that contains test cases unmistakably. As such, starting the class names for the unit tests with “Test” makes sense. Similarly, we will also want to give descriptive names to the individual methods. As with many other testing frameworks, unittest assumes that methods starting with test_ are test cases. We can also add docstrings to the methods to provide a more detailed test description.
The framework calls the setUp() method before each test method. Use this capability to allocate resources or set up conditions necessary to execute the test cases. You can also define a setUpClass() method that runs once before executing any tests in the class (see the unittest documentation for more details).
The tearDown() method is called after each test method, regardless of any exceptions. Use this method to release any allocated resources or move the system state back to what it was before the test case executed. tearDownClass() also exists - this executes after all of the tests in the class have been executed.
It is unnecessary to define setUp() and tearDown() in the examples below - the default behavior for both does nothing.
From within a Python program, we can execute the defined test cases with the following method call:
unittest.main(argv=['unittest','TestName'], verbosity=2, exit=False)
Remove the ‘TestName’ element to execute all test cases that have been loaded by the interpreter.
From the command line, we can execute
python -m unittest TestName
To discover all possible test cases and run those, use
python -m unittest discover
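Discovery also accepts options to control where it looks. For example, assuming the tests live in a tests/ directory in files named test_*.py:
python -m unittest discover -s tests -p "test_*.py" -v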
Following the test outline, we will write some test cases for the compute_bond_yield() function.
1import unittest
1class TestBondYield(unittest.TestCase):
2 "Validates compute_bond_yield"
3 def setUp(self):
4 pass
5 def tearDown(self):
6 pass
7
8 def test_bond_yield_5_percent(self):
9 "Validate that the computed bond yield approximates 0.0515"
10 bond_yield = compute_bond_yield(970, 1000, 0.05)
11 self.assertAlmostEqual(bond_yield, 0.0515, places=4)
12
13 def test_bond_yield_0_percent(self):
14 "Validate that the computed bond yield equals 0 for 0%"
15 bond_yield = compute_bond_yield(970, 1000, 0.00)
16 self.assertEqual(bond_yield, 0.0)
1unittest.main(argv=['unittest','TestBondYield'], verbosity=2, exit=False)
test_bond_yield_0_percent (__main__.TestBondYield.test_bond_yield_0_percent)
Validate that the computed bond yield equals 0 for 0% ...
ok
test_bond_yield_5_percent (__main__.TestBondYield.test_bond_yield_5_percent)
Validate that the computed bond yield approximates 0.0515 ...
FAIL
======================================================================
FAIL: test_bond_yield_5_percent (__main__.TestBondYield.test_bond_yield_5_percent)
Validate that the computed bond yield approximates 0.0515
----------------------------------------------------------------------
Traceback (most recent call last):
File "/var/folders/f0/69wncpqd02s3j1r_z5sfddd00000gq/T/ipykernel_67341/3752365302.py", line 11, in test_bond_yield_5_percent
self.assertAlmostEqual(bond_yield, 0.0515, places=4)
AssertionError: 0.05 != 0.0515 within 4 places (0.0014999999999999944 difference)
----------------------------------------------------------------------
Ran 2 tests in 0.001s
FAILED (failures=1)
<unittest.main.TestProgram at 0x108a9d8e0>
One of the tests passed, while the other test failed. We now need to debug the code to see what occurred. Since this is a relatively small amount of code, we can employ the debug strategy of “read” and then “run”. As we read the source code, does it match up with the formula? No. We can also step through the code manually (either with a debugger or using a memory diagram/tracing approach).
Let’s correct the code and then re-run the test.
1def compute_bond_yield(bond_price, face_value, coupon_rate):
2 annual_coupon_payment = face_value * coupon_rate
3 bond_yield = annual_coupon_payment / bond_price
4 return bond_yield
1unittest.main(argv=['unittest','TestBondYield'], verbosity=2, exit=False)
test_bond_yield_0_percent (__main__.TestBondYield.test_bond_yield_0_percent)
Validate that the computed bond yield equals 0 for 0% ...
ok
test_bond_yield_5_percent (__main__.TestBondYield.test_bond_yield_5_percent)
Validate that the computed bond yield approximates 0.0515 ...
ok
----------------------------------------------------------------------
Ran 2 tests in 0.001s
OK
<unittest.main.TestProgram at 0x108a9dfa0>
Another valuation for bonds is the yield to maturity (YTM). This value is the speculative rate of return given that an investor purchases a bond at a given current market price and then holds the bond until maturity (thus, all interest payments and the final principal payment are made).
The YTM can be approximated with this formula:
\( YTM = \frac{C + \frac{FV-PV}{t}}{\frac{FV+PV}{2}} \) where
\(C\) – Interest/coupon payment
\(FV\) – Face value of the security
\(PV\) – Present value/price of the security
\(t\) – How many years it takes the security to reach maturity
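For example, a $1,000 face-value bond priced at $970 with a 5% coupon (\(C = 50\)) and 10 years to maturity gives \( YTM = \frac{50 + \frac{1000 - 970}{10}}{\frac{1000 + 970}{2}} = \frac{53}{985} \approx 0.0538 \) - the expected value used in the test case below.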
1def calculate_yield_to_maturity(bond_price, face_value, coupon_rate, t):
2 c = coupon_rate * face_value
3 ytm = (c + (face_value - bond_price)/t) / ( (face_value+bond_price) /2 )
4 return ytm
1print(compute_bond_yield(850,1000,0.15))
2print(calculate_yield_to_maturity(850,1000,0.15,7))
0.17647058823529413
0.18532818532818532
We can then develop another set of test cases to test that function:
1class TestBondYTM(unittest.TestCase):
2 def setUp(self):
3 pass
4 def tearDown(self):
5 pass
6
7 def test_bond_ytm_5_per_10_year(self):
8        "Validate that the computed YTM approximates 0.0538"
9 bond_yield = calculate_yield_to_maturity(970, 1000, 0.05,10)
10 self.assertAlmostEqual(bond_yield, 0.0538, places=4)
11
26.2. Assertions and unittest#
As mentioned, unittest relies upon assertions - statements of fact - when creating test cases. Python uses the following syntax for assertions:
assert expression[, assertion message]
Rather than using this syntax, the unittest module has defined a series of assert methods that should be called to validate the program’s state. The unittest module documentation lists the available methods.
Above, we provided examples for self.assertEqual() and self.assertAlmostEqual(). Due to the imprecision of floating-point numbers, self.assertAlmostEqual() should be used when comparing floating-point numbers for equality.
For testing if exceptions are raised, we can use assertRaises(). We should also validate the type (and possibly the message) of the generated exception.
def test_bad_list(self):
"""Tests a string in the middle of the list"""
from max_seq import max_seq
with self.assertRaises(Exception) as context:
max_seq([ 1,2.2, "stringValue",3])
self.assertTrue(type(context.exception) == TypeError,"Wrong exception type raised")
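Since the test expects a specific exception type, a more concise and equivalent form passes that type directly to assertRaises(); assertRaisesRegex() additionally checks the exception message against a regular expression:

def test_bad_list(self):
    """Tests a string in the middle of the list"""
    from max_seq import max_seq
    with self.assertRaises(TypeError):
        max_seq([ 1,2.2, "stringValue",3])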
1unittest.main(argv=['unittest','TestBondYTM'], verbosity=2, exit=False)
test_bond_ytm_5_per_10_year (__main__.TestBondYTM.test_bond_ytm_5_per_10_year)
Validate that the computed YTM approximates 0.0538 ...
ok
----------------------------------------------------------------------
Ran 1 test in 0.000s
OK
<unittest.main.TestProgram at 0x108afa090>
26.3. Documenting Test Cases#
To document a test case, we generally define this information:
Unique Identifier
Test Inputs
Expected Results
Actual Results
As you look at the source code, you can see that the unittest classes have this information in them already. You should include an identifier in the docstring - this will help you locate test cases in larger projects and possibly provide a hierarchical organization if desired.
If you manually test code instead, you should keep track of your tests in a text file. That way, you at least have some record of how you previously tested the code. However, the right way to do this is to create automated unit tests. The investment will be worth it.
26.4. Closed Box and Open Box Testing#
As we approach functional testing, we can develop test cases based on two approaches: closed box and open box. In the closed box approach, we do not have any access to the underlying source code - we must design the test cases solely based on the inputs and the corresponding outputs. In contrast, the open box approach assumes that the test developers can see the code and can then develop test cases specific to the paths, conditions, and branches that the code can follow. For unit testing, developers will combine these approaches to build test cases. For user acceptance testing, test cases follow a closed box approach, as the users/testers do not have insight into the underpinnings of how a system works.
Revisiting the definition of software testing from the Software Engineering Body of Knowledge:
Software testing consists of the dynamic verification that a program provides expected behaviors on a finite set of test cases, suitably selected from the usually infinite execution domain.
The definition presents several interesting ideas:
The input set quite often can be infinite. Even for the simple examples above, we have endless input values - just vary the decimal places of the parameters to the functions.
We need to choose (create) test cases that are “suitably selected”. In other words, which test cases are most likely to find potential issues, represent the range of input values, and execute all of the code?
Note: Others refer to these two types of testing as “black” box and “white” box.
26.4.1. Closed Box Testing: Equivalence Classes#
One strategy for closed box testing is to divide the input values into different equivalence classes (partitions). This approach divides the input space into different regions and then selects a representative value from each region.
For example, U.S. zip codes are five digits long and must contain only numbers. So we can divide this into the following classes:
a string less than five characters in length (invalid)
a string more than five characters in length (invalid)
a string that is five characters long but has non-numeric characters present (invalid)
a string that is five characters long with only digits 0-9 present. (valid)
Using these four classes, we assume inputs within each class are equivalent to each other. E.g., testing “” and “123” should produce equivalent results. Similarly, there is no value in testing more than one example from the 100,000 valid possibilities.
With equivalence classes, the goal is not necessarily to create bad input values but rather to identify values that can be used to limit the number of test cases created. For the zip code example, the non-numeric characters present class was created as the problem statement explicitly mentioned numeric characters only.
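As a sketch of the zip code example - the is_valid_zip_code() helper is hypothetical - we write one test case per equivalence class:

import unittest

def is_valid_zip_code(value):
    """Returns True if value is a five-character string containing only digits"""
    return len(value) == 5 and value.isdigit()

class TestZipCodeValidation(unittest.TestCase):
    def test_too_short(self):      # invalid: fewer than five characters
        self.assertFalse(is_valid_zip_code("123"))
    def test_too_long(self):       # invalid: more than five characters
        self.assertFalse(is_valid_zip_code("123456"))
    def test_non_numeric(self):    # invalid: five characters, non-digit present
        self.assertFalse(is_valid_zip_code("12a45"))
    def test_valid(self):          # valid: exactly five digits
        self.assertTrue(is_valid_zip_code("27606"))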
As we use equivalence classes to create test cases, we
Determine if there are any specified limits and conditions for input values.
Break the input space into different classes to create partitions:
If the input space is specified as a range condition, then three classes are created: below the range (invalid), in the range (valid), and above the range (invalid). For example, if the condition is an individual’s age must be between 18 and 65, the classes are below 18, 18 to 65, and above 65.
If the input space is a given input, we have at least two classes: valid input and invalid inputs.
If we look at membership within a collection, we have two equivalence classes: the member exists or the member does not exist. However, if members are treated differently, then additional partitions need to be defined based upon those behaviors.
For Boolean values, we have two classes: True and False.
Write a test case that tests each range’s “middle” input values.
Examples:
We could have a business rule that stock symbols must be between 1 and 5 characters in length. With these, we would create equivalence classes looking at strings that are 0, 3, and 7 characters in length.
Based on temperature and pressure, elements and other substances exist in one of three states - solid, liquid, and gas. Therefore, we would define equivalence classes for the input ranges to test each state.
The following table presents the 2022 U.S. Federal Tax Brackets:

| Income | Tax |
| --- | --- |
| <= $10,275 | 10% of the taxable income |
| > $10,275 and <= $41,775 | $1,027.50 plus 12% of the excess over $10,275 |
| > $41,775 and <= $89,075 | $4,807.50 plus 22% of the excess over $41,775 |
| > $89,075 and <= $170,050 | $15,213.50 plus 24% of the excess over $89,075 |
| > $170,050 and <= $215,950 | $34,647.50 plus 32% of the excess over $170,050 |
| > $215,950 and <= $539,900 | $49,335.50 plus 35% of the excess over $215,950 |
| > $539,900 | $162,718 plus 37% of the excess over $539,900 |
We would use seven equivalence classes to create test cases with these tax brackets. We choose a value in the middle of each range to represent that test case.
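A sketch of those test cases, assuming a hypothetical compute_tax() function implementing the table above; each representative income sits roughly in the middle of its bracket, with the expected values hand-computed from the table:

def compute_tax(income):
    """Computes the 2022 U.S. Federal tax for a given taxable income"""
    brackets = [(539900, 0.37, 162718.00), (215950, 0.35, 49335.50),
                (170050, 0.32, 34647.50), (89075, 0.24, 15213.50),
                (41775, 0.22, 4807.50), (10275, 0.12, 1027.50)]
    for floor, rate, base in brackets:   # highest bracket first
        if income > floor:
            return base + rate * (income - floor)
    return income * 0.10

class TestTaxBrackets(unittest.TestCase):
    def test_equivalence_classes(self):
        """One representative income from each of the seven brackets"""
        cases = [(5000, 500.00), (25000, 2794.50), (60000, 8817.00),
                 (120000, 22635.50), (200000, 44231.50),
                 (400000, 113753.00), (600000, 184955.00)]
        for income, expected in cases:
            self.assertAlmostEqual(compute_tax(income), expected, places=2)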
26.4.2. Closed Box Testing: Boundary Value Analysis#
Boundary value analysis (boundary testing) is another closed box testing strategy that complements equivalence classes. Programmers often make mistakes at the boundaries of input ranges. For example, they may use a less than (<) comparison operator rather than a less than or equals (<=) comparison operator. As such, we create test cases that are just below the boundary, on the boundary, and just above the boundary.
For testing the US Federal tax brackets, we would use: $10,274, $10,275, $10,276, $41,774, $41,775, $41,776, $89,074, $89,075, $89,076, $170,049, $170,050, $170,051, $215,949, $215,950, $215,951, $539,899, $539,900, and $539,901.
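Continuing the hypothetical compute_tax() sketch from the previous section, a test method (added to the TestTaxBrackets class) for the first boundary might look like this:

def test_first_bracket_boundary(self):
    """Just below, on, and just above the $10,275 boundary"""
    self.assertAlmostEqual(compute_tax(10274), 1027.40, places=2)  # 10% of 10,274
    self.assertAlmostEqual(compute_tax(10275), 1027.50, places=2)  # 10% of 10,275
    self.assertAlmostEqual(compute_tax(10276), 1027.62, places=2)  # 1,027.50 + 12% of 1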
Whew, no one ever said U.S. Federal Taxes were easy. As a side note, payroll is a surprisingly challenging problem domain. Systems have to deal with various employee types, salaries, and pay mechanisms, but they also have to consider benefits (e.g., 401k) and taxes (country, state, local).
Another example would be looking at the states (solid, liquid, and gas) of matter. At less than 0 degrees Celsius, water becomes solid. Between 0 degrees Celsius and 100 degrees Celsius, water exists as a liquid. Above 100 degrees Celsius, water becomes a gas. So now, we have -1, 0, 1, 99, 100, and 101 as boundary possibilities.
The point where numbers cross from negative to positive (-1, 0, 1) is another good location for boundary testing.
For collections, we would look at whether the collection is empty or contains a single member.
For lists, look at the starting and ending indexes.
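Putting the earlier water example into code - classify_water_state() is a hypothetical helper following the simplified physics above - the boundary test cases become:

def classify_water_state(celsius):
    """Returns the state of water at the given temperature (standard pressure)"""
    if celsius < 0:
        return "solid"
    elif celsius <= 100:
        return "liquid"
    else:
        return "gas"

class TestWaterStates(unittest.TestCase):
    def test_boundaries(self):
        """Just below, on, and just above each transition temperature"""
        cases = [(-1, "solid"), (0, "liquid"), (1, "liquid"),
                 (99, "liquid"), (100, "liquid"), (101, "gas")]
        for temperature, expected in cases:
            self.assertEqual(classify_water_state(temperature), expected)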
26.4.3. Open Box Testing: Coverage#
With open box testing, developers examine the source to create test cases. The immediate advantage is the ability to look at the different conditionals within the code and write specific test cases against those conditionals. More importantly, the primary consideration with open box testing is test coverage - having a set of test cases that execute all possible code. To determine how much of the code executes, we look at several different levels of test coverage:
statement
decision (branch)
paths
To help define and understand these coverage concepts, consider the following function calculate_price():
1def calculate_price(stock_price, num_items, discount):
2 quantity_discount = 0.0
3 result = 0.0
4
5    if num_items >= 5:
6        quantity_discount = 10
7    elif num_items >= 10: #ordering mistake deliberately placed in code
8        quantity_discount = 15
9    else:
10        quantity_discount = 0
11
12    if discount > quantity_discount:
13        quantity_discount = discount
14
15    result = stock_price/100.0 * (100-quantity_discount) * num_items
16
17    return result
From this code, we can create a control-flow graph, which is a graphical representation of the paths that execution may take through a program. Each node represents a block - a piece of code - that executes as a single unit. The edges represent decisions (or jumps with iteration) in the control flow. The primary difference between a control-flow graph and a flowchart is that the decision node is combined with the predecessor node (if there is only one) as both execute as a unit.
For full statement coverage, each node must execute. The function does not use the parameter stock_price in any conditionals, so we can keep that value constant. We need to use three different values (0, 5, 10) of num_items to execute the first set of conditionals. Then, we need at least one of the test cases (or a new one) where the discount value is higher than the computed quantity discount. When we test num_items = 10, we will set discount = 20. (A sketch of these test cases appears after the list below.)
However, after walking through the test case of num_items = 10, we realize that the conditional in node “2” never evaluates to True. After executing the three test cases, we can see through coverage analysis that node “4” never executes. Hence, we can see that a logic error exists. The statement coverage metric is 87.5% (7/8).
For num_items = 0 and discount = 0, we execute nodes 1, 2, 5, 6, 8
For num_items = 5 and discount = 0, we execute nodes 1, 3, 6, 8
For num_items = 10 and discount = 20, we execute nodes 1, 3, 6, 7, 8
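A sketch of those three statement-coverage test cases, holding stock_price constant at 100; the expected values follow the intended discount rules. Notice that all three assertions pass even with the deliberate ordering mistake - only the coverage analysis reveals that the 15% branch never runs:

class TestCalculatePrice(unittest.TestCase):
    def test_no_discount(self):
        self.assertAlmostEqual(calculate_price(100, 0, 0), 0.0)      # no items, no discount
    def test_quantity_discount(self):
        self.assertAlmostEqual(calculate_price(100, 5, 0), 450.0)    # 10% quantity discount
    def test_explicit_discount(self):
        self.assertAlmostEqual(calculate_price(100, 10, 20), 800.0)  # 20% discount dominates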
Decision coverage requires executing all of the possible outcomes of decisions within the program. Using the control-flow graph, this requires traversing each edge in the graph. Using the same set of test cases:
For num_items = 0 and discount = 0, we follow edges B, D, G, I
For num_items = 5 and discount = 0, we follow edges A, E, I
For num_items = 10 and discount = 20, we follow edges A, E, H, J
As before, no possibility exists to follow edges C and F. The branch coverage metric is 80% (8/10).
Path coverage requires the test cases to follow all possible paths through the control flow graph.
For paths, we have six possibilities.
A E H J
A E I
B C F H J
B C F I
B D G H J
B D G I
One way to compute the total number of paths is to examine the “independent” conditionals within the program. In this example, our first conditional had three possibilities, while the second conditional had two: \(3 \times 2 = 6\).
The number of paths can grow very large. For example, the U.S. Federal Tax Brackets have seven initial possibilities, assuming we look only at the income. Four independent if statements within a function create \(2^4 = 16\) possibilities - exponential growth (YIKES!). As you can see, getting to 100% path coverage can become overwhelming with nontrivial code.
Also, realize that these coverage metrics can be misleading. One test case alone (num_items = 10 and discount = 20) executes 5 of 8 nodes (statements) and 4 of 10 edges (decisions/branches). Assuming we corrected the code to order the conditionals correctly in the first check:
if num_items >= 10:
    quantity_discount = 15
elif num_items >= 5:
    quantity_discount = 10
else:
    quantity_discount = 0
Our three test cases would then have 100% statement and 100% decision coverage. Even if we add another three test cases to get to 100% path coverage, it may still be possible to have issues with the code. What happens when the discount is greater than or equal to 100? Are we giving away products as well as handing out money?
Additionally, coverage tools only evaluate whether or not code executes - they do not evaluate whether a test case is beneficial. As a result, you can have test cases that execute quite a bit of code but do not perform worthwhile tests or have meaningful assertions. Combining closed-box and open-box test strategies helps overcome such gaps in test cases.
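Coverage measurement itself is usually automated. For example, the third-party coverage.py package (one popular option; not part of the standard library) can run a unittest suite and then report statement and branch coverage:

python -m pip install coverage
python -m coverage run --branch -m unittest discover
python -m coverage report -m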
26.5. Testing Strategies#
Many different strategies exist for developing test cases. Paul Gries, Jennifer Campbell, and Jason Montojo presented these strategies:
Think about size. When a test involves a collection such as a list, string, dictionary, or file, you need to do the following:
Test the empty collection.
Test a collection with one item in it.
Test a general case with several items.
Test the smallest interesting case, such as sorting a list containing two values.
Think about dichotomies. A dichotomy is a contrast between two things. Examples of dichotomies are empty/full, even/odd, positive/negative, and alphabetic/nonalphabetic. If a function deals with two or more different categories or situations, make sure you test all of them.
Think about boundaries. If a function behaves differently around a particular boundary or threshold, test exactly that boundary case.
Think about order. If a function behaves differently when values appear in different orders, identify those orders and test each one of them. For the sorting example mentioned earlier, you’ll want one test case where the items are in order and one where they are not.
Source: Paul Gries, Jennifer Campbell, and Jason Montojo. 2017. Practical Programming: An Introduction to Computer Science Using Python 3.6 (3rd. ed.). Pragmatic Bookshelf.
Andrew Hunt and David Thomas created an acronym: CORRECT
Conformance: Does the value conform to an expected format and type?
Ordering: Is the set of values ordered or unordered as appropriate?
Range: Is the value within reasonable minimum and maximum values?
Reference: Does the code reference anything external that is not under the direct control of the code itself?
Existence: Does the value exist (is it non-null, nonzero, present in a set, and so on)?
Cardinality: Are there exactly enough values?
Time (absolute and relative): Is everything happening in order? At the right time? In time?
Source: Andrew Hunt and David Thomas. 2003. Pragmatic Unit Testing in Java with JUnit. The Pragmatic Programmers.
Some additional guidelines to consider:
Choose inputs to force the system to generate all error messages
Choose inputs to force the system to raise exceptions
Test both the “happy” paths as well as the negative paths. The “happy” path is the program’s execution assuming no errors occur.
Execute APIs with different orderings
Look for inputs that may cause overflow issues. Use extremely large or small values.
Look for “off by one” situations, which often occur with lists and boundary values.
For data, what happens if you have too many characters or fields on a line? too few characters or fields? not enough lines (records) in the file? too many lines (records) in the file?
For sequences/lists of data items, what happens when those items cross boundaries (e.g., 0 or another value identified in boundary value analysis)?
26.6. How Much Testing?#
Writing a comprehensive test suite is a time-consuming effort. Unfortunately, how much testing is required is project dependent. Much depends upon the confidence needed that the code is correct. Little testing may be necessary for throw-away code used once or to evaluate a particular design choice. For code used in mission (business) critical systems, much more confidence is needed that the code is correct; therefore, you need to build more test cases. Ideally, you should aim for 100% decision coverage. Testing is an investment that will save you time and resources. You should be able to find issues sooner and confidently make future code changes without breaking existing functionality.
Very little exists in the literature (academic papers and books) for guidelines. Fred Brooks, in his seminal book, The Mythical Man-Month: Essays on Software Engineering, presented this schedule breakdown:
1/3 design
1/6 coding
1/2 testing (broken down evenly between unit testing and system testing)
Source: F. P. Brooks, Jr., The Mythical Man-Month: Essays on Software Engineering, Anniversary Edition, Boston: Addison-Wesley, 1995.
Final thoughts:
Test early: start testing as soon as parts are implemented
Test often: running tests at every reasonable opportunity
26.7. Other Testing Frameworks#
unittest is not the only test framework available for Python, but it is the one included in Python’s standard library. The framework also matches up closely with the widely used JUnit framework for Java.
Other frameworks to examine:
Pytest: https://docs.pytest.org/
Hypothesis: https://hypothesis.readthedocs.io/
26.8. Other Testing Approaches#
You should also be familiar with regression testing and reviews.
26.8.1. Regression Testing#
Regression Testing is the re-execution of a set of tests to ensure that some change has not created an unintended side effect by breaking other parts of the system. Generally, regression testing is performed on a system after any type of change, prior to deployment or distribution. These changes include, but are not limited to, fixing defects, adding new functionality, and changing configuration. Ideally, we want to perform these regression tests automatically, using testing frameworks as discussed in this notebook. Quite frequently, project teams will establish regression tests to execute at specific times, such as
after every successful compile (may be problematic for large systems due to execution time)
as part of the code check-in process to a code management system.
on a regular schedule (nightly/weekly)
In certain circumstances, manual verification may be necessary. However, as with testing in general, manual regression testing is strongly discouraged due to the repeated effort involved.
26.8.2. Reviews#
Reviews are a prevalent quality assurance technique performed at any point in the software development process. For instance, during the requirements and design phases, team meetings may be held to walk through the proposed documents to examine them for both completeness and correctness.
Developers and project teams widely utilize code reviews to improve software quality. By having another developer examine the source directly, that developer may be able to find issues missed by the original developer or suggest alternative improvements. In addition, an ancillary benefit exists in that reviewers become familiar with the changes, increasing their knowledge of the system. Reviewers can also act as mentors to the developers whose code is under review. Several different ways exist to perform code reviews:
Informal
Inspection Meeting
Change-based
Tool-supported
For informal reviews, developers collaborate through email or in person to walk through the code. Inspection meetings are much more formal. In these cases, a developer sends out a notice with the proposed changes, and the reviewers then inspect those changes. A meeting then occurs in which the developer walks through the code and the reviewers provide commentary. In change-based reviews, performed as part of the process of updating code in a code management system (e.g., Git), other developers review the changes and provide commentary in the change (pull) request or issue log. Reviewers can also utilize tools to support their reviews. These tools perform static analysis of the code to find potential problems (e.g., an input value used before any validation) or check the source code to ensure the code follows appropriate style guidelines.
26.9. Case Study: Revisiting the Mortgage Affordability#
In this section, we revisit the Mortgage Affordability Calculation from the Functions notebook. Our goal here is two-fold:
Making the code more robust through error handling and input validation
Thoroughly unit testing the code
Below is the code updated to include error handling and input validation. The code has been refactored (improved) to make testing easier. The input validation has been separated from receiving input from the user. This separation makes testing the validation feasible and also allows the validation logic to be reused elsewhere. We have also encapsulated the processing code to return the output rather than directly printing the output. Again, this makes this portion easier to test. You will also notice that we have not tested input() and print() - as these are built-in Python functions, we assume that they work appropriately. Such assumptions may not hold true universally. In later notebooks, we will present the concept of “mocking”, which allows validating results from built-in functions and externally developed code. You should also pay attention to how exceptions were tested and handled.
1def input_annual_gross_income():
2 while True:
3 try:
4 user_value = input("Enter your annual gross income:")
5 return validate_positive_integer(user_value)
6 except ValueError as ve:
7 print(ve)
8
9
10def validate_positive_integer(test_value):
11 try:
12 result = int(test_value)
13 if result > 0:
14 return result
15 else:
16 raise ValueError() # we replace this value error with the one raised in the except clause
17 except ValueError:
18 raise ValueError("You must enter an integer greater than zero. Do not use any commas or other separating characters.")
19
20def input_credit_score():
21 while True:
22 try:
23 user_value = input("Enter your credit score (0-850):")
24 return validate_integer_range(user_value,0,850)
25 except ValueError as ve:
26 print(ve)
27
28
29def validate_integer_range(test_value,start_value,end_value):
30 if end_value < start_value:
31 raise RuntimeError("range specified in wrong order")
32 try:
33 result = int(test_value)
34 if start_value <= result <= end_value:
35 return result
36 else:
37 raise ValueError() # replaced in except clause, which catches this one.
38 except ValueError:
39 raise ValueError("You must enter an integer between {:d} and {:d}. " \
40 "Do not use any commas or other separating characters.".format(start_value,end_value))
41
42
43def input_down_payment():
44 while True:
45 try:
46 user_value = input("Enter the amount of money you have available for a down payment:")
47 return validate_positive_integer(user_value)
48 except ValueError as ve:
49 print(ve)
50
51
52def is_eligible_for_loan(credit_score):
53 return credit_score >= 500
54
55def compute_annual_percentage_rate(credit_score, base_apr):
56 # should check that the credit score is between 500 and 850
57 if credit_score >= 700:
58 return (850 - credit_score) * 0.0001 + base_apr
59 elif credit_score >= 600:
60 return (850 - credit_score) * 0.0002 + base_apr
61 else:
62 return (850 - credit_score) * 0.0003 + base_apr
63
64def compute_max_payment(annual_gross_income):
65 result = annual_gross_income * .28 / 12 # max monthly payment is 28% of income
66    result = result - 1000/12 # subtract homeowners insurance
67    result = result - 5000/12 # subtract property tax
68 return result
69
70def compute_principal(payment, terms_per_year, annual_interest_rate, years):
71 result = payment * (1- (1 + annual_interest_rate/terms_per_year)**(-1 * years * terms_per_year))/ ( annual_interest_rate/terms_per_year)
72 return result
73
74def compute_max_home_price(principal, down_payment):
75 return principal + down_payment
76
77def create_message_max_home_purchase(amount,apr):
78 return "Congratulations, you can afford a house worth ${:,.2f} with a {:.2f}% loan.".format(amount,apr*100)
1def main():
2 agi = input_annual_gross_income()
3 credit_score = input_credit_score()
4 down_payment = input_down_payment()
5 output = process(agi,credit_score,down_payment)
6 print(output)
7
8def process(agi,credit_score,down_payment):
9 base_apr = 0.03
10
11 if not is_eligible_for_loan(credit_score):
12        return "Your credit score is too low to qualify for a mortgage."
13 apr = compute_annual_percentage_rate(credit_score, base_apr)
14 max_payment = compute_max_payment(agi)
15 if (max_payment <= 0):
16 if agi < 6000:
17 return "You do not make enough to pay homeowners insurance and property taxes."
18 else:
19 return "You do not make enough money to qualify for a loan."
20 loan_principal = compute_principal(max_payment,12, apr, 30)
21 max_home_price = compute_max_home_price(loan_principal, down_payment)
22 return create_message_max_home_purchase(max_home_price,apr)
1class TestAffordabilityInputs(unittest.TestCase):
2 def test_validate_positive_integer(self):
3 error_message = "You must enter an integer greater than zero. Do not use any commas or other separating characters."
4
5 with self.assertRaises(Exception) as context:
6 validate_positive_integer("asd")
7 self.assertTrue(type(context.exception) == ValueError,"Wrong exception type raised")
8 self.assertEqual(error_message,context.exception.args[0])
9
10 with self.assertRaises(Exception) as context:
11 validate_positive_integer("0")
12 self.assertTrue(type(context.exception) == ValueError,"Wrong exception type raised")
13 self.assertEqual(error_message,context.exception.args[0])
14
15 self.assertEqual(validate_positive_integer(1),1)
16 self.assertEqual(validate_positive_integer("100000"),100000)
17
18 def test_validate_integer_range(self):
19 error_message = "You must enter an integer between 0 and 850. Do not use any commas or other separating characters."
20
21 with self.assertRaises(Exception) as context:
22 validate_integer_range("asd",0,850)
23 self.assertTrue(type(context.exception) == ValueError,"Wrong exception type raised")
24 self.assertEqual(error_message,context.exception.args[0])
25
26 with self.assertRaises(Exception) as context:
27 validate_integer_range("-1",0,850)
28 self.assertTrue(type(context.exception) == ValueError,"Wrong exception type raised")
29 self.assertEqual(error_message,context.exception.args[0])
30
31 with self.assertRaises(Exception) as context:
32 validate_integer_range("851",0,850)
33 self.assertTrue(type(context.exception) == ValueError,"Wrong exception type raised")
34 self.assertEqual(error_message,context.exception.args[0])
35
36 with self.assertRaises(Exception) as context:
37 validate_integer_range("425",850,0)
38 self.assertTrue(type(context.exception) == RuntimeError,"Wrong exception type raised")
39 self.assertEqual("range specified in wrong order",context.exception.args[0])
40
41 self.assertEqual(validate_integer_range(0,0,850),0)
42 self.assertEqual(validate_integer_range(1,0,850),1)
43 self.assertEqual(validate_integer_range(425,0,850),425)
44 self.assertEqual(validate_integer_range(850,0,850),850)
45
1class TestAffordabilityLogic(unittest.TestCase):
2 def test_is_eligible_for_loan(self):
3 self.assertEqual(is_eligible_for_loan(499),False)
4 self.assertEqual(is_eligible_for_loan(500),True)
5 self.assertEqual(is_eligible_for_loan(501),True)
6
7 def test_compute_annual_percentage_rate(self):
8 base_rate = 0.03
9 self.assertEqual(compute_annual_percentage_rate(850,base_rate),base_rate)
10 self.assertEqual(compute_annual_percentage_rate(700,base_rate),base_rate+0.0150)
11 self.assertEqual(compute_annual_percentage_rate(699,base_rate),base_rate+0.0302)
12 self.assertEqual(compute_annual_percentage_rate(600,base_rate),base_rate+0.0500)
13 self.assertAlmostEqual(compute_annual_percentage_rate(599,base_rate),base_rate+0.0753,4)
14 self.assertAlmostEqual(compute_annual_percentage_rate(500,base_rate),base_rate+0.105,4)
15
16 def test_compute_max_payment(self):
17 self.assertEqual(compute_max_payment(0),-500)
18 self.assertAlmostEqual(compute_max_payment(21428),-0.013,3)
19 self.assertAlmostEqual(compute_max_payment(21429),0.01,2)
20 self.assertAlmostEqual(compute_max_payment(100000),1833.33,2)
21
22 def test_compute_principal(self):
23        self.assertAlmostEqual(compute_principal(1000,12,0.03,30),237189.38,2) # PV calculated manually in MS Excel
24
25 def test_max_home_price(self):
26 self.assertEqual(compute_max_home_price(1000,1000),2000)
27
28 def test_create_message_max(self):
29 message = "Congratulations, you can afford a house worth $461,828.79 with a 4.50% loan."
30 self.assertEqual(create_message_max_home_purchase(461828.794444,0.045),message)
1class TestAffordabilityProcess(unittest.TestCase):
2 def test_credit_score_ineligible(self):
3 self.assertEqual(process(100000,499,100000),"Your credit score is too low to qualify for a mortgage.")
4    def test_income_too_low_for_ins_tax(self):
5 self.assertEqual(process(5999,700,100000),"You do not make enough to pay homeowners insurance and property taxes.")
6    def test_income_too_low_for_loan(self):
7 self.assertEqual(process(6000,700,10000),"You do not make enough money to qualify for a loan.")
8 def test_process_good(self):
9 self.assertEqual(process(100000,700,100000),"Congratulations, you can afford a house worth $461,828.79 with a 4.50% loan.")
1unittest.main(argv=['unittest','TestAffordabilityInputs','TestAffordabilityLogic','TestAffordabilityProcess'],
2 verbosity=2, exit=False)
test_validate_integer_range (__main__.TestAffordabilityInputs.test_validate_integer_range) ...
ok
test_validate_positive_integer (__main__.TestAffordabilityInputs.test_validate_positive_integer) ...
ok
test_compute_annual_percentage_rate (__main__.TestAffordabilityLogic.test_compute_annual_percentage_rate) ...
ok
test_compute_max_payment (__main__.TestAffordabilityLogic.test_compute_max_payment) ...
ok
test_compute_principal (__main__.TestAffordabilityLogic.test_compute_principal) ...
ok
test_create_message_max (__main__.TestAffordabilityLogic.test_create_message_max) ...
ok
test_is_eligible_for_loan (__main__.TestAffordabilityLogic.test_is_eligible_for_loan) ...
ok
test_max_home_price (__main__.TestAffordabilityLogic.test_max_home_price) ...
ok
test_credit_score_ineligible (__main__.TestAffordabilityProcess.test_credit_score_ineligible) ...
ok
test_income_too_low_for_ins_tax (__main__.TestAffordabilityProcess.test_income_too_low_for_ins_tax) ...
ok
test_income_too_low_for_loan (__main__.TestAffordabilityProcess.test_income_too_low_for_loan) ...
ok
test_process_good (__main__.TestAffordabilityProcess.test_process_good) ...
ok
----------------------------------------------------------------------
Ran 12 tests in 0.003s
OK
<unittest.main.TestProgram at 0x10c1250a0>
26.10. Suggested LLM Prompts#
Explain unit testing. Provide an example using Python’s unittest module that validates a function that computes the months needed to pay off a given loan amount at a corresponding annual percentage rate.
What are the different types of testing? When does each of them fit into the software process? Provide relevant examples.
Explain the differences between whitebox and blackbox testing. Provide examples in Python from the financial domain. (Note: We use these terms due to their prevalence in the literature, which should lead the LLM to produce a better result than using open and closed box testing.)
What test frameworks exist for Python? What are their advantages and disadvantages?
Write a tutorial on Python’s unittest module. Use accessing stock prices from an external service and calculating average daily returns as an example.
Produce 10 strategies or guidelines to create effective test cases. Provide examples in Python.
What are the key benefits of automated testing over manual testing? Discuss the trade-offs and challenges associated with implementing automated testing.
Explain the concept of equivalence partitioning in software testing. How does it help in designing effective test cases?
What are boundary conditions, and why are they important in software testing? Provide examples of boundary conditions for different types of input data. How do boundary conditions and equivalence partitions relate to each other?
Discuss the different types of test coverage metrics (e.g., statement coverage, branch coverage, path coverage) and their relative strengths and weaknesses. Provide a financial example in Python.
What is regression testing, and why is it important? Discuss techniques for effective regression testing in continuous integration and delivery environments.
Explain the role of static analysis tools in software testing. How can they help identify potential issues before runtime? Provide some example open-source tools available for Python.
Explain the differences between functional testing and non-functional testing. Provide examples of non-functional testing types (e.g., performance, security, usability).
Explain the importance of testing edge cases and corner cases in software testing. Provide examples of edge cases for different types of applications.
Discuss the role of code reviews in software testing. How can code reviews complement automated testing and help identify potential issues?
26.11. Review Questions#
What is the goal of testing?
Summarize the primary structure of test cases.
What are assertions? How do we utilize assertions in test cases? What does an assert statement do?
How do equivalence partitions and boundary value analysis help us create effective test cases?
How do reviews lead to high-quality software?
Why is coverage important in testing? What weaknesses does coverage have?
How does the number of paths through a control flow graph grow?
Why is it important for tests to be repeatable and automated?
Assume you have a function that takes a list of numbers and returns the sum of all even numbers in the list. What test cases should be created for this function?
Assume you have a function that takes a temperature in Celsius and returns the corresponding temperature in Fahrenheit. What test cases should be created for this function?
You have a function that checks the number of assets in an investment portfolio. The maximum number of assets allowed is twenty. What test cases should be created for this function?
You have a function that takes an applicant’s credit score and determines if they are eligible for a loan (credit score >= 650). The maximum credit score is 850. What test cases should be created for this function?
You have a function that takes a credit card number as input and returns True if it is a valid number, False otherwise. What test cases should be created for this function?
You have a function that takes an amount in one currency and converts it to another currency based on the current exchange rate (retrieved from an external source). What test cases should be created for this function?
26.12. Exercises#
For the following problem, determine the equivalence classes and provide a representative value: For orders below $100,000, no discount is available. For orders up to $200,000, a discount of 10% is available. For orders up to $350,000, a discount of 15% is available. For orders over $350,000, a 20% discount is offered.
For the values in the previous problem, what order values are needed to test the boundary conditions?
What are the boundary values to test for the requirement that an individual’s age must be between 18 and 65?
Write the function compute_price(order_total) and create a unittest class for both the boundary conditions and equivalence classes.
class for both the boundary conditions and equivalence classes.Write a function
strip_digits(str)
that removes any digits from the string parameter and returns the result. Write aunittest
class for the function. You should test the following conditions:empty string
string with no digits present
string with only digits present
string with a mixture of digits and other characters.
Given the following function, median(l), which returns the median value for the items in a list, create a unittest class that has 100% statement coverage and 100% decision coverage. How many paths are possible?
1def median(l):
2 """Finds the median value of the list"""
3 if l:
4 result = 0
5 s_list = sorted(l)
6 if len(s_list) % 2 == 1:
7 result = s_list[len(s_list)//2]
8 else:
9 print(len(s_list)//2)
10 result = (s_list[len(s_list)//2 - 1] + s_list[len(s_list)//2])/2
11 return result
12 else:
13 raise ValueError("list empty")
26.13. References#
[1] P. Bourque and R.E. Fairley, eds., Guide to the Software Engineering Body of Knowledge, Version 3.0, 2014. IEEE Computer Society, www.swebok.org