Software Testing Veterans Emily B and Kent Beck Clarify Approval Tests: A Deep Dive into a Niche but Powerful Technique

In a recent discussion on the Modern Software Engineering channel, Emily B engaged Kent Beck in a conversation to demystify ‘approval testing’ and differentiate it from the more commonly understood ‘acceptance tests.’ While acceptance testing is a broad, customer-perspective validation technique that is widely practiced, approval testing is a distinct, specialized methodology. Originating perhaps as ‘text-based testing’ and later formalized around 2012, approval testing deviates from traditional assertion-based tests by replacing the ‘assert’ step with ‘print diff.’ A test generates an output (e.g., text, PDF, UI strings, screenshots) and compares it against a previously approved baseline, rather than making a binary pass/fail assertion. Kent Beck recounted his own 25-year history with similar ‘diffing’ techniques across varied outputs, including life insurance annual reports and user interface trees, albeit without the specific ‘approval testing’ nomenclature.

Key characteristics of approval testing include its ‘three-state output’—a change can be a definite bug, an irrelevant diff to be filtered, or a desired change to be approved—in contrast with the binary red/green results of assertion-based tests. Advocates highlight its superior diagnosability, since diffs provide rich context for failure analysis, and the ease of updating tests when desired behavior changes: one simply ‘approves’ the new baseline.

Challenges persist, however, notably the long-term maintenance cost of reviewing numerous diffs, the risk of ‘approving all’ changes under pressure and thereby missing latent bugs, and the need for robust tooling to group failures, filter irrelevant changes, and flag suspicious outputs (e.g., stack traces in approved results). Both Emily B and Kent Beck acknowledged the potential for generative AI to assist in interpreting test failures, particularly with text-based diffs, and encouraged experimentation in this nascent field. The discussion also clarified that approval testing differs from ‘golden master’ and ‘snapshot testing’ by emphasizing active approval of, and care for, the baseline, rather than implying immutable ‘golden’ results or ephemeral ‘snapshots’.
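Filtering irrelevant diffs is commonly handled by normalizing volatile details before the comparison, so that only meaningful changes surface as failures. The scrubber below is a hypothetical sketch, assuming ISO-style timestamps are the irrelevant detail to mask:

```python
import re

# Matches ISO-style timestamps such as 2024-01-02T03:04:05.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")

def scrub(text: str) -> str:
    """Replace volatile timestamps with a stable token before diffing,
    so a rerun at a different time does not produce an irrelevant diff."""
    return TIMESTAMP.sub("<TIMESTAMP>", text)
```

Running `scrub` over both the received output and the baseline before diffing collapses the ‘irrelevant diff’ state into a clean pass, leaving only genuine changes for a human to classify as bug or approval.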