Atilla Gunal and Rahul Patil, Microsoft
The non-linear interaction of many software components makes quality assurance a hard problem even for traditional serial code. Concurrency and interaction among multiple threads add a temporal dimension to software complexity. This extra dimension introduces unique bug types such as deadlocks, livelocks, and race conditions.
In this paper, we describe how a solid stress framework, complete with integrated structured randomization and methodical meddling with temporal properties, makes practical software quality assurance possible. Specifically, we discuss the methods and practices applied to provide solid assurance for a critical commercial component: the native Concurrency Runtime stack from Microsoft. First, we describe how applying random distributions in individual tests, and integrating those tests via a statistically fair scheduler, lets us cope with traversing the seemingly infinite interaction patterns. Second, we show how such testing helps identify hangs stemming from deadlocks and livelocks. Third, we discuss methodically injecting randomization into the temporal properties of the software system, and how that gives a reasonable probabilistic expectation of finding bugs. We conclude with a brief survey of the effectiveness of our stochastic stress framework compared with other tools.
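As a rough illustration of the three ideas above, the following Python sketch combines a seeded uniform-random test picker, random delays injected into worker threads, and a timeout-based watchdog that flags suspected hangs. It is not the framework described in the paper, which targets the native Concurrency Runtime. Every name here (`jitter`, `run_with_watchdog`, `stress`, and the two sample tests) is hypothetical, and uniform selection is only an assumed stand-in for the statistically fair scheduler.

```python
import random
import threading
import time
from collections import Counter


def jitter(rng, max_delay=0.001):
    """Sleep for a random duration to perturb thread interleavings."""
    time.sleep(rng.uniform(0.0, max_delay))


def run_with_watchdog(test, seed, timeout):
    """Run one test in a daemon thread; return False if it is still running at timeout."""
    worker = threading.Thread(target=test, args=(random.Random(seed),), daemon=True)
    worker.start()
    worker.join(timeout)
    return not worker.is_alive()


def stress(tests, iterations, seed, timeout=0.3):
    """Pick tests uniformly at random (assumed notion of 'fair') and record suspected hangs."""
    rng = random.Random(seed)  # a fixed seed makes the test sequence reproducible
    names = sorted(tests)
    runs, hangs = Counter(), []
    for _ in range(iterations):
        name = rng.choice(names)
        runs[name] += 1
        # This sketch only detects hangs; it does not collect assertion failures.
        if not run_with_watchdog(tests[name], rng.getrandbits(32), timeout):
            hangs.append(name)
    return runs, hangs


def counter_test(rng):
    """Hypothetical well-behaved test: four threads increment a lock-protected counter."""
    lock, total = threading.Lock(), [0]

    def work(seed):
        local = random.Random(seed)  # per-thread generator, so no shared RNG state
        for _ in range(3):
            jitter(local)
            with lock:
                total[0] += 1

    threads = [threading.Thread(target=work, args=(rng.getrandbits(32),)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def self_deadlock_test(rng):
    """Hypothetical faulty test: acquires a non-reentrant lock twice and never returns."""
    lock = threading.Lock()
    lock.acquire()
    lock.acquire()


TESTS = {"counter": counter_test, "self_deadlock": self_deadlock_test}
```

Calling `stress(TESTS, iterations=6, seed=42)` returns a per-test run count and the list of tests the watchdog flagged. Because both the selection and the delays derive from one seed, a flagged run can be replayed with the same test sequence, although the actual thread interleaving still depends on the operating system scheduler.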
2010 Technical Paper, Atilla Gunal and Rahul Patil, Abstract, Paper, Slides