multithreading - write - unit testing multithreaded code c++

How should I unit test threaded code? (17)

I have thus far avoided the nightmare that is testing multi-threaded code since it just seems like too much of a minefield. I'd like to ask how people have gone about testing code that relies on threads for successful execution, or just how people have gone about testing those kinds of issues that only show up when two threads interact in a given manner?

This seems like a really key problem for programmers today, it would be useful to pool our knowledge on this one imho.

(if possible) don't use threads, use actors / active objects. Easy to test.

Another way to (kinda) test threaded code, and very complex systems in general is through Fuzz Testing . It's not great, and it won't find everything, but its likely to be useful and its simple to do.


Fuzz testing or fuzzing is a software testing technique that provides random data("fuzz") to the inputs of a program. If the program fails (for example, by crashing, or by failing built-in code assertions), the defects can be noted. The great advantage of fuzz testing is that the test design is extremely simple, and free of preconceptions about system behavior.


Fuzz testing is often used in large software development projects that employ black box testing. These projects usually have a budget to develop test tools, and fuzz testing is one of the techniques which offers a high benefit to cost ratio.


However, fuzz testing is not a substitute for exhaustive testing or formal methods: it can only provide a random sample of the system's behavior, and in many cases passing a fuzz test may only demonstrate that a piece of software handles exceptions without crashing, rather than behaving correctly. Thus, fuzz testing can only be regarded as a bug-finding tool rather than an assurance of quality.

Concurrency is a complex interplay between the memory model, hardware, caches and our code. In the case of Java at least such tests have been partly addressed mainly by jcstress . The creators of that library are known to be authors of many JVM, GC and Java concurrency features.

But even this library needs good knowledge of the Java Memory Model specification so that we know exactly what we are testing. But I think the focus of this effort is mircobenchmarks. Not huge business applications.

For J2E code, I've used SilkPerformer, LoadRunner and JMeter for concurrency testing of threads. They all do the same thing. Basically, they give you a relatively simple interface for administrating their version of the proxy server, required, in order to analyze the TCP/IP data stream, and simulate multiple users making simultaneous requests to your app server. The proxy server can give you the ability to do things like analyze the requests made, by presenting the whole page and URL sent to the server, as well as the response from the server, after processing the request.

You can find some bugs in insecure http mode, where you can at least analyze the form data that is being sent, and systematically alter that for each user. But the true tests are when you run in https (Secured Socket Layers). Then, you also have to contend with systematically altering the session and cookie data, which can be a little more convoluted.

The best bug I ever found, while testing concurrency, was when I discovered that the developer had relied upon Java garbage collection to close the connection request that was established at login, to the LDAP server, when logging in. This resulted in users being exposed to other users' sessions and very confusing results, when trying to analyze what happened when the server was brought to it's knees, barely able to complete one transaction, every few seconds.

In the end, you or someone will probably have to buckle down and analyze the code for blunders like the one I just mentioned. And an open discussion across departments, like the one that occurred, when we unfolded the problem described above, are most useful. But these tools are the best solution to testing multi-threaded code. JMeter is open source. SilkPerformer and LoadRunner are proprietary. If you really want to know whether your app is thread safe, that's how the big boys do it. I've done this for very large companies professionally, so I'm not guessing. I'm speaking from personal experience.

A word of caution: it does take some time to understand these tools. It will not be a matter of simply installing the software and firing up the GUI, unless you've already had some exposure to multi-threaded programming. I've tried to identify the 3 critical categories of areas to understand (forms, session and cookie data), with the hope that at least starting with understanding these topics will help you focus on quick results, as opposed to having to read through the entire documentation.

Have a look at my related answer at

Designing a Test class for a custom Barrier

It's biased towards Java but has a reasonable summary of the options.

In summary though (IMO) its not the use of some fancy framework that will ensure correctness but how you go about designing you multithreaded code. Splitting the concerns (concurrency and functionality) goes a huge way towards raising confidence. Growing Object Orientated Software Guided By Tests explains some options better than I can.

Static analysis and formal methods (see, Concurrency: State Models and Java Programs ) is an option but I've found them to be of limited use in commercial development.

Don't forget that any load/soak style tests are rarely guaranteed to highlight problems.

Good luck!

I also had serious problems testing multi- threaded code. Then I found a really cool solution in "xUnit Test Patterns" by Gerard Meszaros. The pattern he describes is called Humble object .

Basically it describes how you can extract the logic into a separate, easy-to-test component that is decoupled from its environment. After you tested this logic, you can test the complicated behaviour (multi- threading, asynchronous execution, etc...)

I have faced this issue several times in recent years when writing thread handling code for several projects. I'm providing a late answer because most of the other answers, while providing alternatives, do not actually answer the question about testing. My answer is addressed to the cases where there is no alternative to multithreaded code; I do cover code design issues for completeness, but also discuss unit testing.

Writing testable multithreaded code

The first thing to do is to separate your production thread handling code from all the code that does actual data processing. That way, the data processing can be tested as singly threaded code, and the only thing the multithreaded code does is to coordinate threads.

The second thing to remember is that bugs in multithreaded code are probabilistic; the bugs that manifest themselves least frequently are the bugs that will sneak through into production, will be difficult to reproduce even in production, and will thus cause the biggest problems. For this reason, the standard coding approach of writing the code quickly and then debugging it until it works is a bad idea for multithreaded code; it will result in code where the easy bugs are fixed and the dangerous bugs are still there.

Instead, when writing multithreaded code, you must write the code with the attitude that you are going to avoid writing the bugs in the first place. If you have properly removed the data processing code, the thread handling code should be small enough - preferably a few lines, at worst a few dozen lines - that you have a chance of writing it without writing a bug, and certainly without writing many bugs, if you understand threading, take your time, and are careful.

Writing unit tests for multithreaded code

Once the multithreaded code is written as carefully as possible, it is still worthwhile writing tests for that code. The primary purpose of the tests is not so much to test for highly timing dependent race condition bugs - it's impossible to test for such race conditions repeatably - but rather to test that your locking strategy for preventing such bugs allows for multiple threads to interact as intended.

To properly test correct locking behavior, a test must start multiple threads. To make the test repeatable, we want the interactions between the threads to happen in a predictable order. We don't want to externally synchronize the threads in the test, because that will mask bugs that could happen in production where the threads are not externally synchronized. That leaves the use of timing delays for thread synchronization, which is the technique that I have used successfully whenever I've had to write tests of multithreaded code.

If the delays are too short, then the test becomes fragile, because minor timing differences - say between different machines on which the tests may be run - may cause the timing to be off and the test to fail. What I've typically done is start with delays that cause test failures, increase the delays so that the test passes reliably on my development machine, and then double the delays beyond that so the test has a good chance of passing on other machines. This does mean that the test will take a macroscopic amount of time, though in my experience, careful test design can limit that time to no more than a dozen seconds. Since you shouldn't have very many places requiring thread coordination code in your application, that should be acceptable for your test suite.

Finally, keep track of the number of bugs caught by your test. If your test has 80% code coverage, it can be expected to catch about 80% of your bugs. If your test is well designed but finds no bugs, there's a reasonable chance that you don't have additional bugs that will only show up in production. If the test catches one or two bugs, you might still get lucky. Beyond that, and you may want to consider a careful review of or even a complete rewrite of your thread handling code, since it is likely that code still contains hidden bugs that will be very difficult to find until the code is in production, and very difficult to fix then.

I have had the unfortunate task of testing threaded code and they are definitely the hardest tests I have ever written.

When writing my tests, I used a combination of delegates and events. Basically it is all about using PropertyNotifyChanged events with a WaitCallback or some kind of ConditionalWaiter that polls.

I am not sure if this was the best approach, but it has worked out for me.

I like to write two or more test methods to execute on parallel threads, and each of them make calls into the object under test. I've been using Sleep() calls to coordinate the order of the calls from the different threads, but that's not really reliable. It's also a lot slower because you have to sleep long enough that the timing usually works.

I found the Multithreaded TC Java library from the same group that wrote FindBugs. It lets you specify the order of events without using Sleep(), and it's reliable. I haven't tried it yet.

The biggest limitation to this approach is that it only lets you test the scenarios you suspect will cause trouble. As others have said, you really need to isolate your multithreaded code into a small number of simple classes to have any hope of thoroughly testing them.

Once you've carefully tested the scenarios you expect to cause trouble, an unscientific test that throws a bunch of simultaneous requests at the class for a while is a good way to look for unexpected trouble.

Update: I've played a bit with the Multithreaded TC Java library, and it works well. I've also ported some of its features to a .NET version I call TickingTest .

I spent most of last week at a university library studying debugging of concurrent code. The central problem is concurrent code is non-deterministic. Typically, academic debugging has fallen into one of three camps here:

  1. Event-trace/replay. This requires an event monitor and then reviewing the events that were sent. In a UT framework, this would involve manually sending the events as part of a test, and then doing post-mortem reviews.
  2. Scriptable. This is where you interact with the running code with a set of triggers. "On x > foo, baz()". This could be interpreted into a UT framework where you have a run-time system triggering a given test on a certain condition.
  3. Interactive. This obviously won't work in an automatic testing situation. ;)

Now, as above commentators have noticed, you can design your concurrent system into a more deterministic state. However, if you don't do that properly, you're just back to designing a sequential system again.

My suggestion would be to focus on having a very strict design protocol about what gets threaded and what doesn't get threaded. If you constrain your interface so that there is minimal dependancies between elements, it is much easier.

Good luck, and keep working on the problem.

If you are testing simple new Thread(runnable).run() You can mock Thread to run the runnable sequentially

For instance, if the code of the tested object invokes a new thread like this

Class TestedClass {
    public void doAsychOp() {
       new Thread(new myRunnable()).start();

Then mocking new Threads and run the runnable argument sequentially can help

private Thread threadMock;

public void myTest() throws Exception {
    //when new thread is created execute runnable immediately 
    PowerMockito.whenNew(Thread.class).withAnyArguments().then(new Answer<Thread>() {
        public Thread answer(InvocationOnMock invocation) throws Throwable {
            // immediately run the runnable
            Runnable runnable = invocation.getArgumentAt(0, Runnable.class);
            if(runnable != null) {
            return threadMock;//return a mock so Thread.start() will do nothing         
    TestedClass testcls = new TestedClass()
    testcls.doAsychOp(); //will invoke in current thread
    //.... check expected 

It's been a while when this question was posted, but it's still not answered ...

kleolb02 's answer is a good one. I'll try going into more details.

There is a way, which I practice for C# code. For unit tests you should be able to program reproducible tests, which is the biggest challenge in multithreaded code. So my answer aims toward forcing asynchronous code into a test harness, which works synchronously .

It's an idea from Gerard Meszardos's book " xUnit Test Patterns " and is called "Humble Object" (p. 695): You have to separate core logic code and anything which smells like asynchronous code from each other. This would result to a class for the core logic, which works synchronously .

This puts you into the position to test the core logic code in a synchronous way. You have absolute control over the timing of the calls you are doing on the core logic and thus can make reproducible tests. And this is your gain from separating core logic and asynchronous logic.

This core logic needs be wrapped around by another class, which is responsible for receiving calls to the core logic asynchronously and delegates these calls to the core logic. Production code will only access the core logic via that class. Because this class should only delegate calls, it's a very "dumb" class without much logic. So you can keep your unit tests for this asychronous working class at a minimum.

Anything above that (testing interaction between classes) are component tests. Also in this case, you should be able to have absolute control over timing, if you stick to the "Humble Object" pattern.

Testing MT code for correctness is, as already stated, quite a hard problem. In the end it boils down to ensuring that there are no incorrectly synchronised data races in your code. The problem with this is that there are infinitely many possibilities of thread execution (interleavings) over which you do not have much control (be sure to read this article, though). In simple scenarios it might be possible to actually prove correctness by reasoning but this is usually not the case. Especially if you want to avoid/minimize synchronization and not go for the most obvious/easiest synchronization option.

An approach that I follow is to write highly concurrent test code in order to make potentially undetected data races likely to occur. And then I run those tests for some time :) I once stumbled upon a talk where some computer scientist where showing off a tool that kind of does this (randomly devising test from specs and then running them wildly, concurrently, checking for the defined invariants to be broken).

By the way, I think this aspect of testing MT code has not been mentioned here: identify invariants of the code that you can check for randomly. Unfortunately, finding those invariants is quite a hard problem, too. Also they might not hold all the time during execution, so you have to find/enforce executions points where you can expect them to be true. Bringing the code execution to such a state is also a hard problem (and might itself incur concurrency issues. Whew, it's damn hard!

Some interesting links to read:

The following article suggests 2 solutions. Wrapping a semaphore (CountDownLatch) and adds functionality like externalize data from internal thread. Another way of achieving this purpose is to use Thread Pool (see Points of Interest).

Sprinkler - Advanced synchronization object

Tough one indeed! In my (C++) unit tests, I've broken this down into several categories along the lines of the concurrency pattern used:

  1. Unit tests for classes that operate in a single thread and aren't thread aware -- easy, test as usual.

  2. Unit tests for Monitor objects (those that execute synchronized methods in the callers' thread of control) that expose a synchronized public API -- instantiate multiple mock threads that exercise the API. Construct scenarios that exercise internal conditions of the passive object. Include one longer running test that basically beats the heck out of it from multiple threads for a long period of time. This is unscientific I know but it does build confidence.

  3. Unit tests for Active objects (those that encapsulate their own thread or threads of control) -- similar to #2 above with variations depending on the class design. Public API may be blocking or non-blocking, callers may obtain futures, data may arrive at queues or need to be dequeued. There are many combinations possible here; white box away. Still requires multiple mock threads to make calls to the object under test.

As an aside:

In internal developer training that I do, I teach the Pillars of Concurrency and these two patterns as the primary framework for thinking about and decomposing concurrency problems. There's obviously more advanced concepts out there but I've found that this set of basics helps keep engineers out of the soup. It also leads to code that is more unit testable, as described above.

You may use EasyMock.makeThreadSafe to make testing instance threadsafe

Pete Goodliffe has a series on the unit testing of threaded code.

It's hard. I take the easier way out and try to keep the threading code abstracted from the actual test. Pete does mention that the way I do it is wrong but I've either got the separation right or I've just been lucky.