multicore - tutorial - r parallel foreach

What parallel programming model do you recommend today to take advantage of the manycore processors of tomorrow? (7)

If you were writing a new application from scratch today, and wanted it to scale to all the cores you could throw at it tomorrow, what parallel programming model/system/language/library would you choose? Why?

I am particularly interested in answers along these axes:

  1. Programmer productivity / ease of use (can mortals successfully use it?)
  2. Target application domain (what problems is it (not) good at?)
  3. Concurrency style (does it support tasks, pipelines, data parallelism, messages...?)
  4. Maintainability / future-proofing (will anybody still be using it in 20 years?)
  5. Performance (how does it scale on what kinds of hardware?)

I am being deliberately vague on the nature of the application in anticipation of getting good general answers useful for a variety of applications.

Multi-core programming may actually require more than one paradigm. Some current contenders are:

  1. MapReduce. This works well where a problem can be easily decomposed into parallel chunks.
  2. Nested Data Parallelism. This is similar to MapReduce, but actually supports recursive decomposition of a problem, even when the recursive chunks are of irregular size. Look for NDP to be a big win in purely functional languages running on massively parallel but limited hardware (like GPUs).
  3. Software Transactional Memory. If you need traditional threads, STM makes them bearable. You pay a 50% performance hit in critical sections, but you can scale complex locking schemes to 100s of processors without pain. This will not, however, work for distributed systems.
  4. Parallel object threads with messaging. This really clever model is used by Erlang. Each "object" becomes a lightweight thread, and objects communicate by asynchronous messages and pattern matching. It's basically true parallel OO. This has succeeded nicely in several real-world applications, and it works great for unreliable distributed systems.

Some of these paradigms give you maximum performance, but only work if the problem decomposes cleanly. Others sacrifice some performance, but allow a wider variety of algorithms. I suspect that some combination of the above will ultimately become a standard toolkit.

The mapreduce/hadoop paradigm is useful, and relevant. Especially for people who are used to languages like perl, the idea of mapping over an array and doing some action on each element should come pretty fluidly and naturally, and mapreduce/hadoop just takes it to the next stage and says that there's no reason that each element of the array need be processed on the same machine.

In a sense it's more battle tested, because Google is using mapreduce and plenty of people have been using hadoop, and has shown that it works well for scaling applications across multiple machines over the network. And if you can scale over multiple machines across the network, you can scale over multiple cores in a single machine.

Two solutions I really like are join calculus (JoCaml, Polyphonic C#, ) and the actor model (Erlang, Scala, E, Io).

I'm not particularly impressed with Software Transactional Memory. It just feels like it's only there to allow threads to cling on to life a little while longer, even though they should have died decades ago. However, it does have three major advantages:

  1. People understand transactions in databases
  2. There is already talk of transactional RAM hardware
  3. As much as we all wish them gone, threads are probably going to be the dominant concurrency model for the next couple of decades, sad as that may be. STM could significantly reduce the pain.

You mentioned Java, but you only mention threads. Have you looked at Java's concurrent library? It comes bundled with Java 5 and above.

It's a very nice library containing ThreadPools, CopyOnWriteCollections to name a very few. Check out the documentation at the Java Tutorial. Or if you prefer, the Java docs.

D2010 Beta: The Perfect Way to support Multi-Core

Make it compatible with the Async calls in Delphi Prism, at least syntactically.

Multithreading Puzzles

There are a number of topics covered at this link.

Multithreaded Programming with ThreadMentor : A Tutorial


Here are some direct links to the problems listed at that link, along with their initial descriptions.

ThreadMentor : The Dining Philosopher's Problem
ThreadMentor : The Dining Philosopher's Problem: The Lefty-Righty Version

The dining philosophers problem is invented by E. W. Dijkstra. Imagine that five philosophers who spend their lives just thinking and easting. In the middle of the dining room is a circular table with five chairs. The table has a big plate of spaghetti. However, there are only five chopsticks available, as shown in the following figure. Each philosopher thinks. When he gets hungry, he sits down and picks up the two chopsticks that are closest to him. If a philosopher can pick up both chopsticks, he eats for a while. After a philosopher finishes eating, he puts down the chopsticks and starts to think.

ThreadMentor : The Cigarette Smoker's Problem

This problem is due to S. S. Patil in 1971. Suppose a cigarette requires three ingredients, tobacco, paper and match. There are three chain smokers. Each of them has only one ingredient with infinite supply. There is an agent who has infinite supply of all three ingredients. To make a cigarette, the smoker has tobacco (resp., paper and match) must have the other two ingredients paper and match (resp., tobacco and match, and tobacco and paper). The agent and smokers share a table. The agent randomly generates two ingredients and notifies the smoker who needs these two ingredients. Once the ingredients are taken from the table, the agent supplies another two. On the other hand, each smoker waits for the agent's notification. Once it is notified, the smoker picks up the ingredients, makes a cigarette, smokes for a while, and goes back to the table waiting for his next ingredients.

ThreadMentor : The Producer/Consumer (or Bounded-Buffer) Problem

Suppose we have a circular buffer with two pointers in and out to indicate the next available position for depositing data and the position that contains the next data to be retrieved. See the diagram below. There are two groups of threads, producers and consumers. Each producer deposits a data items into the in position and advances the pointer in, and each consumer retrieves the data item in position out and advances the pointer out.

ThreadMentor : The Roller Coaster Problem

Suppose there are n passengers and one roller coaster car. The passengers repeatedly wait to ride in the car, which can hold maximum C passengers, where C < n. However, the car can go around the track only when it is full. After finishes a ride, each passenger wanders around the amusement park before returning to the roller coaster for another ride. Due to safety reasons, the car only rides T times and then shot-off.

This one has additional constraints:

  1. The car always rides with exactly C passengers;
  2. No passengers will jump off the car while the car is running;
  3. No passengers will jump on the car while the car is running;
  4. No passengers will request another ride before they can get off the car.

ThreadMentor : The Bridge Problem

The description for this one relies on images. Here is a modified quote with image references removed.

Consider a narrow bridge that can only allow three vehicles in the same direction to cross at the same time. If there are three vehicles on the bridge, any incoming vehicle must wait until the bridge is clear.

When a vehicle exits the bridge, we have two cases to consider. Case 1, there are other vehicles on the bridge; and Case 2 the exiting vehicle is the last one on bridge. In the first case, one new vehicle in the same direction should be allowed to proceed.

Case 2 is more complicated and has two subcases. In this case, the exiting vehicle is the last vehicle on the bridge. If there are vehicles waiting in the opposite direction, one of them should be allowed to proceed. Or, if there is no vehicle waiting in the opposite direction, then let the waiting vehicle in the same direction to proceed.