c++ implementation source - Why is std::map implemented as a red-black tree?

3 Answers

It really depends on the usage. AVL tree usually has more rotations of rebalancing. So if your application doesn't have too many insertion and deletion operations, but weights heavily on searching, then AVL tree probably is a good choice.

std::map uses Red-Black tree as it gets a reasonable trade-off between the speed of node insertion/deletion and searching.

how unordered_map in

Why is std::map implemented as a red-black tree?

There are several balanced binary search trees (BSTs) out there. What were design trade-offs in choosing a red-black tree?

The previous answers only address tree alternatives and red black probably only remains for historical reasons.

Why not a hash table?

In a tree a type requires only partial ordering (< comparison) to be used as a key in the map. However, hash tables require that each key type has a hash function defined. Keeping these type requirements to a minimum is very important for generic programming.

Designing a good hash table requires intimate knowledge of the context it which it will be used. Should it use open addressing, or linked chaining? What levels of load should it accept before resizing? Should it use an expensive hash that avoids collisions, or one that is rough and fast?

(C++11 did add hash tables with unordered_map. You can see from the documentation it requires setting policies to configure many of these options.)

Since the STL can't anticipate which is the best choice for your application, the default needs to be more flexible. Trees "just work" and scale nicely.

What about other trees?

Red Black tree's offer fast lookup and are self balancing, unlike BSTs. Another user pointed out its advantages over the self-balancing AVL tree.

Alexander Stepanov (The creator of STL) said that he would use a B* Tree instead of a Red-Black tree if he wrote std::map again, because it is more friendly for modern memory caches.

One of the biggest changes since then has been the growth of caches. Cache misses are very costly, so locality of reference is much more important now. Node-based data structures, which have low locality of reference, make much less sense. If I were designing STL today, I would have a different set of containers. For example, an in-memory B*-tree is a far better choice than a red-black tree for implementing an associative container. - Alexander Stepanov

Is red black tree or B* always the best?

On other occasions Alex has stated that std::vector is almost always the best list container for similar reasons. It rarely makes sense to use std::list or std::deque even for those situations we were taught in school (such as removing an element from the middle of the list). std::vector is so fast that it beats those structures for everything but large n.

Applying that same reasoning if you have only a small number of elements (hundreds?) using a std::vector and linear searching may be more efficient than the tree implementation of std::map. Depending on the frequency of insertion, a sorted std::vector combined with std::binary_search can be the fastest choice.

Update 2017-06-14: webbertiger edit its answer after I commented. I should point out that its answer is now a lot better to my eyes. But I kept my answer just as additional information...

Due to the fact that I think first answer is wrong (correction: not both anymore) and the third has a wrong affirmation. I feel I had to clarify things...

The 2 most popular tree are AVL and Red Black (RB). The main difference lie in the utilization:

  • AVL : Better if ratio of consultation (read) is bigger than manipulation (modification). Memory foot print is a little less than RB (due to the bit required for coloring).
  • RB : Better in general cases where there is a balance between consultation (read) and manipulation (modification) or more modification over consultation. A slightly bigger memory footprint due to the storing of red-black flag.

The main difference come from the coloring. You do have less re-balance action in RB tree than AVL because the coloring enable you to sometimes skip or shorten re-balance actions which have a relative hi cost. Because of the coloring, RB tree also have higher level of nodes because it could accept red nodes between black ones (having the possibilities of ~2x more levels) making search (read) a little bit less efficient... but because it is a constant (2x), it stay in O(log n).

If you consider the performance hit for a modification of a tree (significative) VS the performance hit of consultation of a tree (almost insignificant), it become natural to prefer RB over AVL for a general case.