Blog
std::vector & std::list - comparison
Stanisław Kubiak 15.11.2018
STL containers – a question about them hardly seems advanced. And yet, too often we see that estimating the computational complexity of std::vector or the memory overhead of std::list borders on a miracle. So I am starting a series of posts about the containers available in the standard library. First up are the two basic ones – the aforementioned vector and the list.
But let’s start with a quick reminder of how the containers available in the standard library are classified. Of course, the C++ standard comes to our aid – thanks to it everything will become perfectly clear: http://eel.is/c++draft/containers.general#tab:containers.lib.summary – and by the way, I have just proved that the C++ standard is very clear and simple.
Ahem…
tl;dr for those who do not want to click strange links – containers are divided into: sequence containers, associative containers (ordered and unordered), and container adaptors. We start the series with the sequence containers. Specifically, two of them – std::list and std::vector.
std::list
The idea behind this container is very simple. We create separate list nodes and link them together with pointers.
Note – there is a singly-linked (one-way) list – std::forward_list – in which each element points to the next one. There is also a doubly-linked (two-way) list – std::list. In addition to the pointer to the next element, each node also holds a pointer to the preceding one.
The most important property of the list is that it is not kept in a contiguous memory area. When you request that a new node be added to the list, new memory is allocated – most likely in a completely different region. After a successful allocation, only the appropriate pointers are updated – all of this happens with constant computational complexity: O(1). This applies to elements added to or removed at any node of the list. Since deleting or adding an element does not affect the remaining elements (we only update the pointers), it follows that delete/add operations do not invalidate iterators – they remain valid.
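This iterator-stability guarantee is easy to demonstrate with a minimal sketch (the container contents and values here are arbitrary, chosen just for illustration):

#include <cassert>
#include <iterator>
#include <list>

int main() {
    std::list<int> l{1, 2, 3};
    auto it = std::next(l.begin());  // iterator to the element 2

    // Inserting and erasing elsewhere does not invalidate 'it':
    l.push_front(0);                 // {0, 1, 2, 3}
    l.erase(l.begin());              // {1, 2, 3} again
    l.push_back(4);                  // {1, 2, 3, 4}

    assert(*it == 2);                // still valid, still points to 2
    return 0;
}

The same code with a std::vector would be undefined behavior: push_back may reallocate and invalidate the iterator.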
What about the memory overhead? Most likely (because it depends on the implementation) each node of the list has a pointer to the next element and – in a doubly-linked list – a pointer to the preceding one. On top of that there is the object stored in the node itself. Thus, in my case the size of a single std::list<int> node is 24 bytes (you can verify it at home – for example by reading the stdlib implementation and applying the sizeof operator to the node type) – I assume that we focus only on the default allocator.
In summary, the memory overhead will be N*(sizeof(T) + 2*pointer_size) – you can read about pointer sizes here: https://dorwijnerda.pl/en/blog/how-much-does-it-pointer-weight/ – N is, of course, the number of elements.
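The 24-byte figure can be sanity-checked with simple back-of-the-envelope arithmetic – a sketch assuming a typical 64-bit platform with 8-byte pointers (the exact node layout is implementation-defined):

#include <cstddef>
#include <iostream>

int main() {
    // Two node pointers (prev/next) plus the payload,
    // rounded up to pointer alignment:
    const std::size_t payload   = sizeof(int);         // usually 4
    const std::size_t pointers  = 2 * sizeof(void*);   // usually 16
    const std::size_t raw       = pointers + payload;  // 20
    const std::size_t align     = alignof(void*);      // usually 8
    const std::size_t node_size = (raw + align - 1) / align * align;
    std::cout << "estimated std::list<int> node size: "
              << node_size << " bytes\n";
    // On a typical 64-bit platform this prints 24.
    return 0;
}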
std::vector
It’s time for a second container.
It is built on a different premise – it lives in a contiguous area of memory. This makes its memory overhead much smaller than that of std::list. Details in a moment; first the dry facts.
std::vector can be thought of as a dynamic array (an „array” of fixed size is std::array). Its size grows at runtime, at the user’s request. There is some risk involved – when the number of stored objects exceeds the container’s capacity, the memory is reallocated and the existing objects are moved to a new memory location. As a side effect, all pointers, references and iterators into the „previous” vector become invalidated (by the way: beware of undefined behavior if you keep using them). How much the container grows by depends on the implementation. As a rule of thumb, it is 150–200% of the previous size.
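The reallocation behavior can be observed directly. A small sketch – the growth factor and the exact capacities reported are implementation-defined, so the output will differ between standard libraries:

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    const int* prev_data = v.data();
    std::size_t prev_cap = v.capacity();
    for (int i = 0; i < 100; ++i) {
        v.push_back(i);
        if (v.capacity() != prev_cap) {
            // A capacity change means a reallocation happened; if data()
            // moved, every old pointer/reference/iterator is now invalid.
            std::cout << "realloc: capacity " << prev_cap
                      << " -> " << v.capacity()
                      << (v.data() != prev_data ? " (data moved)" : "")
                      << '\n';
            prev_cap = v.capacity();
            prev_data = v.data();
        }
    }
    return 0;
}

On libstdc++ you will typically see the capacity doubling (1, 2, 4, 8, …); MSVC’s STL grows by roughly 1.5x.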
Adding or removing an element at the end of the container has amortized constant computational complexity: O(1).
Adding or deleting an element at an arbitrary position of std::vector is a slightly more complicated topic. It requires moving all the elements of the container that come after the inserted or deleted item. Therefore, the pessimistic computational complexity is O(n).
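A quick illustration of why a middle insertion is linear – every element after the insertion point has to be shifted (the values here are arbitrary):

#include <cassert>
#include <vector>

int main() {
    std::vector<int> v{1, 2, 4, 5};
    v.insert(v.begin() + 2, 3);      // shifts 4 and 5 one slot to the right
    assert((v == std::vector<int>{1, 2, 3, 4, 5}));

    v.erase(v.begin());              // shifts everything left: O(n) again
    assert((v == std::vector<int>{2, 3, 4, 5}));
    return 0;
}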
The memory overhead of std::vector is minimal. Besides the memory area for the container’s elements, there are 3 pointers (although this depends on the stdlib implementation): one pointing to the beginning of the reserved memory area, another to the end of the stored elements, and the third to the end of the reserved area. In summary, the memory overhead will be N*sizeof(T) + 3*pointer_size.
I will also mention one specific specialization of vector – std::vector<bool>. Because a bool variable is usually represented in numeric form (http://eel.is/c++draft/basic.fundamental#6, http://eel.is/c++draft/basic.fundamental#7), the user does not operate on the individual bits directly. Instead, they work on proxy objects that semantically simulate a bool variable. What is under the hood, however, depends on the implementation of the standard library. Moreover, there is no requirement for the reserved memory to be contiguous: http://eel.is/c++draft/vector.bool#3.
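A short sketch of the proxy behavior. The proxy type, std::vector<bool>::reference, is specified by the standard, though its internals are implementation-defined:

#include <cassert>
#include <vector>

int main() {
    std::vector<bool> vb{true, false, true};

    // operator[] does not return bool& – it returns a proxy object
    // (std::vector<bool>::reference) that simulates one.
    auto ref = vb[1];        // ref is a proxy, not a plain bool
    ref = true;              // writes through to the packed bit
    assert(vb[1] == true);

    // Consequence of the bit packing: you cannot take the address
    // of an "element":
    // bool* p = &vb[0];     // does not compile
    return 0;
}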
Use of containers
When to select std::list/std::forward_list, and when to select std::vector?
The general rule is that std::vector should be your default container.
Access to any element takes constant time, and the memory overhead over what a plain array offers is minimal. At the same time, it delivers high performance – in a moment we will check why.
std::list, on the other hand, should be selected only if you know that you will often add or remove items in the „middle” of the container, and at the same time you will rarely need random access to its elements.
Measurements
So much for theory – let’s see what it looks like in practice.
To begin with, let’s see how long it takes to find a particular item in rather small containers – say, 10 million integers.
#include <algorithm>
#include <chrono>
#include <iostream>
#include <list>
#include <numeric>
#include <random>
#include <string>
#include <vector>

namespace { constexpr auto size = 10'000'000; }

template <typename Callback>
void measureRun(std::string what, const Callback& c) {
    const auto start = std::chrono::high_resolution_clock::now();
    c();
    const auto end = std::chrono::high_resolution_clock::now();
    const std::chrono::duration<double, std::milli> diff = end - start;
    std::cout << what << " took " << diff.count() << " [ms]" << std::endl;
}

int main() {
    std::vector<int> v(size);
    std::iota(v.begin(), v.end(), 1);
    std::shuffle(v.begin(), v.end(), std::random_device{});
    std::list<int> l(v.begin(), v.end());

    // -1337 never occurs in the data, so every search scans the whole container.
    const auto dummy_find = [val = -1337](auto const& container) {
        for (auto const& e : container)
            if (e == val)
                break;
    };

    std::cout << "starting..." << std::endl;
    for (std::size_t i = 0u; i < 10; ++i)
        measureRun("find for std::vector", [&] { dummy_find(v); });
    for (std::size_t i = 0u; i < 10; ++i)
        measureRun("find for std::list", [&] { dummy_find(l); });
    return 0;
}
Nothing special – we simply create two containers. We generate 10,000,000 random numbers, load them into a vector, and feed them into the second container. Finally, we perform a full traversal looking for an element that is not there.
Average measurements from ten executions on my machine (i7-7820HQ@2.90 GHz, 32 GB RAM, compiled with O2 optimization) give the following results: 23.20 [ms] for std::vector and 134.56 [ms] for std::list.
Why such a big difference?
Well, because the vector resides in a contiguous memory area. As a result, once loaded, the memory is used efficiently by the CPU cache (my CPU has 256 KB of L2 cache per core), and each subsequent access to the next element of the vector takes place almost immediately.
std::list, on the other hand, is doomed here, because its nodes are allocated in different areas of memory. Given the measurement results, we can assume with virtually 100% certainty that each subsequent step through the list was equivalent to a cache miss – and that means a loss of up to several hundred CPU cycles.
That is… an eternity for the CPU.
Minor modifications in measurements
Ok, we already know that memory discontinuity is to blame.
Let’s gently modify the example from above. We will introduce a memory allocator that makes the memory contiguous for the list as well.
template <typename t=""> struct dummyAllocator { typedef T value_type; dummyAllocator() = default; T* allocate(std::size_t n) { T* res = reinterpret_cast<T*>(block.data() + cur_pos); cur_pos += n; return res; } void deallocate(T*, std::size_t n) noexcept { cur_pos -= n; } Private: static std::array<T, size=""> block; std::size_t cur_pos{ 0 }; }; template<typename t=""> std::array<T, size=""> dummyAllocator<T>::block = std::array<T, size="">();</T,> </T> </T,> </typename> </T,> </T*> </typename>
Note! – The above „allocator” is in no way something that should run in production. It would not pass any code review. It is simply a POC, created ad hoc. 🙂
int main() {
    std::vector<int> v(size);
    std::iota(v.begin(), v.end(), 1);
    std::shuffle(v.begin(), v.end(), std::random_device{});
    std::vector<int, dummyAllocator<int>> v1(v.begin(), v.end());

    const auto dummy_find = [val = -1337](auto const& container) {
        for (auto const& e : container)
            if (e == val)
                break;
    };

    std::cout << "starting..." << std::endl;
    for (auto i = 0; i < 10; ++i) {
        measureRun("find for std::vector", [&] { dummy_find(v); });
        measureRun("find for std::vector with custom allocator", [&] { dummy_find(v1); });
    }
    v1.clear();

    std::list<int> l(v.begin(), v.end());
    std::list<int, dummyAllocator<int>> l1(v.begin(), v.end());
    for (auto i = 0; i < 10; ++i) {
        measureRun("find for std::list", [&] { dummy_find(l); });
        measureRun("find for std::list with custom allocator", [&] { dummy_find(l1); });
    }
    return 0;
}
The time results are as follows (again, averaged 10 executions):
- std::vector: 15.018 [ms]
- std::vector<int, dummyAllocator<int>>: 15.78 [ms]
- std::list: 111.78 [ms]
- std::list<int, dummyAllocator<int>>: 72.43 [ms]
That is… the vector is faster again. And that despite the list now using a contiguous memory area – were the people implementing stdlib such weak programmers?
Well, no.
Using a contiguous memory area for std::list yields a gain of about 35%. Is that a lot or a little? Given that std::vector is still faster (although no longer by an order of magnitude), you may feel somewhat unsatisfied. However, considering that list nodes are interconnected by pointers, this should be considered a pretty good result (albeit still far from ideal). The pointers the list is built on add overhead in the form of a jump to a specific memory address – and that, too, is deadly for the cache. Since writing applications that take full advantage of the cache goes well beyond this post, I recommend Scott Meyers’ presentation from code::dive 2014: https://www.youtube.com/watch?v=WDIkqP4JbkE
Summary
To sum up – when wondering which of std::vector and std::list to use, you should almost always use std::vector. It is fast, its memory overhead is minimal, and it allows quick access to any element. It is CPU-cache friendly. However, remember that it also has its drawbacks (or rather, features) – removing or adding elements in the „middle” of the container is quite costly, and when you exhaust the memory pool reserved for a specific std::vector instance, expect a reallocation – which will certainly be slow.