# ordered-map **Repository Path**: zydltech/ordered-map ## Basic Information - **Project Name**: ordered-map - **Description**: fork from https://github.com/Tessil/ordered-map.git - **Primary Language**: Unknown - **License**: MIT - **Default Branch**: master - **Homepage**: https://github.com/Tessil/ordered-map.git - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 1 - **Created**: 2020-04-21 - **Last Updated**: 2020-12-19 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README [![Build Status](https://travis-ci.org/Tessil/ordered-map.svg?branch=master)](https://travis-ci.org/Tessil/ordered-map) [![Build status](https://ci.appveyor.com/api/projects/status/7fug7piv59d0in36/branch/master?svg=true)](https://ci.appveyor.com/project/Tessil/ordered-map/branch/master) ## C++ hash map and hash set which preserves the order of insertion The ordered-map library provides a hash map and a hash set which preserve the order of insertion in a way similar to Python's [OrderedDict](https://docs.python.org/3/library/collections.html#collections.OrderedDict). When iterating over the map, the values will be returned in the same order as they were inserted. The values are stored contiguously in an underlying structure, no holes in-between values even after an erase operation. By default a `std::deque` is used for this structure, but it's also possible to use a `std::vector`. This structure is directly accessible through the `values_container()` method and if the structure is a `std::vector`, a `data()` method is also provided to easily interact with C APIs. This provides fast iteration but with the drawback of an O(n) erase operation. An O(1) `pop_back()` and an O(1) `unordered_erase()` functions are available. To resolve collisions on hashes, the library uses linear robin hood probing with backward shift deletion. The library provides a behaviour similar to a `std::deque/std::vector` with unique values but with an average time complexity of O(1) for lookups and an amortised time complexity of O(1) for insertions. This comes at the price of a little higher memory footprint (8 bytes per bucket by default). Two classes are provided: `tsl::ordered_map` and `tsl::ordered_set`. **Note**: The library uses a power of two for the size of its buckets array to take advantage of the [fast modulo](https://en.wikipedia.org/wiki/Modulo_operation#Performance_issues). For good performances, it requires the hash table to have a well-distributed hash function. If you encounter performance issues check your hash function. ### Key features - Header-only library, just add the [include](include/) directory to your include path and you are ready to go. If you use CMake, you can also use the `tsl::ordered_map` exported target from the [CMakeLists.txt](CMakeLists.txt). - Values are stored in the same order as the insertion order. The library provides a direct access to the underlying structure which stores the values. - O(1) average time complexity for lookups with performances similar to `std::unordered_map` but with faster insertions and reduced memory usage (see [benchmark](https://tessil.github.io/2016/08/29/benchmark-hopscotch-map.html) for details). - Provide random access iterators and also reverse iterators. - Support for heterogeneous lookups allowing the usage of `find` with a type different than `Key` (e.g. if you have a map that uses `std::unique_ptr` as key, you can use a `foo*` or a `std::uintptr_t` as key parameter to `find` without constructing a `std::unique_ptr`, see [example](#heterogeneous-lookups)). - If the hash is known before a lookup, it is possible to pass it as parameter to speed-up the lookup (see `precalculated_hash` parameter in [API](https://tessil.github.io/ordered-map/classtsl_1_1ordered__map.html#a7fcde27edc6697a0b127f4b1aefa8a7d)). - Support for efficient serialization and deserialization (see [example](#serialization) and the `serialize/deserialize` methods in the [API](https://tessil.github.io/ordered-map/classtsl_1_1ordered__map.html) for details). - The library can be used with exceptions disabled (through `-fno-exceptions` option on Clang and GCC, without an `/EH` option on MSVC or simply by defining `TSL_NO_EXCEPTIONS`). `std::terminate` is used in replacement of the `throw` instruction when exceptions are disabled. - API closely similar to `std::unordered_map` and `std::unordered_set`. ### Differences compared to `std::unordered_map` `tsl::ordered_map` tries to have an interface similar to `std::unordered_map`, but some differences exist. - The iterators are `RandomAccessIterator`. - Iterator invalidation behaves in a way closer to `std::vector` and `std::deque` (see [API](https://tessil.github.io/ordered-map/classtsl_1_1ordered__map.html#details) for details). If you use `std::vector` as `ValueTypeContainer`, you can use `reserve()` to preallocate some space and avoid the invalidation of the iterators on insert. - Slow `erase()` operation, it has a complexity of O(n). A faster O(1) version `unordered_erase()` exists, but it breaks the insertion order (see [API](https://tessil.github.io/ordered-map/classtsl_1_1ordered__map.html#a9f94a7889fa7fa92eea41ca63b3f98a4) for details). An O(1) `pop_back()` is also available. - The equality oprators `operator==` and `operator!=` are order dependent. Two `tsl::ordered_map` with the same values but inserted in a different order don't compare equal. - For iterators, `operator*()` and `operator->()` return a reference and a pointer to `const std::pair` instead of `std::pair` making the value `T` not modifiable. To modify the value you have to call the `value()` method of the iterator to get a mutable reference. Example: ```c++ tsl::ordered_map map = {{1, 1}, {2, 1}, {3, 1}}; for(auto it = map.begin(); it != map.end(); ++it) { //it->second = 2; // Illegal it.value() = 2; // Ok } ``` - By default the map can only hold up to 232 - 1 values, that is 4 294 967 295 values. This can be raised through the `IndexType` class template parameter, check the [API](https://tessil.github.io/ordered-map/classtsl_1_1ordered__map.html#details) for details. - No support for some bucket related methods (like `bucket_size`, `bucket`, ...). Thread-safety guarantee is the same as `std::unordered_map` (i.e. possible to have multiple concurrent readers with no writer). Concerning the strong exception guarantee, it holds only if `ValueContainer::emplace_back` has the strong exception guarantee (which is true for `std::vector` and `std::deque` as long as the type `T` is not a move-only type with a move constructor that may throw an exception, see [details](http://en.cppreference.com/w/cpp/container/vector/emplace_back#Exceptions)). These differences also apply between `std::unordered_set` and `tsl::ordered_set`. ### Installation To use ordered-map, just add the [include](include/) directory to your include path. It is a **header-only** library. If you use CMake, you can also use the `tsl::ordered_map` exported target from the [CMakeLists.txt](CMakeLists.txt) with `target_link_libraries`. ```cmake # Example where the ordered-map project is stored in a third-party directory add_subdirectory(third-party/ordered-map) target_link_libraries(your_target PRIVATE tsl::ordered_map) ``` If the project has been installed through `make install`, you can also use `find_package(tsl-ordered-map REQUIRED)` instead of `add_subdirectory`. The code should work with any C++11 standard-compliant compiler and has been tested with GCC 4.8.4, Clang 3.5.0 and Visual Studio 2015. To run the tests you will need the Boost Test library and CMake. ```bash git clone https://github.com/Tessil/ordered-map.git cd ordered-map/tests mkdir build cd build cmake .. cmake --build . ./tsl_ordered_map_tests ``` ### Usage The API can be found [here](https://tessil.github.io/ordered-map/). ### Example ```c++ #include #include #include #include #include int main() { tsl::ordered_map map = {{'d', 1}, {'a', 2}, {'g', 3}}; map.insert({'b', 4}); map['h'] = 5; map['e'] = 6; map.erase('a'); // {d, 1} {g, 3} {b, 4} {h, 5} {e, 6} for(const auto& key_value : map) { std::cout << "{" << key_value.first << ", " << key_value.second << "}" << std::endl; } map.unordered_erase('b'); // Break order: {d, 1} {g, 3} {e, 6} {h, 5} for(const auto& key_value : map) { std::cout << "{" << key_value.first << ", " << key_value.second << "}" << std::endl; } for(auto it = map.begin(); it != map.end(); ++it) { //it->second += 2; // Not valid. it.value() += 2; } if(map.find('d') != map.end()) { std::cout << "Found 'd'." << std::endl; } const std::size_t precalculated_hash = std::hash()('d'); // If we already know the hash beforehand, we can pass it as argument to speed-up the lookup. if(map.find('d', precalculated_hash) != map.end()) { std::cout << "Found 'd' with hash " << precalculated_hash << "." << std::endl; } tsl::ordered_set, std::equal_to, std::allocator, std::vector> set; set.reserve(6); set = {'3', '4', '9', '2'}; set.erase('2'); set.insert('1'); set.insert('\0'); set.pop_back(); set.insert({'0', '\0'}); // Get raw buffer for C API: 34910 std::cout << atoi(set.data()) << std::endl; } ``` #### Heterogeneous lookup Heterogeneous overloads allow the usage of other types than `Key` for lookup and erase operations as long as the used types are hashable and comparable to `Key`. To activate the heterogeneous overloads in `tsl::ordered_map/set`, the qualified-id `KeyEqual::is_transparent` must be valid. It works the same way as for [`std::map::find`](http://en.cppreference.com/w/cpp/container/map/find). You can either use [`std::equal_to<>`](http://en.cppreference.com/w/cpp/utility/functional/equal_to_void) or define your own function object. Both `KeyEqual` and `Hash` will need to be able to deal with the different types. ```c++ #include #include #include #include struct employee { employee(int id, std::string name) : m_id(id), m_name(std::move(name)) { } // Either we include the comparators in the class and we use `std::equal_to<>`... friend bool operator==(const employee& empl, int empl_id) { return empl.m_id == empl_id; } friend bool operator==(int empl_id, const employee& empl) { return empl_id == empl.m_id; } friend bool operator==(const employee& empl1, const employee& empl2) { return empl1.m_id == empl2.m_id; } int m_id; std::string m_name; }; // ... or we implement a separate class to compare employees. struct equal_employee { using is_transparent = void; bool operator()(const employee& empl, int empl_id) const { return empl.m_id == empl_id; } bool operator()(int empl_id, const employee& empl) const { return empl_id == empl.m_id; } bool operator()(const employee& empl1, const employee& empl2) const { return empl1.m_id == empl2.m_id; } }; struct hash_employee { std::size_t operator()(const employee& empl) const { return std::hash()(empl.m_id); } std::size_t operator()(int id) const { return std::hash()(id); } }; int main() { // Use std::equal_to<> which will automatically deduce and forward the parameters tsl::ordered_map> map; map.insert({employee(1, "John Doe"), 2001}); map.insert({employee(2, "Jane Doe"), 2002}); map.insert({employee(3, "John Smith"), 2003}); // John Smith 2003 auto it = map.find(3); if(it != map.end()) { std::cout << it->first.m_name << " " << it->second << std::endl; } map.erase(1); // Use a custom KeyEqual which has an is_transparent member type tsl::ordered_map map2; map2.insert({employee(4, "Johnny Doe"), 2004}); // 2004 std::cout << map2.at(4) << std::endl; } ``` #### Serialization The library provides an efficient way to serialize and deserialize a map or a set so that it can be saved to a file or send through the network. To do so, it requires the user to provide a function object for both serialization and deserialization. ```c++ struct serializer { // Must support the following types for U: std::uint64_t, float // and std::pair if a map is used or Key for a set. template void operator()(const U& value); }; ``` ```c++ struct deserializer { // Must support the following types for U: std::uint64_t, float // and std::pair if a map is used or Key for a set. template U operator()(); }; ``` Note that the implementation leaves binary compatibilty (endianness, float binary representation, size of int, ...) of the types it serializes/deserializes in the hands of the provided function objects if compatibilty is required. More details regarding the `serialize` and `deserialize` methods can be found in the [API](https://tessil.github.io/ordered-map/classtsl_1_1ordered__map.html). ```c++ #include #include #include #include #include class serializer { public: explicit serializer(const char* file_name) { m_ostream.exceptions(m_ostream.badbit | m_ostream.failbit); m_ostream.open(file_name, std::ios::binary); } template::value>::type* = nullptr> void operator()(const T& value) { m_ostream.write(reinterpret_cast(&value), sizeof(T)); } void operator()(const std::pair& value) { (*this)(value.first); (*this)(value.second); } private: std::ofstream m_ostream; }; class deserializer { public: explicit deserializer(const char* file_name) { m_istream.exceptions(m_istream.badbit | m_istream.failbit | m_istream.eofbit); m_istream.open(file_name, std::ios::binary); } template T operator()() { T value; deserialize(value); return value; } private: template::value>::type* = nullptr> void deserialize(T& value) { m_istream.read(reinterpret_cast(&value), sizeof(T)); } void deserialize(std::pair& value) { deserialize(value.first); deserialize(value.second); } private: std::ifstream m_istream; }; int main() { const tsl::ordered_map map = {{1, -1}, {2, -2}, {3, -3}, {4, -4}}; const char* file_name = "ordered_map.data"; { serializer serial(file_name); map.serialize(serial); } { deserializer dserial(file_name); auto map_deserialized = tsl::ordered_map::deserialize(dserial); assert(map == map_deserialized); } { deserializer dserial(file_name); /** * If the serialized and deserialized map are hash compatibles (see conditions in API), * setting the argument to true speed-up the deserialization process as we don't have * to recalculate the hash of each key. We also know how much space each bucket needs. */ const bool hash_compatible = true; auto map_deserialized = tsl::ordered_map::deserialize(dserial, hash_compatible); assert(map == map_deserialized); } } ``` ##### Serialization with Boost Serialization and compression with zlib It's possible to use a serialization library to avoid the boilerplate. The following example uses Boost Serialization with the Boost zlib compression stream to reduce the size of the resulting serialized file. The example requires C++20 due to the usage of the template parameter list syntax in lambdas, but it can be adapted to less recent versions. ```c++ #include #include #include #include #include #include #include #include #include #include namespace boost { namespace serialization { template void serialize(Archive & ar, tsl::ordered_map& map, const unsigned int version) { split_free(ar, map, version); } template void save(Archive & ar, const tsl::ordered_map& map, const unsigned int /*version*/) { auto serializer = [&ar](const auto& v) { ar & v; }; map.serialize(serializer); } template void load(Archive & ar, tsl::ordered_map& map, const unsigned int /*version*/) { auto deserializer = [&ar]() { U u; ar & u; return u; }; map = tsl::ordered_map::deserialize(deserializer); } }} int main() { tsl::ordered_map map = {{1, -1}, {2, -2}, {3, -3}, {4, -4}}; const char* file_name = "ordered_map.data"; { std::ofstream ofs; ofs.exceptions(ofs.badbit | ofs.failbit); ofs.open(file_name, std::ios::binary); boost::iostreams::filtering_ostream fo; fo.push(boost::iostreams::zlib_compressor()); fo.push(ofs); boost::archive::binary_oarchive oa(fo); oa << map; } { std::ifstream ifs; ifs.exceptions(ifs.badbit | ifs.failbit | ifs.eofbit); ifs.open(file_name, std::ios::binary); boost::iostreams::filtering_istream fi; fi.push(boost::iostreams::zlib_decompressor()); fi.push(ifs); boost::archive::binary_iarchive ia(fi); tsl::ordered_map map_deserialized; ia >> map_deserialized; assert(map == map_deserialized); } } ``` ### License The code is licensed under the MIT license, see the [LICENSE file](LICENSE) for details.