# PPQSort **Repository Path**: kj-smart/PPQSort ## Basic Information - **Project Name**: PPQSort - **Description**: No description available - **Primary Language**: Unknown - **License**: MIT - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2025-12-05 - **Last Updated**: 2025-12-05 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README [![Standalone](https://github.com/GabTux/ppqsort_suite/actions/workflows/standalone.yml/badge.svg)](https://github.com/GabTux/ppqsort_suite/actions/workflows/standalone.yml) [![Install](https://github.com/GabTux/ppqsort_suite/actions/workflows/install.yml/badge.svg)](https://github.com/GabTux/ppqsort_suite/actions/workflows/install.yml) [![Tests](https://github.com/GabTux/ppqsort_suite/actions/workflows/tests.yml/badge.svg)](https://github.com/GabTux/ppqsort_suite/actions/workflows/tests.yml) [![codecov](https://codecov.io/github/GabTux/PPQSort/graph/badge.svg?token=K7UVUZ4N1N)](https://codecov.io/github/GabTux/PPQSort) # PPQSort (Parallel Pattern QuickSort) Parallel Pattern Quicksort (PPQSort) is a **efficient implementation of parallel quicksort algorithm**, written by using **only** the C++20 features without using third party libraries (such as Intel TBB). PPQSort draws inspiration from [pdqsort](https://github.com/orlp/pdqsort), [BlockQuicksort](https://github.com/weissan/BlockQuicksort) and [cpp11sort](https://gitlab.com/daniel.langr/cpp11sort) and adds some further optimizations. * **Focus on ease of use:** Using only C++20 features, header only implementation, user-friendly API. * **Comprehensive test suite:** Ensures correctness and robustness through extensive testing. * **Benchmarks shows great performance:** Achieves impressive sorting times on various machines. # Integration PPQSort is header only implementation. All the files needed are in include directory. Add to existing CMake project using [CPM.cmake](https://github.com/cpm-cmake/CPM.cmake): ```CMake include(cmake/CPM.cmake) CPMAddPackage( NAME PPQSort GITHUB_REPOSITORY GabTux/PPQSort VERSION 1.0.6 # change this to latest commit or release tag ) target_link_libraries(YOUR_TARGET PPQSort::PPQSort) ``` Alternatively use FetchContent or just checkout the repository and add the include directory to the linker flags. # Usage PPQSort has similiar API as std::sort, you can use `ppqsort::execution::` policies to specify how the sort should run. ```cpp // run parallel ppqsort::sort(ppqsort::execution::par, input.begin(), input.end()); // Specify number of threads ppqsort::sort(ppqsort::execution::par, input.begin(), input.end(), 16); // provide custom comparator ppqsort::sort(ppqsort::execution::par, input.begin(), input.end(), cmp); // force branchless variant ppqsort::sort(ppqsort::execution::par_force_branchless, input_str.begin(), input_str.end(), cmp); ``` PPQSort will by default use C++ threads, but if you prefer, you can link it with OpenMP and it will use OpenMP as a parallel backend. However you can still enforce C++ threads parallel backend even if linked with OpenMP: ```cpp #define FORCE_CPP #include // ... rest of the code ... ``` # Benchmark We compared PPQSort with various parallel sorts. Benchmarks shows, that the PPQSort is one of the fastest parallel sorting algorithms across various input data and different machines. | Name | Algorithm | Memory usage | External dependencies | Highlight | |---------------------------------|-------------------|--------------|----------------------------------------------------------------------|-----------------------------------------------------| | PPQSort | Quicksort | in-place | **None** | parallel pattern quicksort algorithm | | [GCC BQS](https://gcc.gnu.org/onlinedocs/libstdc++/manual/parallel_mode_design.html) | Quicksort | in-place | OpenMP | allocating threads proportionally to subtask sizes | | [cpp11sort](https://gitlab.com/daniel.langr/cpp11sort) | Quicksort | in-place | **None** | Header-only, C++11 compliant | | oneTBB [parallel_sort](https://spec.oneapi.io/versions/latest/elements/oneTBB/source/algorithms/functions/parallel_sort_func.html#parallel-sort) | quicksort | out-place | oneTBB | Splits input to small tasks | | [poolSTL](https://github.com/alugowski/poolSTL) sort | Quicksort | in-place | **None** | Header-only, C++17 compliant | | Boost [block_indirect_sort](https://www.boost.org/doc/libs/develop/libs/sort/doc/html/sort/parallel.html#sort.parallel.block_indirect_sort) | merging algorithm | out-place | Boost | Upper bounded small memory usage | | [AQsort](https://github.com/DanielLangr/AQsort) | Quicksort | in-place | OpenMP | Allows the sorting of multiple datasets at once | | [MPQsort](https://github.com/voronond/MPQsort/) | Quicksort | in-place | OpenMP | Multiway Parallel Quicksort | | [IPS4o](https://github.com/ips4o/ips4o) | Samplesort | in-place | oneTBB | Divides data into buckets and sort them recursively | ## Running on ARM cluster * Fujitsu A64FX CPU * NUMA architecture, 48 cores (4CPUs x 12cores) Results for **INT**, input size was **2e9** (2 billions): ![arm_patterns_ext](https://github.com/GabTux/PPQSort/assets/24779546/95741ffe-d710-4360-afac-fa5dce3c50c1) | Algorithm | Random | Ascending | Descending | Rotated | OrganPipe | Heap | Total | Rank | |-------------|----------------------------------|-------------------------------------|--------------------------------------|-----------------------------------|-------------------------------------|--------------------------------|---------------------------------|--------------------------------| | PPQSort C++ | 5.84s | 1.84s | 4.55s | **1.38s** | **2.96s** | 5.58s | **22.15s** | **1** | | GCC BQS | 13.72s | 4.18s | 19.11s | 49.89s | 8.24s | 13.78s | 108.92s | 6 | | oneTBB | 43.66s | **0.09s** | 8.62s | 13.84s | 8.12s | 43.9s | 118.23s | 9 | | poolSTL | 34.63s | 5.61s | 7.23s | 14.78s | 7.81s | 46.88s | 116.94s | 7 | | MPQsort | 13.35s | 5.74s | 5.77s | 4.67s | 7.71s | 12.87s | 50.11s | 5 | | cpp11sort | 9.58s | 2.47s | **2.66s** | 5.47s | 3.42s | 9.9s | 33.5s | 3 | | AQsort | 24.72s | 3.66s | 23.14s | 21.83s | 22.6s | 25.31s | 121.26s | 8 | | Boost | 8.2s | 3.0s | 4.26s | 13.96s | 6.97s | 7.92s | 44.31s | 4 | | IPS$^4$o | **4.8s** | 0.19s | 5.97s | 5.21s | 5.59s | **4.91s** | 26.67s | 2 | ## Summary Extended benchmarks (detailed in forthcoming paper) shows that IPS4o (https://github.com/ips4o) often surpasses PPQSort in raw speed. However, IPS4o relies on the external library oneTBB (https://github.com/oneapi-src/oneTBB) introducing integration complexities. PPQSort steps up as a compelling alternative due to its: * **Competitive Speed:** Delivers performance comparable to IPS4o on most machines. * **Hardware Agnostic:** Maintains strong performance across various hardware, potentially surpassing IPS4o on specific systems, especially ARM platforms. * **Dependency-Free:** No external libraries are required, simplifying integration. For applications demanding a fast, dependency-free parallel sorting solution, PPQSort is an excellent choice. PPQSort is also used in [tracy](https://github.com/wolfpld/tracy) project. # Running Tests and Benchmarks Bash script for running or building specific components: ```bash $ scripts/build.sh all ... $ scripts/run.sh standalone ... ``` Note that the benchmark's CMake file will by default download sparse matrices (around 26GB). # Implementation Details about the implementation and benchmark results can be found in the following resources: - [IT SPY Poster](https://www.itspy.cz/wp-content/uploads/2024/09/it_spy_2024_informacni_letak_37.pdf) - This is a brief summary poster. - [Full master thesis text](https://dspace.cvut.cz/bitstream/handle/10467/114740/F8-DP-2024-Hevr-Gabriel-thesis.pdf?sequence=-1&isAllowed=y) - a very detailed text, it also includes additional information not directly related to PPQSort, but related to parallel sorting. - [Conference paper](https://link.springer.com/chapter/10.1007/978-3-031-85697-6_10) - Presented on [PPAM](https://ppam.edu.pl/) (Parallel Processing & Applied Mathematics) - Another forthcoming publication in a Concurrency and Computation: Practice and Experience journal.