Skip to content

Use std::threads instead of openmp for parallelizing qsort #194

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

r-devulap
Copy link
Contributor

First stab at switching to std::threads to not have the dependency on openmp. These look like about 10-20% slower than the openmp version, not entirely sure why:

Benchmark                                                                     Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------------------
[simdsort/random_10m/int* vs. simdsort/random_10m/int*]64_t                +0.1881         -0.8570      36212571      43022493      36211824       5177321
[simdsort/random_10m/int* vs. simdsort/random_10m/int*]32_t                +0.1628         -0.8453      16560818      19257370      16560817       2561878
[simdsort/random_10m/int* vs. simdsort/random_10m/int*]16_t                +0.1600         -0.8628      11944131      13854631      11942415       1638489
OVERALL_GEOMEAN                                                            +0.1702         -0.8552             0             0             0             0

arrsize_t task_threshold;

threadmanager() {
#ifdef XSS_COMPILE_OPENMP
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know, this needs a new macro.

@r-devulap r-devulap requested a review from Copilot May 5, 2025 03:51
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR eliminates the dependency on OpenMP by replacing parallelized qsort code with a std::thread implementation using a dedicated threadmanager.

  • Updated qsort_ and related functions to use std::thread and threadmanager instead of OpenMP tasks.
  • Removed conditional OpenMP code blocks and adjusted thread management in both common and AVX512-specific qsort headers.

Reviewed Changes

Copilot reviewed 3 out of 5 changed files in this pull request and generated 1 comment.

File Description
src/xss-common-qsort.h Replaced OpenMP tasks with std::thread calls and threadmanager usage.
src/xss-common-includes.h Added necessary thread and mutex includes; introduced threadmanager struct.
src/avx512-16bit-qsort.hpp Updated qsort calls to use threadmanager instead of OpenMP threshold.
Files not reviewed (2)
  • Makefile: Language not supported
  • scripts/bench-compare.sh: Language not supported

#ifdef XSS_COMPILE_OPENMP
max_thread_count = 8;
#else
max_thread_count = 0;
Copy link
Preview

Copilot AI May 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting max_thread_count to 0 in non-OpenMP builds disables parallel thread spawning. Consider initializing it with a non-zero value (e.g., based on std::thread::hardware_concurrency()) to enable proper multithreading.

Copilot uses AI. Check for mistakes.

@r-devulap
Copy link
Contributor Author

r-devulap commented May 8, 2025

Closing this in favor of #201

@r-devulap r-devulap closed this May 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant