qsv 3.3.0#216264

Merged
BrewTestBot merged 2 commits into master from bump-qsv-3.3.0
Mar 24, 2025

Conversation

@BrewTestBot
Contributor

Created by brew bump


Created with brew bump-formula-pr.

Details

release notes
# [3.3.0] - 2025-03-23

Highlights:

  • stats got another round of improvements:
    • boolean inferencing is now configurable!
      Before, it was limited to a simple, English-centric heuristic:
      • When a column's cardinality is 2 and the first characters of the 2 values are 0/1, t/f or y/n (case-insensitive), the data type of the column is inferred as boolean
      • With the new --boolean-patterns <arg> option, we can now specify arbitrary true_pattern:false_pattern pairs. Each pattern can be a string of length > 1 and is case-insensitive. If a pattern ends with "*", it is treated as a prefix.
        For example, t*:f* matches "true", "Truthy" and "T" as boolean true, so long as the corresponding false pattern also matches (e.g. "Fake", "False", "f") and the cardinality is 2.
        For backwards compatibility, the default true/false pairs are 1:0,t*:f*,y*:n*
    • percentiles can now be computed!
      With the --percentiles flag enabled, stats returns the 5th, 10th, 40th, 60th, 90th and 95th percentiles by default, computed with the nearest-rank method for all numeric and date/datetime columns. Use the --percentile-list <arg> option to request a different set of percentiles.
      Note that the method for computing quartiles (Method 3) is basically a specialized implementation of the nearest-rank method for q1 (25th), q2 (50th or median) and q3 (75th percentile), hence the choice of non-overlapping defaults for --percentile-list.
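To make the nearest-rank method mentioned above concrete, here is a minimal Python sketch. It is not qsv's implementation: the function names `nearest_rank_percentile` and `percentiles` are hypothetical, and the only thing taken from the release notes is the default list of percentiles. The textbook nearest-rank rule picks the value at rank ceil(P/100 × N) in the sorted data.

```python
import math

# Default percentiles named in the release notes for --percentile-list.
DEFAULT_PERCENTILES = (5, 10, 40, 60, 90, 95)


def nearest_rank_percentile(sorted_values, p):
    """Return the p-th percentile of already-sorted data (nearest-rank rule)."""
    if not sorted_values:
        raise ValueError("need at least one value")
    if not 0 <= p <= 100:
        raise ValueError("percentile must be between 0 and 100")
    n = len(sorted_values)
    # Rank is 1-based; clamp to 1 so that p == 0 returns the minimum.
    rank = max(1, math.ceil(p * n / 100))
    return sorted_values[rank - 1]


def percentiles(values, percentile_list=DEFAULT_PERCENTILES):
    """Hypothetical helper: map each requested percentile to its value."""
    ordered = sorted(values)
    return {p: nearest_rank_percentile(ordered, p) for p in percentile_list}


if __name__ == "__main__":
    print(percentiles(range(1, 21)))
```

Because the rule always returns an element of the data set, it never interpolates between values, which is why it also applies to date/datetime columns once they are sorted.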
  • frequency: got a performance boost now that we're using qsv-stats 0.32.0, which uses the faster foldhash crate
  • in the same vein, by replacing ahash with foldhash suite-wide, qsv got a tad faster when doing hash lookups
  • sample: "streaming" bernoulli sampling now works for any remotely hosted CSVs with servers that support chunked downloads, without requiring range request support.
  • we're now using the latest Polars engine - v0.46.0 at the py-1.26.0 tag.
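The --boolean-patterns matching rule described under stats above can be sketched in Python as follows. This is an illustration under stated assumptions, not qsv's code: the function names are hypothetical, and patterns without a trailing "*" are assumed to require an exact match. Only the pair syntax, the case-insensitivity, the "*" prefix rule, the cardinality-2 requirement and the default pairs come from the release notes.

```python
# Default pairs quoted in the release notes.
DEFAULT_PATTERNS = "1:0,t*:f*,y*:n*"


def parse_boolean_patterns(spec):
    """Split 'true:false,true:false' into lowercase (true, false) pairs."""
    pairs = []
    for item in spec.split(","):
        true_pat, false_pat = item.strip().split(":", 1)
        pairs.append((true_pat.lower(), false_pat.lower()))
    return pairs


def matches(pattern, value):
    """Case-insensitive match; a trailing '*' makes the pattern a prefix."""
    value = value.strip().lower()
    if pattern.endswith("*"):
        return value.startswith(pattern[:-1])
    # Assumption: without '*', the whole value must equal the pattern.
    return value == pattern


def infer_boolean(distinct_values, spec=DEFAULT_PATTERNS):
    """Return True if a column with these distinct values looks boolean."""
    values = list(distinct_values)
    if len(values) != 2:  # cardinality must be exactly 2
        return False
    a, b = values
    for true_pat, false_pat in parse_boolean_patterns(spec):
        # One value must match the true pattern and the other the false one.
        if matches(true_pat, a) and matches(false_pat, b):
            return True
        if matches(true_pat, b) and matches(false_pat, a):
            return True
    return False


if __name__ == "__main__":
    print(infer_boolean(["Truthy", "Fake"]))
    print(infer_boolean(["Oui", "NON"], spec="oui:non"))
```

Passing a custom `spec` such as "oui:non" shows how non-English value pairs could be recognized, which is the use case the configurable option is meant to cover.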

Added

Changed

Fixed

Full Changelog: dathere/qsv@3.2.0...3.3.0

@github-actions github-actions bot added the rust and bump-formula-pr labels Mar 24, 2025
@github-actions
Contributor

🤖 An automated task has requested bottles to be published to this PR.

@github-actions github-actions bot added the CI-published-bottle-commits label Mar 24, 2025
@BrewTestBot BrewTestBot enabled auto-merge March 24, 2025 03:31
@BrewTestBot BrewTestBot added this pull request to the merge queue Mar 24, 2025
Merged via the queue into master with commit 00b9a17 Mar 24, 2025
14 checks passed
@BrewTestBot BrewTestBot deleted the bump-qsv-3.3.0 branch March 24, 2025 03:41
Labels

bump-formula-pr: PR was created using `brew bump-formula-pr`
CI-published-bottle-commits: The commits for the built bottles have been pushed to the PR branch.
rust: Rust use is a significant feature of the PR or issue
