forked from facebookincubator/velox
Add velox-cudf support for AssignUniqueId #62
Open
karthikeyann
wants to merge
261
commits into
rapidsai:velox-cudf
from
karthikeyann:fea-CudfAssignUniqueId
/ok to test 17269f3 |
Summary: Pull Request resolved: facebookincubator#14671 Added the MakeRowFromMap utility class to project specified keys from a vector of MAP type into a RowVector with named fields. It takes a list of keys to extract (keysToProject) and the corresponding output field names (outputFieldNames), with options to replace nulls with type-specific defaults, allow top-level nulls in the output RowVector, and control duplicate key handling. It optionally accepts an exec::EvalCtx so that vector functions can employ expression-evaluation-specific behavior such as per-row error handling. Currently only SMALLINT, INTEGER, and BIGINT keys are supported. bypass-github-export-checks Reviewed By: mbasmanova Differential Revision: D81465416 fbshipit-source-id: be1189f0d0d75a008b4948d4718fa910daa3fe85
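The key projection described above can be illustrated with a minimal sketch on plain standard-library types. This is not the MakeRowFromMap implementation or its API: `MapRow`, `ProjectedRow`, and `projectKeys` are hypothetical names, a single map row stands in for a MAP vector, and the "type-specific default" for an integer value is assumed to be 0.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

// Hypothetical stand-in for one row of a MAP(BIGINT, BIGINT) vector.
// A nullopt value models a null map value.
using MapRow = std::map<int64_t, std::optional<int64_t>>;

// Hypothetical stand-in for one output row: one field per projected key.
using ProjectedRow = std::vector<std::optional<int64_t>>;

// Projects keysToProject out of a single map row, in the order given.
// A missing key or a null value yields nullopt, or 0 when replaceNulls is
// set (assumption: 0 is the type-specific default for integers).
ProjectedRow projectKeys(
    const MapRow& row,
    const std::vector<int64_t>& keysToProject,
    bool replaceNulls) {
  ProjectedRow out;
  out.reserve(keysToProject.size());
  for (int64_t key : keysToProject) {
    std::optional<int64_t> value;
    auto it = row.find(key);
    if (it != row.end()) {
      value = it->second;
    }
    if (!value.has_value() && replaceNulls) {
      value = 0;
    }
    out.push_back(value);
  }
  return out;
}
```

With `replaceNulls` off, a key that is absent from the map and a key whose value is null both surface as a null field; with it on, both become the default.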
… compatibility (facebookincubator#14798) Summary: $USERNAME is not the standard variable. It is empty on MacOS. Pull Request resolved: facebookincubator#14798 Reviewed By: kagamiori Differential Revision: D82029919 Pulled By: Yuhta fbshipit-source-id: aa2577215a6f54d50c484324b96f065be7fada07
…cubator#14780) Summary: The 'buffers' variable is unused, so remove it. Pull Request resolved: facebookincubator#14780 Reviewed By: kagamiori Differential Revision: D82030707 Pulled By: Yuhta fbshipit-source-id: 85ba4920ae394971b015cca800e8f23586871f60
Summary: This implementation aligns with https://github.com/apache/iceberg/blob/main/spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/functions/TruncateFunction.java And the test is ported from https://github.com/apache/iceberg/blob/main/spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/sql/TestSparkTruncateFunction.java Description: https://iceberg.apache.org/spec/#truncate-transform-details Pull Request resolved: facebookincubator#14774 Reviewed By: xiaoxmeng Differential Revision: D82030041 Pulled By: Yuhta fbshipit-source-id: 98d57c89901e29538e6a6658f1116b6f59be68bf
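For integer inputs, the Iceberg spec defines the truncate transform as rounding down to a multiple of the width, with the remainder forced to be non-negative. A minimal sketch of that formula follows; `truncateInt` is a hypothetical name, not the function added in this commit, and the width is assumed to be positive.

```cpp
#include <cassert>
#include <cstdint>

// Iceberg truncate transform for integers, per the spec formula
//   v - (((v % W) + W) % W)
// C++'s % can return a negative remainder, so the remainder is shifted into
// [0, W) before subtracting. Assumes width > 0 and no overflow near the
// limits of int64_t.
int64_t truncateInt(int64_t width, int64_t value) {
  return value - (((value % width) + width) % width);
}
```

Negative values therefore round toward negative infinity rather than toward zero: with a width of 10, the value -1 maps to -10, not 0.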
Summary: Pull Request resolved: facebookincubator#14841 feat: Add counter for nimble and dwrf writer Reviewed By: tanjialiang Differential Revision: D82275723 fbshipit-source-id: 1608fe29f40904be5fc1e7286702bc8ea7b6b681
…#10138) Summary: Support Spark `array_sort` to allow sorting elements with a `lambda` function. Because Spark's comparison implementation differs from Presto's, Presto's `array_sort` implementation is refactored so that Spark can rewrite the `lambda` function as a simple comparator where possible. This PR: 1. Moves Presto `array_sort` to `velox/functions/lib`. 2. Adds a new option `nullsFirst` to allow nulls to be placed at the start of the array (to support the Spark function `sort_array`). 4. Extracts the common logic of `SimpleComparisonMatcher` and moves it to `velox/functions/lib`, and creates a different `SimpleComparisonChecker` for Spark and for Presto for the comparison match (e.g., `=` is `eq` in Presto, but `equalto` in Spark). 6. Adds tests to cover the Spark rewrite function logic. Pull Request resolved: facebookincubator#10138 Reviewed By: Yuhta Differential Revision: D71047836 Pulled By: kagamiori fbshipit-source-id: 80ff84670985f4000308f509841656f388702cc5
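The effect of a `nullsFirst` option can be sketched with a comparator over optional values. This is an illustration on standard-library types only, not the Velox `array_sort` code; `sortArray` is a hypothetical name and nulls are modeled as `std::nullopt`.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Sorts values in ascending order, placing nulls (nullopt) at the start when
// nullsFirst is true and at the end otherwise.
std::vector<std::optional<int>> sortArray(
    std::vector<std::optional<int>> values,
    bool nullsFirst) {
  std::stable_sort(
      values.begin(),
      values.end(),
      [nullsFirst](const std::optional<int>& a, const std::optional<int>& b) {
        if (a.has_value() != b.has_value()) {
          // Exactly one side is null: the null side goes first only when
          // nullsFirst is set.
          return a.has_value() ? !nullsFirst : nullsFirst;
        }
        if (!a.has_value()) {
          return false; // Both null: neither orders before the other.
        }
        return *a < *b;
      });
  return values;
}
```

Keeping null placement in the comparator, rather than partitioning first, means a single sort call covers both placements.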
…ncubator#14676) Summary: Pull Request resolved: facebookincubator#14676 Reviewed By: kagamiori Differential Revision: D82030005 Pulled By: Yuhta fbshipit-source-id: f656bf304e16856916d77e89b942647d8b0820db
Summary: Fixes the compilation error below.
```
velox/experimental/cudf/connectors/parquet/ParquetDataSink.cpp:145:7: error: 'commitStrategyToString' was not declared in this scope; did you mean 'commitStrategy_'?
  145 |       commitStrategyToString(commitStrategy_));
      |       ^~~~~~~~~~~~~~~~~~~~~~
```
Follow-up for facebookincubator@99fe06a. Pull Request resolved: facebookincubator#14799 Reviewed By: kagamiori Differential Revision: D82029874 Pulled By: Yuhta fbshipit-source-id: ea537a07e8d2625ebc0b5fddea2dea8b0b098de8
…kincubator#14848) Summary: X-link: facebookincubator/axiom#394 Pull Request resolved: facebookincubator#14848 Continuation of facebookincubator#14784 bypass-github-export-checks Reviewed By: Yuhta Differential Revision: D82289830 fbshipit-source-id: b12e1042d412c0bc992655665b4363bb614398b9
…facebookincubator#14843) Summary: Fixes facebookincubator#14842 This failed static assertion with libstdc++ 15. See also the error log in the associated issue. Pull Request resolved: facebookincubator#14843 Reviewed By: xiaoxmeng Differential Revision: D82329227 Pulled By: kagamiori fbshipit-source-id: 6d85d572bc8564d0e4bc319b569c23851f595deb
…ebookincubator#14825) Summary: Pull Request resolved: facebookincubator#14825 The current implementation of AlignedBuffer::allocate does not allocate the exact size by default. When the buffer size is known beforehand, this can lead to significant memory overconsumption, because the buffer allocates the preferred size suggested by MemoryPool::getPreferredSize. To avoid that overallocation, an allocateExact parameter was recently added to AlignedBuffer::allocate. However, using this parameter is a bit clunky. To make the call more explicit, I introduce a new helper function, AlignedBuffer::allocateExact, which is simply an explicitly named wrapper around AlignedBuffer::allocate. For reference, here are the current ranges produced by MemoryPool::getPreferredSize:
```
1 - 8 = 8
9 - 12 = 12
13 - 16 = 16
17 - 24 = 24
25 - 32 = 32
33 - 48 = 48
49 - 64 = 64
65 - 96 = 96
97 - 128 = 128
129 - 192 = 192
193 - 256 = 256
257 - 384 = 384
385 - 512 = 512
513 - 768 = 768
769 - 1,024 = 1,024
1,025 - 1,536 = 1,536
1,537 - 2,048 = 2,048
2,049 - 3,072 = 3,072
3,073 - 4,096 = 4,096
4,097 - 6,144 = 6,144
6,145 - 8,192 = 8,192
8,193 - 12,288 = 12,288
12,289 - 16,384 = 16,384
16,385 - 24,576 = 24,576
24,577 - 32,768 = 32,768
32,769 - 49,152 = 49,152
49,153 - 65,536 = 65,536
65,537 - 98,304 = 98,304
98,305 - 131,072 = 131,072
131,073 - 196,608 = 196,608
196,609 - 262,144 = 262,144
262,145 - 393,216 = 393,216
393,217 - 524,288 = 524,288
524,289 - 786,432 = 786,432
786,433 - 1,048,576 = 1,048,576
1,048,577 - 1,572,864 = 1,572,864
1,572,865 - 2,097,152 = 2,097,152
2,097,153 - 3,145,728 = 3,145,728
3,145,729 - 4,194,304 = 4,194,304
4,194,305 - 6,291,456 = 6,291,456
6,291,457 - 8,388,608 = 8,388,608
8,388,609 - 12,582,912 = 12,582,912
12,582,913 - 16,777,216 = 16,777,216
16,777,217 - 25,165,824 = 25,165,824
25,165,825 - 33,554,432 = 33,554,432
33,554,433 - 50,331,648 = 50,331,648
50,331,649 - 67,108,864 = 67,108,864
67,108,865 - 100,663,296 = 100,663,296
100,663,297 - 134,217,728 = 134,217,728
134,217,729 - 201,326,592 = 201,326,592
201,326,593 - 268,435,456 = 268,435,456
268,435,457 - 402,653,184 = 402,653,184
402,653,185 - 536,870,912 = 536,870,912
536,870,913 - 805,306,368 = 805,306,368
805,306,369 - 1,073,741,824 = 1,073,741,824
1,073,741,825 - 1,610,612,736 = 1,610,612,736
1,610,612,737 - 2,147,483,648 = 2,147,483,648
2,147,483,649 - 3,221,225,472 = 3,221,225,472
3,221,225,473 - 4,294,967,296 = 4,294,967,296
```
Reviewed By: Yuhta Differential Revision: D82167134 fbshipit-source-id: 4733be75c39ca90a1fead0faa0c44ec12b80bae9
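The listed ranges follow a regular pattern: every requested size is rounded up to the nearest value of the form 2^k or 1.5 * 2^k, with a floor of 8. The sketch below reproduces the table under that reading. It is derived from the table only and is not a copy of MemoryPool::getPreferredSize; the function name `preferredSize` is hypothetical, and inputs are assumed to be below 2^63 so that doubling cannot overflow.

```cpp
#include <cassert>
#include <cstdint>

// Rounds size up to the nearest 2^k or 1.5 * 2^k, never returning less
// than 8. Derived from the table of ranges; a sketch, not the Velox source.
uint64_t preferredSize(uint64_t size) {
  if (size <= 8) {
    return 8;
  }
  // Find the largest power of two that does not exceed size.
  uint64_t lower = 8;
  while (lower * 2 <= size) {
    lower *= 2;
  }
  if (lower == size) {
    return size; // Exact powers of two are kept as-is.
  }
  const uint64_t mid = lower + lower / 2; // 1.5 * lower
  return size <= mid ? mid : lower * 2;
}
```

Under this rounding a request just above a boundary overallocates by close to a third, for example 17 bytes becomes 24 and 1,025 bytes becomes 1,536, which is the overconsumption an exact allocation avoids when the size is known up front.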
… join conditions (facebookincubator#14837) Summary: Pull Request resolved: facebookincubator#14837 Add filter support in index join for filters that can't be converted into join conditions that can be pushed down to the index source. The filter is executed on the lookup result before left join processing. This is to enable Meta AI data exploration query shapes. The follow-up is to consider using the join match tracker inside index lookup to handle this logic, to be consistent with other join type implementations. Reviewed By: zacw7 Differential Revision: D82149399 fbshipit-source-id: efa942e1d8c58f08dbe51d955f91858367bdfe6c
…incubator#14658) Summary: Expression rewrites are currently defined in `VectorFunction.cpp`, rewrites are registered only for scalar functions, and rewrites are applied during expression compilation. In the upcoming `ExpressionOptimizer` work (facebookincubator#14523), we intend to build on the existing `ExpressionRewrite` support and introduce more expression rewrites for special form expressions. This refactor formalizes the existing `ExpressionRewrite` framework so it can be expanded upon in the `ExpressionOptimizer`. Pull Request resolved: facebookincubator#14658 Reviewed By: mbasmanova Differential Revision: D82252841 Pulled By: kagamiori fbshipit-source-id: 0f9007c3ae9e4dbdd58cefe7397874998841c295
Summary: Pull Request resolved: facebookincubator#14836 This change adds support for the TIME type in Velox. Support for casting TIME, along with support in the simple function interface and basic UDFs supporting TIME, will come in subsequent diffs. See Issue: facebookincubator#14633 Reviewed By: kevinwilfong Differential Revision: D81811610 fbshipit-source-id: ceb6c324c7ce4dc5bf761e77647283fbe72a4104
Summary: Register cudf in the executor and allow disabling it at the task plan level. For some plans, if the plan cannot be fully executed on the GPU, we do not offload this stage to the GPU, to avoid the format conversion cost, so we need this config to disable the CUDF driver adapter. Pull Request resolved: facebookincubator#14216 Reviewed By: Yuhta Differential Revision: D82322695 Pulled By: kagamiori fbshipit-source-id: 110e209a38a072ec6170e80fa26b804205ce7c42
Summary: With the C++20 change, using volatile is an error. One benchmark did use it, but this didn't come up in the CI, which didn't seem to build the benchmarks at all. As a result, the CI needed a fix to ensure proper compilation. Pull Request resolved: facebookincubator#14552 Reviewed By: bikramSingh91 Differential Revision: D82456701 Pulled By: kgpai fbshipit-source-id: e3c68d3c8be2aaf200390cba993a3ebadb4d3408
…ncubator#14863) Summary: Pull Request resolved: facebookincubator#14863 These constants will be used in spatial joins, which won't actually need the full power of GEOS. Extracting these means we can keep the join code simpler and agnostic to the join filter, and avoid conditional compilation flags for GEOS. In the future we can extract more constants to GeometryConstants if desired. Reviewed By: bikramSingh91 Differential Revision: D82451199 fbshipit-source-id: 907a19f9d97573a663ab6a07b2a6c9963c83797c
…cebookincubator#14767) Summary: Bumps [pypa/gh-action-pypi-publish](https://github.com/pypa/gh-action-pypi-publish) from 1.12.4 to 1.13.0.

Release notes for v1.13.0, sourced from pypa/gh-action-pypi-publish's releases:
- This release includes fixes for GHSA-vxmw-7h4f-hqxh, discovered by @woodruffw. Zizmor was also integrated to catch similar issues in the future.
- The README no longer describes the attestations feature as experimental (pypa/gh-action-pypi-publish#347).
- More diagnostic output when using Trusted Publishing: the GitHub Environment claim is printed (pypa/gh-action-pypi-publish#371), and unsupported reusable workflow configurations produce a warning (pypa/gh-action-pypi-publish#306).
- Official support for reusable workflows is currently blocked on changes to PyPI. Progress on the action side is tracked in pypa/gh-action-pypi-publish#166, and the outcome of the PyCon US 2025 Sprints discussion is posted at pypi/warehouse#11096. This is a volunteer-led effort and there is no ETA.
- `actions/setup-python` is pinned to a SHA hash (pypa/gh-action-pypi-publish#378), which makes `pypi-publish` compatible with new GitHub policies that allow organizations to mandate hash-pinning of actions used in workflows.
- Internal dependencies: the action runtime is bumped to Python 3.13 (pypa/gh-action-pypi-publish#331), the dependency tree is updated across the board, and `pip-with-requires-python` is no longer installed (pypa/gh-action-pypi-publish#332). Related bumps came in pypa/gh-action-pypi-publish#359, and the linting configuration was bumped in pypa/gh-action-pypi-publish#335.
- New contributors: @kurtmckee (pypa/gh-action-pypi-publish#335) and @konstin (pypa/gh-action-pypi-publish#378).
- Full diff: https://github.com/pypa/gh-action-pypi-publish/compare/v1.12.4...v1.13.0
- Commits: https://github.com/pypa/gh-action-pypi-publish/compare/76f52bc884231f62b9a034ebfe128415bbaabdfc...ed0c53931b1dc9bd32cbe73a98c7f6766f8a527e

Pull Request resolved: facebookincubator#14767 Reviewed By: kKPulla Differential Revision: D82466593 Pulled By: kagamiori fbshipit-source-id: 42baeea5b5d19bd6384d688e1ba9e9e2050ba3b4
Summary: The sed program usage only works on Linux and needs a fix for macOS. This is the same fix that was applied for libstemmer. Pull Request resolved: facebookincubator#14852 Reviewed By: kKPulla Differential Revision: D82466401 Pulled By: kagamiori fbshipit-source-id: 12d66f3a5b48b4df5b59d879c9022c88cdb5660f
Summary: Pull Request resolved: facebookincubator#14854 Cache row size estimates so that callers can call it multiple times without worrying too much about the cost. It is a prereq diff for having a more dynamic row size estimate in case of missing file stats. Reviewed By: tanjialiang Differential Revision: D81762324 fbshipit-source-id: a63d9c8b18634c1f7b89b0bd7f70f6a53d3722b7
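The caching idea can be sketched as a small memoizing wrapper: the estimate is computed on first use and returned from the cache afterwards, so repeated calls stay cheap. This is an illustration only, not the Velox reader code; `CachedRowSizeEstimate` and its members are hypothetical names, and the `invalidate` hook is an assumption about how a more dynamic estimate might later refresh the value.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>

// Memoizes a potentially expensive row size estimate.
class CachedRowSizeEstimate {
 public:
  explicit CachedRowSizeEstimate(std::function<uint64_t()> compute)
      : compute_(std::move(compute)) {}

  // Runs the underlying computation on the first call only; later calls
  // return the cached value.
  uint64_t get() {
    if (!cached_.has_value()) {
      cached_ = compute_();
    }
    return *cached_;
  }

  // Hypothetical hook: drops the cached value so that the next get()
  // recomputes it.
  void invalidate() {
    cached_.reset();
  }

 private:
  std::function<uint64_t()> compute_;
  std::optional<uint64_t> cached_;
};
```

A caller can then ask for the estimate once per batch without tracking whether it has already been computed.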
…ebookincubator#14855) Summary: Pull Request resolved: facebookincubator#14855 X-link: facebookincubator/nimble#250 Original diff: D80310282 Add a framework to complement the row size estimate heuristics, based on the retained vector sizes. Currently this framework is used as a stop gap solution to still have functional row estimates when column stats are missing, and decoders couldn't provide a relatively cheap estimate. The current functionality gap in decoder row estimates caused various queries to run with super small batches (frequently just 10 rows), and vastly slowing down the downstream eval. NOTE: this diff fixes an accounting issue for arrays and maps, which was causing query OOMs. Reviewed By: tanjialiang Differential Revision: D81762328 fbshipit-source-id: 4238e5e45632323b577fc1bc58b6ebfcf9033dfc
…r#14857) Summary: Pull Request resolved: facebookincubator#14857 X-link: facebookincubator/nimble#251 Add a kill switch for row size tracking in case it has unexpected overhead for some data shapes. (Low concern IMO because the row size tracking would quickly increase the batch size and reduce its own overhead. If the end state batch size is still small, we should either way tune the batch memory budget.) The session property wire up would be added in a presto PR separately. NOTE: this diff also disables row size tracking for metalake reads by default. Reviewed By: tanjialiang Differential Revision: D81762323 fbshipit-source-id: 56b62f4d4d70f779576177f9d40d72db419c7809
…cubator#14768) Summary: Bumps [actions/github-script](https://github.com/actions/github-script) from 7.0.1 to 8.0.0. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/actions/github-script/releases">actions/github-script's releases</a>.</em></p> <blockquote> <h2>v8.0.0</h2> <h2>What's Changed</h2> <ul> <li>Update Node.js version support to 24.x by <a href="https://github.com/salmanmkc"><code>@salmanmkc</code></a> in <a href="https://redirect.github.com/actions/github-script/pull/637">actions/github-script#637</a></li> <li>README for updating actions/github-script from v7 to v8 by <a href="https://github.com/sneha-krip"><code>@sneha-krip</code></a> in <a href="https://redirect.github.com/actions/github-script/pull/653">actions/github-script#653</a></li> </ul> <h2>⚠️ Minimum Compatible Runner Version</h2> <p><strong>v2.327.1</strong><br /> <a href="https://github.com/actions/runner/releases/tag/v2.327.1">Release Notes</a></p> <p>Make sure your runner is updated to this version or newer to use this release.</p> <h2>New Contributors</h2> <ul> <li><a href="https://github.com/salmanmkc"><code>@salmanmkc</code></a> made their first contribution in <a href="https://redirect.github.com/actions/github-script/pull/637">actions/github-script#637</a></li> <li><a href="https://github.com/sneha-krip"><code>@sneha-krip</code></a> made their first contribution in <a href="https://redirect.github.com/actions/github-script/pull/653">actions/github-script#653</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/actions/github-script/compare/v7.1.0...v8.0.0">https://github.com/actions/github-script/compare/v7.1.0...v8.0.0</a></p> <h2>v7.1.0</h2> <h2>What's Changed</h2> <ul> <li>Upgrade husky to v9 by <a href="https://github.com/benelan"><code>@benelan</code></a> in <a href="https://redirect.github.com/actions/github-script/pull/482">actions/github-script#482</a></li> <li>Add workflow file for publishing releases to 
immutable action package by <a href="https://github.com/Jcambass"><code>@Jcambass</code></a> in <a href="https://redirect.github.com/actions/github-script/pull/485">actions/github-script#485</a></li> <li>Upgrade IA Publish by <a href="https://github.com/Jcambass"><code>@Jcambass</code></a> in <a href="https://redirect.github.com/actions/github-script/pull/486">actions/github-script#486</a></li> <li>Fix workflow status badges by <a href="https://github.com/joshmgross"><code>@joshmgross</code></a> in <a href="https://redirect.github.com/actions/github-script/pull/497">actions/github-script#497</a></li> <li>Update usage of <code>actions/upload-artifact</code> by <a href="https://github.com/joshmgross"><code>@joshmgross</code></a> in <a href="https://redirect.github.com/actions/github-script/pull/512">actions/github-script#512</a></li> <li>Clear up package name confusion by <a href="https://github.com/joshmgross"><code>@joshmgross</code></a> in <a href="https://redirect.github.com/actions/github-script/pull/514">actions/github-script#514</a></li> <li>Update dependencies with <code>npm audit fix</code> by <a href="https://github.com/joshmgross"><code>@joshmgross</code></a> in <a href="https://redirect.github.com/actions/github-script/pull/515">actions/github-script#515</a></li> <li>Specify that the used script is JavaScript by <a href="https://github.com/timotk"><code>@timotk</code></a> in <a href="https://redirect.github.com/actions/github-script/pull/478">actions/github-script#478</a></li> <li>chore: Add Dependabot for NPM and Actions by <a href="https://github.com/nschonni"><code>@nschonni</code></a> in <a href="https://redirect.github.com/actions/github-script/pull/472">actions/github-script#472</a></li> <li>Define <code>permissions</code> in workflows and update actions by <a href="https://github.com/joshmgross"><code>@joshmgross</code></a> in <a href="https://redirect.github.com/actions/github-script/pull/531">actions/github-script#531</a></li> <li>chore: Add 
Dependabot for .github/actions/install-dependencies by <a href="https://github.com/nschonni"><code>@nschonni</code></a> in <a href="https://redirect.github.com/actions/github-script/pull/532">actions/github-script#532</a></li> <li>chore: Remove .vscode settings by <a href="https://github.com/nschonni"><code>@nschonni</code></a> in <a href="https://redirect.github.com/actions/github-script/pull/533">actions/github-script#533</a></li> <li>ci: Use github/setup-licensed by <a href="https://github.com/nschonni"><code>@nschonni</code></a> in <a href="https://redirect.github.com/actions/github-script/pull/473">actions/github-script#473</a></li> <li>make octokit instance available as octokit on top of github, to make it easier to seamlessly copy examples from GitHub rest api or octokit documentations by <a href="https://github.com/iamstarkov"><code>@iamstarkov</code></a> in <a href="https://redirect.github.com/actions/github-script/pull/508">actions/github-script#508</a></li> <li>Remove <code>octokit</code> README updates for v7 by <a href="https://github.com/joshmgross"><code>@joshmgross</code></a> in <a href="https://redirect.github.com/actions/github-script/pull/557">actions/github-script#557</a></li> <li>docs: add "exec" usage examples by <a href="https://github.com/neilime"><code>@neilime</code></a> in <a href="https://redirect.github.com/actions/github-script/pull/546">actions/github-script#546</a></li> <li>Bump ruby/setup-ruby from 1.213.0 to 1.222.0 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/github-script/pull/563">actions/github-script#563</a></li> <li>Bump ruby/setup-ruby from 1.222.0 to 1.229.0 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/github-script/pull/575">actions/github-script#575</a></li> <li>Clearly document passing inputs to the <code>script</code> by <a 
href="https://github.com/joshmgross"><code>@joshmgross</code></a> in <a href="https://redirect.github.com/actions/github-script/pull/603">actions/github-script#603</a></li> <li>Update README.md by <a href="https://github.com/nebuk89"><code>@nebuk89</code></a> in <a href="https://redirect.github.com/actions/github-script/pull/610">actions/github-script#610</a></li> </ul> <h2>New Contributors</h2> <ul> <li><a href="https://github.com/benelan"><code>@benelan</code></a> made their first contribution in <a href="https://redirect.github.com/actions/github-script/pull/482">actions/github-script#482</a></li> <li><a href="https://github.com/Jcambass"><code>@Jcambass</code></a> made their first contribution in <a href="https://redirect.github.com/actions/github-script/pull/485">actions/github-script#485</a></li> <li><a href="https://github.com/timotk"><code>@timotk</code></a> made their first contribution in <a href="https://redirect.github.com/actions/github-script/pull/478">actions/github-script#478</a></li> <li><a href="https://github.com/iamstarkov"><code>@iamstarkov</code></a> made their first contribution in <a href="https://redirect.github.com/actions/github-script/pull/508">actions/github-script#508</a></li> <li><a href="https://github.com/neilime"><code>@neilime</code></a> made their first contribution in <a href="https://redirect.github.com/actions/github-script/pull/546">actions/github-script#546</a></li> <li><a href="https://github.com/nebuk89"><code>@nebuk89</code></a> made their first contribution in <a href="https://redirect.github.com/actions/github-script/pull/610">actions/github-script#610</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/actions/github-script/compare/v7...v7.1.0">https://github.com/actions/github-script/compare/v7...v7.1.0</a></p> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a 
href="https://github.com/actions/github-script/commit/ed597411d8f924073f98dfc5c65a23a2325f34cd"><code>ed59741</code></a> Merge pull request <a href="https://redirect.github.com/actions/github-script/issues/653">https://github.com/facebookincubator/velox/issues/653</a> from actions/sneha-krip/readme-for-v8</li> <li><a href="https://github.com/actions/github-script/commit/2dc352e4baefd91bec0d06f6ae2f1045d1687ca3"><code>2dc352e</code></a> Bold minimum Actions Runner version in README</li> <li><a href="https://github.com/actions/github-script/commit/01e118c8d0d22115597e46514b5794e7bc3d56f1"><code>01e118c</code></a> Update README for Node 24 runtime requirements</li> <li><a href="https://github.com/actions/github-script/commit/8b222ac82eda86dcad7795c9d49b839f7bf5b18b"><code>8b222ac</code></a> Apply suggestion from <a href="https://github.com/salmanmkc"><code>@salmanmkc</code></a></li> <li><a href="https://github.com/actions/github-script/commit/adc0eeac992408a7b276994ca87edde1c8ce4d25"><code>adc0eea</code></a> README for updating actions/github-script from v7 to v8</li> <li><a href="https://github.com/actions/github-script/commit/20fe497b3fe0c7be8aae5c9df711ac716dc9c425"><code>20fe497</code></a> Merge pull request <a href="https://redirect.github.com/actions/github-script/issues/637">https://github.com/facebookincubator/velox/issues/637</a> from actions/node24</li> <li><a href="https://github.com/actions/github-script/commit/e7b7f222b11a03e8b695c4c7afba89a02ea20164"><code>e7b7f22</code></a> update licenses</li> <li><a href="https://github.com/actions/github-script/commit/2c81ba05f308415d095291e6eeffe983d822345b"><code>2c81ba0</code></a> Update Node.js version support to 24.x</li> <li><a href="https://github.com/actions/github-script/commit/f28e40c7f34bde8b3046d885e986cb6290c5673b"><code>f28e40c</code></a> Merge pull request <a href="https://redirect.github.com/actions/github-script/issues/610">https://github.com/facebookincubator/velox/issues/610</a> from 
actions/nebuk89-patch-1</li> <li><a href="https://github.com/actions/github-script/commit/1ae9958572fde544457e4d51aed5ea044e8936f3"><code>1ae9958</code></a> Update README.md</li> <li>Additional commits viewable in <a href="https://github.com/actions/github-script/compare/60a0d83039c74a4aee543508d2ffcb1c3799cdea...ed597411d8f924073f98dfc5c65a23a2325f34cd">compare view</a></li> </ul> </details>
Pull Request resolved: facebookincubator#14768 Reviewed By: kKPulla Differential Revision: D82466332 Pulled By: kagamiori fbshipit-source-id: 64f7b59c19ad54e8e1e4c3ab2e21442d7a009b4e
Summary: Pull Request resolved: facebookincubator#15038 Adds FlatMapVector support to the subscript and element_at functions. This allows FlatMapVector-encoded maps to be read, and their values extracted, without conversion. It also greatly improves performance, because the values for a single key can be projected without materializing the full map. Reviewed By: pedroerp Differential Revision: D83867312 fbshipit-source-id: 5b31fe4c484516e2157487573f542177fe9a81b6
Summary: Pull Request resolved: facebookincubator#15069

## Summary
Implemented REMAP_KEYS UDF for Velox following the task requirements in T240490714.

## Function Signature
```
REMAP_KEYS(MAP(K,V), ARRAY[K], ARRAY[K]) -> MAP(K, V)
```

## Key Features
- **Key remapping**: Changes map keys based on old-to-new key mapping arrays
- **Value preservation**: All values remain unchanged, only keys are remapped
- **Null handling**:
  - Null values in maps are preserved
  - Null keys in arrays are ignored
- **Mismatched array lengths**: Uses minimum of both array lengths for mapping
- **Unmapped keys**: Keys not in oldKeys array remain unchanged
- **Optimized implementations**: Three specialized implementations for different types

## Implementation Details
### Core Files Added/Modified:
- `fbcode/velox/functions/prestosql/RemapKeys.h` - Main implementation with 3 optimized variants
- `fbcode/velox/functions/prestosql/registration/MapFunctionsRegistration.cpp` - Function registration
- `fbcode/velox/functions/prestosql/BUCK` - Build configuration
- `fbcode/velox/functions/prestosql/tests/RemapKeysTest.cpp` - Comprehensive test suite
- `fbcode/velox/functions/prestosql/tests/CMakeLists.txt` - Test build configuration
- `fbcode/velox/expression/fuzzer/ExpressionFuzzerTest.cpp` - Fuzzer exclusion

### Implementation Variants:
1. **RemapKeysPrimitiveFunction**: Optimized for primitive types with hash map lookup
2. **RemapKeysVarcharFunction**: String-optimized with zero-copy semantics
3. **RemapKeysFunction**: Generic implementation for complex types

### Behavior Examples:
```sql
SELECT remap_keys(MAP(ARRAY[1, 2, 3], ARRAY[10, 20, 30]), ARRAY[1, 3], ARRAY[100, 300]);
-- MAP(ARRAY[100, 2, 300], ARRAY[10, 20, 30])
SELECT remap_keys(MAP(ARRAY['a', 'b'], ARRAY[1, 2]), ARRAY['a'], ARRAY['alpha']);
-- MAP(ARRAY['alpha', 'b'], ARRAY[1, 2])
SELECT remap_keys(MAP(ARRAY[1, 2], ARRAY[10, null]), ARRAY[1], ARRAY[100]);
-- MAP(ARRAY[100, 2], ARRAY[10, null])
```

## Testing
Comprehensive test coverage including:
- Basic functionality with various data types (int, float, bool, string, timestamp, complex)
- Edge cases (empty maps, empty arrays, no matching keys)
- Null handling (nulls in values, nulls in key arrays)
- Mismatched array lengths
- Duplicate old keys (last occurrence wins)
- Partial and complete key remapping

## Compatibility
- Follows existing Velox UDF patterns (similar to MAP_SUBSET and ARRAY_SUBSET)
- Maintains backward compatibility with existing map functions
- Velox-only function (excluded from Presto fuzzer tests)

[Session trajectory link](https://www.internalfb.com/intern/devai/devmate/inspector/?id=T240490714-e771fd55-1a8a-411e-acab-2cd2313a8296)

Reviewed By: zacw7 Differential Revision: D83999440 fbshipit-source-id: c52f6a7d6b95a95a368bbe3233b5ac11be3407ae
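The semantics listed for REMAP_KEYS can be illustrated with a small standalone sketch. This is not the Velox implementation: `remapKeysSketch`, `Key`, `Value`, and `KeyArray` are hypothetical names, standard containers stand in for Velox vectors, and two behaviors are assumptions rather than statements from the commit message: a pair is skipped when either its old or its new key is null, and a remapped key that collides with another key simply overwrites it.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <unordered_map>
#include <vector>

using Key = std::int64_t;
using Value = std::optional<std::int64_t>;         // nullopt models a null map value
using KeyArray = std::vector<std::optional<Key>>;  // nullopt models a null array element

// Hypothetical sketch of the described behavior, not the Velox code.
std::map<Key, Value> remapKeysSketch(
    const std::map<Key, Value>& input,
    const KeyArray& oldKeys,
    const KeyArray& newKeys) {
  // Build the old-to-new mapping over the shorter of the two arrays.
  std::unordered_map<Key, Key> mapping;
  const std::size_t n = std::min(oldKeys.size(), newKeys.size());
  for (std::size_t i = 0; i < n; ++i) {
    // Assumption: a pair with a null on either side is ignored.
    if (oldKeys[i].has_value() && newKeys[i].has_value()) {
      mapping[*oldKeys[i]] = *newKeys[i];  // a later duplicate old key wins
    }
  }

  std::map<Key, Value> result;
  for (const auto& [key, value] : input) {
    const auto it = mapping.find(key);
    // Unmapped keys are kept as-is; values (including nulls) are untouched.
    // Assumption: on a key collision the entry written later overwrites.
    result.insert_or_assign(it == mapping.end() ? key : it->second, value);
  }
  return result;
}
```

Calling `remapKeysSketch({{1, 10}, {2, 20}, {3, 30}}, {1, 3}, {100, 300})` mirrors the first SQL example above and yields the keys 2, 100, and 300 with their original values.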
…ncubator#15087) Summary: Remove velox_cudf_hive_config library as it is not required. cudf::cudf has to be a public dependency on velox_cudf_hive_connector due to the build error below. Remove other unrelated dependencies. Add a log message for the constructor. ``` /deepak/presto/presto-native-execution/velox/velox/experimental/cudf/connectors/hive/CudfHiveConfig.h:21:10: fatal error: cudf/types.hpp: No such file or directory ``` Pull Request resolved: facebookincubator#15087 Reviewed By: xiaoxmeng Differential Revision: D84640709 Pulled By: kKPulla fbshipit-source-id: 03d7b4ab46f0ce74448db79e2c6b9a7643462f05
…bookincubator#15182) Summary: Pull Request resolved: facebookincubator#15182 `-Wunused-exception-parameter` has identified an unused exception parameter. This diff removes it. This: ``` try { ... } catch (exception& e) { // no use of e } ``` should instead be written as ``` } catch (exception&) { ``` If the code compiles, this is safe to land. Differential Revision: D84732654 fbshipit-source-id: cb155bb70bb568d7d34d07134ed280ce10088178
Summary: Implement lock-free updates. For queries that need to skip many irrelevant stripes (e.g., 20251013_203818_00022_jsvmi), accessing the cache can become a bottleneck: https://fburl.com/strobelight/vazgb08u Reviewed By: Yuhta Differential Revision: D84661649 fbshipit-source-id: a6ed6383f1255d0cd5f79248a652455b1a08d569
Summary: Pull Request resolved: facebookincubator#15181 Persistent shuffle might need to extend the serialized page to include persistent-shuffle-specific data structures. Reviewed By: tanjialiang Differential Revision: D84527691 fbshipit-source-id: 07832685ea15b9250a655b275ebf382d55c8e3e2
…kincubator#15171) Summary: Pull Request resolved: facebookincubator#15171 These are clearer than `node->sources()[0]` or `[1]`. Reviewed By: kgpai Differential Revision: D84710426 fbshipit-source-id: 753ce26d991d53029e4dabb0700ae90d054c5b5d
Summary: Pull Request resolved: facebookincubator#15174 Add cosco shuffle write trace and replay support. Reviewed By: xiaoxmeng Differential Revision: D84580118 fbshipit-source-id: 23ec3500d0a03fc55b203eb313049f74ad325799
…acebookincubator#15105) Summary: Pull Request resolved: facebookincubator#15105 X-link: facebookincubator/nimble#276 Refactored read operation parameters to use a `FileStorageContext` struct instead of separate `ioStats` and `fileReadOps` parameters. This addresses the problem where adding new parameters to read functions requires updating all implementations across the codebase. Reviewed By: sdruzkin, Yuhta Differential Revision: D84112628 fbshipit-source-id: 6eea1c6698850c67613d6e855daac6fb0e91b504
…15158) Summary: The 'cache_load_quantum' gflag is not used anywhere, so we should remove its declaration. Pull Request resolved: facebookincubator#15158 Reviewed By: Yuhta Differential Revision: D84640818 Pulled By: kKPulla fbshipit-source-id: 038e454edd26986f874b78d99ed72e4f5bb34d21
…cubator#14491) Summary: Fixes facebookincubator#14492, facebookincubator#14021. Currently `BaseVector::flattenVector` doesn't unwrap lazy vectors. This patch makes it unwrap the lazy vectors. It should also fix a bunch of vulnerabilities in the code base. For example code like: https://github.com/facebookincubator/velox/blob/42193a8015081187e06ed4e8ed77b2bb1002a236/velox/expression/FieldReference.cpp#L176-L179 could crash the program with a lazy input. Some historical issues that relate to this topic: facebookincubator#6168 facebookincubator#6170 facebookincubator#8697 facebookincubator#9282 Pull Request resolved: facebookincubator#14491 Reviewed By: kKPulla Differential Revision: D84733842 Pulled By: pedroerp fbshipit-source-id: a48cd6d9e2ba3ed96a0829b21b9c4c9a92767377
… construction (facebookincubator#15168) Summary: X-link: prestodb/presto#26094 `ExchangeQueue::promises_` is a `folly::F14FastMap<int, ContinuePromise>`. Using the subscript operator: ``` promises_[consumerId] = std::move(promise); ``` 1. Default-constructs an empty `ContinuePromise` when inserting a new 'consumerId' key. 2. Immediately overwrites this temporary promise through the move-assignment. For an empty `folly::Promise` object that is expected to be overwritten, we should use `folly::makeEmpty()` to initialize it (it is then 'invalid') instead of the default constructor (it is then 'valid' but 'not fulfilled', and assigning to it will cause an exception). Creating an exception triggers a stack unwind, which can saturate the CPU in high-concurrency scenarios, **causing significant performance issues**. In high-concurrency scenarios, I can see a lot of CPU consumption here, for the reason mentioned above. This patch replaces the subscript-based insertion with: ``` promises_.emplace(consumerId, std::move(promise)); ``` which constructs the `ContinuePromise` in place and avoids creating and overwriting a temporary empty promise. This completely eliminates the redundant, expensive `folly::Promise` stack backtrace and thus saves a lot of CPU. Pull Request resolved: facebookincubator#15168 Reviewed By: tanjialiang Differential Revision: D84775483 Pulled By: pedroerp fbshipit-source-id: cc2f26291a16f72745f4bf7116bebd144acb8841
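The difference between the two insertion styles can be observed with a minimal sketch. It uses `std::unordered_map` and a hypothetical `CountingPromise` type as stand-ins for `folly::F14FastMap` and `ContinuePromise`; it only counts constructor calls and does not reproduce folly's exception behavior.

```cpp
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for ContinuePromise: it only counts how it is created.
struct CountingPromise {
  static inline int defaultConstructed = 0;
  static inline int moveAssigned = 0;
  int payload = 0;

  CountingPromise() { ++defaultConstructed; }
  explicit CountingPromise(int p) : payload(p) {}
  CountingPromise(CountingPromise&& other) noexcept : payload(other.payload) {}
  CountingPromise& operator=(CountingPromise&& other) noexcept {
    payload = other.payload;
    ++moveAssigned;
    return *this;
  }
};

using PromiseMap = std::unordered_map<int, CountingPromise>;

// operator[] default-constructs the mapped value for a new key, then the
// move-assignment overwrites that temporary.
void insertWithSubscript(PromiseMap& map, int consumerId, CountingPromise promise) {
  map[consumerId] = std::move(promise);
}

// emplace move-constructs the mapped value directly in the new node.
// Returns false if the key was already present (nothing is overwritten).
bool insertWithEmplace(PromiseMap& map, int consumerId, CountingPromise promise) {
  return map.emplace(consumerId, std::move(promise)).second;
}
```

One design consequence to keep in mind: in the standard containers, `emplace` leaves an existing entry untouched, whereas the subscript form overwrites it, so the two are interchangeable only when the key is known to be new.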
…incubator#15186) Summary: Pull Request resolved: facebookincubator#15186 Adding default implementation for the virtual methods in BaseStatsReporter. The purpose here is two-fold: * Minimize boilerplate code for specializations, so they don't need to provide an empty body for each virtual method on the API * Reduce the amount of symbols leaked to downstream dependencies of this API. Also removing unnecessary string allocation in `statTypeString()` Reviewed By: tanjialiang Differential Revision: D84781701 fbshipit-source-id: 98d663029a7de9d18683a43164a5b3e367fc0546
Summary: Pull Request resolved: facebookincubator#15058 - added new `call` method handling `arg_type<Time>` input - TIME values stored as milliseconds since midnight (0-86399999 range) - Extracts seconds using: `(time / kMillisecondsInSecond) % 60` - Includes input validation for valid TIME range Reviewed By: duxiao1212 Differential Revision: D83989772 fbshipit-source-id: 3abae3cf2f2c4af6eb6adff4774b1402d0c10318
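The extraction formula above can be checked with a short standalone sketch. `secondFromTime` and `kMillisInDay` are hypothetical names, and throwing `std::out_of_range` is an assumption about how the validation might look; the commit message does not say how invalid input is reported.

```cpp
#include <cstdint>
#include <stdexcept>

constexpr std::int64_t kMillisecondsInSecond = 1'000;
constexpr std::int64_t kMillisInDay = 86'400'000;  // hypothetical name

// Returns the seconds field (0-59) of a TIME value given as milliseconds
// since midnight. The error type is an assumption made for this sketch.
std::int64_t secondFromTime(std::int64_t time) {
  if (time < 0 || time >= kMillisInDay) {
    throw std::out_of_range("TIME value must be in [0, 86400000)");
  }
  return (time / kMillisecondsInSecond) % 60;
}
```

For instance, 61'500 ms is 00:01:01.500, so the whole-second count is 61 and the seconds field is 1.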
) Summary: Pull Request resolved: facebookincubator#15184 - added `isConstantEncoding()` check before allocation - for constant input: return `ConstantVector<Timestamp>` with single converted value (O(1) memory) - for flat input: fallback to element-wise copy (O(n) memory) - handles both null and non-null constant cases Reviewed By: duxiao1212 Differential Revision: D84774507 fbshipit-source-id: 80e668fd23ea445475c3b1e83a6caf9ab86ee67c
Summary: Pull Request resolved: facebookincubator#15079 - added a new `call` method in `HourFunction` that accepts `arg_type<Time>` parameter - TIME values are stored as milliseconds since midnight (0-86399999) - extraction logic: `result = time / kMillisInHour` where `kMillisInHour = 3600000` - includes input validation to ensure TIME values are within valid range [0, 86400000) Reviewed By: kgpai Differential Revision: D84084889 fbshipit-source-id: a0fa49e880bacb20d93685ee08c68d8ba6c04ddd
…acebookincubator#15185) Summary: Pull Request resolved: facebookincubator#15185 Removing backward compatibility code from remote function refactor, now that we ensured all users of that code are updated. Reviewed By: sebastianopeluso Differential Revision: D84776510 fbshipit-source-id: c23a7f06d80edc378b8dd40203a68a488de0b7f2
…#15155) Summary: Pull Request resolved: facebookincubator#15155 Log queryId, schema, user, source, and table for open file requests. Code flow: - In `SplitReader`, we send fileReadOps as part of `FileProperties` when generating `FileHandle` - In `FileHandle`, we extract and put fileReadOps into `FileOptions` when calling openFileForRead in `WSFileSystem` - In `WSFileSystem`, when we create `WSReadFile`, we pass fileReadOps as a parameter and populate `fileCreateOptions.commonOptions.requestOptions`, which are the user tags we send for open file requests. There are still some requests missing tags, specifically for SpillReadFile path, stacktrace: P1992651580. Place where we would add FileOptions: https://www.internalfb.com/code/fbsource/[13af4aa6be187a240cc71edbcbc873aa03e2ba8c]/fbcode/velox/serializers/SerializedPageFile.cpp?lines=193, but this is more tricky as it is not the normal flow where we have access to query info. Reviewed By: Yuhta Differential Revision: D84553856 fbshipit-source-id: a24d1b63411ad7d34ac099aba2331fa7b2fd5181
…w partitions (facebookincubator#14585) Summary: Extend Window operator to read spilled data in batches of window partitions to improve performance in the presence of small partitions. A new configuration setting window_spill_min_read_batch_rows with default value of 1'000 controls the minimum number of rows for a reading batch. Setting window_spill_min_read_batch_rows to 1 loads a single partition rather than a partition batch each time. The preferred semantic would be to set a memory budget and load as many partitions as fit. This is not feasible at the moment because (1) estimating a single row's size is not efficient or accurate enough and might cause performance issues for variable-width data; (2) the spilled data format doesn't include information about how many rows are present in any given window partition. Fixes facebookincubator#14469 Pull Request resolved: facebookincubator#14585 Reviewed By: kevinwilfong Differential Revision: D84822583 Pulled By: pedroerp fbshipit-source-id: f149c7c5cf32f21999fd3104d70884ee79ae0e84
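The effect of the minimum-rows threshold can be sketched as follows. This is only an illustration of the batching rule, not the Window operator's code: `planReadBatches` is a hypothetical helper, and it assumes the row count of every partition is known up front, which, as noted above, the spilled data format does not provide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Groups consecutive window partitions into read batches so that each batch
// (except possibly the last) holds at least minReadBatchRows rows.
// Returns the number of partitions placed in each batch.
std::vector<std::size_t> planReadBatches(
    const std::vector<std::size_t>& partitionRows,
    std::size_t minReadBatchRows) {
  std::vector<std::size_t> batches;
  std::size_t rowsInBatch = 0;
  std::size_t partitionsInBatch = 0;
  for (const std::size_t rows : partitionRows) {
    assert(rows > 0 && "a window partition has at least one row");
    rowsInBatch += rows;
    ++partitionsInBatch;
    // Partitions are never split: a batch closes only on a partition boundary.
    if (rowsInBatch >= minReadBatchRows) {
      batches.push_back(partitionsInBatch);
      rowsInBatch = 0;
      partitionsInBatch = 0;
    }
  }
  if (partitionsInBatch > 0) {
    batches.push_back(partitionsInBatch);  // trailing, possibly short, batch
  }
  return batches;
}
```

With partitions of 400, 400, 400, and 50 rows and a threshold of 1'000, the first three partitions form one batch and the last one forms a second; with a threshold of 1, every partition is its own batch, matching the single-partition behavior described above.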
…#15194) Summary: Pull Request resolved: facebookincubator#15194 Reviewed By: tanjialiang Differential Revision: D84853884 fbshipit-source-id: e706aa52c5f89c8aecc23295525524119c9b7986
…tor#15193) Summary: Pull Request resolved: facebookincubator#15193 Avoiding copying the vector of names on ROW type construction, if the API client moves it in. Reviewed By: bikramSingh91 Differential Revision: D84852651 fbshipit-source-id: 61564f4ffe9d8d24e96b00ba80594b9d31e33db7
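A common way to let callers avoid the copy is to take the vector by value and move it into the member. The sketch below shows that idiom with a hypothetical `RowTypeSketch` class; it is not the actual ROW type constructor, whose signature is not shown in the commit message.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a ROW type that owns its field names.
class RowTypeSketch {
 public:
  // By-value parameter: an lvalue argument is copied once, an rvalue argument
  // is moved, and in both cases the member takes over the parameter's buffer.
  explicit RowTypeSketch(std::vector<std::string> names)
      : names_(std::move(names)) {}

  const std::vector<std::string>& names() const {
    return names_;
  }

 private:
  std::vector<std::string> names_;
};
```

A caller that writes `RowTypeSketch row(std::move(names));` hands over the existing buffer without copying any strings, while `RowTypeSketch row(names);` still works and leaves `names` intact.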
Summary: Pull Request resolved: facebookincubator#14390

# LocalRunnerService Overview
---------------------------
**LocalRunnerService** is a Thrift service that enables remote execution of Velox query plans, primarily designed for fuzzing and regression-testing to identify behavior mismatches between new changes in diffs and prior builds.

**Module Interactions:**
* **Thrift Layer** (`if/LocalRunnerService.thrift`):
  * Defines an interface using `execute()` as a primary driver
  * Provides comprehensive typing congruent with Velox data types (primitives, arrays, maps, rows)
  * Handles request/response serialization with structured result batches
* **Service Handler** (`LocalRunnerService.cpp`):
  * Deserializes JSON-encoded query plans into Velox `PlanNode` objects
  * Executes plans using `AssertQueryBuilder` (from Velox test utilities)
  * Converts Velox vector results into Thrift format (defined above) through recursive type conversion
  * Captures stdout and exceptions during execution for debugging and execution comparisons
  * Returns structured results with column names, types, and data in columnar format
* **Service Runner** (`LocalRunnerServiceRunner.cpp`):
  * Bootstraps the Thrift server on a configurable port (default 9091)
  * Initializes Velox subsystems (memory manager, serialization, function registrations)
  * Runs as a standalone server process waiting for query execution requests

**Data Flow:** Client → Serialize Plan → Thrift Request → Deserialize Plan → Execute Query → Convert Results to Thrift → Response → Client

Reviewed By: kagamiori Differential Revision: D79850066 fbshipit-source-id: a1b1904488a134140d1ec001c726a6420131ca7f
Summary: Pull Request resolved: facebookincubator#14967 Add support for TIME type to the millisecond() function in Velox to enable extracting milliseconds from TIME values. **Changes:** - Added new `call` method in `MillisecondFunction` that handles TIME type - TIME values are stored as milliseconds since midnight, so extraction uses modulo operation to get milliseconds within the current second - Registered the new function signature `MillisecondFunction<int64_t, Time>` - Added comprehensive tests covering various TIME values including edge cases **Planned:** - support for TIME WITH TIMEZONE Reviewed By: kgpai Differential Revision: D83291159 fbshipit-source-id: ce4ffda39df3c781a4b1bc47cb8360236b827d1e
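Because a TIME value is a count of milliseconds since midnight, the milliseconds within the current second are the remainder after dividing by 1000. A minimal sketch, assuming a non-negative input and using the hypothetical names `millisecondFromTime` and `kMillisPerSecond`:

```cpp
#include <cstdint>

constexpr std::int64_t kMillisPerSecond = 1'000;  // hypothetical name

// Returns the millisecond field (0-999) of a TIME value given as
// milliseconds since midnight. Assumes 0 <= time < 86'400'000.
std::int64_t millisecondFromTime(std::int64_t time) {
  return time % kMillisPerSecond;
}
```

For example, 61'500 ms (00:01:01.500) gives 500, and the last representable value 86'399'999 (23:59:59.999) gives 999.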
…kincubator#15111) Summary: Pull Request resolved: facebookincubator#15111 Due to extra integration uncertainty in the metalake path and some previous QB signals, we decided to control row size tracking for metalake separately through the session property. Instead of adding an additional session property, we decided to extend the current one. However, because changing the session property type breaks backward compatibility, we still ended up introducing a new query config. We will delete the deprecated bool session property in the 3rd diff of the stack. Reviewed By: Yuhta Differential Revision: D84228406 fbshipit-source-id: d4720ced4dc47d140d1d835a708f8f54d1b4a224
Summary: Pull Request resolved: facebookincubator#15081 - added `diffTime()` function that calculates differences between TIME values - simple millisecond arithmetic with unit conversion (TIME = ms since midnight) - supports millisecond, second, minute, hour (rejects date-related units) - `getTimeUnit()` for TIME-specific unit validation - `initialize()` and `call()` method overloads for `<Time, Time>` parameters - only allows time-related units, rejects day/month/year Reviewed By: kgpai Differential Revision: D84103016 fbshipit-source-id: 7a090e6dbe1d41882f9a3665c53831d283fa4d74
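The described arithmetic amounts to subtracting two millisecond counts and dividing by the unit's length. The sketch below is an illustration only: `unitToMillis` and `diffTimeSketch` are hypothetical names, and the argument order (`to - from`), the truncating integer division, and the `std::invalid_argument` error are assumptions not stated in the commit message.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Maps a time-related unit name to its length in milliseconds and rejects
// everything else (day, month, year, ...).
std::int64_t unitToMillis(const std::string& unit) {
  if (unit == "millisecond") return 1;
  if (unit == "second") return 1'000;
  if (unit == "minute") return 60'000;
  if (unit == "hour") return 3'600'000;
  throw std::invalid_argument("Unsupported unit for TIME: " + unit);
}

// Difference between two TIME values (milliseconds since midnight),
// expressed in whole units. Assumption: computed as (to - from), truncated
// toward zero.
std::int64_t diffTimeSketch(const std::string& unit, std::int64_t from, std::int64_t to) {
  return (to - from) / unitToMillis(unit);
}
```

Under these assumptions, `diffTimeSketch("hour", 0, 7'200'000)` is 2, and swapping the two TIME arguments flips the sign.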
…tor#14956) Summary: facebookincubator#6395 fixes a deadlock caused by allocating memory during driver creation, so we should not initialize operators in DriverAdapter. Removed the driver_.initializeOperators() call. Expose the filter node from the FilterProject operator: when both project and filter exist, op->planNodeId only gives us the project node id, which is not enough to construct the CudfFilterProject operator. Move the CudfFilterProject initialization to the initialize() function. Furthermore, if the Cudf ExpressionEvaluator can get information from ITypedExpr, we can even remove compileExpression. The cudf tests are broken: the test below fails with or without this PR, and the other tests pass. ``` [ RUN ] OrderByTest.singleKey 2: /opt/velox/velox/exec/tests/utils/QueryAssertions.cpp:1285: Failure 2: Failed 2: Expected keys: 999, actual: null 2: Note: DuckDB only supports timestamps of millisecond precision. If this test involves timestamp inputs, please make sure you use the right precision. 2: DuckDB query: SELECT * FROM tmp WHERE c0 % 2 >= 0 ORDER BY c0 DESC NULLS FIRST ``` Resolves: facebookincubator#14943 Pull Request resolved: facebookincubator#14956 Reviewed By: mbasmanova Differential Revision: D84876948 Pulled By: pedroerp fbshipit-source-id: 1a955737d81ff50768bc281be98d5805899aad14
…ebookincubator#15195) Summary: Pull Request resolved: facebookincubator#15195 There are a few identical methods to quickly compose a serde option across the codebase. Centralize them for consistency and maintainability. Reviewed By: xiaoxmeng Differential Revision: D84853981 fbshipit-source-id: 8da7086d4ae84509c846a708df263c5bf2837955
…4787) Summary: Pull Request resolved: facebookincubator#14787 Add P4HyperLogLog cast from/to HyperLogLog. Java does not support a cast from/to varbinary, so it has not been added here. https://www.internalfb.com/code/fbsource/fbcode/github/presto-trunk/presto-main-base/src/main/java/com/facebook/presto/type/HyperLogLogOperators.java?lines=24%2C26 Reviewed By: kagamiori Differential Revision: D81630497 fbshipit-source-id: 768579ccdf6d874184b2307221d73a92a5855748
A copy of facebookincubator#14766