@PingLiuPing
Contributor

Description

This is a follow-up to #26237 (comment).

This is a straightforward refactor.

Motivation and Context

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
  • If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== NO RELEASE NOTE ==

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Oct 21, 2025
@sourcery-ai
Contributor

sourcery-ai bot commented Oct 21, 2025

Reviewer's Guide

The PR moves all Hive-specific conversion logic, including the HivePrestoToVeloxConnector class, out of the generic PrestoToVeloxConnector files and into dedicated HivePrestoToVeloxConnector files. It also consolidates shared helper functions into a utils file and updates the build and includes accordingly.

Class diagram for refactored HivePrestoToVeloxConnector

classDiagram
    class PrestoToVeloxConnector {
        <<abstract>>
        +connectorName_: string
        +~PrestoToVeloxConnector()
        +toVeloxSplit(...)
        +toVeloxColumnHandle(...)
        +toVeloxTableHandle(...)
        +toVeloxInsertTableHandle(...)
        +createVeloxPartitionFunctionSpec(...)
        +createConnectorProtocol()
    }
    class HivePrestoToVeloxConnector {
        +HivePrestoToVeloxConnector(connectorName: string)
        +toVeloxSplit(...)
        +toVeloxColumnHandle(...)
        +toVeloxTableHandle(...)
        +toVeloxInsertTableHandle(...)
        +createVeloxPartitionFunctionSpec(...)
        +createConnectorProtocol()
        -toHiveColumns(...)
    }
    PrestoToVeloxConnector <|-- HivePrestoToVeloxConnector

    class TpchPrestoToVeloxConnector {
        +TpchPrestoToVeloxConnector(connectorName: string)
        +toVeloxSplit(...)
        +toVeloxColumnHandle(...)
        +toVeloxTableHandle(...)
        +toVeloxInsertTableHandle(...)
        +createVeloxPartitionFunctionSpec(...)
        +createConnectorProtocol()
    }
    PrestoToVeloxConnector <|-- TpchPrestoToVeloxConnector
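
As a rough illustration of the hierarchy in the diagram above, the following is a minimal sketch and not the actual presto_cpp code. The `*Sketch` types, the single `toVeloxSplit` signature, and the string-based split are all assumptions made for the example; the real methods take protocol and Velox types.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the Velox split type produced by a connector.
struct SplitSketch {
  std::string connectorId;
  std::string description;
};

// Generic interface: holds only the connector name and knows nothing about Hive.
class PrestoToVeloxConnectorSketch {
 public:
  virtual ~PrestoToVeloxConnectorSketch() = default;

  const std::string& connectorName() const {
    return connectorName_;
  }

  // Each concrete connector supplies its own conversion.
  virtual std::unique_ptr<SplitSketch> toVeloxSplit(
      const std::string& catalogId) const = 0;

 protected:
  explicit PrestoToVeloxConnectorSketch(std::string connectorName)
      : connectorName_(std::move(connectorName)) {}

 private:
  std::string connectorName_;
};

// Hive-specific logic lives in its own class (and, in the PR, its own files).
class HivePrestoToVeloxConnectorSketch final
    : public PrestoToVeloxConnectorSketch {
 public:
  explicit HivePrestoToVeloxConnectorSketch(std::string connectorName)
      : PrestoToVeloxConnectorSketch(std::move(connectorName)) {}

  std::unique_ptr<SplitSketch> toVeloxSplit(
      const std::string& catalogId) const override {
    assert(!catalogId.empty());
    return std::make_unique<SplitSketch>(SplitSketch{catalogId, "hive split"});
  }
};
```

Callers that only hold a reference to the base class can still reach the Hive conversion through virtual dispatch, which is what lets the generic header stay free of Hive includes.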

Class diagram for helper function relocation

classDiagram
    class PrestoToVeloxConnectorUtils {
        +toRequiredSubfields(...)
        +toFileCompressionKind(...)
        +toVeloxFileFormat(...)
        +toJsonString(...)
        +toFilter(...)
    }
    class HivePrestoToVeloxConnector {
        +toVeloxSplit(...)
        +toVeloxColumnHandle(...)
        +toVeloxTableHandle(...)
        +toVeloxInsertTableHandle(...)
        +createVeloxPartitionFunctionSpec(...)
        +createConnectorProtocol()
        -toHiveColumns(...)
    }
    HivePrestoToVeloxConnector ..> PrestoToVeloxConnectorUtils : uses

File-Level Changes

Change: Extract Hive connector into standalone module
Details:
  • Introduced HivePrestoToVeloxConnector.h/.cpp with full HivePrestoToVeloxConnector implementation
  • Removed Hive-specific methods and logic from PrestoToVeloxConnector.cpp/.h
  • Updated PrestoToVeloxConnector to focus on generic connector API
Files:
  • PrestoToVeloxConnector.cpp
  • PrestoToVeloxConnector.h
  • HivePrestoToVeloxConnector.cpp
  • HivePrestoToVeloxConnector.h

Change: Relocate shared helper functions to utils
Details:
  • Created PrestoToVeloxConnectorUtils.h/.cpp housing toRequiredSubfields, toFileCompressionKind, toVeloxFileFormat, toJsonString
  • Removed duplicate utility implementations from other source files
  • Updated utils header with proper includes for protocol and Velox DWIO
Files:
  • PrestoToVeloxConnectorUtils.cpp
  • PrestoToVeloxConnectorUtils.h

Change: Update build and include references
Details:
  • Added HivePrestoToVeloxConnector.cpp to CMakeLists.txt
  • Registered new Hive connector in Registration.cpp
  • Replaced PrestoToVeloxConnector.h includes with HivePrestoToVeloxConnector.h in tests and Iceberg connector
Files:
  • CMakeLists.txt
  • Registration.cpp
  • IcebergPrestoToVeloxConnector.cpp
  • TaskInfoTest.cpp
  • PlanConverterTest.cpp
  • PrestoToVeloxConnectorTest.cpp
  • PrestoToVeloxSplitTest.cpp
  • TaskManagerTest.cpp
  • TaskUpdateRequestTest.cpp
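
The registration step listed above can be pictured with a small name-to-connector registry. This is a hypothetical sketch only: `RegistrySketch`, `ConnectorSketch`, and `HiveConnectorSketch` are invented for illustration and do not reproduce the functions in Registration.cpp.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical generic connector: only carries its name.
class ConnectorSketch {
 public:
  explicit ConnectorSketch(std::string name) : name_(std::move(name)) {}
  virtual ~ConnectorSketch() = default;

  const std::string& connectorName() const {
    return name_;
  }

 private:
  std::string name_;
};

// Hypothetical Hive connector, defined separately from the generic base.
class HiveConnectorSketch final : public ConnectorSketch {
 public:
  using ConnectorSketch::ConnectorSketch;
};

// Hypothetical registry keyed by connector name.
class RegistrySketch {
 public:
  void registerConnector(std::unique_ptr<ConnectorSketch> connector) {
    assert(connector != nullptr);
    // Copy the name before the pointer is moved into the map.
    const std::string name = connector->connectorName();
    const bool inserted =
        connectors_.emplace(name, std::move(connector)).second;
    if (!inserted) {
      throw std::invalid_argument("Connector already registered: " + name);
    }
  }

  const ConnectorSketch& get(const std::string& name) const {
    const auto it = connectors_.find(name);
    if (it == connectors_.end()) {
      throw std::out_of_range("Connector not registered: " + name);
    }
    return *it->second;
  }

 private:
  std::unordered_map<std::string, std::unique_ptr<ConnectorSketch>> connectors_;
};
```

With a layout like this, the file that performs registration is the only place that needs the concrete Hive header; everything else can look connectors up by name through the base type.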

@PingLiuPing PingLiuPing force-pushed the lp_refactor_hive_connector branch from e1edc89 to ea11d81 Compare October 21, 2025 14:35
@PingLiuPing PingLiuPing marked this pull request as ready for review October 21, 2025 14:35
@PingLiuPing PingLiuPing requested review from a team as code owners October 21, 2025 14:35
@prestodb-ci prestodb-ci requested review from a team, jkhaliqi and libianoss and removed request for a team October 21, 2025 14:35
Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes and they look great!


@PingLiuPing
Contributor Author

@aditi-pandit This is a follow-up PR based on your review comments. Could you take a look? Thanks.

@PingLiuPing PingLiuPing force-pushed the lp_refactor_hive_connector branch from ea11d81 to 926a424 Compare October 22, 2025 08:53
Contributor

@aditi-pandit aditi-pandit left a comment

Thanks @PingLiuPing. Code looks good minus nit.

@PingLiuPing PingLiuPing force-pushed the lp_refactor_hive_connector branch from 926a424 to db17fc6 Compare October 22, 2025 21:05
Contributor

@aditi-pandit aditi-pandit left a comment

Thanks @PingLiuPing

@aditi-pandit aditi-pandit merged commit 84765b6 into prestodb:master Oct 23, 2025
80 of 81 checks passed
@PingLiuPing PingLiuPing deleted the lp_refactor_hive_connector branch October 23, 2025 08:39
velox::connector::hive::HiveColumnHandle::ColumnType toHiveColumnType(
protocol::hive::ColumnType type);

std::unique_ptr<velox::connector::ConnectorTableHandle> toHiveTableHandle(
Contributor

@amitkdutta amitkdutta Oct 24, 2025

@PingLiuPing We have a dependency on this function internally at Meta, but moving it to an anonymous namespace broke the internal build :(. It's probably better to keep it in the header; otherwise we need to copy-paste the function. Since the function was in a header, it's an open dependency. Let's keep it in the header so that downstream does not need to copy it.
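
The linkage issue behind this comment can be sketched in a single file. This is an illustrative example only: `toTableHandleName` and `addHivePrefix` are hypothetical names, and the comments mark which part would live in a header and which in the matching .cpp file. A function declared in a header has external linkage and can be called by downstream code, while a function defined only inside an anonymous namespace has internal linkage and is invisible outside its translation unit.

```cpp
#include <cassert>
#include <string>

// --- What a header (hypothetical ConnectorSketch.h) would declare. ---
// The declaration gives the function external linkage, so code in other
// translation units, including downstream builds, can call it.
namespace sketch {
std::string toTableHandleName(const std::string& tableName);
} // namespace sketch

// --- What the matching .cpp file would contain. ---
namespace {
// Internal linkage: only this translation unit can reference the helper.
// Moving a previously public function in here removes it from the linkable
// surface, which breaks any external caller.
std::string addHivePrefix(const std::string& name) {
  return "hive:" + name;
}
} // namespace

namespace sketch {
std::string toTableHandleName(const std::string& tableName) {
  assert(!tableName.empty());
  return addHivePrefix(tableName);
}
} // namespace sketch
```

Keeping the public entry point declared in the header while hiding only private helpers in the anonymous namespace preserves the dependency that downstream code already relies on.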

hantangwangd added a commit that referenced this pull request Oct 26, 2025
…26420)

## Description

PR #26380 separated `HivePrestoToVeloxConnector` into a standalone file.
This PR refactors further so that `IcebergPrestoToVeloxConnector` no
longer relies on `HivePrestoToVeloxConnector`.

`toVeloxFileFormat(const presto::protocol::hive::StorageFormat& format)`
is specific to the Hive connector, so it moves into
`HivePrestoToVeloxConnector`.
`toHiveColumnType(protocol::hive::ColumnType type)` is shared by Hive
and Iceberg, so it moves into `PrestoToVeloxConnectorUtils`. With both
moves, `IcebergPrestoToVeloxConnector` no longer depends on
`HivePrestoToVeloxConnector`.
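
A minimal sketch of the decoupling described above, under stated assumptions: the enums are simplified stand-ins for `protocol::hive::ColumnType` and the Velox Hive column type, and the `*_sketch` namespaces and `*Sketch` classes are hypothetical. The point is only that both connectors call a shared free function instead of one connector including the other.

```cpp
#include <cassert>
#include <stdexcept>

// Simplified stand-in for the protocol-side column type (assumption).
namespace protocol_sketch {
enum class ColumnType { PARTITION_KEY, REGULAR, SYNTHESIZED };
} // namespace protocol_sketch

// Simplified stand-in for the Velox-side column type (assumption).
namespace velox_sketch {
enum class ColumnType { kPartitionKey, kRegular, kSynthesized };
} // namespace velox_sketch

// Shared helper, playing the role of the utils file.
namespace utils_sketch {
inline velox_sketch::ColumnType toHiveColumnType(
    protocol_sketch::ColumnType type) {
  switch (type) {
    case protocol_sketch::ColumnType::PARTITION_KEY:
      return velox_sketch::ColumnType::kPartitionKey;
    case protocol_sketch::ColumnType::REGULAR:
      return velox_sketch::ColumnType::kRegular;
    case protocol_sketch::ColumnType::SYNTHESIZED:
      return velox_sketch::ColumnType::kSynthesized;
  }
  throw std::invalid_argument("Unknown column type");
}
} // namespace utils_sketch

// Each connector depends on the shared helper, not on the other connector.
class HiveSketch {
 public:
  velox_sketch::ColumnType columnType(protocol_sketch::ColumnType type) const {
    return utils_sketch::toHiveColumnType(type);
  }
};

class IcebergSketch {
 public:
  velox_sketch::ColumnType columnType(protocol_sketch::ColumnType type) const {
    return utils_sketch::toHiveColumnType(type);
  }
};

// Both connectors get the same answer from the shared helper.
inline void checkConnectorsAgree() {
  assert(
      HiveSketch{}.columnType(protocol_sketch::ColumnType::REGULAR) ==
      IcebergSketch{}.columnType(protocol_sketch::ColumnType::REGULAR));
}
```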

## Motivation and Context

- Decouple `IcebergPrestoToVeloxConnector` from
`HivePrestoToVeloxConnector`

## Impact

N/A

## Test Plan

N/A

## Contributor checklist

- [ ] Please make sure your submission complies with our [contributing
guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md),
in particular [code
style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style)
and [commit
standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards).
- [ ] PR description addresses the issue accurately and concisely. If
the change is non-trivial, a GitHub Issue is referenced.
- [ ] Documented new properties (with its default value), SQL syntax,
functions, or other functionality.
- [ ] If release notes are required, they follow the [release notes
guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines).
- [ ] Adequate tests were added if applicable.
- [ ] CI passed.
- [ ] If adding new dependencies, verified they have an [OpenSSF
Scorecard](https://securityscorecards.dev/#the-checks) score of 5.0 or
higher (or obtained explicit TSC approval for lower scores).

## Release Notes

```
== NO RELEASE NOTE ==
```