
[FEATURE] Align Spark PPL Data Type with OpenSearch PPL Data Type #1057

Open
penghuo opened this issue Feb 17, 2025 · 1 comment
Labels
enhancement New feature or request

Comments

@penghuo
Collaborator

penghuo commented Feb 17, 2025

Is your feature request related to a problem?

  • Create a table in Spark:
CREATE TABLE numeric_types_table (
  tinyint_col    TINYINT, 
  smallint_col   SMALLINT, 
  int_col        INT,       
  bigint_col     BIGINT,     
  float_col      FLOAT,       
  double_col     DOUBLE,      
  decimal_col    DECIMAL(10,2)
)
>>> describe numeric_types_table;
tinyint_col             tinyint
smallint_col            smallint
int_col                 int
bigint_col              bigint
float_col               float
double_col              double
decimal_col             decimal(10,2)
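The mismatch can be sketched as a name-level mapping. This is a hypothetical illustration (not the project's actual API): OpenSearch PPL surfaces types such as `byte`/`short`/`integer`/`long`, while Spark's `DESCRIBE` reports `tinyint`/`smallint`/`int`/`bigint`, and `decimal` has no direct OpenSearch counterpart.

```python
# Hypothetical mapping from Spark SQL numeric type names to the
# OpenSearch PPL type names they would need to align with.
SPARK_TO_OPENSEARCH_PPL = {
    "tinyint": "byte",
    "smallint": "short",
    "int": "integer",
    "bigint": "long",
    "float": "float",
    "double": "double",
    # decimal has no direct OpenSearch field type; alignment would
    # require an explicit mapping decision (shown here as an assumption).
    "decimal(10,2)": "double",
}

def align_type(spark_type: str) -> str:
    """Return the OpenSearch PPL name for a Spark type, falling back to the input."""
    return SPARK_TO_OPENSEARCH_PPL.get(spark_type, spark_type)
```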

What solution would you like?
[RFC] Unified PPL Data Type

@penghuo penghuo added the enhancement (New feature or request) and untriaged labels Feb 17, 2025
@penghuo penghuo removed the untriaged label Feb 17, 2025
@penghuo penghuo changed the title [FEATURE] Spark PPL does not aligned with OpenSearch PPL Data Type [FEATURE] Align Spark PPL Data Type with OpenSearch PPL Data Type Feb 17, 2025
@LantaoJin
Copy link
Member

LantaoJin commented Feb 18, 2025

My first thought was that the type system should be engine-related rather than language-related, because we can't predict which engines PPL will be used with in the future. That would mean matching all PPL types against every execution engine if we want to do alignment work. As we know, execution engines are concrete and designed for specific scenarios, while language definitions are usually more generalized. However, looking at it from another perspective, it is necessary to define types for a language. Even ANSI SQL has its predefined types (although these types generally need to be implemented or mapped in the various SQL execution engines).

But aligning every engine's data types with the PPL data types is challenging work. The current OpenSearch PPL data types, IMO, look more like OpenSearch's own data types. Fortunately, Spark supports relatively few types; they are almost a subset of OpenSearch's types. For example, we are adding more OpenSearch types to Spark: #1044
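The ANSI SQL analogy above can be sketched as a small engine-agnostic layer: the language defines abstract PPL types, and each engine registers its own concrete mapping. All names here (`PPLType`, `SPARK_MAPPING`, `render`) are illustrative assumptions, not the project's actual design.

```python
from enum import Enum

class PPLType(Enum):
    """Abstract, engine-independent PPL type names (illustrative)."""
    BYTE = "byte"
    SHORT = "short"
    INTEGER = "integer"
    LONG = "long"
    FLOAT = "float"
    DOUBLE = "double"

# Each execution engine would supply its own concrete-type mapping,
# analogous to how SQL engines implement ANSI SQL's predefined types.
SPARK_MAPPING = {
    PPLType.BYTE: "tinyint",
    PPLType.SHORT: "smallint",
    PPLType.INTEGER: "int",
    PPLType.LONG: "bigint",
    PPLType.FLOAT: "float",
    PPLType.DOUBLE: "double",
}

def render(ppl_type: PPLType, engine_mapping: dict) -> str:
    # Fall back to the abstract PPL name when an engine defines no concrete type.
    return engine_mapping.get(ppl_type, ppl_type.value)
```

Under this sketch, adding a new engine only requires a new mapping dict rather than changes to the language-level type definitions.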
