[FEATURE REQUEST]: Spark 3.0 Readiness #633


Closed
47 of 61 tasks
suhsteve opened this issue Aug 19, 2020 · 9 comments
Labels: enhancement (New feature or request), good first issue (Good for newcomers), help wanted (Extra attention is needed)
Milestone: 1.0.0
Comments

suhsteve (Member) commented Aug 19, 2020

APIs

SparkSession

DataFrame

DataFrameStatFunctions

DataFrameWriterV2

RelationalGroupedDataset

  • Scala: `def as[K: Encoder, T: Encoder]: KeyValueGroupedDataset[K, T]`
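As a hedged sketch of what this new Spark 3.0 Scala API enables (the case class, column names, and values below are invented for illustration; `spark` is assumed to be an existing `SparkSession`), `as` converts a `RelationalGroupedDataset` into a typed `KeyValueGroupedDataset`:

```scala
// Illustrative only: RelationalGroupedDataset.as[K, T], new in Spark 3.0.
// Assumes a SparkSession named `spark`; Sale and the data are made up.
import spark.implicits._

case class Sale(store: String, amount: Long)
val sales = Seq(Sale("a", 10), Sale("a", 5), Sale("b", 7)).toDF()

// groupBy(...).as[K, T] yields a KeyValueGroupedDataset[K, T],
// unlocking typed operations such as mapGroups and cogroup.
val byStore = sales.groupBy($"store").as[String, Sale]
val totals  = byStore.mapGroups((store, rows) => (store, rows.map(_.amount).sum))
```

Because this API returns a `KeyValueGroupedDataset` (a typed `Dataset` abstraction), it is tied to the Dataset support discussed further down in this thread.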

Functions

  • C#: `public static Column XXHash64(params Column[] columns)` (Spark 3.0 readiness part 2 #649)
  • C#: `public static Column Split(Column column, string pattern, int limit)` (Spark 3.0 readiness part 2 #649)
  • C#: `public static Column Overlay(Column src, Column replace, Column pos, Column len)` (Spark 3.0 readiness part 2 #649)
  • C#: `public static Column Overlay(Column src, Column replace, Column pos)` (Spark 3.0 readiness part 2 #649)
  • C#: `public static Column AddMonths(Column startDate, Column numMonths)` (Spark 3.0 readiness part 2 #649)
  • C#: `public static Column DateAdd(Column start, Column days)` (Spark 3.0 readiness part 2 #649)
  • C#: `public static Column DateSub(Column start, Column days)` (Spark 3.0 readiness part 2 #649)
  • Scala: `def transform(column: Column, f: Column => Column): Column` (Unsupported: passing a function as a parameter)
  • Scala: `def transform(column: Column, f: (Column, Column) => Column): Column` (Unsupported: passing a function as a parameter)
  • Scala: `def exists(column: Column, f: Column => Column): Column` (Unsupported: passing a function as a parameter)
  • Scala: `def forall(column: Column, f: Column => Column): Column` (Unsupported: passing a function as a parameter)
  • Scala: `def filter(column: Column, f: Column => Column): Column` (Unsupported: passing a function as a parameter)
  • Scala: `def filter(column: Column, f: (Column, Column) => Column): Column` (Unsupported: passing a function as a parameter)
  • Scala: `def aggregate(expr: Column, initialValue: Column, merge: (Column, Column) => Column, finish: Column => Column): Column` (Unsupported: passing a function as a parameter)
  • Scala: `def aggregate(expr: Column, initialValue: Column, merge: (Column, Column) => Column): Column` (Unsupported: passing a function as a parameter)
  • Scala: `def zip_with(left: Column, right: Column, f: (Column, Column) => Column): Column` (Unsupported: passing a function as a parameter)
  • Scala: `def transform_keys(expr: Column, f: (Column, Column) => Column): Column` (Unsupported: passing a function as a parameter)
  • Scala: `def transform_values(expr: Column, f: (Column, Column) => Column): Column` (Unsupported: passing a function as a parameter)
  • Scala: `def map_filter(expr: Column, f: (Column, Column) => Column): Column` (Unsupported: passing a function as a parameter)
  • Scala: `def map_zip_with(left: Column, right: Column, f: (Column, Column, Column) => Column): Column` (Unsupported: passing a function as a parameter)
  • C#: `public static Column SchemaOfJson(Column json, Dictionary<string, string> options)` (Spark 3.0 readiness part 2 #649)
  • C#: `public static Column MapEntries(Column column)` (Spark 3.0 readiness part 2 #649)
  • C#: `public static Column FromCsv(Column column, StructType schema, Dictionary<string, string> options)` (Spark 3.0 readiness part 2 #649)
  • C#: `public static Column FromCsv(Column column, Column schema, Dictionary<string, string> options)` (Spark 3.0 readiness part 2 #649)
  • C#: `public static Column SchemaOfCsv(string csv)` (Spark 3.0 readiness part 2 #649)
  • C#: `public static Column SchemaOfCsv(Column csv)` (Spark 3.0 readiness part 2 #649)
  • C#: `public static Column SchemaOfCsv(Column csv, Dictionary<string, string> options)` (Spark 3.0 readiness part 2 #649)
  • C#: `public static Column ToCsv(Column column, Dictionary<string, string> options)` (Spark 3.0 readiness part 2 #649)
  • C#: `public static Column ToCsv(Column column)` (Spark 3.0 readiness part 2 #649)
  • C#: `public static Column Years(Column column)` (Spark 3.0 readiness part 2 #649)
  • C#: `public static Column Months(Column column)` (Spark 3.0 readiness part 2 #649)
  • C#: `public static Column Days(Column column)` (Spark 3.0 readiness part 2 #649)
  • C#: `public static Column Hours(Column column)` (Spark 3.0 readiness part 2 #649)
  • C#: `public static Column Bucket(Column numBuckets, Column column)` (Spark 3.0 readiness part 2 #649)
  • C#: `public static Column Bucket(int numBuckets, Column column)` (Spark 3.0 readiness part 2 #649)
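As a hedged illustration of what the checklist items above cover (assuming the C# signatures land as listed; the session setup, SQL literals, and column names here are invented for the example), a few of the new Spark 3.0 functions could be used from .NET like this:

```csharp
// Illustrative only: sketch of the Spark 3.0 Functions overloads listed above.
// Column/data names are made up; signatures follow the checklist.
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

class Spark30Sketch
{
    static void Main()
    {
        SparkSession spark = SparkSession.Builder().GetOrCreate();
        DataFrame df = spark.Sql("SELECT 'a,b,c' AS csv, 'hello world' AS text");

        df.Select(
                Split(Col("csv"), ",", 2),                  // new overload with a split limit
                Overlay(Col("text"), Lit("Spark"), Lit(7)), // replace starting at position 7
                XXHash64(Col("text")))                      // 64-bit xxHash of the column
            .Show();
    }
}
```

The Scala entries marked "Unsupported: passing a function as a parameter" have no such C# sketch because a `Column => Column` lambda cannot be marshalled across the .NET/JVM boundary, which is why they are excluded from the checklist's C# surface.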
suhsteve added the enhancement label on Aug 19, 2020
rrekapalli commented Aug 19, 2020

Are the Dataset API and a full MLlib implementation also going to be part of this version?

suhsteve added the good first issue and help wanted labels on Aug 19, 2020
suhsteve added this to the 1.0.0 milestone on Aug 19, 2020
imback82 (Contributor) commented
@suhsteve Thanks for compiling this. Can you also compile the corresponding list from Scala?

imback82 (Contributor) commented
@rrekapalli Dataset is not supported due to this: #103 (comment)

We will not have full MLlib support as part of 1.0.0, but we will track it separately in #381.

rapoth pinned this issue on Aug 19, 2020
rapoth (Contributor) commented Aug 19, 2020

CC: @GoEddie on the MLlib question in case he has more to add.

GoEddie (Contributor) commented Aug 19, 2020

@rrekapalli I have been implementing the newer ML API (some internal classes are still MLlib); if you would like to contribute, I am happy to help you with any PRs.

#381 is for ML.Features, and there is more to come after features.

suhsteve (Member, Author) commented
@imback82 Added the list of Scala APIs as well.

rrekapalli commented
@GoEddie, I am new to Spark and am just trying out this library for one of my use cases. However, I can definitely give it a try from the second week of September.

Niharikadutta (Collaborator) commented
I'm thinking of dividing this into 3 PRs:

  1. For SparkSession, DataFrame and DataFrameStatFunctions APIs
  2. For DataFrameWriterV2
  3. For Functions APIs

Niharikadutta (Collaborator) commented
All features in this issue have been merged; this can be closed.

6 participants