GH-37842: [R] Implement infer_schema.data.frame() #37843

Merged 5 commits on Sep 28, 2023

@thisisnic thisisnic commented Sep 23, 2023

Rationale for this change

Users will be able to easily see the schema that their data.frame object will have when it is converted into an Arrow table.

What changes are included in this PR?

Implements infer_schema() method for data.frame objects.

Before:

library(arrow)
schema(mtcars)
#> Error in UseMethod("infer_schema"): no applicable method for 'infer_schema' applied to an object of class "data.frame"

After:

library(arrow)
schema(mtcars)
#> Schema
#> mpg: double
#> cyl: double
#> disp: double
#> hp: double
#> drat: double
#> wt: double
#> qsec: double
#> vs: double
#> am: double
#> gear: double
#> carb: double
#> 
#> See $metadata for additional Schema metadata

Are these changes tested?

Yes

Are there any user-facing changes?

Yes

r/R/schema.R Outdated
@@ -285,6 +285,9 @@ infer_schema.Dataset <- function(x) x$schema
#' @export
infer_schema.arrow_dplyr_query <- function(x) implicit_schema(x)

#' @export
infer_schema.data.frame <- function(x) infer_schema(arrow_table(x))
arrow_table(some_data_frame) is potentially a very expensive operation (it may require a full copy of the data if, for example, all of the columns are strings).

Another option is infer_schema(arrow_table(x[integer(), , drop = FALSE])), although this won't quite work for things like list() columns where we actually need some values to properly infer the type.

Unless I'm missing some prior art in the package, I think schema(!!! lapply(x, infer_type)) might be the way to go.
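
To make the three options in this comment concrete, here is a minimal sketch that puts them side by side. The helper names (`schema_via_table`, `schema_via_empty`, `schema_via_types`) are hypothetical and exist only for this comparison; the sketch assumes an arrow version that exports `infer_type()`, and it does not claim that the three always return identical schemas or metadata.

```r
library(arrow)

# Hypothetical helper names, used only for this comparison.

# Option 1: convert the whole data.frame, then read the schema.
# This may copy every column (e.g. character vectors).
schema_via_table <- function(x) arrow_table(x)$schema

# Option 2: convert a zero-row slice. Cheap, but columns such as list()
# columns have no values left to infer a type from.
schema_via_empty <- function(x) arrow_table(x[integer(), , drop = FALSE])$schema

# Option 3: infer each column's type without building a Table.
schema_via_types <- function(x) schema(!!!lapply(x, infer_type))

df <- data.frame(n = c(1.5, 2.5), s = c("a", "b"), stringsAsFactors = FALSE)

# All three report the same field names for this simple data.frame.
schema_via_table(df)$names
schema_via_empty(df)$names
schema_via_types(df)$names
```

Option 3 only inspects the R vectors to pick a type for each column, which is why it avoids the conversion cost of option 1 while still seeing real values, unlike option 2.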

r/tests/testthat/test-schema.R
@@ -300,6 +300,9 @@ test_that("schema extraction", {
expect_equal(schema(example_data), tbl$schema)
expect_equal(schema(tbl), tbl$schema)

expect_equal(schema(data.frame(a = 1, a = "x")), schema(a = double(), a.1 = string()))
Suggested change
- expect_equal(schema(data.frame(a = 1, a = "x")), schema(a = double(), a.1 = string()))
+ expect_equal(schema(data.frame(a = 1, a = "x", check.names = FALSE)), schema(a = double(), a = string()))

(This might error, and that's OK too; in that case the test can just use expect_error().)

Good shout, that actually captures the true nature of what you're wanting to test there!
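
For context on why `check.names = FALSE` matters in the suggested test, the following base R sketch shows what `data.frame()` does to duplicated column names. It uses only base R; the variable names `deduped` and `raw` are illustrative.

```r
# Default (check.names = TRUE): duplicated names are made unique,
# so the second "a" is renamed before arrow ever sees it.
deduped <- data.frame(a = 1, a = "x")
names(deduped)  # the names are "a" and "a.1"

# check.names = FALSE keeps the duplicate, so the data.frame really
# does have two columns called "a".
raw <- data.frame(a = 1, a = "x", check.names = FALSE)
names(raw)      # both names are "a"
```

With the default, the test only exercises a data.frame whose names are already unique, whereas `check.names = FALSE` exercises schema inference on genuinely duplicated names.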

@thisisnic thisisnic merged commit 79abb73 into apache:main Sep 28, 2023
11 checks passed
@conbench-apache-arrow
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 79abb73.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

JerAguilon pushed a commit to JerAguilon/arrow that referenced this pull request Oct 23, 2023
### Rationale for this change

Users will be able to easily see the schema that their `data.frame` object will have when it is converted into an Arrow table.

### What changes are included in this PR?

Implements `infer_schema()` method for `data.frame` objects.

Before:

``` r
library(arrow)
schema(mtcars)
#> Error in UseMethod("infer_schema"): no applicable method for 'infer_schema' applied to an object of class "data.frame"
```
After:

``` r
library(arrow)
schema(mtcars)
#> Schema
#> mpg: double
#> cyl: double
#> disp: double
#> hp: double
#> drat: double
#> wt: double
#> qsec: double
#> vs: double
#> am: double
#> gear: double
#> carb: double
#> 
#> See $metadata for additional Schema metadata
```

### Are these changes tested?

Yes

### Are there any user-facing changes?

Yes
* Closes: apache#37842

Authored-by: Nic Crane <[email protected]>
Signed-off-by: Nic Crane <[email protected]>
JerAguilon pushed a commit to JerAguilon/arrow that referenced this pull request Oct 23, 2023
loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024