-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-37842: [R] Implement infer_schema.data.frame() #37843
GH-37842: [R] Implement infer_schema.data.frame() #37843
Conversation
r/R/schema.R
Outdated
@@ -285,6 +285,9 @@ infer_schema.Dataset <- function(x) x$schema | |||
#' @export | |||
infer_schema.arrow_dplyr_query <- function(x) implicit_schema(x) | |||
|
|||
#' @export | |||
infer_schema.data.frame <- function(x) infer_schema(arrow_table(x)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
arrow_table(some_data_frame)
is potentially a very expensive operation (possibly requiring a full copy of the data if, for example, they are all string columns).
Another option is infer_schema(arrow_table(x[integer(), , drop = FALSE]))
, although this won't quite work for things like list()
columns where we actually need some values to properly infer the type.
Unless I'm missing some prior art in the package, I think schema(!!! lapply(x, infer_type))
might be the way to go.
r/tests/testthat/test-schema.R
Outdated
@@ -300,6 +300,9 @@ test_that("schema extraction", { | |||
expect_equal(schema(example_data), tbl$schema) | |||
expect_equal(schema(tbl), tbl$schema) | |||
|
|||
expect_equal(schema(data.frame(a = 1, a = "x")), schema(a = double(), a.1 = string())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expect_equal(schema(data.frame(a = 1, a = "x")), schema(a = double(), a.1 = string())) | |
expect_equal(schema(data.frame(a = 1, a = "x", check.names = FALSE)), schema(a = double(), a = string())) |
(This might error and that's OK too, can just be expect_error()
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good shout, that actually captures the true nature of what you're wanting to test there!
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 79abb73. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
### Rationale for this change Users will be able to easily see the schema which their `data.frame` object will have when it's converted into an Arrwo table. ### What changes are included in this PR? Implements `infer_schema()` method for `data.frame` objects. Before: ``` r library(arrow) schema(mtcars) #> Error in UseMethod("infer_schema"): no applicable method for 'infer_schema' applied to an object of class "data.frame" ``` After: ``` r library(arrow) schema(mtcars) #> Schema #> mpg: double #> cyl: double #> disp: double #> hp: double #> drat: double #> wt: double #> qsec: double #> vs: double #> am: double #> gear: double #> carb: double #> #> See $metadata for additional Schema metadata ``` ### Are these changes tested? Yes ### Are there any user-facing changes? Yes * Closes: apache#37842 Authored-by: Nic Crane <[email protected]> Signed-off-by: Nic Crane <[email protected]>
### Rationale for this change Users will be able to easily see the schema which their `data.frame` object will have when it's converted into an Arrwo table. ### What changes are included in this PR? Implements `infer_schema()` method for `data.frame` objects. Before: ``` r library(arrow) schema(mtcars) #> Error in UseMethod("infer_schema"): no applicable method for 'infer_schema' applied to an object of class "data.frame" ``` After: ``` r library(arrow) schema(mtcars) #> Schema #> mpg: double #> cyl: double #> disp: double #> hp: double #> drat: double #> wt: double #> qsec: double #> vs: double #> am: double #> gear: double #> carb: double #> #> See $metadata for additional Schema metadata ``` ### Are these changes tested? Yes ### Are there any user-facing changes? Yes * Closes: apache#37842 Authored-by: Nic Crane <[email protected]> Signed-off-by: Nic Crane <[email protected]>
### Rationale for this change Users will be able to easily see the schema which their `data.frame` object will have when it's converted into an Arrwo table. ### What changes are included in this PR? Implements `infer_schema()` method for `data.frame` objects. Before: ``` r library(arrow) schema(mtcars) #> Error in UseMethod("infer_schema"): no applicable method for 'infer_schema' applied to an object of class "data.frame" ``` After: ``` r library(arrow) schema(mtcars) #> Schema #> mpg: double #> cyl: double #> disp: double #> hp: double #> drat: double #> wt: double #> qsec: double #> vs: double #> am: double #> gear: double #> carb: double #> #> See $metadata for additional Schema metadata ``` ### Are these changes tested? Yes ### Are there any user-facing changes? Yes * Closes: apache#37842 Authored-by: Nic Crane <[email protected]> Signed-off-by: Nic Crane <[email protected]>
### Rationale for this change Users will be able to easily see the schema which their `data.frame` object will have when it's converted into an Arrwo table. ### What changes are included in this PR? Implements `infer_schema()` method for `data.frame` objects. Before: ``` r library(arrow) schema(mtcars) #> Error in UseMethod("infer_schema"): no applicable method for 'infer_schema' applied to an object of class "data.frame" ``` After: ``` r library(arrow) schema(mtcars) #> Schema #> mpg: double #> cyl: double #> disp: double #> hp: double #> drat: double #> wt: double #> qsec: double #> vs: double #> am: double #> gear: double #> carb: double #> #> See $metadata for additional Schema metadata ``` ### Are these changes tested? Yes ### Are there any user-facing changes? Yes * Closes: apache#37842 Authored-by: Nic Crane <[email protected]> Signed-off-by: Nic Crane <[email protected]>
Rationale for this change
Users will be able to easily see the schema which their
data.frame
object will have when it's converted into an Arrwo table.What changes are included in this PR?
Implements
infer_schema()
method fordata.frame
objects.Before:
After:
Are these changes tested?
Yes
Are there any user-facing changes?
Yes
infer_schema.data.frame()
#37842