BUG: fix error in write_dataframe when writing an empty or all-None object column with use_arrow #512

theroggy · 2024-12-21T10:50:50Z

When a dataframe is written with an object column that has no rows or contains only None values, the object column is converted to a null-type Arrow column. GDAL does not support this type, so an error is raised.

Fixes #513

brendan-ward · 2024-12-23T17:11:17Z

I'm not clear on what the proper fix is going to be in this case. Should we instead raise our own error when writing an empty dataframe with one or more object dtype columns using arrow, and direct the user to the non-arrow interface? Or should we fall back to the non-arrow path ourselves when we detect an empty dataframe? There is no benefit to using arrow in this case.

theroggy · 2024-12-24T11:24:20Z

> I'm not clear on what the proper fix is going to be in this case. Should we instead raise our own error when writing an empty dataframe with one or more object dtype columns using arrow, and direct the user to the non-arrow interface? Or should we fall back to the non-arrow path ourselves when we detect an empty dataframe? There is no benefit to using arrow in this case.

Yes, I didn't have a clear idea yet either... What I was wondering about (as indicated in #513) was whether it was a conscious choice in pyarrow.Table.from_pandas to convert an object column to the null datatype, as all other datatypes (int, ...) are retained as such. object is obviously a very special case, so I understand that it is treated differently from int, ..., but the null datatype doesn't seem very useful to me (I might be wrong)...

It is a good point, however, that arrow doesn't add much value if there is no data to be written, so an easy fix could be to detect upfront that the dataframe is empty and disable the use of arrow...
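The fallback idea discussed above could look roughly like the sketch below. `should_use_arrow` is a hypothetical helper name used only for illustration; it is not part of pyogrio, and this is not the fix that was eventually merged.

```python
import pandas as pd


def should_use_arrow(df: pd.DataFrame, use_arrow: bool) -> bool:
    """Hypothetical helper: decide whether the arrow write path is worth using.

    Returns False for an empty dataframe, where arrow adds no value and an
    empty object column would be inferred as an Arrow null type.
    """
    if not use_arrow:
        return False
    # No rows means nothing to infer the type of an object column from.
    return len(df) > 0


empty = pd.DataFrame({"name": pd.Series([], dtype="object")})
filled = pd.DataFrame({"name": ["a", "b"]})
```

Note that a check on the number of rows alone would not catch a non-empty object column that holds only None values.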

theroggy · 2025-01-23T22:15:03Z

I found an extra, related problem. The same error occurs with object type columns that contain only None values: these are converted to a null-type column as well by pyarrow.Table.from_pandas.

I also found a fix that solves both issues: convert all null-type columns to string type.

jorisvandenbossche · 2025-04-26T09:52:31Z

> What I was wondering about was whether it was a conscious choice in pyarrow.Table.from_pandas to convert an object column to the null datatype, as all other datatypes (int, ...) are retained as such. object is obviously a very special case, so I understand that it is treated differently from int, ..., but the null datatype doesn't seem very useful to me (I might be wrong)...

This is a conscious choice, yes, AFAIK (although it was already like that before my involvement in pyarrow). For other data types in pandas like int, there is a clear equivalent in Arrow, and so the type can be retained even for empty dataframes. But "object" dtype has no equivalent in Arrow, and thus the resulting arrow type always has to be "inferred" from the data when converting from pandas to arrow. However, if the column is empty or all-None, there is no data to infer from. At that point there is no ideal choice, but the benefit of using the "null" type is that it essentially does not make a choice, and the type is then not "viral": if you inferred it as string instead, but it should actually have been something else, then you could no longer combine the string column with an int column, while with a null column this can still work.

Now in practice, given that object dtype in pandas is often used for strings, and given that GDAL does not support null so we have to do some conversion to make it work, I think casting to string as you now did in the PR is indeed the best solution. In the non-arrow write path we also convert the values of an object dtype column to their string representation anyway, so this makes both write paths more consistent.

TST: add test to show error when an empty object column is written using arrow

1daec00

theroggy mentioned this pull request Dec 21, 2024

BUG: writing an empty dataframe with an object column with use_arrow fails #513

Closed

brendan-ward changed the title ~~TST: add test to show error when an empty object column is written uing arrow~~ TST: add test to show error when an empty object column is written using arrow Dec 23, 2024

theroggy added 2 commits January 23, 2025 20:01

Merge remote-tracking branch 'upstream/main' into TST-add-test-to-show-error-when-an-empty-object-column-is-written-using-arrow

8814600

Fix issue

e35e708

theroggy changed the title ~~TST: add test to show error when an empty object column is written using arrow~~ BUG: fix error when an empty object column is written with use_arrow Jan 23, 2025

theroggy added 5 commits January 23, 2025 21:09

Update CHANGES.md

5c14569

Add test with xfail for object column with only None values

8026fa4

try fix for python 3.10 tests

a5bfa29

take 2

46ba442

Update test_geopandas_io.py

eda0ff8

theroggy added 2 commits January 26, 2025 20:19

Update geopandas.py

ebb012f

Fix for both empty gataframes and object columns with all None values

e04a8b1

theroggy self-assigned this Jan 26, 2025

theroggy changed the title ~~BUG: fix error when an empty object column is written with use_arrow~~ BUG: fix error when an empty or all-None object column is written with use_arrow Jan 26, 2025

theroggy changed the title ~~BUG: fix error when an empty or all-None object column is written with use_arrow~~ BUG: fix error in write_dataframe when writing an empty or all-None object column with use_arrow Jan 26, 2025

Update CHANGES.md

36b95a8

theroggy requested a review from brendan-ward January 26, 2025 20:33

theroggy added this to the 0.11.0 milestone Apr 10, 2025

jorisvandenbossche reviewed Apr 26, 2025

theroggy added 2 commits April 26, 2025 19:34

Remove check_index_type=False

22d7f00

Make index type checking depended on pandas version

e4be1b9

theroggy requested a review from jorisvandenbossche April 26, 2025 20:35

jorisvandenbossche approved these changes Apr 27, 2025

jorisvandenbossche merged commit 98bb7cd into geopandas:main Apr 27, 2025
25 checks passed
