ENH: reading Parquet with PyArrow: read_parquet equivalent of date_as_object=False #62262

@simonaubertbd

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Hello,

This follows apache/arrow#47464 (comment)

To quote the excellent answer I got there:
`Parquet has a date type, and Arrow as well. And so when reading the Parquet file into an Arrow table using pyarrow, you would see that the type is preserved. You can try:

import pyarrow.parquet as pq
table = pq.read_table("C:/Users/qnsv2207/Desktop/test_amphi_compare.parquet")
table

This should show that the "OrderDate" column has a date32 type.

But then pandas does not have a built-in "date" type. Therefore, in the arrow->pandas conversion, pyarrow by default converts its date types into an object column with python datetime.date objects.

See the documentation about this at https://arrow.apache.org/docs/python/pandas.html#date-types, which also mentions the date_as_object=False option you can specify in to_pandas() to avoid this conversion to object dtype.`

However, as far as I understand the documentation, read_parquet offers no way to pass an equivalent of date_as_object=False.

Feature Description

A new read_parquet parameter equivalent to pyarrow's date_as_object.
I would even change the default so that pyarrow date columns are treated as dates in pandas.

Alternative Solutions

Reading the Parquet file directly with pyarrow instead of pandas. That doesn't sound practical.

Additional Context

No response

Labels

Enhancement, IO Parquet, Needs Triage, datetime.date