Remove top-level import of pyarrow (#1703)
The [release candidate artifact build
environment](https://github.com/apache/iceberg-python/blob/a58f099aa501f6fd4345a331295d81fe0133554f/.github/workflows/pypi-build-artifacts.yml#L72-L74)
does not automatically install `pyarrow`, so any import that requires
`pyarrow` fails with `ModuleNotFoundError`. See the failing run:
https://github.com/apache/iceberg-python/actions/runs/13464626812/job/37627644985

The failing import is triggered through `tests/conftest.py`:
```
  ImportError while loading conftest '/project/tests/conftest.py'.
  /project/tests/conftest.py:52: in <module>
      from pyiceberg.catalog import Catalog, load_catalog
  ../venv/lib/python3.9/site-packages/pyiceberg/catalog/__init__.py:51: in <module>
      from pyiceberg.serializers import ToOutputFile
  ../venv/lib/python3.9/site-packages/pyiceberg/serializers.py:25: in <module>
      from pyiceberg.table.metadata import TableMetadata, TableMetadataUtil
  ../venv/lib/python3.9/site-packages/pyiceberg/table/__init__.py:65: in <module>
      from pyiceberg.io.pyarrow import ArrowScan, expression_to_pyarrow, schema_to_pyarrow
  ../venv/lib/python3.9/site-packages/pyiceberg/io/pyarrow.py:62: in <module>
      import pyarrow as pa
  E   ModuleNotFoundError: No module named 'pyarrow'
```

This isn't caught in CI because CI installs all extra dependencies by default,
including `pyarrow`.
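
One way to reproduce this kind of failure locally, even when the optional package is installed, is to hide it from the import system before importing. The following is a minimal sketch using only the standard library; the helper names `hide_module` and `import_fails` are hypothetical, and `json` merely stands in for an optional dependency such as `pyarrow`:

```python
import importlib
import sys
from contextlib import contextmanager


@contextmanager
def hide_module(name):
    """Make `import name` fail inside the block, as if the package were not installed."""
    missing = object()
    saved = sys.modules.get(name, missing)
    # A None entry in sys.modules makes the import system raise ModuleNotFoundError.
    sys.modules[name] = None
    try:
        yield
    finally:
        # Restore the previous state so later imports behave normally.
        if saved is missing:
            del sys.modules[name]
        else:
            sys.modules[name] = saved


def import_fails(name):
    """Return True if importing `name` raises ModuleNotFoundError."""
    try:
        importlib.import_module(name)
    except ModuleNotFoundError:
        return True
    return False


# `json` stands in for an optional dependency such as pyarrow.
with hide_module("json"):
    hidden = import_fails("json")
restored = not import_fails("json")
```

Modules that were already imported before the block keep their cached references, so a check like this is most meaningful in a fresh interpreter.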


Tested in the release candidate build action on my fork:
https://github.com/kevinjqliu/iceberg-python/actions/runs/13465085426 ✅

cc @geruh

---------

Co-authored-by: Fokko Driesprong <[email protected]>
kevinjqliu and Fokko authored Feb 21, 2025
1 parent a58f099 commit 06404a5
Showing 1 changed file with 9 additions and 2 deletions.
11 changes: 9 additions & 2 deletions pyiceberg/table/__init__.py
@@ -62,7 +62,6 @@
     manifest_evaluator,
 )
 from pyiceberg.io import FileIO, load_file_io
-from pyiceberg.io.pyarrow import ArrowScan, expression_to_pyarrow, schema_to_pyarrow
 from pyiceberg.manifest import (
     POSITIONAL_DELETE_SCHEMA,
     DataFile,
@@ -1150,6 +1149,12 @@ def upsert(
         Returns:
             An UpsertResult class (contains details of rows updated and inserted)
         """
+        try:
+            import pyarrow as pa  # noqa: F401
+        except ModuleNotFoundError as e:
+            raise ModuleNotFoundError("For writes PyArrow needs to be installed") from e
+
+        from pyiceberg.io.pyarrow import expression_to_pyarrow
         from pyiceberg.table import upsert_util
 
         if join_cols is None:
@@ -1770,7 +1775,7 @@ def to_arrow_batch_reader(self) -> pa.RecordBatchReader:
         """
         import pyarrow as pa
 
-        from pyiceberg.io.pyarrow import ArrowScan
+        from pyiceberg.io.pyarrow import ArrowScan, schema_to_pyarrow
 
         target_schema = schema_to_pyarrow(self.projection())
         batches = ArrowScan(
@@ -1828,6 +1833,8 @@ def to_polars(self) -> pl.DataFrame:
         return result
 
     def count(self) -> int:
+        from pyiceberg.io.pyarrow import ArrowScan
+
         # Usage: Calculates the total number of records in a Scan that haven't had positional deletes.
         res = 0
         # every task is a FileScanTask
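
The pattern applied in this diff, moving an optional import from module scope into the functions that need it, can be sketched in isolation. The package name `fancy_optional_dep` and both functions below are hypothetical and only illustrate the shape; they are not PyIceberg code, and the sketch assumes no package with that name is installed:

```python
def count_rows(rows):
    """Needs no optional dependency, so it works on a minimal install."""
    return len(rows)


def to_fancy(rows):
    """Import the optional dependency only when this code path runs."""
    try:
        import fancy_optional_dep  # hypothetical package, assumed not installed
    except ModuleNotFoundError as e:
        # Re-raise with a message that tells the user what to install.
        raise ModuleNotFoundError("fancy_optional_dep needs to be installed for to_fancy") from e
    return fancy_optional_dep.convert(rows)


rows = [1, 2, 3]
total = count_rows(rows)  # succeeds without the optional package

try:
    to_fancy(rows)
    error_message = None
except ModuleNotFoundError as e:
    error_message = str(e)
```

Because the import runs at call time, importing the module itself never touches the optional package; only callers of `to_fancy` see the error, and they see it with an actionable message.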