
Commit 06404a5

kevinjqliu and Fokko authored
Remove top-level import of pyarrow (#1703)
The [release candidate artifact build environment](https://github.com/apache/iceberg-python/blob/a58f099aa501f6fd4345a331295d81fe0133554f/.github/workflows/pypi-build-artifacts.yml#L72-L74) does not automatically install `pyarrow`, so any import that requires `pyarrow` fails.

See run https://github.com/apache/iceberg-python/actions/runs/13464626812/job/37627644985

The import is triggered via `conftest`:

```
ImportError while loading conftest '/project/tests/conftest.py'.
/project/tests/conftest.py:52: in <module>
    from pyiceberg.catalog import Catalog, load_catalog
../venv/lib/python3.9/site-packages/pyiceberg/catalog/__init__.py:51: in <module>
    from pyiceberg.serializers import ToOutputFile
../venv/lib/python3.9/site-packages/pyiceberg/serializers.py:25: in <module>
    from pyiceberg.table.metadata import TableMetadata, TableMetadataUtil
../venv/lib/python3.9/site-packages/pyiceberg/table/__init__.py:65: in <module>
    from pyiceberg.io.pyarrow import ArrowScan, expression_to_pyarrow, schema_to_pyarrow
../venv/lib/python3.9/site-packages/pyiceberg/io/pyarrow.py:62: in <module>
    import pyarrow as pa
E   ModuleNotFoundError: No module named 'pyarrow'
```

This isn't caught in CI, since we install all extra dependencies by default, including `pyarrow`.

Tested in the release candidate build action on my fork: https://github.com/kevinjqliu/iceberg-python/actions/runs/13465085426 ✅

cc @geruh

---------

Co-authored-by: Fokko Driesprong <[email protected]>
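CI never hit this failure because every extra, `pyarrow` included, is installed there. One way to guard against a regression is to import the package in a fresh interpreter where the optional dependency is hidden. The sketch below uses a hypothetical helper, `imports_without`, which is not part of PyIceberg or its test suite. It hides a module by putting `None` in `sys.modules`, which makes any later import of that name raise `ModuleNotFoundError`.

```python
import subprocess
import sys


def imports_without(module: str, blocked: str) -> bool:
    """Return True if `module` imports cleanly in a fresh interpreter
    where `blocked` behaves as if it were not installed.

    Hypothetical helper for illustration; not part of PyIceberg.
    """
    # A None entry in sys.modules makes `import blocked` (and any
    # `import blocked.submodule`) raise ModuleNotFoundError.
    code = (
        "import importlib, sys\n"
        f"sys.modules[{blocked!r}] = None\n"
        f"importlib.import_module({module!r})\n"
    )
    # A subprocess avoids modules already cached in the current interpreter.
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

In an environment with PyIceberg installed, `imports_without("pyiceberg.table", "pyarrow")` would be expected to return `True` after this commit. A `False` result can also mean PyIceberg itself is missing, so the check is only meaningful where the package is installed.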
1 parent a58f099 commit 06404a5


1 file changed, +9 -2 lines changed


pyiceberg/table/__init__.py

```diff
@@ -62,7 +62,6 @@
     manifest_evaluator,
 )
 from pyiceberg.io import FileIO, load_file_io
-from pyiceberg.io.pyarrow import ArrowScan, expression_to_pyarrow, schema_to_pyarrow
 from pyiceberg.manifest import (
     POSITIONAL_DELETE_SCHEMA,
     DataFile,
```
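Removing this module-level import is what breaks the chain in the traceback. A top-level `import` runs as soon as `pyiceberg.table` is loaded, while an import inside a function runs only when that function is called. Here is a minimal, self-contained sketch of the difference. It assumes a made-up dependency name, `fake_optional_dep`, which is marked as missing through `sys.modules`.

```python
import sys
import types

# "fake_optional_dep" is a made-up name. A None entry in sys.modules makes
# `import fake_optional_dep` raise ModuleNotFoundError, as if not installed.
sys.modules["fake_optional_dep"] = None

# Eager: the import statement executes when the module body runs.
EAGER_SRC = """
import fake_optional_dep

def scan():
    return fake_optional_dep.read()
"""

# Lazy: the import statement executes only when scan() is called.
LAZY_SRC = """
def scan():
    import fake_optional_dep
    return fake_optional_dep.read()
"""


def load(name: str, source: str) -> types.ModuleType:
    """Build a module object from source text, running its body once."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module
```

`load("eager", EAGER_SRC)` raises `ModuleNotFoundError` immediately. `load("lazy", LAZY_SRC)` succeeds, and the error is deferred until `scan()` is called. This commit relies on that deferral in `upsert`, `to_arrow_batch_reader`, and `count`.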
```diff
@@ -1150,6 +1149,12 @@ def upsert(
         Returns:
             An UpsertResult class (contains details of rows updated and inserted)
         """
+        try:
+            import pyarrow as pa  # noqa: F401
+        except ModuleNotFoundError as e:
+            raise ModuleNotFoundError("For writes PyArrow needs to be installed") from e
+
+        from pyiceberg.io.pyarrow import expression_to_pyarrow
         from pyiceberg.table import upsert_util
 
         if join_cols is None:
```
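The guard in `upsert` catches the `ModuleNotFoundError` and re-raises it with a message that names the feature needing PyArrow. Using `raise ... from e` keeps the original error as `__cause__`. The same pattern could be factored into a reusable helper. `require_optional` below is hypothetical, not a PyIceberg function, and is only a sketch of that idea.

```python
import importlib
from types import ModuleType


def require_optional(name: str, purpose: str) -> ModuleType:
    """Import `name`, or fail with a message saying which feature needs it.

    Hypothetical helper mirroring the try/except guard in upsert().
    """
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError as e:
        # `from e` chains the original error so its details stay visible.
        raise ModuleNotFoundError(f"For {purpose} {name} needs to be installed") from e
```

The trade-off is one extra indirection in exchange for consistent error messages across every feature that depends on an optional extra.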
```diff
@@ -1770,7 +1775,7 @@ def to_arrow_batch_reader(self) -> pa.RecordBatchReader:
         """
         import pyarrow as pa
 
-        from pyiceberg.io.pyarrow import ArrowScan
+        from pyiceberg.io.pyarrow import ArrowScan, schema_to_pyarrow
 
         target_schema = schema_to_pyarrow(self.projection())
         batches = ArrowScan(
```
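The `to_arrow_batch_reader` change is needed because `schema_to_pyarrow` used to be resolved through the module-level import that this commit removes. A name imported inside one function is bound only in that function's local scope. Any other function that uses the name must import it as well. Otherwise it fails with a `NameError` at call time instead of an import error at load time. The stdlib-only illustration below uses `fractions.Fraction` as a stand-in for `schema_to_pyarrow`.

```python
def half_with_import() -> str:
    # Fraction is bound only in this function's local namespace.
    from fractions import Fraction

    return str(Fraction(1, 2))


def half_without_import() -> str:
    # Nothing binds Fraction here or at module level, so the lookup fails
    # with NameError when the function runs, not when the module loads.
    return str(Fraction(1, 2))  # noqa: F821
```

Linters such as flake8 report the second function as F821 (undefined name). That is one way to catch a missed local import before a code path exercises it.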
```diff
@@ -1828,6 +1833,8 @@ def to_polars(self) -> pl.DataFrame:
         return result
 
     def count(self) -> int:
+        from pyiceberg.io.pyarrow import ArrowScan
+
         # Usage: Calculates the total number of records in a Scan that haven't had positional deletes.
         res = 0
         # every task is a FileScanTask
```
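`count` now imports `ArrowScan` on every call. In CPython, only the first import of a module executes its body. Later imports find the module already cached in `sys.modules` and just bind the name, so the per-call overhead should be small. The stdlib sketch below uses `json` as a stand-in for `pyiceberg.io.pyarrow`.

```python
import sys


def load_json_module():
    # The first import may execute json's module body; later calls find the
    # cached module in sys.modules and only bind the local name.
    import json

    return json


first = load_json_module()
second = load_json_module()
# Both calls return the same module object held in the import cache.
same_object = first is second
cached = sys.modules["json"] is first
```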
