Releases: aws/aws-sdk-pandas
AWS Data Wrangler 2.9.0
Caveats
⚠️ For platforms without PyArrow 4 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
Documentation
- Added S3 Select tutorial #748
- Clarified `wr.s3.to_csv` docs #730
Enhancements
- Enable server-side predicate filtering using S3 Select 🚀 #678
- Support `VersionId` parameter for S3 read operations #721
- Enable prefix in output S3 files for `wr.redshift.unload_to_files` #729
- Add option to skip commit on `wr.redshift.to_sql` #705
- Move integration test infrastructure to CDK 🎉 #706
Bug Fix
- Wait until Athena query results bucket is created #735
- Remove explicit Excel engine configuration #742
- Fix bucketing types #719
- Change end_time to UTC #720
Thanks
We thank the following contributors/users for their work on this release:
@maxispeicher, @kukushking, @jaidisido
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload them and run, or use them from our public S3 bucket!
AWS Data Wrangler 2.8.0
Caveats
⚠️ For platforms without PyArrow 4 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
Documentation
- Install Lambda Layers and Python wheels from public S3 bucket 🎉 #666
- Clarified docs around potential in-place mutation of dataframe when using `to_parquet` #669
Enhancements
- Enable parallel S3 downloads (~20% speedup) 🚀 #644
- Apache Arrow 4.0.0 support (enables ARM instance support as well) #557
- Enable `LOCK` before concurrent `COPY` calls in Redshift #665
- Make use of PyArrow `iter_batches` (>= 3.0.0 only) #660
- Enable additional options when overwriting Redshift table (`drop`, `truncate`, `cascade`) #671
- Reuse S3 client across threads for S3 range requests #684
Bug Fix
- Add `dtypes` for empty CTAS Athena queries #659
- Add Serde properties when creating CSV table #672
- Pass SSL properties from Glue Connection to MySQL #554
Thanks
We thank the following contributors/users for their work on this release:
@maxispeicher, @kukushking, @igorborgest, @gballardin, @eferm, @jaklan, @Falydoor, @chariottrider, @chriscugliotta, @konradsemsch, @gvermillion, @russellbrooks, @mshober.
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload them and run, or use them from our public S3 bucket!
AWS Data Wrangler 2.7.0
Caveats
⚠️ For platforms without PyArrow 3 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
Documentation
- Updated documentation to clarify `wr.athena.read_sql_query` `params` argument use #609
New Functionalities
- Supporting MySQL upserts #608
- Enable prepending S3 parquet files with a prefix in `wr.s3.write.to_parquet` #617
- Add `exist_ok` flag to safely create a Glue database #642
- Add "Unsupported Pyarrow type" exception #639
Bug Fix
- Fix `chunked` mode in `wr.s3.read_parquet_table` #627
- Fix missing `\` character from `wr.s3.read_parquet_table` method #638
- Support `postgres` as an engine value #630
- Add default workgroup result configuration #633
- Raise exception when `merge_upsert_table` fails or data quality is insufficient #601
- Fix nested structure bug in `athena2pyarrow` method #612
Thanks
We thank the following contributors/users for their work on this release:
@maxispeicher, @igorborgest, @mattboyd-aws, @vlieven, @bentkibler, @adarsh-chauhan, @impredicative, @nmduarteus, @JoshCrosby, @TakumiHaruta, @zdk123, @tuannguyen0901, @jiteshsoni, @luminita.
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload it and run!
AWS Data Wrangler 2.6.0
Caveats
⚠️ For platforms without PyArrow 3 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
Enhancements
- Added a `chunksize` parameter to the `to_sql` function (default 200), decreasing insertion time from 120 seconds to 1 second #599
- `path` argument is now optional in `s3.to_parquet` and `s3.to_csv` functions #586
- Added a `map_types` boolean (True by default) to convert PyArrow DataTypes to pandas ExtensionDtypes #580
- Added optional `ctas_database_name` argument to store `ctas_temporary_table` in an alternative database #576
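The batching that the new `chunksize` parameter controls can be illustrated locally. The helper below is a hypothetical sketch of splitting a dataframe into row batches before insertion; it is not awswrangler's internal implementation:

```python
import pandas as pd

def iter_chunks(df: pd.DataFrame, chunksize: int = 200):
    """Yield successive row batches, mirroring how a chunked
    to_sql call groups rows into multi-row INSERT statements."""
    for start in range(0, len(df), chunksize):
        yield df.iloc[start:start + chunksize]

df = pd.DataFrame({"id": range(500)})
batches = list(iter_chunks(df, chunksize=200))
# 500 rows split into batches of 200, 200 and 100 rows
```

Sending one multi-row statement per batch instead of one statement per row is what drives the reported speedup.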
Thanks
We thank the following contributors/users for their work on this release:
@maxispeicher, @igorborgest, @ilyanoskov, @VashMKS, @jmahlik, @dimapod, @Reeska
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload it and run!
AWS Data Wrangler 2.5.0
Caveats
⚠️ For platforms without PyArrow 3 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
Documentation
- New HTML tutorials #551
- Use bump2version for changing version numbers #573
- Mishandling of wildcard characters in `read_parquet` #564
Enhancements
- Support for `ExpectedBucketOwner` #562
Thanks
We thank the following contributors/users for their work on this release:
@maxispeicher, @impredicative, @adarsh-chauhan, @Malkard.
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload it and run!
AWS Data Wrangler 2.4.0 (Docs updated)
Caveats
⚠️ For platforms without PyArrow 3 support (e.g. EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
New Functionalities
- Redshift COPY now supports the new SUPER type (i.e. SERIALIZETOJSON) #514
- S3 Upload/download files #506
- Include dataset BUCKETING for s3 datasets writing #443
- Enable Merge Upsert for existing Glue Tables on Primary Keys #503
- Support Requester Pays S3 Buckets #430
- Add botocore Config to wr.config #535
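The merge-upsert semantics on primary keys can be sketched with plain pandas. This is a hypothetical local illustration of the upsert concept (delta rows replace existing rows with the same key, new keys are inserted), not the Glue-backed implementation:

```python
import pandas as pd

def merge_upsert(existing: pd.DataFrame, delta: pd.DataFrame, key: str) -> pd.DataFrame:
    """Rows in delta replace existing rows with the same key (update);
    delta rows with unmatched keys are appended (insert)."""
    merged = pd.concat([existing, delta])
    # keep="last" so the delta row wins when primary keys collide
    return merged.drop_duplicates(subset=[key], keep="last").reset_index(drop=True)

existing = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})
delta = pd.DataFrame({"id": [2, 3], "value": ["B", "c"]})
result = merge_upsert(existing, delta, key="id")
# result holds ids 1, 2, 3, with id 2 updated to "B"
```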
Enhancements
- Pandas 1.2.1 support #525
- NumPy 1.20.0 support
- Apache Arrow 3.0.0 support #531
- Python 3.9 support #454
Bug Fix
- Return DataFrame with unique index for Athena CTAS queries #527
- Remove unnecessary schema inference #524
Thanks
We thank the following contributors/users for their work on this release:
@maxispeicher, @danielwo, @jiteshsoni, @igorborgest, @njdanielsen, @eric-valente, @gvermillion, @zseder, @gdbassett, @orenmazor, @senorkrabs, @Natalie-Caruana, @dragonH, @nikwerhypoport, @hwangji.
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload it and run!
AWS Data Wrangler 2.4.0
New Functionalities
- Redshift COPY now supports the new SUPER type (i.e. SERIALIZETOJSON) #514
- S3 Upload/download files #506
- Include dataset BUCKETING for s3 datasets writing #443
- Enable Merge Upsert for existing Glue Tables on Primary Keys #503
- Support Requester Pays S3 Buckets #430
- Add botocore Config to wr.config #535
Enhancements
- Pandas 1.2.1 support #525
- NumPy 1.20.0 support
- Apache Arrow 3.0.0 support #531
- Python 3.9 support #454
Bug Fix
- Return DataFrame with unique index for Athena CTAS queries #527
- Remove unnecessary schema inference #524
Thanks
We thank the following contributors/users for their work on this release:
@maxispeicher, @danielwo, @jiteshsoni, @igorborgest, @njdanielsen, @eric-valente, @gvermillion, @zseder, @gdbassett, @orenmazor, @senorkrabs, @Natalie-Caruana.
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload it and run!
AWS Data Wrangler 2.3.0
New Functionalities
- DynamoDB support #448
- SQLServer support (Driver must be installed separately) #356
- Excel files support #419 #509
- Amazon S3 Access Point support #393
- Amazon Chime initial support #494
- Write compressed CSV and JSON files on S3 #308 #359 #412
Enhancements
- Add query parameters for Athena #432
- Add metadata caching for Athena #461
- Add suffix filters for `s3.read_parquet_table()` #495
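The effect of a suffix filter can be sketched locally. The function below is a hypothetical stand-in for the listing step, operating on a plain list of object keys rather than a live bucket:

```python
def filter_by_suffix(keys, suffix=".parquet"):
    """Keep only object keys ending with the given suffix,
    skipping marker files and checksums in the same prefix."""
    return [k for k in keys if k.endswith(suffix)]

keys = [
    "data/part-000.parquet",
    "data/part-001.parquet",
    "data/_SUCCESS",
    "data/part-000.parquet.crc",
]
selected = filter_by_suffix(keys, suffix=".parquet")
# only the two part-*.parquet files remain
```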
Bug Fix
- Fix `keep_files` behavior for failed Redshift COPY executions #505
Thanks
We thank the following contributors/users for their work on this release:
@maxispeicher, @danielwo, @jiteshsoni, @gvermillion, @rodalarcon, @imanebosch, @dwbelliston, @tochandrashekhar, @kylepierce, @njdanielsen, @jasadams, @gtossou, @JasonSanchez, @kokes, @hanan-vian, @igorborgest.
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload it and run!
AWS Data Wrangler 2.2.0
New Functionalities
- Add `aws_access_key_id`, `aws_secret_access_key`, `aws_session_token` and `boto3_session` for Redshift copy/unload #484
Bug Fix
- Remove dtype print statement #487
Thanks
We thank the following contributors/users for their work on this release:
@danielwo, @thetimbecker, @njdanielsen, @igorborgest.
P.S. Lambda Layer zip file and Glue wheel/egg files are available below. Just upload it and run!
AWS Data Wrangler 2.1.0
New Functionalities
- Add secretmanager module and support for database connections #402

con = wr.redshift.connect(secret_id="my-secret", dbname="my-db")
df = wr.redshift.read_sql_query("SELECT ...", con=con)
con.close()

Bug Fix
- Fix connection attributes quoting for `wr.*.connect()` #481
- Fix parquet table append for nested struct columns #480
Thanks
We thank the following contributors/users for their work on this release:
@danielwo, @nmduarteus, @nivf33, @kinghuang, @igorborgest.
P.S. Lambda Layer zip file and Glue wheel/egg files are available below. Just upload it and run!