Releases: aws/aws-sdk-pandas
AWS Data Wrangler 2.9.0
Caveats
⚠️ For platforms without PyArrow 4 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
Documentation
- Added S3 Select tutorial #748
- Clarified `wr.s3.to_csv` docs #730
Enhancements
- Enable server-side predicate filtering using S3 Select 🚀 #678
- Support `VersionId` parameter for S3 read operations #721
- Enable prefix in output S3 files for `wr.redshift.unload_to_files` #729
- Add option to skip commit on `wr.redshift.to_sql` #705
- Move integration test infrastructure to CDK 🎉 #706
Bug Fix
- Wait until Athena query results bucket is created #735
- Remove explicit Excel engine configuration #742
- Fix bucketing types #719
- Change end_time to UTC #720
Thanks
We thank the following contributors/users for their work on this release:
@maxispeicher, @kukushking, @jaidisido
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload them and run, or use them from our public S3 bucket!
AWS Data Wrangler 2.8.0
Caveats
⚠️ For platforms without PyArrow 4 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
Documentation
- Install Lambda Layers and Python wheels from public S3 bucket 🎉 #666
- Clarified docs around potential in-place mutation of dataframe when using `to_parquet` #669
Enhancements
- Enable parallel S3 downloads (~20% speedup) 🚀 #644
- Apache Arrow 4.0.0 support (enables ARM instance support as well) #557
- Enable `LOCK` before concurrent `COPY` calls in Redshift #665
- Make use of PyArrow `iter_batches` (>= 3.0.0 only) #660
- Enable additional options when overwriting Redshift table (`drop`, `truncate`, `cascade`) #671
- Reuse S3 client across threads for S3 range requests #684
Bug Fix
- Add `dtypes` for empty CTAS Athena queries #659
- Add Serde properties when creating CSV table #672
- Pass SSL properties from Glue Connection to MySQL #554
Thanks
We thank the following contributors/users for their work on this release:
@maxispeicher, @kukushking, @igorborgest, @gballardin, @eferm, @jaklan, @Falydoor, @chariottrider, @chriscugliotta, @konradsemsch, @gvermillion, @russellbrooks, @mshober.
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload them and run, or use them from our public S3 bucket!
AWS Data Wrangler 2.7.0
Caveats
⚠️ For platforms without PyArrow 3 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
Documentation
- Updated documentation to clarify `wr.athena.read_sql_query` `params` argument use #609
New Functionalities
- Supporting MySQL upserts #608
- Enable prepending S3 parquet files with a prefix in `wr.s3.write.to_parquet` #617
- Add `exist_ok` flag to safely create a Glue database #642
- Add "Unsupported Pyarrow type" exception #639
Bug Fix
- Fix `chunked` mode in `wr.s3.read_parquet_table` #627
- Fix missing `\` character from `wr.s3.read_parquet_table` method #638
- Support `postgres` as an engine value #630
- Add default workgroup result configuration #633
- Raise exception when `merge_upsert_table` fails or data quality is insufficient #601
- Fix nested structure bug in `athena2pyarrow` method #612
Thanks
We thank the following contributors/users for their work on this release:
@maxispeicher, @igorborgest, @mattboyd-aws, @vlieven, @bentkibler, @adarsh-chauhan, @impredicative, @nmduarteus, @JoshCrosby, @TakumiHaruta, @zdk123, @tuannguyen0901, @jiteshsoni, @luminita.
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload it and run!
AWS Data Wrangler 2.6.0
Caveats
⚠️ For platforms without PyArrow 3 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
Enhancements
- Added a `chunksize` parameter to the `to_sql` function (default 200), decreasing insertion time from 120 seconds to 1 second #599
- `path` argument is now optional in `s3.to_parquet` and `s3.to_csv` functions #586
- Added a `map_types` boolean (True by default) to convert PyArrow DataTypes to pandas ExtensionDtypes #580
- Added optional `ctas_database_name` argument to store `ctas_temporary_table` in an alternative database #576
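The batching that the new `chunksize` parameter controls can be illustrated locally. The helper below is a hypothetical sketch of splitting a dataframe into row batches before insertion; it is not awswrangler's internal implementation:

```python
import pandas as pd

def iter_chunks(df: pd.DataFrame, chunksize: int = 200):
    """Yield successive row batches, mirroring how a chunked
    to_sql call groups rows into multi-row INSERT statements."""
    for start in range(0, len(df), chunksize):
        yield df.iloc[start:start + chunksize]

df = pd.DataFrame({"id": range(500)})
batches = list(iter_chunks(df, chunksize=200))
# 500 rows split into batches of 200, 200 and 100 rows
```

Sending one multi-row statement per batch instead of one statement per row is what drives the reported speedup.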
Thanks
We thank the following contributors/users for their work on this release:
@maxispeicher, @igorborgest, @ilyanoskov, @VashMKS, @jmahlik, @dimapod, @Reeska
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload it and run!
AWS Data Wrangler 2.5.0
Caveats
⚠️ For platforms without PyArrow 3 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
Documentation
- New HTML tutorials #551
- Use bump2version for changing version numbers #573
- Mishandling of wildcard characters in `read_parquet` #564
Enhancements
- Support for `ExpectedBucketOwner` #562
Thanks
We thank the following contributors/users for their work on this release:
@maxispeicher, @impredicative, @adarsh-chauhan, @Malkard.
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload it and run!
AWS Data Wrangler 2.4.0 (Docs updated)
Caveats
⚠️ For platforms without PyArrow 3 support (e.g. EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
New Functionalities
- Redshift COPY now supports the new SUPER type (i.e. SERIALIZETOJSON) #514
- S3 Upload/download files #506
- Include dataset BUCKETING for s3 datasets writing #443
- Enable Merge Upsert for existing Glue Tables on Primary Keys #503
- Support Requester Pays S3 Buckets #430
- Add botocore Config to wr.config #535
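The merge-upsert semantics on primary keys can be sketched with plain pandas. This is a hypothetical local illustration of the upsert concept (delta rows replace existing rows with the same key, new keys are inserted), not the Glue-backed implementation:

```python
import pandas as pd

def merge_upsert(existing: pd.DataFrame, delta: pd.DataFrame, key: str) -> pd.DataFrame:
    """Rows in delta replace existing rows with the same key (update);
    delta rows with unmatched keys are appended (insert)."""
    merged = pd.concat([existing, delta])
    # keep="last" so the delta row wins when primary keys collide
    return merged.drop_duplicates(subset=[key], keep="last").reset_index(drop=True)

existing = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})
delta = pd.DataFrame({"id": [2, 3], "value": ["B", "c"]})
result = merge_upsert(existing, delta, key="id")
# result holds ids 1, 2, 3, with id 2 updated to "B"
```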
Enhancements
- Pandas 1.2.1 support #525
- NumPy 1.20.0 support
- Apache Arrow 3.0.0 support #531
- Python 3.9 support #454
Bug Fix
- Return DataFrame with unique index for Athena CTAS queries #527
- Remove unnecessary schema inference #524
Thanks
We thank the following contributors/users for their work on this release:
@maxispeicher, @danielwo, @jiteshsoni, @igorborgest, @njdanielsen, @eric-valente, @gvermillion, @zseder, @gdbassett, @orenmazor, @senorkrabs, @Natalie-Caruana, @dragonH, @nikwerhypoport, @hwangji.
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload it and run!
AWS Data Wrangler 2.4.0
New Functionalities
- Redshift COPY now supports the new SUPER type (i.e. SERIALIZETOJSON) #514
- S3 Upload/download files #506
- Include dataset BUCKETING for s3 datasets writing #443
- Enable Merge Upsert for existing Glue Tables on Primary Keys #503
- Support Requester Pays S3 Buckets #430
- Add botocore Config to wr.config #535
Enhancements
- Pandas 1.2.1 support #525
- NumPy 1.20.0 support
- Apache Arrow 3.0.0 support #531
- Python 3.9 support #454
Bug Fix
- Return DataFrame with unique index for Athena CTAS queries #527
- Remove unnecessary schema inference #524
Thanks
We thank the following contributors/users for their work on this release:
@maxispeicher, @danielwo, @jiteshsoni, @igorborgest, @njdanielsen, @eric-valente, @gvermillion, @zseder, @gdbassett, @orenmazor, @senorkrabs, @Natalie-Caruana.
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload it and run!
AWS Data Wrangler 2.3.0
New Functionalities
- DynamoDB support #448
- SQLServer support (Driver must be installed separately) #356
- Excel files support #419 #509
- Amazon S3 Access Point support #393
- Amazon Chime initial support #494
- Write compressed CSV and JSON files on S3 #308 #359 #412
Enhancements
- Add query parameters for Athena #432
- Add metadata caching for Athena #461
- Add suffix filters for `s3.read_parquet_table()` #495
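The effect of a suffix filter can be sketched locally. The function below is a hypothetical stand-in for the listing step, operating on a plain list of object keys rather than a live bucket:

```python
def filter_by_suffix(keys, suffix=".parquet"):
    """Keep only object keys ending with the given suffix,
    skipping marker files and checksums in the same prefix."""
    return [k for k in keys if k.endswith(suffix)]

keys = [
    "data/part-000.parquet",
    "data/part-001.parquet",
    "data/_SUCCESS",
    "data/part-000.parquet.crc",
]
selected = filter_by_suffix(keys, suffix=".parquet")
# only the two part-*.parquet files remain
```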
Bug Fix
- Fix `keep_files` behavior for failed Redshift COPY executions #505
Thanks
We thank the following contributors/users for their work on this release:
@maxispeicher, @danielwo, @jiteshsoni, @gvermillion, @rodalarcon, @imanebosch, @dwbelliston, @tochandrashekhar, @kylepierce, @njdanielsen, @jasadams, @gtossou, @JasonSanchez, @kokes, @hanan-vian, @igorborgest.
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload it and run!
AWS Data Wrangler 2.2.0
New Functionalities
- Add `aws_access_key_id`, `aws_secret_access_key`, `aws_session_token` and `boto3_session` for Redshift copy/unload #484
Bug Fix
- Remove dtype print statement #487
Thanks
We thank the following contributors/users for their work on this release:
@danielwo, @thetimbecker, @njdanielsen, @igorborgest.
P.S. Lambda Layer zip file and Glue wheel/egg files are available below. Just upload it and run!
AWS Data Wrangler 2.1.0
New Functionalities
- Add secretmanager module and support for database connections #402

con = wr.redshift.connect(secret_id="my-secret", dbname="my-db")
df = wr.redshift.read_sql_query("SELECT ...", con=con)
con.close()

Bug Fix
- Fix connection attributes quoting for `wr.*.connect()` #481
- Fix parquet table append for nested struct columns #480
Thanks
We thank the following contributors/users for their work on this release:
@danielwo, @nmduarteus, @nivf33, @kinghuang, @igorborgest.
P.S. Lambda Layer zip file and Glue wheel/egg files are available below. Just upload it and run!