A comprehensive toolkit for exporting and managing user data from AWS Cognito User Pools.
- 📤 Robust Cognito Data Export: Export user attributes from AWS Cognito User Pools to CSV format
- 🔄 Exponential Backoff with Jitter: Automatically handles AWS API rate limits with intelligent retry logic
- ⏸️ Checkpoint & Resume: Save progress during exports and resume from where you left off
- 🧹 CSV Deduplication: Remove duplicate user entries from exported CSV files
- 🔍 Flexible Attribute Selection: Export specific attributes or discover and export all available attributes
- 📃 Pagination Support: Efficiently handles large user pools with proper pagination
- Python 3.10+
- AWS credentials configured (via environment variables, a credentials file, or an IAM role)
- Required Python packages:
  - boto3
  - botocore
- Clone this repository:
  git clone https://github.com/tblakex01/Cognito-Attribute-Exporter.git
  cd Cognito-Attribute-Exporter
- Install dependencies:
  pip install boto3
The main export tool supports various options for exporting user data from Cognito:
python cognito_exporter.py --user-pool-id YOUR_POOL_ID --export-all
- --user-pool-id: Your Cognito User Pool ID (required)
- --export-all: Export all available attributes
- -attr, --export-attributes: List specific attributes to export
- --region: AWS region (default: us-east-1)
- --profile: AWS profile to use
- -f, --file-name: Output CSV filename
- --max-retries: Maximum retry attempts for rate-limited requests
- --resume: Resume from last saved checkpoint
📋 Export all attributes:
python cognito_exporter.py --user-pool-id us-east-1_abcdefghi --export-all
📋 Export specific attributes:
python cognito_exporter.py --user-pool-id us-east-1_abcdefghi --export-attributes username email phone_number
📋 Resume an interrupted export:
python cognito_exporter.py --user-pool-id us-east-1_abcdefghi --export-all --resume
📋 Custom retry settings:
python cognito_exporter.py --user-pool-id us-east-1_abcdefghi --export-all --max-retries 10 --base-delay 1.0
The deduplication tool helps remove duplicate entries from exported CSV files:
python cognito_csv_deduplicator.py CognitoUsers.csv
- input_file: Path to the CSV file to deduplicate (required)
- -o, --output-file: Custom output file path
- -k, --keys: Column names to use as unique keys (default: sub)
- --keep-last: Keep the last occurrence of duplicates instead of the first
- --dry-run: Check for duplicates without modifying files
📝 Basic deduplication:
python cognito_csv_deduplicator.py CognitoUsers.csv
📝 Custom key fields:
python cognito_csv_deduplicator.py CognitoUsers.csv -k username email
📝 Check for duplicates without making changes:
python cognito_csv_deduplicator.py CognitoUsers.csv --dry-run
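The key-based deduplication described above can be sketched as follows. This is a minimal illustration, not the code in cognito_csv_deduplicator.py: the function names `deduplicate_rows` and `deduplicate_csv` are hypothetical, and only the option semantics (key columns defaulting to sub, keep-first versus keep-last, dry run) are taken from the option list.

```python
import csv


def deduplicate_rows(rows, keys=("sub",), keep_last=False):
    """Return (unique_rows, duplicate_count) for an iterable of dict rows.

    Hypothetical helper: rows with the same values in `keys` count as duplicates.
    """
    chosen = {}
    duplicates = 0
    for row in rows:
        key = tuple(row.get(k, "") for k in keys)
        if key in chosen:
            duplicates += 1
            if keep_last:
                # Replace the stored row; it keeps the first occurrence's position.
                chosen[key] = row
        else:
            chosen[key] = row
    return list(chosen.values()), duplicates


def deduplicate_csv(input_path, output_path, keys=("sub",),
                    keep_last=False, dry_run=False):
    """Deduplicate a CSV file; with dry_run=True only count duplicates."""
    with open(input_path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        fieldnames = reader.fieldnames or []
        unique, duplicates = deduplicate_rows(reader, keys, keep_last)
    if not dry_run:
        with open(output_path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(unique)
    return duplicates
```

In this sketch a dry run reads the input and reports the duplicate count without writing anything, which mirrors the documented behavior of --dry-run.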
The Cognito Exporter includes built-in features to handle AWS API rate limits:
- 📈 Exponential Backoff: Automatically increases wait time between retries
- 🎲 Jitter: Adds randomness to retry intervals to prevent synchronized retries
- ⚙️ Configurable Retry Parameters: Customize max retries and delay settings
- 🛡️ Built-in Rate Limiting: Adds small delays between API calls to reduce throttling
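The retry behavior listed above can be illustrated with a short sketch of exponential backoff with full jitter. The names here (`ThrottledError`, `backoff_delay`, `call_with_retries`) and the 30-second cap are assumptions for illustration; the exporter's own implementation and defaults may differ.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for an AWS throttling error."""


def backoff_delay(attempt, base_delay=1.0, max_delay=30.0, rng=random.random):
    """Return a full-jitter delay in seconds for a zero-based attempt number.

    The ceiling doubles with each attempt (capped at max_delay), and the
    actual delay is a random fraction of that ceiling.
    """
    ceiling = min(max_delay, base_delay * (2 ** attempt))
    return rng() * ceiling


def call_with_retries(func, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on ThrottledError up to max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except ThrottledError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the caller
            sleep(backoff_delay(attempt, base_delay))
```

Randomizing the whole delay, rather than adding a small random offset, spreads retries from concurrent clients more evenly, at the cost of occasionally retrying almost immediately.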
The export process automatically saves checkpoints to allow resuming interrupted exports:
- Checkpoints are saved every 10 pages or 500 records
- Use the --resume flag to continue from the last checkpoint
- Checkpoint files are saved with the .checkpoint extension
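A paginated export loop with checkpointing might look like the sketch below. It assumes a client object exposing the Cognito `list_users` call (with its `Limit` and `PaginationToken` parameters); the checkpoint layout, the function names, the `handle_users` callback, and the choice to delete the checkpoint after a successful run are all hypothetical and not taken from the tool itself.

```python
import json
import os


def fresh_state():
    """State for an export that starts from the first page."""
    return {"pagination_token": None, "records_exported": 0}


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    return fresh_state()


def save_checkpoint(path, token, records_exported):
    """Write the state to a temp file, then rename it into place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump({"pagination_token": token,
                   "records_exported": records_exported}, fh)
    os.replace(tmp_path, path)


def export_pages(client, user_pool_id, checkpoint_path, handle_users,
                 resume=False, pages_per_checkpoint=10,
                 records_per_checkpoint=500):
    """Page through list_users, checkpointing every N pages or M records."""
    state = load_checkpoint(checkpoint_path) if resume else fresh_state()
    token = state["pagination_token"]
    total = state["records_exported"]
    last_saved, pages = total, 0
    while True:
        kwargs = {"UserPoolId": user_pool_id, "Limit": 60}
        if token:
            kwargs["PaginationToken"] = token
        response = client.list_users(**kwargs)
        handle_users(response["Users"])  # e.g. append rows to the CSV
        total += len(response["Users"])
        pages += 1
        token = response.get("PaginationToken")
        if not token:
            break  # no token means this was the last page
        if (pages % pages_per_checkpoint == 0
                or total - last_saved >= records_per_checkpoint):
            save_checkpoint(checkpoint_path, token, total)
            last_saved = total
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)  # assumption: clean up once finished
    return total
```

In this sketch, an interruption between two checkpoints means the pages fetched since the last save are fetched again on resume, so the resumed output can contain repeated rows.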
When using --export-all, the tool automatically:
- Samples users to discover all available attributes
- Includes both standard and custom attributes
- Falls back to common attributes if no users are found
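The discovery steps above can be sketched as a function over a sample of user records shaped like the entries in a Cognito `list_users` response, where each user carries an `Attributes` list of `Name`/`Value` pairs. The function name and the contents of the fallback list are assumptions; the README does not say which common attributes the tool falls back to.

```python
# Assumed fallback list; the tool's actual "common attributes" are not documented here.
FALLBACK_ATTRIBUTES = ["sub", "email", "email_verified", "phone_number"]


def discover_attributes(users, fallback=FALLBACK_ATTRIBUTES):
    """Collect attribute names from sampled users, in first-seen order.

    Custom attributes (prefixed "custom:" in Cognito) are picked up the
    same way as standard ones. Returns a copy of `fallback` when the
    sample yields no attribute names.
    """
    names = []
    seen = set()
    for user in users:
        for attribute in user.get("Attributes", []):
            name = attribute["Name"]
            if name not in seen:
                seen.add(name)
                names.append(name)
    return names if names else list(fallback)
```

Because the result depends on which users are sampled, an attribute set on only a few users can be missed; a larger sample reduces that risk at the cost of more API calls.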
- Rate Limiting Errors: Try increasing --base-delay and --max-retries
- Memory Issues: Export specific attributes instead of all attributes
- CSV Parsing Problems: Ensure the CSV is properly encoded (UTF-8)
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
Made with ❤️ for AWS Cognito users by Anthony Michaels.