AWS Glue Reusable Module Project

Overview

This project demonstrates how to build a reusable module and share it across jobs in AWS Glue. Its main components are:

Reusable Glue Modules: Glue jobs are built from modular code that can be reused across different data transformation tasks.
Data Transformation: The Glue jobs read data from S3, apply transformations (such as converting product names to uppercase or lowercase), and write the transformed data back to S3.
Infrastructure as Code: Terraform defines and provisions the required AWS infrastructure (IAM roles, S3 buckets, and Glue jobs), so the setup is reproducible.
Utility Functions: Utility functions for data manipulation are shared between Glue jobs instead of being duplicated in each one.
Testing: Unit tests validate the utility functions, so the transformation logic can be checked without running a Glue job.
Development Environment: The requirements-dev.txt file lists the dependencies needed for development and testing.
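
As an illustration of the utility-function and testing points above, here is a minimal sketch of a reusable transformation helper together with a pytest-style test. It is not the code from this repository: the function names, the `product_name` column, and the use of plain Python dictionaries instead of Spark DataFrames are all assumptions, chosen so the example runs without Spark installed.

```python
from typing import Callable, Dict, Iterable, List

# A record is one row of data; plain dicts stand in for Spark rows here (assumption).
Record = Dict[str, object]


def transform_column(
    records: Iterable[Record],
    column: str,
    func: Callable[[str], str],
) -> List[Record]:
    """Return copies of the records with func applied to string values in column.

    Non-string values (for example None) are left unchanged, and the input
    records are not modified.
    """
    result = []
    for record in records:
        updated = dict(record)  # shallow copy so the caller's data stays intact
        value = updated.get(column)
        if isinstance(value, str):
            updated[column] = func(value)
        result.append(updated)
    return result


# Two thin wrappers that different jobs could import; "product_name" is a hypothetical column.
def uppercase_product_names(records: Iterable[Record]) -> List[Record]:
    return transform_column(records, "product_name", str.upper)


def lowercase_product_names(records: Iterable[Record]) -> List[Record]:
    return transform_column(records, "product_name", str.lower)


# A pytest-style unit test: pytest collects any function whose name starts with "test_".
def test_product_name_helpers():
    rows = [{"product_name": "Laptop", "price": 999}, {"product_name": None, "price": 5}]
    upper = uppercase_product_names(rows)
    lower = lowercase_product_names(rows)
    assert upper[0]["product_name"] == "LAPTOP"
    assert lower[0]["product_name"] == "laptop"
    assert upper[1]["product_name"] is None   # non-string values pass through
    assert rows[0]["product_name"] == "Laptop"  # input is not mutated
```

Keeping the generic `transform_column` separate from the two wrappers means a new job only needs a new one-line wrapper, and the shared logic is tested once.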

Requirements

Python 3.8 or higher
Apache Spark
AWS Glue
Terraform

Development Requirements

To install the development requirements, run:

   pip install -r requirements-dev.txt

Usage

Set Up AWS Infrastructure: Use Terraform to set up the necessary AWS resources. Navigate to the infra/aws directory and run:
```
terraform init
terraform apply
```
Run Glue Jobs: After setting up the infrastructure, you can run the Glue jobs defined in the code/aws_glue directory. Ensure that the necessary data is available in the specified S3 buckets.
Testing: To run the tests, use pytest:
```
pytest
```
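
The "Run Glue Jobs" step above can also be scripted. The sketch below is one possible way to do it with the boto3 Glue client's `start_job_run` call; the helper name, the job name, and the `--input_path` argument in the usage comment are hypothetical and need to be replaced with the values defined by the Terraform configuration.

```python
from typing import Any, Dict, Optional


def start_glue_job(
    glue_client: Any,
    job_name: str,
    arguments: Optional[Dict[str, str]] = None,
) -> str:
    """Start a Glue job run and return its run ID.

    glue_client is expected to be a boto3 Glue client, i.e. boto3.client("glue").
    Passing it in (instead of creating it here) keeps the helper easy to test
    with a stand-in object.
    """
    kwargs: Dict[str, Any] = {"JobName": job_name}
    if arguments:
        # Glue job argument keys are conventionally prefixed with "--".
        kwargs["Arguments"] = arguments
    response = glue_client.start_job_run(**kwargs)
    return response["JobRunId"]


# Example usage (requires boto3 and valid AWS credentials; names are placeholders):
#
#   import boto3
#   run_id = start_glue_job(
#       boto3.client("glue"),
#       "my-uppercase-job",                              # hypothetical job name
#       {"--input_path": "s3://my-bucket/input/"},       # hypothetical argument
#   )
#   print(run_id)
```

Jobs can equally be started from the AWS Glue console; the script form is mainly useful for automation.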

Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue for any enhancements or bug fixes.

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Repository: DataReply/aws-glue-medium