StartSmart is a modular, scalable, cloud-native event processing pipeline built entirely on AWS. It ingests raw event data through AWS Lambda and delivers it to a partitioned S3-based data lake using Amazon Kinesis Firehose, with support for real-time querying via Amazon Athena.
- Supports real-time and historical queries
- Strictly serverless – no EC2, ECS, or containers
- Real-time Event Ingestion: Events are sent through Lambda directly into Firehose.
- Dynamic Partitioning: Firehose writes to S3 with partitioning based on year, month, and day fields extracted from the event JSON.
- Schema Discovery: An AWS Glue crawler scans the data and updates the schema automatically.
- Athena Queries: SQL-based access to S3 data with support for partitions and visualisation using QuickSight.
- Secure IAM Roles: Granular policies for least-privilege access.
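
To make the dynamic partitioning idea concrete, here is a minimal sketch of how a date-based S3 prefix could be derived from an event. It is illustrative only: the actual partitioning is done by Firehose, and both the `startTime` field choice and the Hive-style `year=/month=/day=` layout are assumptions, not taken from this project's configuration.

```python
from datetime import datetime


def partition_prefix(event: dict, field: str = "startTime") -> str:
    """Return a Hive-style S3 prefix (year=/month=/day=) for one event.

    Assumptions: the date comes from an ISO-8601 UTC timestamp such as
    "2025-07-30T19:06:21.596Z", and `field` names that timestamp. Both the
    field name and the prefix layout are hypothetical.
    """
    ts = datetime.strptime(event[field], "%Y-%m-%dT%H:%M:%S.%fZ")
    return f"year={ts.year:04d}/month={ts.month:02d}/day={ts.day:02d}/"


if __name__ == "__main__":
    sample = {"title": "Test purchase", "startTime": "2025-07-30T19:06:21.596Z"}
    print(partition_prefix(sample))  # year=2025/month=07/day=30/
```

Zero-padded month and day values keep the prefixes sortable and make partition filters in Athena predictable.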
- Terraform – Infrastructure as Code
- AWS Lambda – Event ingestion, Athena querying
- Amazon Kinesis Firehose – Streaming delivery to S3
- Amazon S3 – Storage for raw and partitioned events
- Amazon EventBridge – Scheduled triggers for the Glue crawler
- AWS Glue Crawler – Metadata extraction and table creation
- Amazon Athena – Query engine for the S3 data lake
- IAM – Secure role-based access control
- CloudWatch – Monitoring and logs
{
  "title": "Test purchase",
  "description": "Triggered event type: purchase",
  "endTime": "2025-07-30T20:06:21.596Z",
  "startTime": "2025-07-30T19:06:21.596Z"
}

- A frontend sends an event via an API call.
- The Lambda ingests the payload and sends it to Firehose.
- Firehose extracts the date and stores it in S3 under a partitioned path.
- Glue Crawler runs (scheduled by EventBridge) and updates the Athena table.
- Scheduled Athena queries can then run, and their results can be visualized in QuickSight.
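
The ingestion step in the flow above can be sketched roughly as follows. This is not the project's actual Lambda code: the `STREAM_NAME` environment variable, the API Gateway-style `event["body"]` shape, and the helper names are assumptions made for illustration. The only AWS call used is the Firehose `put_record` operation from boto3.

```python
import json
import os


def build_record(event: dict) -> dict:
    """Serialize one event as a newline-terminated JSON Firehose record."""
    # The trailing newline keeps records separable once Firehose batches them into S3 objects.
    return {"Data": (json.dumps(event) + "\n").encode("utf-8")}


def send_event(client, stream_name: str, event: dict) -> dict:
    """Send one event through the Firehose PutRecord API."""
    return client.put_record(
        DeliveryStreamName=stream_name,
        Record=build_record(event),
    )


def lambda_handler(event, context):
    # boto3 is imported here so the helpers above can be exercised without it installed.
    import boto3

    client = boto3.client("firehose")
    # Assumption: the payload arrives as a JSON string in event["body"].
    payload = json.loads(event["body"])
    # Assumption: the delivery stream name is passed in via a STREAM_NAME variable.
    response = send_event(client, os.environ["STREAM_NAME"], payload)
    return {"statusCode": 200, "body": json.dumps({"recordId": response["RecordId"]})}
```

Passing the client into `send_event` keeps the function easy to test with a stub in place of a real Firehose client.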
- AWS Credentials with valid permissions to deploy infrastructure.
- HashiCorp Terraform installed.
- Clone the repository
- Navigate to terraform/
- Rename the file example_terraform.tfvars to terraform.tfvars and provide values for required variables such as region, bucket name, and Athena settings.
- (Optional) Modify the "query" variable in ./src/athena/lambda.py to suit your use case, then run terraform apply again.
- Run the following commands one after another:

    terraform init   # Sets up the backend for Terraform
    terraform plan   # Lists the resources to be provisioned
    terraform apply  # Provisions the resources in your AWS account

You have now successfully deployed your infrastructure. You can create insightful dashboards in QuickSight based on your Athena query results.
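
If you customise the Athena query, a partition-aware version might look something like the sketch below. The table, database, and output location values are placeholders, and the quoted partition values assume the crawler registered `year`, `month`, and `day` as string columns; check your table's schema and drop the quotes if they are numeric. The boto3 call used is Athena's `start_query_execution`.

```python
def build_query(table: str, year: int, month: int, day: int) -> str:
    """Build a query that scans only one day's partition.

    Assumption: the partition columns are named year, month, and day
    and hold zero-padded string values.
    """
    return (
        f"SELECT * FROM {table} "
        f"WHERE year = '{year:04d}' AND month = '{month:02d}' AND day = '{day:02d}'"
    )


def run_query(client, query: str, database: str, output_location: str) -> str:
    """Start an Athena query and return its execution ID."""
    response = client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


if __name__ == "__main__":
    # "events" is a hypothetical table name used only for this example.
    print(build_query("events", 2025, 7, 30))
```

Filtering on the partition columns lets Athena skip unrelated S3 prefixes, which typically lowers both query time and the amount of data scanned.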
- Optimise the Glue crawler recrawl policy to reduce cost.
Contributions, issues, and feature requests are welcome! Feel free to open a PR or submit an issue.
This project is licensed under the MIT License. See LICENSE for more information.
Built with ❤️ by Ofor David Tochukwu. Open to feedback and collaboration. Email: [email protected]
