Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[0.3.8-alpha] - 2024-06-03

Changed

Podcast notes include timestamps for each title

[0.3.7-alpha] - 2024-05-24

Fixed

Author first and last names reversed
Podcast scripts repeating author names

[0.3.6-alpha] - 2024-05-24

Added

Podcasts generated automatically daily
Podcast model

Removed

Manual trigger for podcast audio generation

[0.3.5-alpha] - 2024-06-01

Changed

Summary persistence now uses DAOs to store records

[0.3.4-alpha] - 2024-05-18

Added

Processing DAG persists arXiv research to S3 and neo4j
neo4j seeding script refactored to DAG

Removed

arXiv research processing Lambdas

[0.3.3-alpha] - 2024-05-16

Changed

Use single host in dev for orchestration and neo4j

[0.3.2-alpha] - 2024-05-04

Added

Daily arXiv research DAG publishes to Kafka

Changed

Orchestration host init and service deployments refactored for modularity

[0.3.1-alpha] - 2024-04-30

Added

Single-node Kafka cluster
UI for Apache Kafka

[0.3.0-alpha] - 2024-04-29

Phase 3 architecture started

Added

Apache Airflow orchestrates data pipeline
Pipeline can ingest multiple arXiv sets (e.g., computer science, physics)

Changed

arXiv summaries are fetched by Airflow DAG

Removed

fetch_daily_arxiv_summaries Lambda

[0.2.1-alpha] - 2024-04-23

Phase 2 architecture complete

Added

ETL works for arXiv records using Lambdas
DAO/models for all major ETL pipeline entities

Changed

Migrated to neo4j for research data and data lineage

Removed

RDS

[0.2.0-alpha] - 2024-02-05

Added

Working prototype from Jupyter notebook
Integrated with RDS and S3
Research summary themes generated by OpenAI
research_fetch_status Lambda implemented

[0.0.1-alpha] - 2023-11-11

Added

Initial project setup
Phase 1 core infrastructure