- Unable to view the DataFrame & Storage tabs in the Spark UI
- Cannot navigate from the Master UI to the Worker UIs
- Unable to analyse the job DAG / diagram in the Spark UI
- Switch caching strategy to MEMORY_AND_DISK and inspect the Storage tab
- Write a sample PySpark streaming job and inspect the Streaming tab (if available)
- Add Jenkins/GitHub Actions CI to build the Docker image and run a basic job
- Prepare an automated test job that fails on invalid configs
- Add resource-allocation tuning (cores, memory) and document best practices
- Initialised Dockerfile with Python 3.11 + Spark 3.4.0
- Created entrypoint script for master/worker/history modes
- Basic Compose setup with master, worker, history services
- Mounted apps folder (./spark_apps) and data folder (./book_data)
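The completed Compose setup above might look roughly like the following sketch. Service names, image/build details, container paths, and ports are assumptions; only the master/worker/history modes and the ./spark_apps and ./book_data mounts come from these notes:

```yaml
# Hypothetical docker-compose.yml sketch (not the actual file).
services:
  spark-master:
    build: .
    command: master            # entrypoint script dispatches on master/worker/history
    ports:
      - "8080:8080"            # Master UI
      - "7077:7077"            # Master RPC
    volumes:
      - ./spark_apps:/opt/spark/apps
      - ./book_data:/opt/spark/data

  spark-worker:
    build: .
    command: worker
    environment:
      - SPARK_MASTER_URL=spark://spark-master:7077
    depends_on:
      - spark-master
    volumes:
      - ./spark_apps:/opt/spark/apps
      - ./book_data:/opt/spark/data

  spark-history:
    build: .
    command: history
    ports:
      - "18080:18080"          # History Server UI
```

Publishing the worker's UI port (8081 by default) on the host would also be needed to make the Master UI's links to Worker UIs navigable, which relates to the issue listed above.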