TODOs for Spark-Cluster Project

🔧 Major Tasks

  • Fix: DataFrame & Storage tabs cannot be viewed in the Spark UI
  • Fix: navigation from the Master UI to the Worker UIs does not work
  • Fix: job DAG / diagram cannot be analysed in the Spark UI
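
One way to start on the Master-to-Worker navigation issue is to list the UI addresses the workers advertise to the master. The sketch below is a minimal diagnostic, not part of the project: it assumes the master web UI is published on `localhost:8080`, that it serves its state as JSON under `/json/`, and that each worker entry carries a `webuiaddress` field. The URL and the helper names `worker_ui_links` and `fetch_master_state` are hypothetical and should be adapted to the Compose setup.

```python
import json
from urllib.request import urlopen

# Assumption: the master web UI is published on localhost:8080 and serves its
# state as JSON at /json/. Adjust to match the Compose port mapping.
MASTER_JSON_URL = "http://localhost:8080/json/"


def worker_ui_links(master_state: dict) -> list[str]:
    """Return the web UI address each registered worker advertises."""
    workers = master_state.get("workers", [])
    # "webuiaddress" is the field name assumed here; entries without it are skipped.
    return [w["webuiaddress"] for w in workers if w.get("webuiaddress")]


def fetch_master_state(url: str = MASTER_JSON_URL, timeout: float = 5.0) -> dict:
    """Download and decode the master's JSON status page."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    for link in worker_ui_links(fetch_master_state()):
        print(link)
```

If the printed addresses are container-internal hostnames or IPs that the browser on the host cannot reach, that would be one plausible explanation for the broken links. In that case the worker UI ports may need to be published in Compose, and the advertised hostname (for example via `SPARK_PUBLIC_DNS`) may need adjusting.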

Enhancements & Improvements

  • Switch caching strategy to MEMORY_AND_DISK and inspect the Storage tab
  • Write a sample PySpark streaming job and inspect the Streaming tab (if available)
  • Add Jenkins/GitHub Actions CI to build the Docker image and run a basic job
  • Prepare an automated test job that fails on invalid configs
  • Add resource-allocation tuning (cores, memory) and document best practices
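
The "test job that fails on invalid configs" item could begin as a small pre-flight check that runs before any Spark session is created. The sketch below is only an illustration under stated assumptions: the rule table `RULES`, the helper `find_invalid`, and the chosen keys are hypothetical, and the size pattern deliberately requires an explicit unit suffix, which is stricter than Spark's own parsing.

```python
import re

# Size strings such as "512m" or "2g". Requiring a suffix is a choice made for
# this sketch; it is stricter than what Spark itself accepts.
_SIZE_PATTERN = re.compile(r"^[1-9]\d*(k|m|g|t)b?$", re.IGNORECASE)

# Hypothetical rules: config key -> predicate returning True for a valid value.
RULES = {
    "spark.executor.memory": lambda v: bool(_SIZE_PATTERN.match(v)),
    "spark.driver.memory": lambda v: bool(_SIZE_PATTERN.match(v)),
    "spark.executor.cores": lambda v: v.isdigit() and int(v) > 0,
}


def find_invalid(conf: dict[str, str]) -> list[str]:
    """Return the keys in conf whose values break a rule, sorted by name."""
    return sorted(k for k, check in RULES.items() if k in conf and not check(conf[k]))


def main(conf: dict[str, str]) -> int:
    """Return a process exit code: 0 if the config is valid, 1 otherwise."""
    invalid = find_invalid(conf)
    for key in invalid:
        print(f"invalid value for {key}: {conf[key]!r}")
    return 1 if invalid else 0
```

In a CI step, passing the result to `sys.exit(main(conf))` would make the job fail whenever a value breaks a rule. Keys without a rule are ignored, so the table can grow as the resource-allocation tuning is documented.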

Completed Tasks ✓

  • Initialised Dockerfile with Python 3.11 + Spark 3.4.0
  • Created entrypoint script for master/worker/history modes
  • Set up basic Compose file with master, worker, and history services
  • Mounted apps folder (./spark_apps) and data folder (./book_data)