Would something like Apache Beam be a more modern way of doing the same Spark stuff, but in an engine-agnostic fashion? It would make us less dependent on specific Spark versions, which is a jar/packaging PITA, and give users the option to run on a range of different engines: https://beam.apache.org/documentation/runners/capability-matrix/