Skip to content

Conversation

@Yikun
Copy link
Member

@Yikun Yikun commented May 4, 2023

What changes were proposed in this pull request?

This PR changes Dockerfile and workflow based on base image to save space by sharing layers by having one image from another.

After this PR:

  • The spark / PySpark / SparkR related files extract into base image
  • Install PySpark / SparkR deps in PySpark / SparkR images.
  • Add the base image build step
  • Apply changes to template: ./add-dockerfiles.sh 3.4.0 to make it work.
  • This PR didn't contain changes on 3.3.X Dockerfiles to make PR more clear, the 3.3.x changes will be a separate PR when we address all comments for 3.4.0.

[1] docker-library/official-images#13089

Why are the changes needed?

Address DOI comments, and also to save space by sharing layers by having one image from another.

Does this PR introduce any user-facing change?

No

How was this patch tested?

CI passed.

@Yikun Yikun marked this pull request as ready for review May 4, 2023 07:29
Comment on lines +68 to +70
mv python/pyspark /opt/spark/python/pyspark/; \
mv python/lib /opt/spark/python/lib/; \
mv R /opt/spark/; \
Copy link
Member Author

@Yikun Yikun May 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

download/extract spark (maybe keeping python and R files too? they seem relatively small compared to the rest)

This the key change:

2.0M	./lib
11M	./pyspark
5.6M	./R

Compare to complete docker image size (500-600MB), I think it can be accepted, otherwise we have to keep download/extract/move these step into Pyspark/R dockerfiles.

@Yikun
Copy link
Member Author

Yikun commented May 4, 2023

cc @HyukjinKwon @zhengruifeng

also cc @yosifkit This PR address question 1 of docker-library/official-images#13089

@Yikun Yikun closed this in 7f83637 May 6, 2023
@Yikun
Copy link
Member Author

Yikun commented May 6, 2023

@HyukjinKwon Thanks! Merged.

@zhengruifeng
Copy link

Late LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants