Add documentation for Zeppelin with Spark on Kubernetes #21

Open: wants to merge 9 commits into base: master

Conversation

echarles
Member

First draft documentation to further discuss and prepare the WIP for Zeppelin with Spark on Kubernetes.

@erikerlandson
Member

Is the transparent background a potential problem? Presumably fine against a white background, but could that change?

@erikerlandson
Member

I can't pull the netlify link up - does anybody else have that issue?

@erikerlandson
Member

The doc looks good - I am wondering if we should include this while it is experimental. Or somehow tag this doc as experimental. @foxish what do you think?

@foxish
Member

foxish commented Nov 16, 2017

I like the idea of marking it as experimental and getting it out. It would help us garner feedback. If someone can verify that the tutorial works in its current state, we can go ahead.

@foxish
Member

foxish commented Nov 16, 2017

@echarles, would you be open to demo-ing this at next week's SIG meeting? It would help a lot of us understand where this effort is at.
cc/ @felixcheung

@felixcheung felixcheung left a comment

I think it's cool to document this, although it could be hard to maintain if we reference ongoing PRs.

> For the time being, the needed code is integrated in neither the `master` branch of the `apache-zeppelin` repository nor that of the `apache-spark-on-k8s/spark` repository.
> You are welcome to try it out already and send any feedback and questions.

Firs things firs, you have to choose the following modes in which you will run Zeppelin with Spark on Kubernetes:

First?

Member Author

fixed

For now, to be able to test these combinations, you need to build specific branches (see hereafter) or to use third-party Helm charts or Docker images. The needed branches and related PR are listed here:

1. Spark-k8s driven branch: In-cluster client mode [see pull request #456](https://github.com/apache-spark-on-k8s/spark/pull/456)
2. Apache Zeppeoin driven branch: Add support to run Spark interpreter on a Kubernetes cluster [see pull request #2637](https://github.com/apache/zeppelin/pull/2637)

Zeppelin?
what is driven branch?

Member Author

I just wanted to point out where this branch resides... I have removed that to avoid confusion.


![In-Cluster with Spark-Client](/img/zeppelin_in-cluster_spark-client.png "In-Cluster with Spark-Client")

Build a new Zepplin based on [#456 In-cluster client mode](https://github.com/apache-spark-on-k8s/spark/pull/456).

Zeppelin, extra space

Member Author

fixed


![In-Cluster with Spark-Cluster](/img/zeppelin_in-cluster_spark-cluster.png "In-Cluster with Spark-Cluster")

Build a new Zepplin based on [#2637 Spark interpreter on a Kubernetes](https://github.com/apache/zeppelin/pull/2637).

Zeppelin

Member Author

fixed

this one doesn't seem to be updated...?

Member Author

done now.


Firs things firs, you have to choose the following modes in which you will run Zeppelin with Spark on Kubernetes:

+ The `Kubernetes modes`: Can be `in-cluster` (within a Pod) or `out-cluster` (from outside the Kubernetes cluster).

What is the proper terminology in the k8s world? Is "out-cluster" the right term?

Member Author

I had the same question. From the already used and seen `in-cluster`, I deduced `out-cluster`. Happy to change to any other, more official terminology.

@echarles
Member Author

@felixcheung Thx a lot for your reviews (just pushed the fixes).
@erikerlandson I have pushed the 3 images with white backgrounds.

@foxish Happy to demo this during the next SIG meeting (22 Nov).

IMHO it is not bad to publish early docs, provided the needed steps are clear (no release, need to build branches...), to get as much feedback from early adopters as possible.


Build a new Spark and its associated Docker images based on [#2637 Spark interpreter on a Kubernetes](https://github.com/apache/zeppelin/pull/2637).

Once done, any vanilla Apache Zeppelin deployed in a Kubernetes Pod (you can use a Helm chart for this) will work out of the box with the following interpreter settings:

Does this Helm chart work for this? (It would need a different image for a newer Zeppelin, though.)
https://github.com/kubernetes/charts/blob/master/stable/spark/templates/spark-zeppelin-deployment.yaml

shall we link it?

Member Author

I have added a section at the end "how to test" and linked to the chart.

@erikerlandson
Member

I am still OK with documentation, as long as it's clearly marked experimental

@echarles
Member Author

@erikerlandson It is now documented as experimental at the beginning of the doc.

2. `in-cluster` with `spark-cluster` mode.
3. `out-cluster` with `spark-cluster` mode.

For now, to be able to test these combinations, you need to build specific branches (see hereafter) or to use third-party Helm charts or Docker images. The needed branches and related PR are listed here:
Member

As discussed in the meeting today, we want to ensure that these branches merge before we can publish documentation.

cc @felixcheung @erikerlandson @liyinan926 @mccheah

@echarles
Member Author

Should I close this one? Doesn't seem like it will be merged and we will move soon to apache repo.
