-
Notifications
You must be signed in to change notification settings - Fork 468
[FLINK-37515] Basic support for Blue/Green deployments #969
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: release-1.11
Are you sure you want to change the base?
[FLINK-37515] Basic support for Blue/Green deployments #969
Conversation
...tor-api/src/main/java/org/apache/flink/kubernetes/operator/api/bluegreen/DeploymentType.java
Show resolved
Hide resolved
...main/java/org/apache/flink/kubernetes/operator/api/status/FlinkBlueGreenDeploymentState.java
Show resolved
Hide resolved
...java/org/apache/flink/kubernetes/operator/controller/FlinkBlueGreenDeploymentController.java
Outdated
Show resolved
Hide resolved
...java/org/apache/flink/kubernetes/operator/controller/FlinkBlueGreenDeploymentController.java
Outdated
Show resolved
Hide resolved
I am concerned about the number of TODOs. Things like dealing with errors and timing conditions should be in place in the initial drop of code - unless we can mark this capability as beta or the like. WDYT @gyfora ? |
@davidradl thanks for your review/comments. I've cleaned up all the outdated/invalid TODOs which pointed me 1 missing unit test, added. At this point the few remaining TODOs do not interfere with the main functionality and are not critical/open for discussion. |
… BASIC is supported).Tests to confirm the deployments are deleted accurately after the specified deletion delay.
👍 |
when this feature gonna be released? |
Hi @hansh0801, trying to release it ASAP but I also want to make sure the logic and contracts are as robust as possible to minimize changes later. Looks pretty stable in my current testing and I've reopened it for review. Thanks! |
cool ! and I'm also waiting for blue green phase 2 :) |
@hansh0801 indeed... Phase 2 will take a little bit longer due to its complexity but already working on it. |
...tor-api/src/main/java/org/apache/flink/kubernetes/operator/api/bluegreen/DeploymentType.java
Outdated
Show resolved
Hide resolved
...tor-api/src/main/java/org/apache/flink/kubernetes/operator/api/bluegreen/TransitionMode.java
Outdated
Show resolved
Hide resolved
...src/main/java/org/apache/flink/kubernetes/operator/api/spec/FlinkDeploymentTemplateSpec.java
Outdated
Show resolved
Hide resolved
...ain/java/org/apache/flink/kubernetes/operator/api/status/FlinkBlueGreenDeploymentStatus.java
Outdated
Show resolved
Hide resolved
private long deploymentReadyTimestamp; | ||
|
||
/** Information about the TaskManagers for the scale subresource. */ | ||
private TaskManagerInfo taskManager; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to support this? I feel that the scale logic using TM count is not really used feature for regular FlinkDeployments either
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could remove it for now and only add it if there is a need
...java/org/apache/flink/kubernetes/operator/controller/FlinkBlueGreenDeploymentController.java
Outdated
Show resolved
Hide resolved
if (deploymentStatus == null) { | ||
deploymentStatus = new FlinkBlueGreenDeploymentStatus(); | ||
return patchStatusUpdateControl( | ||
bgDeployment, | ||
deploymentStatus, | ||
FlinkBlueGreenDeploymentState.INITIALIZING_BLUE, | ||
null) | ||
.rescheduleAfter(100); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The BlueGreenDeployment could simply implement initStatus and this can be removed completely.
...java/org/apache/flink/kubernetes/operator/controller/FlinkBlueGreenDeploymentController.java
Outdated
Show resolved
Hide resolved
...java/org/apache/flink/kubernetes/operator/controller/FlinkBlueGreenDeploymentController.java
Outdated
Show resolved
Hide resolved
...java/org/apache/flink/kubernetes/operator/controller/FlinkBlueGreenDeploymentController.java
Outdated
Show resolved
Hide resolved
...java/org/apache/flink/kubernetes/operator/controller/FlinkBlueGreenDeploymentController.java
Outdated
Show resolved
Hide resolved
...java/org/apache/flink/kubernetes/operator/controller/FlinkBlueGreenDeploymentController.java
Outdated
Show resolved
Hide resolved
...java/org/apache/flink/kubernetes/operator/controller/FlinkBlueGreenDeploymentController.java
Outdated
Show resolved
Hide resolved
private boolean hasSpecChanged( | ||
FlinkBlueGreenDeploymentSpec newSpec, FlinkBlueGreenDeploymentStatus deploymentStatus) { | ||
|
||
String lastReconciledSpec = deploymentStatus.getLastReconciledSpec(); | ||
String newSpecSerialized = SpecUtils.serializeObject(newSpec, "spec"); | ||
|
||
// TODO: in FLIP-504 check here the TransitionMode has not been changed | ||
|
||
return !lastReconciledSpec.equals(newSpecSerialized); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want to use the DiffBuilder/ existing utilities to get the type of the spec change here and make sure we don't trigger deployments on non-upgrade changes.
Things like operator config changes etc should not go through the Blue/Green flow I believe but simply applied on the currently active deployment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, let me get familiar with the DiffBuilder
and see how to incorporate it
boolean deleted = | ||
deletedStatus.size() == 1 | ||
&& deletedStatus.get(0).getKind().equals("FlinkDeployment"); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to wait here until the deletion is actually finished (the object is not there anymore?) that would require some additional logic here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather not introduce waiting logic. I immediately reschedule a reconciliation afterwards which will re-execute this portion if the deployment is not yet gone.
TRANSITIONING_TO_BLUE, | ||
|
||
/** Identifies the system is transitioning from "Blue" to "Green". */ | ||
TRANSITIONING_TO_GREEN, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what state are we in during shutdown?
...tes-operator-api/src/main/java/org/apache/flink/kubernetes/operator/api/utils/SpecUtils.java
Outdated
Show resolved
Hide resolved
return objectMapper.writeValueAsString(wrapper); | ||
} catch (JsonProcessingException e) { | ||
throw new RuntimeException( | ||
"Could not serialize " + wrapperKey + ", this indicates a bug...", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should be include object.toString() in the error also?
@@ -101,6 +103,7 @@ rules: | |||
- apiGroups: | |||
- flink.apache.org | |||
resources: | |||
- flinkbluegreendeployments/status | |||
- flinkdeployments/status |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this maybe a basic question. Are flinkbluegreendeployments a type of flinkdeployments. Would we expect to see them as flinkdeployments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the original idea but we shifted to a has-is
relationship rather than a is-a
one.
What is the purpose of the change
This pull request adds basic support for Blue/Green deployments as outlined by FLIP-503 (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=337677648).
Brief change log
FlinkBlueGreenDeploymentController
controller, with the capability of managing the lifecycle of these deployments.FlinkBlueGreenDeploymentController
will manage this CRD and hide from the user the details of the actual Blue/Green (Active/StandBy) jobs.FlinkDeployment
controller.Verifying this change
Does this pull request potentially affect one of the following parts:
CustomResourceDescriptors
: noDocumentation
docs/content/docs