
@Allda
Collaborator

@Allda Allda commented Oct 17, 2025

A new backup controller orchestrates the backup process for workspace PVCs. A new configuration option in DevWorkspaceOperatorConfig enables a regular cron job that drives the backup mechanism. The job executes the following steps:

  • Find workspaces
  • Determine whether a workspace has recently been stopped
  • Detect the workspace PVC
  • Run a Job in the same namespace that performs the backup

The last step is not yet fully implemented, as it requires running Buildah inside the container; it will be delivered as a separate feature.

Issue: eclipse-che/che#23570

What does this PR do?

What issues does this PR fix or reference?

Is it tested? How?

The feature has been tested locally and with integration tests. To enable it, add the following configuration to the DevWorkspaceOperatorConfig:

config:
  workspace:
    backupCronJob:
      enable: true
      registry: kind-registry:5000/backup
      schedule: '* * * * *'

After the config is added, stop any workspace and wait until a backup job is created.

$ kubectl get jobs
devworkspace-backup-2l679   Running    0/1           138m       138m
devworkspace-backup-2xvgl   Running    0/1           139m       139m
devworkspace-backup-45vxb   Running    0/1           145m       145m

The job creates a backup and pushes the image to the registry:

+ set -e
+ exec /workspace-recovery.sh --backup
+ set -e
+ for i in "$@"
+ case $i in
+ backup
+ BACKUP_IMAGE=kind-registry:5000/backup/backup-default-common-pvc-test:latest
++ buildah from scratch
+ NEW_IMAGE=working-container
+ buildah copy working-container /workspace/workspacedfd9f53065ea452c//projects /
f099c09f924cf051a01d78cd34ca87a4c161d7c217df5ac627e90e66926fbe9f
+ buildah config --label DEVWORKSPACE=common-pvc-test working-container
+ buildah config --label NAMESPACE=default working-container
+ buildah commit working-container kind-registry:5000/backup/backup-default-common-pvc-test:latest
Getting image source signatures
Copying blob sha256:137b2a0909654325b7eff0a9dfe623e5abdc685c4d6ad8e4c8d163e0984cf805
Copying config sha256:86693ca728855121a4dce059d91c6c9a196b4611fea4cb17d7b38015310cf193
Writing manifest to image destination
86693ca728855121a4dce059d91c6c9a196b4611fea4cb17d7b38015310cf193
+ buildah umount working-container
+ buildah push --tls-verify=false kind-registry:5000/backup/backup-default-common-pvc-test:latest
Getting image source signatures
Copying blob sha256:137b2a0909654325b7eff0a9dfe623e5abdc685c4d6ad8e4c8d163e0984cf805
Copying config sha256:86693ca728855121a4dce059d91c6c9a196b4611fea4cb17d7b38015310cf193
Writing manifest to image destination
stream closed: EOF for default/devworkspace-backup-zjzk5-82psq (backup-workspace)
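Judging from the trace above, the image reference appears to be built from the configured registry, the namespace, and the DevWorkspace name (`kind-registry:5000/backup` + `backup-default-common-pvc-test:latest`). The Go sketch below reproduces that naming and the Buildah command sequence seen in the log. The `backupImage` and `buildahSteps` helpers are hypothetical and inferred from the log output, not taken from `workspace-recovery.sh`.

```go
package main

import (
	"fmt"
	"strings"
)

// backupImage builds the image reference observed in the log:
// <registry>/backup-<namespace>-<devworkspace>:latest
func backupImage(registry, namespace, devworkspace string) string {
	return fmt.Sprintf("%s/backup-%s-%s:latest",
		strings.TrimSuffix(registry, "/"), namespace, devworkspace)
}

// buildahSteps lists the Buildah invocations from the trace, in order.
// "working-container" is the name Buildah printed for `buildah from scratch`.
func buildahSteps(image, sourcePath, namespace, devworkspace string) [][]string {
	const ctr = "working-container"
	return [][]string{
		{"buildah", "from", "scratch"},
		{"buildah", "copy", ctr, sourcePath, "/"},
		{"buildah", "config", "--label", "DEVWORKSPACE=" + devworkspace, ctr},
		{"buildah", "config", "--label", "NAMESPACE=" + namespace, ctr},
		{"buildah", "commit", ctr, image},
		{"buildah", "umount", ctr},
		{"buildah", "push", "--tls-verify=false", image},
	}
}

func main() {
	img := backupImage("kind-registry:5000/backup", "default", "common-pvc-test")
	fmt.Println(img)
	for _, step := range buildahSteps(img, "/workspace/workspacedfd9f53065ea452c/projects", "default", "common-pvc-test") {
		fmt.Println(strings.Join(step, " "))
	}
}
```

`--tls-verify=false` matches the local `kind-registry:5000` used in this test; it disables TLS verification and would normally not be used against a public registry.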

PR Checklist

  • E2E tests pass (when PR is ready, comment /test v8-devworkspace-operator-e2e, v8-che-happy-path to trigger)
    • v8-devworkspace-operator-e2e: DevWorkspace e2e test
    • v8-che-happy-path: Happy path for verification integration with Che

@openshift-ci

openshift-ci bot commented Oct 17, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Allda
Once this PR has been reviewed and has the lgtm label, please assign dkwon17 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Allda Allda force-pushed the 23570 branch 2 times, most recently from 42dd45c to dffd7e6 on October 17, 2025 11:06
@rohanKanojia
Member

@Allda : Really appreciate you taking the time to contribute this in such a short time. 🎉

Could you please also fill out the “Is it tested? How?” section in the PR template? It’ll help reviewers and future contributors verify the change more easily.

Thanks again for your effort! 🙌

@rohanKanojia
Member

I tested this PR and it seems to work.

  1. Created DevWorkspaceOperatorConfig with this BackupCronJobConfig (backup every 3 minutes)
config:
  workspace:
    backupCronJob:
      enable: true
      schedule: "*/3 * * * *"
  2. Created a DevWorkspace and waited for it to start running
  3. Stopped the workspace
  4. The controller detected the stopped workspace and started creating backup jobs:
NAME               STATUS    COMPLETIONS   DURATION   AGE
backup-job-8tnsp   Running   0/1                      0s
backup-job-8tnsp   Running   0/1           0s         0s
backup-job-8tnsp   Running   0/1           16s        16s
backup-job-8tnsp   Running   0/1           17s        17s
backup-job-8tnsp   Running   0/1           18s        18s
backup-job-8tnsp   Complete   1/1           18s        18s
backup-job-kc8rm   Running    0/1                      0s
backup-job-kc8rm   Running    0/1           0s         0s
backup-job-kc8rm   Running    0/1           6s         6s
backup-job-kc8rm   Running    0/1           7s         7s
backup-job-kc8rm   Running    0/1           8s         8s
backup-job-kc8rm   Complete   1/1           8s         8s
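The `schedule` field uses standard five-field cron syntax: `'* * * * *'` fires every minute, while `"*/3 * * * *"` fires whenever the minute is divisible by three. The sketch below handles only the minute field (`*`, `*/N`, or a single number) to illustrate those semantics. It is not a full cron parser, `minuteMatches` is a hypothetical name, and the operator's own scheduling code is not shown here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minuteMatches reports whether a cron minute field matches the given minute
// (0-59). Only "*", "*/N", and a single number are supported in this sketch.
func minuteMatches(field string, minute int) (bool, error) {
	switch {
	case field == "*":
		return true, nil
	case strings.HasPrefix(field, "*/"):
		step, err := strconv.Atoi(strings.TrimPrefix(field, "*/"))
		if err != nil || step <= 0 {
			return false, fmt.Errorf("invalid step in %q", field)
		}
		return minute%step == 0, nil
	default:
		n, err := strconv.Atoi(field)
		if err != nil {
			return false, fmt.Errorf("unsupported minute field %q", field)
		}
		return minute == n, nil
	}
}

func main() {
	schedule := "*/3 * * * *"
	minuteField := strings.Fields(schedule)[0]
	for m := 0; m < 10; m++ {
		ok, err := minuteMatches(minuteField, m)
		if err != nil {
			panic(err)
		}
		if ok {
			fmt.Println("fires at minute", m)
		}
	}
}
```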

@Allda Allda force-pushed the 23570 branch 3 times, most recently from 0bc74b1 to 8427ba5 on October 29, 2025 10:24
@Allda
Collaborator Author

Allda commented Oct 29, 2025

/retest

@codecov

codecov bot commented Nov 3, 2025

Codecov Report

❌ Patch coverage is 64.13043% with 165 lines in your changes missing coverage. Please review.
✅ Project coverage is 35.30%. Comparing base (d92e750) to head (2679783).
⚠️ Report is 16 commits behind head on main.

File with missing lines | Patch % | Lines
...trollers/backupcronjob/backupcronjob_controller.go | 71.95% | 87 Missing and 19 partials ⚠️
apis/controller/v1alpha1/zz_generated.deepcopy.go | 0.00% | 43 Missing ⚠️
main.go | 0.00% | 9 Missing ⚠️
internal/images/image.go | 0.00% | 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1530      +/-   ##
==========================================
+ Coverage   34.09%   35.30%   +1.21%     
==========================================
  Files         160      161       +1     
  Lines       13348    13802     +454     
==========================================
+ Hits         4551     4873     +322     
- Misses       8487     8599     +112     
- Partials      310      330      +20     


Contributor

@ibuziuk ibuziuk left a comment


@Allda great job!
I discussed the overall PR with @dkwon17, and I believe we should target it for the DWO 0.39.0 release.

@Allda
Copy link
Collaborator Author

Allda commented Nov 12, 2025

/retest

Allda added 12 commits November 14, 2025 09:54
A new backup controller orchestrates the backup process for workspace
PVCs. A new configuration option in DevWorkspaceOperatorConfig enables
a regular cron job that drives the backup mechanism. The job executes
the following steps:
- Find workspaces
- Determine whether a workspace has recently been stopped
- Detect the workspace PVC
- Run a Job in the same namespace that performs the backup

The last step is not yet fully implemented, as it requires running
Buildah inside the container; it will be delivered as a separate
feature.

Issue: eclipse-che/che#23570

Signed-off-by: Ales Raszka <[email protected]>
A workspace backup is done using Buildah, storing the content of the
workspace PVC in a container image. The image is then stored in a
registry and can be used to recover data.

A prototype script was updated, stored under the project-backup
directory, and is built alongside the controller.

The backup job calls the script, which executes the following steps:
- mount a volume with the workspace data
- build a container image using Buildah
- push the image to the registry configured by the operator admin

Signed-off-by: Ales Raszka <[email protected]>
A new sub-object was added to the operator config that reflects the
current status of the backup controller and stores the last time the
backup was executed. This value is used to determine whether a backup
of the workspace is needed or has already been executed.

Signed-off-by: Ales Raszka <[email protected]>
The backup job uses the default PVC name, or the name from the config
if the user configured a custom one.

Signed-off-by: Ales Raszka <[email protected]>
The backup job can now push to registries that require an auth token.
The token is provided as a secret in the operator namespace and added to
the operator config.

Signed-off-by: Ales Raszka <[email protected]>
The backup job now determines the PVC name based on the storage type in
use. It distinguishes between storage types (common and per-workspace)
and mounts the volume dynamically.

Signed-off-by: Ales Raszka <[email protected]>
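The PVC selection described in this commit and the previous one can be sketched as a small switch. `claim-devworkspace` is the common PVC name that appears in the Job description later in this thread; the `storage-<workspaceID>` name for per-workspace storage is a placeholder assumption not confirmed by this PR, and `pvcForWorkspace` is a hypothetical helper. Any other storage type is rejected.

```go
package main

import "fmt"

// Default common PVC name, as seen in the Job description in this thread.
const defaultCommonPVC = "claim-devworkspace"

// pvcForWorkspace picks the PVC to mount for a workspace backup.
// storageType is "common" or "per-workspace"; customCommonName is the
// admin-configured name for the common PVC, if any.
func pvcForWorkspace(storageType, customCommonName, workspaceID string) (string, error) {
	switch storageType {
	case "common":
		if customCommonName != "" {
			return customCommonName, nil
		}
		return defaultCommonPVC, nil
	case "per-workspace":
		// Hypothetical naming scheme, for illustration only.
		return "storage-" + workspaceID, nil
	default:
		return "", fmt.Errorf("unsupported storage type %q", storageType)
	}
}

func main() {
	for _, tc := range []struct{ storage, custom string }{
		{"common", ""},
		{"common", "my-shared-pvc"},
		{"per-workspace", ""},
	} {
		name, err := pvcForWorkspace(tc.storage, tc.custom, "workspace388fc6409bc642e7")
		fmt.Println(tc.storage, name, err)
	}
}
```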
It turns out the capabilities from the prototype are not needed.

Signed-off-by: Ales Raszka <[email protected]>
A new SA is created for the backup jobs to limit permissions to only
what is necessary.

Signed-off-by: Ales Raszka <[email protected]>
- Make registry field required
- Replace custom bool comparison with library
- Minor tweaks

Signed-off-by: Ales Raszka <[email protected]>
Use single logger across the controller and only add context if needed.

Signed-off-by: Ales Raszka <[email protected]>
@dkwon17
Collaborator

dkwon17 commented Nov 14, 2025

@Allda maybe I'm missing something, but I am getting this error:

LAST SEEN   TYPE      REASON                  OBJECT                                                    MESSAGE
25s         Warning   FailedCreate            job/devworkspace-backup-29mb2                             Error creating: pods "devworkspace-backup-29mb2-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted-v2: .spec.securityContext.fsGroup: Invalid value: []int64{0}: 0 is not an allowed group, provider restricted-v2: .containers[0].runAsUser: Invalid value: 0: must be in the ranges: [1000870000, 1000879999], provider "restricted": Forbidden: not usable by user or serviceaccount, provider "container-build": Forbidden: not usable by user or serviceaccount, provider "user-namespace": Forbidden: not usable by user or serviceaccount, provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid-v2": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "hostpath-provisioner-csi": Forbidden: not usable by user or serviceaccount, provider "insights-runtime-extractor-scc": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]

This is my DWOC:

apiVersion: controller.devfile.io/v1alpha1
config:
  workspace:
    backupCronJob:
      enable: true
      registry: quay.io/dkwon17/test
      registryAuthSecret: quay-credentials
      schedule: '* * * * *'
kind: DevWorkspaceOperatorConfig
metadata:
  name: devworkspace-operator-config
  namespace: openshift-operators

Any ideas?

@Allda
Collaborator Author

Allda commented Nov 18, 2025


Could you please share more details about the created Job? Are there any events or logs? I tried it today with a similar config (the only difference being the registry location), and the push Job passed.

@dkwon17
Collaborator

dkwon17 commented Nov 18, 2025

Here is the job's yaml: job-devworkspace-backup-29mb2.yaml

There weren't any pod logs (the pod never started), only many `Error creating: pods "devworkspace-backup-29mb2-" is forbidden: unable to validate against any security context constraint: ...` events:

Screen.Recording.2025-11-18.at.3.59.33.PM.mov

Output of oc get events -n admin-devspaces: events.log

The registry configuration is now stored in a separate nested struct.

Signed-off-by: Ales Raszka <[email protected]>
An SA is created for every workspace being backed up to avoid ownership conflicts.

Signed-off-by: Ales Raszka <[email protected]>
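The per-workspace ServiceAccount names that appear later in this thread (`devworkspace-job-runner-workspacea5357a3c22ce497a`, `devworkspace-job-runner-workspace388fc6409bc642e7`) suggest a fixed prefix plus the workspace ID. A minimal sketch of that naming, with a check against the DNS subdomain rules (lowercase alphanumerics, `-` and `.`, at most 253 characters) that Kubernetes applies to ServiceAccount names; `jobRunnerServiceAccount` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

const jobRunnerPrefix = "devworkspace-job-runner-"

// dns1123Subdomain matches a lowercase RFC 1123 subdomain.
var dns1123Subdomain = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

// jobRunnerServiceAccount returns the per-workspace ServiceAccount name,
// rejecting IDs that would produce an invalid Kubernetes object name.
func jobRunnerServiceAccount(workspaceID string) (string, error) {
	name := jobRunnerPrefix + workspaceID
	if len(name) > 253 || !dns1123Subdomain.MatchString(name) {
		return "", fmt.Errorf("invalid ServiceAccount name %q", name)
	}
	return name, nil
}

func main() {
	name, err := jobRunnerServiceAccount("workspace388fc6409bc642e7")
	fmt.Println(name, err)
}
```

One consequence of per-workspace ServiceAccounts is that an SCC grant made with `oc adm policy add-scc-to-user ... -z <name>` applies only to the named ServiceAccount, so it would have to be repeated for each workspace.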
Switching the base image to Podman allows the backup job to run as
regular user 1000 without any privilege escalation.

Signed-off-by: Ales Raszka <[email protected]>
@openshift-ci

openshift-ci bot commented Nov 19, 2025

@Allda: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/v14-devworkspace-operator-e2e | efe0538 | link | true | /test v14-devworkspace-operator-e2e

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@rohanKanojia
Member

rohanKanojia commented Nov 20, 2025

@Allda :

Edit: I was able to resolve this issue by granting additional permissions to the ServiceAccount; however, I'm not sure whether this is a required step or a bug:

oc adm policy add-scc-to-user anyuid -z devworkspace-job-runner-workspacea5357a3c22ce497a

After updating the ServiceAccount permissions, I can see the backup job run and create images in the configured registry:

Screenshot 2025-11-20 at 20 20 56

Question: Will the backup image be platform-dependent? I only see the linux/amd64 architecture being pushed. My cluster was also linux/amd64.


I'm also facing the same issue as David while testing your changes on CRC.

Environment:

OS: Linux

CRC Version:

CRC version: 2.53.0+a6f712
OpenShift version: 4.19.3
MicroShift version: 4.19.0

Steps to Reproduce:

  1. Install DWO based on your PR changes
  2. Edit DevWorkspaceOperatorConfig to add backup config
  config:
    workspace:
      backupCronJob:
        enable: true
        registry:
          authSecret: dockerhub-push-secret
          path: docker.io/rohankanojia
        schedule: '* * * * *'
  3. Create the registry auth secret in the openshift-operators namespace
  4. Create a DevWorkspace
# run from the DWO root directory
oc create -f samples/code-latest.yaml
  5. Stop the DevWorkspace
oc patch devworkspace code-latest \
        --type=merge \
        -p '{"spec": {"started": false}}'
  6. The job was created, but it never became ready:
oc get jobs -w
NAME                        STATUS    COMPLETIONS   DURATION   AGE
devworkspace-backup-ntzf7   Running   0/1           34s        34s

Checking the details, I see this error:

oc describe job
Name:             devworkspace-backup-ntzf7
Namespace:        rokumar-dev
Selector:         batch.kubernetes.io/controller-uid=14a22818-4416-4f52-b387-2cc0dd2e8df9
Labels:           controller.devfile.io/backup-job=true
                  controller.devfile.io/devworkspace_id=workspace388fc6409bc642e7
Annotations:      <none>
Controlled By:    DevWorkspace/code-latest
Parallelism:      1
Completions:      1
Completion Mode:  NonIndexed
Suspend:          false
Backoff Limit:    6
Start Time:       Thu, 20 Nov 2025 17:35:00 +0530
Pods Statuses:    0 Active (0 Ready) / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           batch.kubernetes.io/controller-uid=14a22818-4416-4f52-b387-2cc0dd2e8df9
                    batch.kubernetes.io/job-name=devworkspace-backup-ntzf7
                    controller-uid=14a22818-4416-4f52-b387-2cc0dd2e8df9
                    job-name=devworkspace-backup-ntzf7
  Annotations:      io.kubernetes.cri-o.Devices: /dev/fuse
  Service Account:  devworkspace-job-runner-workspace388fc6409bc642e7
  Containers:
   backup-workspace:
    Image:      quay.io/devfile/project-backup:next
    Port:       <none>
    Host Port:  <none>
    Args:
      /workspace-recovery.sh
      --backup
    Environment:
      DEVWORKSPACE_NAME:             code-latest
      DEVWORKSPACE_NAMESPACE:        rokumar-dev
      WORKSPACE_ID:                  workspace388fc6409bc642e7
      BACKUP_SOURCE_PATH:            /workspace/workspace388fc6409bc642e7/projects
      DEVWORKSPACE_BACKUP_REGISTRY:  docker.io/rohankanojia
      PODMAN_PUSH_OPTIONS:           --tls-verify=false
      REGISTRY_AUTH_FILE:            /home/podman/.docker/.dockerconfigjson
    Mounts:
      /home/podman/.docker from registry-auth-secret (ro)
      /var/lib/containers from build-storage (rw)
      /workspace from workspace-data (rw)
  Volumes:
   workspace-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  claim-devworkspace
    ReadOnly:   false
   build-storage:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
   registry-auth-secret:
    Type:          Secret (a volume populated by a Secret)
    SecretName:    devworkspace-backup-registry-auth
    Optional:      false
  Node-Selectors:  <none>
  Tolerations:     <none>
Events:
  Type     Reason        Age               From            Message
  ----     ------        ----              ----            -------
  Warning  FailedCreate  8s (x6 over 39s)  job-controller  Error creating: pods "devworkspace-backup-ntzf7-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted-v2: .containers[0].runAsUser: Invalid value: 1000: must be in the ranges: [1000660000, 1000669999], provider "restricted": Forbidden: not usable by user or serviceaccount, provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid-v2": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "hostpath-provisioner": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]
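The container environment in the Job description above follows directly from the workspace and the backup config. A hedged sketch of assembling it with the standard library only (the real controller would presumably build `corev1.EnvVar` values); the `backupEnv` helper and its parameters are hypothetical, and the constant values are copied from the output above.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// backupEnv returns the variables seen on the backup-workspace container.
func backupEnv(dwName, dwNamespace, workspaceID, registryPath string) map[string]string {
	return map[string]string{
		"DEVWORKSPACE_NAME":            dwName,
		"DEVWORKSPACE_NAMESPACE":       dwNamespace,
		"WORKSPACE_ID":                 workspaceID,
		"BACKUP_SOURCE_PATH":           path.Join("/workspace", workspaceID, "projects"),
		"DEVWORKSPACE_BACKUP_REGISTRY": registryPath,
		"PODMAN_PUSH_OPTIONS":          "--tls-verify=false",
		"REGISTRY_AUTH_FILE":           "/home/podman/.docker/.dockerconfigjson",
	}
}

func main() {
	env := backupEnv("code-latest", "rokumar-dev", "workspace388fc6409bc642e7", "docker.io/rohankanojia")
	keys := make([]string, 0, len(env))
	for k := range env {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable output order
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, env[k])
	}
}
```

`path.Join` also normalizes the double slash that appears in the earlier Buildah trace (`/workspace/workspacedfd9f53065ea452c//projects`). In the output above, `PODMAN_PUSH_OPTIONS` is `--tls-verify=false` even though the registry is `docker.io`; the sketch copies that value verbatim.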

Perhaps pod creation is being forbidden by OpenShift SecurityContextConstraints because of this:

SecurityContext: &corev1.SecurityContext{
	RunAsUser: ptr.To[int64](1000),
},
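The `restricted-v2` message spells out the mismatch: a hard-coded `RunAsUser` of 1000 falls outside the UID range OpenShift allocated to the namespace (here `[1000660000, 1000669999]`). OpenShift records that allocation in the namespace annotation `openshift.io/sa.scc.uid-range`, in the form `<start>/<size>`. The sketch below parses such a value and checks a UID against it. The helper names are hypothetical, and the `1000660000/10000` value is inferred from the range in the error message rather than read from the cluster.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseUIDRange parses an "openshift.io/sa.scc.uid-range" value such as
// "1000660000/10000" into an inclusive [lo, hi] range.
func parseUIDRange(v string) (int64, int64, error) {
	startStr, sizeStr, ok := strings.Cut(v, "/")
	if !ok {
		return 0, 0, fmt.Errorf("malformed uid range %q", v)
	}
	start, err := strconv.ParseInt(startStr, 10, 64)
	if err != nil {
		return 0, 0, err
	}
	size, err := strconv.ParseInt(sizeStr, 10, 64)
	if err != nil || size <= 0 {
		return 0, 0, fmt.Errorf("invalid size in %q", v)
	}
	return start, start + size - 1, nil
}

// uidAllowed reports whether uid lies inside the namespace's range,
// which restricted-v2 requires of an explicit runAsUser.
func uidAllowed(rangeAnnotation string, uid int64) (bool, error) {
	lo, hi, err := parseUIDRange(rangeAnnotation)
	if err != nil {
		return false, err
	}
	return uid >= lo && uid <= hi, nil
}

func main() {
	for _, uid := range []int64{1000, 1000660000} {
		ok, err := uidAllowed("1000660000/10000", uid)
		fmt.Println(uid, ok, err)
	}
}
```

With 1000 outside the range, the pod is admitted only if the ServiceAccount may use a more permissive SCC such as `anyuid`, which is what the `oc adm policy add-scc-to-user anyuid` workaround above grants. Leaving `RunAsUser` unset lets `restricted-v2` assign a UID from the namespace range instead; whether the Podman-based image works under an arbitrary UID would need to be verified.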
