Add OpenShift deployment for running Claude Code RCA skill #29
Shreyanand wants to merge 1 commit into redhat-et:main
Dockerfile, pod spec, secret templates, and eval runner script for deploying the RCA skill on OpenShift with Vertex AI backend. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 Walkthrough
These changes introduce deployment infrastructure for a Claude RCA evaluation system, including a Docker image, Kubernetes pod manifest with init container setup, an evaluation execution script, secrets configuration template, and gitignore exclusions for sensitive deployment files.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deploy/Dockerfile`:
- Around line 17-20: Remove the zero-byte placeholder created by touch for
/workspace/.config/gcloud/application_default_credentials.json so
GOOGLE_APPLICATION_CREDENTIALS does not point to an empty file; update the
Dockerfile to stop creating that file (remove the touch command) and instead
rely on the init container to stage the real secret or make the init container
validate the source secret and fail fast if the credential file is missing or
empty (reference the touch of
/workspace/.config/gcloud/application_default_credentials.json and the init
container that populates /workspace/.config/gcloud).
In `@deploy/pod.yaml`:
- Line 11: Replace the non-reproducible image tag "claude-rca:latest" used by
both the init container and main container in deploy/pod.yaml with a fixed,
immutable tag (e.g., a commit SHA or semantic version) so redeploys are
deterministic; update the image fields that currently reference
image-registry.openshift-image-registry.svc:5000/claude-ci-test/claude-rca:latest
to use the chosen pinned tag (for example claude-rca:<commit-sha> or
claude-rca:v1.2.3) and ensure your CI/build pipeline injects that exact tag into
the manifest for both containers.
- Around line 7-82: The pod currently lacks an explicit securityContext, so it
relies on the cluster SCC. Add a pod-level securityContext with runAsNonRoot:
true and seccompProfile.type: RuntimeDefault, and add a container-level
securityContext with allowPrivilegeEscalation: false and capabilities.drop:
["ALL"] to both the initContainer "setup-ssh" and the main container "claude"
(allowPrivilegeEscalation and capabilities are container-level fields only).
This keeps the manifest portable across clusters and satisfies Checkov/Trivy.
In `@deploy/run-eval.sh`:
- Line 43: The oc wait call using POD_NAME can fail immediately if the pod
doesn't exist; add a pre-check before the oc wait invocation that tests
existence (e.g., run oc get pod "${POD_NAME}" and, if it returns non-zero, print
a clear message like "Pod ${POD_NAME} not found; apply deploy/pod.yaml first"
and exit non-zero) so the script does not abort with an opaque "not found" error
under set -e; modify the block around the oc wait "pod/${POD_NAME}"
--for=condition=Ready --timeout=60s to perform this guard and only call oc wait
when the pod exists.
- Around line 40-53: The script dangerously injects unescaped values into remote
shell commands (PROMPT, BRANCH, REPO_URL and ALLOWED_TOOLS) when calling oc exec
and claude; fix by avoiding direct interpolation into the single-quoted bash -c
string: pass PROMPT via a safe channel (write it into the pod and read it there,
or pass it via oc exec --env if supported) and have claude read the prompt from
a file or stdin (e.g., claude -p "$(cat /tmp/prompt)"); ensure git clone uses
safely quoted/terminated args for BRANCH and REPO_URL (use --branch "$BRANCH"
and git clone -- "$REPO_URL" or otherwise shell-escape/quote BRANCH/REPO_URL
before executing inside the pod); and build ALLOWED_TOOLS into a single properly
quoted argument (e.g., join with spaces into one quoted string) so oc
exec/claude never receives untrusted raw shell-expanded input.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 9e0108fe-bce8-4b72-a89a-2176c81539a7
📒 Files selected for processing (5)
- .gitignore
- deploy/Dockerfile
- deploy/pod.yaml
- deploy/run-eval.sh
- deploy/secrets.yaml.template
RUN mkdir -p /workspace/.config/gcloud /workspace/.claude \
    && touch /workspace/.config/gcloud/application_default_credentials.json \
    && chmod -R g+rwX /workspace \
    && chgrp -R 0 /workspace
Empty placeholder credentials file may mask misconfiguration.
touching an empty application_default_credentials.json in the image means that if the init container's credential copy ever fails silently or the secret volume is misconfigured, GOOGLE_APPLICATION_CREDENTIALS will point to a zero-byte file and Vertex auth errors may be confusing to diagnose. Consider omitting the touch so the file is only present when the init container actually stages it, or have the init container fail fast if the source secret file is missing/empty.
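The fail-fast option could look roughly like the sketch below, run inside the init container in place of the plain `cp`. The function name `stage_gcp_credentials` is hypothetical and not part of the PR; the paths in the usage note are the ones already used in deploy/pod.yaml.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the PR): copy a credential file only when the
# source exists and is non-empty, so a broken secret mount fails the init
# container instead of producing a zero-byte credentials file.
stage_gcp_credentials() {
  local src="$1" dst="$2"
  # [ -s FILE ] is true only if FILE exists and has a size greater than zero.
  if [ ! -s "$src" ]; then
    echo "credential file missing or empty: ${src}" >&2
    return 1
  fi
  mkdir -p "$(dirname "$dst")"
  cp "$src" "$dst"
}
```

In the init container this might be called as `stage_gcp_credentials /tmp/gcp-creds/application_default_credentials.json /workspace/.config/gcloud/application_default_credentials.json || exit 1`, so a non-zero exit keeps the main container from starting.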
spec:
  restartPolicy: Never
  initContainers:
    - name: setup-ssh
      image: image-registry.openshift-image-registry.svc:5000/claude-ci-test/claude-rca:latest
      command: ["bash", "-c"]
      args:
        - |
          mkdir -p /workspace/.ssh /workspace/.claude /workspace/.config/gcloud
          cp /tmp/ssh-secret/id_ed25519 /workspace/.ssh/id_ed25519
          chmod 600 /workspace/.ssh/id_ed25519
          cat > /workspace/.ssh/config <<EOF
          Host ci-bastion
            HostName $(cat /tmp/ssh-secret/bastion-hostname)
            User $(cat /tmp/ssh-secret/bastion-username)
            Port $(cat /tmp/ssh-secret/bastion-port)
            IdentityFile /workspace/.ssh/id_ed25519
            StrictHostKeyChecking no
            UserKnownHostsFile /dev/null

          Host ci-jumpbox
            HostName $(cat /tmp/ssh-secret/jumpbox-hostname)
            User $(cat /tmp/ssh-secret/jumpbox-username)
            Port $(cat /tmp/ssh-secret/jumpbox-port)
            IdentityFile /workspace/.ssh/id_ed25519
            StrictHostKeyChecking no
            UserKnownHostsFile /dev/null
          EOF
          chmod 600 /workspace/.ssh/config
          cp /tmp/claude-settings/settings.json /workspace/.claude/settings.json
          cp /tmp/gcp-creds/application_default_credentials.json /workspace/.config/gcloud/application_default_credentials.json
      volumeMounts:
        - name: ssh-secret
          mountPath: /tmp/ssh-secret
          readOnly: true
        - name: claude-settings
          mountPath: /tmp/claude-settings
          readOnly: true
        - name: gcp-creds
          mountPath: /tmp/gcp-creds
          readOnly: true
        - name: workspace
          mountPath: /workspace
  containers:
    - name: claude
      image: image-registry.openshift-image-registry.svc:5000/claude-ci-test/claude-rca:latest
      command: ["sleep", "infinity"]
      env:
        - name: CLAUDE_CODE_USE_VERTEX
          value: "1"
        - name: CLOUD_ML_REGION
          valueFrom:
            secretKeyRef:
              name: claude-vertex-creds
              key: CLOUD_ML_REGION
        - name: ANTHROPIC_VERTEX_PROJECT_ID
          valueFrom:
            secretKeyRef:
              name: claude-vertex-creds
              key: ANTHROPIC_VERTEX_PROJECT_ID
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: "/workspace/.config/gcloud/application_default_credentials.json"
        - name: HOME
          value: "/workspace"
        - name: CLAUDE_CODE_ACCEPT_TOS
          value: "true"
      volumeMounts:
        - name: workspace
          mountPath: /workspace
      resources:
        requests:
          memory: "512Mi"
          cpu: "500m"
        limits:
          memory: "2Gi"
          cpu: "2"
Add an explicit securityContext (non-root, drop capabilities).
The pod and both containers rely entirely on the namespace's default SCC. On OpenShift with restricted-v2 this is usually fine, but explicitly declaring runAsNonRoot: true, allowPrivilegeEscalation: false, capabilities.drop: ["ALL"], and seccompProfile: RuntimeDefault makes the manifest portable across clusters/namespaces and satisfies the Checkov/Trivy findings without behavior change. readOnlyRootFilesystem: true is likely not viable here because npm/git may write outside /workspace, so leave that off.
Proposed addition
 spec:
   restartPolicy: Never
+  securityContext:
+    runAsNonRoot: true
+    seccompProfile:
+      type: RuntimeDefault
   initContainers:
     - name: setup-ssh
       image: image-registry.openshift-image-registry.svc:5000/claude-ci-test/claude-rca:latest
+      securityContext:
+        allowPrivilegeEscalation: false
+        capabilities:
+          drop: ["ALL"]
       command: ["bash", "-c"]
(Apply the same securityContext block to the claude container.)
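As a quick local sanity check before re-running the scanners, a crude text search can confirm that the four settings from the proposed addition appear in the manifest at all. This is only a sketch: `check_security_context` is a hypothetical helper, it does not parse YAML, and it cannot tell whether every container carries the block, so Trivy or Checkov remain the real check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: crude textual check, not a YAML parser. It only
# confirms that each hardening setting appears somewhere in the manifest.
check_security_context() {
  local manifest="$1" missing=0 needle
  for needle in \
    'runAsNonRoot: true' \
    'allowPrivilegeEscalation: false' \
    'type: RuntimeDefault' \
    'drop: ["ALL"]'; do
    # -F treats the needle as a fixed string, so [ and ] are literal.
    if ! grep -qF -- "$needle" "$manifest"; then
      echo "missing: ${needle}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage would be `check_security_context deploy/pod.yaml`, which returns non-zero and lists the missing settings on stderr.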
🧰 Tools
🪛 Trivy (0.69.3)
[error] 51-82: Root file system is not read-only
Container 'claude' of Pod 'claude-rca-eval' should set 'securityContext.readOnlyRootFilesystem' to true
Rule: KSV-0014
(IaC/Kubernetes)
[error] 10-49: Root file system is not read-only
Container 'setup-ssh' of Pod 'claude-rca-eval' should set 'securityContext.readOnlyRootFilesystem' to true
Rule: KSV-0014
(IaC/Kubernetes)
[error] 10-49: Default security context configured
container claude-rca-eval in default namespace is using the default security context
Rule: KSV-0118
(IaC/Kubernetes)
[error] 51-82: Default security context configured
container claude-rca-eval in default namespace is using the default security context
Rule: KSV-0118
(IaC/Kubernetes)
[error] 7-100: Default security context configured
pod claude-rca-eval in default namespace is using the default security context, which allows root privileges
Rule: KSV-0118
(IaC/Kubernetes)
  restartPolicy: Never
  initContainers:
    - name: setup-ssh
      image: image-registry.openshift-image-registry.svc:5000/claude-ci-test/claude-rca:latest
Avoid :latest image tag for reproducible deploys.
Both the init container and main container pin to claude-rca:latest. Re-running the pod after a rebuild silently picks up a different image, which makes eval results non-reproducible and complicates rollback. Consider tagging images with a commit SHA or semantic version and referencing that tag here.
Also applies to: 52-52
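One way to inject a pinned tag at deploy time is to rewrite the manifest on the fly rather than editing it by hand. The sketch below assumes the image is also pushed under that tag; `pin_image_tag` is a hypothetical helper, and the tag pattern follows the usual container tag grammar (up to 128 characters from letters, digits, `_`, `.`, `-`, not starting with `.` or `-`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a manifest on stdin and replace every
# claude-rca:latest reference with claude-rca:<tag> on stdout.
pin_image_tag() {
  local tag="$1"
  local re='^[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}$'
  # Validate first so the tag cannot contain sed metacharacters.
  if [[ ! "$tag" =~ $re ]]; then
    echo "invalid image tag: ${tag}" >&2
    return 1
  fi
  sed "s|claude-rca:latest|claude-rca:${tag}|g"
}
```

For example, `pin_image_tag "$(git rev-parse --short HEAD)" < deploy/pod.yaml | oc apply -f -` would apply the pod with both containers pointing at the same commit-tagged image.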
PROMPT="$1"

echo "==> Checking pod status..."
oc wait "pod/${POD_NAME}" --for=condition=Ready --timeout=60s

echo "==> Cloning repo (branch: ${BRANCH})..."
oc exec "${POD_NAME}" -- bash -c \
  "rm -rf /workspace/rhdp-rca-plugin && git clone --branch ${BRANCH} ${REPO_URL} /workspace/rhdp-rca-plugin"

echo "==> Running eval..."
oc exec "${POD_NAME}" -- bash -c \
  "cd /workspace/rhdp-rca-plugin && claude -p '${PROMPT}' \
    --allowedTools $(printf '"%s" ' "${ALLOWED_TOOLS[@]}") \
    --output-format json"
Command injection risk via $PROMPT, ${BRANCH}, and ${REPO_URL}.
PROMPT is interpolated directly into a single-quoted remote bash -c string at Line 51. A prompt containing a single quote (e.g. "Why did Alice's job fail?") will break out of the quoting and, at best, cause a syntax error — at worst, execute attacker-controlled shell in the pod. ${BRANCH} and ${REPO_URL} at Line 47 are similarly spliced unquoted into the remote shell. Given this script is intended for eval workflows where prompts may come from datasets or CI inputs, this is a realistic injection vector.
Prefer passing the prompt via stdin (or an env var set through oc exec --env / a staged file) and quoting the git args, e.g.:
Proposed fix
-echo "==> Cloning repo (branch: ${BRANCH})..."
-oc exec "${POD_NAME}" -- bash -c \
-  "rm -rf /workspace/rhdp-rca-plugin && git clone --branch ${BRANCH} ${REPO_URL} /workspace/rhdp-rca-plugin"
-
-echo "==> Running eval..."
-oc exec "${POD_NAME}" -- bash -c \
-  "cd /workspace/rhdp-rca-plugin && claude -p '${PROMPT}' \
-    --allowedTools $(printf '"%s" ' "${ALLOWED_TOOLS[@]}") \
-    --output-format json"
+echo "==> Cloning repo (branch: ${BRANCH})..."
+# -i is required so the heredoc on stdin reaches bash -s inside the pod.
+oc exec -i "${POD_NAME}" -- bash -s -- "${BRANCH}" "${REPO_URL}" <<'EOF'
+set -euo pipefail
+branch="$1"; repo="$2"
+rm -rf /workspace/rhdp-rca-plugin
+git clone --branch "$branch" "$repo" /workspace/rhdp-rca-plugin
+EOF
+
+echo "==> Running eval..."
+# Pass prompt via env var so it is never interpreted by the shell.
+oc exec -i "${POD_NAME}" --env="CLAUDE_PROMPT=${PROMPT}" -- bash -s -- \
+  "${ALLOWED_TOOLS[@]}" <<'EOF'
+set -euo pipefail
+cd /workspace/rhdp-rca-plugin
+claude -p "$CLAUDE_PROMPT" --allowedTools "$@" --output-format json
+EOF
Note: check that your oc version accepts --env on oc exec before relying on it (upstream kubectl exec does not document such a flag). If it does not, write the prompt to a file inside the pod via oc cp or stdin and read it with claude -p "$(cat /tmp/prompt)".
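The file-based alternative mentioned in the note can be sketched as follows. The prompt is sent over stdin and written to a file in the pod, so it never becomes part of a shell command string. `stage_prompt` is a hypothetical helper name, and `/tmp/prompt` is the path used in the note above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the prompt to a file inside the pod without
# ever embedding it in a shell command string.
stage_prompt() {
  local pod="$1" prompt="$2" remote_path="${3:-/tmp/prompt}"
  # The prompt travels over stdin (-i); the remote script is a fixed string
  # and the destination path is passed as a positional argument ($1).
  printf '%s' "$prompt" |
    oc exec -i "$pod" -- sh -c 'cat > "$1"' sh "$remote_path"
}
```

After `stage_prompt "${POD_NAME}" "${PROMPT}"`, the eval step can read the file with `claude -p "$(cat /tmp/prompt)"` inside a single-quoted remote script, so quotes, `$(...)`, or backticks in the prompt are treated as data.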
PROMPT="$1"

echo "==> Checking pod status..."
oc wait "pod/${POD_NAME}" --for=condition=Ready --timeout=60s
oc wait fails if the pod does not yet exist.
oc wait pod/${POD_NAME} --for=condition=Ready returns a non-zero "not found" error immediately (and set -e aborts the script) when the pod has not been created yet. Consider either documenting that the pod must already be applied, or guarding with oc get pod "${POD_NAME}" >/dev/null 2>&1 || { echo "Pod ${POD_NAME} not found; apply deploy/pod.yaml first"; exit 1; } before the wait.
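Wrapped as a function, the guard might look like the sketch below. The name `require_pod_then_wait` is hypothetical; the error message and the `oc wait` arguments are taken from the comment and the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail with a clear message when the pod is absent,
# and only then wait for readiness.
require_pod_then_wait() {
  local pod="$1"
  # Using the command in an `if` keeps set -e from aborting on "not found".
  if ! oc get pod "$pod" >/dev/null 2>&1; then
    echo "Pod ${pod} not found; apply deploy/pod.yaml first" >&2
    return 1
  fi
  oc wait "pod/${pod}" --for=condition=Ready --timeout=60s
}
```

In run-eval.sh this would replace the bare `oc wait` line with `require_pod_then_wait "${POD_NAME}"`; under `set -e` a missing pod still stops the script, but with the actionable message instead of the raw "not found" error.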