Add OpenShift deployment for running Claude Code RCA skill #29

Open
Shreyanand wants to merge 1 commit into redhat-et:main from Shreyanand:openshift-runner

Conversation

@Shreyanand Shreyanand commented Apr 23, 2026

Dockerfile, pod spec, secret templates, and eval runner script for deploying the RCA skill on OpenShift with Vertex AI backend.

Summary by CodeRabbit

  • Chores
    • Added containerized deployment infrastructure including a Docker image configuration built on Node 20 with development tools and Claude support.
    • Introduced Kubernetes pod manifest for orchestrated container deployment, featuring SSH key management and Google Cloud credential staging capabilities.
    • Added automated evaluation execution script and secrets template for managing sensitive credentials and deployment configuration.

Dockerfile, pod spec, secret templates, and eval runner script
for deploying the RCA skill on OpenShift with Vertex AI backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai Bot commented Apr 23, 2026

Walkthrough

These changes introduce deployment infrastructure for a Claude RCA evaluation system, including a Docker image, Kubernetes pod manifest with init container setup, an evaluation execution script, secrets configuration template, and gitignore exclusions for sensitive deployment files.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Git Configuration (.gitignore) | Excludes two OpenShift-related files (deploy/settings.json, deploy/secrets.yaml) from version control. |
| Container & Orchestration (deploy/Dockerfile, deploy/pod.yaml) | New Dockerfile builds Node 20 slim image with Git, SSH, JSON, and Python tooling; installs @anthropic-ai/claude-code globally. New Kubernetes pod definition configures claude-rca-eval pod with init container for SSH/credential setup and main container with Vertex AI environment variables and mounted secrets. |
| Deployment Automation (deploy/run-eval.sh, deploy/secrets.yaml.template) | New bash script waits for pod readiness, clones plugin repository, and executes Claude evaluation with tool allowlist and JSON output. New Kubernetes Secrets template provides base64 placeholders for Vertex credentials, SSH keys, and Claude configuration. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main purpose of the pull request: adding OpenShift deployment artifacts for the Claude Code RCA skill. It is specific, concise, and directly reflects the primary changes across all modified files. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@Shreyanand Shreyanand requested a review from taagarwa-rh April 23, 2026 14:27

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deploy/Dockerfile`:
- Around line 17-20: Remove the zero-byte placeholder created by touch for
/workspace/.config/gcloud/application_default_credentials.json so
GOOGLE_APPLICATION_CREDENTIALS does not point to an empty file; update the
Dockerfile to stop creating that file (remove the touch command) and instead
rely on the init container to stage the real secret or make the init container
validate the source secret and fail fast if the credential file is missing or
empty (reference the touch of
/workspace/.config/gcloud/application_default_credentials.json and the init
container that populates /workspace/.config/gcloud).

In `@deploy/pod.yaml`:
- Line 11: Replace the non-reproducible image tag "claude-rca:latest" used by
both the init container and main container in deploy/pod.yaml with a fixed,
immutable tag (e.g., a commit SHA or semantic version) so redeploys are
deterministic; update the image fields that currently reference
image-registry.openshift-image-registry.svc:5000/claude-ci-test/claude-rca:latest
to use the chosen pinned tag (for example claude-rca:<commit-sha> or
claude-rca:v1.2.3) and ensure your CI/build pipeline injects that exact tag into
the manifest for both containers.
- Around line 7-82: The pod currently lacks an explicit securityContext, so it
relies on the cluster SCC; add a pod-level securityContext that sets
runAsNonRoot: true and seccompProfile.type: RuntimeDefault, and a
container-level securityContext on the initContainer "setup-ssh" and the main
container "claude" that sets allowPrivilegeEscalation: false and
capabilities.drop: ["ALL"] (these two fields are only valid at container
level), to make the manifest portable and address the Checkov/Trivy
default-security-context findings.

In `@deploy/run-eval.sh`:
- Line 43: The oc wait call using POD_NAME can fail immediately if the pod
doesn't exist; add a pre-check before the oc wait invocation that tests
existence (e.g., run oc get pod "${POD_NAME}" and, if it returns non-zero, print
a clear message like "Pod ${POD_NAME} not found; apply deploy/pod.yaml first"
and exit non-zero) so the script does not abort with an opaque "not found" error
under set -e; modify the block around the oc wait "pod/${POD_NAME}"
--for=condition=Ready --timeout=60s to perform this guard and only call oc wait
when the pod exists.
- Around line 40-53: The script dangerously injects unescaped values into remote
shell commands (PROMPT, BRANCH, REPO_URL and ALLOWED_TOOLS) when calling oc exec
and claude; fix by avoiding direct interpolation into the single-quoted bash -c
string: pass PROMPT via a safe channel (write it into the pod and read it there,
or pass it via oc exec --env if supported) and have claude read the prompt from
a file or stdin (e.g., claude -p "$(cat /tmp/prompt)"); ensure git clone uses
safely quoted/terminated args for BRANCH and REPO_URL (use --branch "$BRANCH"
and git clone -- "$REPO_URL" or otherwise shell-escape/quote BRANCH/REPO_URL
before executing inside the pod); and build ALLOWED_TOOLS into a single properly
quoted argument (e.g., join with spaces into one quoted string) so oc
exec/claude never receives untrusted raw shell-expanded input.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 9e0108fe-bce8-4b72-a89a-2176c81539a7

📥 Commits

Reviewing files that changed from the base of the PR and between 1454f36 and 29f4c58.

📒 Files selected for processing (5)
  • .gitignore
  • deploy/Dockerfile
  • deploy/pod.yaml
  • deploy/run-eval.sh
  • deploy/secrets.yaml.template

Comment thread deploy/Dockerfile
Comment on lines +17 to +20
RUN mkdir -p /workspace/.config/gcloud /workspace/.claude \
    && touch /workspace/.config/gcloud/application_default_credentials.json \
    && chmod -R g+rwX /workspace \
    && chgrp -R 0 /workspace

⚠️ Potential issue | 🟡 Minor

Empty placeholder credentials file may mask misconfiguration.

Creating an empty application_default_credentials.json with touch in the image means that if the init container's credential copy fails silently or the secret volume is misconfigured, GOOGLE_APPLICATION_CREDENTIALS points to a zero-byte file and the resulting Vertex auth errors are hard to diagnose. Consider omitting the touch so the file is only present when the init container actually stages it, or have the init container fail fast if the source secret file is missing or empty.
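
The fail-fast option could look roughly like the following inside the init container's script. This is a sketch under assumptions: the function name `stage_credentials` is hypothetical and does not appear in the PR; only the paths in the usage note come from deploy/pod.yaml.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a credential file only when the source exists
# and is non-empty, so a broken secret mount stops the init container early.
stage_credentials() {
  local src="$1" dest="$2"
  # [ -s FILE ] is true only if FILE exists and has a size greater than zero.
  if [ ! -s "$src" ]; then
    echo "ERROR: credential file '$src' is missing or empty" >&2
    return 1
  fi
  mkdir -p "$(dirname "$dest")"
  cp "$src" "$dest"
}
```

In the init container this might be called as `stage_credentials /tmp/gcp-creds/application_default_credentials.json /workspace/.config/gcloud/application_default_credentials.json || exit 1`, so the pod fails during init rather than at the first Vertex request.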

Comment thread deploy/pod.yaml
Comment on lines +7 to +82
spec:
  restartPolicy: Never
  initContainers:
    - name: setup-ssh
      image: image-registry.openshift-image-registry.svc:5000/claude-ci-test/claude-rca:latest
      command: ["bash", "-c"]
      args:
        - |
          mkdir -p /workspace/.ssh /workspace/.claude /workspace/.config/gcloud
          cp /tmp/ssh-secret/id_ed25519 /workspace/.ssh/id_ed25519
          chmod 600 /workspace/.ssh/id_ed25519
          cat > /workspace/.ssh/config <<EOF
          Host ci-bastion
            HostName $(cat /tmp/ssh-secret/bastion-hostname)
            User $(cat /tmp/ssh-secret/bastion-username)
            Port $(cat /tmp/ssh-secret/bastion-port)
            IdentityFile /workspace/.ssh/id_ed25519
            StrictHostKeyChecking no
            UserKnownHostsFile /dev/null

          Host ci-jumpbox
            HostName $(cat /tmp/ssh-secret/jumpbox-hostname)
            User $(cat /tmp/ssh-secret/jumpbox-username)
            Port $(cat /tmp/ssh-secret/jumpbox-port)
            IdentityFile /workspace/.ssh/id_ed25519
            StrictHostKeyChecking no
            UserKnownHostsFile /dev/null
          EOF
          chmod 600 /workspace/.ssh/config
          cp /tmp/claude-settings/settings.json /workspace/.claude/settings.json
          cp /tmp/gcp-creds/application_default_credentials.json /workspace/.config/gcloud/application_default_credentials.json
      volumeMounts:
        - name: ssh-secret
          mountPath: /tmp/ssh-secret
          readOnly: true
        - name: claude-settings
          mountPath: /tmp/claude-settings
          readOnly: true
        - name: gcp-creds
          mountPath: /tmp/gcp-creds
          readOnly: true
        - name: workspace
          mountPath: /workspace
  containers:
    - name: claude
      image: image-registry.openshift-image-registry.svc:5000/claude-ci-test/claude-rca:latest
      command: ["sleep", "infinity"]
      env:
        - name: CLAUDE_CODE_USE_VERTEX
          value: "1"
        - name: CLOUD_ML_REGION
          valueFrom:
            secretKeyRef:
              name: claude-vertex-creds
              key: CLOUD_ML_REGION
        - name: ANTHROPIC_VERTEX_PROJECT_ID
          valueFrom:
            secretKeyRef:
              name: claude-vertex-creds
              key: ANTHROPIC_VERTEX_PROJECT_ID
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: "/workspace/.config/gcloud/application_default_credentials.json"
        - name: HOME
          value: "/workspace"
        - name: CLAUDE_CODE_ACCEPT_TOS
          value: "true"
      volumeMounts:
        - name: workspace
          mountPath: /workspace
      resources:
        requests:
          memory: "512Mi"
          cpu: "500m"
        limits:
          memory: "2Gi"
          cpu: "2"

⚠️ Potential issue | 🟡 Minor

Add an explicit securityContext (non-root, drop capabilities).

The pod and both containers rely entirely on the namespace's default SCC. On OpenShift with restricted-v2 this is usually fine, but explicitly declaring runAsNonRoot: true, allowPrivilegeEscalation: false, capabilities.drop: ["ALL"], and seccompProfile: RuntimeDefault makes the manifest portable across clusters/namespaces and addresses the Checkov/Trivy default-security-context findings (KSV-0118) without changing behavior. readOnlyRootFilesystem: true is likely not viable here because npm/git may write outside /workspace, so leave that off; the KSV-0014 findings will therefore remain.

Proposed addition
 spec:
   restartPolicy: Never
+  securityContext:
+    runAsNonRoot: true
+    seccompProfile:
+      type: RuntimeDefault
   initContainers:
     - name: setup-ssh
       image: image-registry.openshift-image-registry.svc:5000/claude-ci-test/claude-rca:latest
+      securityContext:
+        allowPrivilegeEscalation: false
+        capabilities:
+          drop: ["ALL"]
       command: ["bash", "-c"]

(Apply the same securityContext block to the claude container.)
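
As a rough pre-apply check, a script could confirm that the four settings named above appear in the manifest. The sketch below is an assumption-laden illustration: `check_security_context` is a hypothetical name, and it is a plain text match, not a YAML parser, so it only shows that each setting occurs somewhere in the file (not that every container carries it) and it only recognizes the flow-style `drop: ["ALL"]` spelling used in the proposed addition.

```shell
#!/usr/bin/env bash
# Hypothetical lint: report which hardening settings are absent from a manifest.
check_security_context() {
  local manifest="$1" missing=0 pattern
  for pattern in \
    'runAsNonRoot:[[:space:]]*true' \
    'allowPrivilegeEscalation:[[:space:]]*false' \
    'type:[[:space:]]*RuntimeDefault' \
    'drop:[[:space:]]*\["ALL"\]'
  do
    # grep -E treats each pattern as an extended regular expression.
    if ! grep -Eq -- "$pattern" "$manifest"; then
      echo "missing: $pattern" >&2
      missing=1
    fi
  done
  # Non-zero exit status when at least one setting was not found.
  return "$missing"
}
```

For example, `check_security_context deploy/pod.yaml || exit 1` could run before the manifest is applied. A scanner such as Trivy remains the more thorough check.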

🧰 Tools
🪛 Trivy (0.69.3)

  • [error] 51-82: Root file system is not read-only. Container 'claude' of Pod 'claude-rca-eval' should set 'securityContext.readOnlyRootFilesystem' to true. Rule: KSV-0014 (IaC/Kubernetes)
  • [error] 10-49: Root file system is not read-only. Container 'setup-ssh' of Pod 'claude-rca-eval' should set 'securityContext.readOnlyRootFilesystem' to true. Rule: KSV-0014 (IaC/Kubernetes)
  • [error] 10-49: Default security context configured. container claude-rca-eval in default namespace is using the default security context. Rule: KSV-0118 (IaC/Kubernetes)
  • [error] 51-82: Default security context configured. container claude-rca-eval in default namespace is using the default security context. Rule: KSV-0118 (IaC/Kubernetes)
  • [error] 7-100: Default security context configured. pod claude-rca-eval in default namespace is using the default security context, which allows root privileges. Rule: KSV-0118 (IaC/Kubernetes)

Comment thread deploy/pod.yaml
  restartPolicy: Never
  initContainers:
    - name: setup-ssh
      image: image-registry.openshift-image-registry.svc:5000/claude-ci-test/claude-rca:latest

⚠️ Potential issue | 🟡 Minor

Avoid :latest image tag for reproducible deploys.

Both the init container and main container reference the floating claude-rca:latest tag. Re-running the pod after a rebuild silently picks up a different image, which makes eval results non-reproducible and complicates rollback. Consider tagging images with a commit SHA or semantic version and referencing that tag here.

Also applies to: 52-52
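
One way to inject a pinned tag at deploy time is to rewrite the manifest on the fly. The following is a minimal sketch, assuming the image has already been built and pushed under the chosen tag; `pin_image_tag` is a hypothetical helper, not part of the PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the manifest with every claude-rca:latest
# reference rewritten to a pinned tag. The input file is left untouched.
pin_image_tag() {
  local manifest="$1" tag="$2"
  # Reject empty tags, tags starting with '.' or '-', and any character
  # outside the set allowed in image tags (letters, digits, '_', '.', '-').
  case "$tag" in
    ''|[.-]*|*[!A-Za-z0-9_.-]*)
      echo "ERROR: invalid image tag '$tag'" >&2
      return 1
      ;;
  esac
  # The validation above guarantees the tag contains no sed metacharacters.
  sed "s|claude-rca:latest|claude-rca:${tag}|g" "$manifest"
}
```

A deploy step could then run something like `pin_image_tag deploy/pod.yaml "$(git rev-parse --short HEAD)" | oc apply -f -`, which pins both the init container and the main container to the same build.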

Comment thread deploy/run-eval.sh
Comment on lines +40 to +53
PROMPT="$1"

echo "==> Checking pod status..."
oc wait "pod/${POD_NAME}" --for=condition=Ready --timeout=60s

echo "==> Cloning repo (branch: ${BRANCH})..."
oc exec "${POD_NAME}" -- bash -c \
  "rm -rf /workspace/rhdp-rca-plugin && git clone --branch ${BRANCH} ${REPO_URL} /workspace/rhdp-rca-plugin"

echo "==> Running eval..."
oc exec "${POD_NAME}" -- bash -c \
  "cd /workspace/rhdp-rca-plugin && claude -p '${PROMPT}' \
    --allowedTools $(printf '"%s" ' "${ALLOWED_TOOLS[@]}") \
    --output-format json"

⚠️ Potential issue | 🟠 Major

Command injection risk via $PROMPT, ${BRANCH}, and ${REPO_URL}.

PROMPT is interpolated directly into a single-quoted remote bash -c string at Line 51. A prompt containing a single quote (e.g. "Why did Alice's job fail?") will break out of the quoting and, at best, cause a syntax error — at worst, execute attacker-controlled shell in the pod. ${BRANCH} and ${REPO_URL} at Line 47 are similarly spliced unquoted into the remote shell. Given this script is intended for eval workflows where prompts may come from datasets or CI inputs, this is a realistic injection vector.

Prefer passing the prompt as a separate argument (or via stdin / a staged file) so the remote shell never parses it, and quote the git args, e.g.:

Proposed fix
-echo "==> Cloning repo (branch: ${BRANCH})..."
-oc exec "${POD_NAME}" -- bash -c \
-  "rm -rf /workspace/rhdp-rca-plugin && git clone --branch ${BRANCH} ${REPO_URL} /workspace/rhdp-rca-plugin"
-
-echo "==> Running eval..."
-oc exec "${POD_NAME}" -- bash -c \
-  "cd /workspace/rhdp-rca-plugin && claude -p '${PROMPT}' \
-    --allowedTools $(printf '"%s" ' "${ALLOWED_TOOLS[@]}") \
-    --output-format json"
+echo "==> Cloning repo (branch: ${BRANCH})..."
+# -i forwards the heredoc on stdin; values travel as positional arguments.
+oc exec -i "${POD_NAME}" -- bash -s -- "${BRANCH}" "${REPO_URL}" <<'EOF'
+set -euo pipefail
+branch="$1"; repo="$2"
+rm -rf /workspace/rhdp-rca-plugin
+git clone --branch "$branch" "$repo" /workspace/rhdp-rca-plugin
+EOF
+
+echo "==> Running eval..."
+# Pass the prompt as a positional argument so it is never parsed as shell code.
+oc exec -i "${POD_NAME}" -- bash -s -- "${PROMPT}" "${ALLOWED_TOOLS[@]}" <<'EOF'
+set -euo pipefail
+prompt="$1"; shift
+cd /workspace/rhdp-rca-plugin
+claude -p "$prompt" --allowedTools "$@" --output-format json
+EOF

Note: oc exec needs -i to forward the heredoc to the pod. The prompt is passed as a positional argument because oc exec and kubectl exec do not document an --env flag; alternatively, write the prompt to a file inside the pod via oc cp / stdin and read it with claude -p "$(cat /tmp/prompt)".
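
The quoting problem can be reproduced locally without a cluster. The sketch below is illustrative only: `quote_command` is a hypothetical helper, and a local `bash -c` stands in for the remote shell that `oc exec ... -- bash -c` would start. It relies on bash's `printf '%q'`, so the receiving shell must also be bash.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a command string in which every argument is
# shell-quoted, so a second shell parsing it sees the original arguments.
quote_command() {
  local quoted=() arg
  for arg in "$@"; do
    # printf '%q' emits the argument in a form that bash can re-read safely.
    quoted+=("$(printf '%q' "$arg")")
  done
  printf '%s\n' "${quoted[*]}"
}

prompt="Why did Alice's job fail?; echo INJECTED"

# Unsafe: the apostrophe in the prompt terminates the single-quoted string.
unsafe="printf '%s\n' '${prompt}'"

# Safe: the prompt remains a single literal argument after re-parsing.
safe="$(quote_command printf '%s\n' "$prompt")"
```

Running `bash -c "$unsafe"` fails with a quoting error, while `bash -c "$safe"` prints the prompt unchanged. Passing values as positional arguments, as in the proposed fix, avoids building a command string at all and is usually the simpler option.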

Comment thread deploy/run-eval.sh
PROMPT="$1"

echo "==> Checking pod status..."
oc wait "pod/${POD_NAME}" --for=condition=Ready --timeout=60s

⚠️ Potential issue | 🟡 Minor

oc wait fails if the pod does not yet exist.

oc wait pod/${POD_NAME} --for=condition=Ready returns a non-zero "not found" error immediately (and set -e aborts the script) when the pod has not been created yet. Consider either documenting that the pod must already be applied, or guarding with oc get pod "${POD_NAME}" >/dev/null 2>&1 || { echo "Pod ${POD_NAME} not found; apply deploy/pod.yaml first"; exit 1; } before the wait.
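
Wrapped in a function, the guard might look like the sketch below. The function name `wait_for_pod` and the `OC_BIN` override are assumptions added for illustration (the override only exists so the logic can be exercised without a cluster); the `oc get pod` and `oc wait` invocations are the ones quoted above.

```shell
#!/usr/bin/env bash
# OC_BIN is a hypothetical override; it defaults to the real oc client.
OC_BIN="${OC_BIN:-oc}"

wait_for_pod() {
  local pod="$1" timeout="${2:-60s}"
  # Fail with a clear message when the pod has not been created yet,
  # instead of letting oc wait abort the script with "not found".
  if ! "$OC_BIN" get pod "$pod" >/dev/null 2>&1; then
    echo "Pod ${pod} not found; apply deploy/pod.yaml first" >&2
    return 1
  fi
  "$OC_BIN" wait "pod/${pod}" --for=condition=Ready --timeout="$timeout"
}
```

In run-eval.sh this would replace the bare wait with `wait_for_pod "${POD_NAME}" || exit 1`.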
