Add `pickupFirst` mode to the vcert playbook for shared-certificate distribution (TPP)
BUSINESS PROBLEM
Many customers operate the "one cert, many endpoints" pattern: a single TLS certificate (often a wildcard) is installed on dozens to hundreds of heterogeneous endpoints — Apache servers, NGINX, F5/NetScaler load balancers, Imperva, etc. — all serving the same FQDN(s). When the cert is renewed in TPP (manually via Aperture, automatically via a renewal policy, or via vcert on a designated leader host), every follower needs to install that exact same cert + key during its own maintenance window, which may be days or weeks after the renewal happens.
`vcert`'s current playbook (`vcert run -f apache.yaml`) is built around the assumption that the host running the playbook owns the enrollment: it always tries to enroll / renew through the `request` block. That means:
- In a shared-wildcard scenario, every follower host running the playbook would attempt to enroll its own cert against TPP, each generating its own keypair. That's the opposite of "one wildcard everywhere."
- Followers can't simply track the leader's renewal: the playbook has no mode for "fetch whatever cert TPP currently has at this object DN and install it locally if it's different."
- Operators end up writing custom shell wrappers around `vcert pickup` to bridge this gap. We did exactly this for our customer: a ~250-line bash script that drives `vcert pickup`, compares thumbprints, and decides whether to install or defer to the existing renewal path.
Business impact: every customer with shared / wildcard certs across multiple endpoints either accepts staggered-renewal pain, builds bespoke distribution scripts, or pushes the cert manually. The pattern is common enough that vcert should support it natively.
PROPOSED SOLUTION
Add an opt-in `pickupFirst` mode to the playbook `request` block. With one new boolean field (and one optional override), a follower host's playbook becomes a self-healing loop that converges on whatever cert the platform currently holds at a given cert object.
```yaml
certificateTasks:
  - name: apache-cert
    renewBefore: 30d
    request:
      csr: service
      pickupFirst: true                      # ← new, default false
      pickupId: '\VED\Policy\...\cert-dn'    # ← new, optional; defaults to zone\CN
      zone: '\VED\Policy\Demo\Apache'
      subject:
        commonName: 'shared.example.com'
      # ...
    installations:
      - format: PEM
        file: /etc/pki/tls/certs/apache.crt
        keyFile: /etc/pki/tls/private/apache.key
        chainFile: /etc/pki/tls/certs/apache-chain.crt
        afterInstallAction: "systemctl reload httpd"
```
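The diffstat below shows two added lines in `playbookRequest.go`, consistent with the two new fields. A minimal sketch of how those fields and the `zone\CN` default for `pickupId` might look, assuming plain YAML struct tags; `PickupFields` and `resolvePickupID` are hypothetical names, not the prototype's actual code:

```go
package main

import "strings"

// PickupFields is a hypothetical stand-in for the two fields the proposal
// adds to the playbook request. The tags mirror the YAML keys in the example.
type PickupFields struct {
	PickupFirst bool   `yaml:"pickupFirst"` // zero value false keeps today's behavior
	PickupID    string `yaml:"pickupId"`    // optional override of the object DN
}

// resolvePickupID returns the TPP object DN to look up: the explicit
// pickupId when set, otherwise zone + `\` + commonName.
func resolvePickupID(f PickupFields, zone, commonName string) string {
	if id := strings.TrimSpace(f.PickupID); id != "" {
		return id
	}
	// Defensive choice of this sketch: avoid a doubled backslash when the
	// zone is written with a trailing separator.
	return strings.TrimRight(zone, `\`) + `\` + commonName
}
```

For the example playbook with no `pickupId` override, the sketch resolves the DN to `\VED\Policy\Demo\Apache\shared.example.com`. Trimming a trailing backslash from the zone is this sketch's own choice, not documented vcert behavior.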
When `pickupFirst: true`:
- Locate (TPP): `RetrieveCertificateMetaData(dn)`, one cheap GET that returns the thumbprint and `ValidTo` with no PEM or key payload.
- Compare the result against the installed cert's SHA-1 thumbprint and `NotAfter`.
- Decide:

| State | Action |
| --- | --- |
| Thumbprint matches installed | Defer to the existing `renewBefore` window check (normal playbook flow takes over). |
| Platform cert is newer (its `ValidTo` is later than the installed `NotAfter`) | Full `RetrieveCertificate` for cert + chain + key, install at the playbook's paths via the existing installer chain, run `afterInstallAction`. No enrollment. |
| Platform cert is older than installed (earlier `ValidTo`) | Log "refusing downgrade" and exit cleanly. |
| Platform cert not found | Fall through to the existing enroll flow (handles initial enrollment naturally). |

Backwards compatibility: when `pickupFirst` is absent (or set to `false`), the playbook behaves exactly as it does today. Existing customer playbooks are unaffected.
Architectural notes from a working prototype
- Implemented as a single new file, `pkg/playbook/app/service/pickup_first.go` (133 lines), plus three public helpers in `vcertutil` and `installer`, the two new request fields, and an 8-line hook in `service.go`. The patch is purely additive: zero deletions, zero modifications to existing logic. Because the new fields default to off, every untouched code path behaves exactly as it does today.
- The hot path (thumbprint match) takes ~50 ms on TPP, much cheaper than a full pickup. It doesn't touch the private-key vault and doesn't exercise PKCS#8 decryption. It scales to any tenant size because `RetrieveCertificateMetaData` is an O(1) lookup by DN.
- Reuses every existing component: `runInstaller`, `CreateX509Cert` (handles PKCS#8 encrypted-key decryption), `afterInstallAction`, backup / rollback. No new installation logic.
- The "platform older than installed" path is a genuine safety win: it prevents accidental downgrades when an admin imports an older cert into TPP by mistake.
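The hot-path claim rests on one property: a match can be decided from metadata alone. A minimal sketch of that split, assuming a hypothetical `locator` interface (not vcert's actual connector API) with a cheap metadata call and a full retrieval call; the installed thumbprint is computed as SHA-1 over the certificate's DER bytes, which in Go is `x509.Certificate.Raw`:

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"strings"
)

// locator is a hypothetical interface over the two platform calls the
// proposal distinguishes. It is not vcert's connector API.
type locator interface {
	Metadata(dn string) (thumbprint string, found bool)     // cheap: no PEM, no key
	Retrieve(dn string) (certPEM, keyPEM []byte, err error) // full pickup
}

// sha1Thumbprint returns the uppercase hex SHA-1 digest of DER bytes.
func sha1Thumbprint(der []byte) string {
	sum := sha1.Sum(der)
	return strings.ToUpper(hex.EncodeToString(sum[:]))
}

// hotPath reports whether the installed cert already matches the platform.
// Only Metadata is called, so no key material is fetched on a match.
func hotPath(l locator, dn string, installedDER []byte) bool {
	tp, found := l.Metadata(dn)
	return found && strings.EqualFold(tp, sha1Thumbprint(installedDER))
}

// countingLocator is an in-memory fake that records which calls happen.
type countingLocator struct {
	thumbprint    string
	metadataCalls int
	retrieveCalls int
}

func (c *countingLocator) Metadata(string) (string, bool) {
	c.metadataCalls++
	return c.thumbprint, c.thumbprint != ""
}

func (c *countingLocator) Retrieve(string) ([]byte, []byte, error) {
	c.retrieveCalls++
	return nil, nil, nil
}
```

On a match, `hotPath` returns true after a single `Metadata` call and the fake's `retrieveCalls` stays at zero; only a "platform newer" decision would go on to call `Retrieve`.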
Diffstat against v5.13.2
```
 pkg/playbook/app/domain/playbookRequest.go |   2 +
 pkg/playbook/app/installer/crypto.go       |   4 +
 pkg/playbook/app/service/pickup_first.go   | 133 +++++++++++++++++++++++++++
 pkg/playbook/app/service/service.go        |   8 ++
 pkg/playbook/app/vcertutil/vcertutil.go    | 143 +++++++++++++++++++++++++++++
 5 files changed, 290 insertions(+)
```
Scope for v1: TPP only
VCP support would require a different locator strategy. Its cert-object model is fundamentally different:
- Multiple cert lineages per CN can coexist.
- `versionType` (`CURRENT` / `OLD`) and `certificateStatus` (`ACTIVE` / `RETIRED`) are independent state machines.
- `managedCertificateId` (the lineage identifier) is not currently a server-side searchable field.
The proposed implementation silently no-ops on non-TPP backends, so VCP / Firefly / NGTS playbooks see zero behavior change and zero error noise. VCP-native support is a clean follow-up issue once the locator abstraction lands.
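The silent no-op on non-TPP backends and the backwards-compatibility guarantee both come down to a single guard in front of the new branch. A minimal sketch, assuming the backend type is known when the task starts; `Backend`, `pickupFirstActive`, and `runTask` are hypothetical names, not the prototype's:

```go
package main

// Backend is a hypothetical label for the platform a playbook targets.
type Backend string

const (
	TPP     Backend = "TPP"
	VCP     Backend = "VCP"
	Firefly Backend = "Firefly"
	NGTS    Backend = "NGTS"
)

// pickupFirstActive decides whether the pickupFirst branch runs at all.
// It returns false, never an error, so off-TPP playbooks produce no noise.
func pickupFirstActive(flag bool, b Backend) bool {
	return flag && b == TPP
}

// runTask shows where the guard sits: only an enabled flag on TPP reaches
// the new branch; every other combination takes the existing path untouched.
func runTask(flag bool, b Backend, pickupFirst, existing func() string) string {
	if pickupFirstActive(flag, b) {
		return pickupFirst()
	}
	return existing()
}
```

Checking the flag and backend before any platform call means a VCP playbook that sets `pickupFirst: true` simply runs the existing flow, matching the "VCP-silent-noop" scenario.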
CURRENT ALTERNATIVES
For a customer in production today, we have evaluated or are currently using all of the following:
- Bespoke shell wrapper around `vcert pickup` and `vcert run`. It reads install paths and `renewBefore` from the playbook YAML, drives the four-branch decision tree (newer pickup → install / match in window → renew / match outside window → no-op / nothing in TPP → initial enroll), decrypts PKCS#8 keys before writing them (because Apache without `SSLPassPhraseDialog` can't load encrypted keys), filters stderr noise, and writes timestamped backups. Roughly 250 lines of bash, and every customer in this situation ends up writing a variant of it.
- `vcert pickup` driven by cron with custom diffing. Same pattern, different language.
- Cert pushed manually out of band (rsync from a leader, configuration-management drift). This skips `vcert` entirely; the cert object in TPP becomes informational rather than authoritative.
- Accept staggered downtime: run `vcert run --force-renew` on every host during a coordinated maintenance window, even though only one of them actually needed to enroll.
All four approaches reinvent the same logic and put the burden on the operator. Native support in vcert would replace all of them with one YAML flag.
VENAFI EXPERIENCE
- Working with `vcert` v5 (currently `v5.12.3` in the customer environment; verified that the proposed implementation also compiles and passes tests against `v5.13.2` / master). Daily use of the playbook engine, `vcert pickup`, `vcert run`, and the standalone `vcert enroll` / `vcert renew` commands.
- TPP customer for several years, using both Aperture interactively and the API via `vcert`. Mix of enrollment patterns: user-provided CSR, service-generated, and mixed key-retrieval policies across folders.
- Prototyped this feature end-to-end in a live TPP lab and verified all seven decision scenarios:
  - backwards-compat (no `pickupFirst` field)
  - hot-path match
  - install-newer
  - refuse-downgrade
  - in-renew-window-defer-to-enroll
  - initial-enroll
  - VCP-silent-noop

A working prototype patch (`pickupFirst.patch`) is attached: five files, +290 lines, zero deletions, zero modifications to existing code paths. Apply it with `git apply pickupFirst.patch` from the `vcert` repo root.