Skip to content

testrunner/runners/system: fall back to packageRoot in CreatePackagePolicy when built tree is unavailable#3573

Open
kcreddy wants to merge 3 commits into
elastic:mainfrom
kcreddy:CreatePackagePolicy-fix
Open

testrunner/runners/system: fall back to packageRoot in CreatePackagePolicy when built tree is unavailable#3573
kcreddy wants to merge 3 commits into
elastic:mainfrom
kcreddy:CreatePackagePolicy-fix

Conversation

@kcreddy
Copy link
Copy Markdown

@kcreddy kcreddy commented May 15, 2026

testrunner/runners/system: fall back to packageRoot in CreatePackagePolicy when built tree is unavailable

CreatePackagePolicy called ReadBuiltPackageManifest unconditionally,
which assumed the built package directory always exists and is reachable
from a Git repository. This fails when packageRoot is an EPR-extracted
package (add_package_policy -version) because the built tree contains a
different version, or when running outside a Git repository where no
build/ ancestor exists.

ReadAllDataStreamManifests silently returned nil for the non-existent
built path, producing an empty Streams map that Fleet rejected because
required variables had no defaults.

The fix is in CreatePackagePolicy: when ReadBuiltPackageManifest fails,
fall back to reading manifests directly from packageRoot. This keeps
ReadBuiltPackageManifest strict — callers that genuinely need the built
tree (composable integrations) still get a hard error if the build is
missing. An EPR package is already fully resolved (no package:
references), so the source manifest is accurate enough.

The fallback is logged at debug level so it appears in verbose output
without masking real build failures.

Fixes #3552

Assisted-by Cursor (claude-opus-4.6)

kcreddy added 2 commits May 15, 2026 14:00
…able

ReadBuiltPackageManifest assumed the built package directory always
exists and is reachable from a Git repository. This fails when
packageRoot is an EPR-extracted package (add_package_policy -version)
because the built tree contains a different version, or when running
outside a Git repository where no build/ ancestor exists.

When BuildPackagesDirectory fails or the resolved path has no manifest,
fall back to reading directly from packageRoot. This restores correct
stream population in CreatePackagePolicy: without the fallback,
ReadAllDataStreamManifests silently returned nil for the non-existent
built path, producing an empty Streams map that Fleet rejected because
required variables had no defaults.

Fixes elastic#3552
Comment thread internal/builder/packages.go Outdated
Comment on lines +130 to +131
// Built tree unavailable or incomplete; fall back to packageRoot.
pkg, err := packages.ReadPackageManifestFromPackageRoot(packageRoot)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is error prone, a caller could silently fallback to the source manifest when they are interested on the built one.

The caller should take the responsibility of falling back to the source manifest if it thinks it is accurate enough.

If the caller is interested only in fields that appear on both kinds of manifests (such as the name and version), and is not concerned about other fields that could change in build time they they could directly use the package root.

But in general a caller that is interested on the built files, such as most testers, should ensure first that they have a fully built package.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point — the silent fallback in ReadBuiltPackageManifest is too broad and would hide real build problems for callers that genuinely need the built tree.

I've moved the fallback into CreatePackagePolicy, which is the only caller that receives EPR-extracted roots (via addPackagePolicy -version). ReadBuiltPackageManifest is now strict again — it fails if the built tree doesn't exist or contains a different version.

The reasoning: an EPR package downloaded from the registry is already fully resolved (no package: references), so reading manifests directly from packageRoot is accurate enough to build a correct policy. Callers that need composable resolution (the normal system test path, fleetpolicy.go, etc.) still get a hard error if the build is missing.

The fallback is logged at debug level so it's visible in -v output but doesn't mask problems silently.

Commit: 0901250

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CreatePackagePolicy must use a built manifest, we neither can silently fall back to the source manifest there.

Actually after re-reading #3552 I wonder if the problem is really related to use the built or the source manifest.

The failures are about required variables:

Package policy is invalid:
  inputs.cel.streams.o365.audit.vars.azure_tenant_id: Directory (tenant) ID is required
  inputs.cel.streams.o365.audit.vars.client_id: Application (client) ID is required
  inputs.cel.streams.o365.audit.vars.client_secret: Client Secret is required

In #3307 we introduced the use of the newer "simplified" format of the Fleet policy API. This API is stricter regarding validation of required fields.

If in this case these fields are really required and not set, the test should be fixed to set the required variables.
If not possible, fall back to use the legacy API. This can be done in system and policy sets setting the policy_api_format: legacy parameter. Not sure what would be the best way to do this in script tests.

Copy link
Copy Markdown
Author

@kcreddy kcreddy May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The required variables are set in the test config (test_config.yaml lines 65-74 has: client_id, client_secret, azure_tenant_id set). I believe the problem isn't missing vars or the simplified API being stricter — it's that the vars never reach Fleet at all.

The chain:

  1. ReadBuiltPackageManifest(eprRoot) resolves to build/packages/o365/3.8.1/ — which doesn't exist (only 3.8.2 is built).
  2. ReadAllDataStreamManifests(builtRoot) returns nil because that directory has no data_stream/.
  3. buildStreamsForInput iterates over empty datastreams, producing an empty Streams map.
  4. json:"streams,omitempty" drops Streams from the request.
  5. Fleet receives enabled: true with no streams, auto-creates them with no user vars, and rejects because the required fields have no defaults.

The legacy API would have the same issue — no streams means no vars get sent regardless of format.

Regarding "must use a built manifest": this seems true for composable integrations where package: references need resolution. But add_package_policy -version passes an EPR-extracted package downloaded from the registry. It's already fully resolved — there are no package: references. The EPR manifest and the built manifest are identical for non-composable packages. The built tree simply doesn't exist for the EPR version because only the dev version is built locally.

The alternative would be to also build (or extract) the EPR version into the build/ tree, but that's a much larger change for a path that doesn't need it.

Let me know what you think.

Copy link
Copy Markdown
Member

@jsoriano jsoriano May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am refactoring add_package_policy in #3576 to work only with built packages, and thus removing the conditions added for composable packages. Once built, composable packages have the same format as integration packages.

This should also fix #3552, please check if it works for you.

…t in CreatePackagePolicy when built tree is unavailable

ReadBuiltPackageManifest assumed the built package directory always
exists and is reachable from a Git repository. This fails when
packageRoot is an EPR-extracted package (add_package_policy -version)
because the built tree contains a different version, or when running
outside a Git repository where no build/ ancestor exists.

Move the fallback into CreatePackagePolicy rather than
ReadBuiltPackageManifest so the shared function stays strict — callers
that genuinely need the built tree (composable integrations) still get
a hard error if the build is missing. CreatePackagePolicy is the only
caller that receives EPR-extracted roots; an EPR package is already
fully resolved (no package: references), so reading manifests directly
from packageRoot is accurate enough to build a correct policy.

The fallback is logged at debug level so it appears in verbose output
without masking real build failures.

Fixes elastic#3552
@kcreddy kcreddy changed the title internal/builder: fall back to packageRoot when built tree is unavailable testrunner/runners/system: fall back to packageRoot in CreatePackagePolicy when built tree is unavailable May 15, 2026
@elasticmachine
Copy link
Copy Markdown
Collaborator

💚 Build Succeeded

History

@teresaromero
Copy link
Copy Markdown
Contributor

@kcreddy i am still figuring out what is the error and why this changes. perhaps is obvious but i am not understanding the following: why the error is reported at the script test runner when using certain flag, and the changes are about system test runner?

ref #3552 is the flag also used for system?

thanks

@kcreddy
Copy link
Copy Markdown
Author

kcreddy commented May 18, 2026

@kcreddy i am still figuring out what is the error and why this changes. perhaps is obvious but i am not understanding the following: why the error is reported at the script test runner when using certain flag, and the changes are about system test runner?

ref #3552 is the flag also used for system?

thanks

@teresaromero
The script test runner's addPackagePolicy command in data_stream.go calls system.CreatePackagePolicy() (line 156) — the function lives in the system package because it's shared between both runners. The script test runner doesn't have its own policy-building code; it delegates to the same CreatePackagePolicy function that system tests use.

So the bug surfaces via the script test command add_package_policy -version, but the code that actually builds the Fleet policy request (and where the wrong builtRoot gets resolved) is CreatePackagePolicy in internal/testrunner/runners/system/tester.go. That's why the fix is there — it's the shared function both runners go through.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants