Generate per-container SAS tokens for Azure AD/Managed Identity auth #6902
adamrtalbot wants to merge 13 commits into master from
Conversation
When using Azure Entra (Active Directory) or Managed Identity authentication with Azure Batch, generate a separate SAS token per blob container accessed by the pipeline instead of a single global SAS for the working directory only. This allows pipelines to access blobs from multiple Azure containers (e.g., az://scidev-useast/... and az://igenomes/...) without requiring any user configuration changes.

Changes:
- AzStorageOpts: add per-container SAS token map with getSasToken(container) fallback to the global sasToken for backward compatibility
- AzHelper: add generateContainerSasWithActiveDirectory overloads accepting UserDelegationKey to avoid redundant key fetches
- AzFileSystemProvider: generate and register per-container SAS on newFileSystem0() for AD/MI auth
- AzBatchExecutor: generate SAS for the workDir container eagerly at startup for AD/MI
- AzFileCopyStrategy: getSasForPath() generates SAS lazily at script-generation time; getEnvScript() exports AZ_SAS_<CONTAINER> env vars per container
- AzBatchService: use getSasForPath(task.workDir) instead of the global sasToken
- AzBashLib: add nxf_az_sas() for runtime per-container SAS lookup from env vars; nxf_az_download directory fallback handles URLs with embedded SAS tokens
- AzFusionEnv: export AZURE_STORAGE_SAS_TOKEN_<CONTAINER> per container
- AzFileSystem: use provider.getSasToken() for per-container SAS on copy

Generated by Claude Code
Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
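The runtime lookup described above could work along these lines. A minimal bash sketch, assuming the env-var name is the container name uppercased with dashes replaced by underscores; that sanitization rule is an assumption for illustration, not necessarily what AzBashLib does:

```shell
# Hypothetical per-container SAS lookup from env vars, with a global fallback.
# The uppercase/dash-to-underscore naming rule is assumed, not taken from AzBashLib.
AZ_SAS='sig=global'
AZ_SAS_IGENOMES='sig=igenomes-token'

nxf_az_sas() {
    local name
    # sanitize the container name into an env-var suffix
    name=$(echo "$1" | tr '[:lower:]' '[:upper:]' | tr '-' '_')
    local var="AZ_SAS_${name}"
    # indirect expansion; fall back to the global token when unset
    echo "${!var:-$AZ_SAS}"
}

nxf_az_sas igenomes        # matches AZ_SAS_IGENOMES
nxf_az_sas scidev-useast   # no matching var, so the global token is used
```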
- AzFileCopyStrategy: pre-generate SAS tokens for all input file containers in the constructor so getEnvScript() exports AZ_SAS_<CONTAINER> for every container used by the task (not just the workdir)
- AzBashLib: fix nxf_az_upload to split the target URL at '?' before appending the filename, preventing malformed URLs like 'container?sas/filename'
- AzBashLib: fix the nxf_az_download directory fallback to split the source URL at '?' before appending '/*', preventing malformed URLs like 'path*?sas'

Generated by Claude Code
Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
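The '?'-splitting fix can be illustrated with plain bash parameter expansion. This is a standalone sketch with illustrative variable and file names, not the actual AzBashLib code:

```shell
# Split a blob URL with an embedded SAS at the first '?', append the
# filename to the path part, then re-attach the query string.
url='https://acct.blob.core.windows.net/container/work?sv=2022&sig=abc'
name='file.txt'

base="${url%%\?*}"    # everything before the first '?'
qs="${url#*\?}"       # everything after the first '?'
target="${base}/${name}?${qs}"

echo "$target"
# -> https://acct.blob.core.windows.net/container/work/file.txt?sv=2022&sig=abc
```

Appending the filename to the full URL instead of `base` is exactly what produced the malformed `container?sas/filename` URLs the commit describes.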
…onfig

- Use ConcurrentHashMap for containerSasTokens (thread safety)
- Remove dead generateContainerSasIfNeeded() from AzBatchExecutor
- Make uploadCmd() an instance method; add a static overload for AzPathFactory
- AzPathFactory.getUploadCmd() looks up the per-container SAS from AzConfig
- Simplify the AzFusionEnv conditional into an unconditional loop
- Update AzFileCopyStrategyTest and AzBashLibTest for the URL-splitting bash functions
- Remove an accidentally committed local nextflow.config test file

Generated by Claude Code
Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
 * Used when accessing multiple blob containers with AD/MI authentication.
 * ConcurrentHashMap ensures safe concurrent reads and writes from multiple task threads.
 */
private final Map<String,String> containerSasTokens = new ConcurrentHashMap<>()
This would be better as some object rather than Map<String, String>, but I think this is sufficient to demonstrate the fix.
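As a rough analogue of the map-based design discussed here (in shell rather than Groovy, and with a placeholder instead of real token generation), the get-or-create flow looks like:

```shell
# Shell analogue of a per-container SAS cache: generate a token once per
# container, then reuse it. generate_sas is a placeholder, not the real
# user-delegation SAS call.
declare -A container_sas=()

generate_sas() {
    echo "sig=token-for-$1"
}

get_sas_token() {
    local c="$1"
    if [ -z "${container_sas[$c]:-}" ]; then
        container_sas[$c]=$(generate_sas "$c")
    fi
    echo "${container_sas[$c]}"
}

get_sas_token igenomes
get_sas_token igenomes   # second call returns the cached token
```

In the actual Groovy code the cache is a ConcurrentHashMap precisely because multiple task threads can hit this get-or-create path at once; the shell version has no such concurrency concern.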
Generated by Claude Code
Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
…Provider

AzStorageOpts is user configuration; runtime-generated SAS tokens belong in AzFileSystemProvider, which owns the filesystem lifecycle. All callers (AzBatchService, AzBatchExecutor, AzFileCopyStrategy, AzFusionEnv, AzPathFactory) now resolve the provider from the AzPath they already hold.

Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
SAS tokens are already embedded in each URL in the generated script. Exporting them as env vars was redundant since nxf_az_upload/download never read them.

Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
@adamrtalbot I am a bit lost as to why the managed identity approach uses a SAS token. https://docs.azure.cn/en-us/entra/identity/managed-identities-azure-resources/tutorial-linux-managed-identities-vm-access?pivots=identity-linux-mi-vm-access-storage Could you help me understand the motivation for not using managed identity credentials directly?
The Managed Identity allows Nextflow to authenticate to Azure. The tasks are unaware they are in a Nextflow pipeline and can't inherit this authentication. Therefore, we generate short-lived access tokens so Nextflow can pass authentication to each individual task. To use Managed Identities in tasks you must:
All auth modes (account-key, AD/MI, user-supplied SAS) now follow the same per-container SAS flow via AzFileSystemProvider. No global SAS is exported to batch scripts; tokens are embedded directly in URLs. Fusion retains a temporary account-wide SAS for account-key auth until Fusion adds per-container SAS support (AZURE_STORAGE_SAS_TOKEN_*).

Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
…tainerClient()

Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
So we agree that it is possible to access storage from tasks, under the condition that the managed identity is added to the pool.
Based on the above, it would be sufficient if tasks could be given an indication to use the managed identity instead of a SAS token (under the aforementioned condition). It could also be considered a fallback method when a SAS token is not provided. Would love to hear your input about it; however, we might be at risk of going off topic. I believe this PR is right on a conceptual level and it patches the issue currently in focus. I will try to find some time to test it as well but cannot promise a time frame right now.
Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
…edded SAS format

The AzBashLib was updated to embed SAS tokens directly in URLs using target_base/target_qs splitting instead of the old $AZ_SAS env var approach. BashWrapperBuilderWithAzTest still expected the old format, causing two test failures in CI.

Generated by Claude Code
Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
…puts assertions

AzPathFactory.getUploadCmd() and AzFileCopyStrategy.httpUrl() now embed the SAS token directly in the URL. The unstage_outputs assertions were still comparing against the bare URL (no SAS), causing a SpockComparisonFailure. Extract the SAS from the provider at test time and pass it to AzHelper.toHttpUrl() so the expected value matches the actual output.

Generated by Claude Code
Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
Summary
Fixes #5669
When using Azure Entra ID (Active Directory) or Managed Identity authentication with Azure Batch, Nextflow previously only generated a SAS token for the working directory container. Pipelines referencing blobs in other containers (e.g., `az://igenomes/...` while workdir is `az://scidev-useast/...`) failed with authentication errors.
This PR generates per-container SAS tokens for all containers accessed during a pipeline run.
Changes
Core infrastructure (`nf-azure` plugin)

- `AzFileSystemProvider`: owns the per-container SAS token cache (`ConcurrentHashMap<String,String>`) as runtime state; `getSasToken(containerName)` returns a container-specific token or falls back to the global `sasToken`; `setSasToken()` and `getContainerSasTokens()` allow callers to register and enumerate tokens; `generateAndRegisterContainerSas()` is called during `newFileSystem0()` for AD/MI auth, lazily generating a SAS for each new container accessed
- `AzStorageOpts`: config-only; per-container SAS state removed, `sasToken` remains as the global/user-supplied token
- `AzHelper`: added `generateUserDelegationKey()` and a new `generateContainerSasWithActiveDirectory(BlobContainerClient, Duration, UserDelegationKey)` overload
- `AzBatchExecutor`: eagerly generates a SAS for the workDir container at startup; resolves the `AzFileSystemProvider` from the workDir `AzPath` to register the token
- `AzFileSystem`: `copy()` uses per-container SAS lookup via the provider
- `AzFileCopyStrategy`: holds a reference to `AzFileSystemProvider` (resolved from `bean.workDir` in the constructor); the constructor pre-generates SAS tokens for all input file containers; `getSasForPath()` looks up and lazily generates tokens via the provider; `getEnvScript()` exports per-container `AZ_SAS_<CONTAINER>` env vars from the provider's token map
- `AzBatchService`: `getSasForPath()` resolves the provider from the `AzPath` argument for per-container lookup
- `AzFusionEnv`: `getOrCreateSasToken()` and `getEnvironment()` resolve the provider from `Global.session.workDir`; exports `AZURE_STORAGE_SAS_TOKEN_<CONTAINER>` per container
- `AzPathFactory`: `getUploadCmd()` resolves the provider from the target `AzPath` for per-container SAS lookup

Bash library (`AzBashLib`)

- `nxf_az_upload` and `nxf_az_download` now embed SAS tokens in URLs at script-generation time and split the URL at `?` using bash parameter expansion (`${var%%\?*}` / `${var#*\?}`) to correctly insert path components before the query string
- Removed the `nxf_az_sas()` function (no longer needed)

Backward Compatibility
- The global `sasToken` (account key auth) is unchanged; `getSasToken(containerName)` falls back to it when no per-container token exists
- Per-container SAS generation applies when `sasToken` is null (i.e., AD/MI auth)

Testing
- Updated unit tests: `AzBashLibTest`, `AzFileCopyStrategyTest`, `AzStorageOptsTest`
- Verified with `nf-canary` across 3 separate Azure Blob containers:
  - `scidev-useast` (workdir)
  - `igenomes` (foreign input via `--remoteFile`)
  - `outputs` (publishDir)