nf-tower plugin blocks JVM exit after workflow completion — missing HTTP read timeout #6885

## Bug report

### Expected behavior and actual behavior

**Expected**: After a workflow completes successfully, the Nextflow JVM should exit cleanly, allowing the Azure Batch head task to terminate and release the node.

**Actual**: The JVM hangs indefinitely after workflow completion. The `TowerClient.onFlowComplete()` method blocks the main thread on an HTTP PUT call (`sendHttpMessage(urlTraceComplete, ...)`) to the Tower API that never receives a response. The JVM has been stuck for 35+ hours with 0% CPU. The workflow shows as COMPLETE on Seqera Platform but the Azure Batch task remains in active/running state forever.

### Steps to reproduce the problem

Run any pipeline via Seqera Platform on Azure Batch. The issue is timing-dependent — it occurs when the Tower API connection becomes stale during the shutdown HTTP call. Reproduction requires a network condition where the TCP connection is established but the response is never delivered.
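The network condition can be approximated locally without Azure Batch or Seqera Platform. The following is a minimal sketch, not taken from the nf-tower code: it uses the plain `java.net.http` client rather than `HxClient`, and the class name, the `/trace/complete` path, and the helper names are hypothetical. A local server accepts connections and never replies, and a PUT built with only a connect timeout stays pending.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SilentServerRepro {

    /** Starts a server that accepts TCP connections but never writes a byte back. */
    static ServerSocket startSilentServer(List<Socket> held) throws IOException {
        ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        Thread acceptor = new Thread(() -> {
            try {
                while (true) {
                    held.add(server.accept()); // keep the socket open, never respond
                }
            } catch (IOException closed) {
                // server socket was closed: stop accepting
            }
        }, "silent-server");
        acceptor.setDaemon(true);
        acceptor.start();
        return server;
    }

    /** Returns true if a PUT without a request timeout is still pending after waitMillis. */
    static boolean putStillPendingAfter(long waitMillis) throws Exception {
        List<Socket> held = new CopyOnWriteArrayList<>();
        try (ServerSocket server = startSilentServer(held)) {
            HttpClient client = HttpClient.newBuilder()
                    .connectTimeout(Duration.ofSeconds(60)) // connect timeout only, as in the report
                    .build();
            // Hypothetical path; any URL on the silent server behaves the same way.
            URI uri = URI.create("http://127.0.0.1:" + server.getLocalPort() + "/trace/complete");
            HttpRequest request = HttpRequest.newBuilder(uri)
                    .PUT(HttpRequest.BodyPublishers.ofString("{}"))
                    .build(); // no .timeout(...): waits indefinitely for a response
            CompletableFuture<HttpResponse<String>> pending =
                    client.sendAsync(request, HttpResponse.BodyHandlers.ofString());
            try {
                pending.get(waitMillis, TimeUnit.MILLISECONDS);
                return false; // a response arrived (not expected here)
            } catch (TimeoutException stillBlocked) {
                return true;
            } finally {
                pending.cancel(true);
                for (Socket s : held) {
                    s.close();
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("still pending after 2s: " + putStillPendingAfter(2000));
    }
}
```

The sketch uses `sendAsync` with a bounded `get` only so that the demo itself terminates; a blocking `send` on the main thread would wait in the same way, which matches the hang described in this report.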

### Program output

**Last lines of the Nextflow log.** The log goes silent after `TimelineObserver`, and the next observer (`TowerClient`) never produces output:
```
Mar-03 00:33:40.068 [main] DEBUG nextflow.trace.WorkflowStatsObserver - Workflow completed
Mar-03 00:33:40.068 [main] DEBUG nextflow.trace.TimelineObserver - Workflow completed -- rendering execution timeline
<EOF — no further output, no "Session destroyed", no System.exit()>
```

**Thread state from the stuck node** (via `/proc`, 35+ hours after completion):
```
TID=4474   name=java               state=S (sleeping)    ← main thread blocked
TID=4821   name=tower-logs-chec    state=S (sleeping)    ← next observer never called
TID=4645   name=HttpClient-1-Se    state=S (sleeping)    ← Java HttpClient threads alive
(Tower-thread ABSENT — sender exited cleanly, sender.join() is not the issue)
```

**Network state from the stuck node**:
```
ESTAB 0 0  10.0.0.4:59344 → 13.41.18.99:443  (pid=4474)   ← Tower API
```
The socket has no TCP keepalive timer and no retransmission timer: it is a stale connection that has been sitting idle for 35+ hours. A Tower API health check responds `200` in `0.25s`, so the problem is specific to this stale connection and not to the API being unavailable.

### Environment

* Nextflow version: 25.10.4 (build 11173)
* Java version: Amazon Corretto 21.0.10+7-LTS
* Operating system: Linux 6.8.0-1044-azure (Azure Batch)
* nf-tower plugin: 1.17.5
* nf-azure plugin: 1.20.2

### Additional context

**Root cause analysis**:

`TowerClient.onFlowComplete()` calls `sendHttpMessage(urlTraceComplete, req, 'PUT')` to report workflow completion to the Tower API. The underlying `HxClient` is built with `connectTimeout(60s)` but **no read timeout**, and a Java `HttpClient` request with no timeout set waits indefinitely for a response. When the Tower API accepts the TCP connection but never sends a response (a stale or dead connection), the HTTP read blocks the main thread forever, `System.exit()` is never reached, and the JVM hangs indefinitely.

The shutdown observer chain is sequential with no timeout (`notifyEvent()` catches exceptions but cannot handle infinite blocking), so one stuck observer prevents all subsequent cleanup.
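One way to keep a single stuck observer from blocking the rest of the chain is to run each completion callback with a deadline. The sketch below illustrates the idea in plain Java. It is not Nextflow's `notifyEvent()` implementation, and the class and method names are hypothetical.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedShutdownHooks {

    /**
     * Runs each callback in order, waiting at most perCallback for each one.
     * Returns the number of callbacks that finished normally.
     */
    static int runAllBounded(List<Runnable> callbacks, Duration perCallback) {
        // Daemon threads: a callback that ignores interruption cannot keep the JVM alive.
        ExecutorService pool = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "shutdown-observer");
            t.setDaemon(true);
            return t;
        });
        int completed = 0;
        try {
            for (Runnable callback : callbacks) {
                Future<?> result = pool.submit(callback);
                try {
                    result.get(perCallback.toMillis(), TimeUnit.MILLISECONDS);
                    completed++;
                } catch (TimeoutException stuck) {
                    result.cancel(true); // interrupt it and move on to the next observer
                } catch (ExecutionException failed) {
                    // The callback threw: skip it and continue with the next one.
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        } finally {
            pool.shutdownNow();
        }
        return completed;
    }

    public static void main(String[] args) {
        Runnable ok = () -> { };
        Runnable stuck = () -> {
            try {
                new CountDownLatch(1).await(); // blocks until interrupted
            } catch (InterruptedException ignored) {
                // interrupted by cancel(true)
            }
        };
        int done = runAllBounded(List.of(ok, stuck, ok), Duration.ofMillis(200));
        System.out.println(done + " of 3 callbacks completed");
    }
}
```

A cached pool is used instead of a single worker thread so that a callback which never returns does not also starve the callbacks queued behind it.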

**Suggested fixes**:
1. Add a read/response timeout to `HxClient` (e.g. `.timeout(Duration.ofSeconds(300))`)
2. Add a bounded timeout to the shutdown HTTP call in `onFlowComplete()` so a stale connection can't block JVM exit
3. Consider enabling TCP keepalive to detect dead connections at the transport level
4. Close `HxClient` on shutdown (currently never closed)
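Fixes 1 and 2 could look roughly like the sketch below. It is written against the standard `java.net.http` API, not against `HxClient`, whose builder options are not shown in this report. The class name, method name, `/trace/complete` path, and the `-1` return convention are hypothetical. `HttpRequest.Builder.timeout(Duration)` fails the request with `HttpTimeoutException` if no response is received in time, and `CompletableFuture.orTimeout` adds a hard upper bound on the whole call as a backstop.

```java
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.time.Duration;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedCompletionCall {

    /**
     * Sends a PUT with a per-request timeout plus a hard bound on the whole call.
     * Returns the HTTP status code, or -1 if no response arrived in time.
     */
    static int putWithDeadline(HttpClient client, URI uri, String json, Duration limit) {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(limit) // request-level timeout (fix 1)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
        try {
            return client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                    // Backstop slightly above the request timeout (fix 2).
                    .orTimeout(limit.toMillis() + 1000, TimeUnit.MILLISECONDS)
                    .thenApply(HttpResponse::statusCode)
                    .join();
        } catch (CompletionException e) {
            // Treat either kind of timeout as "no response"; rethrow anything else.
            for (Throwable t = e; t != null; t = t.getCause()) {
                if (t instanceof HttpTimeoutException || t instanceof TimeoutException) {
                    return -1;
                }
            }
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        // Bound but never accept()ed: the OS completes the TCP handshake, nothing ever answers.
        try (ServerSocket silent = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            URI uri = URI.create("http://127.0.0.1:" + silent.getLocalPort() + "/trace/complete");
            int status = putWithDeadline(HttpClient.newHttpClient(), uri, "{}", Duration.ofSeconds(2));
            System.out.println("status: " + status);
        }
    }
}
```

Against a server that never responds, the call is expected to return `-1` after roughly the configured limit instead of blocking, so a shutdown path using this pattern could log the failure and continue to JVM exit.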
