Task Group Long Running Task Prevents Early Termination #82396

Closed
@mredig

Description

I can honestly see this being classified as a bug, as working as expected, or even as "kinda a bug, but not much we can do".

Anyway, I'm trying to get reliable timeout behavior by leveraging a task group.

Reproduction

This is a simplified reproduction of an issue we hit in our live production code. The usleep loop is just a stand-in for another long-running task.

import Foundation
import Testing

@Test func example() async throws {
    let start = Date()
    try? await withThrowingTaskGroup { group in
        group.addTask {
            let startTime = Date()
            while .now < startTime.addingTimeInterval(10) {
                // I couldn't get usleep to ever return on a single 10 second call, so I'm just doing this.
                usleep(1_000)
            }
            print("🤬🤬🤬")
            throw TimeoutError.failure
        }

        group.addTask {
            try await Task.sleep(for: .seconds(1))
            print("💨 now!")
            throw TimeoutError.timedOut
        }

        defer { group.cancelAll() }
        guard
            let success = try await group.next()
        else {
            throw TimeoutError.noResult
        }
    }
    let timeElapsed = Date.now.timeIntervalSince(start)

    print("This took \(timeElapsed) seconds.")

    #expect(timeElapsed < 2) // it should be much lower than 2 seconds, but this is just to set a generous timeout.
}

public enum TimeoutError: Swift.Error {
    case timedOut
    case noResult
    case failure
}

What Happens

  1. After ~1 second, the timeout sub task prints "💨 now!" and throws TimeoutError.timedOut
  2. let success = try await group.next() resumes, in theory bubbling up the error, but this behavior doesn't pan out just yet...
  3. ~9 more seconds elapse, "🤬🤬🤬" prints out, and finally the task group bubbles up the TimeoutError.timedOut error from earlier.
  4. A statement to the effect of This took 10.decimalValue seconds. is printed
  5. The expectation fails because timeElapsed > 2 seconds.
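Steps 2 and 3 are consistent with how task groups are documented to behave: a group waits for all of its child tasks to finish before it returns, even after cancelAll(). Below is a minimal sketch of that behavior in isolation. The name measureGroupExit is hypothetical, and the 300 ms usleep is an assumed stand-in for any blocking call that ignores cancellation.

```swift
import Foundation

/// Hypothetical helper: measures how long a task group takes to exit when its
/// only child blocks for 300 ms and never checks for cancellation.
func measureGroupExit() async -> Duration {
    let start = ContinuousClock.now
    await withTaskGroup(of: Void.self) { group in
        group.addTask {
            // Blocking call that is unaware of task cancellation.
            usleep(300_000)
        }
        // Cancellation only sets a flag on the child; it does not interrupt usleep.
        group.cancelAll()
        // The body ends here, but the group still awaits the child before returning.
    }
    return ContinuousClock.now - start
}
```

The measured duration should be no shorter than the child's blocking call, which mirrors the ~9 extra seconds seen in the reproduction.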

Expected behavior

Concise/TL;DR

  1. After ~1 second, the timeout sub task should print "💨 now!" and throw TimeoutError.timedOut
  2. let success = try await group.next() should resume, bubbling up the error
  3. A statement to the effect of This took 1.decimalValue seconds. should print
  4. The expectation should pass
  5. If the test process is still running and doesn't get closed in the interim 9 seconds, "🤬🤬🤬" should print out.

I expect the usleep task to continue in the background, with any return value or result discarded. Nothing can be done to stop it from running to completion; its completion just goes into the void.

I expect the task group to bubble up the thrown error immediately after print("💨 now!") and not wait for the rest of the task group to complete.

Long, but will read

We should see 💨 now! print out on the console after about a second.

If the program is still executing, I'd expect to see 🤬🤬🤬 about 9 seconds afterwards.

While I know there's no mechanism to make the long-running task stop what it's doing (it should just silently discard its results in the background, though that wouldn't prevent any side effects it causes while it's still running), I would expect the task group to exit immediately upon the thrown error.

I DO also understand the complication where, if the usleep task is running on the main thread and the task group is also on the main actor, that would cause a deadlock until usleep is done, but I don't think that's what's happening here. I'm not constraining isolation on any scope, and while some might come by default, I think the only thing that SHOULD have any default isolation is the top scope of the test. The usleep scope should be on a background thread/actor/isolation/whatever.

And the solution can't be "change the usleep operation scope to periodically check for cancellations" because (A) this should be a general, generic timeout solution, (B) I don't think that's possible with cross-language code (specifically, in our app this is interacting with a Rust library), and (C) the long-running code is from an SDK we don't have control over, even if we wanted to change that side of things.
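For discussion, here is one possible shape for a generic timeout that does not wait for the abandoned work. This is a sketch under assumptions, not an existing API: withAbandoningTimeout is a hypothetical helper, and this TimeoutError is a trimmed copy of the enum from the reproduction. It runs the operation in an unstructured Task and races it against a timer through an AsyncThrowingStream, so the caller can resume while the operation keeps running. The abandoned work still occupies a thread until it finishes on its own.

```swift
import Foundation

enum TimeoutError: Error {
    case timedOut
    case noResult
}

/// Hypothetical helper (not a standard library API): returns the operation's
/// value, or throws `TimeoutError.timedOut` once `timeout` elapses, without
/// waiting for the operation to finish.
func withAbandoningTimeout<T: Sendable>(
    _ timeout: Duration,
    operation: @escaping @Sendable () async throws -> T
) async throws -> T {
    let results = AsyncThrowingStream<T, Error> { continuation in
        // Unstructured task: the caller is not forced to await its completion.
        let work = Task {
            do {
                let value = try await operation()
                continuation.yield(value)
                continuation.finish()
            } catch {
                continuation.finish(throwing: error)
            }
        }
        let timer = Task {
            try? await Task.sleep(for: timeout)
            // No effect if the stream has already finished.
            continuation.finish(throwing: TimeoutError.timedOut)
        }
        // Whichever side finishes first, request cancellation of the other.
        // Work that never checks for cancellation simply runs to completion.
        continuation.onTermination = { _ in
            work.cancel()
            timer.cancel()
        }
    }
    for try await value in results {
        return value
    }
    // The stream ended without a value, e.g. the calling task was cancelled.
    throw TimeoutError.noResult
}
```

A call such as `try await withAbandoningTimeout(.seconds(1)) { blockingCall() }` (where blockingCall is a placeholder) should throw after roughly a second while the blocking call continues in the background. The trade-off is that this leaves structured concurrency: the abandoned task's lifetime is no longer tied to the caller's scope.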

Environment

swift-driver version: 1.120.5 Apple Swift version 6.1.2 (swiftlang-6.1.2.1.2 clang-1700.0.13.5)
Target: arm64-apple-macosx15.0

MacBook Pro M4 Pro

Additional information

It DOES work as expected if you periodically run Task.checkCancellation() in the long running task. And when the long running task is async aware in general, the task group works great!
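A minimal sketch of that working variant, assuming the long-running loop can be modified. The names cancellableBusyWait and runWithCooperativeTimeout are hypothetical; the durations are the ones from the reproduction.

```swift
import Foundation

enum TimeoutError: Error {
    case timedOut
    case noResult
    case failure
}

/// Hypothetical stand-in for the long-running work, with a cancellation
/// check added on every iteration.
func cancellableBusyWait(seconds: TimeInterval) throws {
    let deadline = Date().addingTimeInterval(seconds)
    while Date() < deadline {
        // Throws CancellationError once the surrounding task is cancelled.
        try Task.checkCancellation()
        usleep(1_000)
    }
}

/// Runs the same race as the reproduction and returns the elapsed seconds.
func runWithCooperativeTimeout() async -> TimeInterval {
    let start = Date()
    try? await withThrowingTaskGroup(of: Void.self) { group in
        group.addTask {
            try cancellableBusyWait(seconds: 10)
            throw TimeoutError.failure
        }
        group.addTask {
            try await Task.sleep(for: .seconds(1))
            throw TimeoutError.timedOut
        }
        // Cancels the remaining child when the body exits, including by throwing.
        defer { group.cancelAll() }
        guard try await group.next() != nil else {
            throw TimeoutError.noResult
        }
    }
    return Date().timeIntervalSince(start)
}
```

Because the busy-wait loop notices the cancellation within about a millisecond, the group can finish shortly after the timeout child throws.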

Metadata

Labels: concurrency (Feature: umbrella label for concurrency language features)