Skip to content

Add a TimedCertificateReloader #269

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 35 commits into from
May 14, 2025
Merged

Add a TimedCertificateReloader #269

merged 35 commits into from
May 14, 2025

Conversation

gjcairo
Copy link
Contributor

@gjcairo gjcairo commented Apr 30, 2025

This PR adds a CertificateReloader protocol and a TimedCertificateReloader implementation.

A CertificateReloader allows for certificate-key pairs to be observed and updated. A TimedCertificateReloader does this every set interval of time.

This PR also provides an extension on TLSConfiguration to enable this reloading ability in NIO applications, so that updated certificates and keys can be used during SSL handshakes; as well as conformance of the TimedCertificateReloader to swift-service-lifecycle's Service protocol, for easy composition.

Copy link
Member

@FranzBusch FranzBusch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exciting! Making TLS configuration updates easier is something that has been requested a lot

@gjcairo gjcairo requested a review from FranzBusch April 30, 2025 16:18
/// A `NIOSSLContextConfigurationOverride` that will be used as part of the NIO application's TLS configuration.
/// Its certificate and private key will be kept up-to-date via whatever mechanism the specific ``CertificateReloader``
/// implementation provides.
var sslContextConfigurationOverride: NIOSSLContextConfigurationOverride { get async }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been thinking about this some more. There is two ways to go with this here. Keep it close to NIO then it should probably not be async but rather returning an ELF. Or this is a protocol that we see as a bridge to the Swift Concurrency ecosystem then we should make it async but I also think we should be returning a type here that is based on swift-certificates and map to NIOSSL types under the hood.

I think in general this should become a method though and not a computed property.

Would be interested to hear @Lukasa thoughts here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the intention for this to effectively be a current value? The reloader implementation does any work it actually needs be it async or otherwise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory someone may want to avoid having a long running run for some reason?
Or another possible scenario would be if you were to yield new pairs into an async stream continuation from the run method, and iterate the stream in this getter instead of updating some [protected] state (like a LockedValueBox) containing the new values.

In any case, I do think that this is all avoidable and you can handle the asynchrony from elsewhere (like a run method). But not sure if Franz had something else in mind. I suppose having it async gives implementers some more freedom.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer if it's not async. Otherwise calling it from synchronous code requires more complicated state management, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with George's assessment - my understanding was that this represents the "current" value. If so, I think it should be non-throwing, and sync.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is meant to represent the current state yes. After this discussion and some more thought I'm also thinking I should revert this to be sync.

/// A `NIOSSLContextConfigurationOverride` that will be used as part of the NIO application's TLS configuration.
/// Its certificate and private key will be kept up-to-date via whatever mechanism the specific ``CertificateReloader``
/// implementation provides.
var sslContextConfigurationOverride: NIOSSLContextConfigurationOverride { get async }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the intention for this to effectively be a current value? The reloader implementation does any work it actually needs be it async or otherwise.

/// - privateKeyDescription: A ``PrivateKeyDescription``.
@available(macOS 13, iOS 16, tvOS 16, watchOS 9, *)
public init(
refreshingEvery refreshInterval: Duration,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we run the risk of having ambiguity issues with the Duration/TimeAmount overload? Both have similar static funcs for e.g. .seconds etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't actually know the answer to this but at least on the tests (even when adding an availability annotation to match the Duration overload) it resolves to using TimeAmount. I am unsure if some other combination of factors may cause the compiler to become confused.

@gjcairo gjcairo added the 🆕 semver/minor Adds new public API. label May 1, 2025
/// A long-running method to run the ``TimedCertificateReloader`` and start observing updates for the certificate and
/// private key pair.
/// - Important: You *must* call this method to get certificate and key updates.
public func run() async throws {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will throw CancellationError if cancelled during the sleep - is that intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is yeah - we want to make sure we respect cancellation. The interval may be set to a pretty high value (you likely only want to reload your certs every e.g. 10 minutes) so cancellation is important.

/// private key pair.
/// - Important: You *must* call this method to get certificate and key updates.
public func run() async throws {
while !Task.isCancelled {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better to do while true to keep behavior consistent regardless of when exactly cancelation happens. Otherwise sometimes you might end up with a clean return and with a thrown cancelation error otherwise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Makes me wonder whether this should throw at all because with that change it can never return cleanly. (Not sure this actually matters much.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it'd be good to get more guidance on the expected behavior - should services return or throw CancellationError. I've seen us do both.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good callout - I'll switch to while true.

With regards to whether it should be throwing or we should return, a (maybe useful?) data point, ServiceLifecycle will throw CancellationError if you try gracefully shutting down a service that's already been cancelled - even if the service itself didn't throw CancellationError when it was cancelled. Also, many of the async APIs Apple provides (like Task.sleep in this case, or async iterators' next) will handle cancellation by throwing a CancellationError too.

I think throwing out of this Task.sleep is particularly useful because you don't want to wait for the whole timeout period, which could be minutes, to finish before the reloading task is cancelled. I do think it's the right thing to do here.

/// - refreshInterval: The interval at which attempts to update the certificate and private key should be made.
/// - certificateDescription: A ``CertificateDescription``.
/// - privateKeyDescription: A ``PrivateKeyDescription``.
public init(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we get an initializer that throws an error if it fails the first reload? Otherwise it might be harder to debug eg an incorrect path to the cert/key.

Copy link
Contributor

@czechboy0 czechboy0 May 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words - failing the first load is fatal, and the server probably wants to fail to start. Failing to reload later can be handled more gracefully - just keep the old value.

It'd be great to make that distinction easier to react to in a server startup sequence. One way is to represent this as a throwing initializer that performs (or fails) the first load.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we get an initializer that throws an error if it fails the first reload? Otherwise it might be harder to debug eg an incorrect path to the cert/key.

I originally flip flopped a bit on this. I ended up landing on not failing because what if your certificate is not yet present but will be? You could have a daemon that takes care of managing your TLS certificates and this could be the first time you're starting your service on a machine and the cert is not yet present, or the cert could be missing for whatever other reason (data corruption, mistakenly deleted, etc).

I was trying to find what other projects do in this scenario in case we want to get some inspiration. cert-manager (link) allows for this possibility and does not fail on first load.

However it isn't lost on me that not failing could potentially make debugging harder. It could also mean an insecure connection might be established with a client which obviously isn't good. So I'm okay with making this init throwing. I just want to make sure we've considered both options.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good on you for being thoughtful on this, and I'm not even saying there shouldn't be a graceful initializer that ignores the first failure as well. I mainly just wanted to have the option to have a throwing initializer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's reasonable to offer this, I'll add additional inits that validate they can be loaded.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im somewhat suspicious that there's a way to break this if you write

init(refreshInterval: .seconds(1)...
the compiler won't know if you meant Duration.seconds or TimeAmount.seconds

That said, i can't come up with a way to make this break

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im thinking about apple/swift-metrics#144

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I had to up the availability requirements to use the async timer sequence, we can just leave the Duration init - I got rid of the TimeAmount overload.

/// A `NIOSSLContextConfigurationOverride` that will be used as part of the NIO application's TLS configuration.
/// Its certificate and private key will be kept up-to-date via whatever mechanism the specific ``CertificateReloader``
/// implementation provides.
var sslContextConfigurationOverride: NIOSSLContextConfigurationOverride { get async }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer if it's not async. Otherwise calling it from synchronous code requires more complicated state management, etc.

let certificate: Certificate?
switch self.certificateDescription.format._backing {
case .der:
certificate = try? Certificate(derEncoded: certificateBytes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should let the error be thrown out and at least logged (here and all the try? below as well)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about throwing - I don't think we want the reloader to stop running when some potentially transient error is encountered.

We could have a max number of failed attempts though, and throw after that number has been reached.

W.r.t logging, we don't currently depend on swift-log in this package, so I'm not sure if that's something we want. I guess we could also just print (which I know is not ideal but it's what we already do for the debug events handler)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay so given the other discussion about having throwing inits, I have made these methods throw, but I am just ignoring the errors from the main reload loop.

Still open questions about having a max number of failed attempts and a logger.

@hamzahrmalik
Copy link

hamzahrmalik commented May 2, 2025

What's the best way to use this with a server?

let tlsConfig = TLSConfiguration.makeServerConfiguration(certificateChain: ??????) // What do i pass here?
tlsConfig.sslContextCallback = { _, promise in
    promise.succeed(timedCertificateReloader.sslContextConfigurationOverride)
}

It'd be nice if TimedCertificateReloader could expose the initial config, since it already read it in its initializer, or if TLSConfiguration could be made without an initial cert-key pair (and rely on the override).

Or do i have to just manually read the first cert-key pair like:

guard let certificateChain = certificateReloader.sslContextConfigurationOverride.certificateChain,
              let privateKey = certificateReloader.sslContextConfigurationOverride.privateKey else {
            throw ConfigurationError.invalidValue("Certificate or key path is invalid")
        }
        var tlsConfiguration = TLSConfiguration.makeServerConfiguration(certificateChain: certificateChain, privateKey: privateKey)
        tlsConfiguration.sslContextCallback = { _, promise in
            // This callback is called for every TLS hello and we'll use the latest cert from the provider.
            promise.succeed(certificateReloader.sslContextConfigurationOverride)
        }

@gjcairo
Copy link
Contributor Author

gjcairo commented May 6, 2025

@hamzahrmalik thanks for bringing this up - I've improved the docs/added some examples, and also added a new makeServerConfiguration(certificateReloader:) extension to make it easier for server side configurations.

/// Configure a ``CertificateReloader`` to observe updates for the certificate and key pair used.
/// - Parameter reloader: A ``CertificateReloader`` to watch for certificate and key pair updates.
/// - Returns: A ``NIOSSL/TLSConfiguration`` that reloads the certificate and key used in its SSL handshake.
mutating public func setCertificateReloader(_ reloader: some CertificateReloader) -> Self {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is mildly confusing to me, it's mutating but also returns self

If the return is just for convenience, then we should maybe make it discardable result

Or was it intended to not mutate the original and make a copy?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, xcode says you need to import NIOCore

Instance method 'succeed' is not available due to missing import of defining module 'NIOCore'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's for convenience, so you can chain calls. I've added @discardableResult.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm with @hamzahrmalik on this one: it's a bit confusing.

Chaining calls isn't a common pattern for config structs or for TLSConfiguration so this shouldn't return Self.

public func run() async throws {
for try await _ in AsyncTimerSequence.repeating(every: self.refreshInterval).cancelOnGracefulShutdown() {
// We don't want to throw out of this method so simply ignore errors thrown from `reloadPair`.
try? self.reloadPair()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could benefit from some logging integration, at least a callback when reload fails? I'd want to hook it into logging/metrics when running in a service.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added logging when the reloading fails.

Copy link
Contributor

@czechboy0 czechboy0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Gus!

/// Configure a ``CertificateReloader`` to observe updates for the certificate and key pair used.
/// - Parameter reloader: A ``CertificateReloader`` to watch for certificate and key pair updates.
/// - Returns: A ``NIOSSL/TLSConfiguration`` that reloads the certificate and key used in its SSL handshake.
mutating public func setCertificateReloader(_ reloader: some CertificateReloader) -> Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm with @hamzahrmalik on this one: it's a bit confusing.

Chaining calls isn't a common pattern for config structs or for TLSConfiguration so this shouldn't return Self.

)
}

/// Attempt to initialize a new ``TimedCertificateReloader``, but throw if the given certificate and private keys cannot be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I propose that rather than having inits which also do the first refresh and can throw that we just have the non-throwing inits and also expose a refreshNow() throwing func?

This means that:

  • the init doesn't have side effects
  • refreshing is explicit
  • any throwing is directly tied to the refresh operation (throwing could come from something else in the init but as a user I don't know that unless I read the code)
  • we can remove some duplication

Generally speaking, if work should be done after creating the object it should either be a separate func which can be made more convenient by a static func which creates and then runs that func before returning the object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So are you suggesting that on top of the reloadNow() function, we also provide a static TimedCertificateReloader.makeReloaderAndValidateLocations(...) (naming TBC)?

@czechboy0 - what do you think of this change? Tagging you since you originally asked for throwing inits.

Personally, if we also provide a static method that does both things, I'm okay with it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I (personally) wouldn't bother adding the static method as well but I have no objections to it being added.

Copy link
Contributor

@czechboy0 czechboy0 May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with either a throwing init or a throwing static func (they're mostly equivalent in my mind), I just want it to be a single call that creates and returns the instance, and we know the instance was able to load the certs at least once (meaning we don't need code that handles one of these that never successfully loaded certs).

I suspect we'll add that call to many of our examples and docs, as I believe that a service that couldn't load certs at least once shouldn't ever mark itself healthy. But I'm okay with that not being the only usage pattern here, in case there are use cases where someone wants to keep running even without certs ever loading successfully.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not being able to load the certs on the first try doesn't mean they won't ever load successfully :) but I agree this could be a common pattern so I do think we should offer a way to do so easily.

Because of the switch from while true in the main loop to using the timer sequence, I've had to increase the platforms in the availability guards anyways. So I'll just leave the non-throwing init using Duration (and get rid of the TimeAmount overload, since it was only there to support older platforms which can't be supported anymore), add a new reloadNow() func, and add a new static method to init and reload.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you illustrate with a code snippet how this composes in a real-world example? When I tried thinking this through, it becomes very convoluted and hard to understand when exactly the server is healthy and running.

Can we at least go back to the static function that does the creation and running the first fetch? We'll just document that it's the recommended one in practice, that's fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just making up APIs here, but hopefully it's clear enough:

let reloader = CertificateReloader(refreshInterval: .minutes(60))
let server = SomeServer(tls: .makeServerConfig(using: reloader))
let serviceGroup = ServiceGroup(services: [reloader, server])
try await serviceGroup.run()

If reloader throws on run() (which should only be possible when doing the first refresh) then the service group will get shutdown almost immediately. I think there's a small timing window between the reloader starting and failing where the server could start but would be a lame duck. In practice this is unlikely to be an issue as the server should be failing its health checks and the timing window before shutting down should be very small.

Copy link
Contributor

@czechboy0 czechboy0 May 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this meant to be the same makeServerConfiguration as in this PR? Because that's a throwing function that will throw an error if the cert hasn't been loaded yet, so it can only be practically used with a CertificateReloader that is guaranteed to have already loaded certs (my preference).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, yeah I misremembered the API. Let's just go with the static, it's easy to understand.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, if we also get the makeClient variant, I'm happy with this PR.

private key pair.
""",
metadata: [
"error": error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add tests for the description of the errors that can be thrown? Structs wrapping enums have bad default descriptions IIRC.

Copy link
Contributor

@czechboy0 czechboy0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thank you @gjcairo 🙏

@gjcairo gjcairo requested a review from glbrntt May 12, 2025 16:43
) throws -> Self {
let override = certificateReloader.sslContextConfigurationOverride

guard override.certificateChain != nil else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be setting the cert chain and private key on the config below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I'm actually not sure it's needed since the reloader takes care of setting the override via sslContextCallback, but I'll set it just in case.

Comment on lines +103 to +105
#if compiler(>=6.0)
@available(macOS 13, iOS 16, watchOS 9, tvOS 16, macCatalyst 16, visionOS 1, *)
#else
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially silly question: is this actually required? Doesn't the * in the availability in the #else branch cover visionOS?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - I think that should be fine yes. Do we care about explicitly spelling out all platforms or not really? I know we've been adding visionOS to other packages, because we weren't using *, but should we opted to use * instead? Or do we favour listing them explicitly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. I'm not opposed to spelling it out fully, it just seems overkill to compiler guard it (especially if it was applied more widely)

@gjcairo gjcairo requested a review from glbrntt May 13, 2025 15:26
Copy link
Contributor

@glbrntt glbrntt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs nit but LGTM otherwise

@gjcairo gjcairo merged commit 949cf2d into apple:main May 14, 2025
25 of 26 checks passed
@gjcairo gjcairo deleted the cert-reloader branch May 14, 2025 12:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🆕 semver/minor Adds new public API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants