Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor!: convert all SDK timestamps from HrTime to bigint #5522

Open
wants to merge 29 commits into
base: main
Choose a base branch
from

Conversation

dyladan
Copy link
Member

@dyladan dyladan commented Mar 4, 2025

We currently store and represent the timestamps as high resolution time tuples of 2 numbers [seconds, nanoseconds], but that is almost never what the user wants for export or representation. Protobufjs represents it as a long.js fixed 64-bit integer, which requires an intermediate conversion to bigint before converting to a Fixed64. Using bigint as the underlying storage removes the need for this intermediate conversion. We could store it directly as a long.js fixed64 but this would require taking a dependency on long in the SDK, where bigint is a zero-dependency solution. Also, other protobuf libraries like buf use bigint directly, so I believe it is likely more future-proof to use bigint.

As evidence this is not what users typically want, see all of our own exporters:

  • jaeger
  • zipkin
  • OTLP - In the case of OTLP it's actually converting to bigint first, then converting to Fixed64 ({ high: number, low: number })

@dyladan dyladan requested a review from a team as a code owner March 4, 2025 21:35
Copy link

codecov bot commented Mar 4, 2025

Codecov Report

Attention: Patch coverage is 86.66667% with 16 lines in your changes missing coverage. Please review.

Project coverage is 94.76%. Comparing base (e947bd9) to head (3f01070).

Files with missing lines Patch % Lines
packages/opentelemetry-sdk-trace-base/src/Span.ts 72.00% 7 Missing ⚠️
experimental/packages/sdk-logs/src/LogRecord.ts 42.85% 4 Missing ⚠️
packages/opentelemetry-core/src/common/time.ts 90.90% 2 Missing ⚠️
...sdk-metrics/src/aggregator/ExponentialHistogram.ts 75.00% 2 Missing ⚠️
...l/packages/shim-opencensus/src/metric-transform.ts 88.88% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5522      +/-   ##
==========================================
- Coverage   94.95%   94.76%   -0.19%     
==========================================
  Files         308      307       -1     
  Lines        7927     7927              
  Branches     1604     1608       +4     
==========================================
- Hits         7527     7512      -15     
- Misses        400      415      +15     
Files with missing lines Coverage Δ
...ry-exporter-prometheus/src/PrometheusSerializer.ts 96.10% <100.00%> (ø)
...ntal/packages/otlp-transformer/src/common/utils.ts 88.88% <100.00%> (-11.12%) ⬇️
...tal/packages/otlp-transformer/src/logs/internal.ts 100.00% <ø> (ø)
.../packages/otlp-transformer/src/metrics/internal.ts 100.00% <ø> (ø)
...al/packages/otlp-transformer/src/trace/internal.ts 98.41% <ø> (ø)
...es/sdk-logs/src/export/ConsoleLogRecordExporter.ts 83.33% <ø> (-1.29%) ⬇️
...ges/opentelemetry-exporter-jaeger/src/transform.ts 93.33% <100.00%> (ø)
...ges/opentelemetry-exporter-zipkin/src/transform.ts 90.90% <100.00%> (ø)
...y-sdk-trace-base/src/export/ConsoleSpanExporter.ts 81.25% <100.00%> (ø)
packages/sdk-metrics/src/Instruments.ts 96.00% <100.00%> (ø)
... and 26 more

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@pichlermarc pichlermarc added the target:next-major-release This PR targets the next major release (`next` branch) label Mar 5, 2025
@pichlermarc pichlermarc added this to the OpenTelemetry SDK 2.0 milestone Mar 5, 2025
Copy link
Member

@pichlermarc pichlermarc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two suggestions should fix some failing tests. Still looking into the others.

@dyladan
Copy link
Member Author

dyladan commented Mar 6, 2025

I FOUND THE BUG

assert.strictEqual(millisecondsToNanoseconds(1066.969834), 1066969834);

  1 failing

  1) time
       #millisecondsToNanoseconds
         should convert to nanoseconds:
     AssertionError [ERR_ASSERTION]: Expected values to be strictly equal:
+ actual - expected

+ 1067969834n
- 1066969834
     ^

Co-authored-by: David Luna <[email protected]>
import { TimedEvent } from '../TimedEvent';

export interface ReadableSpan {
readonly name: string;
readonly kind: SpanKind;
readonly spanContext: () => SpanContext;
readonly parentSpanContext?: SpanContext;
readonly startTimeUnixNano: bigint;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comment, for any vender attempting to "reuse" this base interface (or any exporter which expects these values), this may be considered a breaking change.

So they should be either optional or documented as a breaking change, probably here and for the exporters consuming this interface

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be "better" (an alternative) to update the HrTime type to potentially be an interface that abstracts out this capability, so all users / consumers would then just use a HrTime with an optional unixNano property.

With an implementation something like the following for compatibility (incomplete)

export interface HrTime extends Array<number> {
    unixNano?: bigint;
}

const time: HrTime = [0, 0];
time.unixNano = BigInt(time[0]) * BigInt(1e9) + BigInt(time[1]);

Object.defineProperty(time, 'unixNano', {
    set(value: bigint) {
        this[0] = Number(value / BigInt(1e9));
        this[1] = Number(value % BigInt(1e9));
    },
    get() {
        return BigInt(this[0]) * BigInt(1e9) + BigInt(this[1]);
    },
    configurable: true,
    enumerable: true
});

Not sure how much TypeScript would complain about this either for legacy users.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this is a major version change I think I'd be ok with this very minor breaking change. Applying workarounds like that would be maybe required for a minor version but i'd prefer not to do that if we can avoid it.

* Converts a TimeInput to a nanosecond unix timestamp
* @param time
*/
export function timeInputToNano(time: api.TimeInput): bigint {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separately a naming nit: The different types for time are inconsistently named in function names:

  • millis, milliseconds: e.g. millisToHrTime, millisecondsToNanoseconds
  • timeUnixNanos, nanos, nanoseconds, nano (singular, e.g. timeInputToNano).

Some of these inconsistencies already existed before this PR, so no big deal.
I think it would be nice if:

  • this function was renamed to timeInputToNanos at least.
  • Perhaps also change some of the new methods, e.g. millisecondsToNanoseconds, to the shorter form (millisToNanos).
  • The OCD side of me likes timeUnixNanos as the most self-explanatory name for this representation of time. That name is used in at least one place in the OTel spec (in the metrics spec I think) and in some of the OTLP types. However, it would likely be a pain to change new functions added in this PR to that naming (millisToTimeUnixNanos, etc.).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll go through this today. In general I agree with that feedback

}

export function nanosecondsToMilliseconds(nanos: bigint): number {
return Number(nanos / 1_000_000n);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This impl (and less so the impl for nanosecondsToMicroseconds below) loses precision: the fractional-milliseconds get dropped.

> t = core.hrTime()
[ 1741582778, 35610255 ]

> core.hrTimeToMilliseconds(t)   # converted directly to millis
1741582778035.6104

# Converting to millis from nanos
> tNanos = core.hrTimeToNanoseconds(t)
1741582778035610255n
> core.nanosecondsToMilliseconds(tNanos)
1741582778035

I think this would work:

Suggested change
return Number(nanos / 1_000_000n);
return Number(nanos / 1_000_000n) + (Number(nanos % 1_000_000n) / 1_000_000)

return Number(nanos / 1_000_000n);
}

export function nanosecondsToMicroseconds(nanos: bigint): number {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to above, this loses the fractional microseconds.

Copy link
Member

@pichlermarc pichlermarc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this change conceptually.

However, after doing a full review I do still have a few concerns that make me feel very uneasy about it:

  • In an offline discussion, I voiced some concern around the backward compatibility being difficult to manage. After reviewing the final PR I'm still concerned about it as it increases public API surface which we've been trying to reduce in 2.0
    • It's fairly late to make the cut to be included in SDK 2.0 and we're almost one month late with the release. A few people have already adapted their code based on what's currently on main. With a release planned on Monday, there's little time for them to make adjustments. That's the reason for the proposed backward compatibility IIUC.
  • I ran some performance tests locally (npm run test:bench). Not much changes today. I wonder if the change is worth the risk without having the target serialization code in place where this will take effect. Once that is there the performance improvements will be measurable which would make this change an easier sell
    • Alternatively the Logs SDK could be a good starting place to test such a change in a minor release since it's still experimental.

I greatly appreciate all the work that you put into making this happen and I think it has provided us with a good plan on how to move forward. However, when critically considering the reasons behind our choices, I have significant concerns that the benefits of merging it for 2.0 will not outweigh the cost of maintaining the code until the next major release where we can get rid of the backward compatibility.

cc @open-telemetry/javascript-maintainers I would love to hear your opinion on the topic. We need to make a decision soon-ish (merge/don't merge) as otherwise we'll miss the release deadline on Monday.

@dyladan
Copy link
Member Author

dyladan commented Mar 12, 2025

  • In an offline discussion, I voiced some concern around the backward compatibility being difficult to manage. After reviewing the final PR I'm still concerned about it as it increases public API surface which we've been trying to reduce in 2.0

I was originally not going to do the backward compatibility at all. Now that we're so close to release I agree it's probably a good idea, but I would have ideally not done it. Removing HrTime and its associated complexity was one of the main reasons I wanted to do this to begin with.

  • It's fairly late to make the cut to be included in SDK 2.0 and we're almost one month late with the release. A few people have already adapted their code based on what's currently on main. With a release planned on Monday, there's little time for them to make adjustments. That's the reason for the proposed backward compatibility IIUC.

Yeah as mentioned above this is the main reason we decided to do back compat.

  • I ran some performance tests locally (npm run test:bench). Not much changes today. I wonder if the change is worth the risk without having the target serialization code in place where this will take effect. Once that is there the performance improvements will be measurable which would make this change an easier sell

I would not expect there to be significant performance improvements for a couple reasons:

  1. The backwards compatibility code is still computing HrTimes in some cases
  2. The timestamps are an extremely small part of the overall cost of the compute. Possibly it wouldn't result in a measurable improvement even if they were removed entirely.
  3. I think the test:bench does not include protobuf serialization? That would be the main area of improvement anyway if there is any.
  • Alternatively the Logs SDK could be a good starting place to test such a change in a minor release since it's still experimental.

I agree although it would be nice to do it everywhere to keep things consistent across the codebase.


If we're going to do backwards compatibility anyway, it really doesn't need to be included in the 2.0. We could introduce this in a minor change as from a user perspective it is just adding new timestamp properties to the readable span. Everything else is an internal detail. We could introduce the new properties and run them in parallel with the old ones until 3.0, then decide if its worth removing the old ones or not at that point.

const START_TIME = millisecondsToNanoseconds(
performance.now() + getTimeOrigin()
);
const END_TIME = millisecondsToNanoseconds(performance.now() + getTimeOrigin());
Copy link
Member

@legendecas legendecas Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found multiple occurrences of hrTime() need to be converted to nanos. Maybe we can add a util function performanceNowNanos().

readonly startTime: HrTime;
readonly endTime: HrTime;
readonly status: SpanStatus;
readonly attributes: Attributes;
readonly links: Link[];
readonly events: TimedEvent[];
readonly duration: HrTime;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we decide to keep compatibility, this should be kept. Checked on the spec, duration is not required by it and we may not need to add a nanos equivalent.

@pichlermarc
Copy link
Member

If we're going to do backwards compatibility anyway, it really doesn't need to be included in the 2.0. We could introduce this in a minor change as from a user perspective it is just adding new timestamp properties to the readable span. Everything else is an internal detail. We could introduce the new properties and run them in parallel with the old ones until 3.0, then decide if its worth removing the old ones or not at that point.

Yes, I agree. Let's move forward this way. Then we can also deprecate the other properties to signal to people to move over to the new ones. This will ensure a softer landing for users of these properties and will unblock the 2.0 release.

I discussed this with the other @open-telemetry/javascript-maintainers offline earlier and we were in agreement that we'll push it out of the 2.0 scope.

@pichlermarc pichlermarc removed the target:next-major-release This PR targets the next major release (`next` branch) label Mar 12, 2025
@pichlermarc pichlermarc removed this from the OpenTelemetry SDK 2.0 milestone Mar 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants