Usage in AV applications for measuring AV desynchronization #46
It should be doable:
I'd be intending to do this via the C library (the application I have is written in C), sending a struct containing the channel number and the timestamp at the time of sending. On the receiving end I'd send all received samples through the library as well. For correct timing it would be crucial to get the timestamp at which the library receives the first tone of the preamble sequence, so is that possible? To extract the timestamp of the first reception of the preamble. That way any processing delay can be calculated by subtracting the time of reception of the preamble from the time of reception of the struct. Since this is a live stream there is likely always a processing delay, since the number of samples transmitted per packet can vary depending on codec and muxing strategy.
The current interface of ggwave does not provide a mechanism to get the exact time of the beginning of the transmission. Still, you can estimate it in the following way: …

This will give you an approximate timestamp of the first tone, within +/- 100 ms of the real timestamp.
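For illustration, here is a rough sketch of one way to make such an estimate, assuming the C API declared in ggwave/ggwave.h (ggwave_decode returning the number of decoded payload bytes, or 0 when nothing has been decoded yet) and a hypothetical now_seconds() monotonic-clock helper:

```c
#include "ggwave/ggwave.h"

extern double now_seconds(void); /* hypothetical helper, e.g. a clock_gettime() wrapper */

/* Sketch only: call this with each newly captured block of float samples.
   When ggwave_decode() first reports a decoded payload, subtract the known
   duration of the whole transmitted waveform to estimate when its first
   tone started. The estimate is only good to roughly +/- 100 ms. */
double estimate_tx_start(ggwave_Instance instance,
                         const float *samples, int nSamples,    /* captured audio block    */
                         int totalTxSamples, float sampleRate,  /* known size of the TX    */
                         void *payloadOut) {
    /* dataSize is in bytes here, assuming the instance is configured for
       32-bit float input (GGWAVE_SAMPLE_FORMAT_F32) -- check ggwave.h for your setup */
    const int n = ggwave_decode(instance, samples, nSamples * (int) sizeof(float), payloadOut);
    if (n > 0) {
        return now_seconds() - (double) totalTxSamples / sampleRate;
    }
    return -1.0; /* nothing decoded in this block yet */
}
```

The uncertainty mainly comes from not knowing exactly where inside the capture buffer the decode actually completed.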
Coming back to this, I have now implemented code that can signal timestamps encoded in our audio stream and correlate them with timestamps encoded in the video pixels. Is it feasible, when decoding, to get the offset from the end of the buffer back to the start of the signal in samples, as well as the number of unused samples trailing in the provided buffer? I believe that should bring me closer to ~1 ms accuracy.
To give an example: for the current size I transmit, ggwave_encode reports a waveform of 73728 samples, while (from testing) the actual number of samples after which the decode function starts to return data is around 68832.
The current format of the audio payload when using variable-length messages is like this:

[ "begin" sound marker: 16 frames ][ data frames ][ "end" sound marker: 16 frames ]

In your case, you have 73728 samples in total. When you transmit such a message, the receiver must receive the first 16 frames of the "begin" sound marker together with the data frames. The uncertainty comes during the receiving of the "end" marker: it cannot be predicted how many "end" frames the receiver needs to receive in order to "detect" the "end" marker. In perfect conditions only a few of them are enough, and this explains why the decode function starts to return data earlier than expected. All this could be improved by making the decode function report an estimate of the number of "end" frames that have been received in the current decoding, but this requires modifications in the decoding part. One thing you can try for your use case is to reduce the number of sound marker frames from 16 to 4:

ggwave/include/ggwave/ggwave.h, line 243 (commit 1a0af88)
Reducing this number will degrade the transmission reliability, but it should improve the precision of your detection approach.
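As a rough consistency check (assuming ggwave's default of 1024 samples per frame at 48 kHz, which may not match your configuration): 73728 / 1024 = 72 frames in total, and 73728 − 68832 = 4896 samples ≈ 4.8 frames, i.e. the decode returns once roughly 5 of the 16 "end" marker frames have been heard, which fits the explanation above.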
Do your calculations also hold true for fixed-length data? We're setting the payloadLength parameter on init for both receiver and sender. Since we just send a struct (7 bytes of string, to memcmp since we sometimes get garbage data, two 32-bit values for the timecode and one uint8 for the channel number), our length is known beforehand. Trading reliability for accuracy seems like a good trade-off in our use case, so I'll explore that route :) Thanks so far for all the help and the great library!
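For reference, a struct with that layout (field names here are made up; packed so it is exactly 7 + 4 + 4 + 1 = 16 bytes on the wire) might look like this:

```c
#include <stdint.h>

/* Illustrative only -- the field names are hypothetical, but the layout matches
   the description above: 7 bytes of magic string, two 32-bit values, one channel
   byte. Packed so that sizeof(struct av_sync_payload) == 16 bytes. */
#pragma pack(push, 1)
struct av_sync_payload {
    char     magic[7];    /* fixed string, checked with memcmp() against garbage */
    uint32_t timecode_hi; /* two 32-bit timecode values                          */
    uint32_t timecode_lo;
    uint8_t  channel;     /* channel number                                      */
};
#pragma pack(pop)
```

With a layout like this, payloadLength at init would simply be sizeof(struct av_sync_payload) on both the sender and the receiver.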
The fixed-length mode is a bit different, although the logic is simpler.
So to get a total of 73728 samples, you are most likely using …

But regardless of this, again, the decode function will give you the result at different times, depending on noise and background sound. The reason is that in fixed-length mode the receiver continuously tries to decode the last chunk of received samples, so what you observe is that the decode can already succeed once enough of the transmission has arrived, and the exact moment varies. I'm not really sure how to improve the receive timestamp accuracy for this transfer mode.
Our size is a bit bigger than 8 bytes ;) It's 7 + 4 + 4 + 1 = 16 bytes.
The sample count is bufsize / 4. There is a ~60 ms variance in the latency when we calculate it. I'm assuming part of it is due to the effects you've described, and part is due to "unaligned" reads: our audio frames come in with sample counts matching the video frame rate, e.g. 1920 samples for PAL (48000 / 25; not always constant by the way, some formats have a switching cadence), while audio codecs tend to have different on-the-wire frame sizes, so things might not line up (i.e. non-ggwave data at the front/back of the buffer) when we call decode.
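For scale, plain arithmetic (not ggwave-specific): 1920 samples at 48 kHz is exactly 40 ms, so a misalignment of up to one such audio block at the decode call could by itself account for a good part of a ~60 ms spread.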
Thanks for the info. To compute the signal power we can try computing FFTs of 256 samples, which should give ~5 ms precision.
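As an illustration of that idea (a sketch, not ggwave's actual implementation): 256 samples at 48 kHz is about 5.3 ms, and the same time resolution can already be had from the short-time energy of consecutive 256-sample windows, used here as a stand-in for the FFT-based power estimate:

```c
#include <stddef.h>

/* Sketch only: locate the onset of a tone inside a capture buffer with ~5 ms
   resolution by scanning the energy of consecutive 256-sample windows.
   'threshold' is an assumed, application-tuned power level above the noise floor.
   Returns the sample index of the first "loud" window, or -1 if none is found. */
long find_onset_by_energy(const float *samples, size_t nSamples, float threshold) {
    const size_t window = 256; /* 256 samples @ 48 kHz ~= 5.3 ms */
    for (size_t i = 0; i + window <= nSamples; i += window) {
        float power = 0.0f;
        for (size_t j = 0; j < window; ++j) {
            power += samples[i + j] * samples[i + j];
        }
        power /= (float) window; /* mean-square power of this window */
        if (power > threshold) {
            return (long) i; /* onset lies somewhere inside this window */
        }
    }
    return -1;
}
```

A real implementation would derive the threshold from the measured noise floor rather than hard-coding it.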
Hi, I'm trying to use your library for measuring the audio delay in a WebRTC communication.
Do you think we could expose this event to the JS application, providing the …
Hi!
I'm currently looking into writing an application that encodes a timestamp into a sound signal for the purpose of measuring desynchronization between audio and video (I already transmit data in the video by writing a bit pattern into the pixels, modifying their luminance).
For that purpose it's important that the library gives me the exact timestamp of the start of receiving the preamble. Is that doable?