
Conversation

@kyujin-cho

Abstract

This patch adds a new feature that displays a transcript of every driver radio message.
Screenshot 2024-06-30 at 5 46 24 PM

What's changed

live-backend

  • New /api/audio API added
    As F1TV's live timing CDN (https://livetiming.formula1.com/static) does not permit cross-origin requests, every call to fetch a speech file has to be proxied through the backend. Since routing every request for these files can increase the traffic load on the live-backend (and also risks an IP ban from the F1TV CDN), I have decided to make this API an optional feature, opted into by defining the ENABLE_AUDIO_FETCH environment variable when starting the server process; see the sketch after this item.
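
A minimal sketch of what such a proxy could look like, written in TypeScript for illustration (the actual live-backend is Rust, and the handler name and `path` query parameter are assumptions, not taken from this PR):

```ts
// Hypothetical opt-in audio proxy; names are illustrative only.
const F1TV_CDN = "https://livetiming.formula1.com/static";

// GET /api/audio?path=<relative audio path from the live timing feed>
export async function handleAudio(req: Request): Promise<Response> {
	// The feature is opt-in: without ENABLE_AUDIO_FETCH the route is disabled.
	if (!process.env.ENABLE_AUDIO_FETCH) {
		return new Response("audio fetch disabled", { status: 404 });
	}

	const path = new URL(req.url).searchParams.get("path");
	if (!path) return new Response("missing path", { status: 400 });

	// Fetch server-side (browser CORS rules do not apply here) and re-serve
	// the bytes from our own origin so the client can consume them.
	const upstream = await fetch(`${F1TV_CDN}/${path}`);
	return new Response(upstream.body, {
		status: upstream.status,
		headers: {
			"Content-Type": upstream.headers.get("Content-Type") ?? "audio/mpeg",
		},
	});
}
```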

dash

  • Automatic Speech Recognition pipeline
    This pipeline accepts sampled audio data and infers the transcription with the help of Transformers.js and OpenAI's Whisper model. There are plenty of Whisper-based models; based on my experience, I have made three of them available as options in this project (check dash/src/app/(nav)/settings/page.tsx). As it stands, those options are labeled More Quality, Balanced, and Low Latency.
    That said, only the computational resources of the client browser are used when executing the pipeline; the API backend takes no part in the process. A rough sketch of the flow follows this item.
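
A minimal sketch of the browser-side flow with Transformers.js; the model id and the 16 kHz mono input format are assumptions for illustration, not the exact code from this PR:

```ts
import { pipeline } from "@xenova/transformers";

// Loading the model is the expensive step; create the pipeline once and
// reuse it for every radio clip.
const transcriber = await pipeline(
	"automatic-speech-recognition",
	"Xenova/whisper-tiny.en", // e.g. what a "Low Latency" option could map to
);

// Whisper expects mono PCM audio resampled to 16 kHz.
async function transcribe(audio: Float32Array): Promise<string> {
	const output = (await transcriber(audio)) as { text: string };
	return output.text;
}
```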

@vercel

vercel bot commented Jun 30, 2024

@kyujin-cho is attempting to deploy a commit to the f1-dash Team on Vercel.

A member of the Team first needs to authorize it.

@vercel

vercel bot commented Jun 30, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| f1-dash | ❌ Failed (Inspect) | | | Jul 7, 2024 10:50am |

@slowlydev changed the base branch from main to develop June 30, 2024 09:52
@SpatzlHD
Contributor

Quick question: don't we already have the audio file when it is sent via the live socket? Couldn't we just use that, or am I missing something?

@kyujin-cho
Author

@SpatzlHD AFAIK the pathname to the audio file - not the actual file - is all the client receives via SSE.

@SpatzlHD
Contributor

> @SpatzlHD AFAIK the pathname to the audio file - not the actual file - is all the client receives via SSE.

But there has to be an audio file to play it, no?

@slowlydev
Owner

Thanks a lot for this PR. I also wanted to implement this directly in f1-dash after seeing someone from the community Discord do it with a local Python server.

I was thinking about doing it with Rust and WebAssembly, but also using the Whisper model. Not sure if performance would be any better.

Please take a look at my comments. The CORS part is also weird to me, because we can play the audio with no CORS problem.

Also, the build is currently failing because webkitAudioContext does not exist in the types.
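
One plausible fix for that type error (an assumption, not necessarily what this PR should do) is to declare the vendor-prefixed Safari constructor before falling back to it:

```ts
// Declare the prefixed constructor so TypeScript accepts the Safari
// fallback; the surrounding usage is illustrative.
declare global {
	interface Window {
		webkitAudioContext?: typeof AudioContext;
	}
}

const AudioContextImpl = window.AudioContext ?? window.webkitAudioContext;
export const audioCtx = AudioContextImpl ? new AudioContextImpl() : null;
```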

Comment on lines 39 to 44

```ts
const transcriptionStorage = localStorage.getItem("transcription");
const transcriptionSettings: TranscriptionSettings = transcriptionStorage
	? JSON.parse(transcriptionStorage)
	: { enableTranscription: false, whisperModel: "" };

setEnableTranscription(transcriptionSettings.enableTranscription);
setTranscriptionModel(transcriptionSettings.whisperModel);
```
Owner

Maybe a separate Context, for either transcription or settings in general, would be better than adding it to the mode one, as that's primarily used for the swishy thingy in the top right.
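
A rough sketch of what a dedicated settings context could look like (all names here are hypothetical, not from this PR):

```tsx
import { createContext, useContext, useState, type ReactNode } from "react";

type TranscriptionSettings = {
	enableTranscription: boolean;
	whisperModel: string;
};

type SettingsContextType = {
	transcription: TranscriptionSettings;
	setTranscription: (settings: TranscriptionSettings) => void;
};

const SettingsContext = createContext<SettingsContextType | null>(null);

export const SettingsProvider = ({ children }: { children: ReactNode }) => {
	const [transcription, setTranscription] = useState<TranscriptionSettings>({
		enableTranscription: false,
		whisperModel: "",
	});

	return (
		<SettingsContext.Provider value={{ transcription, setTranscription }}>
			{children}
		</SettingsContext.Provider>
	);
};

// Accessor hook so consumers never touch the raw context directly.
export const useSettings = (): SettingsContextType => {
	const ctx = useContext(SettingsContext);
	if (!ctx) throw new Error("useSettings must be used inside SettingsProvider");
	return ctx;
};
```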

```tsx
}

const SkeletonTranscription = () => {
	const animateClass = "h-6 animate-pulse rounded-md bg-zinc-800";
```
Owner

Seems a bit tall; either do one or two thinner ones please.
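
One way to read that suggestion (the exact classes are assumptions mirroring the snippet above):

```tsx
const SkeletonTranscription = () => {
	// Two thinner bars instead of a single tall h-6 block.
	const animateClass = "h-3 animate-pulse rounded-md bg-zinc-800";
	return (
		<div className="flex flex-col gap-1">
			<div className={animateClass} style={{ width: "70%" }} />
			<div className={animateClass} style={{ width: "45%" }} />
		</div>
	);
};
```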

Comment on lines 1 to 21
```ts
function mobileTabletCheck() {
	// https://stackoverflow.com/questions/11381673/detecting-a-mobile-browser
	let check = false;
	(function (a: string) {
		if (
			/(android|bb\d+|meego).+mobile|avantgo|bada\/|blackberry|blazer|compal|elaine|fennec|hiptop|iemobile|ip(hone|od)|iris|kindle|lge |maemo|midp|mmp|mobile.+firefox|netfront|opera m(ob|in)i|palm( os)?|phone|p(ixi|re)\/|plucker|pocket|psp|series(4|6)0|symbian|treo|up\.(browser|link)|vodafone|wap|windows ce|xda|xiino|android|ipad|playbook|silk/i.test(
				a,
			) ||
			/1207|6310|6590|3gso|4thp|50[1-6]i|770s|802s|a wa|abac|ac(er|oo|s\-)|ai(ko|rn)|al(av|ca|co)|amoi|an(ex|ny|yw)|aptu|ar(ch|go)|as(te|us)|attw|au(di|\-m|r |s )|avan|be(ck|ll|nq)|bi(lb|rd)|bl(ac|az)|br(e|v)w|bumb|bw\-(n|u)|c55\/|capi|ccwa|cdm\-|cell|chtm|cldc|cmd\-|co(mp|nd)|craw|da(it|ll|ng)|dbte|dc\-s|devi|dica|dmob|do(c|p)o|ds(12|\-d)|el(49|ai)|em(l2|ul)|er(ic|k0)|esl8|ez([4-7]0|os|wa|ze)|fetc|fly(\-|_)|g1 u|g560|gene|gf\-5|g\-mo|go(\.w|od)|gr(ad|un)|haie|hcit|hd\-(m|p|t)|hei\-|hi(pt|ta)|hp( i|ip)|hs\-c|ht(c(\-| |_|a|g|p|s|t)|tp)|hu(aw|tc)|i\-(20|go|ma)|i230|iac( |\-|\/)|ibro|idea|ig01|ikom|im1k|inno|ipaq|iris|ja(t|v)a|jbro|jemu|jigs|kddi|keji|kgt( |\/)|klon|kpt |kwc\-|kyo(c|k)|le(no|xi)|lg( g|\/(k|l|u)|50|54|\-[a-w])|libw|lynx|m1\-w|m3ga|m50\/|ma(te|ui|xo)|mc(01|21|ca)|m\-cr|me(rc|ri)|mi(o8|oa|ts)|mmef|mo(01|02|bi|de|do|t(\-| |o|v)|zz)|mt(50|p1|v )|mwbp|mywa|n10[0-2]|n20[2-3]|n30(0|2)|n50(0|2|5)|n7(0(0|1)|10)|ne((c|m)\-|on|tf|wf|wg|wt)|nok(6|i)|nzph|o2im|op(ti|wv)|oran|owg1|p800|pan(a|d|t)|pdxg|pg(13|\-([1-8]|c))|phil|pire|pl(ay|uc)|pn\-2|po(ck|rt|se)|prox|psio|pt\-g|qa\-a|qc(07|12|21|32|60|\-[2-7]|i\-)|qtek|r380|r600|raks|rim9|ro(ve|zo)|s55\/|sa(ge|ma|mm|ms|ny|va)|sc(01|h\-|oo|p\-)|sdk\/|se(c(\-|0|1)|47|mc|nd|ri)|sgh\-|shar|sie(\-|m)|sk\-0|sl(45|id)|sm(al|ar|b3|it|t5)|so(ft|ny)|sp(01|h\-|v\-|v )|sy(01|mb)|t2(18|50)|t6(00|10|18)|ta(gt|lk)|tcl\-|tdg\-|tel(i|m)|tim\-|t\-mo|to(pl|sh)|ts(70|m\-|m3|m5)|tx\-9|up(\.b|g1|si)|utst|v400|v750|veri|vi(rg|te)|vk(40|5[0-3]|\-v)|vm40|voda|vulc|vx(52|53|60|61|70|80|81|83|85|98)|w3c(\-| )|webc|whit|wi(g |nc|nw)|wmlb|wonu|x700|yas\-|your|zeto|zte\-/i.test(
				a.substr(0, 4),
			)
		)
			check = true;
	})(
		navigator.userAgent ||
			navigator.vendor ||
			("opera" in window && typeof window.opera === "string"
				? window.opera
				: ""),
	);
	return check;
```
Owner

Not a huge fan of this regex stuff. Is there any other way?
Also, the function does not belong in the constants file; put it in a separate file, and make it an arrow function.

@kyujin-cho
Author
Jul 6, 2024

Detecting the device type helps us estimate the approximate size of the device's RAM, which is the critical point of concern when loading the model. But I totally agree with your take on the chunky implementation; what do you think about replacing the whole logic with the ua-parser-js library? Something like the sketch below.
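
A sketch of that replacement (assuming ua-parser-js's named export and that getDevice().type is undefined for desktop browsers):

```ts
import { UAParser } from "ua-parser-js";

// Arrow-function replacement for the regex-based check.
const mobileTabletCheck = (): boolean => {
	const { type } = new UAParser(navigator.userAgent).getDevice();
	return type === "mobile" || type === "tablet";
};
```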

@fdezwonders

Any update on this?

@slowlydev
Owner

> Any update on this?

It still needs some adjustments, and the latest changes need to be merged into it as well. I hope I can pick this up some time soon™

@slowlydev linked an issue May 5, 2025 that may be closed by this pull request
@Grafaffel

Are there any plans to add speaker diarization, like in this example? Or would this be too resource-intensive to run while streaming the live data?



Development

Successfully merging this pull request may close these issues.

TeamRadio Audio Transcription
