
Degirum Detector Integration #17159

Open · ChirayuRai wants to merge 2 commits into base: dev

Conversation

ChirayuRai

Added support for using DeGirum and its PySDK within Frigate.

Proposed change

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New feature
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code
  • Documentation Update

Additional information

  • This PR fixes or closes issue: fixes #
  • This PR is related to issue:

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • The code has been formatted using Ruff (ruff format frigate)

…e, updated requirements with degirum_headless

netlify bot commented Mar 15, 2025

Deploy Preview for frigate-docs ready!

🔨 Latest commit: 13cd5eb
🔍 Latest deploy log: https://app.netlify.com/sites/frigate-docs/deploys/67d85aa2f453170008baffbf
😎 Deploy Preview: https://deploy-preview-17159--frigate-docs.netlify.app

@NickM-27
Collaborator

Running a detector on separate hardware is not recommended for Frigate, as it prioritizes real-time detection. What inference speed are you seeing?

@ChirayuRai
Author

Running a detector on separate hardware is not recommended for Frigate, as it prioritizes real-time detection. What inference speed are you seeing?

If a user wants to use some local hardware, they just have to run a local AI server, and it'll be handled locally.

But for usage with the cloud, I'm getting pretty good FPS overall. For instance, with a mobilenet detection model and Orca as the hardware, I'm getting on average about 690 FPS and an average frame time of 1.3 ms. When using a slower model like an OpenVINO CPU model, frame times are about 10 ms with roughly 100 FPS.

@NickM-27
Collaborator

It would probably be good to see some screenshots of the Frigate interface with this detector running.

@ChirayuRai
Author

ChirayuRai commented Mar 17, 2025

It would probably be good to see some screenshots of the Frigate interface with this detector running.

Sure! Is there any specific page or setup you want to see? Or literally just the live view running while the DeGirum detector is selected?

@hawkeye217
Collaborator

To start, screenshots of the system metrics and camera metrics pages would be helpful.

@ChirayuRai
Author

ChirayuRai commented Mar 18, 2025

To start, screenshots of the system metrics and camera metrics pages would be helpful.

Sure! I have a few combinations here:

Using the cloud location with an Orca processor (the in-house DeGirum AI accelerator), along with a mobilenet detection model:
scrn-2025-03-17-14-30-46
scrn-2025-03-17-14-30-56

Using the cloud location with a Hailo processor, along with a yolov8s model:
scrn-2025-03-17-14-54-01
scrn-2025-03-17-14-54-13

Using the cloud location with an OpenVINO NPU, along with a yolov11n model:
scrn-2025-03-17-16-54-21
scrn-2025-03-17-16-54-34

Using an AI server available on the local network (but not on my local machine), which has an OpenVINO NPU and is running a yolov11n model:
scrn-2025-03-17-16-22-44
scrn-2025-03-17-16-22-55

Using an AI server on my local machine, which has an OpenVINO NPU (off the Core Ultra 7 155H, to be exact) and is running a yolov11n model:
scrn-2025-03-17-16-41-17
scrn-2025-03-17-16-41-29

I would like to note that detections had random spikes because I or someone else would stand in front of the camera for whatever reason. Let me know if you want to see anything else!

@hawkeye217
Collaborator

Your inference times are lower than most local hardware based GPUs, which makes me think the values you're seeing are suspect.

Can you clarify what you meant when you said "I would like to note that detections had random spikes because I or someone else would stand in front of the camera for whatever reason."?

@ChirayuRai
Author

ChirayuRai commented Mar 18, 2025

Your inference times are lower than most local hardware based GPUs, which makes me think the values you're seeing are suspect.

Do you mean the inference times are much faster than expected, which makes them suspect? Or that they're much slower than expected, making it hard to understand how I got those FPS numbers in the benchmarking I did?

Can you clarify what you meant when you said "I would like to note that detections had random spikes because I or someone else would stand in front of the camera for whatever reason."?

I noticed that in the camera portion, the "detections" line would remain at about half of the FPS. However, when a person was in frame, there were spikes in that detection number. For instance, in my Hailo screenshots, right at the end you can see that the detections line just spikes and stays up for a little bit. Someone was in view of the camera for that.

@hawkeye217
Collaborator

The inference times are much faster than expected. Even with a very good internet connection (< 5ms ping time), I'd expect network latency to take much more than 2ms.

I'm not familiar with how DeGirum works, but I'm guessing that because of the way you've written the detector and put everything into the detect_raw function, predict_batch has no new work to do and then next(self.predict_batch) runs and returns None immediately. Then, when there is actual work to do (someone standing in front of the camera), your inference time spikes massively.

@NickM-27
Collaborator

Inference times vary depending on model and hardware, of course; you can see some examples listed on the recommended hardware page: https://docs.frigate.video/frigate/hardware

The fact that the spikes are so large during activity suggests that the testing is not representative of real-world performance.

@ChirayuRai
Author

ChirayuRai commented Mar 18, 2025

I can try standing in front of the camera the entire time to see if continuous detections spike the inference latency? Maybe that will give a better indication of real-world performance. Also, as a side note, I was testing all of these on site, right next to the cloud servers/AI servers, so my ping was probably much lower than someone testing these inferences from further away.

@NickM-27
Collaborator

The concern is that if objects are not being detected then inference speed shouldn't differ so much.

That would be a good starting point to understanding what performance would be, but the concern is that there is an incorrect behavior here. You may want to look at the deepstack detector for more reference on a similar implementation.

@ChirayuRai
Author

ChirayuRai commented Mar 19, 2025

The concern is that if objects are not being detected then inference speed shouldn't differ so much.

Ok, so if I understand the frames/detections graph correctly, if a person was to stand in front of the camera for the entire time we're using a properly functioning detector, then the FPS line and detections line should be the same? And if detections are properly occurring, that means the inference speed would be exactly what is expected in real world performance. So, if I can show a graph with 30 fps, 30 detections, then the inference speed graph should be close to real world, right?

That would be a good starting point to understanding what performance would be, but the concern is that there is an incorrect behavior here. You may want to look at the deepstack detector for more reference on a similar implementation.

I could write another PR that uses just a model.predict call instead of predict_batch, and that would completely mimic the behavior of deepstack. With a regular model.predict, or an API call like deepstack's, we have to establish a connection to whatever server is being pinged, get a response, and then close the connection, once per frame.

With predict_batch, there is just one websocket connection, established when predict_batch consumes the first element of the generator/iterator that's passed in, and that connection stays open as long as the passed-in generator/iterator continues to yield/return objects. On top of that, the results are potentially inaccurate since we're not forcing synchronization at all, so the results returned for frame x might actually have been the results for frame x - 2 or something.

I can add a tracking algorithm to help overcome that? Or I can stick to trying to make it more of a synchronous approach if that's preferred.
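
For reference, a rough sketch of the two call patterns being compared, assuming a PySDK-style model object that exposes predict() and predict_batch() as referenced in this PR; the frame queue and postprocess callback are hypothetical stand-ins for the surrounding plumbing:

```python
import queue


def detect_raw_per_frame(model, tensor_input, postprocess):
    # deepstack-style: one connect/infer/respond/close cycle per frame
    result = model.predict(tensor_input)
    return postprocess(result)


def start_streaming(model, frame_queue: queue.Queue):
    # predict_batch keeps a single connection open and lazily consumes frames
    # from the generator, yielding results as they become available
    def frame_source():
        while True:
            yield frame_queue.get()

    return model.predict_batch(frame_source())
```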

@NickM-27
Collaborator

Ok, so if I understand the frames/detections graph correctly, if a person was to stand in front of the camera for the entire time we're using a properly functioning detector, then the FPS line and detections line should be the same? And if detections are properly occurring, that means the inference speed would be exactly what is expected in real world performance. So, if I can show a graph with 30 fps, 30 detections, then the inference speed graph should be close to real world, right?

No, multiple detections can be run on the same frame, so detections can be higher than the camera fps. The main problem you want to avoid is skipped fps; if the detections fps is high but the skipped fps is also high, that indicates a problem.

So the results returned for frame x might actually have been the results for frame x - 2 or something.

that should not be possible given that detectors in Frigate are synchronous

@ChirayuRai
Author

that should not be possible given that detectors in Frigate are synchronous

We overcome this by just returning an empty detection result if nothing was returned by predict_batch in time, but ultimately all that's running in a separate thread. So it ends up being async. I could enforce syncing if needed though.

if the detections fps is high but the skipped fps is also high, that indicates a problem

Got it, will work on trying to minimize skipped FPS then.

@NickM-27
Collaborator

We overcome this by just returning an empty detection result if nothing was returned by predict_batch in time, but ultimately all that's running in a separate thread. So it ends up being async. I could enforce syncing if needed though.

The frame that is passed in as the input_tensor absolutely must match the data that is returned with the detected objects; otherwise, object tracking will be very wrong.

@ChirayuRai
Author

ChirayuRai commented Mar 19, 2025

Currently I don't have any object tracking implemented; it's purely async. If I were to add some object tracking, then yeah, I would have to align the frames. Or I could try to just make it block until we have a response. Either way, I understand that some form of syncing needs to occur, and I'm working on implementing exactly that!

@NickM-27
Collaborator

Currently I don't have any object tracking implemented; it's purely async. If I were to add some object tracking, then yeah, I would have to align the frames. Or I could try to just make it block until we have a response. Either way, I understand that some form of syncing needs to occur, and I'm working on implementing exactly that!

I think you misunderstand, Frigate does the object tracking already.

@NickM-27
Collaborator

Also, can you elaborate on "We overcome this by just returning an empty detection result if nothing was returned by predict_batch in time"? We should be making a best effort to detect objects on every frame; otherwise this will also affect object tracking.

@ChirayuRai
Author

Also, can you elaborate on "We overcome this by just returning an empty detection result if nothing was returned by predict_batch in time"?

Essentially, detect_raw is currently just putting truncated_input into a queue, and that queue is being fed into predict_batch. Right after that, we ask predict_batch to return whatever inference results it has from this queue. However, if there are no inference results, predict_batch just returns None, and detect_raw then returns the empty detection result (the numpy zero array of size 20 x 6). If we do have a proper inference result, we reformat it and return it from detect_raw. But this approach doesn't block detect_raw or make sure that res actually has inference results.

As an example, let's say it takes about 5 frames for an inference result to be properly returned. On our first frame, let's call it frame X, we go through detect_raw and put the truncated_input into our queue. From there, that queue is passed into predict_batch, which then starts the whole inference cycle on the truncated_input from that frame. Now, it would only be 5 frames AFTER the initial frame X that we get the proper inference results for frame X. At that point, res would evaluate to the inference results for frame X; until then, it would evaluate to None. So essentially, predict_batch operates asynchronously from detect_raw, meaning our frames could be out of sync if the inference results from predict_batch don't get returned by the time next(predict_batch) is called. I hope that clears things up. Let me know if any more explanation is needed.
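
Condensing that description into a sketch (a paraphrase of the flow described above, not the PR's code verbatim; model is assumed to expose predict_batch() as referenced in this PR, and preprocess/reformat are hypothetical stand-ins for the input/output handling):

```python
import queue

import numpy as np


class DescribedFlowSketch:
    def __init__(self, model, preprocess, reformat):
        self.preprocess = preprocess
        self.reformat = reformat
        self.frame_queue = queue.Queue()
        # one connection stays open while this generator keeps yielding frames;
        # assumes predict_batch consumes the generator lazily
        self.batch_results = model.predict_batch(self._frame_source())

    def _frame_source(self):
        while True:
            yield self.frame_queue.get()

    def detect_raw(self, tensor_input):
        truncated_input = self.preprocess(tensor_input)
        self.frame_queue.put(truncated_input)
        # per the description above, this yields None when no inference
        # result is ready yet, i.e. it does not wait on the current frame
        res = next(self.batch_results)
        if res is None:
            # empty detection result: the 20 x 6 numpy zero array
            return np.zeros((20, 6), np.float32)
        # res may belong to a frame submitted several detect_raw calls earlier
        return self.reformat(res)
```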

@NickM-27
Collaborator

Okay, thanks, that makes sense. detect_raw absolutely needs to be blocked such that the tensor_input that is passed in and the returned object data match.
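
One possible way to satisfy that (a sketch only, not the PR's final implementation): keep the class from the sketch above but make detect_raw wait for the result of the frame it just submitted, under the assumption that predict_batch yields exactly one result per input, in submission order:

```python
    def detect_raw(self, tensor_input):
        self.frame_queue.put(self.preprocess(tensor_input))
        # wait for the result of the frame just submitted, so tensor_input and
        # the returned detections always refer to the same frame (assumes
        # predict_batch yields one result per input, in order)
        res = next(self.batch_results)
        if res is None:  # defensive fallback; should not happen when blocking
            return np.zeros((20, 6), np.float32)
        return self.reformat(res)
```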
