Update device_capabilities.py: Add GTX 1070, 1080; main.py: timeout 90->900 #393

Open
wants to merge 2 commits into
base: main


@FFAMax FFAMax commented Oct 28, 2024

  1. Added a few GPUs (GTX 1070, GTX 1080).
  2. Raised the timeout from 90 to 900. On slow setups (~1 token per second), an average response may run ~600-1000 tokens, so in most cases the request times out and is reported as a network error, which it is not. This change reduces those exceptions. Anyone who wants better performance and knows what they are doing can adjust the value with an understanding of its impact; the default should now work for most cases.
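
As a rough sanity check of the timeout reasoning above, here is a minimal sketch of the arithmetic. The helper name `estimated_response_seconds` is hypothetical and not taken from main.py, and the sketch assumes the timeout values 90 and 900 are in seconds; the token counts and generation rate are the ones quoted in the description.

```python
def estimated_response_seconds(tokens: int, tokens_per_second: float) -> float:
    """Time needed to generate `tokens` at a steady generation rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return tokens / tokens_per_second


# Assumption: both timeout values are expressed in seconds.
OLD_TIMEOUT_S = 90
NEW_TIMEOUT_S = 900

for tokens in (600, 1000):
    seconds = estimated_response_seconds(tokens, tokens_per_second=1.0)
    print(
        f"{tokens} tokens -> {seconds:.0f} s, "
        f"fits old timeout: {seconds <= OLD_TIMEOUT_S}, "
        f"fits new timeout: {seconds <= NEW_TIMEOUT_S}"
    )
```

Under these assumptions a 600-token response fits the new limit but not the old one, while a 1000-token response at exactly 1 token per second would still run past 900 seconds.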

@FFAMax FFAMax changed the title Update device_capabilities.py: Add GTX 1070, 1080 Update device_capabilities.py: Add GTX 1070, 1080; main.py: timeout 90->900 Oct 28, 2024
Contributor

dtnewman commented Nov 3, 2024

Can you double check the FP16 numbers here? Those look a little too low. They are usually halfway between the 8 and 32.

Author

FFAMax commented Nov 3, 2024

> Can you double check the FP16 numbers here? Those look a little too low. They are usually halfway between the 8 and 32.

For example, take the GTX 1080 Ti.

According to https://images.nvidia.com/aem-dam/en-zz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf

FP16 and INT8 are listed as NA.

Based on https://www.techpowerup.com/gpu-specs/geforce-gtx-1080-ti.c2877
FP16 (half): 177.2 GFLOPS (1:64)
which is the 0.177 TFLOPS value mentioned.

So the numbers are probably low because there is no hardware support, or something like that.

By comparison, for https://www.techpowerup.com/gpu-specs/geforce-gtx-1660-ti.c3364 we can see the 2:1 ratio you mentioned, while for the 1080 it is 1:64.
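
The unit conversion and ratio above can be checked with a short sketch. The function names are hypothetical helpers, not part of device_capabilities.py; the only inputs are the 177.2 GFLOPS figure and the 1:64 FP16:FP32 ratio quoted from the TechPowerUp page.

```python
def gflops_to_tflops(gflops: float) -> float:
    """Convert GFLOPS to TFLOPS (1 TFLOPS = 1000 GFLOPS)."""
    return gflops / 1000.0


def fp32_from_fp16(fp16_tflops: float, fp16_part: int, fp32_part: int) -> float:
    """Infer FP32 throughput from FP16 throughput and an FP16:FP32 ratio."""
    if fp16_part <= 0 or fp32_part <= 0:
        raise ValueError("ratio parts must be positive")
    return fp16_tflops * fp32_part / fp16_part


# GTX 1080 Ti figures quoted in the comment: 177.2 GFLOPS FP16 at a 1:64 ratio.
fp16_tflops = gflops_to_tflops(177.2)
fp32_tflops = fp32_from_fp16(fp16_tflops, fp16_part=1, fp32_part=64)

print(f"FP16: {fp16_tflops:.3f} TFLOPS")
print(f"Implied FP32 at 1:64: {fp32_tflops:.2f} TFLOPS")
```

The implied FP32 value is only what the quoted ratio yields; it is worth comparing against the FP32 figure on the same spec page before relying on it.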
