llamafile is a local app (built on llama.cpp) that lets you distribute and run LLMs from a single file
the tool works with both .gguf and .llamafile files
repo: https://github.com/Mozilla-Ocho/llamafile
snippets
linux and mac
wget https://huggingface.co/Mozilla/Meta-Llama-3.1-8B-llamafile/resolve/main/Meta-Llama-3.1-8B.Q6_K.llamafile
chmod +x Meta-Llama-3.1-8B.Q6_K.llamafile
./Meta-Llama-3.1-8B.Q6_K.llamafile -p 'four score and seven'
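The same file can also run as a local HTTP server instead of doing a one-shot completion (the notes below describe the browser UI it opens). A minimal sketch, assuming this release still accepts the --server and --nobrowser flags inherited from llama.cpp (check --help for your version):
./Meta-Llama-3.1-8B.Q6_K.llamafile --server --nobrowser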
windows
(download the file and rename it with a .exe extension; curl needs -L to follow the Hugging Face redirect)
curl -L -o Meta-Llama-3.1-8B.Q6_K.exe https://huggingface.co/Mozilla/Meta-Llama-3.1-8B-llamafile/resolve/main/Meta-Llama-3.1-8B.Q6_K.llamafile
.\Meta-Llama-3.1-8B.Q6_K.exe -p 'four score and seven'
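If this 8B download trips the Windows 4 GB .exe limit described in the notes below, one workaround sketch: rename the small release runner to .exe and pass the big model as external weights. The runner URL is the one from the gguf snippet; the local filenames here are just choices for this example:
curl -L -o llamafile.exe https://github.com/Mozilla-Ocho/llamafile/releases/download/0.8.13/llamafile-0.8.13
curl -L -o Meta-Llama-3.1-8B.Q6_K.llamafile https://huggingface.co/Mozilla/Meta-Llama-3.1-8B-llamafile/resolve/main/Meta-Llama-3.1-8B.Q6_K.llamafile
.\llamafile.exe -m Meta-Llama-3.1-8B.Q6_K.llamafile -p "four score and seven"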
gguf
wget https://github.com/Mozilla-Ocho/llamafile/releases/download/0.8.13/llamafile-0.8.13
wget https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q6_K.gguf
chmod +x llamafile-0.8.13
./llamafile-0.8.13 -m tinyllama-1.1b-chat-v1.0.Q6_K.gguf -p 'four score and'
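The standalone runner accepts the usual llama.cpp-style flags as well. For example, a GPU-offload sketch, assuming the -ngl (GPU layers) flag is available in this build; 999 just means offload as many layers as fit:
./llamafile-0.8.13 -m tinyllama-1.1b-chat-v1.0.Q6_K.gguf -ngl 999 -p 'four score and'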
notes
- The Windows example might be better switched to TinyLlama, in case the 8B model exceeds the Windows 4 GB .exe file size limit. It's also possible to get around the limit (similar to the gguf snippet) by renaming the release binary with .exe and passing the model as external weights:
.\llamafile-0.8.13.exe -m foo.llamafile
- They do multi-modal too, with e.g. llava image processing: https://huggingface.co/Mozilla/llava-v1.5-7b-llamafile (that's their flagship model). If you just run
./llava-v1.5-7b-q4.llamafile
it'll launch an HTTP server, open a tab in your desktop's browser, and you can chat with the model, upload an image file, ask it to analyze what it sees, etc. (a scripted example follows these notes)
- The binary has a different name for each release (e.g. llamafile-0.8.13)
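Once any llamafile is running in server mode, you can also script against it instead of using the browser tab. A sketch, assuming the default port 8080 and the OpenAI-compatible /v1/chat/completions route exposed by the embedded llama.cpp server (the "model" value is a placeholder; the single local model is served regardless):
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "local", "messages": [{"role": "user", "content": "four score and seven"}]}'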