
add llamafile 🦙📁 #871

@not-lain

Description

llamafile is a local app (similar to llama.cpp) that lets you distribute and run LLMs as a single file.

The library can be used with both .gguf and .llamafile files.

Repo: https://github.com/Mozilla-Ocho/llamafile
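
As an aside, llamafile weights hosted on the Hub can also be fetched with huggingface-cli instead of wget/curl; a minimal sketch, assuming huggingface_hub (which provides the CLI) is installed, using the same repo and filename as the snippets below:

huggingface-cli download Mozilla/Meta-Llama-3.1-8B-llamafile Meta-Llama-3.1-8B.Q6_K.llamafile --local-dir .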

snippets

linux and mac

wget https://huggingface.co/Mozilla/Meta-Llama-3.1-8B-llamafile/resolve/main/Meta-Llama-3.1-8B.Q6_K.llamafile   # download the single-file bundle
chmod +x Meta-Llama-3.1-8B.Q6_K.llamafile   # mark it executable
./Meta-Llama-3.1-8B.Q6_K.llamafile -p 'four score and seven'   # run a one-shot prompt
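
Running the same file without -p starts llamafile's built-in web server instead (see the second note below). A sketch of headless use, assuming the --server, --nobrowser, and --port flags from the underlying llama.cpp server (default behavior may differ across releases):

./Meta-Llama-3.1-8B.Q6_K.llamafile --server --nobrowser --port 8080   # serve the chat UI + API on localhost:8080 without opening a browser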

windows
(download the file, saving it with an .exe extension)

curl -L -o Meta-Llama-3.1-8B.Q6_K.exe https://huggingface.co/Mozilla/Meta-Llama-3.1-8B-llamafile/resolve/main/Meta-Llama-3.1-8B.Q6_K.llamafile   # -L follows the Hub's CDN redirect
.\Meta-Llama-3.1-8B.Q6_K.exe -p 'four score and seven'

gguf

wget https://github.com/Mozilla-Ocho/llamafile/releases/download/0.8.13/llamafile-0.8.13   # the standalone llamafile launcher
wget https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q6_K.gguf   # plain GGUF weights
chmod +x llamafile-0.8.13
./llamafile-0.8.13 -m tinyllama-1.1b-chat-v1.0.Q6_K.gguf -p 'four score and'   # load external weights with -m
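
When run as a server, llamafile also exposes an OpenAI-compatible API. A sketch, assuming the default port 8080 and the /v1/chat/completions endpoint documented in the llamafile README:

./llamafile-0.8.13 -m tinyllama-1.1b-chat-v1.0.Q6_K.gguf --server --nobrowser &   # start the server in the background
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "tinyllama", "messages": [{"role": "user", "content": "four score and seven"}]}'   # model name is informational for a local server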

notes

  • You might want to change the Windows example to TinyLlama, in case the 8B model exceeds the Windows 4 GB .exe file size limit. It's also possible to run .\llamafile-0.8.13 -m foo.llamafile to get around the limit, similar to the GGUF snippet (see the sketch after these notes).
  • They do multimodal too, e.g. LLaVA image processing: https://huggingface.co/Mozilla/llava-v1.5-7b-llamafile is their flagship model. If you just run ./llava-v1.5-7b-q4.llamafile, it launches an HTTP server, opens a tab in your desktop's browser, and you can chat with the model, upload an image file, ask it to analyze what it sees, etc.
  • The binary has a different name for each release of the library.
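
For reference, a sketch of the Windows workaround from the first note, reusing the 0.8.13 release binary as a launcher (the weights filename is the one from the Windows snippet above):

curl -L -o llamafile-0.8.13.exe https://github.com/Mozilla-Ocho/llamafile/releases/download/0.8.13/llamafile-0.8.13   # small launcher, well under the 4 GB limit
.\llamafile-0.8.13.exe -m Meta-Llama-3.1-8B.Q6_K.llamafile -p 'four score and seven'   # weights loaded externally, so the .exe size limit doesn't apply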
