oscartiz/rust-micro-llm

rust-micro-llm

A minimal CLI that runs Llama-3.2-1B-Instruct locally in Rust on Apple Silicon (Metal), built on top of candle and Hugging Face's GGUF tooling.

The whole program is one ~90-line main.rs: load the tokenizer + a 4-bit quantized GGUF (~1 GB), feed in a chat-formatted prompt, and stream tokens out until <|end_of_text|> or <|eot_id|>.
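As a rough illustration of what "chat-formatted prompt" means here, the sketch below wraps a single user message in the publicly documented Llama-3 chat template. The function name `format_llama3_prompt` is hypothetical, and the project's main.rs may build its prompt differently (for example, it may leave out `<|begin_of_text|>` if the tokenizer already adds it).

```rust
/// Wrap one user message in the Llama-3 chat template and leave the
/// assistant header open so the model continues with its reply.
/// Hypothetical helper: the name and exact layout are assumptions.
fn format_llama3_prompt(user_message: &str) -> String {
    format!(
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n{user_message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )
}
```

The model is expected to end its turn with `<|eot_id|>`, which is why that token serves as a stop condition.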

Usage

cargo run --release -- --prompt "Explain RoPE in three sentences." -n 200

Options:

-p, --prompt <PROMPT>     Prompt to generate from
-n, --sample-len <N>      Max tokens to generate (default: 200)
-t, --temperature <T>     Sampling temperature (default: 0.8)
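To make the flags and their defaults concrete, here is a minimal standard-library sketch of parsing them. It is a stand-in, not the project's parser (which may well use a crate such as clap). The `Args` struct, the `parse_args` function, and the choice to treat a missing `--prompt` as an error are all assumptions for illustration.

```rust
/// Parsed command-line options (hypothetical struct mirroring the flags above).
#[derive(Debug, PartialEq)]
struct Args {
    prompt: String,
    sample_len: usize,
    temperature: f64,
}

/// Parse `-p/--prompt`, `-n/--sample-len`, and `-t/--temperature` from an
/// iterator of arguments (without the program name).
fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<Args, String> {
    let mut prompt: Option<String> = None;
    let mut sample_len: usize = 200; // documented default
    let mut temperature: f64 = 0.8; // documented default

    let mut it = args.into_iter();
    while let Some(flag) = it.next() {
        // Every flag in this sketch takes exactly one value.
        let value = it
            .next()
            .ok_or_else(|| format!("missing value for {flag}"))?;
        match flag.as_str() {
            "-p" | "--prompt" => prompt = Some(value),
            "-n" | "--sample-len" => {
                sample_len = value
                    .parse::<usize>()
                    .map_err(|e| format!("invalid {flag}: {e}"))?;
            }
            "-t" | "--temperature" => {
                temperature = value
                    .parse::<f64>()
                    .map_err(|e| format!("invalid {flag}: {e}"))?;
            }
            other => return Err(format!("unknown flag: {other}")),
        }
    }

    // Assumption: the prompt is required because no default is documented.
    let prompt = prompt.ok_or_else(|| "missing --prompt".to_string())?;
    Ok(Args { prompt, sample_len, temperature })
}
```

In a real binary this would be called as `parse_args(std::env::args().skip(1))`.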

First run downloads two artifacts from the Hugging Face Hub:

  • unsloth/Llama-3.2-1B-Instruct: tokenizer.json
  • bartowski/Llama-3.2-1B-Instruct-GGUF: Llama-3.2-1B-Instruct-Q4_K_M.gguf

They're cached under the standard ~/.cache/huggingface directory.
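If you want to find or delete the cached files, the sketch below computes where a repository typically lands on disk. It assumes the usual Hugging Face Hub cache layout (a `hub` subdirectory with one `models--<org>--<name>` folder per repository, and the files themselves under `snapshots/<revision>/`) and ignores overrides such as the `HF_HOME` environment variable. The helper name `hub_repo_dir` is hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Build the expected cache folder for a Hub repository id such as
/// "unsloth/Llama-3.2-1B-Instruct", relative to the given home directory.
/// Assumes the default cache location and folder naming scheme.
fn hub_repo_dir(home: &Path, repo_id: &str) -> PathBuf {
    // "org/name" becomes "models--org--name" in the cache.
    let folder = format!("models--{}", repo_id.replace('/', "--"));
    home.join(".cache")
        .join("huggingface")
        .join("hub")
        .join(folder)
}
```

For example, `hub_repo_dir(Path::new("/Users/me"), "bartowski/Llama-3.2-1B-Instruct-GGUF")` points at the folder that would hold the GGUF file under this layout.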

How it works

prompt (formatted with Llama-3 chat tokens)
   │
   ▼
tokenizer.json  ──►  tokens (Vec<u32>)
                            │
                            ▼
        ModelWeights::from_gguf  (Q4_K_M, ~1 GB)
                            │
                            ▼
        For each new token:
          • slice the most recent `context_size` tokens
          • Tensor::new(...).unsqueeze(0)
          • model.forward(input, current_pos)  → logits
          • LogitsProcessor::sample(logits)    → next_token
          • print + push, advance current_pos
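The loop in the diagram can be sketched without candle by replacing the model and sampler with a closure. This is a simplified illustration under stated assumptions, not the project's code: `generate` and `next_token` are hypothetical names, and the rule that `context_size` is the full prompt on the first step and 1 afterwards is assumed from how KV-cached decoding usually works.

```rust
/// Llama-3 stop tokens, as listed in the README notes.
const STOP_TOKENS: [u32; 2] = [128001, 128009];

/// Run the decode loop. `next_token(input, current_pos)` stands in for
/// "model.forward + LogitsProcessor::sample" and returns the next token id.
fn generate<F>(prompt_tokens: &[u32], sample_len: usize, mut next_token: F) -> Vec<u32>
where
    F: FnMut(&[u32], usize) -> u32,
{
    let mut tokens = prompt_tokens.to_vec();
    let mut generated = Vec::new();
    let mut current_pos = 0;

    for step in 0..sample_len {
        // Assumption: feed the whole prompt once, then only the newest token,
        // relying on the model's internal KV cache for earlier positions.
        let context_size = if step == 0 { tokens.len() } else { 1 };
        let start = tokens.len().saturating_sub(context_size);
        let input = &tokens[start..];

        let next = next_token(input, current_pos);
        // Advance by the number of positions just fed to the model.
        current_pos += input.len();

        if STOP_TOKENS.contains(&next) {
            break;
        }
        tokens.push(next);
        generated.push(next);
    }
    generated
}
```

In the real program the closure body would build a tensor from `input`, call `model.forward(input, current_pos)`, and sample from the resulting logits.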

Inference runs on Metal when available (Device::new_metal(0)) and falls back to CPU otherwise.

Stack

Notes

  • Generation uses a simple temperature-based sampler (no top-k / top-p / repetition penalty).
  • The KV cache is implicit in model.forward(..., current_pos): once the prompt has been consumed, the cached context grows by one position per generated token.
  • Stop tokens are hard-coded for Llama-3: 128001 (<|end_of_text|>) and 128009 (<|eot_id|>).
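For readers unfamiliar with temperature sampling, the sketch below shows the general technique in plain Rust: divide the logits by the temperature, apply a softmax, and pick an index by walking the cumulative distribution. It illustrates the idea only and is not candle's LogitsProcessor. The function names are hypothetical, and the uniform random number `u` in [0, 1) is passed in by the caller so the example needs no RNG crate.

```rust
/// Convert logits to probabilities after scaling by `temperature` (> 0).
/// Lower temperatures sharpen the distribution; higher ones flatten it.
fn softmax_with_temperature(logits: &[f32], temperature: f32) -> Vec<f32> {
    // Subtract the maximum logit for numerical stability.
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits
        .iter()
        .map(|&l| ((l - max) / temperature).exp())
        .collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Pick the first index whose cumulative probability exceeds `u`,
/// where `u` is a uniform random number in [0, 1).
fn sample_index(probs: &[f32], u: f32) -> usize {
    let mut cumulative = 0.0;
    for (i, &p) in probs.iter().enumerate() {
        cumulative += p;
        if u < cumulative {
            return i;
        }
    }
    // Guard against rounding error leaving `u` just above the final sum.
    probs.len().saturating_sub(1)
}
```

Because nothing here truncates the distribution, every token with non-zero probability can be drawn, which is the practical difference from top-k or top-p sampling.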
