

# native-llm

The easiest way to run AI models locally.


Quick Start • Why native-llm • Models • Documentation


## 🚀 Quick Start

```bash
npm install native-llm
```

```typescript
import { LLMEngine } from "native-llm"

const engine = new LLMEngine({ model: "gemma" })

const result = await engine.generate({
  prompt: "Explain quantum computing to a 5-year-old"
})

console.log(result.text)
```

That's it. Model downloads automatically. GPU detected automatically. Just works.


## 🎯 Why native-llm?

A friendly wrapper around llama.cpp that handles the hard parts:

| Without native-llm | With native-llm |
| --- | --- |
| Find GGUF model URLs | `model: "gemma"` |
| Configure HuggingFace auth | Auto from `HF_TOKEN` |
| 20+ lines of boilerplate | 3 lines |
| Research model benchmarks | Curated recommendations |

### Local vs Cloud

| | ☁️ Cloud AI | 🏠 native-llm |
| --- | --- | --- |
| Cost | $0.001 – $0.10 per query | Free forever |
| Speed | 1–20 seconds | < 100 ms |
| Privacy | Data sent to servers | 100% local |
| Limits | Rate limits & quotas | Unlimited |
| Offline | ❌ Requires internet | ✅ Works offline |

## 🎨 Models

### Simple Aliases

```typescript
new LLMEngine({ model: "gemma" })      // Best balance (default)
new LLMEngine({ model: "gemma-fast" }) // Maximum speed
new LLMEngine({ model: "qwen-coder" }) // Code generation
new LLMEngine({ model: "deepseek" })   // Complex reasoning
```

### Smart Recommendations

```typescript
import { LLMEngine } from "native-llm"

// Get the right model for your use case
const codeModel = LLMEngine.getModelForUseCase("code")       // → qwen-2.5-coder-7b
const fastModel = LLMEngine.getModelForUseCase("fast")       // → gemma-3n-e2b
const qualityModel = LLMEngine.getModelForUseCase("quality") // → gemma-3-27b

// List all available models
const models = LLMEngine.listModels()
// → [{ id: "gemma-3n-e4b", name: "Gemma 3n E4B", size: "5 GB", ... }, ...]
```
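For illustration, a lookup like `getModelForUseCase` boils down to a plain use-case → model-id mapping. The sketch below is a hypothetical, self-contained reimplementation mirroring the recommendations listed above — it is not native-llm's actual source:

```typescript
// Hypothetical sketch of a use-case → model lookup, mirroring the
// recommendations above; native-llm's real implementation may differ.
type UseCase = "code" | "fast" | "quality"

const USE_CASE_MODELS: Record<UseCase, string> = {
  code: "qwen-2.5-coder-7b",
  fast: "gemma-3n-e2b",
  quality: "gemma-3-27b",
}

function getModelForUseCase(useCase: UseCase): string {
  return USE_CASE_MODELS[useCase]
}

console.log(getModelForUseCase("code")) // → qwen-2.5-coder-7b
```

Keeping the mapping in a typed record means an unknown use case is a compile-time error rather than a runtime surprise.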

### Performance (M1 Ultra)

| Model | Size | Speed | Best For |
| --- | --- | --- | --- |
| 🚀 Gemma 3n E2B | 3 GB | 36 tok/s | Maximum speed |
| ⭐ Gemma 3n E4B | 5 GB | 18 tok/s | Best balance |
| 💻 Qwen 2.5 Coder | 5 GB | 23 tok/s | Code generation |
| 🧠 DeepSeek R1 | 5 GB | 9 tok/s | Complex reasoning |
| 👑 Gemma 3 27B | 18 GB | 5 tok/s | Maximum quality |

## ✨ Features

| Feature | Description |
| --- | --- |
| 📦 Zero Config | Models download automatically, GPU detected automatically |
| 🎯 Smart Defaults | Curated models, sensible parameters, thinking-mode handled |
| 🔥 Native Speed | Direct llama.cpp bindings — no Python, no subprocess |
| 🍎 Metal GPU | Full Apple Silicon acceleration out of the box |
| 🖥️ Cross-Platform | macOS, Linux, Windows with CUDA support |
| 🌊 Streaming | Real-time token-by-token output |
| 📝 TypeScript | Full type definitions included |

## 🔑 Setup for Gemma Models

Gemma models require a free HuggingFace token:

```bash
export HF_TOKEN="hf_your_token_here"
```

Get yours in 30 seconds: huggingface.co/settings/tokens
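Picking the token up "auto from `HF_TOKEN`" amounts to reading the environment at startup. A minimal sketch of that idea — the `resolveHfToken` helper is illustrative, not native-llm's API:

```typescript
// Illustrative helper: read a HuggingFace token from the environment,
// as native-llm does with HF_TOKEN. Returns undefined when unset or empty.
function resolveHfToken(
  env: Record<string, string | undefined> = process.env
): string | undefined {
  const token = env.HF_TOKEN
  return token && token.length > 0 ? token : undefined
}

console.log(resolveHfToken({ HF_TOKEN: "hf_example" })) // → hf_example
```

Treating an empty string the same as an unset variable avoids sending a blank Authorization header to HuggingFace.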


## 📚 Documentation

→ Full Documentation — Streaming, chat API, custom models, and more.

MIT License · Made with ❤️ by Sebastian Software
Powered by llama.cpp & node-llama-cpp
