
Getting Started

Install

bash
npm install @agent-layer-zero/dendrite

Building with Vue or React? See Framework Packages for prebuilt chat widgets via @agent-layer-zero/dendrite-vue and @agent-layer-zero/dendrite-react.

Basic Usage

Two lines to set up, one line to chat. No worker files, no extra imports.

ts
import { createNeuron } from '@agent-layer-zero/dendrite'

const neuron = createNeuron({
  modelId: 'gemma-2-2b-it-q4f16_1-MLC',
  systemPrompt: 'You are a helpful cooking assistant.',
  onProgress: (percent, text) => {
    console.log(`Loading: ${percent}% — ${text}`)
  },
})

// Stream tokens
for await (const token of neuron.send('How do I make pasta?')) {
  process.stdout.write(token)
}

// Or get the full response
const reply = await neuron.complete('What about carbonara?')
console.log(reply)

What Happens

  1. createNeuron() starts downloading the model immediately (~0.5–5 GB, cached after first download)
  2. onProgress fires with download percentage (wired to a progress bar in the sketch after this list)
  3. Once loaded, call send() or complete() to chat
  4. Tokens stream back as an async generator
  5. Conversation history is tracked automatically
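
Putting those steps together in a browser page, here is a minimal sketch. The element IDs (load-progress, chat-log) and the page markup are hypothetical; the only Dendrite API used is what the Basic Usage example above already shows (createNeuron, onProgress, send).

ts
import { createNeuron } from '@agent-layer-zero/dendrite'

// Hypothetical page elements; adjust to your own markup.
const progressEl = document.querySelector<HTMLProgressElement>('#load-progress')!
const logEl = document.querySelector<HTMLDivElement>('#chat-log')!
progressEl.max = 100 // assuming percent is 0–100, as in the example above

const neuron = createNeuron({
  modelId: 'gemma-2-2b-it-q4f16_1-MLC',
  systemPrompt: 'You are a helpful cooking assistant.',
  // Step 2: drive a progress bar while the model downloads.
  onProgress: (percent, text) => {
    progressEl.value = percent
    progressEl.title = text
  },
})

// Steps 3–4: stream tokens into the page as they arrive.
async function ask(question: string) {
  const line = document.createElement('p')
  logEl.appendChild(line)
  let answer = ''
  for await (const token of neuron.send(question)) {
    answer += token
    line.textContent = answer
  }
}

// Step 5: history is tracked automatically, so follow-ups just work.
await ask('How do I make pasta?')
await ask('What about carbonara?')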

Available Models

A curated subset of WebLLM models, grouped by tier. Each model has both q4f16_1 (default; smaller, faster) and q4f32_1 (higher precision for GPUs without strong f16 support) variants.

| Model | VRAM | Tier |
| --- | --- | --- |
| Qwen3 0.6B | ~0.5 GB | Tiny |
| SmolLM2 360M | ~0.3 GB | Tiny |
| Llama 3.2 1B | ~0.9 GB | Tiny |
| Qwen3 1.7B | ~1.5 GB | Light |
| SmolLM2 1.7B | ~1.5 GB | Light |
| Gemma 2 2B (default) | ~1.7 GB | Light |
| Qwen3 4B | ~3.2 GB | Standard |
| Llama 3.2 3B | ~2.3 GB | Standard |
| Phi 3.5 Mini | ~2.3 GB | Standard |
| Qwen3 8B | ~5.5 GB | Heavy |
| Llama 3.1 8B | ~5.0 GB | Heavy |
| Gemma 2 9B | ~6.5 GB | Heavy |
| DeepSeek R1 1.5B / 7B / 8B | 1.4–5.0 GB | Reasoning |

The default, Gemma 2 2B IT, offers the best balance of size, instruction following, and reproduction of structured data (contact info, lists). Smaller models load faster but follow instructions less precisely. Reasoning-tier models emit chain-of-thought tokens; pick from Tiny/Light/Standard/Heavy for typical chat use.

Full programmatic list: import { MODEL_OPTIONS } from '@agent-layer-zero/dendrite'.
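
To use a different model or the higher-precision variant, pass its id as modelId. A small sketch follows; the q4f32_1 id below is an assumption derived by swapping the variant suffix on the documented default id, so verify the exact strings against MODEL_OPTIONS.

ts
import { createNeuron, MODEL_OPTIONS } from '@agent-layer-zero/dendrite'

// Inspect the curated list to find the exact model ids and tiers.
console.log(MODEL_OPTIONS)

// Assumed id: the default Gemma 2 2B id with the q4f32_1 variant suffix,
// for GPUs without strong f16 support. Check MODEL_OPTIONS before relying on it.
const neuron = createNeuron({
  modelId: 'gemma-2-2b-it-q4f32_1-MLC',
  systemPrompt: 'You are a helpful cooking assistant.',
})

const reply = await neuron.complete('Suggest a weeknight pasta dish.')
console.log(reply)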

Performance: Web Worker (Optional)

By default Dendrite runs on the main thread, which works well and requires zero setup.

For better responsiveness, you can optionally pass a Web Worker to move model computation onto a separate thread. See the Worker Setup guide.
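
The actual wiring is documented in the Worker Setup guide; as an illustration only, assuming a worker option on createNeuron and a worker file of your own, the shape might look roughly like this.

ts
import { createNeuron } from '@agent-layer-zero/dendrite'

// Hypothetical: './llm-worker.ts' is your worker file (see the Worker Setup
// guide for what it must contain).
const worker = new Worker(new URL('./llm-worker.ts', import.meta.url), {
  type: 'module',
})

const neuron = createNeuron({
  modelId: 'gemma-2-2b-it-q4f16_1-MLC',
  worker, // assumed option name; the guide has the real one
})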
