
Getting Started

Install

bash
npm install @agent-layer-zero/dendrite

Building with Vue or React? See Framework Packages for prebuilt chat widgets via @agent-layer-zero/dendrite-vue and @agent-layer-zero/dendrite-react.

Basic Usage

Two lines to set up, one line to chat. No worker files, no extra imports.

ts
import { createNeuron } from '@agent-layer-zero/dendrite'

const neuron = createNeuron({
  modelId: 'gemma-2-2b-it-q4f16_1-MLC',
  systemPrompt: 'You are a helpful cooking assistant.',
  onProgress: (percent, text) => {
    console.log(`Loading: ${percent}% — ${text}`)
  },
})

// Stream tokens
for await (const token of neuron.send('How do I make pasta?')) {
  process.stdout.write(token)
}

// Or get the full response
const reply = await neuron.complete('What about carbonara?')
console.log(reply)

What Happens

  1. createNeuron() starts downloading the model immediately (~0.5–5 GB, cached after first download)
  2. onProgress fires with download percentage (wired to a progress bar in the sketch after this list)
  3. Once loaded, call send() or complete() to chat
  4. Tokens stream back as an async generator
  5. Conversation history is tracked automatically
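
Putting those steps together in a browser page, here is a minimal sketch. The element IDs (load-progress, chat-log) and the page markup are hypothetical; the only Dendrite API used is what the Basic Usage example above already shows (createNeuron, onProgress, send).

ts
import { createNeuron } from '@agent-layer-zero/dendrite'

// Hypothetical page elements; adjust to your own markup.
const progressEl = document.querySelector<HTMLProgressElement>('#load-progress')!
const logEl = document.querySelector<HTMLDivElement>('#chat-log')!
progressEl.max = 100 // assuming percent is 0–100, as in the example above

const neuron = createNeuron({
  modelId: 'gemma-2-2b-it-q4f16_1-MLC',
  systemPrompt: 'You are a helpful cooking assistant.',
  // Step 2: drive a progress bar while the model downloads.
  onProgress: (percent, text) => {
    progressEl.value = percent
    progressEl.title = text
  },
})

// Steps 3–4: stream tokens into the page as they arrive.
async function ask(question: string) {
  const line = document.createElement('p')
  logEl.appendChild(line)
  let answer = ''
  for await (const token of neuron.send(question)) {
    answer += token
    line.textContent = answer
  }
}

// Step 5: history is tracked automatically, so follow-ups just work.
await ask('How do I make pasta?')
await ask('What about carbonara?')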

Available Models

A curated subset of WebLLM models, grouped by tier. Each model has both q4f16_1 (default; smaller, faster) and q4f32_1 (higher precision for GPUs without strong f16 support) variants.

| Model | VRAM | Tier |
| --- | --- | --- |
| Qwen3 0.6B | ~0.5 GB | Tiny |
| SmolLM2 360M | ~0.3 GB | Tiny |
| Llama 3.2 1B | ~0.9 GB | Tiny |
| Qwen3 1.7B | ~1.5 GB | Light |
| SmolLM2 1.7B | ~1.5 GB | Light |
| Gemma 2 2B (default) | ~1.7 GB | Light |
| Qwen3 4B | ~3.2 GB | Standard |
| Llama 3.2 3B | ~2.3 GB | Standard |
| Phi 3.5 Mini | ~2.3 GB | Standard |
| Qwen3 8B | ~5.5 GB | Heavy |
| Llama 3.1 8B | ~5.0 GB | Heavy |
| Gemma 2 9B | ~6.5 GB | Heavy |
| DeepSeek R1 1.5B / 7B / 8B | 1.4–5.0 GB | Reasoning |

The default, Gemma 2 2B IT, offers the best balance of size, instruction following, and reproduction of structured data (contact info, lists). Smaller models load faster but follow instructions less precisely. Reasoning-tier models emit chain-of-thought tokens; pick from Tiny/Light/Standard/Heavy for typical chat use.

Full programmatic list: import { MODEL_OPTIONS } from '@agent-layer-zero/dendrite'.
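
To use a different model or the higher-precision variant, pass its id as modelId. A small sketch follows; the q4f32_1 id below is an assumption derived by swapping the variant suffix on the documented default id, so verify the exact strings against MODEL_OPTIONS.

ts
import { createNeuron, MODEL_OPTIONS } from '@agent-layer-zero/dendrite'

// Inspect the curated list to find the exact model ids and tiers.
console.log(MODEL_OPTIONS)

// Assumed id: the default Gemma 2 2B id with the q4f32_1 variant suffix,
// for GPUs without strong f16 support. Check MODEL_OPTIONS before relying on it.
const neuron = createNeuron({
  modelId: 'gemma-2-2b-it-q4f32_1-MLC',
  systemPrompt: 'You are a helpful cooking assistant.',
})

const reply = await neuron.complete('Suggest a weeknight pasta dish.')
console.log(reply)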

Performance: Web Worker (Optional)

By default Dendrite runs on the main thread, which works well and requires zero setup.

For better responsiveness, you can optionally pass a Web Worker to move model computation onto a separate thread. See the Worker Setup guide.
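
The actual wiring is documented in the Worker Setup guide; as an illustration only, assuming a worker option on createNeuron and a worker file of your own, the shape might look roughly like this.

ts
import { createNeuron } from '@agent-layer-zero/dendrite'

// Hypothetical: './llm-worker.ts' is your worker file (see the Worker Setup
// guide for what it must contain).
const worker = new Worker(new URL('./llm-worker.ts', import.meta.url), {
  type: 'module',
})

const neuron = createNeuron({
  modelId: 'gemma-2-2b-it-q4f16_1-MLC',
  worker, // assumed option name; the guide has the real one
})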
