Discord Bot · Local AI · Persistent Memory
PROJECT MAKI
A local AI chatbot with persistent memory,
character depth, and a Discord home.
About the Project
What Is Maki?
Maki started as a learning project for local LLM inference via Ollama. What began as curiosity about running models without cloud dependencies evolved into a character-driven Discord bot with persistent per-user memory, a personality that genuinely develops over time, and two fully independent personas.
Core Systems
How It Works
Memory System
Every user gets their own JSON file. After each reply, two background LLM passes extract structured facts: one for what the user revealed, one for what Maki disclosed about herself. Facts carry one of three weight tiers: core facts persist indefinitely, recent facts have a 30-day TTL, and facts past their TTL are marked stale so they no longer clutter the context.
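The tier rules above could be applied with a small pruning step before each prompt is built. This is a minimal sketch, not the project's actual code: the helper names (makeRecentFact, expireFacts, factsForContext) are hypothetical, and stamping expiredAt with the date of the check is an assumption. Field names follow the sample user file.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;
const RECENT_TTL_DAYS = 30; // from the text: recent facts have a 30-day TTL

// Create a "recent" fact whose ttl falls 30 days after `now` (YYYY-MM-DD).
function makeRecentFact(text, now = new Date()) {
  const ttl = new Date(now.getTime() + RECENT_TTL_DAYS * DAY_MS);
  return { fact: text, weight: "recent", ttl: ttl.toISOString().slice(0, 10) };
}

// Demote recent facts whose ttl has passed to "stale"; core facts are untouched.
// Assumption: expiredAt records the day the expiry was noticed.
function expireFacts(facts, now = new Date()) {
  const today = now.toISOString().slice(0, 10);
  return facts.map(f => {
    // ISO dates compare correctly as plain strings
    if (f.weight === "recent" && f.ttl && f.ttl < today) {
      return { fact: f.fact, weight: "stale", expiredAt: today };
    }
    return f;
  });
}

// Only core facts and still-valid recent facts reach the prompt context.
function factsForContext(facts, now = new Date()) {
  return expireFacts(facts, now).filter(f => f.weight !== "stale");
}
```

Calling factsForContext(userFacts) just before building the system prompt would keep stale entries on disk while leaving them out of the model's context. A stored user file has the following shape.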
{
  "userId": "182734...",
  "familiarity": 23,
  "lastSeen": "2025-11-14T22:47:00Z",
  "history": [ /* rolling 20-turn window */ ],
  "userFacts": [
    {
      "fact": "Works as a software engineer",
      "weight": "core",
      "addedAt": "2025-10-02"
    },
    {
      "fact": "Recently moved apartments",
      "weight": "recent",
      "ttl": "2025-12-14"
    },
    {
      "fact": "Mentioned a bad day at work",
      "weight": "stale",
      "expiredAt": "2025-10-18"
    }
  ],
  "selfFacts": [
    {
      "fact": "Mentioned disliking crowded spaces",
      "weight": "core"
    }
  ]
}
Familiarity System
A numeric score tracks relationship depth per user. Each exchange adds +1; discovering new personal facts adds +2. The score drives how Maki behaves across five tiers — from guarded and minimal at zero to genuinely open at 50+. Scores survive restarts and can be manually adjusted per user.
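The scoring rule can be sketched as below. Two things here are assumptions rather than details from the project: the +2 is treated as a single per-exchange bonus on top of the +1, and the three middle tier boundaries and all tier names are hypothetical, since the text only describes the two ends of the scale (guarded at zero, open at 50+).

```javascript
// Hypothetical tier table, highest first. Only the 0 and 50 boundaries
// come from the text; 5, 15 and 30 are placeholders.
const TIERS = [
  { min: 50, name: "open" },
  { min: 30, name: "comfortable" },
  { min: 15, name: "warming" },
  { min: 5, name: "cautious" },
  { min: 0, name: "guarded" },
];

// +1 for every exchange, plus +2 when the extraction pass found new facts.
function updateFamiliarity(score, newFactCount) {
  return score + 1 + (newFactCount > 0 ? 2 : 0);
}

// Map a score to its tier; negative scores (manual edits) count as zero.
function tierFor(score) {
  const clamped = Math.max(0, score);
  return TIERS.find(t => clamped >= t.min).name;
}
```

With this table, the sample user at familiarity 23 would sit in the middle tier, and the tier name could select which behavior notes are added to the system prompt.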
Time Context
Every system prompt receives the current time of day and how long it has been since the user last spoke. Maki's tone shifts across six mood windows — more guarded before 6am, more honest after 10pm. Time-since-last-seen is injected in natural language: “it's been about three weeks.”
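One way to produce the natural-language gap is to pick the largest unit that fits and round to a whole number. This is a sketch under assumptions: describeGap and timeContextLine are hypothetical names, months are approximated as 30 days, and the mood windows are left out because the text gives only two of their boundaries.

```javascript
const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

// Largest unit first; month and year lengths are approximations.
const UNITS = [
  ["year", 365 * DAY_MS],
  ["month", 30 * DAY_MS],
  ["week", 7 * DAY_MS],
  ["day", DAY_MS],
  ["hour", HOUR_MS],
];

const NUMBER_WORDS = ["zero", "one", "two", "three", "four", "five", "six",
  "seven", "eight", "nine", "ten", "eleven", "twelve"];

// Turn the time since lastSeen into a phrase such as
// "it's been about three weeks".
function describeGap(lastSeen, now = new Date()) {
  const elapsed = now.getTime() - new Date(lastSeen).getTime();
  for (const [unit, size] of UNITS) {
    if (elapsed >= size) {
      const n = Math.round(elapsed / size);
      const count = n < NUMBER_WORDS.length ? NUMBER_WORDS[n] : String(n);
      return `it's been about ${count} ${unit}${n === 1 ? "" : "s"}`;
    }
  }
  return "it's only been a little while";
}

// Build the line that would be appended to the system prompt.
function timeContextLine(lastSeen, now = new Date()) {
  const clock = now.toISOString().slice(11, 16); // HH:MM in UTC
  return `Current time: ${clock} UTC. Since the user last spoke, ${describeGap(lastSeen, now)}.`;
}
```

The lastSeen argument is the ISO timestamp stored in the user's JSON file, so the phrase can be recomputed on every message without extra state.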
Loop Detection & Self-Correction
detectLoop() scans the last four assistant turns and flags a loop when the newest reply is more than 80% similar to an earlier one. When a loop is detected, correctLoop() fires a second LLM pass that appends an internal instruction asking for a varied reply. The user never sees the repeated reply; only the corrected response is committed to history.
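detectLoop() relies on a similarity() helper whose implementation is not shown. One possible version, assuming a word-level Jaccard index (shared words divided by total distinct words), is sketched here; the project may well use a different measure, and tokenize() is a hypothetical name.

```javascript
// Lowercase the text and collect its distinct words (letters, digits, apostrophes).
function tokenize(text) {
  return new Set(text.toLowerCase().match(/[\p{L}\p{N}']+/gu) ?? []);
}

// Jaccard similarity between two strings, from 0 (no shared words) to 1 (same word set).
function similarity(a, b) {
  const setA = tokenize(a);
  const setB = tokenize(b);
  if (setA.size === 0 && setB.size === 0) return 1; // two empty replies are identical
  let shared = 0;
  for (const word of setA) {
    if (setB.has(word)) shared++;
  }
  return shared / (setA.size + setB.size - shared);
}
```

Jaccard ignores word order and repetition, which suits loop detection: a reply that reshuffles the same words still scores high against the 0.8 threshold.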
// Assumes the official "ollama" npm client.
// similarity(a, b) is expected to return a score between 0 and 1.
import ollama from "ollama";

// Detect near-identical recent replies (>80% similarity)
function detectLoop(history, threshold = 0.8) {
  // Keep only the last four assistant turns
  const recent = history
    .filter(m => m.role === "assistant")
    .slice(-4);
  // Compare each earlier reply against the newest one
  for (let i = 0; i < recent.length - 1; i++) {
    const sim = similarity(
      recent[i].content,
      recent[recent.length - 1].content
    );
    if (sim > threshold) return true;
  }
  return false;
}

// Fire a correction pass; the user never sees the looped reply
async function correctLoop(history, persona) {
  const result = await ollama.chat({
    ...persona.options,
    messages: [
      ...history,
      {
        // Internal nudge appended for this pass only
        role: "user",
        content: "[INTERNAL] Your last reply was too similar " +
          "to a recent one. Please vary your response."
      }
    ]
  });
  return result.message.content;
}
Dual Persona Architecture
Maki and Yuki are completely independent. Separate memory directories, separate fact stores, separate familiarity scores. Each persona lives in personalities/ with its own persona.js config. Per-channel routing via environment variables lets each Discord channel be assigned a different character.
Maki
- 35, Tokyo → Seattle
- Infrastructure engineer
- Reserved, dry humor
- Warms up slowly
maki/memory/
Yuki
- 26, Osaka → Tokyo
- Graphic designer
- Openly excited
- Warm from message one
yuki/memory/
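The per-channel routing described above could be sketched as a lookup table built from environment variables. Everything named here is an assumption: the variable names MAKI_CHANNEL_IDS and YUKI_CHANNEL_IDS, the helper names, the fallback persona, and the exact directory layout under personalities/.

```javascript
// Build a channelId -> persona map from comma-separated channel IDs.
// Hypothetical variables: MAKI_CHANNEL_IDS, YUKI_CHANNEL_IDS.
function buildChannelRoutes(env) {
  const routes = new Map();
  for (const persona of ["maki", "yuki"]) {
    const raw = env[`${persona.toUpperCase()}_CHANNEL_IDS`] ?? "";
    for (const id of raw.split(",").map(s => s.trim()).filter(Boolean)) {
      routes.set(id, persona);
    }
  }
  return routes;
}

// Resolve which persona answers in a channel, with its own config and memory paths.
// Assumed layout: personalities/<name>/persona.js and <name>/memory/.
function resolvePersona(routes, channelId, fallback = "maki") {
  const name = routes.get(channelId) ?? fallback;
  return {
    name,
    configPath: `personalities/${name}/persona.js`,
    memoryDir: `${name}/memory/`,
  };
}
```

In a bot, buildChannelRoutes(process.env) would run once at startup, and resolvePersona(routes, message.channelId) would run per message. Because each persona resolves to its own memory directory, facts and familiarity scores never cross between Maki and Yuki.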
Development History
Dev Timeline
Nine phases from a Mistral 7B experiment to a production-grade character bot.
Project Kickoff
- discord.js v14 with Mistral 7B as the initial LLM
- Rolling conversation history with username injection into every prompt
- Basic slash commands: /clear, /reset, /status
Personality Build
- System prompt design with detailed speech style guide
- System prompt injection fix — persona now active on every message
- Initial character lore: Tokyo background, reserved demeanor
Model Problems
- Mistral 7B repeated itself and looped on long contexts
- Hallucinated user opinions and invented facts without prompting
- Researched candidate models: LLaMA 3, Qwen3, Phi-3
Model Upgrade
- Migrated to Qwen3 8B — significant quality improvement immediately
- Discovered think: mode for internal chain-of-thought reasoning
- Fixed non-Latin script bleed affecting Japanese text output
Memory System
- Per-user JSON files: rolling history, extracted facts, lastSeen, score
- Two background LLM extraction passes per reply — user facts + self-facts
- Fact weight tiers: core (stable), recent (30-day TTL), stale (expired)
Familiarity System
- Numeric score: +1 per exchange, +2 when new personal facts found
- Five relationship tiers that change how the persona behaves
- Scores persist across sessions and are manually editable
Loop Detection
- detectLoop() scans recent assistant turns for >80% similarity
- correctLoop() fires a second LLM pass — user never sees the repeated reply
- Added repeat_penalty parameter tuning to Ollama config
Character Depth
- Full backstory: brother Naota, Seattle transplant, infrastructure career
- Fixed continuity bug where Maki contradicted earlier self-disclosures
- Addressed prose flattening — response length variance improved
Gemma 4 E4B + Yuki
- Edge-optimised Gemma 4 E4B with internal reasoning layer
- Built Yuki as a fully independent persona with separate memory directory
- UI model switcher + per-channel persona routing via environment variables
Conversation Samples
Screenshots
Real conversation samples coming soon.