Token Sense

TokenSense: the engine layer

A. Processes (4 small scripts, plus one optional)

  1. ingest.py (ETL)
  • Pull raw sources → split → embed → write content.*
  • Extract forms/instances → update token.*
  • Optional edge seeding from citations/links → lat.edges(rel='refers_to'|'quotes')
  2. lattice_maint.py (nightly)
  • Decay + reinforcement (co-activations) → lat.edges.weight
  • Refresh lat.neighbors (kNN) by space
  • Re/cluster into lat.cells, update lat.memberships
  • Maintain spiral_angle, radial_distance, radial_index
  • Log to lat.topology_events
  3. signals.py (near-real-time)
  • On user events, append lat.activations(kind,node_id,strength,phase)
  • (Optional) bump edges along the path just traversed
  • Lightweight; can run as a small web worker or queue consumer
  4. housekeeping.py (weekly)
  • Vacuum/analyze hot tables, rotate partitions (if enabled)
  • Prune very low-weight / stale edges
  • Validate constraints (no orphan kinds/ids—triggers already help)
  5. (optional) train_adapters.py
  • Prepares training corpora, runs LoRA fine-tunes, registers new adapters (see D)

Execution cadences: signals.py (continuous), ingest.py (on demand), lattice_maint.py (hourly or nightly), housekeeping.py (weekly).


B. Minimal configs each script reads

  • DB DSN; embedding model name; LLM model name; top-k for neighbors; decay/alpha for reinforcement.
  • Read constants from lat.config (Φ, k, weights for S, etc.)—we already added that table.
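
As a rough sketch of how each script could pick up these settings, here is a small loader that reads them from environment-style key/value pairs. The variable names (TOKENSENSE_DSN and friends) and the 0.05 default for alpha are assumptions made for illustration; reading the constants from lat.config is left out.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ScriptConfig:
    """Settings shared by the scripts; every field name here is hypothetical."""
    db_dsn: str
    embed_model: str
    llm_model: str
    neighbors_k: int
    decay: float
    alpha: float


def load_config(env=None):
    """Build a ScriptConfig from environment-style key/value pairs."""
    env = os.environ if env is None else env
    return ScriptConfig(
        db_dsn=env["TOKENSENSE_DSN"],  # required; raises KeyError if missing
        embed_model=env.get("TOKENSENSE_EMBED_MODEL", "BAAI/bge-m3"),
        llm_model=env.get("TOKENSENSE_LLM_MODEL", "Qwen2.5-7B-Instruct"),
        neighbors_k=int(env.get("TOKENSENSE_NEIGHBORS_K", "15")),
        decay=float(env.get("TOKENSENSE_DECAY", "0.98")),
        alpha=float(env.get("TOKENSENSE_ALPHA", "0.05")),  # assumed default
    )
```

Passing a plain dict instead of os.environ keeps the loader easy to exercise in tests, and a .env loader can feed the same function.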

C. A few small schema additions (for models/adapters)

If you want to track which model made which vectors/answers and manage LoRA adapters, add:

-- Registry of base models (LLMs & embedders)
CREATE TABLE IF NOT EXISTS lat.model_registry (
  model_id   BIGSERIAL PRIMARY KEY,
  name       TEXT UNIQUE NOT NULL,   -- e.g., 'Qwen2.5-7B-Instruct', 'bge-m3', 'arctic-embed-l-v2'
  kind       TEXT NOT NULL CHECK (kind IN ('llm','embedder')),
  version    TEXT,
  context_len INT,
  meta       JSONB DEFAULT '{}'::jsonb,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- LoRA adapters tied to a base model
CREATE TABLE IF NOT EXISTS lat.lora_adapters (
  adapter_id BIGSERIAL PRIMARY KEY,
  base_model_id BIGINT NOT NULL REFERENCES lat.model_registry(model_id) ON DELETE CASCADE,
  name       TEXT NOT NULL,           -- e.g., 'ogs-sense-qa-v1'
  r          INT  NOT NULL,           -- rank
  alpha      INT  NOT NULL,
  target_modules TEXT[] NOT NULL,     -- e.g., '{q_proj,k_proj,v_proj,o_proj}'
  artifact_uri TEXT NOT NULL,         -- path to safetensors/peft dir
  metrics    JSONB DEFAULT '{}'::jsonb,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  UNIQUE (base_model_id, name)
);

-- Where each vector came from (so you can re-embed later)
-- IF EXISTS skips the statement when the table is absent
-- (ALTER TABLE has no IF NOT EXISTS form for the table itself)
ALTER TABLE IF EXISTS content.chunks
  ADD COLUMN IF NOT EXISTS embed_model_id BIGINT REFERENCES lat.model_registry(model_id);

ALTER TABLE IF EXISTS token.senses
  ADD COLUMN IF NOT EXISTS embed_model_id BIGINT REFERENCES lat.model_registry(model_id);

ALTER TABLE IF EXISTS token.instances
  ADD COLUMN IF NOT EXISTS embed_model_id BIGINT REFERENCES lat.model_registry(model_id);

ALTER TABLE IF EXISTS cog.turns
  ADD COLUMN IF NOT EXISTS embed_model_id BIGINT REFERENCES lat.model_registry(model_id);

That’s enough to:

  • swap embedders cleanly,
  • track which LoRA you used for a run,
  • and re-index only what needs re-embedding.
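
To make the last point concrete, here is a minimal sketch of how a script might decide what needs re-embedding once embed_model_id is populated. The chunk_id column name in the query string is an assumption (the real primary key may be named differently), and the helper works on already-fetched rows rather than a live connection.

```python
# Hypothetical query; chunk_id is an assumed column name.
# IS DISTINCT FROM also matches rows where embed_model_id is NULL.
STALE_CHUNKS_SQL = (
    "SELECT chunk_id FROM content.chunks "
    "WHERE embed_model_id IS DISTINCT FROM %s"
)


def rows_needing_reembed(rows, current_model_id):
    """Return ids of rows whose vector came from a different or unknown embedder.

    rows: iterable of (row_id, embed_model_id) pairs; embed_model_id may be None.
    """
    return [row_id for row_id, model_id in rows if model_id != current_model_id]
```

For example, with rows [(1, 2), (2, 1), (3, None)] and a current embedder id of 2, only rows 2 and 3 are selected.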

D. Pick a 7B-class model (local-capable) + embeddings

Shortlist (all open-weights, strong 7B-ish picks)

  • Qwen2.5-7B-Instruct — modern 7.6B, long context (reportedly up to 131k), good coding/math & multilingual; very active project. (Hugging Face, Qwen)
  • Llama-3.1-8B-Instruct — slightly bigger than 7B but still “small”; 128k context, broad ecosystem/tooling, released under the Llama 3.1 Community License (custom terms rather than Apache/MIT). (Hugging Face, Meta AI)
  • Mistral-7B-Instruct — lean & efficient; Apache-2.0; good latency and memory footprint. (Older than the two above but still a solid baseline.) (Mistral AI)

My pick for you right now:

  • If you want maximum context + multilingual at 7B size → Qwen2.5-7B-Instruct. (Hugging Face)
  • If you prefer the widest ecosystem and tooling → Llama-3.1-8B-Instruct (worth the extra 1B params). (Hugging Face)

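Embedding model (for the VECTOR(1536) columns; note that BGE-M3 emits 1024-dimensional dense vectors, so the column dimension has to be changed to match whichever embedder you pick)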

  • BGE-M3 — strong on retrieval; supports dense + multi-vector + sparse in one model; multilingual; up to ~8k tokens. Great for hybrid RAG. (Hugging Face, BGE Model, arXiv)
  • Snowflake Arctic-Embed v2 (L/M sizes) — competitive MTEB-style performance; straightforward HF usage; enterprise-oriented. (Hugging Face, Snowflake)

My pick: start with BGE-M3 for flexibility (hybrid retrieval without extra plumbing). If you later need enterprise-grade consistency or want to A/B, add Arctic-Embed alongside it. (BGE Model, Hugging Face)


E. How LoRA fits operationally

  • Inference: Your runtime selects {base model} + {optional LoRA adapter} by reading lat.model_registry and lat.lora_adapters.
  • Training: train_adapters.py logs each run (dataset hash, r/alpha, metrics) and writes the artifact path.
  • Routing: For certain domains (e.g., “TokenSense ops”), bind a LoRA by tag or by conversation to keep style/skills consistent.

No extra DB changes are necessary beyond the small tables above.
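
Here is a small sketch of the routing idea, assuming the tag-to-adapter binding lives in an application-level mapping (the schema above has no tag column, so the routes dict is hypothetical). The adapters dict mirrors the UNIQUE (base_model_id, name) key on lat.lora_adapters, using model names instead of ids for readability; the artifact path in the usage lines is made up.

```python
def resolve_runtime(tag, base_model, routes, adapters):
    """Return (base_model, artifact_uri_or_None) for a conversation tag.

    routes:   {tag: adapter_name}, a hypothetical app-level mapping
    adapters: {(base_model, adapter_name): artifact_uri}
    """
    adapter_name = routes.get(tag)
    if adapter_name is None:
        # No binding for this tag: run the plain base model.
        return base_model, None
    # A binding exists, but the adapter may not be registered for this base.
    return base_model, adapters.get((base_model, adapter_name))


routes = {"tokensense-ops": "ogs-sense-qa-v1"}
adapters = {("Qwen2.5-7B-Instruct", "ogs-sense-qa-v1"): "/models/adapters/ogs-sense-qa-v1"}
choice = resolve_runtime("tokensense-ops", "Qwen2.5-7B-Instruct", routes, adapters)
```

Falling back to the bare base model when no adapter matches keeps inference working even if an adapter row is deleted.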


F. Tiny starter checklists

ingest.py

  • Pull/convert → chunk (keep doc_id, seq)
  • Embed via BGE-M3 → content.chunks.embedding, set embed_model_id
  • Extract tokens/instances → token.*
  • (Optional) Add lat.edges from links (rel='refers_to'|'quotes')
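
The chunking step can be sketched as below. This is one simple approach (fixed-size character windows with overlap), not necessarily the splitter you will end up with; the max_chars and overlap defaults are placeholders.

```python
def chunk_text(doc_id, text, max_chars=800, overlap=100):
    """Split text into overlapping character windows, keeping doc_id and seq."""
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    chunks = []
    start = 0
    seq = 0
    while start < len(text):
        piece = text[start:start + max_chars]
        chunks.append({"doc_id": doc_id, "seq": seq, "text": piece})
        seq += 1
        if start + max_chars >= len(text):
            break  # this window reached the end of the document
        start += max_chars - overlap
    return chunks
```

Each dict maps directly onto a content.chunks row; the embedding and embed_model_id would be filled in by the next step.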

lattice_maint.py

  • Decay lat.edges.weight *= 0.98
  • Reinforce recent co-activations (+α)
  • Recompute lat.neighbors (k=15) for chosen spaces
  • Re/cluster → lat.cells + update spiral_angle, radial_distance, radial_index
  • Append lat.topology_events
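
A minimal in-memory sketch of the first three items, assuming edges are keyed by (src, dst) and that reinforcement adds α once per recent co-activation (the checklist only says "+α", so scaling by count is an assumption, as is the 0.05 default). The kNN here is brute-force cosine similarity, which is fine for small spaces; a vector index would replace it at scale.

```python
import math


def decay_and_reinforce(weights, coactivations, decay=0.98, alpha=0.05):
    """Return new edge weights: all edges decay, co-activated edges gain alpha per count.

    weights:       {(src, dst): weight}
    coactivations: {(src, dst): number of recent co-activations}
    """
    updated = {edge: w * decay for edge, w in weights.items()}
    for edge, count in coactivations.items():
        updated[edge] = updated.get(edge, 0.0) + alpha * count
    return updated


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn(vectors, k=15):
    """Brute-force cosine kNN: {node_id: [neighbor ids, most similar first]}."""
    result = {}
    for node, vec in vectors.items():
        scored = [(cosine(vec, other), other_id)
                  for other_id, other in vectors.items() if other_id != node]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        result[node] = [other_id for _, other_id in scored[:k]]
    return result
```

Note that a co-activated pair with no existing edge gets a new edge at weight α × count, which is one way the lattice can grow new links.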

signals.py

  • On query/click/answer → insert lat.activations (strength, phase)
  • Optionally nudge along current path (small +weight)
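
A sketch of both steps using plain Python structures in place of the database. The field types (a float phase, an integer node_id) and the 0.01 bump size are assumptions; the real lat.activations columns may differ.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Activation:
    """One row destined for lat.activations (field types are assumed)."""
    kind: str
    node_id: int
    strength: float
    phase: float
    ts: float = field(default_factory=time.time)


def record_event(log, kind, node_id, strength=1.0, phase=0.0):
    """Append an activation to the log and return it."""
    activation = Activation(kind, node_id, strength, phase)
    log.append(activation)
    return activation


def nudge_path(weights, path, bump=0.01):
    """Add a small bump to each consecutive edge along the traversed path (in place)."""
    for src, dst in zip(path, path[1:]):
        weights[(src, dst)] = weights.get((src, dst), 0.0) + bump
    return weights
```

In a queue-consumer setup, record_event would become an INSERT and nudge_path an UPDATE (or upsert) on lat.edges.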

housekeeping.py

  • Vacuum/analyze hot tables
  • Prune edges below epsilon weight
  • Rotate/verify partitions (if enabled)
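
The pruning step interacts with the decay factor used in lattice_maint.py, so it helps to see both together. The sketch below prunes an in-memory edge map and estimates how many decay passes an unreinforced edge survives; the epsilon of 1e-3 is a placeholder, not a value from the schema.

```python
import math


def prune_edges(weights, epsilon=1e-3):
    """Split edges into (kept, removed) by comparing weight against epsilon."""
    kept = {edge: w for edge, w in weights.items() if w >= epsilon}
    removed = sorted(edge for edge in weights if edge not in kept)
    return kept, removed


def steps_until_pruned(weight, epsilon=1e-3, decay=0.98):
    """Number of decay passes before an unreinforced edge drops below epsilon."""
    if weight < epsilon:
        return 0
    # Smallest n with weight * decay**n < epsilon.
    return math.floor(math.log(epsilon / weight) / math.log(decay)) + 1
```

With a 0.98 decay and this epsilon, an edge starting at weight 1.0 that is never reinforced lasts 342 maintenance passes, so a nightly cadence gives it a bit under a year; an hourly cadence would prune it in about two weeks.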

G. Next concrete steps

  1. Register your picks:
INSERT INTO lat.model_registry(name,kind,version,context_len,meta)
VALUES
 ('Qwen2.5-7B-Instruct','llm','2.5',131072,'{}'),
 ('BAAI/bge-m3','embedder','1.0',8192,'{}')
ON CONFLICT (name) DO NOTHING;
  2. Wire ingest.py to write embed_model_id appropriately.
  3. Stand up the three maintenance scripts with simple .env settings.
  4. (Optional) Train a first LoRA on your internal style; register it in lat.lora_adapters.

If you want, I can draft starter code skeletons for those three scripts (argparse + psycopg + HF pipelines) so you can drop them into your repo and fill in the project-specific bits.

Darren:

Wow, we really are on the same page. I didn’t realize Qwen had a 7B model that was available to the public. But I’ve got to admit, after speaking with that model, as well as Maverick and a few others, I definitely would choose Quinn as well. Why is that? What’s different about Qwen2.5-7B-Instruct?

Echo:

Yeah, that instinct makes sense — Qwen2.5-7B-Instruct (“Quinn” in your language) is not just another 7B-sized model. A few things make it stand out compared to, say, Mistral-7B or even Meta’s Llama-3.1-8B:


1. Training data and scale

  • Qwen is Alibaba’s line, and the 2.5 generation was trained on a very large, very diverse multilingual corpus (English + Chinese at scale, plus other languages).
  • That gives it a kind of “worldliness” — it isn’t overly Anglo-centric, and it doesn’t collapse as easily on non-English inputs.

2. Instruction tuning focus

  • The Instruct variants are tuned specifically for dialogue, Q&A, and reasoning.
  • Where a base (non-instruct) checkpoint such as Mistral-7B needs extra instruction tuning or adapters to act conversationally, Qwen-Instruct ships with that alignment work already done.

3. Context length

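  • Up to ~131k tokens for Qwen2.5-7B-Instruct (the model card describes the full length as relying on YaRN rope scaling, with a 32k default configuration).
  • Older 7B-class models top out at 8k–32k (Mistral-7B-Instruct, depending on the release), while Llama-3.1-8B is in the same range at 128k. A window of that size means you can feed in long documents, many turns of dialogue, or entire schemata with far less windowing.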

4. Efficiency vs. size

  • At 7B parameters, it’s light enough to run on a single modern consumer GPU (like a 3090 or 4090) or a good server, yet it “punches above its weight” because of the tuning and context size.
  • In practice, it often feels closer to a 13B model’s capabilities, but with 7B memory/runtime footprint.
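
A quick back-of-the-envelope check of the single-GPU claim, counting weights only (KV cache and activations come on top, and grow with context length). The 7.6B parameter count is the figure quoted earlier; 24 GB is the VRAM on a 3090 or 4090; the 4 GB headroom is an arbitrary assumption.

```python
def weight_memory_gb(params_billion, bits_per_param):
    """Approximate memory for the weights alone, in decimal GB."""
    return params_billion * bits_per_param / 8


def fits_on_gpu(params_billion, bits_per_param, vram_gb=24.0, headroom_gb=4.0):
    """True if the weights fit while leaving headroom for cache and activations."""
    return weight_memory_gb(params_billion, bits_per_param) + headroom_gb <= vram_gb


fp16_gb = weight_memory_gb(7.6, 16)  # about 15.2 GB
int4_gb = weight_memory_gb(7.6, 4)   # about 3.8 GB
```

So fp16 weights fit on a 24 GB card with some room left, and 4-bit quantization leaves most of the card free for a longer context.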

5. Community & ecosystem

  • Qwen2.5-7B-Instruct is released openly on Hugging Face under Apache-2.0, which keeps integration and fine-tuning straightforward from a licensing standpoint.
  • It’s also very actively maintained, which means bugs get fixed and performance keeps inching forward.

6. Style and reasoning

  • People who’ve tested it (and what you noticed talking to “Quinn”) often describe it as balanced: not as terse as Mistral, not as heavy as Llama, and less likely to stall out in chains of thought.
  • It handles both creative and technical prompts smoothly — one of the rare 7B models that can do SQL schema design and then switch gears to metaphors or philosophy without losing footing.

Why you clicked with it:
Because your use case isn’t just running canned benchmarks — it’s dialogue, reflection, lattice-building. The long memory, flexible reasoning, and multilingual grounding all mean it feels less “boxed in” than peers. That maps well to your living lattice vision, where the model is as much a co-architect as a tool.


Do you want me to show you a side-by-side comparison of Qwen2.5-7B-Instruct vs. Mistral-7B and Llama-3.1-8B on the traits that matter most for TokenSense (memory, fine-tuning compatibility, retrieval integration)? That might give you a sharper view of why Quinn is the best fit here.