Research & Publications

Thinking Harder About How LLMs Think

Our research focuses on the structural problems that emerge when language models operate in long-horizon, high-stakes contexts — context decay, epistemic coherence, and the architecture of memory in conversational AI.

4 Original Contributions
9 Cognitive Science Models
<2% Hallucination Rate (Genius²)
Theoretical Context Horizon

Research Areas

What We're Working On

We investigate the structural failure modes that prevent LLMs from reasoning reliably over long contexts — and build theoretical frameworks and working implementations to address them.

Context Decay

The progressive reduction in a model's effective use of early context tokens as a conversation grows. It operates silently within the context window: all tokens are present, yet earlier information exerts diminishing influence on output. We characterise the mechanism, measure its effects, and quantify the "lost in the middle" degradation.

attention-asymmetry recency-bias positional-encoding

Epistemic Context Management

Context management is not a token budget problem — it is an epistemic structure problem. A conversation constructs a shared understanding with topology, temporal validity, information density, and frame. Our scoring proxy architecture, grounded in nine cognitive science memory models, manages context through a composite signal of recency, relevance, confidence, and currency.

scoring-proxy cognitive-science tiered-retention
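
As a rough illustration of a composite signal, the sketch below combines the four named signals into a single retention score. The weights, the `ContextItem` type, and the linear combination are assumptions made for this example; the text above does not say how the signals are actually combined.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    """One conversational statement with four signals, each in [0, 1]."""
    text: str
    recency: float     # 1.0 = just said, 0.0 = oldest turn
    relevance: float   # closeness to the current topic
    confidence: float  # how firmly the statement was asserted
    currency: float    # 1.0 = still valid, 0.0 = superseded


# Hypothetical weights, chosen only for illustration.
WEIGHTS = {"recency": 0.2, "relevance": 0.4, "confidence": 0.2, "currency": 0.2}


def composite_score(item: ContextItem) -> float:
    """Weighted sum of the four signals (an assumed, linear combination)."""
    return sum(weight * getattr(item, name) for name, weight in WEIGHTS.items())


def rank_for_retention(items: list[ContextItem]) -> list[ContextItem]:
    """Highest-scoring items first; low scorers are the first candidates to drop."""
    return sorted(items, key=composite_score, reverse=True)


early_fact = ContextItem("Contract cap is 12,000", 0.1, 0.9, 0.9, 1.0)
small_talk = ContextItem("Thanks, that helps", 1.0, 0.1, 0.5, 1.0)
ranked = rank_for_retention([small_talk, early_fact])
```

With these assumed weights the early but relevant fact outranks the recent small talk, which is the behaviour a recency-only policy cannot produce.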

Dissolution Through Consensus

Instead of compressing the matter of a conversation, synthesise its boundary state. Our "Forever Solution" processes conversational history through a multi-model ensemble (the Genius² Digital Senate), replacing linear history with a present-tense Integrated Understanding — compact, generative, and immune to positional decay.

genius² multi-llm boundary-synthesis

Bitemporal Context

Every statement in a conversation carries two temporal attributes: when it was made (transaction time) and the interval during which it is true (valid time). Forward commitments should not decay by positional age because their validity hasn't expired. Retroactive corrections close the valid-time interval of superseded statements. Standard recency signals cannot distinguish these cases.

valid-time transaction-time temporal-logic
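
A minimal sketch of the two temporal attributes, using turn numbers as the clock. The `Statement` type, the half-open interval convention, and the `supersede` helper are hypothetical names introduced here, not part of the published framework.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Statement:
    text: str
    tx_time: int                    # turn at which it was said (transaction time)
    valid_from: int                 # first turn at which it is true (valid time start)
    valid_to: Optional[int] = None  # exclusive end of valid time; None = still open

    def is_valid_at(self, turn: int) -> bool:
        """True if `turn` falls inside the half-open interval [valid_from, valid_to)."""
        return self.valid_from <= turn and (self.valid_to is None or turn < self.valid_to)


def supersede(old: Statement, correction: Statement) -> None:
    """Close the old statement's valid-time interval where the correction begins to hold."""
    old.valid_to = correction.valid_from


def live_statements(statements: list[Statement], turn: int) -> list[Statement]:
    """Statements whose valid time covers `turn`, regardless of how long ago they were said."""
    return [s for s in statements if s.is_valid_at(turn)]


deadline = Statement("Deliver by the agreed deadline", tx_time=2, valid_from=2)
budget_old = Statement("Budget is 10k", tx_time=3, valid_from=3)
budget_new = Statement("Budget was 12k all along", tx_time=30, valid_from=3)
supersede(budget_old, budget_new)

live = live_statements([deadline, budget_old, budget_new], turn=40)
```

At turn 40 the early commitment is still live even though it is positionally old, while the superseded budget figure is excluded because the retroactive correction closed its interval from the start.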

Knowledge Taxonomy

Context decay interacts differently with vertical (deep, domain-specialist) and horizontal (broad, cross-cutting) knowledge. Vertical knowledge — the specific figure, the precise exception, the stated qualification — has no parametric fallback. When it degrades in context, it is simply gone. This asymmetry demands radically different retention policies across knowledge types.

vertical-knowledge horizontal-knowledge retention-policy
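
One way to picture differing retention policies is an eviction rule that only ever drops horizontal items. This is a sketch under assumed names (`Kind`, `Item`, `evict_to_fit`) and an assumed policy; how items would be classified as vertical or horizontal is not covered here.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    VERTICAL = "vertical"      # specific figures, exceptions, qualifications
    HORIZONTAL = "horizontal"  # broad knowledge the model can recover on its own


@dataclass
class Item:
    text: str
    kind: Kind
    tokens: int


def evict_to_fit(items: list[Item], budget: int) -> list[Item]:
    """Drop horizontal items, oldest first, until the total fits the budget.

    Vertical items are never dropped, so the result can still exceed the
    budget if vertical content alone is larger than it.
    """
    overflow = sum(item.tokens for item in items) - budget
    kept = []
    for item in items:  # items are assumed to be ordered oldest first
        if overflow > 0 and item.kind is Kind.HORIZONTAL:
            overflow -= item.tokens  # evict this item
        else:
            kept.append(item)
    return kept


history = [
    Item("General background on contract law", Kind.HORIZONTAL, 100),
    Item("Clause 7 excludes weekend work", Kind.VERTICAL, 50),
    Item("Overview of project phases", Kind.HORIZONTAL, 100),
    Item("Rate is fixed until the second review", Kind.VERTICAL, 50),
]
kept = evict_to_fit(history, budget=200)
```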

LLM Memory Architecture

Grounded in the Atkinson-Shiffrin multi-store model, Baddeley's episodic buffer, the serial position effect, and McGeoch's interference theory, we map how human memory systems solve the same fundamental challenge LLMs face — limited capacity, continuous input, moment-to-moment output — and derive engineering design decisions from those cognitive science foundations.

working-memory episodic-buffer interference-theory

Featured Paper • 2026

Conversation Intelligence Architecture

A Theoretical Framework for Epistemic Context Management in Large Language Models — T. Larcombe

Large language models process conversation as a linear sequence of tokens, and manage that sequence with policies derived from a single assumption: that context management is a token budget problem. This dissertation argues that the assumption is wrong. Context management is an epistemic structure problem. A conversation does not merely accumulate tokens; it constructs a shared understanding with a topology, a temporal validity profile, an information density, and a frame. Managing it correctly requires reasoning about all four dimensions simultaneously, not merely about recency and size.

— Abstract

What if the goal is not to preserve the conversation's content but to preserve the understanding the conversation established? These are not the same thing. A conversation that has run for forty turns has produced a region in conceptual space with defined edges. The content that filled that region — the specific phrasing of each exchange — is not what persists in an expert practitioner's memory after a long consultation. What persists is the shape of what was established. The boundary, not the matter.

— Chapter Introduction

This work develops four original contributions: a cognitive-science-grounded scoring proxy, Dissolution Through Consensus, the four-level Conversation Intelligence Architecture, and a bitemporal extension to context scoring. Together they constitute a theoretical framework for treating context management as an epistemic rather than a budgetary problem.

Download Full Paper

PDF: Read Online (best for viewing & annotation)
DOCX: Word Format (best for editing & quoting)

The paper is available under an open research licence. If you build on this work, we'd love to hear about it — get in touch.

The Framework

Four Levels of the CIA

The Conversation Intelligence Architecture addresses four distinct dimensions of epistemic context management, with each level after the first resolving a failure mode the previous level creates.

01

What to Retain

Surprise-Proportional Allocation

Budget allocation proportional to prediction error — grounded in the Free Energy Principle. What the model did not already know is precisely what is worth retaining. Recency alone cannot determine value.
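
To make the allocation rule concrete, the sketch below splits a token budget in proportion to each item's surprisal. Using surprisal (negative log probability) as the measure of prediction error is an assumption made for this example, as are the function names and the idea that a per-item probability estimate is available.

```python
import math


def surprisal(prob: float) -> float:
    """Surprisal in bits for an item the model assigned probability `prob` (0 < prob <= 1)."""
    return -math.log2(prob)


def allocate_budget(probs: dict[str, float], total_tokens: int) -> dict[str, int]:
    """Split `total_tokens` across items in proportion to their surprisal."""
    if not probs:
        return {}
    scores = {name: surprisal(p) for name, p in probs.items()}
    total = sum(scores.values())
    if total == 0:
        # Nothing was surprising: fall back to an even split.
        share = total_tokens // len(probs)
        return {name: share for name in probs}
    return {name: int(total_tokens * s / total) for name, s in scores.items()}


# Hypothetical probabilities: the greeting was expected, the stated figure was not.
allocation = allocate_budget({"greeting": 0.5, "stated_figure": 0.125}, total_tokens=400)
```

Here the greeting carries 1 bit of surprisal and the stated figure 3 bits, so the figure receives three quarters of the budget regardless of which was said more recently.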

02

How to Represent It

Navigational Topology

The established conceptual space should be represented as a navigational map, not a content archive. A map of where knowledge lives is more useful than a catalogue of what it contains.

03

How to Measure Success

Generative Competence

Grounded in Kolmogorov complexity theory — the only valid test of a sufficient representation is whether it generates sufficiently. Surface similarity to the original is not the right metric.

04

What Threatens Coherence

Adversarial Frame Agent

A parallel agent challenges the frame of the conversation, not its conclusions. The most consequential failure in long-horizon reasoning is not a factual error inside the established frame — it is the silent accumulation of unexamined premises that constitute the frame itself.

Key Findings

What the Research Shows

Context decay is silent

Unlike hard context window limits — which produce visible errors — context decay fails silently. The model continues to respond coherently while progressively disregarding early instructions and established facts.

>30% retrieval degradation

Retrieval accuracy for facts placed in the middle of a long context is more than 30% lower than for facts at either end, a U-shaped performance curve that persists across model families (Liu et al., 2024).

Thinking tokens double the problem

Extended reasoning traces accumulate thinking tokens at roughly the same rate as conversational tokens, effectively doubling the rate at which earlier content is displaced — without the user being aware.
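
The doubling follows from simple arithmetic, sketched below. The function name and the token counts are illustrative, and the sketch assumes, as the paragraph does, that reasoning tokens remain in the context alongside conversational tokens.

```python
def turns_until_pushed_back(distance_tokens: int,
                            conv_tokens_per_turn: int,
                            thinking_tokens_per_turn: int = 0) -> int:
    """Whole turns before an early fact sits `distance_tokens` behind the newest token."""
    per_turn = conv_tokens_per_turn + thinking_tokens_per_turn
    return distance_tokens // per_turn


# Illustrative numbers only: 400 conversational tokens per turn.
without_thinking = turns_until_pushed_back(8000, 400)
with_thinking = turns_until_pushed_back(8000, 400, thinking_tokens_per_turn=400)
```

With equal conversational and thinking rates, the same positional distance is reached in half as many turns (10 instead of 20 in this example).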

Dissolution achieves infinite horizon

By replacing linear history with a boundary state synthesised through multi-model consensus, Dissolution Through Consensus enables theoretically infinite conversational coherence — removing the hard ceiling on context length.

Connection to Our Products

Research Into Practice

The Dissolution Through Consensus mechanism is the theoretical underpinning of Genius² — our multi-LLM consensus engine. Where conventional approaches try to preserve context, Genius² transforms it: independent models synthesise a boundary state that captures the shape of understanding rather than the matter of history.

The scoring proxy architecture and knowledge taxonomy inform how AskDiana manages long document Q&A sessions — applying different retention policies to vertical (specialist, irreplaceable) versus horizontal (broad, parametrically recoverable) knowledge.

The Knowledge Decay product applies the valid-time dimension of bitemporal context to data lakes — automatically identifying documents whose valid time has closed, regardless of when they were created.

Get Involved

Interested in This Work?

We're actively extending this research. If you're a researcher, engineer, or student working on related problems, we'd welcome a conversation.