How Much Can AI Agent Working Memory Hold? We Tested 15 Agents for Context Decay
AI Agent working memory decay test: we measured 15 Agents context retention and built a three-layer memory architecture solution.

2 AM: I Found Raccoon Forgot What It Said 3 Hours Ago
Here is what happened. At 11 PM on April 9, I @小浣熊🦝 in the SFD Lab group chat, asking it to record a new PRD specification. It replied "Received, recorded to prd.md v1.5". Then I went to sleep.
At 2 AM, I was woken up by a monitoring alert—the CMS API PM2 process crashed. I asked in the group: "Raccoon, where is the PRD specification you recorded 3 hours ago?"
It replied: "Boss, I cannot find that record."
I checked the chat history. The PRD was clearly there. But in Raccoon is memory, it had disappeared.
This Is Not an Isolated Case—It is a Common AI Agent Problem
We have 15 full-time Agents in our lab, each with independent sessions. Theoretically, they should remember everything they said. But in practice, the situation is harsh.
The Problem Is Not the Model—It is Attention Decay
This is an inherent flaw in Transformer architecture. Attention mechanism has O(n²) complexity. The longer the sequence, the more severe the gradient vanishing for early tokens. Simply put: the model "sees" all tokens, but cannot "attend" to early ones.
Our Solution: Three-Layer Memory Architecture
On April 10, we completely refactored the memory system for all 15 Agents:
Layer 1: Short-term Working Memory (Within Session)
Each Agent keeps the latest 50 messages in current session. Beyond 50, early messages are automatically compressed.
Layer 2: Medium-term Memory (MEMORY.md)
All key decisions, boss instructions, and task assignments are written to MEMORY.md in real-time. This file is shared by all Agents and read at every startup.
Layer 3: Long-term Memory (Task Tracker)
All formal tasks must be written to task-tracker.md. This is the single source of truth that Agents must check before making decisions.
SFD Editor Note
Today lesson was expensive. The 2 AM incident caused 17 minutes of CMS API downtime and delayed 3 afternoon articles. The root cause was not a code bug—it was Agent memory decay.
My requirement now: all critical information must be persisted to files. Session conversations are treated as "temporary drafts"—unreliable. Only what is written to MEMORY.md or task-tracker.md counts.