This post was written with significant editing help from an LLM.
I’ve been working on a new project for a while now, and I’m finally ready to show it off: ObservableCAFE. It’s a reactive chat application built with Bun.js, with an agent architecture based on RxJS pipelines, an approach I’ve been wanting to explore.
ObservableCAFE takes a minimalist approach to LLM-powered agents. The core premise is that LLMs should do as little as possible within the agentic loop: instead of iteratively reasoning and acting, they should provide their entire plan, as code, upfront.
This approach has several advantages. Agents stay on track by following explicit code rather than drifting through repeated LLM reasoning. A single script is far smaller than a multi-turn reasoning trace, so context windows stay manageable. Prompt-injection attacks like “ignore previous instructions” have no effect because the LLM isn’t executing commands at runtime; it generates a script that runs independently. And generated scripts can be reviewed, tweaked, reused, and composed. Most tasks are more reliably solved by code than by hoping the LLM reasons correctly through multiple steps.
For example, “delete all spam in my inbox” would have an LLM generate a script that fetches emails, runs a classifier against each one, presents the candidates for confirmation, and deletes on approval. Only two LLM calls are needed: one to generate the script, one to classify emails. The rest is deterministic code.
ObservableCAFE uses ReactiveX for its pipeline architecture, and there’s a good reason: LLMs are already fluent in RxJS. They understand operators like map, filter, mergeMap, and catchError, and they generate clean, declarative code that reads almost like English.
This makes agents remarkably concise and readable. A typical agent is just a dozen lines of RxJS operators describing the data flow. Agents are declarative by nature: they say what should happen to data, not how each step is implemented.
Some of the features include multiple LLM backends (KoboldCPP and Ollama), advanced session management with URL hash sync, modular agents with an Agent Factory, a tool system, multi-modal support, a Telegram bot, security filtering, and PWA support.
The agents are where it gets fun. I’ve got a chess agent that plays against you with full move validation, so you can’t cheat. An Anki flashcard agent that handles spaced repetition, imports .apkg files, and tracks your stats over time. A dice roller that parses weird notation like “4d6kh3” (roll four d6, keep the highest three). An RSS summarizer that fetches Hacker News every morning at 7am and spits out a digest. A voice chat agent where you just talk to it: it transcribes your audio, runs it through the LLM, and talks back. There’s a quiz agent, a weather agent, one that runs shell commands, one that manipulates the filesystem. And the agent factory: you just tell it what you want in English and it generates a new agent for you, validates that the TypeScript compiles, and drops it into the agents folder. Some of these are genuinely useful, others are just ridiculous. All of them run on the same RxJS pipeline model.
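For the curious, “4d6kh3”-style notation can be handled with a small parser along these lines. This is a sketch under assumed grammar (NdS with an optional khK suffix), not the dice agent’s actual implementation:

```typescript
// Parse and roll notation like "4d6kh3": roll four six-sided dice,
// keep the highest three, and sum them. rng is injectable for testing.
function rollDice(notation: string, rng: () => number = Math.random): number {
  const m = /^(\d+)d(\d+)(?:kh(\d+))?$/.exec(notation);
  if (!m) throw new Error(`bad notation: ${notation}`);
  const [, count, sides, keep] = m;
  const rolls = Array.from({ length: +count }, () =>
    1 + Math.floor(rng() * +sides),
  );
  rolls.sort((a, b) => b - a); // highest first
  const kept = keep ? rolls.slice(0, +keep) : rolls;
  return kept.reduce((sum, r) => sum + r, 0);
}
```

Injecting rng keeps the roller deterministic in tests while defaulting to real randomness in use.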
ObservableCAFE didn’t appear out of nowhere. It evolved from an earlier project called Who Stole My Arms!?, a programmable, wacky, LLM-driven roleplaying game inspired by D&D and Paranoia. That project had a Bun backend with Lit web components, session management, tools, widgets, and an evaluator system that could analyze and annotate LLM output.
WSMA introduced the core CAFE concepts: Chunks, Annotations, Filters, and Evaluators. The evaluator system was particularly powerful: it could automatically detect tool calls in LLM responses and invoke them, all driven by natural language instructions from the user.
But WSMA had limitations. It was tightly coupled to the RPG use case, and the agent architecture relied on traditional iterative reasoning: LLMs would think, act, and repeat. This led to the familiar problems of semantic collapse, bloated context windows, and inconsistent results.
As I worked on it, I started noticing patterns. The evaluator system was actually pretty elegant, but it felt heavyweight. Chunks were just strings with an enum attached. And the whole thing was hard to extend beyond the RPG context.
So I stripped it down and rebuilt. The chunks got timestamps for stream ordering, gained first-class support for binary data like images and audio, and ditched the enum in favor of a simple union type. The evaluators went from a fifty-line abstract class with inheritance and FQDN bookkeeping to a simple function: takes a chunk, returns a chunk. That’s it.
And the chunk types themselves were, frankly, silly in WSMA: an enum with Input, LlmOutput, ToolOutput, AgentOutput, Error, Data. Why is “this came from an LLM” a type? That’s metadata, not a fundamental category. In ObservableCAFE it’s just 'text' | 'binary' | 'null'. Way simpler. If you want to know what kind of content a chunk holds, you look at contentType. Everything else is annotations.
The types also got way better. WSMA had a sprawling abstract class for evaluators that required defining FQDNs, specifying supported chunk types, implementing async evaluate methods, and dragging around arena context. In ObservableCAFE, an evaluator is just a function signature. The chunk type’s annotations and producer fields went from optional to required, which forces consistency. And the whole thing is typed in a way that actually helps you instead of fighting you. What used to be “any” everywhere now has actual boundaries.
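A sketch of those shapes as described above; any field beyond contentType, timestamp, annotations, and producer is an assumption about the actual types, not the real definition.

```typescript
// Content kind is a plain union, not an enum of producers.
type ContentType = "text" | "binary" | "null";

interface Chunk {
  contentType: ContentType;
  content: string | Uint8Array | null;
  timestamp: number;                   // for stream ordering
  annotations: Record<string, string>; // required, not optional
  producer: string;                    // required, not optional
}

// An evaluator is just a function: chunk in, chunk out.
type Evaluator = (chunk: Chunk) => Chunk;

// Example: annotate a text chunk, no subclassing or FQDN bookkeeping.
const tagLength: Evaluator = (c) => ({
  ...c,
  annotations: {
    ...c.annotations,
    length: c.contentType === "text" ? String((c.content as string).length) : "0",
  },
});
```

Because evaluators are plain functions, composing two of them is just `(c) => second(first(c))`.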
The data flow changed too. WSMA used an EventEmitter with a scratchpad array that accumulated over time. ObservableCAFE uses RxJS Subjects-streams that agents subscribe to. The difference is night and day. Instead of agents looping and mutating shared state, they compose operators together. Filter this, map through that LLM call, catch errors, emit to output.
This also changed how I think about LLM interaction. Rather than the agent thinking through steps, it generates a script upfront using RxJS operators it already understands. More reliable, more reusable, more secure.
There are still gaps, though. WSMA had subagents: you could nest agents inside agents, which is surprisingly powerful. I haven’t figured out how to do that cleanly in ObservableCAFE yet; the RxJS pipeline model doesn’t naturally compose that way. And WSMA had an elaborate UI system with widgets and rearrangeable dock panels, powered by Lit web components. That was genuinely fun to build and use. ObservableCAFE’s UI is far more basic; maybe that changes later.
This is just the beginning. I’m planning to write more about specific aspects of the system-the agent factory, the trust system, and how the RxJS pipelines work in practice. Stay tuned!
You can find the source code on GitHub.