
Archimedes
A local-first, AI-powered PDF book library manager for macOS.
Why this exists
Personal PDF libraries balloon into thousands of files with no metadata, no covers, and no way to ask “what did that book say about X?” Cloud-hosted RAG (retrieval-augmented generation) solves the asking, but uploading a personal library to a third party feels wrong. The brief was an honest local-first AI library: one that catalogues itself, doesn’t depend on a hosted service, doesn’t leak the collection, and still lets you ask questions across the whole shelf.
The proof point
Local-first AI is now production-grade. The constraints that look “personal” at desktop scale (process isolation, bring-your-own-key providers, pluggable embeddings, durable job queues, content-based identity) are the same ones that apply at enterprise scale to private documents. That makes the desktop a useful proving ground for the architecture an enterprise will adopt next.
How it works
Archimedes is an Electron app with a strict three-layer split: a sandboxed React renderer with no Node access, a typed `window.api` bridge exposed by the preload script, and a Node main process that owns the filesystem, PDF parsing, AI calls, and the vector store. Everything that can fail natively (model loading, the ONNX runtime, embedding computation) runs in a separate utility process, so a native crash kills only that worker, not the app.
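A minimal TypeScript sketch of the crash-isolation idea, assuming a simple request/reply message protocol with the worker. `WorkerHandle`, `WorkerSupervisor` and the message shapes are hypothetical names for this illustration; in Electron the handle could wrap a process started with `utilityProcess.fork`, whose `exit` event would drive the restart. The supervisor fails any requests that were in flight when the worker died and spawns a replacement, so callers see a rejected promise rather than a dead app.

```typescript
// Messages exchanged with the isolated worker (hypothetical shapes).
type WorkerRequest = { id: number; payload: unknown };
type WorkerReply = { id: number; result: unknown };

// The minimal surface the supervisor needs from a child process. In Electron
// this would wrap a utility process; here it is an assumed interface.
interface WorkerHandle {
  send(msg: WorkerRequest): void;
  onMessage(cb: (msg: WorkerReply) => void): void;
  onExit(cb: (code: number) => void): void;
}

type Pending = { resolve: (v: unknown) => void; reject: (e: Error) => void };

class WorkerSupervisor {
  restarts = 0;
  private nextId = 0;
  private pending = new Map<number, Pending>();
  private spawn: () => WorkerHandle;
  private worker: WorkerHandle;

  constructor(spawn: () => WorkerHandle) {
    this.spawn = spawn;
    this.worker = this.start();
  }

  private start(): WorkerHandle {
    const worker = this.spawn();
    worker.onMessage(({ id, result }) => {
      const p = this.pending.get(id);
      if (p) {
        this.pending.delete(id);
        p.resolve(result);
      }
    });
    worker.onExit((code) => {
      // The worker died (e.g. a native crash in the ONNX runtime). Fail the
      // requests it was handling, then replace it; the host process survives.
      for (const p of this.pending.values()) {
        p.reject(new Error(`worker exited with code ${code}`));
      }
      this.pending.clear();
      this.restarts++;
      this.worker = this.start();
    });
    return worker;
  }

  // Send a payload to the current worker and resolve with its reply.
  request<T>(payload: unknown): Promise<T> {
    const id = this.nextId++;
    return new Promise<T>((resolve, reject) => {
      this.pending.set(id, { resolve: (v) => resolve(v as T), reject });
      this.worker.send({ id, payload });
    });
  }
}
```

Rejecting in-flight work instead of silently retrying it keeps the retry policy with the caller; a durable queue, like the one used for embedding and indexing, can then decide whether to run the job again.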
Each library is a flat folder on disk; Archimedes owns the folder but never modifies the original files. Books are catalogued by an AI metadata pass, enriched with data from Open Library and Google Books, and deduplicated by content (title, author, publisher, year, ISBN, content hash) rather than by filename. Embedding and indexing run through a persisted FIFO queue that survives restarts: interrupted work is reset and re-queued on the next launch.
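The queue behaviour described above can be sketched in TypeScript under stated assumptions: the job fields, the `JobStore` interface and the in-memory `JsonStringStore` (a stand-in for a JSON file or SQLite table on disk) are all hypothetical. The key move is persisting the `running` state before work starts, so that on the next launch the queue can tell which jobs were interrupted and reset them.

```typescript
type JobStatus = "queued" | "running";

interface Job {
  id: string;
  kind: "embed" | "index"; // hypothetical job types
  bookId: string;
  status: JobStatus;
}

// Anything that can durably hold the job list (a JSON file, a SQLite table...).
interface JobStore {
  load(): Job[];
  save(jobs: Job[]): void;
}

// In-memory stand-in that serialises like a file would, so state survives
// across queue instances the way it would survive an app restart.
class JsonStringStore implements JobStore {
  private data = "[]";
  load(): Job[] {
    return JSON.parse(this.data) as Job[];
  }
  save(jobs: Job[]): void {
    this.data = JSON.stringify(jobs);
  }
}

class PersistentFifoQueue {
  private store: JobStore;
  private jobs: Job[];

  constructor(store: JobStore) {
    this.store = store;
    // Runs once per launch: a job still marked "running" was interrupted by a
    // crash or quit, so reset it to "queued" (keeping its place in line).
    this.jobs = store.load().map((j): Job => (j.status === "running" ? { ...j, status: "queued" } : j));
    this.store.save(this.jobs);
  }

  enqueue(job: Omit<Job, "status">): void {
    this.jobs.push({ ...job, status: "queued" });
    this.store.save(this.jobs);
  }

  // Claim the oldest queued job and persist the claim before any work starts.
  next(): Job | undefined {
    const job = this.jobs.find((j) => j.status === "queued");
    if (!job) return undefined;
    job.status = "running";
    this.store.save(this.jobs);
    return job;
  }

  // Finished jobs are simply dropped from the persisted list.
  complete(id: string): void {
    this.jobs = this.jobs.filter((j) => j.id !== id);
    this.store.save(this.jobs);
  }
}
```

Resetting interrupted jobs rather than failing them assumes each job is safe to re-run from the start. A real file-backed store should also write atomically (write a temporary file, then rename it over the old one) so a crash mid-write cannot corrupt the queue.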
The AI layer is pluggable end to end. Chat providers (Anthropic, OpenAI, Gemini, Ollama) are chosen per query; embeddings default to a local `bge-small-en-v1.5` model running in the isolated worker, or can use any of the remote providers with a bring-your-own (BYO) key. API keys are envelope-encrypted, with the wrapping key protected by the OS keychain. Find embeds the query once and searches every library’s LanceDB table with that single vector; Ask synthesises an answer with citations from the retrieved excerpts only, never from the full book.
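The Find step can be sketched as one embedding call fanned out across libraries, with the per-library results merged by score. This is a hedged illustration: `Embedder`, `LibraryIndex`, `MemoryIndex` and `findAcrossLibraries` are hypothetical, and `MemoryIndex` uses brute-force cosine similarity as a stand-in for a LanceDB table; it does not use the LanceDB API. Because `Embedder` is an interface, a local model in the worker and a remote BYO-key provider are interchangeable here.

```typescript
// Any embedding backend: a local model in the isolated worker or a remote API.
interface Embedder {
  embed(text: string): Promise<number[]>;
}

interface Hit {
  library: string; // which library the excerpt came from (needed for citations)
  bookId: string;
  chunk: string;
  score: number; // cosine similarity, higher is closer
}

interface LibraryIndex {
  search(vector: number[], k: number): Promise<Hit[]>;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na > 0 && nb > 0 ? dot / Math.sqrt(na * nb) : 0;
}

type Row = { bookId: string; chunk: string; vector: number[] };

// Brute-force stand-in for one library's vector table.
class MemoryIndex implements LibraryIndex {
  private library: string;
  private rows: Row[];
  constructor(library: string, rows: Row[]) {
    this.library = library;
    this.rows = rows;
  }
  async search(vector: number[], k: number): Promise<Hit[]> {
    return this.rows
      .map((r) => ({ library: this.library, bookId: r.bookId, chunk: r.chunk, score: cosine(vector, r.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k);
  }
}

async function findAcrossLibraries(
  query: string,
  embedder: Embedder,
  libraries: LibraryIndex[],
  k: number,
): Promise<Hit[]> {
  // Embed once; the same vector is reused for every library.
  const vector = await embedder.embed(query);
  // Query all libraries concurrently, then keep the global top k.
  const perLibrary = await Promise.all(libraries.map((lib) => lib.search(vector, k)));
  return perLibrary.flat().sort((x, y) => y.score - x.score).slice(0, k);
}
```

Reusing one query vector only works if every library was indexed with the same embedding model, since scores from different models (or dimensions) are not comparable; with mixed models the query would need one embedding per model. The `library` and `bookId` on each hit are what an Ask step would cite.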
Architecture
What an architect can take from it
- Process isolation isn’t optional once you ship AI. Native runtimes crash; the rest of your application shouldn’t go down with them.
- Pluggable providers with bring-your-own keys are the right default for AI on private data. They outsource cost, switching, and trust to the user without forcing a re-architecture later.
- Content-based identity, not the filename, is how you deduplicate documents at any scale. The same rules carry over to enterprise document repositories.
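The third point can be made concrete with a short TypeScript sketch of content-based identity. The matching rule is an assumption built from the fields Archimedes is described as using (content hash, ISBN, title, author, publisher, year): two records are treated as the same book if they share a content hash, a normalised ISBN, or a normalised title, author, publisher and year. `BookRecord`, `identityKeys` and `dedupe` are hypothetical names, and the content hash is taken as precomputed.

```typescript
interface BookRecord {
  path: string; // where the file lives; deliberately NOT part of its identity
  contentHash: string; // e.g. a SHA-256 of the PDF bytes, computed elsewhere
  title?: string;
  author?: string;
  publisher?: string;
  year?: number;
  isbn?: string;
}

// Strip accents and punctuation, lowercase, collapse whitespace.
// Latin-script only: characters outside a-z and 0-9 are dropped.
function normalizeText(s: string): string {
  return s
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")
    .trim();
}

// Normalise an ISBN-10 or ISBN-13 to a bare ISBN-13; undefined if the shape
// is wrong. The ISBN-10 check digit itself is not validated here.
function toIsbn13(raw: string): string | undefined {
  const s = raw.replace(/[-\s]/g, "").toUpperCase();
  if (/^\d{13}$/.test(s)) return s;
  if (!/^\d{9}[\dX]$/.test(s)) return undefined;
  const core = "978" + s.slice(0, 9);
  let sum = 0;
  for (let i = 0; i < 12; i++) sum += Number(core[i]) * (i % 2 === 0 ? 1 : 3);
  return core + String((10 - (sum % 10)) % 10);
}

// Every key under which two records count as the same book (assumed rules).
function identityKeys(b: BookRecord): string[] {
  const keys = [`hash:${b.contentHash}`];
  const isbn = b.isbn ? toIsbn13(b.isbn) : undefined;
  if (isbn) keys.push(`isbn:${isbn}`);
  if (b.title && b.author) {
    const meta = [b.title, b.author, b.publisher ?? ""].map(normalizeText).join("|");
    keys.push(`meta:${meta}|${b.year ?? ""}`);
  }
  return keys;
}

// Keep the first record of each book; later matches on any key are duplicates.
function dedupe(books: BookRecord[]): BookRecord[] {
  const seen = new Set<string>();
  const unique: BookRecord[] = [];
  for (const book of books) {
    const keys = identityKeys(book);
    const duplicate = keys.some((k) => seen.has(k));
    // Remember keys even from duplicates so chains of partial matches are caught.
    keys.forEach((k) => seen.add(k));
    if (!duplicate) unique.push(book);
  }
  return unique;
}
```

Because `path` never contributes to a key, renaming or moving a file cannot create a new book, and two differently named copies of the same PDF collapse to one.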