mdstill is a document-ingestion tool purpose-built for LLM and RAG workflows. Where generic text extractors dump messy output, mdstill produces clean, semantic markdown that preserves tables, headings, and document structure — the things LLMs actually need to understand context.
What you can do with it:
• Prepare documents for RAG pipelines — chunk-ready, with semantic boundaries preserved
• Feed PDFs, Word documents, or spreadsheets into ChatGPT, Claude, or Gemini without losing tables
• Build knowledge bases in Obsidian, Notion, or Logseq from existing document archives
• Extract structured context for AI agents and embeddings
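To illustrate what "chunk-ready" can mean in practice, here is a minimal Python sketch that splits heading-preserving markdown into one chunk per section. It assumes the converted output uses standard `#`-style headings and triple-backtick code fences; the function name `chunk_by_heading` and the sample document are hypothetical and not part of mdstill.

```python
import re

# ATX-style heading: 1-6 '#' characters, whitespace, then the title text.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def chunk_by_heading(markdown: str) -> list[dict]:
    """Split markdown into chunks, starting a new chunk at every heading.

    Hypothetical helper for illustration; not an mdstill API.
    """
    chunks = []
    current = {"heading": None, "lines": []}
    in_fence = False

    def flush(chunk):
        # Keep a chunk if it has a heading or any non-blank body line.
        if chunk["heading"] is not None or any(l.strip() for l in chunk["lines"]):
            chunks.append(chunk)

    for line in markdown.splitlines():
        # Track fenced code blocks so '#' comments inside them are not
        # mistaken for headings.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            flush(current)
            current = {"heading": match.group(2), "lines": []}
        else:
            current["lines"].append(line)
    flush(current)

    return [
        {"heading": c["heading"], "text": "\n".join(c["lines"]).strip()}
        for c in chunks
    ]


# Made-up sample document, used only to exercise the function.
sample = """# Q3 Report

Revenue grew.

## Regional table

| Region | Sales |
|--------|-------|
| EMEA   | 120   |

## Notes

See appendix.
"""

chunks = chunk_by_heading(sample)
for chunk in chunks:
    print(chunk["heading"], "->", len(chunk["text"]), "chars")
```

Because the split happens only at headings, a table stays in the same chunk as the section that introduces it, which is the property a RAG retriever needs. A real pipeline would likely add a size cap and split oversized sections further.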
How it's different: Deep mode runs layout-aware parsing (tables, OCR, multi-column PDFs) rather than plain text dumping. Output is roughly 40% more token-efficient than raw text, so LLM costs drop. A REST API is available for pipeline automation. The free tier requires no signup for basic use.
Built for engineers shipping AI features and knowledge workers tired of pasting broken PDF text into chat windows.


