
Sourcelibrary V2

Search and cite rare historical texts (alchemy, Hermeticism, Renaissance philosophy) with DOI-backed academic citations from [Source Library](https://sourcelibrary.org)

researchai
By Embassy-of-the-Free-Mind
112 · Updated today · TypeScript · AGPL-3.0

Installation

npx -y sourcelibrary-v2

Configuration

{
  "mcpServers": {
    "sourcelibrary-v2": {
      "command": "npx",
      "args": ["-y", "sourcelibrary-v2"]
    }
  }
}

How to use

  1. Run the installation command above (if needed)
  2. Open your Claude Code settings file (~/.claude/settings.json)
  3. Add the configuration to the mcpServers section
  4. Restart Claude Code to apply changes
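
Step 3 amounts to merging one entry into an existing JSON object without disturbing other settings. A minimal sketch, assuming the settings file is plain JSON with an optional mcpServers object as in the Configuration snippet above; the type and function names (McpServerEntry, ClaudeSettings, addMcpServer) are hypothetical and not part of any official tooling:

```typescript
// Hypothetical types: only the shape shown in the Configuration snippet is assumed.
export interface McpServerEntry {
  command: string;
  args: string[];
}

export interface ClaudeSettings {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown; // any other settings are passed through untouched
}

// Returns a new settings object with the server entry added or replaced.
// Existing servers and unrelated settings are preserved.
export function addMcpServer(
  settings: ClaudeSettings,
  name: string,
  entry: McpServerEntry
): ClaudeSettings {
  return {
    ...settings,
    mcpServers: { ...(settings.mcpServers ?? {}), [name]: entry },
  };
}

// Example: add the entry from the Configuration section to an existing file's contents.
export const updatedSettings = addMcpServer(
  { mcpServers: { other: { command: "node", args: ["server.js"] } } }, // made-up existing entry
  "sourcelibrary-v2",
  { command: "npx", args: ["-y", "sourcelibrary-v2"] }
);
```

Serializing updatedSettings with JSON.stringify(updatedSettings, null, 2) gives text that can be written back to the settings file.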
<div align="center"> <img src="./public/logo.svg" alt="Source Library Logo" width="50" height="50" style="display: inline-block; margin-right: 10px; vertical-align: middle;" />

Source Library

A digital library of historical primary sources with AI-aided OCR, translation, and scholarly curation.

License: AGPL v3 · Next.js · MongoDB · Postgres · Supabase · Gemini API

🌐 Visit Library • 📖 Docs • 🗺️ System Map • 🤝 Contribute

</div>

🎬 Experience Source Library

<div align="center"> <video width="100%" max-width="600" controls poster="https://images.sourcelibrary.org/video/hero-poster.jpg"> <source src="https://images.sourcelibrary.org/video/hero-bg.webm" type="video/webm"> <source src="https://images.sourcelibrary.org/video/hero-bg.mp4" type="video/mp4"> Your browser does not support the video tag. </video>

Explore thousands of digitized historical texts with AI-enhanced translations and scholarly curation.

</div>

🎯 About Source Library

Source Library is an open digital library dedicated to making early printed books and primary sources readable and citable. We specialize in alchemy, Hermetica, Kabbalah, Rosicrucianism, and early modern science: texts that bridge historical scholarship with contemporary exploration.

🌟 Why Source Library?

  • ✨ Originals First: Read the original language text with AI-enhanced translations alongside
  • 🎓 Citable Scholarship: Every book gets a DOI and scholarly metadata (USTC alignment, edition tracking)
  • 🏛️ Partner Subdomains: Institutions like the Bibliotheca Philosophica Hermetica curate reading rooms on their own domains
  • 🔍 Discovery: Collections, galleries of illustrations, and semantic search surface overlooked texts
  • ✅ Rigorous QA: Manual verification, image quality scoring, and OCR validation before publication

The platform ingests ~15K pages monthly from Internet Archive, Gallica, Bodleian, Wellcome, and other digital heritage partners.


🚀 Quick Start

πŸ’» Development Environment

# 🍴 Clone and install
git clone https://github.com/Embassy-of-the-Free-Mind/sourcelibrary-v2.git
cd sourcelibrary-v2
npm install

# ⚙️ Configure environment (see .env.example for required variables)
# Must include: MongoDB Atlas connection, Google Gemini API key, Vercel Blob token

# ▶️ Start dev server
npm run dev

Open http://localhost:3000


📋 Tech Stack

<table>
<tr> <td><strong>🎨 Frontend</strong></td> <td>Next.js 16, React 19, TailwindCSS, Lucide icons</td> </tr>
<tr> <td><strong>⚙️ Backend</strong></td> <td>Next.js API routes, AWS Lambda for async processing</td> </tr>
<tr> <td><strong>💾 Database</strong></td> <td>MongoDB Atlas (primary), Supabase (embeddings)</td> </tr>
<tr> <td><strong>🤖 AI/ML</strong></td> <td>Google Gemini 3.1 (OCR, translation, summarization)</td> </tr>
<tr> <td><strong>🗂️ Storage</strong></td> <td>Vercel Blob (images), AWS S3 (archive), Cloudflare R2 (archive)</td> </tr>
<tr> <td><strong>🔐 Auth</strong></td> <td>NextAuth v5 with MongoDB adapter</td> </tr>
<tr> <td><strong>🚀 DevOps</strong></td> <td>Vercel (hosting), GitHub (VCS), Playwright (E2E tests)</td> </tr>
<tr> <td><strong>✔️ Testing</strong></td> <td>Vitest (unit/integration), Playwright (E2E)</td> </tr>
<tr> <td><strong>🔎 Search</strong></td> <td>PostgreSQL FTS + semantic search via Supabase</td> </tr>
</table>

📦 Key Dependencies:

  • sharp: Image resizing and cropping
  • @google/generative-ai: Gemini API integration
  • xml2js: USTC metadata parsing
  • @modelcontextprotocol/sdk: MCP server for agent integration
  • stripe: Donation and subscription handling

📚 Core Features

📖 Reading & Navigation

  • ⚡ Page navigation: Instant navigation across books of 100+ pages
  • 🔍 Full-text search: Query across OCR'd text and translations
  • 📌 Quote generation: Copy and cite passages with DOI links

🔄 Processing Pipeline

  • 🧩 Smart split detection: Automatic gutter detection for two-page spreads (Gemini AI or ML-based)
  • ✍️ High-accuracy OCR: Gemini Vision API with language-specific models (Latin, German, Greek, Arabic, etc.)
  • 🗣️ Context-aware translation: Maintains continuity across pages for scholarly accuracy
  • 🖼️ Gallery extraction: AI-powered detection and cataloging of illustrations

🎨 Curation & Discovery

  • 📑 Themed collections: Editorial collections (e.g., "Alchemy & Transmutation," "Kabbalah & Mysticism")
  • 🏛️ Gallery browsing: Curated images from all books (museum-quality metadata)
  • 📚 Related editions: Link across translations, reprints, and derivative works
  • 🔗 Authority linking: Connect to USTC, VIAF, and other scholarly databases

📤 Scholarly Export

  • 📱 EPUB generation: Multi-format ebook export
  • 📄 PDF with annotations: Preserve layout, add scholarly notes
  • 🆔 DOI minting: Version books via Zenodo integration for long-term citation

🏢 Tenant Subdomains

  • 🏛️ Isolated reading rooms: Partners host curated subsets on custom domains (e.g., bph.sourcelibrary.org)
  • 🎨 Branding & navigation: Full UI customization per tenant
  • 🔒 Access control: Public or members-only collections

🏗️ Architecture Overview

📊 Data Model

📚 Books contain structured metadata:

  • 🏛️ Bibliographic: Title, author, language, publication date, USTC ID
  • 🖼️ Images: Links to source (Internet Archive, Gallica, etc.), archival status
  • ⚙️ Processing: OCR status, translation language, extraction metadata
  • 🎨 Curation: Collections, tier (featured/standard), visibility flags

📄 Pages store individual page data:

  • 📸 Original image: Source photo or PDF page
  • ✂️ Split coordinates: Crop boundaries (0-1000 scale) for two-page spreads
  • ✍️ OCR output: Raw Gemini extraction + language metadata
  • 🗣️ Translation: English translation with scholarly notes
  • 🖼️ Illustrations: Detected images with quality scores and descriptions

🖼️ Gallery images are extracted illustrations:

  • 🏷️ Metadata: Subject, figures, symbols, style, techniques, period
  • ⭐ Quality score: Rating from 0 to 1.0 (images scoring below 0.5 are filtered out)
  • 🔗 Provenance: Source book and page, linked back
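
The data model above could be sketched as TypeScript types. This is an illustration only: every field name here is an assumption derived from the bullet lists, not the actual MongoDB schema, and the 0.5 cut-off is applied as "keep scores of 0.5 and above".

```typescript
// Hypothetical shapes; the real collections may use different field names.
export interface Book {
  title: string;
  author: string;
  language: string;
  publicationDate: string;
  ustcId?: string;
  collections: string[];
  tier: "featured" | "standard";
  visible: boolean;
}

export interface GalleryImage {
  bookId: string;       // provenance: source book
  pageNumber: number;   // provenance: source page
  subject: string;
  qualityScore: number; // rating from 0 to 1.0
}

// Threshold taken from the "Quality score" bullet above.
export const MIN_QUALITY_SCORE = 0.5;

// Keeps only images at or above the threshold; the input array is not modified.
export function filterByQuality(
  images: GalleryImage[],
  minScore: number = MIN_QUALITY_SCORE
): GalleryImage[] {
  return images.filter((image) => image.qualityScore >= minScore);
}
```

Whether a score of exactly 0.5 passes is not stated in the source; the sketch treats it as passing.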

🎨 Image Tier System

All page images are resized on-demand via /api/image:

| Tier | Dimensions | Quality | 📱 Use Case |
|---|---|---|---|
| Thumbnail | 400px wide | 70% JPEG | Grids, navigation, social sharing |
| Display | 1200px wide | 80% JPEG | Main reading view, comfortable for annotation |
| Full | 2400px wide | 90% JPEG | Magnifier, fullscreen detail, printing |
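
The tiers can be captured as a small lookup table when building /api/image URLs. A sketch under assumptions: the url and w query parameters come from the API Routes table in this README, but the tier names as identifiers, the imageUrl helper, and the idea that w accepts all three widths are illustrative guesses.

```typescript
export type TierName = "thumbnail" | "display" | "full";

export interface TierSpec {
  width: number;       // pixels
  jpegQuality: number; // percent; applied server-side, not sent by this sketch
}

// Values copied from the tier table.
export const IMAGE_TIERS: Record<TierName, TierSpec> = {
  thumbnail: { width: 400, jpegQuality: 70 },
  display: { width: 1200, jpegQuality: 80 },
  full: { width: 2400, jpegQuality: 90 },
};

// Hypothetical helper: builds a resize URL for a source image at a given tier.
export function imageUrl(baseUrl: string, sourceUrl: string, tier: TierName): string {
  const width = IMAGE_TIERS[tier].width;
  return `${baseUrl}/api/image?url=${encodeURIComponent(sourceUrl)}&w=${width}`;
}

// Example with a made-up source image URL.
export const thumbnailUrl = imageUrl(
  "https://sourcelibrary.org",
  "https://example.org/scans/page 1.jpg",
  "thumbnail"
);
```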

Split pages are cropped non-destructively via coordinates; original images are always preserved.
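
Non-destructive cropping means the stored coordinates are converted to pixels only when an image is rendered. A minimal sketch of that conversion, assuming the 0-1000 scale is relative to image width and height and that boundaries are stored as left/top/right/bottom; the SplitBox shape and function name are hypothetical:

```typescript
// Hypothetical shape for crop boundaries on a 0-1000 scale.
export interface SplitBox {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

export interface PixelRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Converts scale-independent boundaries to a pixel rectangle for a given image size.
// Edges are rounded first so adjacent halves of a spread share the same gutter pixel.
export function toPixelRect(box: SplitBox, imageWidth: number, imageHeight: number): PixelRect {
  const x = Math.round((box.left / 1000) * imageWidth);
  const y = Math.round((box.top / 1000) * imageHeight);
  const right = Math.round((box.right / 1000) * imageWidth);
  const bottom = Math.round((box.bottom / 1000) * imageHeight);
  return { x, y, width: right - x, height: bottom - y };
}

// Example: left and right halves of a 3000x2000 spread split at the midpoint.
export const leftPage = toPixelRect({ left: 0, top: 0, right: 500, bottom: 1000 }, 3000, 2000);
export const rightPage = toPixelRect({ left: 500, top: 0, right: 1000, bottom: 1000 }, 3000, 2000);
```

Because the coordinates are relative, the same stored box works for any tier width.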

🔄 Processing Pipeline

📥 Import → ✂️ Split Detection → ✍️ OCR → 🗣️ Translation → 🎨 Enrichment → 🌍 Publishing
  1. 📥 Import: Upload images, import from IA/Gallica via IIIF, or paste URLs
  2. ✂️ Split Detection: Detect two-page spreads; mark crop boundaries
  3. ✍️ OCR: Gemini Vision extracts text per page language
  4. 🗣️ Translation: Gemini translates to English with prior-page context for continuity
  5. 🎨 Enrichment: Extract illustrations, generate summaries, assign collections
  6. 🌍 Publishing: Set visible: true, mint DOI, push to search index

Batch endpoints process up to 5 pages/request using Gemini Batch API (50% cheaper).
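
The stage order and the 5-pages-per-request limit can be sketched as follows. The stage identifiers and helper names are assumptions for illustration; only the order of stages and the batch size of 5 come from the text above.

```typescript
// Stage order as listed above; the string identifiers are hypothetical.
export const PIPELINE_STAGES = [
  "import",
  "split-detection",
  "ocr",
  "translation",
  "enrichment",
  "publishing",
] as const;

export type Stage = (typeof PIPELINE_STAGES)[number];

// Returns the stage that follows the given one, or null after publishing.
export function nextStage(stage: Stage): Stage | null {
  const index = PIPELINE_STAGES.indexOf(stage);
  return PIPELINE_STAGES[index + 1] ?? null;
}

// Maximum pages per batch request, from the sentence above.
export const MAX_PAGES_PER_REQUEST = 5;

// Splits a list of pages (or page ids) into request-sized batches, preserving order.
export function chunkPages<T>(pages: T[], size: number = MAX_PAGES_PER_REQUEST): T[][] {
  if (size < 1) {
    throw new Error("size must be at least 1");
  }
  const batches: T[][] = [];
  for (let start = 0; start < pages.length; start += size) {
    batches.push(pages.slice(start, start + size));
  }
  return batches;
}
```

A 12-page book would yield three batches of 5, 5, and 2 pages, each sent as one request to a batch endpoint.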

🔌 API Routes (Key)

Base URL: https://sourcelibrary.org (production) or http://localhost:3000 (local dev with .env.local configured).

Common 404 mistake: paths like /api/bph/books or /api/bph/books/[id] do not exist. BPH catalogue APIs live under /api/embed/bph/.... There is also no top-level /api/[tenant]/books route; tenant book listings use /api/books/library or the embed routes below.

Public read APIs (no auth required today)

| Endpoint | Method | Purpose |
|---|---|---|
| `/api/search?q=<query>` | GET | Full-text search across books and page translations |
| `/api/books?limit=100&offset=0` | GET | Simple book list (global catalogue; visible: true, indexed only) |
| `/api/books/library?limit=100&skip=0` | GET | Rich browse API: search, sort, filters, collections |
| `/api/books/[id]` | GET | Book metadata (accepts Mongo id or slug) |
| `/api/books/[id]/quote?page=<n>` | GET | Citable quote + formatted citations (inline, footnote, BibTeX, DOI) |
| `/api/gallery?limit=24` | GET | Illustration / artwork search |
| `/api/image?url=<encoded-url>&w=400` | GET | On-demand image resize & crop |
| `/api/embed/bph/books?limit=24` | GET | BPH catalogue (paginated, searchable) |
| `/api/embed/bph/books/[slug]` | GET | Single BPH book detail |
| `/api/embed/bph/featured` | GET | Featured BPH books |
| `/api/embed/bph/collections` | GET | BPH collection list |
| `/api/embed/bph/languages` | GET | BPH language facets |
| `/api/embed/bph/suggest?q=alch` | GET | BPH search autocomplete |
| `/api/embed/bph/stats` | GET | BPH catalogue stats |
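
A small URL builder keeps query encoding consistent when calling these read endpoints from code. This is a sketch, not part of the project: buildUrl, searchUrl, booksUrl, and quoteUrl are hypothetical names, and only the paths and parameter names (q, limit, offset, page) are taken from the table.

```typescript
export const BASE_URL = "https://sourcelibrary.org";

type QueryValue = string | number | boolean;

// Joins a path and query parameters onto a base URL, percent-encoding keys and values.
export function buildUrl(
  path: string,
  query: Record<string, QueryValue> = {},
  baseUrl: string = BASE_URL
): string {
  const pairs = Object.keys(query).map(
    (key) => `${encodeURIComponent(key)}=${encodeURIComponent(String(query[key]))}`
  );
  return pairs.length > 0 ? `${baseUrl}${path}?${pairs.join("&")}` : `${baseUrl}${path}`;
}

// Full-text search across books and page translations.
export function searchUrl(q: string, limit: number = 5): string {
  return buildUrl("/api/search", { q, limit });
}

// Simple book list with limit/offset paging.
export function booksUrl(limit: number = 100, offset: number = 0): string {
  return buildUrl("/api/books", { limit, offset });
}

// Citable quote for one page of a book; idOrSlug is a Mongo id or slug.
export function quoteUrl(idOrSlug: string, page: number): string {
  return buildUrl(`/api/books/${encodeURIComponent(idOrSlug)}/quote`, { page });
}
```

The resulting strings can be passed to fetch or pasted into curl. Response shapes are not documented in this README, so the sketch stops at URL construction.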

Tenant-scoped listing (not /api/bph/...)

Use one of these patterns to filter by partner tenant (e.g. BPH):

| Approach | Example |
|---|---|
| Embed prefix (recommended for BPH) | `GET /api/embed/bph/books?limit=24` |
| Library API + query param | `GET /api/books/library?tenant_slug=bph&limit=24` |
| Host header (subdomain) | Call `https://bph.sourcelibrary.org/api/books/library?limit=24` (the proxy injects tenant context) |
| Manual header (advanced) | `curl -H "x-tenant-slug: bph" "https://sourcelibrary.org/api/books/library?limit=24"` |

The /api/[tenant]/books/[id]/... paths that exist in the codebase are editor/processing routes (batch OCR, index rebuild, etc.), not public catalogue listings.
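
The four tenant-scoping patterns differ only in where the tenant slug goes: path, query string, hostname, or header. A sketch that makes the difference explicit; tenantBooksRequest and its types are hypothetical, and the embed route is only documented here for bph, so using it with another slug is an assumption.

```typescript
export type TenantApproach = "embed" | "query" | "subdomain" | "header";

export interface RequestSpec {
  url: string;
  headers: Record<string, string>;
}

// Builds the URL and headers for a tenant-scoped book listing.
export function tenantBooksRequest(
  tenant: string,
  approach: TenantApproach,
  limit: number = 24
): RequestSpec {
  const root = "https://sourcelibrary.org";
  switch (approach) {
    case "embed": // slug in the path
      return { url: `${root}/api/embed/${tenant}/books?limit=${limit}`, headers: {} };
    case "query": // slug as tenant_slug query parameter
      return { url: `${root}/api/books/library?tenant_slug=${tenant}&limit=${limit}`, headers: {} };
    case "subdomain": // slug in the hostname; the proxy injects tenant context
      return { url: `https://${tenant}.sourcelibrary.org/api/books/library?limit=${limit}`, headers: {} };
    case "header": // slug in the x-tenant-slug header
      return { url: `${root}/api/books/library?limit=${limit}`, headers: { "x-tenant-slug": tenant } };
  }
}

// Example: the recommended BPH request.
export const bphEmbedRequest = tenantBooksRequest("bph", "embed");
```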

Authenticated / internal APIs

These require a signed-in session cookie, editor role, or (for some dataset endpoints) a Bearer API key. Calling them without auth returns 401 or 403.

| Endpoint | Method | Purpose |
|---|---|---|
| `/api/books` | POST | Create a new book (editor) |
| `/api/books/[id]` | PATCH | Update book metadata (curator+) |
| `/api/books/[id]/batch-ocr-async` | POST | Queue batch OCR job |
| `/api/books/[id]/batch-translate-async` | POST | Queue batch translation job |
| `/api/pages/[id]` | PATCH | Update page OCR/translation |
| `/api/jobs/[id]/process` | POST | Async job processor (Lambda) |
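
Client code calling these routes needs to attach credentials and tell the two auth failures apart. A sketch under assumptions: the Bearer scheme is mentioned above for some dataset endpoints, but which routes accept it is not specified, and the helper names are hypothetical. The 401/403 reading follows standard HTTP semantics rather than anything documented for this API.

```typescript
// Builds request headers, adding a Bearer token only when an API key is supplied.
export function authHeaders(apiKey?: string): Record<string, string> {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) {
    headers["Authorization"] = `Bearer ${apiKey}`;
  }
  return headers;
}

export type AuthOutcome = "ok" | "unauthenticated" | "forbidden" | "other-error";

// Maps an HTTP status code to a coarse outcome:
// 401 usually means no valid credentials, 403 means the role is insufficient.
export function classifyStatus(status: number): AuthOutcome {
  if (status >= 200 && status < 300) {
    return "ok";
  }
  if (status === 401) {
    return "unauthenticated";
  }
  if (status === 403) {
    return "forbidden";
  }
  return "other-error";
}
```

A caller might prompt for sign-in on "unauthenticated" and show a permissions message on "forbidden".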

Full narrative API walkthrough: docs/blog-source-library-api.md. MCP tools (search, quote, read): mcp-server/README.md.

🧪 Trying the API (curl, Postman, browser)

All examples below hit production and need no API key. Replace the base URL with http://localhost:3000 when running locally (MongoDB + env vars required).

curl

# Search translated text
curl -s "https://sourcelibrary.org/api/search?q=quintessence&limit=5" | jq .

# List books (global catalogue)
curl -s "https://sourcelibrary.org/api/books?limit=5" | jq .

# Browse with filters and sort
curl -s "https://sourcelibrary.org/api/books/library?limit=5&sort=recent-translation&has_translation=true" | jq .

# BPH catalogue (note: /api/embed/bph/, NOT /api/bph/)
curl -s "https://sourcelibrary.org/api/embed/bph/books?limit=5&translated=true" | jq .

# BPH via 

…