Source Library

A digital library of historical primary sources with AI-aided OCR, translation, and scholarly curation.

🌐 Visit Library • 📖 Docs • 🗺️ System Map • 🤝 Contribute

</div>

🎬 Experience Source Library

<div align="center"> <video width="100%" max-width="600" controls poster="https://images.sourcelibrary.org/video/hero-poster.jpg"> <source src="https://images.sourcelibrary.org/video/hero-bg.webm" type="video/webm"> <source src="https://images.sourcelibrary.org/video/hero-bg.mp4" type="video/mp4"> Your browser does not support the video tag. </video>

Explore thousands of digitized historical texts with AI-enhanced translations and scholarly curation.

</div>

🎯 About Source Library

Source Library is an open digital library dedicated to making early printed books and primary sources readable and citable. We specialize in alchemy, Hermetica, Kabbalah, Rosicrucianism, and early modern science—texts that bridge historical scholarship with contemporary exploration.

🌟 Why Source Library?

✨ Originals First — Read the original language text with AI-enhanced translations alongside
🎓 Citable Scholarship — Every book gets a DOI and scholarly metadata (USTC alignment, edition tracking)
🏛️ Partner Subdomains — Institutions like the Bibliotheca Philosophica Hermetica curate reading rooms on their own domains
🔍 Discovery — Collections, galleries of illustrations, and semantic search surface overlooked texts
✅ Rigorous QA — Manual verification, image quality scoring, and OCR validation before publication

The platform ingests ~15K pages monthly from Internet Archive, Gallica, Bodleian, Wellcome, and other digital heritage partners.

🚀 Quick Start

💻 Development Environment

# 🍴 Clone and install
git clone https://github.com/Embassy-of-the-Free-Mind/sourcelibrary-v2.git
cd sourcelibrary-v2
npm install

# ⚙️ Configure environment (see .env.example for required variables)
# Must include: MongoDB Atlas connection, Google Gemini API key, Vercel Blob token

# ▶️ Start dev server
npm run dev

Open http://localhost:3000

📋 Tech Stack

<table> <tr> <td><strong>🎨 Frontend</strong></td> <td>Next.js 16, React 19, TailwindCSS, Lucide icons</td> </tr> <tr> <td><strong>⚙️ Backend</strong></td> <td>Next.js API routes, AWS Lambda for async processing</td> </tr> <tr> <td><strong>💾 Database</strong></td> <td>MongoDB Atlas (primary), Supabase (embeddings)</td> </tr> <tr> <td><strong>🤖 AI/ML</strong></td> <td>Google Gemini 3.1 (OCR, translation, summarization)</td> </tr> <tr> <td><strong>🗂️ Storage</strong></td> <td>Vercel Blob (images), AWS S3 (archive), Cloudflare R2 (archive)</td> </tr> <tr> <td><strong>🔐 Auth</strong></td> <td>NextAuth v5 with MongoDB adapter</td> </tr> <tr> <td><strong>🚀 DevOps</strong></td> <td>Vercel (hosting), GitHub (VCS), Playwright (E2E tests)</td> </tr> <tr> <td><strong>✔️ Testing</strong></td> <td>Vitest (unit/integration), Playwright (E2E)</td> </tr> <tr> <td><strong>🔎 Search</strong></td> <td>PostgreSQL FTS + semantic search via Supabase</td> </tr> </table>

📦 Key Dependencies:

sharp — Image resizing and cropping
@google/generative-ai — Gemini API integration
xml2js — USTC metadata parsing
@modelcontextprotocol/sdk — MCP server for agent integration
stripe — Donation and subscription handling

📚 Core Features

📖 Reading & Navigation

⚡ Pagination — Instant navigation within books of 100+ pages
🔍 Full-text search — Query across OCR'd text and translations
📌 Quote generation — Copy and cite passages with DOI links

🔄 Processing Pipeline

🧩 Smart split detection — Automatic gutter detection for two-page spreads (Gemini AI or ML-based)
✍️ High-accuracy OCR — Gemini Vision API with language-specific models (Latin, German, Greek, Arabic, etc.)
🗣️ Context-aware translation — Maintains continuity across pages for scholarly accuracy
🖼️ Gallery extraction — AI-powered detection and cataloging of illustrations
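
To illustrate the context-aware translation idea, here is a minimal sketch that pairs each page's OCR text with the tail of the page before it, so a translation prompt can carry continuity across page breaks. The function names, the prompt wording, and the 500-character context window are assumptions made for this example; they are not taken from the Source Library codebase.

```python
from typing import Iterator, List, Optional, Tuple


def with_prior_context(
    pages: List[str], max_context_chars: int = 500
) -> Iterator[Tuple[Optional[str], str]]:
    """Yield (prior_context, page_text) pairs in reading order.

    The first page has no context; later pages get the tail of the
    previous page's text, truncated to max_context_chars characters.
    """
    previous: Optional[str] = None
    for text in pages:
        context = previous[-max_context_chars:] if previous else None
        yield context, text
        previous = text


def build_prompt(context: Optional[str], text: str) -> str:
    """Assemble a hypothetical translation prompt for one page."""
    parts = []
    if context:
        parts.append("Previous page (for continuity only):\n" + context)
    parts.append("Translate this page into English:\n" + text)
    return "\n\n".join(parts)


if __name__ == "__main__":
    ocr_pages = ["Prima pagina ...", "Secunda pagina ...", "Tertia pagina ..."]
    for context, text in with_prior_context(ocr_pages):
        print(build_prompt(context, text))
        print("---")
```

Passing only a bounded tail of the previous page keeps each request small while still letting a sentence that runs across a page break be read in context.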

🎨 Curation & Discovery

📑 Themed collections — Editorial collections (e.g., "Alchemy & Transmutation," "Kabbalah & Mysticism")
🏛️ Gallery browsing — Curated images from all books (museum-quality metadata)
📚 Related editions — Link across translations, reprints, and derivative works
🔗 Authority linking — Connect to USTC, VIAF, and other scholarly databases

📤 Scholarly Export

📱 EPUB generation — Multi-format ebook export
📄 PDF with annotations — Preserve layout, add scholarly notes
🆔 DOI minting — Version books via Zenodo integration for long-term citation

🏢 Tenant Subdomains

🏛️ Isolated reading rooms — Partners host curated subsets on custom domains (e.g., bph.sourcelibrary.org)
🎨 Branding & navigation — Full UI customization per tenant
🔒 Access control — Public or members-only collections
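
As a rough sketch of how a partner subdomain such as bph.sourcelibrary.org could be mapped to a tenant, the helper below pulls a slug out of a Host header value. The function name and the rule that `www` belongs to the main site are assumptions for illustration; the actual routing logic may differ.

```python
from typing import Optional

BASE_DOMAIN = "sourcelibrary.org"  # from the subdomain example above


def tenant_slug_from_host(host: str, base_domain: str = BASE_DOMAIN) -> Optional[str]:
    """Return the tenant slug for a partner subdomain, or None for the main site.

    Hypothetical helper, not the project's own implementation.
    """
    # Drop an optional port, normalise case, and strip a trailing dot.
    hostname = host.split(":", 1)[0].strip().lower().rstrip(".")
    suffix = "." + base_domain
    if not hostname.endswith(suffix):
        return None  # the base domain itself, or an unrelated host
    label = hostname[: -len(suffix)]
    # Only a single-label subdomain counts as a tenant; "www" is the main site.
    if not label or "." in label or label == "www":
        return None
    return label


print(tenant_slug_from_host("bph.sourcelibrary.org"))  # bph
print(tenant_slug_from_host("sourcelibrary.org"))      # None
```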

🏗️ Architecture Overview

📊 Data Model

📚 Books contain structured metadata:

🏛️ Bibliographic — Title, author, language, publication date, USTC ID
🖼️ Images — Links to source (Internet Archive, Gallica, etc.), archival status
⚙️ Processing — OCR status, translation language, extraction metadata
🎨 Curation — Collections, tier (featured/standard), visibility flags

📄 Pages store individual page data:

📸 Original image — Source photo or PDF page
✂️ Split coordinates — Crop boundaries (0-1000 scale) for two-page spreads
✍️ OCR output — Raw Gemini extraction + language metadata
🗣️ Translation — English translation with scholarly notes
🖼️ Illustrations — Detected images with quality scores and descriptions
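
The split coordinates lend themselves to a small worked example. The sketch below converts a crop box on the 0-1000 scale into pixel coordinates for a given image size. It assumes the box is stored as (left, top, right, bottom); the actual field layout is not documented here, so treat that ordering and the function name as hypothetical.

```python
from typing import Tuple

SCALE = 1000  # coordinates run from 0 to 1000, per the data model above


def to_pixel_box(
    box: Tuple[int, int, int, int], width: int, height: int
) -> Tuple[int, int, int, int]:
    """Convert a (left, top, right, bottom) box on the 0-1000 scale to pixels."""
    left, top, right, bottom = box
    if not (0 <= left < right <= SCALE and 0 <= top < bottom <= SCALE):
        raise ValueError(f"invalid crop box: {box}")
    return (
        round(left * width / SCALE),
        round(top * height / SCALE),
        round(right * width / SCALE),
        round(bottom * height / SCALE),
    )


# A 3000 x 2000 px spread whose gutter sits at 49.5% of the width:
left_page = to_pixel_box((0, 0, 495, 1000), 3000, 2000)
right_page = to_pixel_box((495, 0, 1000, 1000), 3000, 2000)
print(left_page)   # (0, 0, 1485, 2000)
print(right_page)  # (1485, 0, 3000, 2000)
```

Storing the box on a resolution-independent scale means the same coordinates apply to every resized copy of the image, which fits the non-destructive cropping described in this README.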

🖼️ Gallery images are extracted illustrations:

🏷️ Metadata — Subject, figures, symbols, style, techniques, period
⭐ Quality score — Rating from 0.0 to 1.0; images scoring below 0.5 are filtered out
🔗 Provenance — Source book and page, linked back
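
A minimal sketch of the quality filter described above, assuming a gallery image record with a numeric score. The dataclass and its field names are illustrative only and do not reflect the real schema.

```python
from dataclasses import dataclass
from typing import List

MIN_QUALITY = 0.5  # threshold stated above


@dataclass
class GalleryImage:
    # Field names are illustrative, not the actual schema.
    book_id: str
    page_number: int
    subject: str
    quality: float  # 0.0 to 1.0


def publishable(
    images: List[GalleryImage], threshold: float = MIN_QUALITY
) -> List[GalleryImage]:
    """Keep images at or above the threshold, highest score first."""
    kept = [img for img in images if img.quality >= threshold]
    return sorted(kept, key=lambda img: img.quality, reverse=True)


candidates = [
    GalleryImage("book-a", 12, "alembic", 0.82),
    GalleryImage("book-a", 40, "smudged border", 0.31),
    GalleryImage("book-b", 3, "title-page emblem", 0.5),
]
for img in publishable(candidates):
    print(img.book_id, img.page_number, img.quality)
```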

🎨 Image Tier System

All page images are resized on-demand via /api/image:

| Tier | Dimensions | Quality | 📱 Use Case |
| --- | --- | --- | --- |
| Thumbnail | 400px wide | 70% JPEG | Grids, navigation, social sharing |
| Display | 1200px wide | 80% JPEG | Main reading view, comfortable for annotation |
| Full | 2400px wide | 90% JPEG | Magnifier, fullscreen detail, printing |
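
The tier widths above can be turned into request URLs with a few lines of code. This sketch uses only the `url` and `w` query parameters that appear in the route table of this README; it sends no quality value, since no such parameter is documented here, and the tier names used as dictionary keys are an assumption.

```python
from urllib.parse import urlencode

# Widths from the tier table; the lowercase keys are illustrative.
TIER_WIDTHS = {"thumbnail": 400, "display": 1200, "full": 2400}


def image_url(source_url: str, tier: str, base: str = "https://sourcelibrary.org") -> str:
    """Build an /api/image URL for one of the three tiers."""
    try:
        width = TIER_WIDTHS[tier]
    except KeyError:
        raise ValueError(f"unknown tier: {tier!r}") from None
    # urlencode percent-encodes the source URL so it survives as one parameter.
    query = urlencode({"url": source_url, "w": width})
    return f"{base}/api/image?{query}"


print(image_url("https://example.org/scans/page 1.jpg", "thumbnail"))
```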

Split pages are cropped non-destructively via coordinates; original images are always preserved.

🔄 Processing Pipeline

📥 Import → ✂️ Split Detection → ✍️ OCR → 🗣️ Translation → 🎨 Enrichment → 🌍 Publishing

📥 Import — Upload images, import from IA/Gallica via IIIF, or paste URLs
✂️ Split Detection — Detect two-page spreads; mark crop boundaries
✍️ OCR — Gemini Vision extracts text per page language
🗣️ Translation — Gemini translates to English with prior-page context for continuity
🎨 Enrichment — Extract illustrations, generate summaries, assign collections
🌍 Publishing — Set visible: true, mint DOI, push to search index
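
The six stages above form a strict sequence, which can be sketched as an ordered enum with a lookup for the next step. The identifiers are hypothetical; how the codebase actually tracks stage status is not shown in this README.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    IMPORT = "import"
    SPLIT_DETECTION = "split_detection"
    OCR = "ocr"
    TRANSLATION = "translation"
    ENRICHMENT = "enrichment"
    PUBLISHING = "publishing"


ORDER = list(Stage)  # Enum iteration preserves definition order


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage that follows `current`, or None after publishing."""
    index = ORDER.index(current)
    return ORDER[index + 1] if index + 1 < len(ORDER) else None


stage: Optional[Stage] = Stage.IMPORT
while stage is not None:
    print(stage.value)
    stage = next_stage(stage)
```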

Batch endpoints process up to 5 pages per request through the Gemini Batch API, which is 50% cheaper than standard calls.
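
A minimal sketch of how a client might group pages to respect the 5-pages-per-request limit. The page IDs are made up, and the helper is an illustration, not the project's own batching code.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")
MAX_PAGES_PER_REQUEST = 5  # limit stated above


def batches(items: Sequence[T], size: int = MAX_PAGES_PER_REQUEST) -> Iterator[List[T]]:
    """Yield consecutive groups of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start : start + size])


page_ids = [f"page-{n}" for n in range(1, 13)]  # 12 hypothetical page IDs
for group in batches(page_ids):
    print(len(group), group[0], "...", group[-1])
# 12 pages -> groups of 5, 5, and 2
```

At the roughly 15K pages per month mentioned earlier, this limit works out to at least 3,000 batch requests per pipeline stage.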

🔌 API Routes (Key)

Base URL: https://sourcelibrary.org (production) or http://localhost:3000 (local dev with .env.local configured).

Common 404 mistake: paths like /api/bph/books or /api/bph/books/[id] do not exist. BPH catalogue APIs live under /api/embed/bph/.... There is also no top-level /api/[tenant]/books route — tenant book listings use /api/books/library or the embed routes below.

Public read APIs (no auth required today)

| Endpoint | Method | Purpose |
| --- | --- | --- |
| `/api/search?q=<query>` | GET | Full-text search across books and page translations |
| `/api/books?limit=100&offset=0` | GET | Simple book list (global catalogue; `visible: true`, indexed only) |
| `/api/books/library?limit=100&skip=0` | GET | Rich browse API: search, sort, filters, collections |
| `/api/books/[id]` | GET | Book metadata (accepts Mongo `id` or `slug`) |
| `/api/books/[id]/quote?page=<n>` | GET | Citable quote + formatted citations (inline, footnote, BibTeX, DOI) |
| `/api/gallery?limit=24` | GET | Illustration / artwork search |
| `/api/image?url=<encoded-url>&w=400` | GET | On-demand image resize & crop |
| `/api/embed/bph/books?limit=24` | GET | BPH catalogue (paginated, searchable) |
| `/api/embed/bph/books/[slug]` | GET | Single BPH book detail |
| `/api/embed/bph/featured` | GET | Featured BPH books |
| `/api/embed/bph/collections` | GET | BPH collection list |
| `/api/embed/bph/languages` | GET | BPH language facets |
| `/api/embed/bph/suggest?q=alch` | GET | BPH search autocomplete |
| `/api/embed/bph/stats` | GET | BPH catalogue stats |
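
For scripted access to the public endpoints, a small standard-library client is enough. The sketch below builds URLs from the paths and query parameters listed in the table and decodes JSON responses. The helper names are hypothetical, and because response shapes are not documented in this section, the code returns the parsed JSON without assuming any fields.

```python
import json
from typing import Any, Dict, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://sourcelibrary.org"


def api_url(path: str, params: Optional[Dict[str, Any]] = None, base: str = BASE_URL) -> str:
    """Join base, path, and query string; None-valued params are dropped."""
    url = base.rstrip("/") + "/" + path.lstrip("/")
    clean = {k: v for k, v in (params or {}).items() if v is not None}
    return f"{url}?{urlencode(clean)}" if clean else url


def get_json(path: str, params: Optional[Dict[str, Any]] = None, timeout: float = 10.0) -> Any:
    """GET a public endpoint and decode the JSON body (performs a network call)."""
    with urlopen(api_url(path, params), timeout=timeout) as response:
        return json.load(response)


# Building URLs needs no network access:
print(api_url("/api/search", {"q": "quintessence", "limit": 5}))
print(api_url("/api/books/library", {"limit": 100, "skip": 0}))
```

Calling `get_json("/api/search", {"q": "quintessence", "limit": 5})` would then perform the request; pass `base="http://localhost:3000"` to `api_url` when working against a local dev server.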

Tenant-scoped listing (not `/api/bph/...`)

Use one of these patterns to filter by partner tenant (e.g. BPH):

| Approach | Example |
| --- | --- |
| Embed prefix (recommended for BPH) | `GET /api/embed/bph/books?limit=24` |
| Library API + query param | `GET /api/books/library?tenant_slug=bph&limit=24` |
| Host header (subdomain) | Call `https://bph.sourcelibrary.org/api/books/library?limit=24`; the proxy injects tenant context |
| Manual header (advanced) | `curl -H "x-tenant-slug: bph" "https://sourcelibrary.org/api/books/library?limit=24"` |

The /api/[tenant]/books/[id]/... paths that exist in the codebase are editor/processing routes (batch OCR, index rebuild, etc.) — not public catalogue listings.
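
The four tenant-scoping patterns can be compared side by side in code. This sketch returns a URL and any extra headers for each pattern. Note one assumption: the embed prefix is documented here only for BPH, so substituting another slug into `/api/embed/<slug>/` is a guess rather than a documented route.

```python
from typing import Dict, NamedTuple
from urllib.parse import urlencode


class RequestSpec(NamedTuple):
    url: str
    headers: Dict[str, str]


def tenant_requests(slug: str, limit: int = 24) -> Dict[str, RequestSpec]:
    """Return the four listing patterns from the table for one tenant slug."""
    root = "https://sourcelibrary.org"
    q = urlencode({"limit": limit})
    scoped_q = urlencode({"tenant_slug": slug, "limit": limit})
    return {
        # Documented for "bph"; other slugs are an assumption.
        "embed": RequestSpec(f"{root}/api/embed/{slug}/books?{q}", {}),
        "query_param": RequestSpec(f"{root}/api/books/library?{scoped_q}", {}),
        "subdomain": RequestSpec(f"https://{slug}.sourcelibrary.org/api/books/library?{q}", {}),
        "header": RequestSpec(f"{root}/api/books/library?{q}", {"x-tenant-slug": slug}),
    }


for name, spec in tenant_requests("bph").items():
    print(f"{name:12} {spec.url} {spec.headers}")
```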

Authenticated / internal APIs

These require a signed-in session cookie, editor role, or (for some dataset endpoints) a Bearer API key. Calling them without auth returns 401 or 403.

| Endpoint | Method | Purpose |
| --- | --- | --- |
| `/api/books` | POST | Create a new book (editor) |
| `/api/books/[id]` | PATCH | Update book metadata (curator+) |
| `/api/books/[id]/batch-ocr-async` | POST | Queue batch OCR job |
| `/api/books/[id]/batch-translate-async` | POST | Queue batch translation job |
| `/api/pages/[id]` | PATCH | Update page OCR/translation |
| `/api/jobs/[id]/process` | POST | Async job processor (Lambda) |
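
A hedged sketch of how a script might prepare a request for one of these routes and interpret an auth failure. It assumes a Bearer API key, which this README says applies only to some dataset endpoints; session-cookie auth is not covered. The book ID, the payload, and the helper names are placeholders, and the 401 versus 403 wording follows general HTTP conventions rather than anything specific to this API.

```python
import json
from typing import Dict, Optional
from urllib.request import Request


def build_request(
    url: str,
    method: str = "GET",
    api_key: Optional[str] = None,
    body: Optional[bytes] = None,
) -> Request:
    """Create a urllib Request, adding a Bearer token when a key is given."""
    headers: Dict[str, str] = {"Accept": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    if body is not None:
        headers["Content-Type"] = "application/json"
    return Request(url, data=body, headers=headers, method=method)


def explain_status(status: int) -> str:
    """Map an HTTP status code to a short hint for the caller."""
    if status == 401:
        return "not authenticated: sign in or supply credentials"
    if status == 403:
        return "authenticated but not permitted for this route"
    if 200 <= status < 300:
        return "ok"
    return f"unexpected status {status}"


# Placeholder ID and payload; nothing is sent over the network here.
payload = json.dumps({"title": "Example title"}).encode("utf-8")
req = build_request(
    "https://sourcelibrary.org/api/books/example-id",
    method="PATCH",
    api_key="YOUR_API_KEY",
    body=payload,
)
print(req.get_method(), req.full_url)
print(explain_status(401))
```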

Full narrative API walkthrough: docs/blog-source-library-api.md. MCP tools (search, quote, read): mcp-server/README.md.

🧪 Trying the API (curl, Postman, browser)

All examples below hit production and need no API key. Replace the base URL with http://localhost:3000 when running locally (MongoDB + env vars required).

curl

# Search translated text
curl -s "https://sourcelibrary.org/api/search?q=quintessence&limit=5" | jq .

# List books (global catalogue)
curl -s "https://sourcelibrary.org/api/books?limit=5" | jq .

# Browse with filters and sort
curl -s "https://sourcelibrary.org/api/books/library?limit=5&sort=recent-translation&has_translation=true" | jq .

# BPH catalogue — note /api/embed/bph/, NOT /api/bph/
curl -s "https://sourcelibrary.org/api/embed/bph/books?limit=5&translated=true" | jq .

# BPH via 

…
