Source Library
A digital library of historical primary sources with AI-aided OCR, translation, and scholarly curation.
Visit Library • Docs • System Map • Contribute
</div>
Experience Source Library
<div align="center"> <video width="100%" max-width="600" controls poster="https://images.sourcelibrary.org/video/hero-poster.jpg"> <source src="https://images.sourcelibrary.org/video/hero-bg.webm" type="video/webm"> <source src="https://images.sourcelibrary.org/video/hero-bg.mp4" type="video/mp4"> Your browser does not support the video tag. </video>
Explore thousands of digitized historical texts with AI-enhanced translations and scholarly curation.
</div>
About Source Library
Source Library is an open digital library dedicated to making early printed books and primary sources readable and citable. We specialize in alchemy, Hermetica, Kabbalah, Rosicrucianism, and early modern science: texts that bridge historical scholarship with contemporary exploration.
Why Source Library?
- Originals First: Read the original-language text with AI-enhanced translations alongside
- Citable Scholarship: Every book gets a DOI and scholarly metadata (USTC alignment, edition tracking)
- Partner Subdomains: Institutions like the Bibliotheca Philosophica Hermetica curate reading rooms on their own domains
- Discovery: Collections, galleries of illustrations, and semantic search surface overlooked texts
- Rigorous QA: Manual verification, image quality scoring, and OCR validation before publication
The platform ingests ~15K pages monthly from Internet Archive, Gallica, Bodleian, Wellcome, and other digital heritage partners.
Quick Start
Development Environment
# Clone and install
git clone https://github.com/Embassy-of-the-Free-Mind/sourcelibrary-v2.git
cd sourcelibrary-v2
npm install
# Configure environment (see .env.example for required variables)
# Must include: MongoDB Atlas connection, Google Gemini API key, Vercel Blob token
# Start dev server
npm run dev
Tech Stack
<table>
<tr> <td><strong>Frontend</strong></td> <td>Next.js 16, React 19, TailwindCSS, Lucide icons</td> </tr>
<tr> <td><strong>Backend</strong></td> <td>Next.js API routes, AWS Lambda for async processing</td> </tr>
<tr> <td><strong>Database</strong></td> <td>MongoDB Atlas (primary), Supabase (embeddings)</td> </tr>
<tr> <td><strong>AI/ML</strong></td> <td>Google Gemini 3.1 (OCR, translation, summarization)</td> </tr>
<tr> <td><strong>Storage</strong></td> <td>Vercel Blob (images), AWS S3 (archive), Cloudflare R2 (archive)</td> </tr>
<tr> <td><strong>Auth</strong></td> <td>NextAuth v5 with MongoDB adapter</td> </tr>
<tr> <td><strong>DevOps</strong></td> <td>Vercel (hosting), GitHub (VCS), Playwright (E2E tests)</td> </tr>
<tr> <td><strong>Testing</strong></td> <td>Vitest (unit/integration), Playwright (E2E)</td> </tr>
<tr> <td><strong>Search</strong></td> <td>PostgreSQL FTS + semantic search via Supabase</td> </tr>
</table>
Key Dependencies:
- sharp: Image resizing and cropping
- @google/generative-ai: Gemini API integration
- xml2js: USTC metadata parsing
- @modelcontextprotocol/sdk: MCP server for agent integration
- stripe: Donation and subscription handling
Core Features
Reading & Navigation
- Pagination: Instant navigation within books of 100+ pages
- Full-text search: Query across OCR'd text and translations
- Quote generation: Copy and cite passages with DOI links
Processing Pipeline
- Smart split detection: Automatic gutter detection for two-page spreads (Gemini AI or ML-based)
- High-accuracy OCR: Gemini Vision API with language-specific models (Latin, German, Greek, Arabic, etc.)
- Context-aware translation: Maintains continuity across pages for scholarly accuracy
- Gallery extraction: AI-powered detection and cataloging of illustrations
Curation & Discovery
- Themed collections: Editorial collections (e.g., "Alchemy & Transmutation," "Kabbalah & Mysticism")
- Gallery browsing: Curated images from all books (museum-quality metadata)
- Related editions: Link across translations, reprints, and derivative works
- Authority linking: Connect to USTC, VIAF, and other scholarly databases
Scholarly Export
- EPUB generation: Multi-format ebook export
- PDF with annotations: Preserve layout, add scholarly notes
- DOI minting: Version books via Zenodo integration for long-term citation
Tenant Subdomains
- Isolated reading rooms: Partners host curated subsets on custom domains (e.g., bph.sourcelibrary.org)
- Branding & navigation: Full UI customization per tenant
- Access control: Public or members-only collections
Architecture Overview
Data Model
Books contain structured metadata:
- Bibliographic: Title, author, language, publication date, USTC ID
- Images: Links to source (Internet Archive, Gallica, etc.), archival status
- Processing: OCR status, translation language, extraction metadata
- Curation: Collections, tier (featured/standard), visibility flags
Pages store individual page data:
- Original image: Source photo or PDF page
- Split coordinates: Crop boundaries (0-1000 scale) for two-page spreads
- OCR output: Raw Gemini extraction + language metadata
- Translation: English translation with scholarly notes
- Illustrations: Detected images with quality scores and descriptions
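The 0-1000 split scale can be mapped to pixel offsets roughly as follows. This is a minimal sketch: the `SplitCrop` shape and its `xStart`/`xEnd` field names are hypothetical, since the README only states that crop boundaries use a 0-1000 scale.

```typescript
// Hypothetical shape: only the 0-1000 scale comes from the README.
type SplitCrop = { xStart: number; xEnd: number };

type PixelRect = { left: number; width: number };

// Convert a normalized horizontal crop (0-1000) into pixel offsets
// for an image of the given width.
function cropToPixels(crop: SplitCrop, imageWidth: number): PixelRect {
  if (crop.xStart < 0 || crop.xEnd > 1000 || crop.xStart >= crop.xEnd) {
    throw new RangeError("crop must satisfy 0 <= xStart < xEnd <= 1000");
  }
  const left = Math.round((crop.xStart / 1000) * imageWidth);
  const right = Math.round((crop.xEnd / 1000) * imageWidth);
  return { left, width: right - left };
}

// A spread split at the gutter (500) yields two half-width pages.
const leftPage = cropToPixels({ xStart: 0, xEnd: 500 }, 2400);
const rightPage = cropToPixels({ xStart: 500, xEnd: 1000 }, 2400);
```

Storing the crop on a resolution-independent scale means the same coordinates apply to every resized version of the image.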
Gallery images are extracted illustrations:
- Metadata: Subject, figures, symbols, style, techniques, period
- Quality score: Rating from 0 to 1.0 (images scoring below 0.5 are filtered out)
- Provenance: Source book and page, linked back
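The quality threshold above could be applied with a filter like the one below. The `GalleryImage` type and its field names are assumptions made for illustration; only the 0.5 cutoff comes from the README.

```typescript
// Hypothetical record shape; field names are illustrative only.
type GalleryImage = {
  id: string;
  qualityScore: number; // 0 to 1.0
  bookId: string;
  page: number;
};

// Cutoff stated in the README: images below 0.5 are filtered out.
const MIN_QUALITY = 0.5;

// Keep only images whose score meets the threshold.
function filterByQuality(images: GalleryImage[], min: number = MIN_QUALITY): GalleryImage[] {
  return images.filter((img) => img.qualityScore >= min);
}

const sampleImages: GalleryImage[] = [
  { id: "a", qualityScore: 0.92, bookId: "book-1", page: 12 },
  { id: "b", qualityScore: 0.31, bookId: "book-1", page: 40 },
  { id: "c", qualityScore: 0.5, bookId: "book-2", page: 3 },
];
const galleryReady = filterByQuality(sampleImages);
```

Here an image scoring exactly 0.5 is kept, which matches "filters below 0.5" read literally; the actual boundary behavior is not documented.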
Image Tier System
All page images are resized on-demand via /api/image:
| Tier | Dimensions | Quality | Use Case |
|---|---|---|---|
| Thumbnail | 400px wide | 70% JPEG | Grids, navigation, social sharing |
| Display | 1200px wide | 80% JPEG | Main reading view, comfortable for annotation |
| Full | 2400px wide | 90% JPEG | Magnifier, fullscreen detail, printing |
Split pages are cropped non-destructively via coordinates; original images are always preserved.
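A small helper can turn a tier name into an `/api/image` request URL. This sketch assumes the `w` parameter accepts each tier width in the table; the README only shows `w=400` explicitly, and the helper name and source URL are made up for illustration.

```typescript
// Tier widths taken from the table above.
const TIER_WIDTHS = { thumbnail: 400, display: 1200, full: 2400 } as const;
type Tier = keyof typeof TIER_WIDTHS;

// Build an on-demand resize URL. The source image URL must be
// percent-encoded because it travels inside a query string.
function imageUrl(baseUrl: string, sourceUrl: string, tier: Tier): string {
  const w = TIER_WIDTHS[tier];
  return `${baseUrl}/api/image?url=${encodeURIComponent(sourceUrl)}&w=${w}`;
}

// Hypothetical source image, for illustration only.
const thumbUrl = imageUrl(
  "https://sourcelibrary.org",
  "https://example.org/scans/page 1.jpg",
  "thumbnail",
);
```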
Processing Pipeline
Import → Split Detection → OCR → Translation → Enrichment → Publishing
- Import: Upload images, import from IA/Gallica via IIIF, or paste URLs
- Split Detection: Detect two-page spreads; mark crop boundaries
- OCR: Gemini Vision extracts text per page language
- Translation: Gemini translates to English with prior-page context for continuity
- Enrichment: Extract illustrations, generate summaries, assign collections
- Publishing: Set visible: true, mint DOI, push to search index
Batch endpoints process up to 5 pages/request using Gemini Batch API (50% cheaper).
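A client queuing work against the batch endpoints would need to group pages to respect the 5-page limit. The following is a sketch of that grouping only; the function name and the page ID format are hypothetical, and the request payload itself is not documented here.

```typescript
// Limit stated above: batch endpoints accept up to 5 pages per request.
const MAX_PAGES_PER_REQUEST = 5;

// Split a list of page IDs into consecutive groups of at most `size`.
function chunkPages<T>(pageIds: T[], size: number = MAX_PAGES_PER_REQUEST): T[][] {
  if (size < 1) {
    throw new RangeError("size must be at least 1");
  }
  const batches: T[][] = [];
  for (let i = 0; i < pageIds.length; i += size) {
    batches.push(pageIds.slice(i, i + size));
  }
  return batches;
}

// Twelve hypothetical page IDs become batches of 5, 5, and 2.
const pageIds = Array.from({ length: 12 }, (_, i) => `page-${i + 1}`);
const batches = chunkPages(pageIds);
```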
API Routes (Key)
Base URL: https://sourcelibrary.org (production) or http://localhost:3000 (local dev with .env.local configured).
Common 404 mistake: paths like /api/bph/books or /api/bph/books/[id] do not exist. BPH catalogue APIs live under /api/embed/bph/.... There is also no top-level /api/[tenant]/books route; tenant book listings use /api/books/library or the embed routes below.
Public read APIs (no auth required today)
| Endpoint | Method | Purpose |
|---|---|---|
| `/api/search?q=<query>` | GET | Full-text search across books and page translations |
| `/api/books?limit=100&offset=0` | GET | Simple book list (global catalogue; visible: true, indexed only) |
| `/api/books/library?limit=100&skip=0` | GET | Rich browse API: search, sort, filters, collections |
| `/api/books/[id]` | GET | Book metadata (accepts Mongo id or slug) |
| `/api/books/[id]/quote?page=<n>` | GET | Citable quote + formatted citations (inline, footnote, BibTeX, DOI) |
| `/api/gallery?limit=24` | GET | Illustration / artwork search |
| `/api/image?url=<encoded-url>&w=400` | GET | On-demand image resize & crop |
| `/api/embed/bph/books?limit=24` | GET | BPH catalogue (paginated, searchable) |
| `/api/embed/bph/books/[slug]` | GET | Single BPH book detail |
| `/api/embed/bph/featured` | GET | Featured BPH books |
| `/api/embed/bph/collections` | GET | BPH collection list |
| `/api/embed/bph/languages` | GET | BPH language facets |
| `/api/embed/bph/suggest?q=alch` | GET | BPH search autocomplete |
| `/api/embed/bph/stats` | GET | BPH catalogue stats |
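Note that the two listing endpoints page differently: `/api/books` takes `offset`, while `/api/books/library` takes `skip`. A sketch of building page URLs for the library endpoint, assuming `skip` counts records rather than pages (the helper name is illustrative):

```typescript
// Build the URL for one zero-based page of /api/books/library results.
// Assumption: `skip` is a record count, so page n starts at n * limit.
function libraryPageUrl(baseUrl: string, pageIndex: number, limit: number = 100): string {
  if (pageIndex < 0 || limit < 1) {
    throw new RangeError("pageIndex must be >= 0 and limit >= 1");
  }
  const skip = pageIndex * limit;
  return `${baseUrl}/api/books/library?limit=${limit}&skip=${skip}`;
}

const firstPage = libraryPageUrl("https://sourcelibrary.org", 0, 24);
const thirdPage = libraryPageUrl("https://sourcelibrary.org", 2, 24);
```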
Tenant-scoped listing (not /api/bph/...)
Use one of these patterns to filter by partner tenant (e.g. BPH):
| Approach | Example |
|---|---|
| Embed prefix (recommended for BPH) | GET /api/embed/bph/books?limit=24 |
| Library API + query param | GET /api/books/library?tenant_slug=bph&limit=24 |
| Host header (subdomain) | Call https://bph.sourcelibrary.org/api/books/library?limit=24; the proxy injects tenant context |
| Manual header (advanced) | curl -H "x-tenant-slug: bph" https://sourcelibrary.org/api/books/library?limit=24 |
The /api/[tenant]/books/[id]/... paths that exist in the codebase are editor/processing routes (batch OCR, index rebuild, etc.), not public catalogue listings.
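The four tenant-scoping patterns in the table can be expressed as one function that returns a URL and headers for each. This is a sketch: the function and type names are hypothetical, and only the `bph` tenant is documented, so other slugs (especially under the embed prefix) are an assumption.

```typescript
type TenantApproach = "embed" | "query" | "subdomain" | "header";
type RequestSpec = { url: string; headers: Record<string, string> };

// Describe the request for each documented way of scoping to a tenant.
function tenantBooksRequest(tenant: string, approach: TenantApproach, limit: number = 24): RequestSpec {
  const slug = encodeURIComponent(tenant);
  switch (approach) {
    case "embed": // documented for BPH only
      return { url: `https://sourcelibrary.org/api/embed/${slug}/books?limit=${limit}`, headers: {} };
    case "query":
      return { url: `https://sourcelibrary.org/api/books/library?tenant_slug=${slug}&limit=${limit}`, headers: {} };
    case "subdomain": // the proxy injects tenant context from the host
      return { url: `https://${slug}.sourcelibrary.org/api/books/library?limit=${limit}`, headers: {} };
    case "header":
      return {
        url: `https://sourcelibrary.org/api/books/library?limit=${limit}`,
        headers: { "x-tenant-slug": tenant },
      };
  }
}

const viaEmbed = tenantBooksRequest("bph", "embed");
const viaHeader = tenantBooksRequest("bph", "header");
```

The returned object can be passed to `fetch(spec.url, { headers: spec.headers })`.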
Authenticated / internal APIs
These require a signed-in session cookie, editor role, or (for some dataset endpoints) a Bearer API key. Calling them without auth returns 401 or 403.
| Endpoint | Method | Purpose |
|---|---|---|
| `/api/books` | POST | Create a new book (editor) |
| `/api/books/[id]` | PATCH | Update book metadata (curator+) |
| `/api/books/[id]/batch-ocr-async` | POST | Queue batch OCR job |
| `/api/books/[id]/batch-translate-async` | POST | Queue batch translation job |
| `/api/pages/[id]` | PATCH | Update page OCR/translation |
| `/api/jobs/[id]/process` | POST | Async job processor (Lambda) |
Full narrative API walkthrough: docs/blog-source-library-api.md. MCP tools (search, quote, read): mcp-server/README.md.
Trying the API (curl, Postman, browser)
All examples below hit production and need no API key. Replace the base URL with http://localhost:3000 when running locally (MongoDB + env vars required).
curl
# Search translated text
curl -s "https://sourcelibrary.org/api/search?q=quintessence&limit=5" | jq .
# List books (global catalogue)
curl -s "https://sourcelibrary.org/api/books?limit=5" | jq .
# Browse with filters and sort
curl -s "https://sourcelibrary.org/api/books/library?limit=5&sort=recent-translation&has_translation=true" | jq .
# BPH catalogue: note /api/embed/bph/, NOT /api/bph/
curl -s "https://sourcelibrary.org/api/embed/bph/books?limit=5&translated=true" | jq .
# BPH via
β¦