CLI
The gutenbit command-line tool provides seven subcommands that follow a natural workflow: find books, add them, explore their structure, read text, and search.
Installation
Gutenbit is not published on PyPI yet, so start by running the CLI directly from GitHub:
uvx --from git+https://github.com/keinan1/gutenbit gutenbit --help
Install it persistently with uv once you want the gutenbit command available without uvx:
uv tool install git+https://github.com/keinan1/gutenbit
Then run gutenbit --help. Remove it later with uv tool uninstall gutenbit.
By default, all CLI-managed state lives in a .gutenbit/ folder: the database is .gutenbit/gutenbit.db, and the catalog cache is stored under .gutenbit/cache/. Use --db PATH to store the database elsewhere. All commands support --json for machine-readable output.
Project Gutenberg Access
Use gutenbit for individual downloads, not bulk downloading. It prefers official mirrors and uses the main site only as a zip fallback, with a default 2.0-second delay between downloads. Review the Robot Access Policy and Terms of Use.
catalog
Search the Project Gutenberg catalog for books by metadata.
gutenbit catalog --author "Dickens"
gutenbit catalog --title "Christmas" --author "Dickens"
gutenbit catalog --author "Dickens" --refresh
gutenbit catalog --subject "Philosophy" --limit 50
| Flag | Description |
|---|---|
| --author TEXT | Filter by author (substring match) |
| --title TEXT | Filter by title (substring match) |
| --subject TEXT | Filter by subject (substring match) |
| --language CODE | Filter by language code (e.g. en) |
| --limit N | Maximum results (default: 20) |
| --refresh | Ignore the local catalog cache and redownload it now |
| --json | Output as JSON |
Filters combine with AND logic. All matching is case-insensitive. The catalog is cached locally for two hours, filtered to English text records, and can be forced to redownload with --refresh.
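The matching rules above can be sketched in a few lines of Python. This is an illustration, not gutenbit's actual code: the `matches` helper and the sample records are hypothetical, and only the AND-combined, case-insensitive substring behavior comes from the description.

```python
def matches(record: dict, **filters: str) -> bool:
    """Return True when every supplied filter is a case-insensitive
    substring of the corresponding field (filters combine with AND)."""
    return all(
        needle.lower() in record.get(field, "").lower()
        for field, needle in filters.items()
    )


# Hypothetical catalog records, for illustration only.
records = [
    {"id": 46, "title": "A Christmas Carol", "author": "Dickens, Charles"},
    {"id": 730, "title": "Oliver Twist", "author": "Dickens, Charles"},
    {"id": 1342, "title": "Pride and Prejudice", "author": "Austen, Jane"},
]

# Both filters must match, regardless of letter case.
hits = [r["id"] for r in records if matches(r, author="dickens", title="CHRISTMAS")]
print(hits)
```

With no filters at all, `matches` accepts every record, which mirrors a catalog query that only sets --limit.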
add
Download books from Project Gutenberg and store them in the database.
gutenbit add 1342
gutenbit add 46 730 967
gutenbit add 1342 --refresh
gutenbit add 2600 --delay 2.0
| Flag | Description |
|---|---|
| BOOK_IDS | One or more Project Gutenberg IDs (positional) |
| --delay SECONDS | Pause between downloads (default: 2.0) |
| --refresh | Ignore the local catalog cache and redownload it now |
| --json | Output as JSON |
Books already stored at the current chunker version are skipped. IDs that map to a different canonical edition are remapped automatically.
books
List all books stored in the database, or update stored books whose parser version is stale.
gutenbit books
gutenbit books --json
gutenbit books --update
gutenbit books --update --force
gutenbit books --update --dry-run
| Flag | Description |
|---|---|
| --update | Reprocess stored books whose parser version is stale |
| --delay SECONDS | Pause between downloads in update mode (default: 2.0) |
| --force | Reprocess all stored books in update mode, even if already current |
| --dry-run | Show which stored books would be updated without downloading |
| --json | Output as JSON |
Without --update, books just lists stored books.
With --update, gutenbit checks the local database and reprocesses only books whose stored text is out of date for the current chunker version. --force refreshes every stored book, and --dry-run reports what would be refreshed without doing any work.
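The selection logic described here might look roughly like the sketch below. Everything in it is an assumption made for illustration: the `CURRENT_VERSION` constant, the `books_to_update` helper, and the version numbers are hypothetical and do not reflect gutenbit's internals.

```python
CURRENT_VERSION = 3  # hypothetical current chunker version


def books_to_update(stored: dict, force: bool = False) -> list:
    """Pick book IDs to reprocess: stale ones only, or every stored book when force is set."""
    return sorted(
        book_id
        for book_id, version in stored.items()
        if force or version != CURRENT_VERSION
    )


# Book ID -> version it was last processed with (illustrative values).
stored = {46: 3, 730: 2, 1342: 1}

print(books_to_update(stored))              # stale books only
print(books_to_update(stored, force=True))  # every stored book
```

In this picture, a dry run would simply print the selected list and stop before downloading anything.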
delete
Remove books and their chunks from the database.
gutenbit delete 1342
gutenbit delete 46 730 967
| Flag | Description |
|---|---|
| BOOK_IDS | One or more Project Gutenberg IDs (positional) |
| --json | Output as JSON |
Exits with code 1 if any requested ID was not found.
search
Full-text search across stored books using SQLite FTS5 with BM25 ranking. Search targets text chunks by default.
gutenbit search "bennet"
gutenbit search "don't stop" # punctuation is ok
gutenbit search "truth universally acknowledged" --phrase
gutenbit search "ghost OR spirit" --raw # FTS5 boolean query
gutenbit search "bennet" --book 1342 --order first
gutenbit search "truth universally acknowledged" --book 1342 --section 1 --phrase
gutenbit search "chapter" --book 1342 --kind heading
gutenbit search "bennet" --book 1342 --radius 1 # include surrounding passage
gutenbit search "bennet" --book 1342 --limit 3
gutenbit search "bennet" --book 1342 --count
| Flag | Description |
|---|---|
| QUERY | Search query (positional) |
| --phrase | Treat query as an exact phrase (mutually exclusive with --raw) |
| --raw | Pass query directly to FTS5 for advanced syntax (mutually exclusive with --phrase) |
| --order ORDER | rank (default), first, or last |
| --author TEXT | Filter by author (substring match) |
| --title TEXT | Filter by title (substring match) |
| --book ID | Restrict to a single book |
| --kind KIND | Chunk kind to search: text (default), heading, or all |
| --section SELECTOR | Restrict to a section by path prefix or number from toc (number requires --book) |
| --limit N | Maximum results (default: 10) |
| --radius N | Surrounding passage to include on each side of each hit |
| --count | Just print the number of matches |
| --json | Output as JSON |
Query modes
By default, punctuation in the query is auto-escaped so apostrophes, hyphens, and other punctuation just work. Tokens are implicitly AND'd.
- (default): Plain text — punctuation is auto-escaped, words are AND'd.
- --phrase: Exact phrase — word order and adjacency must match exactly.
- --raw: FTS5 syntax — AND, OR, NOT, NEAR(), prefix*, "phrases", (groups).
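One plausible way to implement the three modes is to quote tokens as FTS5 strings, where an embedded double quote is escaped by doubling it. The sketch below is an assumption about the approach, not gutenbit's actual code; `fts5_quote` and `build_query` are hypothetical names.

```python
def fts5_quote(text: str) -> str:
    """Wrap text as an FTS5 string literal, doubling any embedded double quotes."""
    return '"' + text.replace('"', '""') + '"'


def build_query(query: str, mode: str = "default") -> str:
    """Turn user input into an FTS5 MATCH expression for the given mode."""
    if mode == "raw":
        return query  # passed through untouched
    if mode == "phrase":
        # One quoted string: word order and adjacency must match.
        return fts5_quote(" ".join(query.split()))
    # Default: quote each token separately; FTS5 ANDs adjacent terms implicitly.
    return " ".join(fts5_quote(token) for token in query.split())


print(build_query("don't stop"))
print(build_query("truth universally acknowledged", mode="phrase"))
print(build_query("ghost OR spirit", mode="raw"))
```

Quoting each token keeps apostrophes and hyphens from being read as FTS5 operators, which is why plain queries with punctuation do not need --raw.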
Search order
- rank: Results ordered by BM25 relevance score, then book, then position.
- first: Earliest matches. Ordered by book ascending, then position ascending.
- last: Latest matches. Ordered by book descending, then position descending.
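The three orders amount to different sort keys. The sketch below uses hypothetical hit dictionaries and assumes that a numerically smaller score means a better match, as with SQLite's own bm25() function; gutenbit's reported scores may follow a different convention.

```python
# Hypothetical hits, for illustration only.
hits = [
    {"book": 1342, "position": 40, "score": -3.5},
    {"book": 46, "position": 7, "score": -2.1},
    {"book": 1342, "position": 3, "score": -2.1},
]


def order_hits(hits: list, order: str = "rank") -> list:
    """Sort hits by relevance, or by earliest or latest position."""
    if order == "rank":
        # Assumption: smaller score is better, then book, then position.
        return sorted(hits, key=lambda h: (h["score"], h["book"], h["position"]))
    if order == "first":
        return sorted(hits, key=lambda h: (h["book"], h["position"]))
    if order == "last":
        return sorted(hits, key=lambda h: (h["book"], h["position"]), reverse=True)
    raise ValueError(f"unknown order: {order}")


for order in ("rank", "first", "last"):
    print(order, [(h["book"], h["position"]) for h in order_hits(hits, order)])
```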
Result shaping
- Use --limit to control how many hits are returned. The default is 10.
- Use --radius to read surrounding passage around each hit in normal reading order. --count cannot be combined with --radius.
- Use --kind heading to search structural headings, or --kind all to include both headings and text.
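Conceptually, a radius widens each hit into a window of neighboring passages. The sketch below assumes a book is an ordered list of passages and that the window is joined with blank lines; both the `with_radius` helper and the joining convention are hypothetical.

```python
def with_radius(passages: list, hit: int, radius: int) -> str:
    """Join the hit passage with up to `radius` neighbors on each side, in reading order."""
    start = max(0, hit - radius)                  # clamp at the start of the book
    end = min(len(passages), hit + radius + 1)    # clamp at the end of the book
    return "\n\n".join(passages[start:end])


passages = ["p0", "p1", "p2", "p3", "p4"]  # placeholder passages
print(with_radius(passages, hit=2, radius=1))
```

A hit near the beginning or end of a book yields a shorter window, since there are fewer neighbors on that side.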
FTS5 query syntax
When using --raw, the query is passed directly to SQLite FTS5. Supported syntax:
| Syntax | Meaning |
|---|---|
| war peace | Both terms (implicit AND) |
| war OR peace | Either term |
| war NOT peace | First term, excluding second |
| "to be or not" | Exact phrase |
| philos* | Prefix match |
| NEAR(war peace, 5) | Terms within 5 tokens of each other |
| (war OR battle) AND peace | Grouped boolean logic |
Use --phrase to auto-wrap the entire query as an exact phrase without manual quoting.
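To experiment with this syntax outside gutenbit, Python's built-in sqlite3 module can run the same kinds of queries against a throwaway FTS5 table. The sketch assumes your Python build ships SQLite with FTS5 enabled (most do); the `chunks` table, the `match` helper, and the sample rows are made up for the demo and are unrelated to gutenbit's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(content)")
conn.executemany(
    "INSERT INTO chunks(rowid, content) VALUES (?, ?)",
    [
        (1, "war and peace"),
        (2, "the art of war"),
        (3, "a lasting peace"),
        (4, "philosophy of mind"),
    ],
)


def match(query: str) -> list:
    """Return the rowids of rows matching a raw FTS5 query, in rowid order."""
    rows = conn.execute(
        "SELECT rowid FROM chunks WHERE chunks MATCH ? ORDER BY rowid", (query,)
    )
    return [rowid for (rowid,) in rows]


for query in (
    "war peace",
    "war OR peace",
    "war NOT peace",
    '"art of war"',
    "philos*",
    "NEAR(war peace, 5)",
    "(war OR lasting) AND peace",
):
    print(f"{query!r:32} -> {match(query)}")
```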
toc
Show the structural table of contents for a stored book, with numbered sections.
gutenbit toc 1342
gutenbit toc 2600 --json
| Flag | Description |
|---|---|
| BOOK_ID | Project Gutenberg book ID (positional) |
| --json | Output as JSON |
Section numbers in the output can be passed to view --section or search --section.
view
Read stored book text. Starts at the first structural section by default. Use selectors to focus on a specific part.
gutenbit view 1342 # first structural section
gutenbit view 1342 --all # full book
gutenbit view 1342 --section 1 # section by number
gutenbit view 1342 --section 1 --all # full section
gutenbit view 1342 --section "Chapter 1" --forward 10 # section by path
gutenbit view 1342 --position 1 --forward 5 # from exact position
gutenbit view 1342 --position 1 --radius 2 # surrounding passage around position
gutenbit view 1342 --section 1 --radius 2 # surrounding passage around section start
| Flag | Description |
|---|---|
| BOOK_ID | Project Gutenberg book ID (positional) |
| --section SELECTOR | Section number (from toc) or path prefix (e.g. "BOOK I/CHAPTER I") |
| --position N | Exact chunk position |
| --all | Read the full selected scope (whole book or whole section) |
| --forward N | Passages to read forward (default: 3 for opening, 1 for section/position) |
| --radius N | Surrounding passage to include on each side of the selected center passage |
| --json | Output as JSON |
Use --section or --position, not both. --forward, --radius, and --all are mutually exclusive in view. Use --all for a whole book or whole section; it does not apply to --position. Run toc first to see available section numbers.
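These combination rules can be summarized as a small validation function. The sketch is a restatement of the rules in code, not gutenbit's argument parser: `validate_view` and its error messages are hypothetical.

```python
def validate_view(section=None, position=None, forward=None, radius=None, all_=False) -> list:
    """Return a list of rule violations for a combination of view options."""
    errors = []
    if section is not None and position is not None:
        errors.append("use --section or --position, not both")
    # At most one of the three range options may be given.
    if sum([forward is not None, radius is not None, all_]) > 1:
        errors.append("--forward, --radius, and --all are mutually exclusive")
    if all_ and position is not None:
        errors.append("--all does not apply to --position")
    return errors


print(validate_view(section="1", all_=True))   # valid: whole section
print(validate_view(position=1, all_=True))    # invalid: --all with --position
print(validate_view(forward=5, radius=2))      # invalid: two range options
```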
JSON output
Every command accepts --json and returns a unified envelope:
{
  "ok": true,
  "command": "search",
  "data": { ... },
  "warnings": [],
  "errors": []
}
When ok is false, the errors list contains error messages. The data field holds command-specific results. The warnings list captures non-fatal issues (e.g. a requested ID not found during bulk delete).
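A script that consumes this envelope might unwrap it as sketched below. The `unwrap` helper and both sample payloads are hypothetical; only the five top-level keys and the items list under data for search come from this page.

```python
import json


def unwrap(raw: str) -> dict:
    """Parse an envelope, print warnings, and return data or raise on failure."""
    envelope = json.loads(raw)
    for warning in envelope.get("warnings", []):
        print(f"warning: {warning}")
    if not envelope["ok"]:
        raise RuntimeError("; ".join(envelope["errors"]))
    return envelope["data"]


# Hand-written sample payloads, for illustration only.
sample_ok = '{"ok": true, "command": "search", "data": {"items": []}, "warnings": [], "errors": []}'
sample_err = '{"ok": false, "command": "delete", "data": {}, "warnings": [], "errors": ["example error"]}'

print(unwrap(sample_ok))
try:
    unwrap(sample_err)
except RuntimeError as exc:
    print(f"failed: {exc}")
```

In practice the raw string would be the standard output of a command run with --json, for example captured through Python's subprocess module.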
For view, the response body is content-first. Successful responses include a shared passage shape: book, title, author, section, section_number, position, forward, radius, all, and content.
For search, data["order"] records the selected result order, data["filters"] includes the resolved kind, and data["items"] remains the hit list. Each hit uses that same passage shape, with search-specific fields such as kind, rank, and score appended after the shared fields. When --radius is used, content is the joined surrounding passage in reading order.
Global flags
These flags apply to all subcommands:
| Flag | Description |
|---|---|
| --db PATH | SQLite database path (default: .gutenbit/gutenbit.db) |
| -v, --verbose | Enable debug logging |