gutenbit.db
SQLite storage and full-text search for Project Gutenberg books.
ChunkRecord(chunk_id: int, book_id: int, div1: str, div2: str, div3: str, div4: str, position: int, content: str, kind: str, char_count: int)
dataclass
One stored chunk with structural metadata.
Database(path: str | Path)
SQLite database for storing and searching Project Gutenberg books.
book(book_id: int) -> BookRecord | None
Return one stored book by Project Gutenberg id.
books() -> list[BookRecord]
Return all stored books.
chunk_by_id(book_id: int, chunk_id: int) -> ChunkRecord | None
Return one chunk by internal row id within a specific book.
chunk_by_position(book_id: int, position: int) -> ChunkRecord | None
Return one chunk by structural position within a specific book.
chunk_records(book_id: int, *, kinds: list[str] | None = None) -> list[ChunkRecord]
Return all chunks for a book as ChunkRecord objects.
chunk_window(book_id: int, position: int, *, around: int = 0) -> list[ChunkRecord]
Return the chunk at the selected position plus up to `around` neighboring chunks on each side.
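The windowing behaviour can be pictured with a small stand-alone sketch. It is not the library's implementation: `window` is a hypothetical helper operating on an in-memory list of `(position, content)` pairs assumed to be sorted by position, the sample rows are made up, and returning an empty list for an unknown position is an assumption.

```python
def window(chunks, position, around=0):
    """Return the chunk at `position` plus up to `around` neighbours per side."""
    positions = [pos for pos, _ in chunks]
    if position not in positions:
        return []  # assumed behaviour for an unknown position
    index = positions.index(position)
    start = max(0, index - around)  # clamp at the start of the book
    return chunks[start : index + around + 1]  # slicing clamps at the end


# Made-up sample data: six chunks at positions 0..5.
sample = [(i, f"chunk {i}") for i in range(6)]
selected = window(sample, 3, around=1)  # positions 2, 3 and 4
```

Near either end of a book the window simply shrinks, so callers should not rely on always receiving `2 * around + 1` chunks.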
chunks(book_id: int, *, kinds: list[str] | None = None) -> list[tuple[int, str, str, str, str, str, str, int]]
Return chunks as (position, div1, div2, div3, div4, content, kind, char_count).
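Because the tuples are positional, unpacking them in the documented order keeps call sites readable. The sketch below runs on made-up rows rather than a real database, and the `kind` values `"heading"` and `"text"` are hypothetical placeholders, not values confirmed by this reference.

```python
from collections import Counter

# Made-up rows in the documented field order:
# (position, div1, div2, div3, div4, content, kind, char_count)
rows = [
    (0, "CHAPTER I", "", "", "", "CHAPTER I", "heading", 9),
    (1, "CHAPTER I", "", "", "", "It was a dark night.", "text", 20),
    (2, "CHAPTER I", "", "", "", "The rain fell.", "text", 14),
]

# Sum character counts per kind by unpacking each tuple in order.
chars_by_kind = Counter()
for position, div1, div2, div3, div4, content, kind, char_count in rows:
    chars_by_kind[kind] += char_count
```

When named attributes are preferable to positional unpacking, `chunk_records` returns the same data as `ChunkRecord` objects.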
chunks_by_div(book_id: int, div_path: str, *, kinds: list[str] | None = None, limit: int = 0) -> list[ChunkRecord]
Return chunks under a division path prefix.
Each segment is matched exactly, except that the deepest query segment
also accepts a prefix match (so "CHAPTER I" matches
"CHAPTER I DESCRIPTION OF A PALACE"). Trailing punctuation is
always ignored.
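The matching rule can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: `normalize_segment` and `matches_div_path` are hypothetical names, the `/` separator between path segments is assumed, normalization is limited to trimming whitespace and trailing punctuation, and the prefix is applied at a word boundary as described for `search`.

```python
import string


def normalize_segment(segment):
    """Trim whitespace and trailing punctuation (assumed normalization)."""
    return segment.strip().rstrip(string.punctuation + " ")


def matches_div_path(query, divs, sep="/"):
    """Check a (div1, div2, div3, div4) tuple against a division path query."""
    wanted = [normalize_segment(s) for s in query.split(sep) if s.strip()]
    have = [normalize_segment(d) for d in divs]
    if not wanted or len(wanted) > len(have):
        return False
    *exact, last = wanted
    # Every segment except the deepest must match exactly.
    if any(w != h for w, h in zip(exact, have)):
        return False
    candidate = have[len(exact)]
    # The deepest segment matches exactly or as a word-boundary prefix.
    return candidate == last or candidate.startswith(last + " ")


palace = ("CHAPTER I DESCRIPTION OF A PALACE", "", "", "")
prefix_hit = matches_div_path("CHAPTER I", palace)
```

The word-boundary check is what keeps `"CHAPTER I"` from matching `"CHAPTER II"`.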
delete_book(book_id: int) -> bool
Delete a stored book and all associated rows. Returns False if missing.
has_current_text(book_id: int) -> bool
Return True when stored text matches the current chunker version.
has_text(book_id: int) -> bool
Return True when a book has already been downloaded and stored.
ingest(books: list[BookRecord], *, delay: float = 1.0, force: bool = False) -> None
Download, chunk, and store books.
Enforces the package's ingestion boundaries: only English text records are ingested, and records in the same request that share a work ID are collapsed to a single canonical edition.
search(query: str, *, author: str | None = None, title: str | None = None, language: str | None = None, subject: str | None = None, book_id: int | None = None, kind: str | None = None, div_path: str | None = None, order: SearchOrder = 'rank', limit: int = 20) -> list[SearchResult]
Search chunks via FTS5 with BM25 ranking.
When div_path is given, results are post-filtered using the same
path-prefix matching as chunks_by_div (normalized, with a
word-boundary prefix match on the deepest segment).
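For readers unfamiliar with FTS5, the underlying technique looks roughly like the following. This is a minimal sketch using only the standard `sqlite3` module, assuming the linked SQLite build includes FTS5; the table name `chunks_fts`, the local `search` helper, and the sample rows are hypothetical and do not reflect the package's actual schema.

```python
import sqlite3


def search(conn, query, limit=20):
    """Return (rowid, score) pairs; in FTS5 a lower bm25() value is a better match."""
    return conn.execute(
        "SELECT rowid, bm25(chunks_fts) AS score "
        "FROM chunks_fts WHERE chunks_fts MATCH ? "
        "ORDER BY score LIMIT ?",
        (query, limit),
    ).fetchall()


conn = sqlite3.connect(":memory:")
# Hypothetical single-column full-text index over chunk content.
conn.execute("CREATE VIRTUAL TABLE chunks_fts USING fts5(content)")
conn.executemany(
    "INSERT INTO chunks_fts(rowid, content) VALUES (?, ?)",
    [
        (1, "Call me Ishmael."),
        (2, "The whale surfaced beside the boat."),
        (3, "A whale, a white whale, rose from the sea."),
    ],
)
hits = search(conn, "whale")  # rows 2 and 3 match; row 1 does not
```

Filters such as `author` or `book_id` would typically become extra `WHERE` conditions on joined metadata, while `div_path` is applied afterwards in Python, as the description above states.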
search_count(query: str, *, author: str | None = None, title: str | None = None, language: str | None = None, subject: str | None = None, book_id: int | None = None, kind: str | None = None, div_path: str | None = None) -> int
Return the total number of search hits before any CLI display limit.
search_page(query: str, *, author: str | None = None, title: str | None = None, language: str | None = None, subject: str | None = None, book_id: int | None = None, kind: str | None = None, div_path: str | None = None, order: SearchOrder = 'rank', limit: int = 20) -> SearchPage
Return one CLI search page plus an exact total-hit count.
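The relationship between the page and the total can be shown with a stand-alone sketch. `Page` is a local stand-in that mirrors the documented `SearchPage` fields, and `make_page` is a hypothetical helper; a real implementation would more likely run a separate count query than materialize every hit.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Local stand-in mirroring the documented SearchPage fields."""

    items: list
    total_results: int


def make_page(all_hits, limit=20):
    """Keep the first `limit` hits but report how many hits exist overall."""
    return Page(items=all_hits[:limit], total_results=len(all_hits))


# Made-up hits standing in for SearchResult objects.
page = make_page([f"hit {i}" for i in range(45)], limit=20)
```

A caller can then render something like "showing 20 of 45" from `len(page.items)` and `page.total_results`.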
stale_books() -> list[BookRecord]
Return stored books whose text is missing or stale for this chunker version.
text(book_id: int) -> str | None
Return the clean text for a book, or None if not found.
text_states(book_ids: list[int]) -> dict[int, TextState]
Return stored text presence/currentness for the requested ids.
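One plausible use of the returned mapping is deciding which ids still need ingestion. The sketch below defines a local mirror of the documented `TextState` fields so it runs without the package; `needs_ingest` and the sample states are hypothetical, and whether ids with no stored row appear in the real mapping is not specified here.

```python
from dataclasses import dataclass


@dataclass
class TextState:
    """Local mirror of the documented TextState fields."""

    has_text: bool
    has_current_text: bool


def needs_ingest(states):
    """Return ids whose text is missing or stale, in ascending order."""
    return sorted(
        book_id for book_id, state in states.items() if not state.has_current_text
    )


# Made-up states for three book ids.
states = {
    11: TextState(has_text=True, has_current_text=True),
    84: TextState(has_text=True, has_current_text=False),
    1342: TextState(has_text=False, has_current_text=False),
}
pending = needs_ingest(states)
```

Checking `has_current_text` alone covers both cases, since a book without stored text cannot have current text.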
SearchPage(items: list[SearchResult], total_results: int)
dataclass
One CLI search page plus exact total-hit metadata.
SearchResult(chunk_id: int, book_id: int, title: str, authors: str, language: str, subjects: str, div1: str, div2: str, div3: str, div4: str, position: int, content: str, kind: str, char_count: int, score: float)
dataclass
A single search hit — one chunk with its book metadata.
TextState(has_text: bool, has_current_text: bool)
dataclass
Presence/currentness snapshot for one stored book.