Skip to content

gutenbit.catalog

catalog

Fetch and search the Project Gutenberg CSV catalog.

BookRecord(id: int, title: str, authors: str, language: str, subjects: str, locc: str, bookshelves: str, issued: str, type: str) dataclass

A book entry from the Project Gutenberg catalog.

Catalog(records: list[BookRecord], *, canonical_id_by_id: dict[int, int] | None = None, fetch_info: CatalogFetchInfo | None = None)

The Project Gutenberg catalog, searchable in memory.

canonical_id(book_id: int) -> int | None

Resolve any known id to the canonical id under current policy.

fetch(*, policy: CatalogPolicy = DEFAULT_CATALOG_POLICY, cache_dir: str | Path | None = None, refresh: bool = False) -> Catalog classmethod

Download the CSV catalog from Project Gutenberg.

get(book_id: int) -> BookRecord | None

Return a canonical book record for a requested id.

is_canonical_id(book_id: int) -> bool

Return True when an id is already canonical under current policy.

search(*, author: str = '', title: str = '', language: str = '', subject: str = '') -> list[BookRecord]

Search for books matching all given criteria.

All filters use case-insensitive matching. Each query is first tried as a contiguous substring; if it contains multiple words and the substring fails, every word must appear individually (so "Jane Austen" matches "Austen, Jane, 1775-1817").

CatalogFetchInfo(source: Literal['cache', 'downloaded', 'stale_cache'], cache_path: Path, cache_age_seconds: float | None = None) dataclass

How a catalog payload was loaded.

CatalogPolicy(allowed_language_codes: frozenset[str] = CATALOG_ALLOWED_LANGUAGE_CODES, allowed_media_types: frozenset[str] = CATALOG_ALLOWED_MEDIA_TYPES, dedupe_strategy: CatalogDedupeStrategy = CATALOG_DEDUPE_STRATEGY) dataclass

Config for catalog boundary enforcement.

apply_catalog_policy(records: Iterable[BookRecord], *, policy: CatalogPolicy = DEFAULT_CATALOG_POLICY) -> tuple[list[BookRecord], dict[int, int]]

Filter and canonicalize records according to catalog policy.

Returns a tuple of: 1. Canonical records after boundaries are applied. 2. Mapping from original record id -> canonical record id.

is_record_allowed(record: BookRecord, *, policy: CatalogPolicy = DEFAULT_CATALOG_POLICY) -> bool

Return True when a record is within the catalog policy.

work_key(record: BookRecord) -> tuple[str, str] | None

Return a conservative key for canonical duplicate detection.