gutenbit.catalog
catalog
Fetch and search the Project Gutenberg CSV catalog.
BookRecord(id: int, title: str, authors: str, language: str, subjects: str, locc: str, bookshelves: str, issued: str, type: str)
dataclass
A book entry from the Project Gutenberg catalog.
Catalog(records: list[BookRecord], *, canonical_id_by_id: dict[int, int] | None = None, fetch_info: CatalogFetchInfo | None = None)
The Project Gutenberg catalog, searchable in memory.
canonical_id(book_id: int) -> int | None
Resolve any known id to the canonical id under current policy.
fetch(*, policy: CatalogPolicy = DEFAULT_CATALOG_POLICY, cache_dir: str | Path | None = None, refresh: bool = False) -> Catalog
classmethod
Download the CSV catalog from Project Gutenberg.
get(book_id: int) -> BookRecord | None
Return a canonical book record for a requested id.
is_canonical_id(book_id: int) -> bool
Return True when an id is already canonical under current policy.
search(*, author: str = '', title: str = '', language: str = '', subject: str = '') -> list[BookRecord]
Search for books matching all given criteria.
All filters use case-insensitive matching. Each query is first tried as
a contiguous substring; if it contains multiple words and the substring
fails, every word must appear individually (so "Jane Austen" matches
"Austen, Jane, 1775-1817").
CatalogFetchInfo(source: Literal['cache', 'downloaded', 'stale_cache'], cache_path: Path, cache_age_seconds: float | None = None)
dataclass
How a catalog payload was loaded.
CatalogPolicy(allowed_language_codes: frozenset[str] = CATALOG_ALLOWED_LANGUAGE_CODES, allowed_media_types: frozenset[str] = CATALOG_ALLOWED_MEDIA_TYPES, dedupe_strategy: CatalogDedupeStrategy = CATALOG_DEDUPE_STRATEGY)
dataclass
Config for catalog boundary enforcement.
apply_catalog_policy(records: Iterable[BookRecord], *, policy: CatalogPolicy = DEFAULT_CATALOG_POLICY) -> tuple[list[BookRecord], dict[int, int]]
Filter and canonicalize records according to catalog policy.
Returns a tuple of: 1. Canonical records after boundaries are applied. 2. Mapping from original record id -> canonical record id.
is_record_allowed(record: BookRecord, *, policy: CatalogPolicy = DEFAULT_CATALOG_POLICY) -> bool
Return True when a record is within the catalog policy.
work_key(record: BookRecord) -> tuple[str, str] | None
Return a conservative key for canonical duplicate detection.