Merged
4 changes: 3 additions & 1 deletion docs/concepts/custom_field_query.md
@@ -10,7 +10,7 @@ The `custom_field_query` filter lets you search documents by custom field values
| ------------------------------------ | ----------------- | ---------------------- |
| `CustomFieldQuery(field, op, value)` | — | `[field, op, value]` |
| `CustomFieldQueryAnd(q1, q2, …)` | `q1 & q2` | `["AND", [q1, q2, …]]` |
| `CustomFieldQueryOr(q1, q2, …)` | `q1 | q2` | `["OR", [q1, q2, …]]` |
| `CustomFieldQueryOr(q1, q2, …)` | `q1 \| q2` | `["OR", [q1, q2, …]]` |
| `CustomFieldQueryNot(q)` | `~q` | `["NOT", q]` |

Import from `pypaperless.models.custom_field_query` (or via `pypaperless.models.types`):
@@ -136,3 +136,5 @@ Paperless-ngx enforces:
- Maximum number of atoms: **20**

Exceeding these limits returns an HTTP 400 validation error.

The full filter specification is in the [Paperless-ngx API reference](https://docs.paperless-ngx.com/api/#filtering-by-custom-fields).
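
As a rough illustration of the array form shown in the table and of the atom limit, the sketch below writes a query as plain nested lists and counts its atoms before it would be sent. It does not use the pypaperless classes: `count_atoms` and `MAX_ATOMS` are hypothetical names, the `paid` field is made up, and the sketch assumes an atom is a single `[field, op, value]` triple.

```python
import json

MAX_ATOMS = 20  # limit quoted above; the constant name is invented for this sketch


def count_atoms(query: list) -> int:
    """Count the [field, op, value] triples in a nested query."""
    head = query[0]
    if head in ("AND", "OR"):
        # ["AND", [q1, q2, ...]]: sum over the child queries
        return sum(count_atoms(child) for child in query[1])
    if head == "NOT":
        # ["NOT", q]: a single child query
        return count_atoms(query[1])
    return 1  # a plain [field, op, value] atom


query = ["AND", [["amount", "gte", 10000], ["NOT", ["paid", "exact", True]]]]

if count_atoms(query) > MAX_ATOMS:
    raise ValueError("query has too many atoms")

serialized = json.dumps(query)  # string form suitable for a query parameter
```

Checking on the client side only saves a round trip; the server remains the authority on what it accepts.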
27 changes: 13 additions & 14 deletions docs/concepts/custom_fields.md
@@ -18,7 +18,7 @@ Each custom field has a `data_type` which determines the type of its value:
| `INTEGER` | `int` | Integer number |
| `FLOAT` | `float` | Floating point number |
| `MONETARY` | `str` | Currency amount, e.g. `"EUR12.50"` |
| `SELECT` | `int | str` | Selection from predefined options |
| `SELECT` | `int` or `str` | Selection from predefined options |
| `DOCUMENT_LINK` | `list[int]` | Links to other documents by ID |

---
@@ -91,8 +91,7 @@ print(value.value)
Use `default()` to return `None` instead of raising when the field is absent:

```python
value = document.custom_fields.default(8)
if value is not None:
if value := document.custom_fields.default(8):
    print(value.value)
```
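
One thing to keep in mind with the walrus form: `if value := ...` tests truthiness, whereas `is not None` tests only for absence. The two agree as long as the returned object is always truthy. The sketch below uses a stand-in class (`FakeValue`, not part of pypaperless) that defines `__bool__` to show where they could diverge; whether pypaperless value objects behave this way is not stated here, so treat it purely as an illustration of the Python semantics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeValue:
    """Stand-in for a custom field value; not a pypaperless class."""

    value: int

    def __bool__(self) -> bool:
        # Hypothetical behaviour: falsy whenever the payload is falsy.
        return bool(self.value)


def lookup(found: bool) -> Optional[FakeValue]:
    """Return a present-but-falsy value, or None when the field is absent."""
    return FakeValue(0) if found else None


def seen_by_walrus(found: bool) -> bool:
    if value := lookup(found):
        return True
    return False


def seen_by_none_check(found: bool) -> bool:
    value = lookup(found)
    return value is not None
```

With a class like this, a present field holding `0` passes the `is not None` check but is skipped by the walrus form.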

@@ -115,17 +114,17 @@ A `TypeError` is raised if the actual type does not match `expected_type`.

When the cache is active, pypaperless instantiates the right subclass automatically:

| Class | `data_type` | `value` type |
| ------------------------------ | -------------------- | ---------------- |
| `CustomFieldStringValue` | `STRING`, `LONGTEXT` | `str` |
| `CustomFieldURLValue` | `URL` | `str` |
| `CustomFieldDateValue` | `DATE` | `datetime.date` |
| `CustomFieldBooleanValue` | `BOOLEAN` | `bool` |
| `CustomFieldIntegerValue` | `INTEGER` | `int` |
| `CustomFieldFloatValue` | `FLOAT` | `float` |
| `CustomFieldMonetaryValue` | `MONETARY` | `str` |
| `CustomFieldSelectValue` | `SELECT` | `int | str` |
| `CustomFieldDocumentLinkValue` | `DOCUMENT_LINK` | `list[int]` |
| Class | `data_type` | `value` type |
| ------------------------------ | -------------------- | --------------- |
| `CustomFieldStringValue` | `STRING`, `LONGTEXT` | `str` |
| `CustomFieldURLValue` | `URL` | `str` |
| `CustomFieldDateValue` | `DATE` | `datetime.date` |
| `CustomFieldBooleanValue` | `BOOLEAN` | `bool` |
| `CustomFieldIntegerValue` | `INTEGER` | `int` |
| `CustomFieldFloatValue` | `FLOAT` | `float` |
| `CustomFieldMonetaryValue` | `MONETARY` | `str` |
| `CustomFieldSelectValue` | `SELECT` | `int` or `str` |
| `CustomFieldDocumentLinkValue` | `DOCUMENT_LINK` | `list[int]` |

### `CustomFieldMonetaryValue` extras

122 changes: 39 additions & 83 deletions docs/concepts/documents.md
@@ -11,11 +11,11 @@ document = await paperless.documents(42)

print(document.id)
print(document.title)
print(document.correspondent) # int (id) or None
print(document.document_type) # int (id) or None
print(document.tags) # list[int]
print(document.created) # datetime.date | None
print(document.content) # extracted text content
print(document.correspondent)
print(document.document_type)
print(document.tags)
print(document.created)
print(document.content)
print(document.page_count)
print(document.mime_type)
print(document.archive_serial_number)
@@ -43,15 +43,16 @@ preview = await doc.get_preview()
thumbnail = await doc.get_thumbnail()
```

The `DownloadedDocument` holds:
`DownloadedDocument` gives you the raw bytes plus everything from the response headers you'd need to save or serve the file:

| Attribute | Description |
| ---------------------- | --------------------------------------------- |
| `content` | Raw binary file data |
| `content_type` | MIME type, e.g. `"application/pdf"` |
| `disposition_type` | `"attachment"` or `"inline"` |
| `disposition_filename` | Suggested filename from `Content-Disposition` |
| `original` | Whether the original file was requested |
```python
# save to disk using the filename suggested by the API
with open(download.disposition_filename, "wb") as f:
    f.write(download.content)

print(download.content_type) # e.g. "application/pdf"
print(download.disposition_type) # "attachment" or "inline"
```
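
Because `disposition_filename` originates from a response header, it may be worth normalising it before using it as a local path. A minimal sketch, assuming the attribute can be missing or can contain directory parts; `safe_filename` and the fallback name are hypothetical helpers, not part of pypaperless.

```python
from pathlib import PurePosixPath, PureWindowsPath
from typing import Optional


def safe_filename(name: Optional[str], fallback: str = "document.bin") -> str:
    """Reduce a header-supplied filename to a bare file name."""
    if not name:
        return fallback
    # Strip directory components written with either separator style.
    base = PureWindowsPath(PurePosixPath(name).name).name
    # Reject results that are empty or only refer to a directory.
    return base if base not in ("", ".", "..") else fallback
```

It could then be used as `open(safe_filename(download.disposition_filename), "wb")`.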

### Requesting the original file

@@ -79,30 +80,27 @@ async for document in paperless.documents.search(query="annual report"):
    ...
```

### Custom field query

```python
async for document in paperless.documents.search(
    custom_field_query='["amount", "gte", 10000]'
):
    ...
```

Custom field query syntax is documented in the [Paperless-ngx API reference](https://docs.paperless-ngx.com/api/#filtering-by-custom-fields).

### Search hits

When a document is returned from a search, it carries a `DocumentSearchHit`:
When a document is returned from a search, it carries a `DocumentSearchHit`. Use `has_search_hit` to branch on it, or the walrus operator to check and bind in one step:

```python
if document.has_search_hit:
    hit = document.search_hit
    print(f"{document.title} matched the query")

if hit := document.search_hit:
    print(hit.score)
    print(hit.highlights)
    print(hit.note_highlights)
    print(hit.rank)
```

`search_hit` is `None` for documents fetched directly (e.g. `paperless.documents(42)`).

### Custom field query

For building expressions in a type-safe way, see [Custom field query](custom_field_query.md).

---

## More-like search
@@ -146,11 +144,11 @@ suggestions = await paperless.documents.suggestions(42)
doc = await paperless.documents(42)
suggestions = await doc.get_suggestions()

print(suggestions.correspondents) # list[int]
print(suggestions.document_types) # list[int]
print(suggestions.tags) # list[int]
print(suggestions.storage_paths) # list[int]
print(suggestions.dates) # list[datetime.date]
print(suggestions.correspondents)
print(suggestions.document_types)
print(suggestions.tags)
print(suggestions.storage_paths)
print(suggestions.dates)
```

---
@@ -209,21 +207,23 @@ print(f"Next ASN: {next_asn}")

## Uploading a document

Use `draft()` to construct a document upload and `save()` to submit it. The document content must be provided as `bytes`.
Use `draft()` to construct a document upload and `save()` to submit it. The document content must be provided as `bytes`. All fields except `document` are optional.

```python
with open("invoice.pdf", "rb") as f:
    content = f.read()

draft = paperless.documents.draft(
    document=content,
    filename="invoice.pdf",
    document=content,  # required, raw file bytes
    filename="invoice.pdf",  # original filename
    title="Invoice 2024-01",
    created=datetime.datetime(2024, 1, 15),
    correspondent=3,
    document_type=2,
    tags=[1, 5],
    correspondent=3,  # correspondent ID
    document_type=2,  # document type ID
    storage_path=1,  # storage path ID
    tags=[1, 5],  # tag IDs
    archive_serial_number=1042,
    custom_fields=[3, 8],  # custom field IDs (Paperless assigns null values)
)

task_id = await paperless.documents.save(draft)
@@ -233,35 +233,9 @@ print(f"Upload queued as task: {task_id}")
!!! note
    Unlike other resources, `save()` for documents returns a **task ID string**, not an integer ID. The document is processed asynchronously by Paperless-ngx. Use `paperless.tasks` to monitor the task.
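
Since the upload is processed asynchronously, callers typically poll until the task settles. The sketch below shows one way to structure such a loop. It deliberately takes a `fetch_status` coroutine rather than calling pypaperless directly, because the exact shape of `paperless.tasks` is not shown here; the status strings, `wait_for_task`, and `DONE_STATES` are all assumptions made for illustration.

```python
import asyncio
from typing import Awaitable, Callable

DONE_STATES = {"SUCCESS", "FAILURE"}  # assumed terminal states


async def wait_for_task(
    task_id: str,
    fetch_status: Callable[[str], Awaitable[str]],
    interval: float = 1.0,
    timeout: float = 60.0,
) -> str:
    """Poll fetch_status(task_id) until a terminal state or until timeout."""

    async def poll() -> str:
        while True:
            status = await fetch_status(task_id)
            if status in DONE_STATES:
                return status
            await asyncio.sleep(interval)

    # Raises a timeout error if no terminal state is reached in time.
    return await asyncio.wait_for(poll(), timeout)
```

In real use, `fetch_status` would wrap whatever call retrieves the task and returns its status field.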

### Document draft fields

| Field | Description |
| ----------------------- | ------------------------------ |
| `document` | **Required.** Raw file content |
| `filename` | Optional original filename |
| `title` | Document title |
| `created` | Document creation date |
| `correspondent` | Correspondent ID |
| `document_type` | Document type ID |
| `storage_path` | Storage path ID |
| `tags` | Tag IDs |
| `archive_serial_number` | Archive serial number |
| `custom_fields` | Custom field assignments |

### Uploading with custom fields

You can attach custom fields in two ways:

**As a list of field IDs** (Paperless assigns `null` as value):

```python
draft = paperless.documents.draft(
    document=content,
    custom_fields=[3, 8],
)
```
### Uploading with custom field values

**As a `DocumentCustomFieldList`** (with explicit values):
To set explicit values on custom fields at upload time, use `DocumentCustomFieldList`:

```python
from pypaperless.models.documents import DocumentCustomFieldList
@@ -365,21 +339,3 @@ for entry in entries:
# Via the service, passing the document pk explicitly
entries = await paperless.documents.history(42)
```
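
The `changes` attribute of a history entry maps field names to `[old, new]` pairs. A small sketch of turning such a mapping into readable lines; `describe_changes` is a hypothetical helper and the sample values in the usage note are made up.

```python
def describe_changes(changes: dict[str, list]) -> list[str]:
    """Render {field: [old, new]} as one 'field: old -> new' line per field."""
    lines = []
    for field, (old, new) in sorted(changes.items()):
        lines.append(f"{field}: {old!r} -> {new!r}")
    return lines
```

For example, `describe_changes({"title": ["Old", "New"]})` yields a single line describing the title change.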

### `DocumentHistory` fields

| Field | Description |
| ----------- | ----------------------------------------------------- |
| `id` | Entry id |
| `document` | Document pk (injected by the service layer) |
| `timestamp` | When the change occurred |
| `action` | `"create"` or `"update"` (`DocumentHistoryAction`) |
| `changes` | Dict mapping field names to `[old, new]` pairs |
| `actor` | The user who made the change (`DocumentHistoryActor`) |

### `DocumentHistoryActor` fields

| Field | Description |
| ---------- | --------------- |
| `id` | User id |
| `username` | Username string |
12 changes: 6 additions & 6 deletions docs/concepts/permissions.md
@@ -33,8 +33,8 @@ print(doc.owner)
print(doc.user_can_change)

if doc.has_permissions:
    print(doc.permissions.view.users) # list[int]
    print(doc.permissions.view.groups) # list[int]
    print(doc.permissions.view.users)
    print(doc.permissions.view.groups)
    print(doc.permissions.change.users)
    print(doc.permissions.change.groups)
```
@@ -164,10 +164,10 @@ Permissions(view_users=[2, 3], change_users=[2], change_groups=[1])

# Read individual scopes
perms = doc.permissions
perms.view.users # list[int]
perms.view.groups # list[int]
perms.change.users # list[int]
perms.change.groups # list[int]
perms.view.users
perms.view.groups
perms.change.users
perms.change.groups
```

---
10 changes: 4 additions & 6 deletions docs/migrating-v5-to-v6.md
@@ -1,12 +1,10 @@
# Migrating from v5 to v6

v6 is a full rewrite of pypaperless, motivated by three concrete problems with v5:
v6 is almost a full rewrite of pypaperless. Three things drove it:

- **Tight coupling between models and the HTTP layer.** In v5, every model instance held a reference to the `Paperless` client and called it directly for `update()`, `delete()`, and `save()`. This made models hard to construct in tests and impossible to pass between different client contexts. v6 makes models pure data containers — all I/O goes through service objects.
- **Runtime type safety.** v5 used plain dataclasses with manual dict-to-object conversion. v6 uses Pydantic v2, which validates all incoming API data against the declared field types at parse time. The Pydantic team's own benchmarks show v2 parses 5–17× faster than v1 and catches malformed payloads that would previously have silently produced wrong values.
- **HTTP client ergonomics.** `aiohttp` requires an explicit session lifecycle and has no built-in sync support. `httpx` has a unified sync/async API, ships with a `MockTransport` for testing without a live server, and handles connection pooling transparently.

The changes are mechanical at call sites but the reasoning is architectural.
- **Models were too tightly coupled to the HTTP layer.** In v5, every model instance carried a reference to the client and called it directly. That made testing awkward and sharing models between contexts impossible. v6 models are plain data — all I/O goes through services.
- **No runtime type safety.** v5 used dataclasses with manual dict conversion, so bad API responses would silently produce wrong values. v6 uses Pydantic v2, which validates every response at parse time.
- **`aiohttp` was replaced by `httpx`.** `httpx` has a unified sync/async API and a built-in mock transport that makes testing easier.
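
The first point can be pictured with a stripped-down sketch. This is not pypaperless code: `Tag`, `TagService`, the `transport` callable, and the URL path are invented for illustration, and a frozen dataclass stands in for the Pydantic model. The model carries no client reference, so it can be built in a test without any HTTP setup, and the service is the only place that touches I/O.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable

# Hypothetical transport signature: (method, path) -> decoded JSON body
Transport = Callable[[str, str], Awaitable[dict[str, Any]]]


@dataclass(frozen=True)
class Tag:
    """Plain data: no reference to a client or session."""

    id: int
    name: str


class TagService:
    """Owns all I/O; turns raw payloads into model instances."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    async def get(self, pk: int) -> Tag:
        data = await self._transport("GET", f"/api/tags/{pk}/")
        return Tag(id=int(data["id"]), name=str(data["name"]))


async def fake_transport(method: str, path: str) -> dict[str, Any]:
    """Test double that returns a canned payload without any network access."""
    return {"id": 5, "name": "inbox"}
```

Swapping `fake_transport` for a real HTTP call changes nothing about `Tag`, which is the decoupling the bullet describes.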

---

21 changes: 1 addition & 20 deletions docs/resources/config.md
@@ -4,26 +4,7 @@ The `config` resource exposes the Paperless-ngx application configuration. It is

## Model

| Field | Description |
| ------------------------ | ----------------------------------- |
| `id` | Always `1` |
| `user_args` | Extra arguments passed to Tesseract |
| `output_type` | OCR output format (e.g. `"pdf"`) |
| `pages` | Number of pages to OCR (0 = all) |
| `language` | OCR language code |
| `mode` | OCR mode |
| `skip_archive_file` | Archive-skip policy |
| `image_dpi` | DPI for rasterized images |
| `deskew` | Enable deskew |
| `rotate_pages` | Enable automatic page rotation |
| `rotate_pages_threshold` | Rotation confidence threshold |
| `max_image_pixels` | Maximum image size in pixels |
| `app_title` | Paperless web UI title |
| `app_logo` | Paperless web UI logo path |
| `barcodes_enabled` | Enable barcode processing |
| `barcode_string` | Custom barcode separator string |
| `barcode_enable_asn` | Detect ASN from barcodes |
| `barcode_asn_prefix` | Expected ASN barcode prefix |
See [`pypaperless/models/config.py`](https://github.com/tb1337/paperless-api/blob/main/pypaperless/models/config.py) for all fields and types, and the [Paperless-ngx API docs](https://docs.paperless-ngx.com/api/) for the upstream schema.

## Fetch

Expand Down
20 changes: 3 additions & 17 deletions docs/resources/correspondents.md
@@ -4,27 +4,13 @@ Correspondents represent the senders or recipients that Paperless-ngx associates

## Models

### `Correspondent`

| Field | Description |
| --------------------- | ----------------------------------------- |
| `id` | Primary key |
| `slug` | URL-safe identifier |
| `name` | Display name |
| `document_count` | Number of assigned documents |
| `last_correspondence` | Date of the most recent matching document |

### `CorrespondentDraft`

| Field | Description |
| ------ | --------------------------------- |
| `name` | Display name *(required on save)* |
See [`pypaperless/models/correspondents.py`](https://github.com/tb1337/paperless-api/blob/main/pypaperless/models/correspondents.py) for all fields and types, and the [Paperless-ngx API docs](https://docs.paperless-ngx.com/api/) for the upstream schema.

## Fetch one

```python
correspondent = await paperless.correspondents(7)
print(correspondent.name) # "ACME Corp"
print(correspondent.name) # "ACME Corp"
print(correspondent.document_count) # 42
```

@@ -39,7 +25,7 @@ all_correspondents = await paperless.correspondents.as_list()

# Fetch only a subset matching a filter
filtered = [
    c async for c in paperless.correspondents.filter()
    c async for c in paperless.correspondents
    if c.document_count and c.document_count > 0
]
```
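
The comprehension above filters on the client side while iterating. A self-contained sketch of the same pattern, with an async generator (`fake_correspondents`, made up here) standing in for `paperless.correspondents`:

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Optional


@dataclass
class FakeCorrespondent:
    """Stand-in object; only the two attributes used by the filter."""

    name: str
    document_count: Optional[int]


async def fake_correspondents() -> AsyncIterator[FakeCorrespondent]:
    """Yield sample items the way an async resource iterator would."""
    samples = [
        FakeCorrespondent("ACME Corp", 42),
        FakeCorrespondent("Nobody", 0),
        FakeCorrespondent("Unknown", None),
    ]
    for item in samples:
        yield item


async def names_with_documents() -> list[str]:
    # The first clause guards against None before the numeric comparison.
    return [
        c.name
        async for c in fake_correspondents()
        if c.document_count and c.document_count > 0
    ]
```

Every item is still fetched; the condition only decides what ends up in the list.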
23 changes: 3 additions & 20 deletions docs/resources/custom_fields.md
@@ -4,24 +4,7 @@ Custom fields let you attach arbitrary typed metadata to documents. This page co

## Models

### `CustomField`

| Field | Description |
| ------------ | --------------------------- |
| `id` | Primary key |
| `name` | Display name |
| `data_type` | Field type (see below) |
| `extra_data` | Type-specific configuration |

**`CustomFieldType` values:** `string`, `longtext`, `url`, `date`, `boolean`, `integer`, `float`, `monetary`, `documentlink`, `select`

### `CustomFieldDraft`

| Field | Description |
| ------------ | --------------------------------- |
| `name` | Display name *(required on save)* |
| `data_type` | Field type *(required on save)* |
| `extra_data` | Type-specific extras |
See [`pypaperless/models/custom_fields.py`](https://github.com/tb1337/paperless-api/blob/main/pypaperless/models/custom_fields.py) for all fields and types, and the [Paperless-ngx API docs](https://docs.paperless-ngx.com/api/) for the upstream schema.

## Fetch one

@@ -37,8 +20,8 @@ print(field.data_type) # CustomFieldType.MONETARY
async for field in paperless.custom_fields:
print(field.id, field.name, field.data_type)

# Build a name → id lookup
field_map = {f.name: f.id async for f in paperless.custom_fields.filter()}
# Keyed by id
fields = await paperless.custom_fields.as_dict()
```
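
If a name-based lookup is still needed, it can be derived from an id-keyed mapping. A minimal sketch, assuming `as_dict()` returns a plain `dict` of id to field object; `FakeField` and `name_to_id` are stand-ins invented for this example.

```python
from dataclasses import dataclass


@dataclass
class FakeField:
    """Stand-in for a custom field; not a pypaperless class."""

    id: int
    name: str


def name_to_id(fields: dict[int, FakeField]) -> dict[str, int]:
    """Invert an id-keyed mapping into a name -> id lookup."""
    return {field.name: field_id for field_id, field in fields.items()}
```

If two fields share a name, the later one wins, so this is only safe when names are unique.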

## Create