# Extraction reviews

> Review, approve, or reject AI-extracted transaction candidates before they are ingested, and use mapping proposals and job actions to prepare source data.

Matcher can extract transaction candidates from documents and propose field mappings using AI — but **AI output is never authoritative**. Nothing is reconciled until a human approves it. This guide covers the human-in-the-loop (HITL) extraction-review queue, AI mapping proposals, and the related job actions.

<Note>The document-extraction lane is gated by a global kill-switch **and** a per-tenant opt-in. A tenant that has not opted in receives `403` before any document bytes are stored or egressed.</Note>
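
The two gates can be pictured as a short-circuit check that runs before the document is read. The sketch below is only an illustration: the function name and the order of the checks are assumptions, and mapping the kill-switch to `503` is inferred from the response-code table rather than stated explicitly.

```python
def extraction_gate(kill_switch_engaged, tenant_opted_in):
    """Return the HTTP status to short-circuit with, or None to proceed.

    Hypothetical helper: Matcher's real gate is not shown in this guide.
    """
    if kill_switch_engaged:
        # Assumption: a globally disabled lane surfaces as 503.
        return 503
    if not tenant_opted_in:
        # Stated in the note above: no opt-in means 403 before bytes are stored.
        return 403
    return None
```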

## Enqueue a document for extraction

***

Upload a source document (PDF) to run deterministic and AI extraction. The resulting transaction candidates are queued for review; nothing is reconciled at this point.

```bash theme={null}
curl -X POST "https://api.matcher.example.com/v1/imports/contexts/{contextId}/sources/{sourceId}/extract-document" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/pdf" \
  --data-binary @statement.pdf
```

The response (`202 Accepted`) returns the queued review id, the candidate count, and a status that is always `PENDING_REVIEW` on enqueue:

```json theme={null}
{
  "reviewId": "550e8400-e29b-41d4-a716-446655440000",
  "candidateCount": 12,
  "status": "PENDING_REVIEW"
}
```
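
For clients that are not shell scripts, the same call can be sketched in Python with the standard library. This is a minimal sketch, not an official client: `build_extract_request` and `parse_enqueue_response` are hypothetical helper names, and the host is the placeholder used in the curl example. The request is built but not sent, so the caller decides how to execute it.

```python
import urllib.request

API_BASE = "https://api.matcher.example.com"  # placeholder host from the curl example


def build_extract_request(context_id, source_id, token, pdf_bytes):
    """Build (but do not send) the extract-document request."""
    if not pdf_bytes:
        # The response-code table lists an empty body as invalid input (400).
        raise ValueError("document body must not be empty")
    url = (
        f"{API_BASE}/v1/imports/contexts/{context_id}"
        f"/sources/{source_id}/extract-document"
    )
    return urllib.request.Request(
        url,
        data=pdf_bytes,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/pdf",
        },
    )


def parse_enqueue_response(payload):
    """Return (review_id, candidate_count) from a decoded 202 response body."""
    if payload["status"] != "PENDING_REVIEW":
        raise ValueError(f"unexpected status on enqueue: {payload['status']}")
    return payload["reviewId"], payload["candidateCount"]
```

Passing the returned object to `urllib.request.urlopen` would perform the upload.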

## The review queue

***

### List reviews

Returns a cursor-paginated list of extraction reviews for a context, optionally filtered by lifecycle status.

```bash theme={null}
curl -X GET "https://api.matcher.example.com/v1/imports/contexts/{contextId}/extraction-reviews?status=PENDING_REVIEW&limit=50" \
  -H "Authorization: Bearer $TOKEN"
```

Query parameters: `status` (`PENDING_REVIEW`, `APPROVED`, `REJECTED`), `limit` (1–200), and `cursor`.
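
Walking the whole queue means following the cursor until the server stops returning one. The sketch below shows that loop under stated assumptions: the response keys `items` and `nextCursor` are guesses, since this guide does not show the list response shape, and `fetch_page` is a caller-supplied function that performs the actual GET.

```python
def iter_reviews(fetch_page, status="PENDING_REVIEW", limit=50):
    """Yield every review by following cursors until none is returned.

    fetch_page(status, limit, cursor) must return the decoded JSON body.
    The "items" and "nextCursor" keys are assumed names, not confirmed here.
    """
    if not 1 <= limit <= 200:
        raise ValueError("limit must be between 1 and 200")
    cursor = None
    while True:
        page = fetch_page(status, limit, cursor)
        yield from page.get("items", [])
        cursor = page.get("nextCursor")
        if not cursor:
            break
```

Keeping the HTTP call outside the generator makes the loop easy to exercise with canned pages before pointing it at a real deployment.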

### Get one review

```bash theme={null}
curl -X GET "https://api.matcher.example.com/v1/imports/contexts/{contextId}/extraction-reviews/{reviewId}" \
  -H "Authorization: Bearer $TOKEN"
```

A review carries its lifecycle, the proposed candidates, provenance, and linkage state:

```json theme={null}
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "contextId": "550e8400-e29b-41d4-a716-446655440000",
  "sourceId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "PENDING_REVIEW",
  "candidates": [
    {
      "source": "text_layer",
      "fields": [
        { "canonicalKey": "amount", "value": "100.50", "confidence": 0.95, "page": 1 },
        { "canonicalKey": "date", "value": "2025-06-01", "confidence": 0.9, "page": 1 }
      ]
    }
  ],
  "version": 1,
  "createdAt": "2025-01-15T10:30:00Z",
  "updatedAt": "2025-01-15T10:30:00Z"
}
```

Each candidate declares the lane that produced it: `text_layer` (PDF text, higher trust) or `vision` (OCR/vision model, lower trust). Field values are **verbatim tokens** — money stays a string, never a parsed amount.
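
A reviewer tool might use the lane and confidence to decide which fields deserve a second look. The following is a sketch of that idea, not Matcher behavior: the `0.92` threshold and both helper names are hypothetical. Because values are verbatim tokens, any numeric parsing happens on the client and should be treated as display-only.

```python
from decimal import Decimal, InvalidOperation


def fields_to_double_check(review, min_confidence=0.92):
    """Return (canonicalKey, value, reason) tuples worth a manual look.

    The threshold is an illustrative choice, not a Matcher default.
    """
    flagged = []
    for candidate in review["candidates"]:
        from_vision = candidate["source"] == "vision"
        for field in candidate["fields"]:
            if from_vision:
                flagged.append((field["canonicalKey"], field["value"], "vision lane"))
            elif field["confidence"] < min_confidence:
                flagged.append((field["canonicalKey"], field["value"], "low confidence"))
    return flagged


def amount_as_decimal(token):
    """Parse a verbatim amount token for display; None if Decimal rejects it."""
    try:
        return Decimal(token)
    except InvalidOperation:
        return None
```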

## Approve or reject

***

### Approve

Approving a `PENDING_REVIEW` review runs the single deterministic handoff into the normal ingestion pipeline (dedup + outbox + match-trigger) and links the resulting job to the review. This is the **only** path from an AI candidate to a reconciled transaction, and it runs only on explicit human approval.

```bash theme={null}
curl -X POST "https://api.matcher.example.com/v1/imports/contexts/{contextId}/extraction-reviews/{reviewId}/approve" \
  -H "Authorization: Bearer $TOKEN"
```

```json theme={null}
{
  "reviewId": "550e8400-e29b-41d4-a716-446655440000",
  "ingestionJobId": "550e8400-e29b-41d4-a716-446655440000",
  "candidateCount": 12
}
```

### Reject

Rejecting discards the candidates — nothing is ingested. The body is optional; an empty body is a valid "reject with no reason".

```bash theme={null}
curl -X POST "https://api.matcher.example.com/v1/imports/contexts/{contextId}/extraction-reviews/{reviewId}/reject" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "reason": "poor scan quality, re-upload" }'
```

The approving/rejecting principal is recorded for audit.
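
The review lifecycle can be modeled as a small transition table, which lets a client avoid requests that would be answered with `409`. This sketch assumes that `APPROVED` and `REJECTED` are terminal and that rejection, like approval, is only valid from `PENDING_REVIEW`; the guide implies both but does not state them outright. `ALLOWED_ACTIONS` and `next_status` are hypothetical names.

```python
# Assumed lifecycle: only PENDING_REVIEW accepts an action.
ALLOWED_ACTIONS = {
    "PENDING_REVIEW": {"approve": "APPROVED", "reject": "REJECTED"},
    "APPROVED": {},
    "REJECTED": {},
}


def next_status(current, action):
    """Return the resulting status, or raise if the transition looks invalid."""
    try:
        return ALLOWED_ACTIONS[current][action]
    except KeyError:
        raise ValueError(f"cannot {action} a review in status {current}") from None
```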

## Mapping proposals

***

Before you declare a field map by hand, ask the advisor to inspect a representative sample and propose a **config-only** mapping. It is advisory and side-effect-free: producing a proposal **persists nothing**. You confirm the result through the existing field-map declaration path.

```bash theme={null}
curl -X POST "https://api.matcher.example.com/v1/imports/contexts/{contextId}/sources/{sourceId}/mapping-proposal" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "sample": "id;value;ccy;posted_at\nA1;10,50;BRL;2025-06-01\n",
    "format": "csv",
    "hints": { "locale": "pt-BR", "has_header": "true" }
  }'
```

The response carries the proposed field map, source dialect, and a per-field breakdown with confidence and rationale:

```json theme={null}
{
  "mapping": { "amount": "value", "external_id": "id" },
  "dialect": {
    "encoding": "utf-8",
    "delimiter": "semicolon",
    "decimalStyle": "comma",
    "dateStyle": "iso"
  },
  "fields": [
    { "canonicalKey": "amount", "sourceColumn": "value", "confidence": 0.92, "rationale": "numeric column with comma decimal" }
  ]
}
```

The response never carries parsed values, amounts, or transactions.
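
Since the proposal contains only configuration, one way to sanity-check it before declaring the field map is to apply it to the sample locally. The sketch below does that with Python's `csv` module. It is a client-side preview, not Matcher's parser: `preview_rows` and the `DELIMITERS` table are hypothetical, and only the `semicolon` and `comma` values seen in the example response are known to exist.

```python
import csv
import io
from decimal import Decimal

# Only "semicolon" appears in this guide; the other entries are assumptions.
DELIMITERS = {"semicolon": ";", "comma": ",", "tab": "\t"}


def preview_rows(sample, proposal):
    """Apply a proposed mapping and dialect to the sample, locally."""
    dialect = proposal["dialect"]
    reader = csv.DictReader(
        io.StringIO(sample), delimiter=DELIMITERS[dialect["delimiter"]]
    )
    rows = []
    for raw in reader:
        # Rename source columns to their proposed canonical keys.
        row = {key: raw[column] for key, column in proposal["mapping"].items()}
        if "amount" in row:
            text = row["amount"]
            if dialect["decimalStyle"] == "comma":
                # "10,50" -> "10.50"; thousands separators are not handled.
                text = text.replace(",", ".")
            row["amount"] = Decimal(text)
        rows.append(row)
    return rows
```

Run against the request sample and the response above, this yields one row with `external_id` set to `A1` and `amount` parsed from `10,50`.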

## Fetch from an external transport

***

Trigger a manual fetch-and-ingest that lists every object matching the supplied transport coordinates (SFTP today) and streams each into the trusted-content ingestion pipeline. The body carries connection coordinates plus an **opaque credential reference — never a secret**.

```bash theme={null}
curl -X POST "https://api.matcher.example.com/v1/imports/contexts/{contextId}/sources/{sourceId}/fetch" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "kind": "sftp",
    "host": "sftp.bank.example",
    "port": 22,
    "path": "outbound/returns",
    "glob": "*.ret",
    "credentialRef": "cred-handle-123",
    "format": "br/cnab240/febraban-base"
  }'
```

The response (`202 Accepted`) returns a per-file outcome in fetch order. Per-file intake failures are reported without failing the batch:

```json theme={null}
{
  "files": [
    { "name": "statement-2025-06.ret", "ingestionJobId": "550e8400-...", "transactionCount": 42 }
  ]
}
```

A transport-level failure (endpoint unreachable or credential rejected) returns `503`.
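
Because one batch can mix successes and per-file failures, a caller will usually want to separate the two. The sketch below assumes that a failed file simply lacks `ingestionJobId`; the guide does not show what a failed entry looks like, so treat that rule and the `split_outcomes` name as placeholders.

```python
def split_outcomes(response):
    """Separate files that produced an ingestion job from those that did not."""
    ingested, failed = [], []
    for entry in response["files"]:
        # Assumption: a failed file has no "ingestionJobId"; the real
        # failure shape is not documented in this guide.
        if entry.get("ingestionJobId"):
            ingested.append(entry["name"])
        else:
            failed.append(entry["name"])
    return ingested, failed
```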

## Inspect job errors

***

After an import, list the stored per-row parse/normalization errors for a job (capped at 100 per job) to explain failed or partially-failed imports.

```bash theme={null}
curl -X GET "https://api.matcher.example.com/v1/imports/contexts/{contextId}/jobs/{jobId}/errors" \
  -H "Authorization: Bearer $TOKEN"
```

```json theme={null}
{
  "items": [ ... ],
  "totalErrors": 137,
  "storedErrors": 100,
  "errorCap": 100,
  "truncated": true
}
```

`totalErrors` is the uncapped failure total, `storedErrors` is the number of errors actually retained, and `errorCap` is the per-job limit. `truncated` is `true` when `totalErrors` exceeds `storedErrors`.
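
The counters make it straightforward to tell a user how much of the error list they are actually seeing. A minimal sketch, with `summarize_job_errors` as a hypothetical helper name:

```python
def summarize_job_errors(body):
    """Describe how many per-row errors were kept versus dropped by the cap."""
    total = body["totalErrors"]
    stored = body["storedErrors"]
    dropped = max(total - stored, 0)
    if body["truncated"]:
        return (
            f"{stored} of {total} errors stored; "
            f"{dropped} dropped by the cap of {body['errorCap']}"
        )
    return f"all {total} errors stored"
```

For the response shown above, 137 total minus 100 stored leaves 37 errors that were not retained.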

## Response codes

***

| Status | Meaning                                                                                              |
| ------ | ---------------------------------------------------------------------------------------------------- |
| `200`  | Review, list, mapping proposal, or job errors returned                                               |
| `202`  | Document enqueued / fetch accepted                                                                   |
| `400`  | Invalid input (empty body where one is required, bad status filter, invalid pagination)              |
| `403`  | Tenant not opted into document extraction                                                            |
| `404`  | Review or job not found                                                                              |
| `409`  | Invalid review state transition                                                                      |
| `422`  | No candidates could be extracted / mapping sample rejected downstream                                |
| `503`  | Extraction, review, proposal, or fetch not enabled on this deployment; fetch transport-level failure |
