Matcher can extract transaction candidates from documents and propose field mappings using AI, but AI output is never authoritative: nothing is reconciled until a human approves it. This guide covers the human-in-the-loop (HITL) extraction-review queue, AI mapping proposals, and the related import actions: fetching files from an external transport and inspecting per-job errors.
The document-extraction lane is gated by a global kill-switch and a per-tenant opt-in. When the lane is not enabled on a deployment, requests return 503. A tenant that has not opted in receives 403 before any document bytes are stored or egressed.
Extract a document
Upload a source document (PDF) to run deterministic and AI extraction. The resulting transaction candidates are queued for review; nothing is reconciled yet.
curl -X POST "https://api.matcher.example.com/v1/imports/contexts/{contextId}/sources/{sourceId}/extract-document" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/pdf" \
--data-binary @statement.pdf
The response (202 Accepted) returns the queued review id, the candidate count, and a status that is always PENDING_REVIEW on enqueue:
{
"reviewId": "550e8400-e29b-41d4-a716-446655440000",
"candidateCount": 12,
"status": "PENDING_REVIEW"
}
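The upload above can be sketched in Python with only the standard library. This is a minimal, non-authoritative sketch: `build_extract_request` and `parse_enqueue_response` are hypothetical helper names, and the status handling simply mirrors the response codes documented in this guide (202, 403, 422, 503).

```python
import json
import urllib.request

BASE_URL = "https://api.matcher.example.com/v1"


def build_extract_request(context_id: str, source_id: str, pdf_bytes: bytes,
                          token: str) -> urllib.request.Request:
    """Build (but do not send) the POST that uploads a PDF for extraction."""
    url = f"{BASE_URL}/imports/contexts/{context_id}/sources/{source_id}/extract-document"
    return urllib.request.Request(
        url,
        data=pdf_bytes,  # raw bytes, like curl --data-binary
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/pdf",
        },
    )


def parse_enqueue_response(status: int, body: bytes) -> dict:
    """Interpret the enqueue response; a new review is always PENDING_REVIEW."""
    if status == 403:
        raise PermissionError("tenant has not opted into document extraction")
    if status == 422:
        raise ValueError("no candidates could be extracted")
    if status == 503:
        raise RuntimeError("document extraction is not enabled on this deployment")
    if status != 202:
        raise RuntimeError(f"unexpected status {status}")
    payload = json.loads(body)
    if payload["status"] != "PENDING_REVIEW":
        raise ValueError(f"unexpected enqueue status {payload['status']!r}")
    return payload
```

With `urllib.request.urlopen`, non-2xx statuses surface as `urllib.error.HTTPError`, whose `.code` can be passed to `parse_enqueue_response`.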
The review queue
List reviews
Cursor-paginated list of extraction reviews for a context, optionally filtered by lifecycle status.
curl -X GET "https://api.matcher.example.com/v1/imports/contexts/{contextId}/extraction-reviews?status=PENDING_REVIEW&limit=50" \
-H "Authorization: Bearer $TOKEN"
Query parameters: status (PENDING_REVIEW, APPROVED, REJECTED), limit (1–200), and cursor.
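Walking every page of the queue can be sketched as a cursor loop. This guide does not show the list response body, so the sketch ASSUMES each page looks like `{"items": [...], "nextCursor": "..."}` and that a missing or empty `nextCursor` marks the last page; `fetch_page` is a hypothetical hook that performs the GET for a given query string.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

VALID_STATUSES = {"PENDING_REVIEW", "APPROVED", "REJECTED"}


def review_list_query(status: Optional[str] = None, limit: int = 50,
                      cursor: Optional[str] = None) -> str:
    """Build the query string, checking the documented constraints client-side."""
    if status is not None and status not in VALID_STATUSES:
        raise ValueError(f"unknown status filter {status!r}")
    if not 1 <= limit <= 200:
        raise ValueError("limit must be between 1 and 200")
    params = {"limit": limit}
    if status is not None:
        params["status"] = status
    if cursor is not None:
        params["cursor"] = cursor
    return urlencode(params)


def iter_reviews(fetch_page: Callable[[str], dict], status: Optional[str] = None,
                 limit: int = 50) -> Iterator[dict]:
    """Yield every review across pages.

    ASSUMPTION: pages carry "items" and "nextCursor"; these key names are
    illustrative, not taken from the API reference.
    """
    cursor = None
    while True:
        page = fetch_page(review_list_query(status, limit, cursor))
        yield from page.get("items", [])
        cursor = page.get("nextCursor")
        if not cursor:
            return
```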
Get one review
curl -X GET "https://api.matcher.example.com/v1/imports/contexts/{contextId}/extraction-reviews/{reviewId}" \
-H "Authorization: Bearer $TOKEN"
A review carries its lifecycle, the proposed candidates, provenance, and linkage state:
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"contextId": "550e8400-e29b-41d4-a716-446655440000",
"sourceId": "550e8400-e29b-41d4-a716-446655440000",
"status": "PENDING_REVIEW",
"candidates": [
{
"source": "text_layer",
"fields": [
{ "canonicalKey": "amount", "value": "100.50", "confidence": 0.95, "page": 1 },
{ "canonicalKey": "date", "value": "2025-06-01", "confidence": 0.9, "page": 1 }
]
}
],
"version": 1,
"createdAt": "2025-01-15T10:30:00Z",
"updatedAt": "2025-01-15T10:30:00Z"
}
Each candidate declares the lane that produced it: text_layer (the PDF's embedded text, higher trust) or vision (an OCR/vision model, lower trust). Field values are verbatim tokens from the document: money stays a string and is never parsed into an amount.
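A reviewer-side triage helper can use the lane and per-field confidence to decide what to check first. This is a hypothetical sketch: `fields_to_double_check` and its `min_confidence` threshold are illustrative choices, not Matcher settings, and the helper only reads the review; it never converts the verbatim string values.

```python
TRUSTED_LANES = {"text_layer"}  # vision (OCR) candidates are lower trust


def fields_to_double_check(review: dict, min_confidence: float = 0.9) -> list:
    """Return (candidate index, canonicalKey) pairs worth a closer look.

    A field is flagged when its candidate came from a lower-trust lane or
    when its confidence is below the (hypothetical) reviewer threshold.
    """
    flagged = []
    for i, candidate in enumerate(review.get("candidates", [])):
        low_trust_lane = candidate.get("source") not in TRUSTED_LANES
        for field in candidate.get("fields", []):
            if low_trust_lane or field.get("confidence", 0.0) < min_confidence:
                flagged.append((i, field["canonicalKey"]))
    return flagged
```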
Approve or reject
Approve
Approving a PENDING_REVIEW review runs the single deterministic handoff into the normal ingestion pipeline (dedup + outbox + match-trigger) and links the resulting job to the review. This is the only path from an AI candidate to a reconciled transaction, and it runs only on explicit human approval.
curl -X POST "https://api.matcher.example.com/v1/imports/contexts/{contextId}/extraction-reviews/{reviewId}/approve" \
-H "Authorization: Bearer $TOKEN"
{
"reviewId": "550e8400-e29b-41d4-a716-446655440000",
"ingestionJobId": "550e8400-e29b-41d4-a716-446655440000",
"candidateCount": 12
}
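The approve call can be wrapped so that conflicts and missing reviews surface as distinct errors. In this sketch, `send` is a hypothetical transport hook (method, path, body) returning (status, body), and the local status guard ASSUMES, following the text above, that only a PENDING_REVIEW review can be approved; the server remains the authority and may still answer 409.

```python
import json
from typing import Callable, Tuple

Send = Callable[[str, str, bytes], Tuple[int, bytes]]  # (method, path, body) -> (status, body)


class ReviewConflict(Exception):
    """The review is not in a state that allows this transition (409)."""


def approve_review(send: Send, context_id: str, review: dict) -> str:
    """Approve a review and return the ingestion job id linked to it."""
    if review["status"] != "PENDING_REVIEW":
        # ASSUMPTION: only pending reviews are approvable; skip the round trip.
        raise ReviewConflict(f"review is {review['status']}")
    path = f"/v1/imports/contexts/{context_id}/extraction-reviews/{review['id']}/approve"
    status, body = send("POST", path, b"")
    if status == 409:
        raise ReviewConflict("review changed state before approval")
    if status == 404:
        raise LookupError("review not found")
    if not 200 <= status < 300:
        raise RuntimeError(f"unexpected status {status}")
    return json.loads(body)["ingestionJobId"]
```

The returned `ingestionJobId` is the handle for the normal ingestion job, so it can be used with the job-errors endpoint described later in this guide.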
Reject
Rejecting discards the candidates — nothing is ingested. The body is optional; an empty body is a valid “reject with no reason”.
curl -X POST "https://api.matcher.example.com/v1/imports/contexts/{contextId}/extraction-reviews/{reviewId}/reject" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{ "reason": "poor scan quality, re-upload" }'
The principal who approves or rejects a review is recorded for audit. A transition that the review's current status does not allow returns 409.
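Because the reject body is optional, a client can send no body at all when there is no reason to record. This minimal sketch uses a hypothetical `build_reject_request` helper and only sets `Content-Type: application/json` when a reason is actually sent.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.matcher.example.com/v1"


def build_reject_request(context_id: str, review_id: str, token: str,
                         reason: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the reject call; the reason is optional."""
    url = f"{BASE_URL}/imports/contexts/{context_id}/extraction-reviews/{review_id}/reject"
    headers = {"Authorization": f"Bearer {token}"}
    data = None  # no body at all: a valid "reject with no reason"
    if reason:
        data = json.dumps({"reason": reason}).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, method="POST", headers=headers)
```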
Mapping proposals
Before you declare a field map by hand, ask the advisor to inspect a representative sample and propose a config-only mapping. It is advisory and side-effect-free: producing a proposal persists nothing. You confirm the result through the existing field-map declaration path.
curl -X POST "https://api.matcher.example.com/v1/imports/contexts/{contextId}/sources/{sourceId}/mapping-proposal" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"sample": "id;value;ccy;posted_at\nA1;10,50;BRL;2025-06-01\n",
"format": "csv",
"hints": { "locale": "pt-BR", "has_header": "true" }
}'
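The `\n` inside the curl example's `sample` is a JSON escape, not a literal backslash in the file. Building the body with a JSON serializer handles that escaping for you. In this sketch, `representative_sample` and its 20-line default are hypothetical choices for trimming a large file down to a sample.

```python
import json


def representative_sample(text: str, max_lines: int = 20) -> str:
    """Keep the header plus the first rows; the advisor only needs a sample."""
    lines = text.splitlines(keepends=True)
    return "".join(lines[:max_lines])


def mapping_proposal_body(sample: str, fmt: str, hints: dict) -> bytes:
    """Serialize the proposal request; json.dumps escapes real newlines as \\n."""
    return json.dumps({"sample": sample, "format": fmt, "hints": hints}).encode("utf-8")
```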
The response carries the proposed field map, source dialect, and a per-field breakdown with confidence and rationale:
{
"mapping": { "amount": "value", "external_id": "id" },
"dialect": {
"encoding": "utf-8",
"delimiter": "semicolon",
"decimalStyle": "comma",
"dateStyle": "iso"
},
"fields": [
{ "canonicalKey": "amount", "sourceColumn": "value", "confidence": 0.92, "rationale": "numeric column with comma decimal" }
]
}
The response never carries parsed values, amounts, or transactions.
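Since a proposal persists nothing, a client can split it into entries to confirm and entries to decide by hand before declaring the field map. In this sketch, `split_proposal` and `min_confidence` are hypothetical; an entry in `mapping` without a per-field breakdown is treated as unscored and sent to manual review.

```python
def split_proposal(proposal: dict, min_confidence: float = 0.9):
    """Return (confident, manual) field maps from a mapping proposal."""
    scores = {f["canonicalKey"]: f.get("confidence", 0.0) for f in proposal.get("fields", [])}
    confident, manual = {}, {}
    for key, column in proposal.get("mapping", {}).items():
        if scores.get(key, 0.0) >= min_confidence:
            confident[key] = column
        else:
            manual[key] = column  # unscored or low confidence: a human decides
    return confident, manual
```

Applied to the example response above, `amount` (0.92) lands in the confident map and `external_id`, which has no breakdown entry, lands in the manual map.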
Fetch from an external transport
Trigger a manual fetch-and-ingest: Matcher lists every object matching the supplied transport coordinates (currently SFTP only) and streams each one into the trusted-content ingestion pipeline. The body carries connection coordinates plus an opaque credential reference, never a secret.
curl -X POST "https://api.matcher.example.com/v1/imports/contexts/{contextId}/sources/{sourceId}/fetch" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"kind": "sftp",
"host": "sftp.bank.example",
"port": 22,
"path": "outbound/returns",
"glob": "*.ret",
"credentialRef": "cred-handle-123",
"format": "br/cnab240/febraban-base"
}'
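One way to keep secrets out of the fetch body is to build it from a type that has no field for them. This is a hypothetical client-side sketch (`SftpFetch` is not part of any Matcher SDK); its only credential field is the opaque `credentialRef` handle, and it checks the port range before serializing.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SftpFetch:
    host: str
    path: str
    glob: str
    credential_ref: str  # opaque handle; the secret itself is never in the body
    format: str
    port: int = 22

    def to_body(self) -> bytes:
        """Serialize to the request body shape shown in the curl example."""
        if not 1 <= self.port <= 65535:
            raise ValueError("port out of range")
        if not self.credential_ref:
            raise ValueError("credentialRef is required")
        return json.dumps({
            "kind": "sftp",
            "host": self.host,
            "port": self.port,
            "path": self.path,
            "glob": self.glob,
            "credentialRef": self.credential_ref,
            "format": self.format,
        }).encode("utf-8")
```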
The response (202 Accepted) returns a per-file outcome in fetch order. Per-file intake failures are reported without failing the batch:
{
"files": [
{ "name": "statement-2025-06.ret", "ingestionJobId": "550e8400-...", "transactionCount": 42 }
]
}
A transport-level failure (endpoint unreachable or credential rejected) returns 503.
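Because per-file failures do not fail the batch, a caller should inspect every entry in `files`. The example above only shows a successful entry, so this sketch ASSUMES that an entry with an `ingestionJobId` was ingested and treats any entry without one as a per-file failure, keeping the raw entry for inspection. A 503 has no per-file list to summarize and should be handled before calling the hypothetical `summarize_fetch`.

```python
def summarize_fetch(result: dict) -> dict:
    """Tally ingested files and transactions; collect entries without a job id."""
    ingested, failed = [], []
    for entry in result.get("files", []):
        (ingested if entry.get("ingestionJobId") else failed).append(entry)
    return {
        "ingested": [e["name"] for e in ingested],
        "transactions": sum(e.get("transactionCount", 0) for e in ingested),
        "failed": failed,  # raw entries; their exact failure fields are not documented here
    }
```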
Inspect job errors
After an import, list the stored per-row parse/normalization errors for a job (capped at 100 per job) to explain failed or partially failed imports.
curl -X GET "https://api.matcher.example.com/v1/imports/contexts/{contextId}/jobs/{jobId}/errors" \
-H "Authorization: Bearer $TOKEN"
{
"items": [ ... ],
"totalErrors": 137,
"storedErrors": 100,
"errorCap": 100,
"truncated": true
}
totalErrors is the uncapped number of failed rows; storedErrors is how many were persisted (at most errorCap); truncated is true when totalErrors exceeds storedErrors.
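The truncation fields reduce to simple arithmetic: in the example, 137 rows failed and 100 were stored, so 37 failures are counted but not listed. A hypothetical `error_report` helper can turn the summary into a message for operators.

```python
def error_report(summary: dict) -> str:
    """Describe how much of a job's error list is actually available."""
    total = summary["totalErrors"]
    stored = summary["storedErrors"]
    if summary["truncated"]:
        return (f"showing {stored} of {total} row errors "
                f"({total - stored} not stored; cap is {summary['errorCap']})")
    return f"showing all {total} row errors"
```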
Response codes
| Status | Meaning |
|---|---|
| 200 | Review, list, mapping proposal, or job errors returned |
| 202 | Document enqueued / fetch accepted |
| 400 | Invalid input (empty body, bad status filter, invalid pagination) |
| 403 | Tenant not opted into document extraction |
| 404 | Review or job not found |
| 409 | Invalid review state transition |
| 422 | No candidates could be extracted / mapping sample rejected downstream |
| 503 | Extraction, review, proposal, or fetch not enabled on this deployment; for fetch, also a transport-level failure (endpoint unreachable or credential rejected) |