Matcher can extract transaction candidates from documents and propose field mappings using AI — but AI output is never authoritative. Nothing is reconciled until a human approves it. This guide covers the human-in-the-loop (HITL) extraction-review queue, AI mapping proposals, and the related job actions.
The document-extraction lane is gated by a global kill-switch and a per-tenant opt-in. A tenant that has not opted in receives 403 before any document bytes are stored or egressed.

Enqueue a document for extraction


Upload a source document (PDF) to run deterministic + AI extraction. The resulting transaction candidates are queued in a review — nothing is reconciled yet.
curl -X POST "https://api.matcher.example.com/v1/imports/contexts/{contextId}/sources/{sourceId}/extract-document" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/pdf" \
  --data-binary @statement.pdf
The response (202 Accepted) returns the queued review id, the candidate count, and a status that is always PENDING_REVIEW on enqueue:
{
  "reviewId": "550e8400-e29b-41d4-a716-446655440000",
  "candidateCount": 12,
  "status": "PENDING_REVIEW"
}
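The same call can be sketched in Python with only the standard library. `build_extract_request` is a hypothetical helper name, and the host is the placeholder from the curl example. The sketch builds the request without sending it, so it can be inspected or tested first.

```python
import urllib.request

BASE_URL = "https://api.matcher.example.com"  # placeholder host from the curl example


def build_extract_request(context_id: str, source_id: str,
                          pdf_bytes: bytes, token: str) -> urllib.request.Request:
    """Build (but do not send) the extract-document request (hypothetical helper)."""
    if not pdf_bytes:
        # The response-code table lists an empty body as a 400; fail early instead.
        raise ValueError("document body must not be empty")
    url = (f"{BASE_URL}/v1/imports/contexts/{context_id}"
           f"/sources/{source_id}/extract-document")
    return urllib.request.Request(
        url,
        data=pdf_bytes,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/pdf",
        },
    )
```

Passing the result to `urllib.request.urlopen` sends it; a successful enqueue should answer 202 with the `reviewId` shown above.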

The review queue


List reviews

Cursor-paginated list of extraction reviews for a context, optionally filtered by lifecycle status.
curl -X GET "https://api.matcher.example.com/v1/imports/contexts/{contextId}/extraction-reviews?status=PENDING_REVIEW&limit=50" \
  -H "Authorization: Bearer $TOKEN"
Query parameters: status (PENDING_REVIEW, APPROVED, REJECTED), limit (1–200), and cursor.
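Walking the whole queue means following the cursor until the server stops returning one. The sketch below assumes each page is a JSON object with an `items` list and an optional `nextCursor` field; neither name is confirmed for this endpoint, so adjust them to the actual response. `fetch_page` is a hypothetical callable that performs one GET and returns the decoded page.

```python
from typing import Callable, Iterator, Optional


def iter_reviews(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every review by following cursors until none is returned.

    ASSUMPTION: a page looks like {"items": [...], "nextCursor": "..."}.
    These field names are illustrative, not taken from the API contract.
    """
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)  # first call passes None, i.e. no cursor parameter
        yield from page.get("items", [])
        cursor = page.get("nextCursor")
        if not cursor:
            break
```

Injecting `fetch_page` keeps the loop independent of the HTTP client and easy to exercise with canned pages.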

Get one review

curl -X GET "https://api.matcher.example.com/v1/imports/contexts/{contextId}/extraction-reviews/{reviewId}" \
  -H "Authorization: Bearer $TOKEN"
A review carries its lifecycle, the proposed candidates, provenance, and linkage state:
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "contextId": "550e8400-e29b-41d4-a716-446655440000",
  "sourceId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "PENDING_REVIEW",
  "candidates": [
    {
      "source": "text_layer",
      "fields": [
        { "canonicalKey": "amount", "value": "100.50", "confidence": 0.95, "page": 1 },
        { "canonicalKey": "date", "value": "2025-06-01", "confidence": 0.9, "page": 1 }
      ]
    }
  ],
  "version": 1,
  "createdAt": "2025-01-15T10:30:00Z",
  "updatedAt": "2025-01-15T10:30:00Z"
}
Each candidate declares the lane that produced it: text_layer (PDF text, higher trust) or vision (OCR/vision model, lower trust). Field values are verbatim tokens — money stays a string, never a parsed amount.
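A review UI might use the lane and confidence to decide which fields to highlight for the reviewer. The following is a minimal sketch over the review shape shown above; the function names and the 0.9 default threshold are hypothetical choices, not values Matcher prescribes.

```python
from decimal import Decimal


def low_confidence_fields(review: dict, threshold: float = 0.9) -> list[tuple[int, str]]:
    """Return (candidate index, canonicalKey) for fields worth a second look."""
    flagged = []
    for index, candidate in enumerate(review["candidates"]):
        for field in candidate["fields"]:
            # Vision-lane output is lower trust, so flag it regardless of score.
            if candidate["source"] == "vision" or field["confidence"] < threshold:
                flagged.append((index, field["canonicalKey"]))
    return flagged


def amount_of(candidate: dict) -> Decimal:
    """Parse the verbatim amount token for display only.

    ASSUMPTION: the token uses a dot decimal as in the example ("100.50");
    Decimal is used instead of float so no rounding is introduced.
    """
    for field in candidate["fields"]:
        if field["canonicalKey"] == "amount":
            return Decimal(field["value"])
    raise KeyError("candidate has no amount field")
```

Because values arrive as verbatim strings, any parsing like `amount_of` is a client-side convenience and should not replace the token that gets approved.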

Approve or reject


Approve

Approving a PENDING_REVIEW review runs the single deterministic handoff into the normal ingestion pipeline (dedup + outbox + match-trigger) and links the resulting job to the review. This is the only path from an AI candidate to a reconciled transaction, and it runs only on explicit human approval.
curl -X POST "https://api.matcher.example.com/v1/imports/contexts/{contextId}/extraction-reviews/{reviewId}/approve" \
  -H "Authorization: Bearer $TOKEN"
{
  "reviewId": "550e8400-e29b-41d4-a716-446655440000",
  "ingestionJobId": "550e8400-e29b-41d4-a716-446655440000",
  "candidateCount": 12
}

Reject

Rejecting discards the candidates — nothing is ingested. The body is optional; an empty body is a valid “reject with no reason”.
curl -X POST "https://api.matcher.example.com/v1/imports/contexts/{contextId}/extraction-reviews/{reviewId}/reject" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "reason": "poor scan quality, re-upload" }'
The approving/rejecting principal is recorded for audit.
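Both decisions are a bodiless or near-bodiless POST, so one helper can cover them. This is a sketch using the standard library; `build_decision_request` is a hypothetical name and the host is the placeholder from the curl examples.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.matcher.example.com"  # placeholder host from the curl examples


def build_decision_request(context_id: str, review_id: str, token: str,
                           approve: bool, reason: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) an approve or reject request (hypothetical helper)."""
    action = "approve" if approve else "reject"
    url = (f"{BASE_URL}/v1/imports/contexts/{context_id}"
           f"/extraction-reviews/{review_id}/{action}")
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if not approve and reason is not None:
        # The reject body is optional; send it only when a reason was given.
        data = json.dumps({"reason": reason}).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, method="POST", headers=headers)
```

A 409 on send indicates the review is no longer in a state that allows the transition, for example because someone else already decided it.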

Mapping proposals


Before you declare a field map by hand, ask the advisor to inspect a representative sample and propose a config-only mapping. It is advisory and side-effect-free: producing a proposal persists nothing. You confirm the result through the existing field-map declaration path.
curl -X POST "https://api.matcher.example.com/v1/imports/contexts/{contextId}/sources/{sourceId}/mapping-proposal" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "sample": "id;value;ccy;posted_at\nA1;10,50;BRL;2025-06-01\n",
    "format": "csv",
    "hints": { "locale": "pt-BR", "has_header": "true" }
  }'
The response carries the proposed field map, source dialect, and a per-field breakdown with confidence and rationale:
{
  "mapping": { "amount": "value", "external_id": "id" },
  "dialect": {
    "encoding": "utf-8",
    "delimiter": "semicolon",
    "decimalStyle": "comma",
    "dateStyle": "iso"
  },
  "fields": [
    { "canonicalKey": "amount", "sourceColumn": "value", "confidence": 0.92, "rationale": "numeric column with comma decimal" }
  ]
}
The response never carries parsed values, amounts, or transactions.
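Since the proposal is advisory, a client still has to decide which suggestions to carry into the field-map declaration. One possible triage, sketched below, keeps fields at or above a confidence threshold and lists everything else for manual review. The function name and the 0.9 threshold are hypothetical.

```python
def split_proposal(proposal: dict, threshold: float = 0.9) -> tuple[dict, list]:
    """Split a proposal into a draft field map and keys needing manual review."""
    draft: dict = {}
    needs_review: list = []
    for field in proposal["fields"]:
        if field["confidence"] >= threshold:
            draft[field["canonicalKey"]] = field["sourceColumn"]
        else:
            needs_review.append(field["canonicalKey"])
    # Keys present in "mapping" but absent from "fields" carry no score,
    # so this sketch treats them as unreviewed rather than trusted.
    scored = {field["canonicalKey"] for field in proposal["fields"]}
    needs_review.extend(key for key in proposal["mapping"] if key not in scored)
    return draft, needs_review
```

Applied to the example response, `amount` lands in the draft map and `external_id` is listed for review because it has no per-field entry.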

Fetch from an external transport


Trigger a manual fetch-and-ingest that lists every object matching the supplied transport coordinates (SFTP today) and streams each into the trusted-content ingestion pipeline. The body carries connection coordinates plus an opaque credential reference — never a secret.
curl -X POST "https://api.matcher.example.com/v1/imports/contexts/{contextId}/sources/{sourceId}/fetch" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "kind": "sftp",
    "host": "sftp.bank.example",
    "port": 22,
    "path": "outbound/returns",
    "glob": "*.ret",
    "credentialRef": "cred-handle-123",
    "format": "br/cnab240/febraban-base"
  }'
The response (202 Accepted) returns a per-file outcome in fetch order. Per-file intake failures are reported without failing the batch:
{
  "files": [
    { "name": "statement-2025-06.ret", "ingestionJobId": "550e8400-...", "transactionCount": 42 }
  ]
}
A transport-level failure (endpoint unreachable or credential rejected) returns 503.
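Because the body must never carry a secret, a client can guard against one slipping in by accident. The sketch below allows only the keys that appear in the example request; whether the API accepts additional keys is not confirmed here, so treat the allow-list as an assumption. `build_fetch_body` is a hypothetical helper.

```python
import json

# ASSUMPTION: keys copied from the example request; the full schema may allow more.
ALLOWED_KEYS = {"kind", "host", "port", "path", "glob", "credentialRef", "format"}


def build_fetch_body(**coordinates) -> bytes:
    """Serialize fetch coordinates, refusing anything outside the allow-list."""
    unexpected = set(coordinates) - ALLOWED_KEYS
    if unexpected:
        # Catches accidental inline secrets such as password= or privateKey=.
        raise ValueError(f"unexpected keys: {sorted(unexpected)}")
    return json.dumps(coordinates).encode("utf-8")
```

For example, `build_fetch_body(kind="sftp", host="sftp.bank.example", credentialRef="cred-handle-123")` serializes cleanly, while adding `password="..."` raises before anything is sent.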

Inspect job errors


After an import, list the stored per-row parse/normalization errors for a job (capped at 100 per job) to explain failed or partially-failed imports.
curl -X GET "https://api.matcher.example.com/v1/imports/contexts/{contextId}/jobs/{jobId}/errors" \
  -H "Authorization: Bearer $TOKEN"
{
  "items": [ ... ],
  "totalErrors": 137,
  "storedErrors": 100,
  "errorCap": 100,
  "truncated": true
}
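totalErrors is the uncapped failure total, storedErrors is the number of errors actually retained, and errorCap is the per-job limit. truncated is true when totalErrors exceeds storedErrors, meaning some errors were not stored.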
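The counters make it straightforward to tell a user how much of the failure set they are seeing. A minimal sketch over the response shape above follows; `describe_errors` is a hypothetical helper name.

```python
def describe_errors(page: dict) -> str:
    """Summarize how many errors were stored versus how many occurred."""
    total = page["totalErrors"]
    stored = page["storedErrors"]
    if page["truncated"]:
        missing = total - stored  # errors beyond the cap are counted but not retained
        return (f"{stored} of {total} errors stored; "
                f"{missing} not retained (cap {page['errorCap']})")
    return f"all {total} errors stored"
```

With the example response this yields `100 of 137 errors stored; 37 not retained (cap 100)`.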

Response codes


Status  Meaning
200     Review, list, mapping proposal, or job errors returned
202     Document enqueued / fetch accepted
400     Invalid input (empty body, bad status filter, invalid pagination)
403     Tenant not opted into document extraction
404     Review or job not found
409     Invalid review state transition
422     No candidates could be extracted / mapping sample rejected downstream
503     Extraction, review, proposal, or fetch not enabled on this deployment; also returned for a transport-level fetch failure (endpoint unreachable or credential rejected)
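A client can turn these codes into a small set of next steps. The sketch below is one possible grouping; the action labels are hypothetical and exist only for illustration.

```python
def next_step(status: int) -> str:
    """Map a documented status code to a coarse client action (labels are illustrative)."""
    if status in (200, 202):
        return "ok"
    if status in (400, 422):
        return "fix_input"        # the request or document itself needs to change
    if status == 403:
        return "opt_in_required"  # tenant has not opted into document extraction
    if status == 404:
        return "not_found"
    if status == 409:
        return "refetch_review"   # state changed; reload the review before retrying
    if status == 503:
        # 503 covers both a disabled lane and a transport failure,
        # so a blind retry may not help.
        return "unavailable"
    return "unexpected"
```

Keeping 503 separate from the input errors matters because it does not, on its own, say whether the feature is switched off or the remote endpoint is unreachable.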