Redaction API + Document Intelligence API

Parse, extract, and redact sensitive documents for AI workflows

Expunct gives teams two connected pillars in one platform: redaction across text, PDF, image, audio, and video, and privacy-first document intelligence for PDF and DOCX. Use safe-parse to sanitize documents before indexing, retrieval, or extraction.

redaction API
document intelligence beta
safe-parse workflow
AI ingestion
PDF + DOCX

One workflow for privacy-first ingestion

Installation stays simple: the API, SDKs, CLI, MCP server, and LangChain integration all route into the same parse, extract, and redaction platform.

python
import os

from expunct import Expunct

# Read the API key from the environment instead of hard-coding it
client = Expunct(api_key=os.environ["EXPUNCT_API_KEY"])

# Parse and sanitize before indexing
job = client.documents.safe_parse("customer_contract.pdf", language="en")

# Wait for the job to finish; sanitized artifacts are then ready for storage or RAG
completed = client.wait_for_document_job(job.id)

One company story, two product pillars

Parse, extract, and redact are the core operations. safe-parse is the primary workflow for sensitive AI ingestion, not a separate product.

Redaction API

Detect and redact sensitive data across text, PDF, image, audio, and video workflows without splitting your stack.

Document Intelligence API

Parse and extract PDF and DOCX into structured artifacts, markdown, chunks, and schema-shaped outputs for downstream AI systems.

Sensitive Workflows

Use safe-parse to run parse plus sanitization in one workflow, so your vector store, retrieval layer, or extraction pipeline only sees the cleaned artifacts you intend to keep.

Built for privacy-first document pipelines

Expunct fits between your source documents and downstream AI systems, so you can choose when raw data is processed and when sanitized artifacts are the only thing that leaves the boundary.

Source docs, prompts, and raw inputs
Expunct parse, extract, redact, and safe-parse workflows
LLMs · Vector stores · Internal tools · Audit-ready outputs

Built for launch-worthy use cases

Start with the workflows the platform launches with: privacy-first document intelligence for AI ingestion, extraction, and sensitive data handling.

Safe RAG ingestion

Parse documents into chunks and citations, and use safe-parse when your retrieval pipeline must not store raw PII at rest.
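
As an illustration of the ingestion pattern described above, here is a minimal sketch that uses only the Python standard library. It does not call the Expunct API: the `Chunk` type, the `sanitize` and `chunk_pages` helpers, and the two regex patterns are hypothetical stand-ins for a real parser and redaction engine. The point is the ordering, where every page is sanitized before any chunk reaches the store.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in patterns; a real redaction engine detects far more than this.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


@dataclass
class Chunk:
    text: str
    page: int  # citation back to the source page


def sanitize(text: str) -> str:
    """Replace email addresses and SSN-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return SSN.sub("[SSN]", text)


def chunk_pages(pages: list[str], max_chars: int = 200) -> list[Chunk]:
    """Split each page into fixed-size chunks, sanitizing before anything is kept."""
    chunks = []
    for page_no, page in enumerate(pages, start=1):
        # Sanitize the whole page first so a sensitive value is never split across chunks
        clean = sanitize(page)
        for start in range(0, len(clean), max_chars):
            chunks.append(Chunk(text=clean[start:start + max_chars], page=page_no))
    return chunks


pages = ["Contact jane.doe@example.com about the renewal.", "SSN on file: 123-45-6789."]
store = chunk_pages(pages)  # only sanitized chunks are ever held in the store
```

Sanitizing before chunking is a deliberate choice in this sketch: redacting after the split could miss a value that straddles a chunk boundary.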

Schema-driven extraction

Extract invoice, form, and document fields into structured outputs without creating a separate parser stack upstream.
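
To make "structured outputs" concrete, the sketch below maps already-parsed invoice text onto a typed schema. It is an illustration under stated assumptions, not Expunct's extraction API: the `Invoice` dataclass, the `extract_invoice` helper, and the regex patterns are all hypothetical.

```python
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class Invoice:
    """Hypothetical target schema for extracted fields."""
    number: Optional[str]
    total: Optional[Decimal]


def extract_invoice(text: str) -> Invoice:
    """Pull fields out of already-parsed text; fields that are not found stay None."""
    number = re.search(r"Invoice\s+#?\s*([A-Z0-9-]+)", text)
    total = re.search(r"Total:?\s*\$?([\d,]+\.\d{2})", text)
    return Invoice(
        number=number.group(1) if number else None,
        # Strip thousands separators before converting to an exact decimal
        total=Decimal(total.group(1).replace(",", "")) if total else None,
    )


invoice = extract_invoice("Invoice #INV-1042\nTotal: $1,250.00")
```

Returning `None` for a missing field, rather than raising, lets a downstream system decide whether an incomplete document is acceptable.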

Prompt and log sanitization

Keep sensitive values out of prompts, traces, and observability systems with the same redaction engine behind document workflows.
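
One way to apply this idea to application logs is a `logging.Filter` that rewrites each record before any handler sees it. The sketch below uses only the Python standard library; the `RedactingFilter` class and the single email regex are hypothetical stand-ins for a call into a real redaction engine, not part of the Expunct SDK.

```python
import logging
import re

# Hypothetical stand-in pattern; a real redaction engine would cover many more entity types.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class RedactingFilter(logging.Filter):
    """Rewrite each log record so email addresses never reach a handler."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merge %-style args before redacting
        record.msg = EMAIL.sub("[EMAIL]", message)
        record.args = ()  # args are already merged into msg
        return True  # keep the record, now sanitized


logger = logging.getLogger("prompt-trace")
logger.addFilter(RedactingFilter())
logger.warning("Prompt sent on behalf of %s", "jane.doe@example.com")
```

Because the filter is attached to the logger rather than to a handler, every handler on that logger receives the sanitized message.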

Multi-format redaction

Continue protecting sensitive data across text, PDF, image, audio, and video while document intelligence launches on PDF and DOCX first.

2 product pillars
3 document workflows: parse, extract, safe-parse
5 redaction media surfaces: text, PDF, image, audio, video

Ship one product story across every integration surface

Use the API first, then reach for the Python SDK, Node SDK, CLI, MCP server, or LangChain integration when it helps adoption. They are distribution surfaces for the same platform, not separate products.

Python + Node SDKs
CLI
MCP + LangChain
Image redaction
Video redaction
Audio redaction

Start with redaction, expand into document intelligence

The free plan gets teams started on redaction today. Document Intelligence beta is available on enabled paid plans for PDF and DOCX workflows.