Redaction API + Document Intelligence API
Parse, extract, and redact sensitive documents for AI workflows
Expunct gives teams two connected pillars in one platform: redaction across supported formats, and privacy-first document intelligence for PDF and DOCX. Use safe-parse to sanitize documents before indexing, retrieval, or extraction.
One workflow for privacy-first ingestion
Installation stays simple: the API, SDKs, CLI, MCP server, and LangChain integration all route into the same parse, extract, and redact platform.
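import os
from expunct import Expunct
# Read the API key from the environment rather than hard-coding it
client = Expunct(api_key=os.environ["EXPUNCT_API_KEY"])
# Parse and sanitize before indexing
job = client.documents.safe_parse("customer_contract.pdf", language="en")
# Wait for the document job to finish
completed = client.wait_for_document_job(job.id)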
# Sanitized artifacts are ready for storage or RAG
One company story, two product pillars
Parse, extract, and redact on one platform. Safe-parse is the primary workflow for sensitive AI ingestion, not a separate product.
Redaction API
Detect and redact sensitive data across text, PDF, image, audio, and video workflows without splitting your stack.
Document Intelligence API
Parse and extract PDF and DOCX into structured artifacts, markdown, chunks, and schema-shaped outputs for downstream AI systems.
Sensitive Workflows
Use safe-parse to run parse plus sanitization in one workflow, so your vector store, retrieval layer, or extraction pipeline only sees the cleaned artifacts you intend to keep.
Built for privacy-first document pipelines
Expunct fits between your source documents and downstream AI systems, so you can choose when raw data is processed and when sanitized artifacts are the only thing that leaves the boundary.
Built for launch-worthy use cases
Start with the workflows Expunct targets first: privacy-first document intelligence for AI ingestion, extraction, and sensitive data handling.
Safe RAG ingestion
Parse documents into chunks and citations, then use safe-parse when your retrieval pipeline cannot keep raw PII at rest.
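As a rough illustration of what "chunks and citations" can mean downstream, the sketch below splits already-sanitized text into overlapping chunks and records the source name and character offsets for each one. The `Chunk` type, the `chunk_text` helper, and the default sizes are hypothetical names chosen for this example; they are not Expunct's output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical chunk record: text plus a citation back to the source."""
    text: str
    source: str
    start: int  # character offset where the chunk begins
    end: int    # character offset where the chunk ends (exclusive)


def chunk_text(text: str, source: str, size: int = 200, overlap: int = 20) -> list[Chunk]:
    """Split sanitized text into overlapping chunks with offset citations."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be in [0, size)")
    chunks: list[Chunk] = []
    step = size - overlap
    for start in range(0, len(text), step):
        end = min(start + size, len(text))
        chunks.append(Chunk(text[start:end], source, start, end))
        if end == len(text):
            break  # the final chunk already reaches the end of the text
    return chunks
```

Only the sanitized text would be passed to a helper like this, so the offsets cite positions in the cleaned artifact rather than in the raw document.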
Schema-driven extraction
Extract invoice, form, and document fields into structured outputs without creating a separate parser stack upstream.
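One way to picture a "schema-shaped output" is a typed record that extracted fields must fit before they move downstream. The sketch below is an assumption-laden example, not Expunct's schema format: the `Invoice` fields and the `to_invoice` helper are hypothetical, and the input is assumed to be a plain dictionary of extracted values.

```python
from dataclasses import dataclass, fields
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Invoice:
    """Hypothetical target schema for extracted invoice fields."""
    invoice_number: str
    issued_on: date
    total: Decimal


def to_invoice(raw: dict) -> Invoice:
    """Coerce a dictionary of extracted values into the Invoice schema."""
    missing = [f.name for f in fields(Invoice) if f.name not in raw]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    return Invoice(
        invoice_number=str(raw["invoice_number"]),
        issued_on=date.fromisoformat(raw["issued_on"]),  # expects YYYY-MM-DD
        total=Decimal(str(raw["total"])),  # Decimal avoids float rounding on money
    )
```

Rejecting incomplete records at this boundary keeps partially extracted documents out of whatever system consumes the structured output.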
Prompt and log sanitization
Keep sensitive values out of prompts, traces, and observability systems with the same redaction engine behind document workflows.
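To show where sanitization can sit in a logging path, here is a minimal sketch using Python's standard `logging` module. The two regular expressions are a deliberately small stand-in for a real redaction engine and are not how Expunct detects sensitive data; `redact` and `RedactingFilter` are hypothetical names for this example.

```python
import logging
import re

# Stand-in patterns for illustration only; real detection covers far more.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace each matched value with a bracketed label such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


class RedactingFilter(logging.Filter):
    """Redact the fully formatted message before any handler sees it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = None  # arguments are already merged into msg
        return True
```

Attaching the filter with `logger.addFilter(RedactingFilter())` means values passed as format arguments are masked too, because the message is formatted before it is redacted.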
Multi-format redaction
Continue protecting sensitive data across text, PDF, image, audio, and video while document intelligence launches on PDF and DOCX first.
Ship one product story across every integration surface
Start with the API, then add the Python SDK, Node SDK, CLI, MCP server, or LangChain integration as needed. They are integration surfaces for the same platform, not separate products.
Start with redaction, expand into document intelligence
The free plan gets teams started on redaction today. The Document Intelligence beta is available on enabled paid plans for PDF and DOCX workflows.