Qrynt Documentation
Qrynt takes unstructured or messy text — documents, exports, logs, pastes — and returns typed, structured output you can connect directly to a database, an AI pipeline, or a human review interface. This reference covers every concept, field, and output format.
The three-stage pipeline
Every request flows through three stages: ingest, resolve, and export. Understanding them helps you know which endpoint to call and what to expect at each step.
For file uploads, use POST /ingest/file, which runs all three stages in a single call and returns the full output immediately.
The engine detects structure in your text and classifies each piece into typed segments. You then export those segments in whichever format fits your use case: plain text, flat JSON records, or a Logical Document Model.
Output fields
Every segment in Qrynt's output carries the same set of fields regardless of its type. Here is what each one means and how to use it.
A segment is a coherent, self-contained chunk of your input text — a paragraph, a block of key-value pairs, a table, or a section header. Qrynt splits your input into segments before classifying each one independently. One document becomes an ordered list of segments.
segment_id
A deterministic 16-character hex identifier for this segment, derived from its content. The same text always produces the same segment_id, which makes it useful for deduplication, caching, and stable foreign keys in a database.
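Because identical text yields an identical segment_id, deduplication can be done on the ID alone. The following is a minimal sketch, assuming segments are plain dictionaries as in the flat JSON output; the function name dedupe_segments is illustrative and not part of Qrynt.

```python
def dedupe_segments(segments):
    """Keep the first occurrence of each segment_id, preserving input order."""
    seen = set()
    unique = []
    for seg in segments:
        sid = seg["segment_id"]
        if sid in seen:
            continue  # same content was already emitted earlier
        seen.add(sid)
        unique.append(seg)
    return unique
```

The same idea works as a cache key: look up segment_id before re-processing a segment you have already handled.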
confidence
How certain the engine is about the structural classification of this segment, on a scale from 0.0 to 1.0. A value of 1.0 means the structure was unambiguous. Values below 0.6 indicate ambiguity, often in mixed-content segments. Use this to decide whether to trust the classification automatically or route the segment to a human reviewer.
flags
Anomalies or quality warnings detected on this segment during processing. An empty array [] means no issues were found. Populated flags might indicate encoding problems, truncated content, or unusual character distributions. Always check flags before storing segment content in a production database.
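The two checks described above (confidence and flags) can be combined into a single triage step before storage. This is a sketch under stated assumptions: the 0.6 threshold comes from this reference, while the function name triage and the "accept" and "review" labels are hypothetical.

```python
REVIEW_THRESHOLD = 0.6  # below this, the reference suggests considering human review


def triage(segment):
    """Return "review" for flagged or low-confidence segments, else "accept"."""
    if segment.get("flags"):
        return "review"  # any quality warning is inspected before storage
    if segment.get("confidence", 0.0) < REVIEW_THRESHOLD:
        return "review"  # ambiguous structural classification
    return "accept"
```

A missing confidence value is treated as 0.0 here, so incomplete records are sent to review rather than accepted silently.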
section_label
The inferred heading or section this segment belongs to, if one was detected in the document structure. null means no parent section was found, which is common in flat documents, logs, and chat exports. When present, use this to group related segments or build a document outline.
source_span
Character offsets pointing back to where this segment appeared in the original input, in the form {"start": N, "end": M}. Use this to highlight the original text, build diff views, or trace output back to source. The value is null when the segment was synthesized or the source position is unavailable.
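Tracing a segment back to its source is a slice of the original input. The sketch below assumes that end is exclusive and that offsets count characters the way a Python str does; neither detail is stated in this reference, so verify both against real output. The function name extract_span is illustrative.

```python
def extract_span(original_text, segment):
    """Return the slice of the original input covered by a segment, or None."""
    span = segment.get("source_span")
    if span is None:
        return None  # synthesized segment, or position unavailable
    # Assumption: "end" is exclusive, as in Python slicing.
    return original_text[span["start"]:span["end"]]
```

If the returned slice does not match content.text, keep in mind that content.text is described as cleaned and normalized, so small differences are expected.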
classification_stats
Internal metrics from the classification engine. Useful for debugging unexpected classifications and understanding how confident the engine was in its structural decisions.
dominant_coverage_lines
The number of lines in this segment that matched the dominant structural pattern. For a 5-line KV block, this would be 5. A low number relative to the total segment length suggests mixed content or a weak structural signal.
block_count
The number of structural blocks detected within this segment. A simple paragraph has 1 block. A segment with 3 separate key-value groups has 3 blocks. Higher counts indicate denser structural content.
dominant_source
Which part of the engine made the final classification decision. This tells you how deeply the engine analysed the segment.
pre_classifier_bypass: the structure was clear enough that the engine bypassed deep analysis entirely. Fast, high-confidence classifications always carry this source.
structure_engine: the full table detector ran on this segment.
block_engine: KV, hierarchy, and context detectors ran.
fallback_engine: no strong structure was found; context-only detection was used.
none: classification failed; segment treated as prose.
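When debugging an unexpected classification, it can help to express dominant_coverage_lines as a fraction of the segment's length. The sketch below assumes the total is the number of non-blank lines in content.text; because that text is normalized, the engine's own line count may differ, so treat the ratio as a rough signal. The function name coverage_ratio is illustrative.

```python
def coverage_ratio(segment):
    """Fraction of non-blank lines that matched the dominant structural pattern."""
    stats = segment["classification_stats"]
    lines = [ln for ln in segment["content"]["text"].splitlines() if ln.strip()]
    if not lines:
        return 0.0  # nothing to measure against
    return stats["dominant_coverage_lines"] / len(lines)
```

A ratio well below 1.0 alongside a block_count above 1 is a reasonable cue to inspect the segment by hand.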
What type means
Every segment is classified into one of five structural types. The type tells you what shape the content is in — which determines how you should process, store, or display it.
prose: Continuous written text such as sentences, paragraphs, explanations, chat messages, and narrative content. No detectable structural pattern. Store as text; pass directly to an LLM without pre-processing.
kv_block: Structured data in key: value or key = value format. Config files, form submissions, metadata blocks, frontmatter. Map directly to database columns or a dictionary/object.
table: Rows and columns, whether pipe-delimited, space-aligned, or CSV-formatted. Each row is a record; each column is a field. Map to a database table, DataFrame, or spreadsheet directly.
tree: Indented or branching structure such as file trees, org charts, nested lists, and outline formats. Parent-child relationships are explicit through indentation or tree-drawing characters. Map to a nested JSON object or a recursive DB schema.
mixed: A segment where more than one structural pattern was detected, e.g. a paragraph followed immediately by a KV block. Check confidence and block_count to understand what's inside; consider splitting or routing to a human reviewer.
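As an illustration of mapping a key-value segment to a dictionary, here is a minimal sketch. It assumes the segment text holds one pair per line in key: value or key = value form and splits on the first separator; the helper parse_kv_block is hypothetical and is not how Qrynt itself parses content.

```python
import re

# One pair per line: key, then ":" or "=", then the value (split on the first separator).
KV_LINE = re.compile(r"^\s*([^:=]+?)\s*[:=]\s*(.*?)\s*$")


def parse_kv_block(text):
    """Turn 'key: value' / 'key = value' lines into a dict; skip other lines."""
    result = {}
    for line in text.splitlines():
        match = KV_LINE.match(line)
        if match:
            result[match.group(1)] = match.group(2)
    return result
```

Values are kept as strings. Converting them to numbers or booleans is left to the caller, since the right types depend on the target schema.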
Three output formats
Qrynt's engine produces one internal representation. Adapters transform that into the format that fits your use case. Choose one — or request all three for the same document.
Plain text output
Returns a formatted string. Segments are separated by blank lines. Structural markers (headings, section breaks) are preserved as readable text.
POST /export { "session_id": "...", "format": "human" }
POST /ingest/file/human
Flat JSON records
POST /export { "session_id": "...", "format": "machine" }
POST /ingest/file/machine
[
{
"segment_id": "fc6cfd283de599e3",
"type": "prose",
"content": {
"text": "In this matrix or 2 dimensional array the first row is data of one object where each value is its relation to other objects or features."
},
"confidence": 1,
"flags": [],
"section_label": null,
"source_span": null,
"classification_stats": {
"dominant_coverage_lines": 1,
"block_count": 1,
"dominant_source": "pre_classifier_bypass"
}
},
// ... more segments
]
Field reference
| Field | Type | Description |
|---|---|---|
| segment_id | string | Deterministic hex ID. Stable across identical inputs. Use as a primary key. |
| type | string | Structural classification: prose · kv_block · table · tree · mixed |
| content.text | string | The cleaned, normalized text content of this segment. |
| confidence | number | Classification certainty 0.0–1.0. Below 0.6 → consider human review. |
| flags | array | Quality warnings. Empty array means no issues. |
| section_label | string|null | Inferred parent section heading, or null if none detected. |
| source_span | object|null | Character offsets {start, end} in the original input. |
| classification_stats | object | Engine internals: coverage lines, block count, classification source. |
For database storage, use segment_id as your primary key and type as a column for filtering. Store content.text in a text/varchar field. Index on section_label if you need section-based queries.
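The storage advice above can be sketched with SQLite from the Python standard library. The table name, column layout, and the choice to store flags as a JSON string are assumptions made for illustration; adapt them to your own database.

```python
import json
import sqlite3


def store_segments(conn, segments):
    """Insert flat JSON records, using segment_id as the primary key."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS segments (
               segment_id    TEXT PRIMARY KEY,
               type          TEXT NOT NULL,
               text          TEXT NOT NULL,
               confidence    REAL,
               section_label TEXT,
               flags         TEXT
           )"""
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_segments_section ON segments(section_label)"
    )
    rows = [
        (
            s["segment_id"],
            s["type"],
            s["content"]["text"],
            s["confidence"],
            s["section_label"],
            json.dumps(s["flags"]),  # flags kept as a JSON array string
        )
        for s in segments
    ]
    # OR IGNORE: identical content has the same segment_id, so re-imports are no-ops.
    conn.executemany("INSERT OR IGNORE INTO segments VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
```

Call it with sqlite3.connect("segments.db") and the parsed list from a machine-format export.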
Logical Document Model
The LDM uses blocks instead of segments, and adds tags — semantic labels that describe both the structural type and the content pattern. Blocks are ordered as they appeared in the original document.
POST /export { "session_id": "...", "format": "ai" }
POST /ingest/file/ai
{
"blocks": [
{
"type": "prose",
"label": null,
"content": {
"text": "In this matrix or 2 dimensional array the first row is data of one object where each value is its relation to other objects or features."
},
"tags": [
"PROSE",
"UNSTRUCTURED"
],
"confidence": 1
},
{
"type": "prose",
"label": null,
"content": {
"text": "Each row corresponds to one object (or entity, or data sample)."
},
"tags": [
"PROSE",
"UNSTRUCTURED"
],
"confidence": 1
}
// ... more blocks
]
}
LDM field reference
| Field | Type | Description |
|---|---|---|
| type | string | Structural type: prose · kv_block · table · tree · mixed |
| label | string|null | Semantic label inferred from content (e.g. "Introduction", "Config"). null when none detected. |
| content.text | string | The block's text content, ready to embed or inject into a prompt. |
| tags | array | Semantic labels: PROSE · UNSTRUCTURED · STRUCTURED · KV · TABLE · TREE · MIXED. Multiple tags can apply. |
| confidence | number | Classification certainty 0.0–1.0. Use to filter blocks before passing to an LLM. |
Filter on confidence >= 0.8 before injecting into a prompt. Use tags to route blocks to different prompt templates: prose blocks go into a summarisation prompt, kv_block blocks go into an extraction prompt, and table blocks get serialised to markdown before injection.
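The filtering and routing described above might look like the following sketch. The tag names and the 0.8 threshold come from this reference; the route names ("summarise", "extract", "table", "other") and the function route_blocks are hypothetical.

```python
def route_blocks(blocks, min_confidence=0.8):
    """Drop low-confidence blocks, then group the rest by intended prompt template."""
    routes = {"summarise": [], "extract": [], "table": [], "other": []}
    for block in blocks:
        if block.get("confidence", 0.0) < min_confidence:
            continue  # not trusted enough to inject into a prompt
        tags = set(block.get("tags", []))
        if "TABLE" in tags:
            routes["table"].append(block)      # serialise to markdown downstream
        elif "KV" in tags:
            routes["extract"].append(block)    # extraction prompt
        elif "PROSE" in tags:
            routes["summarise"].append(block)  # summarisation prompt
        else:
            routes["other"].append(block)      # e.g. TREE or MIXED
    return routes
```

Blocks tagged TREE or MIXED land in "other" here because this reference does not prescribe a template for them.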
MachineAdapter vs AIAdapter vs HumanAdapter: when to use which
| Scenario | Use |
|---|---|
| Store segments in PostgreSQL / MongoDB | MachineAdapter |
| Feed content into an LLM prompt | AIAdapter |
| Build a RAG retrieval index | AIAdapter |
| Export to CSV / data pipeline | MachineAdapter |
| Display cleaned document to a user | HumanAdapter |
| Detect and route by structure type | Either — type field is the same in both |
Endpoints
Authenticated tiers must send an X-API-Key header with every request. Free tier requests (no key) are limited to 10 per day per IP. Get your key in the dashboard.
/ingest
Register a text payload for processing. Returns a session_id; pass this to /resolve next.
{
"payload": "your text here"
}
{
"session_id": "fc6cfd283de599e3",
"status": "created",
"char_count": 142,
"line_count": 4
}
/resolve
Run the engine on a registered session. Returns the full envelope with all segments classified. Processing time scales with document size.
{
"session_id": "fc6cfd283de599e3"
}
/export
Export a resolved session in your chosen format.
Call after /resolve completes.
{
"session_id": "fc6cfd283de599e3",
"format": "machine" // "human" | "machine" | "ai"
}
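The three calls above can be chained from Python using only the standard library. This is a sketch under several assumptions: the host is copied from the curl example in this reference, /ingest and /resolve are assumed to accept POST with a JSON body like /export, and every response is assumed to be JSON. The helper names build_request, call, and process_text are illustrative.

```python
import json
import urllib.request

BASE_URL = "https://moonlit-grail-386316.web.app"  # host from the curl example


def build_request(path, body, api_key=None):
    """Build a JSON POST request; the API key is omitted for free-tier calls."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["X-API-Key"] = api_key
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def call(path, body, api_key=None):
    """Send the request and decode the JSON response (assumed format)."""
    with urllib.request.urlopen(build_request(path, body, api_key)) as resp:
        return json.load(resp)


def process_text(text, fmt="machine", api_key=None):
    """Run ingest, resolve, and export in order for a text payload."""
    session_id = call("/ingest", {"payload": text}, api_key)["session_id"]
    call("/resolve", {"session_id": session_id}, api_key)
    return call("/export", {"session_id": session_id, "format": fmt}, api_key)
```

Error handling is left out for brevity. In practice, wrap the calls to catch urllib.error.HTTPError, for example when the daily limit is exhausted.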
/ingest/file
Upload a file directly. Runs all three pipeline stages in a single request and returns the full COC envelope immediately.
Supported formats: .pdf .docx .xlsx .html .csv .txt (max 50 MB).
curl -X POST https://moonlit-grail-386316.web.app/ingest/file \
  -H "X-API-Key: qrynt_live_..." \
  -F "file=@document.pdf"
Each document deducts ceil(characters / 100,000) units from your daily limit; for example, a 300k character document costs 3 units. Check your remaining units in the dashboard.
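To estimate cost before submitting, the formula above translates directly to code. A small sketch; the function name units_for is illustrative.

```python
import math

UNIT_SIZE = 100_000  # characters per unit, per the formula above


def units_for(char_count):
    """Units deducted for a document of the given character count."""
    return math.ceil(char_count / UNIT_SIZE)
```

For example, units_for(300_000) gives 3, matching the 300k character case, while one extra character tips the document into a fourth unit.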