// Documentation

Qrynt Documentation

Qrynt takes unstructured or messy text — documents, exports, logs, pastes — and returns typed, structured output you can connect directly to a database, an AI pipeline, or a human review interface. This reference covers every concept, field, and output format.

New here? Start with the console: paste any text, process it, and see the output before writing a single line of code.

// How it works

The three-stage pipeline

Every request flows through three stages. Understanding this helps you know which endpoint to call and what to expect at each step.

POST /ingest     Register text
POST /resolve    Run engine
POST /export     Get output

For file uploads, use POST /ingest/file — it runs all three stages in a single call and returns the full output immediately.

The engine detects structure in your text and classifies each piece into typed segments. You then export those segments in whichever format fits your use case: plain text, flat JSON records, or a Logical Document Model.
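The three calls can be chained in a few lines. What follows is a minimal Python sketch, not an official client. The base URL is borrowed from the curl example in the API reference, the helper names `post_json` and `run_pipeline` are made up for illustration, and the code assumes every endpoint replies with a JSON body (which may not hold for the plain-text format).

```python
import json
from typing import Any, Callable, Optional
from urllib import request

# Assumption: host copied from the curl example in this reference.
BASE_URL = "https://moonlit-grail-386316.web.app"


def post_json(path: str, body: dict, api_key: Optional[str] = None) -> Any:
    """POST a JSON body and decode the reply as JSON (assumed response format)."""
    req = request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    if api_key:  # free-tier requests are sent without a key
        req.add_header("X-API-Key", api_key)
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_pipeline(text: str, fmt: str = "machine",
                 post: Callable[[str, dict], Any] = post_json) -> Any:
    """Chain the three stages: ingest -> resolve -> export."""
    session_id = post("/ingest", {"payload": text})["session_id"]
    post("/resolve", {"session_id": session_id})
    return post("/export", {"session_id": session_id, "format": fmt})
```

The `post` parameter is injectable so the flow can be exercised without network access; to send a key, pass `functools.partial(post_json, api_key=...)` instead of the default.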


// Concepts

Output fields

Every segment in Qrynt's output carries the same set of fields regardless of its type. Here is what each one means and how to use it.

segment object

A segment is a coherent, self-contained chunk of your input text — a paragraph, a block of key-value pairs, a table, or a section header. Qrynt splits your input into segments before classifying each one independently. One document becomes an ordered list of segments.

// A document with 200 lines might produce 12 segments
segment_id string

A deterministic 16-character hex identifier for this segment, derived from its content. The same text always produces the same segment_id — useful for deduplication, caching, and stable foreign keys in a database.

"segment_id": "fc6cfd283de599e3"
confidence number 0.0 – 1.0

How certain the engine is about the structural classification of this segment. 1.0 means the engine is certain — the structure was unambiguous. Lower values (below 0.6) indicate ambiguity, often in mixed-content segments. Use this to decide whether to trust the classification automatically or route the segment to a human reviewer.

"confidence": 1 // certain → safe to process automatically
flags array

Anomalies or quality warnings detected on this segment during processing. An empty array [] means no issues were found. Populated flags might indicate encoding problems, truncated content, or unusual character distributions. Always check flags before storing segment content in a production database.

"flags": [] // no issues detected
section_label string | null

The inferred heading or section this segment belongs to, if one was detected in the document structure. null means no parent section was found — common in flat documents, logs, and chat exports. When present, use this to group related segments or build a document outline.

"section_label": null // flat document, no section detected
source_span object | null

Character offsets pointing back to where this segment appeared in the original input — {"start": N, "end": M}. Use this to highlight the original text, build diff views, or trace output back to source. null when the segment was synthesized or the source position is unavailable.

"source_span": null

classification_stats

Internal metrics from the classification engine. Useful for debugging unexpected classifications and understanding how confident the engine was in its structural decisions.

dominant_coverage_lines integer

The number of lines in this segment that matched the dominant structural pattern. For a 5-line KV block, this would be 5. A low number relative to the total segment length suggests mixed content or a weak structural signal.

"dominant_coverage_lines": 1
block_count integer

The number of structural blocks detected within this segment. A simple paragraph has 1 block. A segment with 3 separate key-value groups has 3 blocks. Higher counts indicate denser structural content.

"block_count": 1
dominant_source string

Which part of the engine made the final classification decision. Understanding this tells you how deeply the engine analysed this segment.

pre_classifier_bypass — the structure was clear enough that the engine bypassed deep analysis entirely. Fast, high-confidence classifications always carry this source.

structure_engine — the full table detector ran on this segment.

block_engine — KV, hierarchy, and context detectors ran.

fallback_engine — no strong structure was found; context-only detection was used.

none — classification failed; segment treated as prose.

"dominant_source": "pre_classifier_bypass"

// Structure types

What type means

Every segment is classified into one of five structural types. The type tells you what shape the content is in — which determines how you should process, store, or display it.

prose
Natural language

Continuous written text — sentences, paragraphs, explanations, chat messages, narrative content. No detectable structural pattern. Store as text; pass directly to an LLM without pre-processing.

→ "Each row corresponds to one object."
kv_block
Key-value pairs

Structured data in key: value or key = value format. Config files, form submissions, metadata blocks, frontmatter. Map directly to database columns or a dictionary/object.

→ name: Alice / age: 30 / role: admin
table
Tabular / columnar data

Rows and columns — pipe-delimited, space-aligned, or CSV-formatted. Each row is a record; each column is a field. Map to a database table, DataFrame, or spreadsheet directly.

→ | Name | Score | Grade |
tree
Hierarchical / nested

Indented or branching structure — file trees, org charts, nested lists, outline formats. Parent-child relationships are explicit through indentation or tree-drawing characters. Map to a nested JSON object or a recursive DB schema.

→ root / ├── src / │ └── main.py
mixed
Multiple structure types

A segment where more than one structural pattern was detected — e.g. a paragraph followed immediately by a KV block. Check confidence and block_count to understand what's inside; consider splitting or routing to a human reviewer.

→ confidence often < 0.8 for mixed segments
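Because the type determines the shape of the content, downstream code usually branches on it. The sketch below is one possible way to do that in Python. It assumes records shaped like the MachineAdapter output, that content.text holds the raw lines of the segment, and that tables are pipe-delimited. The function names are hypothetical, and tree and mixed segments are simply passed through as text.

```python
import re

# Matches "key: value" and "key = value"; the key may not contain ':' or '='.
KV_LINE = re.compile(r"^\s*([^:=]+?)\s*[:=]\s*(.*)$")


def parse_kv_block(text):
    """Turn 'key: value' / 'key = value' lines into a dict of strings."""
    pairs = {}
    for line in text.splitlines():
        match = KV_LINE.match(line)
        if match:
            pairs[match.group(1)] = match.group(2).strip()
    return pairs


def parse_pipe_row(line):
    """Split one pipe-delimited table row into its cell values."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def handle_segment(segment):
    """Dispatch on the segment type; unhandled types fall back to raw text."""
    text = segment["content"]["text"]
    if segment["type"] == "kv_block":
        return parse_kv_block(text)
    if segment["type"] == "table":
        return [parse_pipe_row(row) for row in text.splitlines() if row.strip()]
    return text
```

Space-aligned or CSV tables would need their own parser; this sketch only covers the pipe-delimited case.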

// Adapters

Three output formats

Qrynt's engine produces one internal representation. Adapters transform that into the format that fits your use case. Choose one — or request all three for the same document.

HumanAdapter

Plain text output

Use when: you want a clean, readable version of the document — for display, review, or downloading. No JSON, no schema. Just structured plain text.

Returns a formatted string. Segments are separated by blank lines. Structural markers (headings, section breaks) are preserved as readable text.

endpoint
POST /export  { "session_id": "...", "format": "human" }
POST /ingest/file/human
MachineAdapter

Flat JSON records

Use when: you're storing output in a database, passing it through a pipeline, or processing it programmatically. Each segment becomes a flat, self-contained JSON record.
endpoint
POST /export  { "session_id": "...", "format": "machine" }
POST /ingest/file/machine
example output — MachineAdapter
[
  {
    "segment_id": "fc6cfd283de599e3",
    "type": "prose",
    "content": {
      "text": "In this matrix or 2 dimensional array the first row is data of one object where each value is its relation to other objects or features."
    },
    "confidence": 1,
    "flags": [],
    "section_label": null,
    "source_span": null,
    "classification_stats": {
      "dominant_coverage_lines": 1,
      "block_count": 1,
      "dominant_source": "pre_classifier_bypass"
    }
  },
  // ... more segments
]
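
Given records shaped like the example above, the per-field guidance from the Concepts section can be sketched as a handful of small Python helpers. The helper names are illustrative only. The 0.6 cut-off comes from the confidence description, source_span offsets are assumed to be 0-based with an exclusive end, and "segment length" is approximated here as the number of non-empty lines in content.text. None of these assumptions is confirmed by this reference.

```python
REVIEW_THRESHOLD = 0.6  # cut-off mentioned for confidence; tune for your data


def needs_review(record, threshold=REVIEW_THRESHOLD):
    """True when a record carries flags or its confidence is below the cut-off."""
    return bool(record["flags"]) or record["confidence"] < threshold


def source_excerpt(original, record):
    """Slice the original input with source_span; None when the span is null.

    Assumes 0-based offsets with an exclusive end.
    """
    span = record["source_span"]
    if span is None:
        return None
    return original[span["start"]:span["end"]]


def group_by_section(records):
    """Map each section_label (None for unlabelled) to its records, in order."""
    outline = {}
    for record in records:
        outline.setdefault(record["section_label"], []).append(record)
    return outline


def coverage_ratio(record):
    """dominant_coverage_lines divided by the non-empty lines in content.text."""
    lines = [ln for ln in record["content"]["text"].splitlines() if ln.strip()]
    if not lines:
        return 0.0
    return record["classification_stats"]["dominant_coverage_lines"] / len(lines)
```

A low `coverage_ratio` combined with `needs_review` returning True is a reasonable signal to send a segment to a human instead of storing it automatically.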

Field reference

Field                 Type          Description
segment_id            string        Deterministic hex ID. Stable across identical inputs. Use as a primary key.
type                  string        Structural classification: prose · kv_block · table · tree · mixed
content.text          string        The cleaned, normalized text content of this segment.
confidence            number        Classification certainty 0.0–1.0. Below 0.6 → consider human review.
flags                 array         Quality warnings. Empty array means no issues.
section_label         string|null   Inferred parent section heading, or null if none detected.
source_span           object|null   Character offsets {start, end} in the original input.
classification_stats  object        Engine internals: coverage lines, block count, classification source.
Database tip: Use segment_id as your primary key and type as a column for filtering. Store content.text in a text/varchar field. Index on section_label if you need section-based queries.
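
The tip above maps naturally onto a small relational table. The following Python sketch uses the standard-library sqlite3 module purely as an illustration. The table name, column names, and the `store_segments` helper are assumptions, not part of Qrynt. Because identical text yields the same segment_id, the insert uses INSERT OR IGNORE so that repeated segments are skipped instead of raising a primary-key error.

```python
import sqlite3

# Hypothetical schema: segment_id as primary key, type for filtering,
# and an index on section_label for section-based queries.
SCHEMA = """
CREATE TABLE IF NOT EXISTS segments (
    segment_id    TEXT PRIMARY KEY,
    type          TEXT NOT NULL,
    text          TEXT NOT NULL,
    confidence    REAL,
    section_label TEXT
);
CREATE INDEX IF NOT EXISTS idx_segments_section ON segments (section_label);
"""


def store_segments(conn, records):
    """Insert MachineAdapter records; rows with a known segment_id are skipped."""
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT OR IGNORE INTO segments VALUES (?, ?, ?, ?, ?)",
        [
            (r["segment_id"], r["type"], r["content"]["text"],
             r["confidence"], r["section_label"])
            for r in records
        ],
    )
    conn.commit()
```

For a quick trial, `store_segments(sqlite3.connect(":memory:"), records)` works without creating a file; the same schema idea carries over to PostgreSQL with its own upsert syntax.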
AIAdapter / LDM

Logical Document Model

Use when: you're passing document content to an LLM, building a RAG pipeline, or need a structured representation that preserves semantic relationships between blocks. The LDM is designed for machine consumption, not human reading.

The LDM uses blocks instead of segments, and adds tags — semantic labels that describe both the structural type and the content pattern. Blocks are ordered as they appeared in the original document.

endpoint
POST /export  { "session_id": "...", "format": "ai" }
POST /ingest/file/ai
example output — AIAdapter / LDM
{
  "blocks": [
    {
      "type": "prose",
      "label": null,
      "content": {
        "text": "In this matrix or 2 dimensional array the first row is data of one object where each value is its relation to other objects or features."
      },
      "tags": [
        "PROSE",
        "UNSTRUCTURED"
      ],
      "confidence": 1
    },
    {
      "type": "prose",
      "label": null,
      "content": {
        "text": "Each row corresponds to one object (or entity, or data sample)."
      },
      "tags": [
        "PROSE",
        "UNSTRUCTURED"
      ],
      "confidence": 1
    }
    // ... more blocks
  ]
}

LDM field reference

Field         Type          Description
type          string        Structural type: prose · kv_block · table · tree · mixed
label         string|null   Semantic label inferred from content (e.g. "Introduction", "Config"). null when none detected.
content.text  string        The block's text content, ready to embed or inject into a prompt.
tags          array         Semantic labels: PROSE · UNSTRUCTURED · STRUCTURED · KV · TABLE · TREE · MIXED. Multiple tags can apply.
confidence    number        Classification certainty 0.0–1.0. Use to filter blocks before passing to an LLM.
LLM tip: Filter blocks by confidence >= 0.8 before injecting into a prompt. Use tags to route blocks to different prompt templates — prose blocks go into a summarisation prompt, kv_block blocks go into an extraction prompt, table blocks get serialised to markdown before injection.
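
The filtering and routing described in the tip could look roughly like this in Python. It is a sketch under assumptions: it routes on the type field instead of tags (the exact tag-to-type mapping is not spelled out here), and `route_blocks` is a hypothetical helper name.

```python
from collections import defaultdict

MIN_CONFIDENCE = 0.8  # threshold suggested in the tip above


def route_blocks(ldm, min_confidence=MIN_CONFIDENCE):
    """Drop low-confidence blocks and bucket the remaining text by block type."""
    buckets = defaultdict(list)
    for block in ldm["blocks"]:
        if block["confidence"] >= min_confidence:
            buckets[block["type"]].append(block["content"]["text"])
    return dict(buckets)
```

Each bucket can then feed its own prompt template, for example `buckets.get("prose", [])` for summarisation and `buckets.get("kv_block", [])` for extraction.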

MachineAdapter vs AIAdapter — when to use which

Scenario                                  Use
Store segments in PostgreSQL / MongoDB    MachineAdapter
Feed content into an LLM prompt           AIAdapter
Build a RAG retrieval index               AIAdapter
Export to CSV / data pipeline             MachineAdapter
Display cleaned document to a user        HumanAdapter
Detect and route by structure type        Either (the type field is the same in both)

// API reference

Endpoints

Authenticated tiers must send an X-API-Key header with every request. Free-tier requests (no key) are limited to 10 per day per IP. Get your key from the dashboard.

POST /ingest

Register a text payload for processing. Returns a session_id — pass this to /resolve next.

request
{
  "payload": "your text here"
}
response
{
  "session_id": "fc6cfd283de599e3",
  "status": "created",
  "char_count": 142,
  "line_count": 4
}

POST /resolve

Run the engine on a registered session. Returns the full envelope with all segments classified. Processing time scales with document size.

request
{
  "session_id": "fc6cfd283de599e3"
}

POST /export

Export a resolved session in your chosen format. Call after /resolve completes.

request
{
  "session_id": "fc6cfd283de599e3",
  "format": "machine"  // "human" | "machine" | "ai"
}

POST /ingest/file

Upload a file directly. This runs all three pipeline stages in a single request and returns the full COC envelope immediately. Supported formats: .pdf .docx .xlsx .html .csv .txt (max 50MB).

request — multipart/form-data
curl -X POST https://moonlit-grail-386316.web.app/ingest/file \
  -H "X-API-Key: qrynt_live_..." \
  -F "file=@document.pdf"
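
For callers who prefer Python to curl, the same multipart request can be assembled with the standard library alone. This is a sketch, not an official client. The URL and header name are copied from the curl command above, `build_upload_request` is a hypothetical helper, the part is labelled application/octet-stream regardless of file type, and the filename is assumed to contain no quotes or newlines.

```python
import uuid
from urllib import request

# Assumption: endpoint copied from the curl example above.
UPLOAD_URL = "https://moonlit-grail-386316.web.app/ingest/file"


def build_upload_request(filename: str, data: bytes, api_key: str) -> request.Request:
    """Build (but do not send) a multipart/form-data POST with one 'file' part."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return request.Request(
        UPLOAD_URL,
        data=head + data + tail,
        headers={
            "X-API-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

Sending is then a matter of `request.urlopen(build_upload_request(name, payload, key))` and reading the response body.
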
Request units: File uploads consume ceil(characters / 100,000) units from your daily limit — a 300k character document costs 3 units. Check your remaining units in the dashboard.
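
The unit formula is easy to check before uploading. A small Python sketch, assuming the character count is known in advance (the function name is illustrative):

```python
UNIT_SIZE = 100_000  # characters per request unit, per the formula above


def request_units(char_count: int) -> int:
    """ceil(char_count / UNIT_SIZE), computed with integer arithmetic only."""
    return -(-char_count // UNIT_SIZE)
```

With this, `request_units(300_000)` gives 3, matching the example above, while one extra character (`request_units(300_001)`) rounds up to 4.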