Product
Vertical Training Compiler
Make training cheaper by compiling vertical-specific data into stable spines (schemas, policies, formats) and minimal leaf deltas. The compiler runs in your environment. The SaaS control plane stores configs, versions, and metrics; no raw data is required.
Default: local compilation, metrics-only uploads, hash-based provenance.
What it does
A compiler, not a data marketplace. Use your data, keep it private.
Canonicalize
Normalize templates, tool calls, and volatile fields to reduce variance and maximize reuse and packing efficiency.
Extract spines
Pull stable vertical context into versioned spines (policies, schemas, output formats) with deterministic hashing.
Emit leaves
Write compact, minimal deltas as standard datasets (JSONL/Parquet) and produce metrics that quantify savings.
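The three steps above can be sketched in a few lines of Python. This is an illustrative toy, not the product's API: the regexes, `canonicalize`, `spine_hash`, and `compile_example` names are all assumptions, and the delta extraction here is a naive substring removal standing in for real delta encoding.

```python
import hashlib
import json
import re

def canonicalize(text):
    """Normalize volatile fields (dates, IDs) so repeated templates hash identically."""
    text = re.sub(r"\d{4}-\d{2}-\d{2}", "<DATE>", text)  # ISO dates
    text = re.sub(r"\b[0-9a-f]{8,}\b", "<ID>", text)     # hex-like identifiers
    return text.strip()

def spine_hash(spine_text):
    """Deterministic content hash used as a versioned spine identifier."""
    return hashlib.sha256(spine_text.encode("utf-8")).hexdigest()[:16]

def compile_example(spine_text, raw_example):
    """Split an example into a shared spine reference plus a minimal leaf delta."""
    canonical = canonicalize(raw_example)
    spine = canonicalize(spine_text)
    # Leaf delta = whatever the shared spine does not cover (naive removal here).
    delta = canonical.replace(spine, "", 1).strip()
    return {"spine_id": spine_hash(spine), "leaf_delta": delta}

spine = "Policy: refunds allowed within 30 days."
record = compile_example(
    spine, "Policy: refunds allowed within 30 days. Customer 9f3a2b1c asks about order."
)
print(json.dumps(record))
```

Because the hash is computed over canonicalized text, two examples that differ only in volatile fields map to the same spine ID, which is what makes reuse measurable.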
Outputs
Deliverables you can feed into your existing training stack.
Compiled dataset
- spine_id + leaf_delta per example
- deterministic hashes for reproducibility
- compatible with HF Trainer / custom PyTorch
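A compiled JSONL dataset with these fields might look like the sketch below. The field names follow the bullets above; the records and file handling are illustrative assumptions, not the product's exact schema.

```python
import json
import os
import tempfile

# Hypothetical compiled records: each pairs a spine reference with a minimal delta.
records = [
    {"spine_id": "a1b2c3d4e5f60718", "leaf_delta": "Customer asks about order status."},
    {"spine_id": "a1b2c3d4e5f60718", "leaf_delta": "Customer requests a refund."},
]

path = os.path.join(tempfile.mkdtemp(), "compiled.jsonl")
with open(path, "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Any JSONL-aware loader can consume this file, e.g.
# datasets.load_dataset("json", data_files=path); here we read it back directly.
with open(path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
print(len(loaded), loaded[0]["spine_id"])
```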
Metrics + provenance
- token reduction and template reuse rate
- variance reduction indicators
- hash-only provenance (no raw text required)
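Token reduction can be computed from token counts alone, so nothing beyond aggregate numbers needs to leave your environment. A minimal sketch, assuming a whitespace tokenizer as a stand-in for a real one and a single shared spine:

```python
def token_count(text):
    # Whitespace split as a stand-in for a real tokenizer (illustrative only).
    return len(text.split())

spine = "Policy: refunds allowed within 30 days."
raw_examples = [
    spine + " Customer asks about order status.",
    spine + " Customer requests a refund.",
    spine + " Customer wants to change address.",
]
deltas = [ex.replace(spine + " ", "", 1) for ex in raw_examples]

raw_tokens = sum(token_count(ex) for ex in raw_examples)
# The spine is stored once; only the deltas repeat per example.
compiled_tokens = token_count(spine) + sum(token_count(d) for d in deltas)
token_reduction = 1 - compiled_tokens / raw_tokens
reuse_rate = len(raw_examples)  # examples sharing this one spine

print(f"token reduction: {token_reduction:.0%}, spine reused {reuse_rate}x")
```

The more examples share a spine, the closer the reduction gets to the spine's share of each raw example, which is why repetitive vertical data compresses so well.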
Why it matters
Vertical data is naturally repetitive. The compiler makes that structure explicit.
Fewer tokens
Reduce repeated boilerplate and stabilize templates so you train on less text for the same behavior.
Fewer steps
Lower variance and clearer structure reduce gradient noise, often improving convergence.
Reusable assets
Versioned spines become long-lived organizational assets for both training and inference.
Compile a vertical in days, not months
Start with one workflow family (support, ops, legal, finance). Measure token reduction and reuse structure immediately.