
Vertical Training Compiler

Make training cheaper by compiling vertical-specific data into stable spines (schemas, policies, formats) and minimal leaf deltas. The compiler runs in your environment. The SaaS control plane stores configs, versions, and metrics—no raw data required.

Default: local compilation, metrics-only uploads, hash-based provenance.

What it does

A compiler, not a data marketplace. Use your data, keep it private.

Canonicalize

Normalize templates, tool calls, and volatile fields to reduce variance and maximize reuse and packing efficiency.
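In practice, canonicalization can start as pattern-based masking. A minimal sketch in Python, assuming hypothetical placeholder names and volatile-field patterns (timestamps, UUIDs, ticket IDs); a real canonicalizer would make these configurable per vertical:

```python
import re

# Illustrative volatile-field patterns (assumptions, not the product's actual rules).
VOLATILE_PATTERNS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:Z|[+-]\d{2}:\d{2})?"), "<TIMESTAMP>"),
    (re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b"), "<UUID>"),
    (re.compile(r"\b(?:req|txn|case)[-_]\d+\b"), "<ID>"),
]

def canonicalize(text: str) -> str:
    """Replace volatile fields with stable placeholders so identical
    templates compare (and hash) identically across examples."""
    for pattern, placeholder in VOLATILE_PATTERNS:
        text = pattern.sub(placeholder, text)
    # Collapse horizontal whitespace so formatting noise doesn't defeat reuse.
    return re.sub(r"[ \t]+", " ", text).strip()
```

Two examples that differ only in ticket id and timestamp now canonicalize to the same string, which is what makes template reuse measurable.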

Extract spines

Pull stable vertical context into versioned spines (policies, schemas, output formats) with deterministic hashing.
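A deterministic spine id can be derived by hashing a canonical serialization of the spine content. A sketch, assuming SHA-256 over sorted-key JSON and a truncated hex id (the actual hashing scheme may differ):

```python
import hashlib
import json

def spine_id(spine: dict) -> str:
    """Deterministic content hash for a versioned spine.
    Canonical JSON (sorted keys, fixed separators) guarantees that
    identical spine content always yields the same id, regardless of
    key order or whitespace."""
    canonical = json.dumps(spine, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "spine-" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
```

Because the id is a pure function of content, two teams compiling the same policy text independently will converge on the same spine id.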

Emit leaves

Write minimal per-example deltas as standard datasets (JSONL/Parquet), and emit metrics that quantify the savings.
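Leaf emission can be sketched as plain JSONL writing: each record references a spine by id and carries only its delta. The `vertical` and `delta` field names below are illustrative, not a fixed schema:

```python
import json

def emit_leaves(examples, spine_ids, out):
    """Write compiled examples to a JSONL stream.

    examples:  dicts with a vertical key and the example's leaf delta.
    spine_ids: maps each vertical to its versioned spine id.
    out:       any writable text stream (file, StringIO, ...).
    """
    for ex in examples:
        record = {"spine_id": spine_ids[ex["vertical"]], "leaf_delta": ex["delta"]}
        out.write(json.dumps(record, ensure_ascii=False) + "\n")
```

One JSON object per line keeps the output streamable and directly loadable by standard dataset tooling.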

Outputs

Deliverables you can feed into your existing training stack.

Compiled dataset

  • spine_id + leaf_delta per example
  • deterministic hashes for reproducibility
  • compatible with HF Trainer / custom PyTorch
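To feed a compiled dataset into an existing training stack, each record is rehydrated from its spine before tokenization. A minimal sketch, assuming composition is simple concatenation (the real join rule may be richer):

```python
def materialize(spines: dict, record: dict) -> str:
    """Rehydrate one compiled record into full training text.

    spines: maps spine_id -> spine text (the stable vertical context).
    record: a compiled example with spine_id and leaf_delta fields.
    """
    return spines[record["spine_id"]] + "\n" + record["leaf_delta"]
```

Applied per example in a dataset `map`, this is the only glue needed between the compiled format and HF Trainer or a custom PyTorch loop.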

Metrics + provenance

  • token reduction and template reuse rate
  • variance reduction indicators
  • hash-only provenance (no raw text required)
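The headline metrics can be computed from counts alone, which is why no raw text needs to leave your environment. A sketch with hypothetical metric names:

```python
def compile_metrics(raw_tokens: int, compiled_tokens: int,
                    n_examples: int, n_unique_templates: int) -> dict:
    """Illustrative savings metrics.

    token_reduction:     fraction of tokens eliminated by compilation.
    template_reuse_rate: examples served per unique canonical template.
    """
    return {
        "token_reduction": 1 - compiled_tokens / raw_tokens,
        "template_reuse_rate": n_examples / n_unique_templates,
    }
```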

Why it matters

Vertical data is naturally repetitive. The compiler makes that structure explicit.

Fewer tokens

Reduce repeated boilerplate and stabilize templates so you train on less text for the same behavior.

Fewer steps

Lower variance and clearer structure reduce gradient noise, which often means reaching the same quality in fewer training steps.

Reusable assets

Versioned spines become long-lived organizational assets for both training and inference.

Compile a vertical in days, not months

Start with one workflow family (support, ops, legal, finance). Measure token reduction and template reuse immediately.