Skip to main content

Runtime Core

Runtime Core is the serving and local inference layer.

It loads a runtime bundle, validates manifests and checksums, selects the route for a request, and either performs inference or returns a dry-run routing decision.

Responsibilities

validate runtime bundles before use
normalize chat-style inference requests
select task and semantic-family routes
load the base model plus selected adapters for real inference
expose the same core logic through CLI inference and the compatibility server

Inputs

one runtime bundle
prompt or OpenAI-style request payload
optional overrides such as task_type, semantic_family, and slm_override

Outputs

validation status for validate-runtime
dry-run route details for infer --dry-run
generated completion payloads for real inference or server requests

The compatibility server is non-streaming and intentionally thin in v0.1.

Responsibilities
Inputs
Outputs