Skip to main content

The local SLM engine alternative to LLM service handlers.

SLMCortex lets you compose focused small brains into an extensible local runtime, validate them before execution, and avoid turning every coding workflow into a hosted LLM bill.

slmcortex demodry run first
$ python scripts/run_slmcortex_demo.py

package python_slm
package debugging_slm
compose runtime/
validate-runtime runtime/
infer --dry-run
agent run --dry-run

outputs:
  python_slm/
  debugging_slm/
  runtime/
  agent-trace.json
The package is the unit of distribution.The engine is composed from focused SLMs.The runtime stays local and inspectable.

One explicit path from small brain to local engine

The landing path mirrors the actual product path in the source repo: package focused capabilities, compose a runtime bundle, validate it, then serve local inference or run the bounded agent workflow.

01

Package a small brain

Wrap a focused LoRA adapter with provenance, protected inputs, evaluation data, fingerprints, and routing metadata.

02

Compose the engine

Combine validated SLM packages into an extensible runtime bundle without mutating the source assets.

03

Validate before running

Check package and runtime structure before inference, serving, or agent behavior touches a local repository.

04

Serve locally

Use the same Runtime Core for dry-run routing, model-backed inference, a compatibility server, and bounded agent runs.

Why this beats another hosted handler

The point is not to claim better models. The point is to make local coding capabilities cheaper to evaluate, easier to inspect, and reliable enough to run through bounded control flow.

Common friction

Paid LLM services can turn every coding workflow into a metered remote dependency.

SLMCortex path

SLMCortex keeps focused SLM capabilities local, packaged, and inspectable before they run.

Common friction

One large general model is often used where a smaller, focused capability would be enough.

SLMCortex path

Compose small brains into a runtime bundle and extend the engine one capability at a time.

Common friction

Hosted API bills can climb before the workflow is even proven reliable.

SLMCortex path

Start with dry-run validation, then move to local inference when your backend and model setup are ready.

Proof you can inspect today

Start with dry-run validation. Move to real model-backed inference only when your local backend and model setup are ready.

See CLI commands

Python 3.11+

The documented setup starts from a standard virtual environment and editable install.

No-model demo

Packages checked-in adapters, composes a runtime, validates it, and runs dry-run inference and agent flow.

Backend choices

MLX is used on Apple Silicon; GGUF covers Linux, Windows, macOS Intel, and explicit GGUF use.

Minimal server

A non-streaming OpenAI-compatible compatibility server is available for local runtime experiments.

Bounded agent

The v0.1 agent is local and single-run, with writes controlled by flags rather than hidden background behavior.

Honest v0.1 boundaries

SLMCortex is useful to evaluate because its limits are visible. The current release is a narrow local path, not a broad production-agent platform.

  • Local, single-run execution only.
  • Bounded tool loop, not a full IDE agent.
  • Real inference requires local backend and model setup.
  • No benchmark, model-quality, or production-readiness claims.

Choose the next technical path

Use the docs to verify the demo, inspect commands, or read the runtime architecture.

Inspect source

Quickstart

Install SLMCortex and run the fastest no-model validation path.

Command Reference

See the packaging, routing, runtime, serving, and agent commands.

Architecture

Understand Factory, Composer, Runtime Core, and Agent Runtime.