This fork is doing risky runtime work: KV cache mutation, compacted-prefix execution, server endpoints, and long-session state handling. The public QA story needs to show how that work is reviewed and tested, not just that the benchmark numbers look good.
Primary QA sources: docs/HIGHLIGHTS.md, docs/BUGS-AND-FIXES.md, and docs/UPSTREAM-SYNC.md.
The review posture here is not “does the code look reasonable?” It is “assume there is a bug and try to find it.” That matters for compaction because the dangerous failures are not always obvious. They often look like silent context loss, wrong counts in metrics, or a crash that appears one request after a reclaim step.
| Review area | Why it matters |
|---|---|
| Scope and contract checks | Verify the implementation matches the declared slice and does not drift across repo or API boundaries. |
| Concrete execution traces | Force the reviewer to walk production, boundary, adversarial, security, and concurrency traces with real values. |
| State-machine review | Check before / during / after compaction and reclaim, especially for prefix matching and save/restore state. |
| Disprove-it pass | Try to break indexing, layout assumptions, integer math, and memory ownership instead of re-affirming the happy path. |
| Cross-repo contract review | Server endpoints, metrics, and runtime capability reporting have to match the fork and downstream consumers. |
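The "concrete execution traces" and "disprove-it" rows above are easiest to see with real values. The following is an illustrative sketch only: `reclaim_range` and its layout are hypothetical stand-ins, not the fork's actual KV cache API, but the shape of the exercise (production trace, boundary trace, adversarial trace) is the point.

```python
# Hypothetical sketch: walking a reclaim step with concrete values.
# `reclaim_range` is an illustrative stand-in, not the fork's real API.

def reclaim_range(cells, start, count):
    """Drop `count` cells beginning at `start` and shift the tail left."""
    assert 0 <= start and start + count <= len(cells), "reclaim out of bounds"
    return cells[:start] + cells[start + count:]

# Production trace: 8 cached cells, reclaim the middle 3.
cells = list(range(8))             # token positions 0..7
after = reclaim_range(cells, 3, 3)
assert after == [0, 1, 2, 6, 7]    # prefix intact, tail shifted left

# Boundary trace: reclaim the final cell.
assert reclaim_range(cells, 7, 1) == [0, 1, 2, 3, 4, 5, 6]

# Adversarial trace: an off-by-one past the end must fail loudly,
# not silently corrupt prefix-match bookkeeping one request later.
try:
    reclaim_range(cells, 6, 3)
    raised = False
except AssertionError:
    raised = True
assert raised
```

Walking each trace with literal indices, rather than reasoning abstractly, is what surfaces the off-by-one and ownership bugs this review posture targets.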
docs/HIGHLIGHTS.md describes eight engine test tiers. The important point is that this is not a single benchmark script. The QA stack spans API behavior, performance regression, quality gates, Windows builds, and live reporting.
| Layer | Purpose |
|---|---|
| C++ and server tests | Catch local runtime regressions early and keep the main code paths shippable. |
| API and contract checks | Keep endpoints, metrics, and capability surfaces aligned with the implementation. |
| Performance regression checks | Watch decode speed, memory, and compaction latency so fixes do not quietly degrade hot paths. |
| Quality gates | Use cosine and related checks to keep compaction quality inside a defensible envelope. |
| Platform coverage | Include Windows and the published Apple Silicon path, not just a single local machine. |
| Live dashboards and reporting | Make benchmark drift visible instead of hiding it in ad hoc local runs. |
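To make the "quality gates" row concrete, here is a minimal sketch of a cosine-based gate, assuming the gate compares logit vectors produced with and without compaction. The threshold, function names, and fail-closed policy are illustrative assumptions, not the fork's documented values.

```python
# Illustrative cosine-similarity quality gate. Threshold and names
# are hypothetical; only the "compare against a reference run" shape
# is taken from the docs.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def passes_quality_gate(ref_logits, compacted_logits, threshold=0.99):
    # Fail closed: every vector pair must clear the threshold.
    return all(cosine(r, c) >= threshold
               for r, c in zip(ref_logits, compacted_logits))

ref  = [[0.2, 0.5, 0.3], [0.1, 0.8, 0.1]]
near = [[0.21, 0.49, 0.30], [0.1, 0.79, 0.11]]   # small drift: passes
far  = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]        # large drift: fails
assert passes_quality_gate(ref, near)
assert not passes_quality_gate(ref, far)
```

A gate like this keeps compaction quality inside a numeric envelope instead of relying on a reviewer eyeballing outputs.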
The repo keeps a focused set of active workflows for build, server smoke, performance, Windows validation, dashboard/reporting, and upstream maintenance. Exact workflow composition can evolve, but the public principle is stable: upstream changes and runtime changes do not reach the main branch (modelai-main) without automated gates.
| Workflow | Role |
|---|---|
| modelai-ci | Core build and test gate. |
| modelai-server-smoke | Server-path smoke coverage. |
| modelai-perf-smoke | Performance regression guard. |
| modelai-ci-windows | Windows MSVC validation. |
| modelai-dashboard | Benchmark aggregation/reporting path. |
| modelai-upstream-sync | Scheduled upstream merge flow with gating. |
| modelai-auto-label | Repository hygiene and routing support. |
The fork is not maintained as a dead snapshot. docs/UPSTREAM-SYNC.md documents a weekly upstream sync, with CI and manual review gates when the upstream delta touches sensitive areas such as KV cache management, graph construction, or server behavior.
Compaction can fail in ways users do not immediately see. A bad reclaim step can look like amnesia one request later. A bad state counter can make metrics lie about what is active. A host/device mismatch can turn into decode stalls at longer contexts. That is why the QA story here needs to be operational, not decorative.
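The "state counter makes metrics lie" failure mode can be sketched directly. `KVState` and its fields below are hypothetical illustrations, not the fork's real structures; the point is the invariant check that catches a desync before it reaches a metrics endpoint.

```python
# Hypothetical sketch: a usage counter that must stay in sync with the
# real cell occupancy across reclaim. `KVState` is illustrative only.

class KVState:
    def __init__(self, n_cells):
        self.cells = [True] * n_cells   # True = occupied
        self.n_used = n_cells           # counter reported via metrics

    def reclaim(self, idxs):
        for i in idxs:
            if self.cells[i]:
                self.cells[i] = False
                self.n_used -= 1        # forgetting this decrement makes
                                        # metrics lie one request later

    def check_invariant(self):
        # Cheap assertion worth running after every mutation in tests.
        assert self.n_used == sum(self.cells), "counter desynced from cells"

kv = KVState(8)
kv.reclaim([2, 3, 4])
kv.check_invariant()                    # holds: counter matches occupancy
assert kv.n_used == 5
```

Running an invariant check like this after every mutating operation in tests turns a silent metrics drift into a loud, immediately attributable failure.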
| Evidence | Where it is documented |
|---|---|
| 29 distinct bug fixes | docs/BUGS-AND-FIXES.md |
| Benchmark summary and test counts | docs/HIGHLIGHTS.md |
| Sync workflow and manual gate | docs/UPSTREAM-SYNC.md |