Qwopus 3.6 35B-A3B Coder

QWOPUS 3.6

35B-A3B CODER

A thinking-OFF agentic coder that does MORE with LESS.

35B TOTAL 3B ACTIVE SPARSE MoE

This entire deck was built by the model itself — in OpenCode, thinking-off.

Tuned for Execution, Not Overthinking.

It reads files, picks tools, edits code, runs tests, reacts to errors, and ships — with fewer tokens, lower latency, and steadier behavior across long agent loops.

35B / 3B MoE

Sparse mixture-of-experts → fast local inference

NextN / MTP Head

Self-speculative decoding, ~250 tok/s on ONE GPU

Agent-Harness Native

Codex / OpenHands / Claude Code / OpenCode loops

Plot Twist: It's Better with Thinking **OFF**.

Across a held-out behavioral + long-horizon battery, thinking-OFF was best-or-tied on 9 of 11 axes.

0

THINKING HELPED

0

NO-OP

0

HURT

SWE-Bench Verified — Thinking Off

0

RESOLVED OF SUBMITTED PATCHES · SANS ERRORS · 171 / 274 · slice 0:300

0

RESOLVED / SUBMITTED

0

NON-EMPTY PATCH RATE

Single RTX 5090 · llama.cpp + MTP · ~250 tok/s

1/3 to 1/10 the Tokens. 100% Completion.

OFF

~323 / ~966 tok/turn

ON

~1,023 / ~9,991 tok/turn

THINKING OFF

Behavioral~323 tok/turn

Long-horizon~966 tok/turn

Empty/truncated0%

THINKING ON

Behavioral~1,023 tok/turn

Long-horizon~9,991 tok/turn

Hardest tasksDELIVERED NOTHING

For a many-turn agent, that cost compounds into real latency and dollars.

Head-to-Head: A Wash — Qwopus Edges the Coding, Far Cheaper.

Qwopus OFF

Ornith ON

Coder Training Sharpens the Direct Policy — And Leaves Reasoning Ungrounded.

1. More Coder Training

The no-think pathway gets sharper & more reliable

→

2. Reasoning Drift

The reasoning channel isn't outcome-grounded → decouples & drifts

→

3. Long-Horizon Compounding

Harmless on one-shot decisions, but COMPOUNDS into thrash, fixation, and non-delivery

PROOF: IT SHIPS.

It built AETHER DOMINION — a complete, playable single-file sci-fi RTS — entirely through an OpenCode agentic loop, thinking-off, on a local RTX 5090.

▶ PLAY THE GAME

Fog of War Dual-Track Enemy AI Worker Economy Energy-Beam Capital Ships Hand-Rendered Alien Planet

Run It Yourself.

SERVE

llama.cpp `llama-server`, GGUF Q5_K_M
`--spec-type draft-mtp`
`--reasoning off`

DRIVE

OpenCode → local OpenAI-compatible endpoint
temp 1.0 / top_p 0.95
`--pure`

DOES MORE. THINKS LESS.

QWOPUS 3.6 · 35B-A3B CODER

Model — Jackrong ↗ Hardware & live testing — Kyle Hessling @KyleHessling1 ↗ Benchmarks — Tom Turney @no_stp_on_snek ↗ Unsloth + Qwen

This entire presentation was generated by the model, in OpenCode, with thinking disabled.