QWOPUS 3.6
35B-A3B CODER
A thinking-OFF agentic coder that does MORE with LESS.
35B TOTAL 3B ACTIVE SPARSE MoE
Tuned for Execution, Not Overthinking.
It reads files, picks tools, edits code, runs tests, reacts to errors, and ships — with fewer tokens, lower latency, and steadier behavior across long agent loops.
35B / 3B MoE
Sparse mixture-of-experts → fast local inference
NextN / MTP Head
Self-speculative decoding, ~250 tok/s on ONE GPU
Agent-Harness Native
Codex / OpenHands / Claude Code / OpenCode loops
Plot Twist: It's Better with Thinking **OFF**.
Across a held-out behavioral + long-horizon battery, thinking-OFF was best-or-tied on 9 of 11 axes.
0
THINKING HELPED
0
NO-OP
0
HURT
Thinking helps on DECISIONS & RECALL. It hurts on PRODUCTION.
SWE-Bench Verified — Thinking Off
0
RESOLVED OF SUBMITTED PATCHES · SANS ERRORS · 171 / 274 · slice 0:300
0
RESOLVED / SUBMITTED
0
NON-EMPTY PATCH RATE
1/3 to 1/10 the Tokens. 100% Completion.
OFF
~323 / ~966 tok/turn
ON
~1,023 / ~9,991 tok/turn
THINKING OFF
Behavioral~323 tok/turn
Long-horizon~966 tok/turn
Empty/truncated0%
THINKING ON
Behavioral~1,023 tok/turn
Long-horizon~9,991 tok/turn
Hardest tasksDELIVERED NOTHING
Head-to-Head: A Wash — Qwopus Edges the Coding, Far Cheaper.
Qwopus OFF
Ornith ON
Standout edge: clean compliance — no over-gating, no needless permission-asking.
Coder Training Sharpens the Direct Policy — And Leaves Reasoning Ungrounded.
1. More Coder Training
The no-think pathway gets sharper & more reliable
2. Reasoning Drift
The reasoning channel isn't outcome-grounded → decouples & drifts
3. Long-Horizon Compounding
Harmless on one-shot decisions, but COMPOUNDS into thrash, fixation, and non-delivery
The fix isn't more reasoning-trace SFT — it's outcome-grounded RL.
PROOF: IT SHIPS.
It built AETHER DOMINION — a complete, playable single-file sci-fi RTS — entirely through an OpenCode agentic loop, thinking-off, on a local RTX 5090.
Fog of War Dual-Track Enemy AI Worker Economy Energy-Beam Capital Ships Hand-Rendered Alien Planet
…and it built THIS deck too.
Run It Yourself.
SERVE
llama.cpp `llama-server`, GGUF Q5_K_M
`--spec-type draft-mtp`
`--reasoning off`
DRIVE
OpenCode → local OpenAI-compatible endpoint
temp 1.0 / top_p 0.95
`--pure`
Keep thinking OFF. If you must reason, gate it to single decisions — never a hard 2048 cap.
DOES MORE. THINKS LESS.
QWOPUS 3.6 · 35B-A3B CODER