<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Subagent Swarm</title><link>https://swarm-hermanity-ee0561.pages.catalystgroup.tech/</link><description>Recent content on Subagent Swarm</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Sat, 04 Jul 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://swarm-hermanity-ee0561.pages.catalystgroup.tech/index.xml" rel="self" type="application/rss+xml"/><item><title>Analysis</title><link>https://swarm-hermanity-ee0561.pages.catalystgroup.tech/analysis/</link><pubDate>Sat, 04 Jul 2026 00:00:00 +0000</pubDate><guid>https://swarm-hermanity-ee0561.pages.catalystgroup.tech/analysis/</guid><description>&lt;h1 id="analysis"&gt;Analysis&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;Status:&lt;/strong&gt; Analysis complete. Original hypothesis (peak at N=2-3) was REFUTED. All 5 valid models showed near-linear throughput scaling through N=5.&lt;/p&gt;
&lt;h2 id="the-key-chart"&gt;The key chart&lt;/h2&gt;
&lt;p&gt;&lt;img src="https://swarm-hermanity-ee0561.pages.catalystgroup.tech/img/charts/throughput-cross-model.svg" alt="cross-model throughput"&gt;&lt;/p&gt;
&lt;p&gt;The cross-model throughput chart plots N/mean_wall for each model at concurrency levels 1–5. The dashed line marks ideal linear scaling from the N=1 baseline, reaching 5× at N=5.&lt;/p&gt;
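&lt;p&gt;A minimal Python sketch of the plotted metric, assuming throughput is N divided by the mean wall-clock time of the repetitions, speedup is throughput relative to N=1, and efficiency is speedup divided by N. The function names and the timings are hypothetical and are not taken from &lt;code&gt;analyze_swarm.py&lt;/code&gt; or from the measured data.&lt;/p&gt;

```python
from statistics import mean


def throughput(n, wall_seconds):
    """Completed agents per second: N divided by the mean wall-clock time."""
    return n / mean(wall_seconds)


def scaling(walls_by_n):
    """Map each concurrency level to (throughput, speedup, efficiency)."""
    base = throughput(1, walls_by_n[1])
    rows = {}
    for n, walls in sorted(walls_by_n.items()):
        tp = throughput(n, walls)
        speedup = tp / base  # the ideal (dashed) line is speedup == n
        rows[n] = (tp, speedup, speedup / n)
    return rows


# Made-up timings in seconds, three reps per level; NOT measured data.
example = {1: [10.0, 10.0, 10.0], 2: [10.0, 11.0, 9.0], 4: [8.0, 8.0, 8.0]}
for n, (tp, speedup, eff) in scaling(example).items():
    print(f"N={n} throughput={tp:.3f}/s speedup={speedup:.2f}x efficiency={eff:.0%}")
```

&lt;p&gt;With these illustrative timings the N=4 row reports a 5.00x speedup and 125% efficiency, because the four-agent batch finishes in 8 s against a 10 s single-agent baseline.&lt;/p&gt;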
&lt;h2 id="per-model-throughput-curves"&gt;Per-model throughput curves&lt;/h2&gt;
&lt;h3 id="xaigrok-3-mini-skynet"&gt;xai/grok-3-mini (Skynet)&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://swarm-hermanity-ee0561.pages.catalystgroup.tech/img/charts/throughput-xai_grok-3-mini.svg" alt="grok-3-mini throughput"&gt;&lt;/p&gt;
&lt;p&gt;The throughput leader. N=4 reaches 125% efficiency (5.01× speedup against an ideal of 4×), meaning the N=4 batch finishes &lt;em&gt;faster&lt;/em&gt; than linear scaling from the N=1 baseline would predict. Skynet&amp;rsquo;s Grok routing delivers more throughput under load than the N=1 measurement alone would suggest.&lt;/p&gt;</description></item><item><title>Code</title><link>https://swarm-hermanity-ee0561.pages.catalystgroup.tech/code/</link><pubDate>Sat, 04 Jul 2026 00:00:00 +0000</pubDate><guid>https://swarm-hermanity-ee0561.pages.catalystgroup.tech/code/</guid><description>&lt;h1 id="code"&gt;Code&lt;/h1&gt;
&lt;p&gt;All code lives in the repo at &lt;a href="https://git.catalystgroup.tech/herman/swarm-hermanity"&gt;git.catalystgroup.tech/herman/swarm-hermanity&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="files"&gt;Files&lt;/h2&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;swarm-hermanity/
├── scripts/
│ ├── dispatch_swarm.py # Parallel subagent harness (the meat)
│ ├── analyze_swarm.py # JSONL → SVG charts + tables
│ └── judge.py # LLM-judge scoring
├── tasks/
│ ├── code_review/
│ │ ├── prompt.txt
│ │ ├── fixture.py # the 2K-LOC code under review
│ │ └── rubric.md
│ ├── doc_generation/...
│ ├── test_generation/...
│ ├── refactor/...
│ └── design_doc/...
├── data/
│ └── trials.jsonl # gitignored, 75 lines after Phase 1
└── (hugo site)
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id="dispatch_swarmpy"&gt;&lt;code&gt;dispatch_swarm.py&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;Takes &lt;code&gt;(model, task_id, concurrency, rep)&lt;/code&gt; and runs N parallel subagent calls, where N is the &lt;code&gt;concurrency&lt;/code&gt; value. Returns a single trial record containing all measurements.&lt;/p&gt;</description></item><item><title>Methodology</title><link>https://swarm-hermanity-ee0561.pages.catalystgroup.tech/methodology/</link><pubDate>Sat, 04 Jul 2026 00:00:00 +0000</pubDate><guid>https://swarm-hermanity-ee0561.pages.catalystgroup.tech/methodology/</guid><description>&lt;h1 id="methodology"&gt;Methodology&lt;/h1&gt;
&lt;h2 id="the-5-task-templates"&gt;The 5 task templates&lt;/h2&gt;
&lt;p&gt;All tasks take a &lt;strong&gt;fixed input fixture&lt;/strong&gt; (a real small repo / function / doc the agent has never seen) and produce output judged against a &lt;strong&gt;fixed rubric&lt;/strong&gt;. This way the quality score is comparable across model tiers, not just within them.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;#&lt;/th&gt;
 &lt;th&gt;Task&lt;/th&gt;
 &lt;th&gt;Input size&lt;/th&gt;
 &lt;th&gt;Output size&lt;/th&gt;
 &lt;th&gt;Why&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;1&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;Code review&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;2K LOC&lt;/td&gt;
 &lt;td&gt;1K review&lt;/td&gt;
 &lt;td&gt;Tests structured-analysis ability; classic use case&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;2&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;Doc generation&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;500 LOC&lt;/td&gt;
 &lt;td&gt;1.5K doc&lt;/td&gt;
 &lt;td&gt;Tests structured prose generation&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;3&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;Unit tests&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;800 LOC&lt;/td&gt;
 &lt;td&gt;1.2K tests&lt;/td&gt;
 &lt;td&gt;Tests code comprehension + test pattern knowledge&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;4&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;Refactor proposal&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;1K LOC&lt;/td&gt;
 &lt;td&gt;800 design&lt;/td&gt;
 &lt;td&gt;Tests architectural reasoning&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;5&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;Design doc&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;200 LOC&lt;/td&gt;
 &lt;td&gt;2K doc&lt;/td&gt;
 &lt;td&gt;Tests long-form synthesis from small inputs&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="concurrency-matrix"&gt;Concurrency matrix&lt;/h2&gt;
&lt;p&gt;Each (task, concurrency) cell is run &lt;strong&gt;3 times&lt;/strong&gt; for variance estimation.&lt;/p&gt;</description></item><item><title>Paper</title><link>https://swarm-hermanity-ee0561.pages.catalystgroup.tech/paper/</link><pubDate>Sat, 04 Jul 2026 00:00:00 +0000</pubDate><guid>https://swarm-hermanity-ee0561.pages.catalystgroup.tech/paper/</guid><description>&lt;h1 id="paper"&gt;Paper&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;Status:&lt;/strong&gt; Full academic draft shipped. Markdown source + BibTeX in the repo; HTML rendered below.&lt;/p&gt;
&lt;h2 id="citation"&gt;Citation&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;Herman (Nous Research). &lt;em&gt;Concurrent Subagent Dispatch Throughput in Practice: A Benchmark on Five Production LLM Tiers.&lt;/em&gt; 2026. &lt;a href="https://swarm.hermanity.dev/paper/"&gt;swarm.hermanity.dev/paper/&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="abstract"&gt;Abstract&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;Concurrent subagent dispatch has become a standard pattern for AI-assisted development pipelines, but its actual throughput behavior remains under-characterized. The prior assumption — that throughput scales sublinearly with concurrency and peaks at N=2-3 agents per model tier — has shaped dispatch heuristics, rate-limit backoff strategies, and tool-design defaults, yet it has not been empirically tested across model tiers. We present a 165-trial benchmark across five production LLM tiers (minimax-m3, GPT-5.4 via Codex, xAI Grok-3-mini, GLM-5.2, and direct MiniMax-M3) measuring wall-clock throughput at N=1, 2, 3, 4, and 5 parallel agents on five real coding-adjacent tasks (code review, documentation generation, unit-test writing, refactor proposals, design documents). Across all 5 valid models we observe near-linear throughput scaling through N=5, refuting the prior hypothesis. Grok-3-mini achieves 5.01× speedup at N=4 (125% of ideal-linear efficiency); every tested model exceeds the predicted peak throughput at N=2-3. The dispatcher&amp;rsquo;s model-tier-routing behavior (direct API vs LiteLLM proxy) produces a constant ~30% wall-clock offset but does not change the scaling curve. Output quality (LLM-judge, 1-5 scale) shows mild degradation in code_review and flat-line behavior elsewhere, suggesting the throughput-quality tradeoff is minor in the tested range.&lt;/p&gt;</description></item><item><title>Results</title><link>https://swarm-hermanity-ee0561.pages.catalystgroup.tech/results/</link><pubDate>Sat, 04 Jul 2026 00:00:00 +0000</pubDate><guid>https://swarm-hermanity-ee0561.pages.catalystgroup.tech/results/</guid><description>&lt;h1 id="results"&gt;Results&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;Status:&lt;/strong&gt; All three sweeps complete. 165 trials total, 0 catastrophic failures (8 mistral-large trials excluded due to user-confirmed rate-limit tier).&lt;/p&gt;
&lt;h2 id="sweep-summary"&gt;Sweep summary&lt;/h2&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Sweep&lt;/th&gt;
 &lt;th&gt;Trials&lt;/th&gt;
 &lt;th&gt;Failures&lt;/th&gt;
 &lt;th&gt;Cost&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Phase 1 (minimax-m3, 5 tasks × 5 conc × 3 reps)&lt;/td&gt;
 &lt;td&gt;75&lt;/td&gt;
 &lt;td&gt;0&lt;/td&gt;
 &lt;td&gt;$0&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Provider comparison (5 models × 5 conc × 3 reps)&lt;/td&gt;
 &lt;td&gt;75&lt;/td&gt;
 &lt;td&gt;7 (all mistral-large)&lt;/td&gt;
 &lt;td&gt;$0&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Codex (gpt-5.4, code_review × 5 conc × 3 reps)&lt;/td&gt;
 &lt;td&gt;15&lt;/td&gt;
 &lt;td&gt;0&lt;/td&gt;
 &lt;td&gt;$0&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
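&lt;p&gt;The trial counts in the table follow from the sweep dimensions. A short Python sketch, assuming each sweep is a full grid over its variants, the five concurrency levels, and three repetitions. The task IDs are the directory names under &lt;code&gt;tasks/&lt;/code&gt;; the &lt;code&gt;model_1&lt;/code&gt; to &lt;code&gt;model_5&lt;/code&gt; labels and the &lt;code&gt;grid&lt;/code&gt; helper are placeholders for illustration only.&lt;/p&gt;

```python
from itertools import product

CONCURRENCY = [1, 2, 3, 4, 5]
REPS = 3


def grid(variants):
    """All (variant, concurrency, rep) cells for one sweep."""
    return list(product(variants, CONCURRENCY, range(1, REPS + 1)))


sweeps = {
    # Phase 1: five tasks on one model.
    "phase1": grid(["code_review", "doc_generation", "test_generation",
                    "refactor", "design_doc"]),
    # Provider comparison: five models (placeholder labels, not real IDs).
    "provider": grid([f"model_{i}" for i in range(1, 6)]),
    # Codex: a single task.
    "codex": grid(["code_review"]),
}
counts = {name: len(cells) for name, cells in sweeps.items()}
print(counts, sum(counts.values()))
```

&lt;p&gt;The per-sweep sizes come out to 75, 75, and 15 cells, which sum to the 165 trials reported above.&lt;/p&gt;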
&lt;p&gt;&lt;strong&gt;Per-trial raw data lives at &lt;code&gt;results/phase1.jsonl&lt;/code&gt;, &lt;code&gt;results/provider-comparison.jsonl&lt;/code&gt;, and &lt;code&gt;results/codex-comparison.jsonl&lt;/code&gt;.&lt;/strong&gt; All three files are gitignored (regenerable).&lt;/p&gt;</description></item></channel></rss>