{"id":2309,"date":"2026-05-17T07:40:21","date_gmt":"2026-05-17T07:40:21","guid":{"rendered":"https:\/\/ai-box.eu\/?p=2309"},"modified":"2026-05-17T08:04:04","modified_gmt":"2026-05-17T08:04:04","slug":"nemo-agent-toolkit-multi-agent-supervisor-pattern-local","status":"publish","type":"post","link":"https:\/\/ai-box.eu\/en\/news\/nemo-agent-toolkit-multi-agent-supervisor-pattern-local\/2309\/","title":{"rendered":"NeMo Agent Toolkit &#8211; Multi-Agent Supervisor-Pattern local"},"content":{"rendered":"<p>After my last post, where I <a href=\"https:\/\/ai-box.eu\/en\/top-story-en\/nemo-agent-toolkit-orchestration\/\">dissected the ReAct loop in detail and built the first custom GPU status tool<\/a>, the next logical step follows: <strong>multi-agent orchestration with the supervisor pattern<\/strong>. Several specialized ReAct agents, each with its own toolset and its own identity, coordinated by an overarching supervisor agent. This is the point where &#8220;my agent can call tools&#8221; actually becomes &#8220;my agent can split up complex requests and delegate&#8221;. Still quite small in the example here, but the goal is to understand the principle and build it hands-on.<\/p>\n<p>And right up front: when I tried to get this running with my previous Qwen-2.5-7B model, I ran straight into some smaller problems that I&#8217;d like to spare you today. 
But more on that later.<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"What_is_this_actually_about\"><\/span>What is this actually about?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>In a simple ReAct setup you have one agent that sees a list of tools and decides which one to call. That works wonderfully with three to five tools. But once you reach ten or more tools, something unpleasant happens \u2014 something we&#8217;ve actually known in computer science for many decades: the system prompt becomes gigantic (every tool brings its own description with it), and the LLM increasingly has problems making the right selection. This isn&#8217;t really a model problem in the narrow sense \u2014 it&#8217;s a <strong>cognitive-load problem<\/strong>.<\/p>\n<p>The supervisor pattern solves this through specialization. 
Instead of one agent with twenty tools, you end up with:<\/p>\n<ul class=\"wp-block-list\">\n<li>One <strong>supervisor<\/strong> that only sees three or four &#8220;tools&#8221; \u2014 and each of these tools is a specialized sub-agent<\/li>\n<li>Several <strong>specialist agents<\/strong>, each seeing only its own three to five tools and focused on its own domain<\/li>\n<\/ul>\n<p>Architecturally I&#8217;ve tried to depict it like this:<\/p>\n<div id=\"attachment_2311\" style=\"width: 742px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_multi_agent.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2311\" class=\" wp-image-2311\" src=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_multi_agent-1024x417.jpg\" alt=\"NAT Multi Agent\" width=\"732\" height=\"298\" srcset=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_multi_agent-1024x417.jpg 1024w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_multi_agent-300x122.jpg 300w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_multi_agent-768x312.jpg 768w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_multi_agent-1536x625.jpg 1536w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_multi_agent-2048x833.jpg 2048w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_multi_agent-1080x439.jpg 1080w\" sizes=\"(max-width: 732px) 100vw, 732px\" \/><\/a><p id=\"caption-attachment-2311\" class=\"wp-caption-text\">NAT Multi Agent<\/p><\/div>\n<p>In the NeMo Agent Toolkit this works because \u2014 as I already hinted at in the orchestration post \u2014 <strong>an entire ReAct workflow is itself a function<\/strong> that you can reference as a tool in another workflow. 
&#8220;Everything is a function&#8221; becomes a concrete architectural property here.<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Prerequisites\"><\/span>Prerequisites<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>If you want to get hands-on now and run everything locally like I do, you&#8217;ll need the following components:<\/p>\n<ul class=\"wp-block-list\">\n<li>A working NAT setup as described in my <a href=\"https:\/\/ai-box.eu\/en\/top-story-en\/nemo-agent-toolkit-on-the-rtx-a6000-ada\/\">installation post<\/a><\/li>\n<li>The <code>gpu_status<\/code> tool from the <a href=\"https:\/\/ai-box.eu\/en\/top-story-en\/nemo-agent-toolkit-orchestration\/\">orchestration post<\/a> already set up and installed<\/li>\n<li>Ollama running as an inference server, plus the <code>wiki_search<\/code> tool available in NAT<\/li>\n<li>A <strong>larger model than Qwen-2.5-7B<\/strong> in Ollama. In my case it&#8217;s <code>qwen3.6:27b<\/code>, which works really well on my RTX A6000 Ada<\/li>\n<\/ul>\n<p>The last point about the model is the most important if you&#8217;re setting all this up locally. Let me briefly explain why.<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Why_Qwen-25-7B_fails_here\"><\/span>Why Qwen-2.5-7B fails here<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>My first reflex was: I&#8217;ll just take my trusted <code>qwen2.5:7b-instruct<\/code> that I also used in the previous posts. With 7 billion parameters it&#8217;s fast and frugal. The result was sobering. The run ended with this error message:<\/p>\n<pre class=\"wp-block-code\"><code>ReActAgentParsingFailedError: Failed to parse agent output after 3 attempts.\nError: Invalid Format: Missing 'Action:' after 'Thought:'.\nLLM output: ''<\/code><\/pre>\n<p>Look at the critical line: <strong><code>LLM output: ''<\/code><\/strong>. The LLM returned <strong>nothing at all<\/strong>. So the result was an empty string. 
This isn&#8217;t a NAT bug, it&#8217;s a clear symptom: the 7B model collapsed under the cognitive load of the supervisor pattern. Maybe I could have extracted a result with a more refined prompt, but I want stability.<\/p>\n<p>Why? Imagine what the model is expected to do simultaneously in a single call:<\/p>\n<ul class=\"wp-block-list\">\n<li>Understand the long supervisor system prompt<\/li>\n<li>Parse the descriptions of all three sub-agents as &#8220;tools&#8221;<\/li>\n<li>Keep the multi-step ReAct format instruction in its head<\/li>\n<li>Analyze the user question and decompose it into sub-tasks<\/li>\n<li>Make a sensible action selection \u2014 and strictly adhere to the ReAct format while doing so<\/li>\n<\/ul>\n<p>At 7 billion parameters, the reasoning capacity for this simply isn&#8217;t enough. The model becomes overwhelmed and produces empty output. Classic symptomatology.<\/p>\n<p><strong>The lesson for your multi-agent deployment<\/strong>:<\/p>\n<p>From the supervisor pattern onwards, you need a significantly larger model. My recommendation, which I also tested for this post, is <strong>Qwen 3.6 27B<\/strong>, which with its 27 billion parameters fits comfortably into the 48 GB VRAM of the RTX A6000 Ada and handles the reasoning cleanly.<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Step_1_Provision_the_larger_model\"><\/span>Step 1: Provision the larger model<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>If you don&#8217;t already have <code>qwen3.6:27b<\/code> in Ollama, pull it now on your inference server:<\/p>\n<p><strong>Command:<\/strong> <code>ollama pull qwen3.6:27b<\/code><\/p>\n<p>The download is about 17 GB. On a decent internet connection that takes a few minutes. 
Afterwards, verify it&#8217;s available in your local cache:<\/p>\n<p><strong>Command:<\/strong> <code>ollama list | grep qwen3.6<\/code><\/p>\n<p>You should see <code>qwen3.6:27b<\/code> in the list.<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Step_2_Create_the_multi-agent_workflow\"><\/span>Step 2: Create the multi-agent workflow<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Now we&#8217;re finally getting to the hands-on part of this guide. Together we&#8217;ll build a single YAML workflow that defines three specialized sub-agents and a supervising agent on top. Switch into the config directory and create the file. In my setup everything lives in the home folder in a subfolder <code>~\/nat-playground\/<\/code>.<\/p>\n<p><strong>Command:<\/strong> <code>cd ~\/nat-playground\/configs<\/code><\/p>\n<p>With the following command we create the workflow configuration.<\/p>\n<p><strong>Command:<\/strong> <code>nano experiment3_multi_agent.yml<\/code><\/p>\n<p>You&#8217;ll find the complete workflow in my GitHub repository again. Because of its length (about 80 lines of YAML with three standalone system prompts), I&#8217;ll link to the file instead of inlining it here.<\/p>\n<p><strong>GitHub Repository:<\/strong> <a href=\"https:\/\/github.com\/custom-build-robots\/nemo-agent-toolkit-examples\/blob\/main\/configs\/experiment3_multi_agent.yml\" target=\"_blank\" rel=\"noopener\">https:\/\/github.com\/custom-build-robots\/nemo-agent-toolkit-examples\/blob\/main\/configs\/experiment3_multi_agent.yml<\/a><\/p>\n<p>Open the link, then copy the content and paste it into your file <code>experiment3_multi_agent.yml<\/code> in the terminal window. Then save the changes and exit with <code>Ctrl + X<\/code>, followed by <code>Y<\/code> and <code>Enter<\/code>.<\/p>\n<p>Let&#8217;s look at the structure without quoting the entire YAML. 
The workflow consists of four conceptual blocks:<\/p>\n<figure class=\"wp-block-table\">\n<table>\n<thead>\n<tr>\n<th>Block<\/th>\n<th>Content<\/th>\n<th>Purpose<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><code>llms:<\/code><\/td>\n<td>Ollama LLM with <code>qwen3.6:27b<\/code><\/td>\n<td>Shared by all four agents \u2014 memory-efficient<\/td>\n<\/tr>\n<tr>\n<td><code>functions:<\/code> Tools<\/td>\n<td><code>current_datetime<\/code>, <code>wiki_search<\/code>, <code>gpu_status<\/code><\/td>\n<td>The base tools that the sub-agents use<\/td>\n<\/tr>\n<tr>\n<td><code>functions:<\/code> Sub-agents<\/td>\n<td><code>research_agent<\/code>, <code>system_agent<\/code>, <code>time_agent<\/code><\/td>\n<td>Three standalone ReAct agents, each with one tool<\/td>\n<\/tr>\n<tr>\n<td><code>workflow:<\/code> Supervisor<\/td>\n<td>Top-level ReAct agent<\/td>\n<td>Sees the sub-agents as &#8220;tools&#8221; and delegates<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<p>The clever detail of this architecture: each sub-agent has its own description (the <code>description:<\/code> field in the YAML), and exactly this description is what the supervisor sees as a &#8220;tool description&#8221;. This means: the supervisor decides on the basis of these descriptions which specialist it addresses for a concrete question.<\/p>\n<p>For example, the supervisor sees the <code>research_agent<\/code> with the description &#8220;An expert agent for research questions. Use this when the user asks about historical events, famous people, scientific concepts&#8230;&#8221;. 
On a question like &#8220;Who was Konrad Zuse?&#8221; the supervisor will match this description and delegate.<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Step_3_Launch_the_first_multi-agent_run\"><\/span>Step 3: Launch the first multi-agent run<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Activate your venv if needed and run the workflow with a request that <strong>combines multiple domains<\/strong>. This is the showcase moment for the supervisor pattern:<\/p>\n<p><strong>Command:<\/strong> <code>source ~\/nat-playground\/.venv\/bin\/activate<\/code><\/p>\n<p><strong>Command:<\/strong> <code>nat run --config_file experiment3_multi_agent.yml --input \"What time is it, how is my GPU utilized, and who was Konrad Zuse?\"<\/code><\/p>\n<p>On the first call, Ollama loads the 27B model into VRAM. That takes around 10-15 seconds. On the second call the model is resident and the workflow starts immediately.<\/p>\n<p>The following image shows the result on my terminal after running the multi-agent workflow.<\/p>\n<div id=\"attachment_2315\" style=\"width: 1034px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_multi_agent_workflow_result.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2315\" class=\"size-large wp-image-2315\" src=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_multi_agent_workflow_result-1024x520.jpg\" alt=\"NAT Multi Agent - Workflow result\" width=\"1024\" height=\"520\" srcset=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_multi_agent_workflow_result-1024x520.jpg 1024w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_multi_agent_workflow_result-300x152.jpg 300w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_multi_agent_workflow_result-768x390.jpg 768w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_multi_agent_workflow_result-1080x549.jpg 1080w, 
https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_multi_agent_workflow_result.jpg 1453w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/a><p id=\"caption-attachment-2315\" class=\"wp-caption-text\">NAT Multi Agent &#8211; Workflow result<\/p><\/div>\n<p>Here&#8217;s the result as text for easier reading:<\/p>\n<blockquote><p>Workflow Result:<br \/>\nThe current time is 07:23.<\/p>\n<p>Your GPU (NVIDIA RTX 6000 Ada Generation) is currently heavily loaded:<br \/>\n&#8211; **Utilization:** 95%<br \/>\n&#8211; **Memory:** 85.3% (41920 MiB of 49140 MiB)<br \/>\n&#8211; **Temperature:** 74\u00b0C<br \/>\n&#8211; **Power draw:** 299.90W (close to the maximum of 300W)<\/p>\n<p>Konrad Zuse (1910-1995) was a German engineer and is considered one of the pioneers of computer development. He is best known as the inventor of the first freely programmable computer, the **Z3**, which was completed in 1941. His other significant achievements include the development of the first high-level programming language **Plankalk\u00fcl** and the introduction of binary floating-point arithmetic.<\/p><\/blockquote>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Pros_and_cons_of_the_supervisor_pattern\"><\/span>Pros and cons of the supervisor pattern<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Before you go and rebuild everything as multi-agent setups, here&#8217;s an honest comparison. 
The supervisor pattern is powerful, but not free:<\/p>\n<figure class=\"wp-block-table\">\n<table>\n<thead>\n<tr>\n<th>Aspect<\/th>\n<th>Single Agent<\/th>\n<th>Multi-Agent Supervisor<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>Context size<\/strong><\/td>\n<td>All tool descriptions in one prompt<\/td>\n<td>Each sub-agent sees only its relevant tools<\/td>\n<\/tr>\n<tr>\n<td><strong>Reasoning load<\/strong><\/td>\n<td>One LLM call must decide everything<\/td>\n<td>Split into multiple specialized mini-calls<\/td>\n<\/tr>\n<tr>\n<td><strong>Error isolation<\/strong><\/td>\n<td>A confused agent delivers nothing useful<\/td>\n<td>A sub-agent can fail, the supervisor reacts<\/td>\n<\/tr>\n<tr>\n<td><strong>Scalability<\/strong><\/td>\n<td>With 20+ tools it becomes chaotic<\/td>\n<td>Arbitrarily many specialists possible<\/td>\n<\/tr>\n<tr>\n<td><strong>Reusability<\/strong><\/td>\n<td>Tools only in this one workflow<\/td>\n<td>Specialists are workflows themselves, reusable<\/td>\n<\/tr>\n<tr>\n<td><strong>Latency<\/strong><\/td>\n<td>Lower \u2014 one LLM call per iteration<\/td>\n<td>Higher \u2014 at least N+1 LLM calls per request<\/td>\n<\/tr>\n<tr>\n<td><strong>VRAM requirement<\/strong><\/td>\n<td>A small model is often enough<\/td>\n<td>Larger model recommended<\/td>\n<\/tr>\n<tr>\n<td><strong>Debugging<\/strong><\/td>\n<td>One trace, linear logic<\/td>\n<td>Nested traces, more complexity<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<p>My personal rule of thumb: <strong>under 5 tools \u2014 single agent. Above 10 tools \u2014 supervisor. In between \u2014 it depends<\/strong> on how homogeneous the tools are. If they&#8217;re all from the same domain \u2014 e.g. all database queries \u2014 stick with single agent. 
If they&#8217;re very different \u2014 database + web + hardware + filesystem \u2014 specialization starts to pay off.<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Pitfalls_I_didnt_manage_to_avoid_while_building_this\"><\/span>Pitfalls I didn&#8217;t manage to avoid while building this<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"1_Qwen-7B_collapses_under_multi-agent_load\"><\/span>1. Qwen-7B collapses under multi-agent load<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>I already described this above. Empty LLM output. The fix is a larger model. Even though that costs more VRAM \u2014 if you want to use multi-agent seriously, this isn&#8217;t negotiable.<\/p>\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"2_Keep_tool_descriptions_in_English\"><\/span>2. Keep tool descriptions in English<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Even if your user input is in another language and your final answer is supposed to be in that language, the tool descriptions and sub-agent descriptions <strong>should still be in English<\/strong>. This didn&#8217;t really surprise me. The reason: LLMs are primarily trained on English instructions, and they understand English tool descriptions much more precisely than other languages. The language separation \u2014 English for the system mechanics, your target language for the user-facing output \u2014 works surprisingly well in practice.<\/p>\n<p>With Chinese-origin models you may need to evaluate writing the system prompt in Chinese instead. That&#8217;s something to test on a per-model basis.<\/p>\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"3_Sub-agent_system_prompts_need_the_template_variables\"><\/span>3. 
Sub-agent system prompts need the template variables<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Just like the top-level ReAct agent, a sub-agent with a custom <code>system_prompt<\/code> needs <code>{tools}<\/code> and <code>{tool_names}<\/code> in that prompt; otherwise you get the well-known <code>ValueError: Invalid system_prompt<\/code>. I already knew this from the first post, but with three parallel system prompts you really have to maintain the discipline. I actually had to research this point quite a bit before the workflow ran.<\/p>\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"4_Expect_higher_token_consumption\"><\/span>4. Expect higher token consumption<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>A single multi-agent call typically produced 4 to 6 LLM calls for me (supervisor + one or two iterations per sub-agent). With a 27B model at roughly 30 tokens\/sec, this means a noticeable latency of 15 to 30 seconds for the full answer. If that&#8217;s too long for you, evaluate whether the tool-calling agent (instead of the ReAct agent) is an option; that agent type is usually more efficient.<\/p>\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"5_Custom_system_prompts_on_sub-agents_break_the_synthesis_phase\"><\/span>5. Custom system prompts on sub-agents break the synthesis phase<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>When you use a <code>react_agent<\/code> as a sub-agent in a multi-agent setup, you should <strong>not give it a custom system prompt<\/strong>. NAT&#8217;s standard prompt is tested with the internal conversation-history rendering, and your own format templates often lead to the LLM producing empty output after the tool call. It also took me a while to find this solution. 
<strong>My personal verdict:<\/strong> custom prompts belong on the supervisor, not on the sub-agents.<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"When_is_the_effort_worth_it\"><\/span>When is the effort worth it?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>While building this setup I kept thinking: do I actually need this? Here are the scenarios where the multi-agent pattern really pays off:<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Domain separation<\/strong>: You have clearly delineated subject areas (accounting, research, hardware) that each have their own tool sets<\/li>\n<li><strong>Multi-stage workflows<\/strong>: A request necessarily runs through several phases (research \u2192 analysis \u2192 summary)<\/li>\n<li><strong>Tool explosion<\/strong>: You end up with 15-30 tools and notice that the single agent increasingly makes wrong decisions because it gets confused \ud83d\ude09<\/li>\n<li><strong>Team setup<\/strong>: Different people maintain different specialist agents. Here a clean separation makes parallel development of the agents much easier<\/li>\n<\/ul>\n<p>For pure hobby projects with five to seven tools I&#8217;d stay with a single agent. The multi-agent complexity only pays for itself once the tools are sufficiently heterogeneous.<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Whats_coming_next\"><\/span>What&#8217;s coming next?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>With the multi-agent setup you&#8217;ve now seen the full range of NAT orchestration: from a single agent, through custom tools, all the way to the hierarchical supervisor. 
What&#8217;s still left to explore:<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Tool-calling agent<\/strong>: Instead of ReAct plaintext parsing, use the native function-calling API of Qwen 3.6 \u2014 more robust and faster<\/li>\n<li><strong>MCP integration<\/strong>: Plug in existing MCP servers (filesystem, GitHub, Slack) as tools, instead of writing everything yourself<\/li>\n<li><strong>Memory plugin<\/strong>: Long-term memory for the supervisor, so it can learn from previous requests<\/li>\n<li><strong>A2A protocol<\/strong>: Distribute specialist agents across different machines<\/li>\n<li><strong>Parallel sub-agents<\/strong>: Instead of delegating sequentially, let the supervisor kick off several independent sub-tasks in parallel<\/li>\n<\/ul>\n<p>My concrete next step, however, is something quite different: I want to add an <strong>ESP32 controller<\/strong> as an additional specialist agent that communicates with my robot car. That would give my multi-agent setup not just software tools, but real hardware actuators. Exactly the bridge between language model and physical AI that I keep talking about in this series.<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Conclusion\"><\/span>Conclusion<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Multi-agent orchestration with the NeMo Agent Toolkit is a conceptually elegant step upwards \u2014 &#8220;one agent with tools&#8221; becomes &#8220;a team of specialized agents with a coordinator&#8221;. But: this step has a cost. Larger model, higher latency, more debug complexity.<\/p>\n<p>What I take away from building this setup:<\/p>\n<ol class=\"wp-block-list\">\n<li><strong>Model size is critical for multi-agent.<\/strong> 7B isn&#8217;t enough. 27B is my minimum for clean supervisor loops.<\/li>\n<li><strong>Context window must be explicitly configured.<\/strong> Ollama&#8217;s default is usually too small. 
<code>num_ctx: 16384<\/code> is my new default value for agent workflows.<\/li>\n<li><strong>Language separation works.<\/strong> English for the system mechanics, your target language for the user output \u2014 that&#8217;s the most effective combination I&#8217;ve found so far. Though maybe I&#8217;ll have to learn Chinese as well.<\/li>\n<li><strong>Sub-agent descriptions are the new tool-description problem.<\/strong> Bad descriptions = wrong delegation. Invest your time here.<\/li>\n<\/ol>\n<p>If you&#8217;ve worked through this post, you master the full ReAct spectrum of NAT. In my case on an RTX A6000 Ada (or comparable hardware). This gives you a solid foundation for everything currently happening in the agentic AI world. From simple chat helpers to complex hybrid setups with hardware integration.<\/p>\n<p>If you build your own multi-agent setups and hit interesting patterns, comments or emails are welcome. My repository with all workflows from this series is at <a href=\"https:\/\/github.com\/custom-build-robots\/nemo-agent-toolkit-examples\" target=\"_blank\" rel=\"noopener\">github.com\/custom-build-robots\/nemo-agent-toolkit-examples<\/a>.<\/p>\n<p>Good luck with your own multi-agent setup!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>After my last post, where I dissected the ReAct loop in detail and built the first custom GPU status tool, the next logical step follows: multi-agent orchestration with the supervisor pattern. Several specialized ReAct agents, each with its own toolset and its own identity, coordinated by an overarching supervisor agent. 
This is the point where [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":2311,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_et_pb_use_builder":"","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":""},"categories":[162,8,50],"tags":[1253,1221,1254,333,1245,1224,1220,1255,306,1226,1252,1243,1222,1176,1247],"class_list":["post-2309","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-large-language-models-en","category-news","category-top-story-en","tag-agent-orchestration","tag-agentic-ai","tag-hierarchical-agents","tag-langchain-en","tag-multi-agent","tag-nat","tag-nemo-agent-toolkit","tag-num_ctx","tag-ollama-en","tag-python-venv","tag-qwen-3-6-27b","tag-react","tag-react-agent","tag-rtx-a6000-ada","tag-supervisor-pattern","et-has-post-format-content","et_post_format-et-post-format-standard"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.6 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>NeMo Agent Toolkit - Multi-Agent Supervisor-Pattern local - Exploring the Future: Inside the AI Box<\/title>\n<meta name=\"description\" content=\"Build a multi-agent supervisor pattern with the NeMo Agent Toolkit: three specialized ReAct agents under one supervisor, hands-on with Qwen 3.6 27B on the RTX A6000 Ada, plus the pitfalls we hit along the way.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/ai-box.eu\/en\/news\/nemo-agent-toolkit-multi-agent-supervisor-pattern-local\/2309\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"NeMo Agent Toolkit - Multi-Agent Supervisor-Pattern local - Exploring the Future: Inside the AI 
Box\" \/>\n<meta property=\"og:description\" content=\"Multi-Agent mit dem NeMo Agent Toolkit: drei spezialisierte ReAct-Agenten unter einem Supervisor hands-on mit Qwen 3.6 27B auf der RTX A6000 Ada.Build a multi-agent supervisor pattern with the NeMo Agent Toolkit: hands-on with Qwen 3.6 27B, plus the pitfalls we hit along the way.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/ai-box.eu\/en\/news\/nemo-agent-toolkit-multi-agent-supervisor-pattern-local\/2309\/\" \/>\n<meta property=\"og:site_name\" content=\"Exploring the Future: Inside the AI Box\" \/>\n<meta property=\"article:published_time\" content=\"2026-05-17T07:40:21+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-05-17T08:04:04+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_multi_agent.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"2497\" \/>\n\t<meta property=\"og:image:height\" content=\"1016\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Maker\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Ingmar_Stapel\" \/>\n<meta name=\"twitter:site\" content=\"@Ingmar_Stapel\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Maker\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"12 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nemo-agent-toolkit-multi-agent-supervisor-pattern-local\\\/2309\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nemo-agent-toolkit-multi-agent-supervisor-pattern-local\\\/2309\\\/\"},\"author\":{\"name\":\"Maker\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#\\\/schema\\\/person\\\/cc91d08618b3feeef6926591b465eab1\"},\"headline\":\"NeMo Agent Toolkit &#8211; Multi-Agent Supervisor-Pattern local\",\"datePublished\":\"2026-05-17T07:40:21+00:00\",\"dateModified\":\"2026-05-17T08:04:04+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nemo-agent-toolkit-multi-agent-supervisor-pattern-local\\\/2309\\\/\"},\"wordCount\":2299,\"commentCount\":0,\"image\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nemo-agent-toolkit-multi-agent-supervisor-pattern-local\\\/2309\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/05\\\/NAT_multi_agent.jpg\",\"keywords\":[\"agent orchestration\",\"Agentic AI\",\"hierarchical agents\",\"LangChain\",\"Multi-Agent\",\"NAT\",\"NeMo Agent Toolkit\",\"num_ctx\",\"Ollama\",\"Python venv\",\"Qwen 3.6 27B\",\"ReAct\",\"ReAct Agent\",\"RTX A6000 Ada\",\"Supervisor-Pattern\"],\"articleSection\":[\"Large Language Models\",\"News\",\"Top 
story\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nemo-agent-toolkit-multi-agent-supervisor-pattern-local\\\/2309\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nemo-agent-toolkit-multi-agent-supervisor-pattern-local\\\/2309\\\/\",\"url\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nemo-agent-toolkit-multi-agent-supervisor-pattern-local\\\/2309\\\/\",\"name\":\"NeMo Agent Toolkit - Multi-Agent Supervisor-Pattern local - Exploring the Future: Inside the AI Box\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nemo-agent-toolkit-multi-agent-supervisor-pattern-local\\\/2309\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nemo-agent-toolkit-multi-agent-supervisor-pattern-local\\\/2309\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/05\\\/NAT_multi_agent.jpg\",\"datePublished\":\"2026-05-17T07:40:21+00:00\",\"dateModified\":\"2026-05-17T08:04:04+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#\\\/schema\\\/person\\\/cc91d08618b3feeef6926591b465eab1\"},\"description\":\"Multi-Agent mit dem NeMo Agent Toolkit: drei spezialisierte ReAct-Agenten unter einem Supervisor hands-on mit Qwen 3.6 27B auf der RTX A6000 Ada.Build a multi-agent supervisor pattern with the NeMo Agent Toolkit: hands-on with Qwen 3.6 27B, plus the pitfalls we hit along the 
way.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nemo-agent-toolkit-multi-agent-supervisor-pattern-local\\\/2309\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nemo-agent-toolkit-multi-agent-supervisor-pattern-local\\\/2309\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nemo-agent-toolkit-multi-agent-supervisor-pattern-local\\\/2309\\\/#primaryimage\",\"url\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/05\\\/NAT_multi_agent.jpg\",\"contentUrl\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/05\\\/NAT_multi_agent.jpg\",\"width\":2497,\"height\":1016,\"caption\":\"NAT Multi Agent\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nemo-agent-toolkit-multi-agent-supervisor-pattern-local\\\/2309\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Start\",\"item\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"NeMo Agent Toolkit &#8211; Multi-Agent Supervisor-Pattern local\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#website\",\"url\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/\",\"name\":\"Exploring the Future: Inside the AI Box\",\"description\":\"Inside the AI Box, we share our experiences and discoveries in the world of artificial 
intelligence.\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#\\\/schema\\\/person\\\/cc91d08618b3feeef6926591b465eab1\",\"name\":\"Maker\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g\",\"caption\":\"Maker\"},\"description\":\"I live in Bavaria near Munich. In my head I always have many topics and try out especially in the field of Internet new media much in my spare time. I write on the blog because it makes me fun to report about the things that inspire me. I am happy about every comment, about suggestion and very about questions.\",\"sameAs\":[\"https:\\\/\\\/ai-box.eu\"],\"url\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/author\\\/ingmars\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"NeMo Agent Toolkit - Multi-Agent Supervisor-Pattern local - Exploring the Future: Inside the AI Box","description":"Multi-Agent mit dem NeMo Agent Toolkit: drei spezialisierte ReAct-Agenten unter einem Supervisor hands-on mit Qwen 3.6 27B auf der RTX A6000 Ada.Build a multi-agent supervisor pattern with the NeMo Agent Toolkit: hands-on with Qwen 3.6 27B, plus the pitfalls we hit along the way.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/ai-box.eu\/en\/news\/nemo-agent-toolkit-multi-agent-supervisor-pattern-local\/2309\/","og_locale":"en_US","og_type":"article","og_title":"NeMo Agent Toolkit - Multi-Agent Supervisor-Pattern local - Exploring the Future: Inside the AI Box","og_description":"Multi-Agent mit dem NeMo Agent Toolkit: drei spezialisierte ReAct-Agenten unter einem Supervisor hands-on mit Qwen 3.6 27B auf der RTX A6000 Ada.Build a multi-agent supervisor pattern with the NeMo Agent Toolkit: hands-on with Qwen 3.6 27B, plus the pitfalls we hit along the way.","og_url":"https:\/\/ai-box.eu\/en\/news\/nemo-agent-toolkit-multi-agent-supervisor-pattern-local\/2309\/","og_site_name":"Exploring the Future: Inside the AI Box","article_published_time":"2026-05-17T07:40:21+00:00","article_modified_time":"2026-05-17T08:04:04+00:00","og_image":[{"width":2497,"height":1016,"url":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_multi_agent.jpg","type":"image\/jpeg"}],"author":"Maker","twitter_card":"summary_large_image","twitter_creator":"@Ingmar_Stapel","twitter_site":"@Ingmar_Stapel","twitter_misc":{"Written by":"Maker","Est. 
reading time":"12 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/ai-box.eu\/en\/news\/nemo-agent-toolkit-multi-agent-supervisor-pattern-local\/2309\/#article","isPartOf":{"@id":"https:\/\/ai-box.eu\/en\/news\/nemo-agent-toolkit-multi-agent-supervisor-pattern-local\/2309\/"},"author":{"name":"Maker","@id":"https:\/\/ai-box.eu\/en\/#\/schema\/person\/cc91d08618b3feeef6926591b465eab1"},"headline":"NeMo Agent Toolkit &#8211; Multi-Agent Supervisor-Pattern local","datePublished":"2026-05-17T07:40:21+00:00","dateModified":"2026-05-17T08:04:04+00:00","mainEntityOfPage":{"@id":"https:\/\/ai-box.eu\/en\/news\/nemo-agent-toolkit-multi-agent-supervisor-pattern-local\/2309\/"},"wordCount":2299,"commentCount":0,"image":{"@id":"https:\/\/ai-box.eu\/en\/news\/nemo-agent-toolkit-multi-agent-supervisor-pattern-local\/2309\/#primaryimage"},"thumbnailUrl":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_multi_agent.jpg","keywords":["agent orchestration","Agentic AI","hierarchical agents","LangChain","Multi-Agent","NAT","NeMo Agent Toolkit","num_ctx","Ollama","Python venv","Qwen 3.6 27B","ReAct","ReAct Agent","RTX A6000 Ada","Supervisor-Pattern"],"articleSection":["Large Language Models","News","Top story"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/ai-box.eu\/en\/news\/nemo-agent-toolkit-multi-agent-supervisor-pattern-local\/2309\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/ai-box.eu\/en\/news\/nemo-agent-toolkit-multi-agent-supervisor-pattern-local\/2309\/","url":"https:\/\/ai-box.eu\/en\/news\/nemo-agent-toolkit-multi-agent-supervisor-pattern-local\/2309\/","name":"NeMo Agent Toolkit - Multi-Agent Supervisor-Pattern local - Exploring the Future: Inside the AI 
Box","isPartOf":{"@id":"https:\/\/ai-box.eu\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/ai-box.eu\/en\/news\/nemo-agent-toolkit-multi-agent-supervisor-pattern-local\/2309\/#primaryimage"},"image":{"@id":"https:\/\/ai-box.eu\/en\/news\/nemo-agent-toolkit-multi-agent-supervisor-pattern-local\/2309\/#primaryimage"},"thumbnailUrl":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_multi_agent.jpg","datePublished":"2026-05-17T07:40:21+00:00","dateModified":"2026-05-17T08:04:04+00:00","author":{"@id":"https:\/\/ai-box.eu\/en\/#\/schema\/person\/cc91d08618b3feeef6926591b465eab1"},"description":"Multi-Agent mit dem NeMo Agent Toolkit: drei spezialisierte ReAct-Agenten unter einem Supervisor hands-on mit Qwen 3.6 27B auf der RTX A6000 Ada.Build a multi-agent supervisor pattern with the NeMo Agent Toolkit: hands-on with Qwen 3.6 27B, plus the pitfalls we hit along the way.","breadcrumb":{"@id":"https:\/\/ai-box.eu\/en\/news\/nemo-agent-toolkit-multi-agent-supervisor-pattern-local\/2309\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/ai-box.eu\/en\/news\/nemo-agent-toolkit-multi-agent-supervisor-pattern-local\/2309\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ai-box.eu\/en\/news\/nemo-agent-toolkit-multi-agent-supervisor-pattern-local\/2309\/#primaryimage","url":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_multi_agent.jpg","contentUrl":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_multi_agent.jpg","width":2497,"height":1016,"caption":"NAT Multi Agent"},{"@type":"BreadcrumbList","@id":"https:\/\/ai-box.eu\/en\/news\/nemo-agent-toolkit-multi-agent-supervisor-pattern-local\/2309\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Start","item":"https:\/\/ai-box.eu\/en\/"},{"@type":"ListItem","position":2,"name":"NeMo Agent Toolkit &#8211; Multi-Agent Supervisor-Pattern 
local"}]},{"@type":"WebSite","@id":"https:\/\/ai-box.eu\/en\/#website","url":"https:\/\/ai-box.eu\/en\/","name":"Exploring the Future: Inside the AI Box","description":"Inside the AI Box, we share our experiences and discoveries in the world of artificial intelligence.","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ai-box.eu\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/ai-box.eu\/en\/#\/schema\/person\/cc91d08618b3feeef6926591b465eab1","name":"Maker","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g","caption":"Maker"},"description":"I live in Bavaria near Munich. In my head I always have many topics and try out especially in the field of Internet new media much in my spare time. I write on the blog because it makes me fun to report about the things that inspire me. 
I am happy about every comment, about suggestion and very about questions.","sameAs":["https:\/\/ai-box.eu"],"url":"https:\/\/ai-box.eu\/en\/author\/ingmars\/"}]}},"_links":{"self":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts\/2309","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/comments?post=2309"}],"version-history":[{"count":6,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts\/2309\/revisions"}],"predecessor-version":[{"id":2321,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts\/2309\/revisions\/2321"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/media\/2311"}],"wp:attachment":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/media?parent=2309"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/categories?post=2309"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/tags?post=2309"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}