{"id":2288,"date":"2026-05-17T05:05:20","date_gmt":"2026-05-17T05:05:20","guid":{"rendered":"https:\/\/ai-box.eu\/?p=2288"},"modified":"2026-06-05T04:37:42","modified_gmt":"2026-06-05T04:37:42","slug":"nemo-agent-toolkit-genai-agent-orchestration-run-locally","status":"publish","type":"post","link":"https:\/\/ai-box.eu\/en\/large-language-models-en\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\/2288\/","title":{"rendered":"NeMo Agent Toolkit &#8211; GenAI Agent Orchestration Run Locally"},"content":{"rendered":"<p>Agent orchestration is conceptually exactly the leap that turns &#8220;LLM inference&#8221; into actual &#8220;intelligent applications&#8221;. I already have a working NAT setup with Ollama in place, as described in my blog post here &#8220;<a href=\"https:\/\/ai-box.eu\/large-language-models\/nemo-agent-toolkit-ollama\/2282\/\" target=\"_blank\" rel=\"noopener\">NeMo Agent Toolkit on the RTX A6000 Ada \u2013 from the inference layer to the orchestrator layer<\/a>&#8220;. 
Now I want to describe the approach and the architecture step by step, from the basics all the way to complex multi-agent patterns.<\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_84 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle Table of Content\"><span class=\"ez-toc-js-icon-con\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/span><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/ai-box.eu\/en\/large-language-models-en\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\/2288\/#What_does_%E2%80%9Corchestration%E2%80%9D_actually_mean\" >What does &#8220;orchestration&#8221; actually mean?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-2\" 
href=\"https:\/\/ai-box.eu\/en\/large-language-models-en\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\/2288\/#The_ReAct_loop_in_detail\" >The ReAct loop in detail<\/a><ul class='ez-toc-list-level-4' ><li class='ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/ai-box.eu\/en\/large-language-models-en\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\/2288\/#Experiment_1_Tool_description_influences_tool_selection\" >Experiment 1: Tool description influences tool selection<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/ai-box.eu\/en\/large-language-models-en\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\/2288\/#Hands-on_Writing_your_own_Python_tool\" >Hands-on: Writing your own Python tool<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/ai-box.eu\/en\/large-language-models-en\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\/2288\/#Step_1_Writing_the_Python_tool\" >Step 1: Writing the Python tool<\/a><ul class='ez-toc-list-level-4' ><li class='ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/ai-box.eu\/en\/large-language-models-en\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\/2288\/#Step_2_Registering_the_tool_as_a_package\" >Step 2: Registering the tool as a package<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/ai-box.eu\/en\/large-language-models-en\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\/2288\/#Step_3_Installing_the_tool_in_the_active_venv\" >Step 3: Installing the tool in the active venv<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-8\" 
href=\"https:\/\/ai-box.eu\/en\/large-language-models-en\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\/2288\/#Step_4_Verifying_that_NAT_sees_the_tool\" >Step 4: Verifying that NAT sees the tool<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/ai-box.eu\/en\/large-language-models-en\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\/2288\/#Step_5_Building_a_workflow_with_the_new_tool\" >Step 5: Building a workflow with the new tool<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/ai-box.eu\/en\/large-language-models-en\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\/2288\/#Step_6_Running_the_first_hardware_agent\" >Step 6: Running the first hardware agent<\/a><\/li><\/ul><\/nav><\/div>\n<h3 class=\"text-text-100 mt-3 -mb-1 text-[1.125rem] font-bold\"><span class=\"ez-toc-section\" id=\"What_does_%E2%80%9Corchestration%E2%80%9D_actually_mean\"><\/span>What does &#8220;orchestration&#8221; actually mean?<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p class=\"font-claude-response-body break-words whitespace-normal leading-[1.7]\">In the context of the NeMo Agent Toolkit and related approaches, orchestration refers to the <strong>coordination of multiple components<\/strong> into meaningful overall work. These components can be:<\/p>\n<ul class=\"[li_&amp;]:mb-0 [li_&amp;]:mt-1 [li_&amp;]:gap-1 [&amp;:not(:last-child)_ul]:pb-1 [&amp;:not(:last-child)_ol]:pb-1 list-disc flex flex-col gap-1 pl-8 mb-3\">\n<li class=\"font-claude-response-body whitespace-normal break-words pl-2\"><strong>Tools<\/strong> (functions that the agent calls, e.g. 
<code class=\"bg-text-200\/5 border border-0.5 border-border-300 text-danger-000 whitespace-pre-wrap rounded-[0.4rem] px-1 py-px text-[0.9rem]\">wikipedia_search<\/code> or the <code>Date_Time<\/code> function)<\/li>\n<li class=\"font-claude-response-body whitespace-normal break-words pl-2\"><strong>LLMs<\/strong> (different models for different tasks, depending on their capabilities)<\/li>\n<li class=\"font-claude-response-body whitespace-normal break-words pl-2\"><strong>Agents<\/strong> (self-contained ReAct loops that can themselves serve as &#8220;tools&#8221; for higher-level agents)<\/li>\n<li class=\"font-claude-response-body whitespace-normal break-words pl-2\"><strong>Memory<\/strong> (short-term and long-term memory between calls)<\/li>\n<\/ul>\n<p class=\"font-claude-response-body break-words whitespace-normal leading-[1.7]\">Here I want to present the four fundamental orchestration patterns that you can easily build with a NAT setup and ReAct:<\/p>\n<p>Pattern 1: Single Agent with Tool Selection<\/p>\n<div id=\"attachment_2290\" style=\"width: 488px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Single_Agent_tool-1024x236.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2290\" class=\" wp-image-2290\" src=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Single_Agent_tool-1024x236.jpg\" alt=\"Single Agent with Tool Selection\" width=\"478\" height=\"110\" srcset=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Single_Agent_tool-1024x236.jpg 1024w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Single_Agent_tool-300x69.jpg 300w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Single_Agent_tool-768x177.jpg 768w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Single_Agent_tool-1536x354.jpg 1536w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Single_Agent_tool-1080x249.jpg 1080w, 
https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Single_Agent_tool.jpg 1959w\" sizes=\"(max-width: 478px) 100vw, 478px\" \/><\/a><p id=\"caption-attachment-2290\" class=\"wp-caption-text\">Single Agent with Tool Selection<\/p><\/div>\n<p>Pattern 2: Sequential Pipeline<\/p>\n<div id=\"attachment_2292\" style=\"width: 782px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Sequential_Pipeline-scaled.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2292\" class=\" wp-image-2292\" src=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Sequential_Pipeline-1024x138.jpg\" alt=\"Sequential Pipeline\" width=\"772\" height=\"104\" srcset=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Sequential_Pipeline-1024x138.jpg 1024w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Sequential_Pipeline-300x41.jpg 300w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Sequential_Pipeline-768x104.jpg 768w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Sequential_Pipeline-1536x207.jpg 1536w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Sequential_Pipeline-2048x277.jpg 2048w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Sequential_Pipeline-1080x146.jpg 1080w\" sizes=\"(max-width: 772px) 100vw, 772px\" \/><\/a><p id=\"caption-attachment-2292\" class=\"wp-caption-text\">Sequential Pipeline<\/p><\/div>\n<p>Pattern 3: Supervisor \/ Worker (hierarchical)<\/p>\n<div id=\"attachment_2294\" style=\"width: 608px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Supervisor_Worker-1024x493.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2294\" class=\" wp-image-2294\" src=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Supervisor_Worker-1024x493.jpg\" alt=\"Supervisor \/ Worker (hierarchical)\" width=\"598\" height=\"288\" 
srcset=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Supervisor_Worker-1024x493.jpg 1024w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Supervisor_Worker-300x144.jpg 300w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Supervisor_Worker-768x369.jpg 768w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Supervisor_Worker-1536x739.jpg 1536w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Supervisor_Worker-2048x985.jpg 2048w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Supervisor_Worker-1080x520.jpg 1080w\" sizes=\"(max-width: 598px) 100vw, 598px\" \/><\/a><p id=\"caption-attachment-2294\" class=\"wp-caption-text\">Supervisor \/ Worker (hierarchical)<\/p><\/div>\n<p>Pattern 4: Parallel with Aggregation<\/p>\n<div id=\"attachment_2296\" style=\"width: 691px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Parallel_Aggregation-scaled.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2296\" class=\" wp-image-2296\" src=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Parallel_Aggregation-1024x269.jpg\" alt=\"Parallel with Aggregation\" width=\"681\" height=\"179\" srcset=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Parallel_Aggregation-1024x269.jpg 1024w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Parallel_Aggregation-300x79.jpg 300w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Parallel_Aggregation-768x202.jpg 768w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Parallel_Aggregation-1536x403.jpg 1536w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Parallel_Aggregation-2048x538.jpg 2048w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Parallel_Aggregation-1080x284.jpg 1080w\" sizes=\"(max-width: 681px) 100vw, 681px\" \/><\/a><p id=\"caption-attachment-2296\" class=\"wp-caption-text\">Parallel with Aggregation<\/p><\/div>\n<p>The most important conceptual 
detail in NAT: <strong>Everything is a function.<\/strong> A tool is a function. An agent is a function. An entire workflow is a function. This makes NAT enormously composable. We already know this concept from a number of other agentic tools. With this architecture a workflow can use another workflow as a tool, and architecturally nothing changes. That is extremely flexible and powerful.<\/p>\n<h3 class=\"text-text-100 mt-3 -mb-1 text-[1.125rem] font-bold\"><span class=\"ez-toc-section\" id=\"The_ReAct_loop_in_detail\"><\/span>The ReAct loop in detail<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p class=\"font-claude-response-body break-words whitespace-normal leading-[1.7]\">Before we build multi-agent setups, we need to understand what happens within a single loop. When you call <code class=\"bg-text-200\/5 border border-0.5 border-border-300 text-danger-000 whitespace-pre-wrap rounded-[0.4rem] px-1 py-px text-[0.9rem]\">nat run --config_file ollama_agent.yml --input \"...\"<\/code>, the following happens:<\/p>\n<ul>\n<li>Prompt composition<br \/>\nThe placeholders <code>{tools}<\/code> and <code>{tool_names}<\/code> in the <code>system_prompt<\/code> are filled in with the available tools.<br \/>\nThe prompt is then sent to the LLM: &#8220;Here are your tools, here is the question.&#8221;<\/li>\n<li>Iteration 1<br \/>\n1. The LLM generates: &#8220;Thought: &#8230; Action: <code>tool_X<\/code> Action Input: {&#8230;}&#8221;<br \/>\n2. NAT parses the format and extracts <code>tool_X<\/code> and its inputs<br \/>\n3. NAT calls <code>tool_X(...)<\/code>: either a Python function call, an HTTP request, or a database query<br \/>\n4. The result is appended to the context as &#8220;Observation: &#8230;&#8221;<\/li>\n<li>Iteration 2 (if needed)<br \/>\n1. The LLM receives the extended context (Thought+Action+Observation)<br \/>\n2. It decides: another tool or the final answer?<br \/>\n3. If a tool: continue as above. 
If done: &#8220;Final Answer: &#8230;&#8221;<\/li>\n<li>Output<br \/>\nNAT extracts the text after &#8220;Final Answer:&#8221; and returns it.<\/li>\n<\/ul>\n<p>The critical point: the LLM makes its decision <strong>solely on the basis of the tool names and descriptions<\/strong> it sees in the prompt. If <code class=\"bg-text-200\/5 border border-0.5 border-border-300 text-danger-000 whitespace-pre-wrap rounded-[0.4rem] px-1 py-px text-[0.9rem]\">wikipedia_search<\/code> is described as &#8220;Search Wikipedia for facts&#8221; and <code class=\"bg-text-200\/5 border border-0.5 border-border-300 text-danger-000 whitespace-pre-wrap rounded-[0.4rem] px-1 py-px text-[0.9rem]\">current_datetime<\/code> as &#8220;Returns the current date and time&#8221;, then the LLM infers from these descriptions when to use which tool. That is why the tools must be described clearly and precisely. Avoid registering tools whose descriptions are hard to tell apart, especially when those tools return fundamentally different results.<\/p>\n<h4 class=\"text-text-100 mt-2 -mb-1 text-base font-bold\"><span class=\"ez-toc-section\" id=\"Experiment_1_Tool_description_influences_tool_selection\"><\/span>Experiment 1: Tool description influences tool selection<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p class=\"font-claude-response-body break-words whitespace-normal leading-[1.7]\">I&#8217;ll assume that you have a working NeMo Agent Toolkit setup available, so let&#8217;s take a close look at the process in practice.<\/p>\n<p>Make sure you are in the active virtual environment of your NAT setup. 
Now run the following two commands:<\/p>\n<p><span class=\"token token\"><strong>Command:<\/strong> <code>cd<\/code><\/span><code> ~\/nat-playground\/configs<\/code><\/p>\n<p>Next, create the <code>experiment1_tool_descriptions.yml<\/code> workflow file.<\/p>\n<p><span class=\"token token\"><strong>Command:<\/strong> <code>nano<\/code><\/span><code> experiment1_tool_descriptions.yml<\/code><\/p>\n<p>Since the workflow definition would be too much code to reproduce here, it is available in the GitHub repository that goes with this project.<\/p>\n<p>GitHub repository: <a href=\"https:\/\/github.com\/custom-build-robots\/nemo-agent-toolkit-examples\/blob\/main\/configs\/experiment1_tool_descriptions.yml\" target=\"_blank\" rel=\"noopener\">https:\/\/github.com\/custom-build-robots\/nemo-agent-toolkit-examples\/blob\/main\/configs\/experiment1_tool_descriptions.yml<\/a><\/p>\n<p>Copy the content into the workflow file and save it with Ctrl + X, then Y and Enter. Make sure Ollama is running as the inference server, then run the workflow three times as follows.<\/p>\n<ul>\n<li>Question 1: Time-related \u2192 should choose <code>current_datetime<\/code>\n<ul>\n<li><span class=\"token token\"><strong>Command: <\/strong><\/span><code>nat run --config_file experiment1_tool_descriptions.yml <\/code><code>--input \"What time is it?\"<\/code><\/li>\n<\/ul>\n<\/li>\n<li>Question 2: Knowledge question \u2192 should choose <code>wikipedia_search<\/code>\n<ul>\n<li><span class=\"token token\"><strong>Command: <\/strong><\/span><code>nat run --config_file experiment1_tool_descriptions.yml <\/code><code>--input \"What was the Battle of the Teutoburg Forest?\"<\/code><\/li>\n<\/ul>\n<\/li>\n<li>Question 3: Combined \u2192 should call BOTH tools one after the other\n<ul>\n<li><span class=\"token token\"><strong>Command: <\/strong><\/span><code>nat run --config_file experiment1_tool_descriptions.yml<\/code><code> --input \"What day is it today and which historical event happened on 13 March 
1986?\"<\/code><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p class=\"font-claude-response-body break-words whitespace-normal leading-[1.7]\">Take a look at the traces. For question 3 you will probably observe that the agent does not use the date tool for &#8220;13 March 1986&#8221; (the date is, after all, given in the question), but only for &#8220;today&#8221;. This is exactly the point: the model understands from the <strong>tool name and description<\/strong> what is useful when.<\/p>\n<p class=\"font-claude-response-body break-words whitespace-normal leading-[1.7]\"><strong>Takeaway:<\/strong> Tool descriptions are your most important lever. When your agent chooses the wrong tool, it usually isn&#8217;t that the model is &#8220;dumb&#8221; but that the description is unclear.<\/p>\n<h3 class=\"text-text-100 mt-3 -mb-1 text-[1.125rem] font-bold\"><span class=\"ez-toc-section\" id=\"Hands-on_Writing_your_own_Python_tool\"><\/span>Hands-on: Writing your own Python tool<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p class=\"font-claude-response-body break-words whitespace-normal leading-[1.7]\">Now we finally get to the exciting part. Let&#8217;s build a <strong>GPU status tool<\/strong> that queries <code class=\"bg-text-200\/5 border border-0.5 border-border-300 text-danger-000 whitespace-pre-wrap rounded-[0.4rem] px-1 py-px text-[0.9rem]\">nvidia-smi<\/code> on your server and returns the values. This lets you ask your agent: &#8220;How busy is my GPU right now?&#8221; and it gets a concrete answer from nvidia-smi about your hardware.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Step_1_Writing_the_Python_tool\"><\/span>Step 1: Writing the Python tool<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Now we are going to leave the config folder of our nat-playground and move into the tools folder. There we place our new tool that we want to build. 
To do so, please run the following commands.<\/p>\n<p><span class=\"token token\"><strong>Command:<\/strong> <code>cd<\/code><\/span><code> ~\/nat-playground\/tools<\/code><\/p>\n<p><span class=\"token token\"><strong>Command:<\/strong> <code>mkdir<\/code><\/span><code> -p gpu_status<\/code><\/p>\n<p><span class=\"token token\"><strong>Command:<\/strong> <code>cd<\/code><\/span><code> gpu_status<\/code><\/p>\n<p><span class=\"token token\"><strong>Command:<\/strong> <code>nano<\/code><\/span><code> gpu_status_tool.py<\/code><\/p>\n<p>The tool definition is a Python program named <code>gpu_status_tool.py<\/code>, which I provide on GitHub.<\/p>\n<p>GitHub repository: <a href=\"https:\/\/github.com\/custom-build-robots\/gpu_status\/gpu_status_tool.py\" target=\"_blank\" rel=\"noopener\">https:\/\/github.com\/custom-build-robots\/gpu_status\/gpu_status_tool.py<\/a><\/p>\n<p>Now paste the Python code into the file so that the tool ends up in the folder <code>~\/nat-playground\/tools\/gpu_status<\/code>. Save with Ctrl + X, then Y and Enter.<\/p>\n<h4 class=\"text-text-100 mt-2 -mb-1 text-base font-bold\"><span class=\"ez-toc-section\" id=\"Step_2_Registering_the_tool_as_a_package\"><\/span>Step 2: Registering the tool as a package<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p class=\"font-claude-response-body break-words whitespace-normal leading-[1.7]\">NAT discovers custom tools via Python entry points, so the tool needs a small <code class=\"bg-text-200\/5 border border-0.5 border-border-300 text-danger-000 whitespace-pre-wrap rounded-[0.4rem] px-1 py-px text-[0.9rem]\">pyproject.toml<\/code>:<\/p>\n<p><span class=\"token token\"><strong>Command:<\/strong> <code>cd ~\/nat-playground\/tools\/gpu_status<\/code><\/span><\/p>\n<p><span class=\"token token\"><strong>Command:<\/strong> <code>nano pyproject.toml<\/code><\/span><\/p>\n<p>Again, the same procedure. 
The content of the <span class=\"token token\"><code>pyproject.toml<\/code><\/span> is available here on GitHub.<\/p>\n<p>GitHub repository: <a href=\"https:\/\/github.com\/custom-build-robots\/nemo-agent-toolkit-examples\/blob\/main\/tools\/gpu_status\/pyproject.toml\" target=\"_blank\" rel=\"noopener\">https:\/\/github.com\/custom-build-robots\/nemo-agent-toolkit-examples\/blob\/main\/tools\/gpu_status\/pyproject.toml<\/a><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Step_3_Installing_the_tool_in_the_active_venv\"><\/span>Step 3: Installing the tool in the active venv<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Now please switch to the nat-playground folder to install the tool.<\/p>\n<p><span class=\"token token\"><strong>Command:<\/strong> <code>cd ~\/nat-playground<\/code><\/span><\/p>\n<p>If the virtual environment is not active, please activate it.<\/p>\n<p><span class=\"token token\"><strong>Command:<\/strong> <\/span><code>source .venv\/bin\/activate<\/code><\/p>\n<p>The following command installs the gpu_status tool.<\/p>\n<p><span class=\"token token\"><strong>Command:<\/strong> <\/span><code>uv pip install -e tools\/gpu_status<\/code><\/p>\n<p>The <code class=\"bg-text-200\/5 border border-0.5 border-border-300 text-danger-000 whitespace-pre-wrap rounded-[0.4rem] px-1 py-px text-[0.9rem]\">-e<\/code> flag means &#8220;editable install&#8221;, which is super handy: when you change something in the Python code, the change takes effect without reinstalling the tool. Only changes to the entry points in <code>pyproject.toml<\/code> require a reinstall.<\/p>\n<p>For me, the output in the terminal window then looked like this.<\/p>\n<div id=\"attachment_2299\" style=\"width: 759px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_GPU_tool.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2299\" class=\"size-full wp-image-2299\" src=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_GPU_tool.jpg\" alt=\"NAT GPU NVIDIA-SMI - tool\" width=\"749\" height=\"210\" 
srcset=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_GPU_tool.jpg 749w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_GPU_tool-300x84.jpg 300w\" sizes=\"(max-width: 749px) 100vw, 749px\" \/><\/a><p id=\"caption-attachment-2299\" class=\"wp-caption-text\">NAT GPU NVIDIA-SMI &#8211; tool<\/p><\/div>\n<h3><span class=\"ez-toc-section\" id=\"Step_4_Verifying_that_NAT_sees_the_tool\"><\/span>Step 4: Verifying that NAT sees the tool<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Now comes the exciting part: does our NAT setup know about the new tool? To find out, run the following command.<\/p>\n<p><strong>Command:<\/strong> <code>nat info components -t <span class=\"token token\">function<\/span> <span class=\"token token\">|<\/span> <span class=\"token token\">grep<\/span> -i gpu<\/code><\/p>\n<p>For me the output looked like the image shown below.<\/p>\n<div id=\"attachment_2301\" style=\"width: 1034px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_GPU_tool_installed-1024x158.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2301\" class=\"size-large wp-image-2301\" src=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_GPU_tool_installed-1024x158.jpg\" alt=\"NAT GPU NVIDIA-SMI - tool installed\" width=\"1024\" height=\"158\" srcset=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_GPU_tool_installed-1024x158.jpg 1024w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_GPU_tool_installed-300x46.jpg 300w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_GPU_tool_installed-768x118.jpg 768w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_GPU_tool_installed.jpg 1053w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/a><p id=\"caption-attachment-2301\" class=\"wp-caption-text\">NAT GPU NVIDIA-SMI &#8211; tool installed<\/p><\/div>\n<h3><span class=\"ez-toc-section\" 
id=\"Step_5_Building_a_workflow_with_the_new_tool\"><\/span>Step 5: Building a workflow with the new tool<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>To build the new workflow, we go back into the configs folder.<\/p>\n<p><strong>Command:<\/strong> <code><span class=\"token token\">cd<\/span> ~\/nat-playground\/configs<\/code><\/p>\n<p>We create the new workflow with the following command.<\/p>\n<p><strong>Command:<\/strong> <code><span class=\"token token\">nano<\/span> experiment2_gpu_agent.yml<\/code><\/p>\n<p>The workflow itself is again available in my GitHub repository. Paste the content into <code>experiment2_gpu_agent.yml<\/code> and then save the file.<\/p>\n<p>GitHub repository: <a href=\"https:\/\/github.com\/custom-build-robots\/nemo-agent-toolkit-examples\/blob\/main\/configs\/experiment2_gpu_agent.yml\" target=\"_blank\" rel=\"noopener\">https:\/\/github.com\/custom-build-robots\/nemo-agent-toolkit-examples\/blob\/main\/configs\/experiment2_gpu_agent.yml<\/a><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Step_6_Running_the_first_hardware_agent\"><\/span>Step 6: Running the first hardware agent<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Now we run the workflow <code>experiment2_gpu_agent.yml<\/code>, which calls our <code>gpu_status<\/code> tool and will hopefully return the GPU utilization.<\/p>\n<p><strong>Command:<\/strong> <code>nat run --config_file experiment2_gpu_agent.yml --input \"How busy is my GPU right now and is inference work running?\"<\/code><\/p>\n<p>The answer I received was: &#8220;Yes, the GPU is currently heavily loaded with a utilization of 92%. Inference work is being carried out. 
Memory usage is only 15.1%, the temperature sensor reads 38\u00b0C and power consumption is 248.77W.&#8221;<\/p>\n<p>And here is the matching screenshot:<\/p>\n<div id=\"attachment_2304\" style=\"width: 1034px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_GPU_tool_result-1024x557.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2304\" class=\"size-large wp-image-2304\" src=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_GPU_tool_result-1024x557.jpg\" alt=\"NAT GPU NVIDIA-SMI - tool result\" width=\"1024\" height=\"557\" srcset=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_GPU_tool_result-1024x557.jpg 1024w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_GPU_tool_result-300x163.jpg 300w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_GPU_tool_result-768x418.jpg 768w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_GPU_tool_result-1536x835.jpg 1536w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_GPU_tool_result-2048x1114.jpg 2048w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_GPU_tool_result-1080x587.jpg 1080w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/a><p id=\"caption-attachment-2304\" class=\"wp-caption-text\">NAT GPU NVIDIA-SMI &#8211; tool result<\/p><\/div>\n<p>Congratulations, you have now built your first custom tool, one that reads system information directly from your inference machine.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Agent orchestration is conceptually exactly the leap that turns &#8220;LLM inference&#8221; into actual &#8220;intelligent applications&#8221;. I already have a working NAT setup with Ollama in place, as described in my blog post here &#8220;NeMo Agent Toolkit on the RTX A6000 Ada \u2013 from the inference layer to the orchestrator layer&#8220;. 
Now I want to describe [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":2294,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_et_pb_use_builder":"","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":""},"categories":[162,50],"tags":[1242,1221,1244,1505,1492,1249,1503,1354,1498,1493,1245,1495,1224,1491,1220,1250,306,1499,1502,1504,1243,1222,1494,1176,1248,1500,1032,1501,1247,1497,1496,1246,1251],"class_list":["post-2288","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-large-language-models-en","category-top-story-en","tag-agent-orchestrierung","tag-agentic-ai","tag-custom-python-tool","tag-entry-points","tag-genai-agent-orchestration","tag-gpu-status","tag-gpu-status-tool","tag-inference-server","tag-llm-orchestration","tag-local-ai-agents","tag-multi-agent","tag-multi-agent-patterns","tag-nat","tag-nat-orchestration","tag-nemo-agent-toolkit","tag-nvidia-smi","tag-ollama-en","tag-on-premise-inference","tag-parallel-aggregation","tag-pyproject-toml","tag-react","tag-react-agent","tag-react-loop","tag-rtx-a6000-ada","tag-sequential-pipeline","tag-single-agent","tag-sovereign-ai","tag-supervisor-worker","tag-supervisor-pattern","tag-tool-description","tag-tool-selection","tag-tool-beschreibung","tag-workflow-yaml","et-has-post-format-content","et_post_format-et-post-format-standard"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.7 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>NeMo Agent Toolkit - GenAI Agent Orchestration Run Locally - Exploring the Future: Inside the AI Box<\/title>\n<meta name=\"description\" content=\"Local GenAI agent orchestration with the NeMo Agent Toolkit and Ollama: understand the ReAct loop, learn four multi-agent patterns and build a GPU status tool.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" 
\/>\n<link rel=\"canonical\" href=\"https:\/\/ai-box.eu\/en\/large-language-models-en\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\/2288\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"NeMo Agent Toolkit - GenAI Agent Orchestration Run Locally - Exploring the Future: Inside the AI Box\" \/>\n<meta property=\"og:description\" content=\"Local GenAI agent orchestration with the NeMo Agent Toolkit and Ollama: understand the ReAct loop, learn four multi-agent patterns and build a GPU status tool.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/ai-box.eu\/en\/large-language-models-en\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\/2288\/\" \/>\n<meta property=\"og:site_name\" content=\"Exploring the Future: Inside the AI Box\" \/>\n<meta property=\"article:published_time\" content=\"2026-05-17T05:05:20+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-06-05T04:37:42+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Supervisor_Worker.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"2484\" \/>\n\t<meta property=\"og:image:height\" content=\"1195\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Maker\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Ingmar_Stapel\" \/>\n<meta name=\"twitter:site\" content=\"@Ingmar_Stapel\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Maker\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/large-language-models-en\\\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\\\/2288\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/large-language-models-en\\\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\\\/2288\\\/\"},\"author\":{\"name\":\"Maker\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#\\\/schema\\\/person\\\/cc91d08618b3feeef6926591b465eab1\"},\"headline\":\"NeMo Agent Toolkit &#8211; GenAI Agent Orchestration Run Locally\",\"datePublished\":\"2026-05-17T05:05:20+00:00\",\"dateModified\":\"2026-06-05T04:37:42+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/large-language-models-en\\\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\\\/2288\\\/\"},\"wordCount\":1276,\"commentCount\":0,\"image\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/large-language-models-en\\\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\\\/2288\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/05\\\/NAT_Supervisor_Worker.jpg\",\"keywords\":[\"Agent-Orchestrierung\",\"Agentic AI\",\"Custom Python Tool\",\"Entry-Points\",\"GenAI agent orchestration\",\"GPU-Status\",\"GPU-Status-Tool\",\"inference server\",\"LLM orchestration\",\"local AI agents\",\"Multi-Agent\",\"multi-agent patterns\",\"NAT\",\"NAT orchestration\",\"NeMo Agent Toolkit\",\"nvidia-smi\",\"Ollama\",\"on-premise inference\",\"Parallel Aggregation\",\"pyproject.toml\",\"ReAct\",\"ReAct Agent\",\"ReAct loop\",\"RTX A6000 Ada\",\"Sequential Pipeline\",\"Single Agent\",\"sovereign AI\",\"Supervisor Worker\",\"Supervisor-Pattern\",\"tool description\",\"tool selection\",\"Tool-Beschreibung\",\"Workflow 
YAML\"],\"articleSection\":[\"Large Language Models\",\"Top story\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/ai-box.eu\\\/en\\\/large-language-models-en\\\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\\\/2288\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/large-language-models-en\\\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\\\/2288\\\/\",\"url\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/large-language-models-en\\\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\\\/2288\\\/\",\"name\":\"NeMo Agent Toolkit - GenAI Agent Orchestration Run Locally - Exploring the Future: Inside the AI Box\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/large-language-models-en\\\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\\\/2288\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/large-language-models-en\\\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\\\/2288\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/05\\\/NAT_Supervisor_Worker.jpg\",\"datePublished\":\"2026-05-17T05:05:20+00:00\",\"dateModified\":\"2026-06-05T04:37:42+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#\\\/schema\\\/person\\\/cc91d08618b3feeef6926591b465eab1\"},\"description\":\"Local GenAI agent orchestration with the NeMo Agent Toolkit and Ollama: understand the ReAct loop, learn four multi-agent patterns and build a GPU status 
tool.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/large-language-models-en\\\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\\\/2288\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/ai-box.eu\\\/en\\\/large-language-models-en\\\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\\\/2288\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/large-language-models-en\\\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\\\/2288\\\/#primaryimage\",\"url\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/05\\\/NAT_Supervisor_Worker.jpg\",\"contentUrl\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/05\\\/NAT_Supervisor_Worker.jpg\",\"width\":2484,\"height\":1195,\"caption\":\"Supervisor \\\/ Worker (hierarchisch)\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/large-language-models-en\\\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\\\/2288\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Start\",\"item\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"NeMo Agent Toolkit &#8211; GenAI Agent Orchestration Run Locally\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#website\",\"url\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/\",\"name\":\"Exploring the Future: Inside the AI Box\",\"description\":\"Inside the AI Box, we share our experiences and discoveries in the world of artificial 
intelligence.\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#\\\/schema\\\/person\\\/cc91d08618b3feeef6926591b465eab1\",\"name\":\"Maker\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g\",\"caption\":\"Maker\"},\"description\":\"I live in Bavaria near Munich. In my head I always have many topics and try out especially in the field of Internet new media much in my spare time. I write on the blog because it makes me fun to report about the things that inspire me. I am happy about every comment, about suggestion and very about questions.\",\"sameAs\":[\"https:\\\/\\\/ai-box.eu\"],\"url\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/author\\\/ingmars\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"NeMo Agent Toolkit - GenAI Agent Orchestration Run Locally - Exploring the Future: Inside the AI Box","description":"Local GenAI agent orchestration with the NeMo Agent Toolkit and Ollama: understand the ReAct loop, learn four multi-agent patterns and build a GPU status tool.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/ai-box.eu\/en\/large-language-models-en\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\/2288\/","og_locale":"en_US","og_type":"article","og_title":"NeMo Agent Toolkit - GenAI Agent Orchestration Run Locally - Exploring the Future: Inside the AI Box","og_description":"Local GenAI agent orchestration with the NeMo Agent Toolkit and Ollama: understand the ReAct loop, learn four multi-agent patterns and build a GPU status tool.","og_url":"https:\/\/ai-box.eu\/en\/large-language-models-en\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\/2288\/","og_site_name":"Exploring the Future: Inside the AI Box","article_published_time":"2026-05-17T05:05:20+00:00","article_modified_time":"2026-06-05T04:37:42+00:00","og_image":[{"width":2484,"height":1195,"url":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Supervisor_Worker.jpg","type":"image\/jpeg"}],"author":"Maker","twitter_card":"summary_large_image","twitter_creator":"@Ingmar_Stapel","twitter_site":"@Ingmar_Stapel","twitter_misc":{"Written by":"Maker","Est. 
reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/ai-box.eu\/en\/large-language-models-en\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\/2288\/#article","isPartOf":{"@id":"https:\/\/ai-box.eu\/en\/large-language-models-en\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\/2288\/"},"author":{"name":"Maker","@id":"https:\/\/ai-box.eu\/en\/#\/schema\/person\/cc91d08618b3feeef6926591b465eab1"},"headline":"NeMo Agent Toolkit &#8211; GenAI Agent Orchestration Run Locally","datePublished":"2026-05-17T05:05:20+00:00","dateModified":"2026-06-05T04:37:42+00:00","mainEntityOfPage":{"@id":"https:\/\/ai-box.eu\/en\/large-language-models-en\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\/2288\/"},"wordCount":1276,"commentCount":0,"image":{"@id":"https:\/\/ai-box.eu\/en\/large-language-models-en\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\/2288\/#primaryimage"},"thumbnailUrl":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Supervisor_Worker.jpg","keywords":["Agent-Orchestrierung","Agentic AI","Custom Python Tool","Entry-Points","GenAI agent orchestration","GPU-Status","GPU-Status-Tool","inference server","LLM orchestration","local AI agents","Multi-Agent","multi-agent patterns","NAT","NAT orchestration","NeMo Agent Toolkit","nvidia-smi","Ollama","on-premise inference","Parallel Aggregation","pyproject.toml","ReAct","ReAct Agent","ReAct loop","RTX A6000 Ada","Sequential Pipeline","Single Agent","sovereign AI","Supervisor Worker","Supervisor-Pattern","tool description","tool selection","Tool-Beschreibung","Workflow YAML"],"articleSection":["Large Language Models","Top 
story"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/ai-box.eu\/en\/large-language-models-en\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\/2288\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/ai-box.eu\/en\/large-language-models-en\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\/2288\/","url":"https:\/\/ai-box.eu\/en\/large-language-models-en\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\/2288\/","name":"NeMo Agent Toolkit - GenAI Agent Orchestration Run Locally - Exploring the Future: Inside the AI Box","isPartOf":{"@id":"https:\/\/ai-box.eu\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/ai-box.eu\/en\/large-language-models-en\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\/2288\/#primaryimage"},"image":{"@id":"https:\/\/ai-box.eu\/en\/large-language-models-en\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\/2288\/#primaryimage"},"thumbnailUrl":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Supervisor_Worker.jpg","datePublished":"2026-05-17T05:05:20+00:00","dateModified":"2026-06-05T04:37:42+00:00","author":{"@id":"https:\/\/ai-box.eu\/en\/#\/schema\/person\/cc91d08618b3feeef6926591b465eab1"},"description":"Local GenAI agent orchestration with the NeMo Agent Toolkit and Ollama: understand the ReAct loop, learn four multi-agent patterns and build a GPU status 
tool.","breadcrumb":{"@id":"https:\/\/ai-box.eu\/en\/large-language-models-en\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\/2288\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/ai-box.eu\/en\/large-language-models-en\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\/2288\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ai-box.eu\/en\/large-language-models-en\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\/2288\/#primaryimage","url":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Supervisor_Worker.jpg","contentUrl":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/NAT_Supervisor_Worker.jpg","width":2484,"height":1195,"caption":"Supervisor \/ Worker (hierarchical)"},{"@type":"BreadcrumbList","@id":"https:\/\/ai-box.eu\/en\/large-language-models-en\/nemo-agent-toolkit-genai-agent-orchestration-run-locally\/2288\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Start","item":"https:\/\/ai-box.eu\/en\/"},{"@type":"ListItem","position":2,"name":"NeMo Agent Toolkit &#8211; GenAI Agent Orchestration Run Locally"}]},{"@type":"WebSite","@id":"https:\/\/ai-box.eu\/en\/#website","url":"https:\/\/ai-box.eu\/en\/","name":"Exploring the Future: Inside the AI Box","description":"Inside the AI Box, we share our experiences and discoveries in the world of artificial 
intelligence.","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ai-box.eu\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/ai-box.eu\/en\/#\/schema\/person\/cc91d08618b3feeef6926591b465eab1","name":"Maker","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g","caption":"Maker"},"description":"I live in Bavaria near Munich. I always have many topics in my head and try out a lot in my spare time, especially in the field of Internet and new media. I write on the blog because I enjoy reporting on the things that inspire me. I am happy about every comment, every suggestion, and especially about questions.","sameAs":["https:\/\/ai-box.eu"],"url":"https:\/\/ai-box.eu\/en\/author\/ingmars\/"}]}},"_links":{"self":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts\/2288","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/comments?post=2288"}],"version-history":[{"count":8,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts\/2288\/revisions"}],"predecessor-version":[{"id":2474,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts\/2288\/revisions\/2474"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/media\/2294"}],"wp:attachment":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/media?parent=2288"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/categories?post=2288"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/tags?post=2288"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}