{"id":2354,"date":"2026-05-23T10:08:42","date_gmt":"2026-05-23T10:08:42","guid":{"rendered":"https:\/\/ai-box.eu\/?p=2354"},"modified":"2026-05-24T03:51:00","modified_gmt":"2026-05-24T03:51:00","slug":"an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing","status":"publish","type":"post","link":"https:\/\/ai-box.eu\/en\/news\/an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing\/2354\/","title":{"rendered":"An MCP Server for Multi-GPU Monitoring \u2013 Step by Step with Python, pynvml and EMA Smoothing"},"content":{"rendered":"<p>In my last posts I built the <strong>inference layer<\/strong> (Ollama, TensorRT-LLM) and the <strong>orchestrator layer<\/strong> (NeMo Agent Toolkit) on my application server in combination with my AI server. What&#8217;s been missing is a meaningful bridge between these server layers and the world of embedded devices in my workshop. That was the moment I took a closer look at the <strong>Model Context Protocol (MCP)<\/strong>. My idea is that my first own MCP server should be something I can actually use: <strong>GPU monitoring<\/strong> for the two RTX A6000 cards in my inference server.<\/p>\n<p>Concretely: I want to know from completely different devices, for example my ESP-Claw with LED ring, how busy my GPUs are right now. No Grafana, no cloud service, but a lightweight, locally running MCP-standard server that outputs exactly the metrics I need. And in the end my ESP-Claw-based setup should be able to query it via MCP client. I&#8217;m curious whether it&#8217;ll work and what I&#8217;ll learn along the way.<\/p>\n<p>In this post I&#8217;ll show you how I built the MCP server. 
It&#8217;s written in <strong>Python<\/strong>, I&#8217;ll use <strong>pynvml<\/strong> for the GPU utilization and <strong>FastMCP<\/strong>, including <strong>multi-GPU support<\/strong>, <strong>EMA smoothing<\/strong> for a calmer display, and an optional setup as a <strong>systemd service<\/strong> so the MCP server always runs in the background.<\/p>\n<p>Once the small programs are finished I&#8217;ll link them inline to my GitHub repository.<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Whats_this_actually_about\"><\/span>What&#8217;s this actually about?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Before we get started, a quick look at the architecture so it&#8217;s clear what I want to achieve:<\/p>\n<div id=\"attachment_2338\" style=\"width: 670px\" class=\"wp-caption alignnone\"><a 
href=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/MCP_Server_GPU_load-1024x628.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2338\" class=\"wp-image-2338 \" src=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/MCP_Server_GPU_load-1024x628.jpg\" alt=\"MCP Server - GPU load\" width=\"660\" height=\"405\" srcset=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/MCP_Server_GPU_load-1024x628.jpg 1024w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/MCP_Server_GPU_load-300x184.jpg 300w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/MCP_Server_GPU_load-768x471.jpg 768w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/MCP_Server_GPU_load-1536x942.jpg 1536w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/MCP_Server_GPU_load-2048x1256.jpg 2048w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/MCP_Server_GPU_load-1080x662.jpg 1080w\" sizes=\"(max-width: 660px) 100vw, 660px\" \/><\/a><p id=\"caption-attachment-2338\" class=\"wp-caption-text\">MCP Server &#8211; GPU load<\/p><\/div>\n<p>An <strong>MCP server<\/strong> exposes tools that an <strong>MCP client<\/strong> can call over a standard protocol. The nice thing: as long as the client speaks MCP \u2013 and many do by now \u2013 it doesn&#8217;t matter what&#8217;s on the other end. My server will therefore be useful not just for the ESP-Claw, but also for my <strong>Hermes Agent<\/strong>, the <strong>NeMo Agent Toolkit<\/strong>, any <strong>LangChain agent<\/strong> with MCP support, or my own Python scripts. All of these clients can check what kind of load is currently on the inference server.<\/p>\n<p><strong>pynvml<\/strong> is the Python binding to the <strong>NVIDIA Management Library (NVML)<\/strong> \u2013 exactly the library we need to read out the data. That way we get structured values directly from the inference server.<\/p>\n<p><strong>FastMCP<\/strong> is a lightweight Python library that implements the MCP protocol for us. 
NVIDIA itself uses <a href=\"https:\/\/github.com\/NVIDIA\/NeMo-Agent-Toolkit\/blob\/develop\/docs\/source\/run-workflows\/fastmcp-server.md\" target=\"_blank\" rel=\"noopener\">FastMCP in the NeMo Agent Toolkit<\/a> to publish workflows as MCP servers \u2013 we&#8217;ll use it here in the standalone variant.<\/p>\n<p><strong>EMA (Exponential Moving Average)<\/strong> is needed because raw GPU utilization is jumpy. During an inference request the value jumps from 0 to 100% and back within milliseconds. Anyone who sends that unfiltered to an LED ring will see nervous flickering instead of a pleasant display. EMA smooths that out.<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Prerequisites\"><\/span>Prerequisites<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Before we get started, a couple of things that need to be in order:<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Ubuntu 24.04 LTS<\/strong> (or a comparable Linux)<\/li>\n<li><strong>At least one NVIDIA GPU<\/strong> with current drivers (my server has two RTX A6000 cards, but any NVIDIA card supported by the driver works)<\/li>\n<li><strong>Python 3.11, 3.12 or 3.13<\/strong> on the host \u2013 Ubuntu 24.04 ships with 3.12<\/li>\n<li><strong><code>uv<\/code> as the package manager<\/strong> \u2013 if you remember my NAT post, we installed it there. If not: <code>curl -LsSf https:\/\/astral.sh\/uv\/install.sh | sh<\/code><\/li>\n<li><strong>A network setup<\/strong> in which your MCP client can reach the server (LAN is enough, no internet required)<\/li>\n<\/ul>\n<p>If you still need to prepare your server for AI inference in general, take a look at my foundation post first.<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Step_1_Install_pynvml_and_test_GPU_detection\"><\/span>Step 1: Install pynvml and test GPU detection<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>We start with the absolute basics: can Python even talk to the NVIDIA driver? 
For that we create a project directory and a dedicated Python environment, then install the NVML bindings.<\/p>\n<p><strong>Command:<\/strong> <code>mkdir -p ~\/gpu-monitor-mcp &amp;&amp; cd ~\/gpu-monitor-mcp<\/code><\/p>\n<p><strong>Command:<\/strong> <code>uv venv --python 3.12 --seed .venv<\/code><\/p>\n<p><strong>Command:<\/strong> <code>source .venv\/bin\/activate<\/code><\/p>\n<p>There are two packages on PyPI with almost identical names and a confusing history. <strong><code>nvidia-ml-py<\/code><\/strong> is the official package maintained by NVIDIA. That&#8217;s exactly what we want to install now. There&#8217;s also another package called <strong><code>pynvml<\/code><\/strong>, which started out as an independent third-party library, is now maintained by the NVIDIA RAPIDS team, and has been officially deprecated since version 13 (September 2025). It now only pulls in <code>nvidia-ml-py<\/code> as a dependency, so technically it still works \u2013 but we shouldn&#8217;t use it anymore. Install <code>nvidia-ml-py<\/code> directly.<\/p>\n<p><strong>Command:<\/strong> <code>uv pip install nvidia-ml-py<\/code><\/p>\n<p>Even so, the Python import is simply <code>pynvml<\/code> \u2013 the official package installs the module under that name. Confusing, but that&#8217;s how it grew historically.<\/p>\n<p>For testing, let&#8217;s create a small script. In the terminal, run the following command. 
I use nano:<\/p>\n<p><strong>Command:<\/strong> <code>nano test_gpus.py<\/code><\/p>\n<p>Now paste the following Python code into the still empty test_gpus.py file.<\/p>\n<pre class=\"wp-block-code\"><code>import pynvml\r\n\r\npynvml.nvmlInit()\r\ncount = pynvml.nvmlDeviceGetCount()\r\nprint(f\"GPUs found: {count}\\n\")\r\n\r\nfor i in range(count):\r\n    handle = pynvml.nvmlDeviceGetHandleByIndex(i)\r\n    name = pynvml.nvmlDeviceGetName(handle)\r\n    util = pynvml.nvmlDeviceGetUtilizationRates(handle)\r\n    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)\r\n    temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)\r\n    \r\n    print(f\"GPU {i}: {name}\")\r\n    print(f\"  Utilization: {util.gpu}%\")\r\n    print(f\"  VRAM: {mem.used \/ 1024**3:.1f} \/ {mem.total \/ 1024**3:.1f} GB\")\r\n    print(f\"  Temperature: {temp} \u00b0C\\n\")\r\n\r\npynvml.nvmlShutdown()<\/code><\/pre>\n<p>Save the file with <code>CTRL + X<\/code>, then <code>Y<\/code> and <code>ENTER<\/code>. Now we can run it with the following command:<\/p>\n<p><strong>Command:<\/strong> <code>python test_gpus.py<\/code><\/p>\n<p>On my machine, the little Python program shows the following. 
That confirms I can read the data for both of my GPUs:<\/p>\n<div id=\"attachment_2340\" style=\"width: 591px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/nvidia-ml-py_Server_GPU_load.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2340\" class=\"size-full wp-image-2340\" src=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/nvidia-ml-py_Server_GPU_load.jpg\" alt=\"nvidia-ml-py - Server GPU load\" width=\"581\" height=\"290\" srcset=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/nvidia-ml-py_Server_GPU_load.jpg 581w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/nvidia-ml-py_Server_GPU_load-300x150.jpg 300w\" sizes=\"(max-width: 581px) 100vw, 581px\" \/><\/a><p id=\"caption-attachment-2340\" class=\"wp-caption-text\">nvidia-ml-py &#8211; Server GPU load<\/p><\/div>\n<p>If it works for you too, your drivers and pynvml are fine. If not, please check the pitfalls section at the end.<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Step_2_Install_FastMCP\"><\/span>Step 2: Install FastMCP<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Next we install the fastmcp library into our virtual environment:<\/p>\n<p><strong>Command:<\/strong> <code>uv pip install fastmcp<\/code><\/p>\n<p>This pulls in about 40 to 50 packages, among them the HTTP stack, Pydantic and the MCP SDK. With <code>uv<\/code> the whole thing is done within a few seconds.<\/p>\n<p>To verify that everything worked, run the following command once.<\/p>\n<p><strong>Command:<\/strong> <code>python -c \"from fastmcp import FastMCP; print('FastMCP ready')\"<\/code><\/p>\n<p>In my case the output was &#8220;FastMCP ready&#8221;. 
If no errors appear, we&#8217;re good to go and have the foundation to build the MCP server.<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Step_3_Write_the_minimal_MCP_server\"><\/span>Step 3: Write the minimal MCP server<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>We start with an absolutely minimal server that exposes only a single tool \u2013 namely the number of installed GPUs. Only once that runs will we extend the MCP server step by step.<\/p>\n<p><strong>Command:<\/strong> <code>nano gpu_monitor.py<\/code><\/p>\n<pre class=\"wp-block-code\"><code>\"\"\"\r\nGPU Monitor MCP Server \u2013 Step 1: Minimal test\r\n\"\"\"\r\nimport pynvml\r\nfrom fastmcp import FastMCP\r\n\r\n# Initialize NVML once at server start\r\npynvml.nvmlInit()\r\nGPU_COUNT = pynvml.nvmlDeviceGetCount()\r\n\r\n# Create the MCP server with a descriptive name\r\nmcp = FastMCP(name=\"GPU Monitor\")\r\n\r\n@mcp.tool()\r\ndef get_gpu_count() -&gt; int:\r\n    \"\"\"Returns the number of available GPUs.\"\"\"\r\n    return GPU_COUNT\r\n\r\nif __name__ == \"__main__\":\r\n    # Listen on all interfaces so clients on the LAN can connect.\r\n    # Port 8765 \u2013 you can also choose another one.\r\n    mcp.run(transport=\"sse\", host=\"0.0.0.0\", port=8765)<\/code><\/pre>\n<p>What exactly is happening in this small script that provides us with an MCP server?<\/p>\n<ul class=\"wp-block-list\">\n<li><code>pynvml.nvmlInit()<\/code> runs exactly once at server start \u2013 we don&#8217;t want to repeat the NVML initialization on every tool request, that would be wasteful.<\/li>\n<li><code>FastMCP(name=\"GPU Monitor\")<\/code> creates the server.<\/li>\n<li>The <code>@mcp.tool()<\/code> decorator marks a Python function as an MCP tool. FastMCP automatically uses the <strong>type hints<\/strong> (<code>-&gt; int<\/code>) and the <strong>docstring<\/strong> to tell clients what the tool does. 
This is exactly the description an LLM will later read to decide when to call the tool.<\/li>\n<li><code>mcp.run(transport=\"sse\", ...)<\/code> starts the server with <strong>Server-Sent Events (SSE)<\/strong> as the transport. That&#8217;s HTTP-based and works cleanly over the network, in contrast to the default transport <code>stdio<\/code>, which only works locally via pipes.<\/li>\n<\/ul>\n<p>If you haven&#8217;t saved the little Python script (our MCP server) yet, do it now. Save the file with <code>CTRL + X<\/code>, then <code>Y<\/code> and <code>ENTER<\/code>.<\/p>\n<p>With the following command we execute the Python script and start the FastMCP server.<\/p>\n<p><strong>Command:<\/strong> <code>python gpu_monitor.py<\/code><\/p>\n<p>Now the FastMCP server should start and your terminal window should show the following.<\/p>\n<div id=\"attachment_2342\" style=\"width: 1034px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/Fast-MCP_nvidia-ml-py_Server_GPU_load-1024x401.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2342\" class=\"size-large wp-image-2342\" src=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/Fast-MCP_nvidia-ml-py_Server_GPU_load-1024x401.jpg\" alt=\"Fast-MCP Server - Server GPU load\" width=\"1024\" height=\"401\" srcset=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/Fast-MCP_nvidia-ml-py_Server_GPU_load-1024x401.jpg 1024w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/Fast-MCP_nvidia-ml-py_Server_GPU_load-300x117.jpg 300w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/Fast-MCP_nvidia-ml-py_Server_GPU_load-768x301.jpg 768w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/Fast-MCP_nvidia-ml-py_Server_GPU_load-1080x423.jpg 1080w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/Fast-MCP_nvidia-ml-py_Server_GPU_load.jpg 1149w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/a><p id=\"caption-attachment-2342\" 
class=\"wp-caption-text\">Fast-MCP Server &#8211; Server GPU load<\/p><\/div>\n<p>With this, your first MCP server of your own is running, and we can now extend it further.<\/p>\n<p>If you want to test the MCP server now, simply opening its URL in a browser won&#8217;t work: <code>http:\/\/192.168.2.57:8765\/<\/code> only returns a &#8216;Not Found&#8217; message. That&#8217;s not an error. MCP servers are not websites and don&#8217;t expose anything at the root URL. We need a proper MCP client for testing.<\/p>\n<p>The easiest option is the MCP Inspector on your own machine. Since I&#8217;m running Windows on the client side, I installed Anthropic&#8217;s MCP Inspector via PowerShell with the following command:<\/p>\n<p><strong>Command:<\/strong> <code>npx @modelcontextprotocol\/inspector<\/code><\/p>\n<p>After installation \u2013 assuming you didn&#8217;t have it installed yet \u2013 a browser window opens. In the top left, make sure to switch the Transport Type to SSE (*1). Then enter the IP address of your FastMCP server with port and \/sse at the end (*2). 
In my case it&#8217;s:<\/p>\n<p><strong>URL:<\/strong> <code>http:\/\/192.168.2.57:8765\/sse<\/code><\/p>\n<p>In the following image you can see at (*3) that the connection worked and the name &#8220;GPU Monitor&#8221; of our MCP server is displayed.<\/p>\n<div id=\"attachment_2344\" style=\"width: 1034px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/MCP_inspector-scaled.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2344\" class=\"size-large wp-image-2344\" src=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/MCP_inspector-1024x845.jpg\" alt=\"MCP-Inspector\" width=\"1024\" height=\"845\" srcset=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/MCP_inspector-1024x845.jpg 1024w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/MCP_inspector-300x247.jpg 300w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/MCP_inspector-768x633.jpg 768w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/MCP_inspector-1536x1267.jpg 1536w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/MCP_inspector-2048x1689.jpg 2048w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/MCP_inspector-1080x891.jpg 1080w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/a><p id=\"caption-attachment-2344\" class=\"wp-caption-text\">MCP-Inspector<\/p><\/div>\n<p>You can now stop the MCP server in the terminal with <code>CTRL + C<\/code>. Next, we&#8217;ll move on to the actual functionality and develop our MCP server into a real GPU monitor service.<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Step_4_Add_multi-GPU_utilization_and_memory_usage\"><\/span>Step 4: Add multi-GPU utilization and memory usage<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Now we&#8217;ll extend the MCP server, or rather our small script, with the actually interesting tools. 
We adapt the <code>gpu_monitor.py<\/code> Python file and replace the previous content with the following code that I&#8217;ve made available on GitHub, since it would be too long here and not pretty when embedded inline.<\/p>\n<p><strong>URL: <\/strong><a href=\"https:\/\/github.com\/custom-build-robots\/mcp-gpu-monitor\/blob\/main\/gpu_monitor.py\" target=\"_blank\" rel=\"noopener\">gpu_monitor.py<\/a><\/p>\n<p>Three important points about this code:<\/p>\n<ol class=\"wp-block-list\">\n<li><strong>Each tool has a <code>gpu_id<\/code> parameter with a default of <code>0<\/code>.<\/strong> This way, single-GPU and multi-GPU systems work transparently. With ID 0, a client that only knows one GPU can omit the parameter. Multi-GPU clients can explicitly query each individual GPU. I implemented it this way because I want my ESP-Claw to display both GPUs on a 24-LED ring.<\/li>\n<li><strong>Each tool returns a dictionary, not a tuple or a list.<\/strong> That makes the response structure self-explanatory. The LLM client will later see keys like <code>\"temperature_c\"<\/code>, not an anonymous number.<\/li>\n<li><strong>The docstrings are carefully written.<\/strong> FastMCP passes them along to the client, and an LLM agent decides based on this description when to invoke your tool. Better docstrings \u2192 better tool selection. I still need to gather more experience here on what kind of description works well with which model.<\/li>\n<\/ol>\n<p>Once you&#8217;ve restarted the small MCP server after the changes and refreshed the MCP Inspector, you should see the newly added tools. 
In my case it looks like this:<\/p>\n<div id=\"attachment_2348\" style=\"width: 1034px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/MCP_inspector_tools-scaled.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2348\" class=\"size-large wp-image-2348\" src=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/MCP_inspector_tools-1024x725.jpg\" alt=\"MCP inspector tools overview\" width=\"1024\" height=\"725\" srcset=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/MCP_inspector_tools-1024x725.jpg 1024w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/MCP_inspector_tools-300x212.jpg 300w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/MCP_inspector_tools-768x543.jpg 768w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/MCP_inspector_tools-1536x1087.jpg 1536w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/MCP_inspector_tools-2048x1449.jpg 2048w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/MCP_inspector_tools-400x284.jpg 400w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/MCP_inspector_tools-1080x764.jpg 1080w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/a><p id=\"caption-attachment-2348\" class=\"wp-caption-text\">MCP inspector tools overview<\/p><\/div>\n<p>You can already list the number of GPUs if you want. In other words, invoke the tools of your MCP server.<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Step_5_Integrate_EMA_smoothing\"><\/span>Step 5: Integrate EMA smoothing<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Now comes the actual core: the <strong>Exponential Moving Average<\/strong>. The idea in one line:<\/p>\n<pre class=\"wp-block-code\"><code>new_smoothed_value = alpha \u00d7 new_raw_value + (1 - alpha) \u00d7 old_smoothed_value<\/code><\/pre>\n<p><code>alpha<\/code> is the <strong>smoothing factor<\/strong> between 0 and 1. A high value (e.g. 
0.7) means the server reacts quickly to changes with little smoothing. A low value (e.g. 0.1) means heavy smoothing and slower reaction. For GPU displays on the LED ring I got very good results with <strong>alpha = 0.3<\/strong>. That&#8217;s fast enough to make peaks visible and calm enough not to flicker.<\/p>\n<p>We build the smoothing into a small class so it keeps its state across multiple tool calls. For this, we adapt the <code>gpu_monitor.py<\/code> Python file once more and replace the previous content with the following code, which I&#8217;ve made available on GitHub since it would be too long here and not pretty inline.<\/p>\n<p><strong>URL:\u00a0<\/strong><a href=\"https:\/\/github.com\/custom-build-robots\/mcp-gpu-monitor\/blob\/main\/gpu_monitor_EMA.py\" target=\"_blank\" rel=\"noopener\">gpu_monitor_EMA.py<\/a><\/p>\n<p>What&#8217;s changed compared to Step 4?<\/p>\n<ul class=\"wp-block-list\">\n<li>The <strong><code>EMASmoother<\/code> class<\/strong> is new. It keeps a separate smoothed value per key (e.g. <code>gpu0_compute<\/code>, <code>gpu1_compute<\/code>). That way the GPUs don&#8217;t get mixed up.<\/li>\n<li><code>get_gpu_utilization()<\/code> has a new parameter <code>smoothed=True<\/code>. Smoothing is on by default because that&#8217;s what you want in most cases. If you need the raw value (for logging, for example), set <code>smoothed=False<\/code>.<\/li>\n<li>Also new is <code>get_all_gpus_summary()<\/code> \u2013 an <strong>aggregator tool<\/strong> that delivers the status of all GPUs in a single call. Exactly what an ESP32 client needs: one request every 1 to 2 seconds, all data in one shot. 
That saves network round trips.<\/li>\n<\/ul>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Step_6_Start_and_test_the_server\"><\/span>Step 6: Start and test the server<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Now start the server again with all of its updates.<\/p>\n<p><strong>Command:<\/strong> <code>python gpu_monitor.py<\/code><\/p>\n<p>You&#8217;ll see the startup message of your FastMCP server in the terminal. Now repeat the MCP Inspector test from Step 3 to check that everything still works.<\/p>\n<p>I want my MCP server to start whenever the machine boots, so I&#8217;m setting it up as a background service as described in the next section.<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Step_7_optional_Set_it_up_as_a_systemd_service\"><\/span>Step 7 (optional): Set it up as a systemd service<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>If the MCP server should run permanently \u2013 as it should for me, because my ESP-Claw should be able to query it at any time \u2013 a systemd service is the way to go.<\/p>\n<p>Create the service file:<\/p>\n<p><strong>Command:<\/strong> <code>sudo nano \/etc\/systemd\/system\/gpu-monitor-mcp.service<\/code><\/p>\n<p>You&#8217;ll need to adjust the service file to match your directories and filenames. 
Read through the unit file carefully and change the entries that differ on your machine.<\/p>\n<pre class=\"wp-block-code\"><code>[Unit]\r\nDescription=GPU Monitor MCP Server\r\nAfter=network.target\r\n\r\n[Service]\r\nType=simple\r\nUser=ingmar\r\nWorkingDirectory=\/home\/ingmar\/gpu-monitor-mcp\r\nExecStart=\/home\/ingmar\/gpu-monitor-mcp\/.venv\/bin\/python \/home\/ingmar\/gpu-monitor-mcp\/gpu_monitor.py\r\nRestart=on-failure\r\nRestartSec=5\r\n\r\n[Install]\r\nWantedBy=multi-user.target<\/code><\/pre>\n<p>Adjust paths and username (<code>ingmar<\/code>) accordingly.<\/p>\n<p>Activate and start:<\/p>\n<p><strong>Command:<\/strong> <code>sudo systemctl daemon-reload<\/code><\/p>\n<p><strong>Command:<\/strong> <code>sudo systemctl enable --now gpu-monitor-mcp<\/code><\/p>\n<p><strong>Command:<\/strong> <code>sudo systemctl status gpu-monitor-mcp<\/code><\/p>\n<p>If everything is configured correctly, the MCP server now starts automatically whenever the machine boots. You can check the logs with <code>journalctl -u gpu-monitor-mcp -f<\/code> in case there are any issues.<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Pitfalls_I_want_to_mention_here\"><\/span>Pitfalls I want to mention here<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"1_nvidia-ml-py_vs_pynvml_%E2%80%93_confusing_names\"><\/span>1. <code>nvidia-ml-py<\/code> vs. <code>pynvml<\/code> \u2013 confusing names<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Both packages exist on PyPI. <strong>The official package maintained by NVIDIA<\/strong> is called <strong><code>nvidia-ml-py<\/code><\/strong> but is imported as <code>pynvml<\/code>. There&#8217;s also an older package called <code>pynvml<\/code>, originally a third-party library, that is now deprecated and only pulls in <code>nvidia-ml-py<\/code> as a dependency, so we don&#8217;t want to install it. 
If something goes wrong, run <code>uv pip uninstall pynvml nvidia-ml-py<\/code> and then reinstall only <code>nvidia-ml-py<\/code> cleanly.<\/p>\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"2_NVML_initialization_fails\"><\/span>2. NVML initialization fails<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>If <code>pynvml.nvmlInit()<\/code> raises an <code>NVMLError_LibraryNotFound<\/code>, the NVML library that ships with the NVIDIA driver could not be found, so the driver is usually missing or not installed correctly. A mismatch between the loaded kernel driver and the user-space library (for example after a driver update without a reboot) typically shows up as <code>NVMLError_LibRmVersionMismatch<\/code> instead. Check whether your GPU(s) are visible with the following command:<\/p>\n<p><strong>Command:<\/strong> <code>nvidia-smi<\/code><\/p>\n<p>If that doesn&#8217;t work either, it&#8217;s a driver issue, not a pynvml one.<\/p>\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"3_FastMCP_version_and_transport_names\"><\/span>3. FastMCP version and transport names<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>FastMCP is currently evolving fast. The transport name <code>transport=\"sse\"<\/code> may be replaced by <code>transport=\"streamable-http\"<\/code> or a similar name in future versions. If you get a <code>ValueError<\/code> about the transport at startup, check the current FastMCP docs.<\/p>\n<p>URL: <a href=\"https:\/\/gofastmcp.com\/getting-started\/welcome\" target=\"_blank\" rel=\"noopener\">https:\/\/gofastmcp.com\/getting-started\/welcome<\/a><\/p>\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"4_EMA_values_dont_update\"><\/span>4. EMA values don&#8217;t update<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>On the first tool call the smoother has no previous value and simply takes the raw value as-is. That&#8217;s intentional and only becomes a nuisance if your first call happens to read an unrepresentative value (e.g. because there&#8217;s no load at that moment), since the following smoothed values then need a few calls to catch up. If you want to avoid that, you can preload the smoother with a dummy value (e.g. 
0) at server start.<\/p>\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"5_What_you_should_NOT_do\"><\/span>5. What you should NOT do<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<ul class=\"wp-block-list\">\n<li><strong><code>pynvml.nvmlInit()<\/code> on every tool call<\/strong> \u2013 that costs unnecessary time, and because NVML reference-counts its initializations, every extra call would also need a matching <code>nvmlShutdown()<\/code>. Initialize once at server start instead.<\/li>\n<li><strong><code>pynvml.nvmlShutdown()<\/code> at the end of a tool call<\/strong> \u2013 that tears down the NVML state the next request depends on. The shutdown belongs in a server-shutdown handler, if used at all.<\/li>\n<li><strong>Exposing the server unprotected to the open internet.<\/strong> The server as built here does not authenticate its clients. That&#8217;s fine for LAN operation in your own workshop; anything beyond that belongs at minimum behind Tailscale, WireGuard, or a reverse proxy with token auth.<\/li>\n<\/ul>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Whats_coming_next\"><\/span>What&#8217;s coming next?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>With this server you have the data provider in place. What&#8217;s still missing is the <strong>consumer<\/strong>. In the next posts I&#8217;ll build that out in two directions:<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>NeMo Agent Toolkit as MCP client:<\/strong> I&#8217;ll connect the server to my NAT workflow from the last post. That way my agent can include real hardware values in its responses \u2013 for example, when I ask <em>&#8220;Are my GPUs currently free for fine-tuning?&#8221;<\/em>, the agent calls the tool, sees 5% load, and can confidently say yes.<\/li>\n<li><strong>ESP-Claw as MCP client with LED ring:<\/strong> That&#8217;s the actual goal of this series. My ESP32-P4 with the ESP-Claw framework has built-in MCP client capability. 
A custom skill will call <code>get_all_gpus_summary<\/code> every two seconds and visualize compute utilization as a filled ring, temperature as a color gradient, and power draw as brightness.<\/li>\n<\/ul>\n<p>Both are independent and exciting building blocks that I plan to build piece by piece in the next posts.<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"My_personal_takeaway\"><\/span>My personal takeaway<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>What surprised me most about this small project: <strong>how little code it actually takes to build a meaningful infrastructure bridge.<\/strong> About 100 lines of Python, a very focused library (FastMCP), and a long-established, very stable NVIDIA API (NVML) \u2013 and I already have a data source that any MCP-compatible tool can tap into, from a small ESP32 microcontroller to Claude Desktop on my laptop. <strong>No cloud, no SaaS subscription, no third-party API key.<\/strong><\/p>\n<p>Exactly this combination \u2013 <strong>a standard protocol plus your own hardware plus minimal software<\/strong> \u2013 is, for me, the practical core of what I mean by &#8220;sovereign AI&#8221;. It&#8217;s not about reinventing every piece yourself. It&#8217;s about knowing the building blocks, being able to combine them, and not taking on dependencies you don&#8217;t need.<\/p>\n<p>My MCP server for the GPUs is small. But it&#8217;s mine. And that makes the difference.<\/p>\n<p>Good luck building your own!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In my last posts I built the inference layer (Ollama, TensorRT-LLM) and the orchestrator layer (NeMo Agent Toolkit) on my application server in combination with my AI server. What&#8217;s been missing is a meaningful bridge between these server layers and the world of embedded devices in my workshop. 
That was the moment I took a [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":2343,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_et_pb_use_builder":"","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":""},"categories":[162,8,50],"tags":[640,1305,1036,1314,1308,1306,1317,1046,1307,1312,1220,1311,1310,1309,1313,315,1316,1032,1315],"class_list":["post-2354","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-large-language-models-en","category-news","category-top-story-en","tag-ai-box","tag-ema-smoothing","tag-esp-claw","tag-exponential-moving-average","tag-fastmcp","tag-local-ai-infrastructure","tag-mcp-inspector","tag-mcp-server","tag-model-context-protocol","tag-multi-gpu","tag-nemo-agent-toolkit","tag-nvidia-gpu-monitoring","tag-nvidia-ml-py","tag-pynvml","tag-python-tutorial","tag-rtx-a6000-en","tag-server-sent-events","tag-sovereign-ai","tag-systemd-service","et-has-post-format-content","et_post_format-et-post-format-standard"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v28.1 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>An MCP Server for Multi-GPU Monitoring \u2013 Step by Step with Python, pynvml and EMA Smoothing - Exploring the Future: Inside the AI Box<\/title>\n<meta name=\"description\" content=\"Step-by-step guide for building your own MCP server for NVIDIA GPU monitoring with Python and FastMCP \u2013 including multi-GPU, EMA smoothing and systemd service.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/ai-box.eu\/en\/news\/an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing\/2354\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"An 
MCP Server for Multi-GPU Monitoring \u2013 Step by Step with Python, pynvml and EMA Smoothing - Exploring the Future: Inside the AI Box\" \/>\n<meta property=\"og:description\" content=\"Step-by-step guide for building your own MCP server for NVIDIA GPU monitoring with Python and FastMCP \u2013 including multi-GPU, EMA smoothing and systemd service.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/ai-box.eu\/en\/news\/an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing\/2354\/\" \/>\n<meta property=\"og:site_name\" content=\"Exploring the Future: Inside the AI Box\" \/>\n<meta property=\"article:published_time\" content=\"2026-05-23T10:08:42+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-05-24T03:51:00+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/Fast-MCP_nvidia-ml-py_Server_GPU_load.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1149\" \/>\n\t<meta property=\"og:image:height\" content=\"450\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Maker\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Ingmar_Stapel\" \/>\n<meta name=\"twitter:site\" content=\"@Ingmar_Stapel\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Maker\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"15 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing\\\/2354\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing\\\/2354\\\/\"},\"author\":{\"name\":\"Maker\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#\\\/schema\\\/person\\\/cc91d08618b3feeef6926591b465eab1\"},\"headline\":\"An MCP Server for Multi-GPU Monitoring \u2013 Step by Step with Python, pynvml and EMA Smoothing\",\"datePublished\":\"2026-05-23T10:08:42+00:00\",\"dateModified\":\"2026-05-24T03:51:00+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing\\\/2354\\\/\"},\"wordCount\":2673,\"commentCount\":0,\"image\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing\\\/2354\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/05\\\/Fast-MCP_nvidia-ml-py_Server_GPU_load.jpg\",\"keywords\":[\"AI-Box\",\"EMA smoothing\",\"ESP-Claw\",\"Exponential Moving Average\",\"FastMCP\",\"local AI infrastructure\",\"MCP Inspector\",\"MCP-Server\",\"Model Context Protocol\",\"Multi-GPU\",\"NeMo Agent Toolkit\",\"NVIDIA GPU Monitoring\",\"nvidia-ml-py\",\"pynvml\",\"Python Tutorial\",\"RTX A6000\",\"Server-Sent Events\",\"sovereign AI\",\"systemd Service\"],\"articleSection\":[\"Large Language Models\",\"News\",\"Top 
story\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing\\\/2354\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing\\\/2354\\\/\",\"url\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing\\\/2354\\\/\",\"name\":\"An MCP Server for Multi-GPU Monitoring \u2013 Step by Step with Python, pynvml and EMA Smoothing - Exploring the Future: Inside the AI Box\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing\\\/2354\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing\\\/2354\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/05\\\/Fast-MCP_nvidia-ml-py_Server_GPU_load.jpg\",\"datePublished\":\"2026-05-23T10:08:42+00:00\",\"dateModified\":\"2026-05-24T03:51:00+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#\\\/schema\\\/person\\\/cc91d08618b3feeef6926591b465eab1\"},\"description\":\"Step-by-step guide for building your own MCP server for NVIDIA GPU monitoring with Python and FastMCP \u2013 including multi-GPU, EMA smoothing and systemd 
service.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing\\\/2354\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing\\\/2354\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing\\\/2354\\\/#primaryimage\",\"url\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/05\\\/Fast-MCP_nvidia-ml-py_Server_GPU_load.jpg\",\"contentUrl\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/05\\\/Fast-MCP_nvidia-ml-py_Server_GPU_load.jpg\",\"width\":1149,\"height\":450,\"caption\":\"Fast-MCP Server - Server GPU load\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing\\\/2354\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Start\",\"item\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"An MCP Server for Multi-GPU Monitoring \u2013 Step by Step with Python, pynvml and EMA Smoothing\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#website\",\"url\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/\",\"name\":\"Exploring the Future: Inside the AI Box\",\"description\":\"Inside the AI Box, we share our experiences and discoveries in the world of artificial 
intelligence.\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#\\\/schema\\\/person\\\/cc91d08618b3feeef6926591b465eab1\",\"name\":\"Maker\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g\",\"caption\":\"Maker\"},\"description\":\"I live in Bavaria near Munich. In my head I always have many topics and try out especially in the field of Internet new media much in my spare time. I write on the blog because it makes me fun to report about the things that inspire me. I am happy about every comment, about suggestion and very about questions.\",\"sameAs\":[\"https:\\\/\\\/ai-box.eu\"],\"url\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/author\\\/ingmars\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"An MCP Server for Multi-GPU Monitoring \u2013 Step by Step with Python, pynvml and EMA Smoothing - Exploring the Future: Inside the AI Box","description":"Step-by-step guide for building your own MCP server for NVIDIA GPU monitoring with Python and FastMCP \u2013 including multi-GPU, EMA smoothing and systemd service.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/ai-box.eu\/en\/news\/an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing\/2354\/","og_locale":"en_US","og_type":"article","og_title":"An MCP Server for Multi-GPU Monitoring \u2013 Step by Step with Python, pynvml and EMA Smoothing - Exploring the Future: Inside the AI Box","og_description":"Step-by-step guide for building your own MCP server for NVIDIA GPU monitoring with Python and FastMCP \u2013 including multi-GPU, EMA smoothing and systemd service.","og_url":"https:\/\/ai-box.eu\/en\/news\/an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing\/2354\/","og_site_name":"Exploring the Future: Inside the AI Box","article_published_time":"2026-05-23T10:08:42+00:00","article_modified_time":"2026-05-24T03:51:00+00:00","og_image":[{"width":1149,"height":450,"url":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/Fast-MCP_nvidia-ml-py_Server_GPU_load.jpg","type":"image\/jpeg"}],"author":"Maker","twitter_card":"summary_large_image","twitter_creator":"@Ingmar_Stapel","twitter_site":"@Ingmar_Stapel","twitter_misc":{"Written by":"Maker","Est. 
reading time":"15 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/ai-box.eu\/en\/news\/an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing\/2354\/#article","isPartOf":{"@id":"https:\/\/ai-box.eu\/en\/news\/an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing\/2354\/"},"author":{"name":"Maker","@id":"https:\/\/ai-box.eu\/en\/#\/schema\/person\/cc91d08618b3feeef6926591b465eab1"},"headline":"An MCP Server for Multi-GPU Monitoring \u2013 Step by Step with Python, pynvml and EMA Smoothing","datePublished":"2026-05-23T10:08:42+00:00","dateModified":"2026-05-24T03:51:00+00:00","mainEntityOfPage":{"@id":"https:\/\/ai-box.eu\/en\/news\/an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing\/2354\/"},"wordCount":2673,"commentCount":0,"image":{"@id":"https:\/\/ai-box.eu\/en\/news\/an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing\/2354\/#primaryimage"},"thumbnailUrl":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/Fast-MCP_nvidia-ml-py_Server_GPU_load.jpg","keywords":["AI-Box","EMA smoothing","ESP-Claw","Exponential Moving Average","FastMCP","local AI infrastructure","MCP Inspector","MCP-Server","Model Context Protocol","Multi-GPU","NeMo Agent Toolkit","NVIDIA GPU Monitoring","nvidia-ml-py","pynvml","Python Tutorial","RTX A6000","Server-Sent Events","sovereign AI","systemd Service"],"articleSection":["Large Language Models","News","Top 
story"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/ai-box.eu\/en\/news\/an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing\/2354\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/ai-box.eu\/en\/news\/an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing\/2354\/","url":"https:\/\/ai-box.eu\/en\/news\/an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing\/2354\/","name":"An MCP Server for Multi-GPU Monitoring \u2013 Step by Step with Python, pynvml and EMA Smoothing - Exploring the Future: Inside the AI Box","isPartOf":{"@id":"https:\/\/ai-box.eu\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/ai-box.eu\/en\/news\/an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing\/2354\/#primaryimage"},"image":{"@id":"https:\/\/ai-box.eu\/en\/news\/an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing\/2354\/#primaryimage"},"thumbnailUrl":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/Fast-MCP_nvidia-ml-py_Server_GPU_load.jpg","datePublished":"2026-05-23T10:08:42+00:00","dateModified":"2026-05-24T03:51:00+00:00","author":{"@id":"https:\/\/ai-box.eu\/en\/#\/schema\/person\/cc91d08618b3feeef6926591b465eab1"},"description":"Step-by-step guide for building your own MCP server for NVIDIA GPU monitoring with Python and FastMCP \u2013 including multi-GPU, EMA smoothing and systemd 
service.","breadcrumb":{"@id":"https:\/\/ai-box.eu\/en\/news\/an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing\/2354\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/ai-box.eu\/en\/news\/an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing\/2354\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ai-box.eu\/en\/news\/an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing\/2354\/#primaryimage","url":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/Fast-MCP_nvidia-ml-py_Server_GPU_load.jpg","contentUrl":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/Fast-MCP_nvidia-ml-py_Server_GPU_load.jpg","width":1149,"height":450,"caption":"Fast-MCP Server - Server GPU load"},{"@type":"BreadcrumbList","@id":"https:\/\/ai-box.eu\/en\/news\/an-mcp-server-for-multi-gpu-monitoring-step-by-step-with-python-pynvml-and-ema-smoothing\/2354\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Start","item":"https:\/\/ai-box.eu\/en\/"},{"@type":"ListItem","position":2,"name":"An MCP Server for Multi-GPU Monitoring \u2013 Step by Step with Python, pynvml and EMA Smoothing"}]},{"@type":"WebSite","@id":"https:\/\/ai-box.eu\/en\/#website","url":"https:\/\/ai-box.eu\/en\/","name":"Exploring the Future: Inside the AI Box","description":"Inside the AI Box, we share our experiences and discoveries in the world of artificial 
intelligence.","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ai-box.eu\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/ai-box.eu\/en\/#\/schema\/person\/cc91d08618b3feeef6926591b465eab1","name":"Maker","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g","caption":"Maker"},"description":"I live in Bavaria near Munich. In my head I always have many topics and try out especially in the field of Internet new media much in my spare time. I write on the blog because it makes me fun to report about the things that inspire me. 
I am happy about every comment, about suggestion and very about questions.","sameAs":["https:\/\/ai-box.eu"],"url":"https:\/\/ai-box.eu\/en\/author\/ingmars\/"}]}},"_links":{"self":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts\/2354","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/comments?post=2354"}],"version-history":[{"count":2,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts\/2354\/revisions"}],"predecessor-version":[{"id":2358,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts\/2354\/revisions\/2358"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/media\/2343"}],"wp:attachment":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/media?parent=2354"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/categories?post=2354"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/tags?post=2354"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}