{"id":2268,"date":"2026-05-16T10:23:22","date_gmt":"2026-05-16T10:23:22","guid":{"rendered":"https:\/\/ai-box.eu\/?p=2268"},"modified":"2026-05-16T11:31:06","modified_gmt":"2026-05-16T11:31:06","slug":"preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit","status":"publish","type":"post","link":"https:\/\/ai-box.eu\/en\/top-story-en\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\/2268\/","title":{"rendered":"Preparing an Ubuntu 24.04 Server for AI Inference: CUDA, Docker, NVIDIA Container Toolkit"},"content":{"rendered":"<p>Whether I later want to run <strong>TensorRT-LLM<\/strong>, <strong>Ollama<\/strong>, <strong>vLLM<\/strong>, or any other container-based inference framework on my server, the <strong>base installation<\/strong> is always the same: an up-to-date Ubuntu, the matching NVIDIA driver, Docker, and the NVIDIA Container Toolkit so that containers can even access the GPU in the first place.<\/p>\n<p>In this post I&#8217;ll show you my own setup script <code>server_setup.sh<\/code>, which handles this base installation on a fresh Ubuntu 24.04 in one go. 
The script is the foundation I link back to in all my other inference posts: work through it once here, and you&#8217;ll have the platform ready for any further AI project.<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"What_you_need\"><\/span>What you need<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<ul class=\"wp-block-list\">\n<li>A server or workstation with an <strong>NVIDIA GPU<\/strong><\/li>\n<li>A freshly installed <strong>Ubuntu 24.04 LTS<\/strong> (server or desktop variant, doesn&#8217;t matter)<\/li>\n<li>An <strong>internet connection<\/strong> (the script pulls several GB of packages)<\/li>\n<li>At least <strong>30 GB of free storage<\/strong> for the base installation (container images and models later need considerably more)<\/li>\n<li>A <strong>non-root user<\/strong> with <code>sudo<\/code> rights<\/li>\n<\/ul>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"What_the_script_does\"><\/span>What the script does<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The script runs in six clearly separated steps:<\/p>\n<ol class=\"wp-block-list\">\n<li><strong>System update:<\/strong> bring all existing packages up to date<\/li>\n<li><strong>OpenSSH server:<\/strong> so you can later run the server headless<\/li>\n<li><strong>Base packages:<\/strong> 
curl, git, gnupg, Midnight Commander, and a few helpers<\/li>\n<li><strong>NVIDIA CUDA Toolkit 13.1 + driver:<\/strong> the GPU layer<\/li>\n<li><strong>Docker:<\/strong> container runtime from the official Docker repo<\/li>\n<li><strong>NVIDIA Container Toolkit:<\/strong> the bridge that lets Docker containers access the GPU<\/li>\n<\/ol>\n<p>A reboot is due at the end; after that, the server is ready for any container-based inference stack.<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Why_these_components\"><\/span>Why these components?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"NVIDIA_CUDA_Toolkit_on_the_host_even_though_we_use_containers\"><\/span>NVIDIA CUDA Toolkit on the host, even though we use containers?<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>This sounds contradictory at first, because container-based inference frameworks ship their own CUDA version inside the image. Still, I install the CUDA Toolkit on the host, for two reasons:<\/p>\n<ul>\n<li><strong>First<\/strong>, the <code>cuda-drivers<\/code> package automatically pulls in the matching NVIDIA driver. You absolutely need the driver on the host: its userspace library (<code>libcuda.so<\/code>) is used by containers as well, because the kernel driver itself always runs on the host and cannot be containerized.<\/li>\n<li><strong>Second<\/strong>, it&#8217;s practical to have the CUDA Toolkit natively at hand for occasional quick tests with <code>nvcc<\/code>, <code>nvidia-smi<\/code>, and similar tools, without spinning up an extra container.<\/li>\n<\/ul>\n<p>I picked <strong>CUDA 13.1<\/strong> because it was the most stable current version at the time I wrote the script. 
If you run the script later, briefly check beforehand whether there&#8217;s a newer version; NVIDIA ships a new release roughly every six months.<\/p>\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Docker_from_the_official_repo_not_from_Ubuntu\"><\/span>Docker from the official repo, not from Ubuntu<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Ubuntu ships the package <code>docker.io<\/code> in its standard repos. It works in principle, but is usually significantly older than what Docker itself publishes. More importantly: the official packages include <strong>Docker Compose<\/strong> as a plugin, which <code>docker.io<\/code> does not. You&#8217;ll need Compose almost immediately for multi-container setups or persistence with volumes.<\/p>\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"NVIDIA_Container_Toolkit_%E2%80%94_the_most_important_component\"><\/span>NVIDIA Container Toolkit \u2014 the most important component<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>This is the glue without which Docker containers cannot use your GPU at all. The toolkit reconfigures Docker so that <code>docker run --gpus all ...<\/code> actually works and the container gets access to the GPU devices and the NVIDIA driver.<\/p>\n<p>In the script I pin the version to <code>1.19.0-1<\/code>. There&#8217;s a pragmatic reason for this: updates of the NVIDIA Container Toolkit have occasionally broken setups of mine in the past, because the API was no longer compatible with certain driver versions. 
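If you want apt itself to enforce that pin across later upgrades, an apt preferences file is one option. This is a sketch, not part of `server_setup.sh`: the file path is my suggestion, the glob patterns assume the package names used by NVIDIA's apt repository, and the version string matches the `1.19.0-1` pinned in the script.

```
# /etc/apt/preferences.d/nvidia-container-toolkit  (suggested path)
# Hold all NVIDIA Container Toolkit packages at the pinned version,
# so a routine "apt upgrade" cannot pull in an incompatible release.
Package: nvidia-container-toolkit* libnvidia-container*
Pin: version 1.19.0-1
Pin-Priority: 1001
```

A priority above 1000 even allows apt to downgrade back to the pinned version if a newer release has already slipped in.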
With a pinned version the setup stays reproducible, and I consciously decide when to update.<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Getting_and_running_the_script\"><\/span>Getting and running the script<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Download the script <code>server_setup.sh<\/code> from my GitHub repository: <a href=\"https:\/\/github.com\/custom-build-robots\/tensorrt-llm-edge-prep\/tree\/main\/script\" target=\"_blank\" rel=\"noopener\">tensorrt-llm-edge-prep-script<\/a><\/p>\n<p>Save it on the freshly installed server. I always create a folder called <code>scripts<\/code>, which on my machine typically sits in the home directory:<\/p>\n<p>Command: <code>mkdir -p ~\/scripts<\/code><\/p>\n<p>Then make the script executable:<\/p>\n<p>Command: <code>chmod +x server_setup.sh<\/code><\/p>\n<p>Please read the script through once before you run it; with system setup scripts, &#8220;I know exactly what&#8217;s going to happen&#8221; is a good habit. Then start the script:<\/p>\n<p>Command: <code>.\/server_setup.sh<\/code><\/p>\n<p>Important: <strong>do not run it as root.<\/strong> The script checks for this and aborts otherwise. It uses <code>sudo<\/code> where it needs root rights but keeps the normal user context.<\/p>\n<p>Depending on your internet connection, the whole run takes 5\u201315 minutes. At the end, a completion message appears with a note about the required reboot.<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Reboot_and_verification\"><\/span>Reboot and verification<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>After the script, a restart is mandatory. 
Both because of the freshly installed NVIDIA driver and because of the Docker group membership (which only becomes active on a new login):<\/p>\n<pre class=\"wp-block-code\"><code>sudo reboot<\/code><\/pre>\n<p>After the reboot, four short tests check whether everything works together cleanly.<\/p>\n<p><strong>1. Are the GPU and the driver visible?<\/strong><\/p>\n<p>Command: <code>nvidia-smi<\/code><\/p>\n<p>The familiar overview should appear with your GPU, driver version, and memory usage.<\/p>\n<div id=\"attachment_1163\" style=\"width: 922px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2024\/01\/NVIDIA_SMI.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-1163\" class=\"wp-image-1163 size-full\" src=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2024\/01\/NVIDIA_SMI.png\" alt=\"NVIDIA SMI Screen\" width=\"912\" height=\"533\" srcset=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2024\/01\/NVIDIA_SMI.png 912w, https:\/\/ai-box.eu\/wp-content\/uploads\/2024\/01\/NVIDIA_SMI-300x175.png 300w, https:\/\/ai-box.eu\/wp-content\/uploads\/2024\/01\/NVIDIA_SMI-768x449.png 768w\" sizes=\"(max-width: 912px) 100vw, 912px\" \/><\/a><p id=\"caption-attachment-1163\" class=\"wp-caption-text\">NVIDIA SMI Screen<\/p><\/div>\n<p><strong>2. Does Docker work without <code>sudo<\/code>?<\/strong><\/p>\n<p>Command: <code>docker run hello-world<\/code><\/p>\n<p>You should see the &#8220;Hello from Docker!&#8221; greeting without any permission errors.<\/p>\n<p><strong>3. Is the Container Toolkit installed?<\/strong><\/p>\n<p>Command: <code>nvidia-ctk --version<\/code><\/p>\n<p>A version output like <code>NVIDIA Container Toolkit CLI version 1.19.0<\/code> should appear.<\/p>\n<p><strong>4. The most important test: can a container access the GPU?<\/strong><\/p>\n<pre class=\"wp-block-code\">
Command: <code>docker run --rm --gpus all ubuntu:24.04 nvidia-smi<\/code><\/pre>\n<p>The same <code>nvidia-smi<\/code> table as on the host should appear, only this time from inside the container.<\/p>\n<p>If all four tests pass, you have a fully functional GPU container platform. That is the foundation on which any of the common inference frameworks can run.<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Typical_stumbling_blocks\"><\/span>Typical stumbling blocks<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Two problems I&#8217;ve come across in various installations over the years:<\/p>\n<p><strong>1. The Nouveau driver blocks the NVIDIA driver.<\/strong><\/p>\n<p>On some Ubuntu installations the kernel automatically loads the open-source Nouveau driver, which then blocks the NVIDIA driver. Symptom: after the reboot, <code>nvidia-smi<\/code> reports &#8220;NVIDIA-SMI has failed because it couldn&#8217;t communicate with the NVIDIA driver&#8221;. One possible solution looks like this:<\/p>\n<p>Command: <code>echo \"blacklist nouveau\" | sudo tee \/etc\/modprobe.d\/blacklist-nouveau.conf<\/code><\/p>\n<p>Command: <code>echo \"options nouveau modeset=0\" | sudo tee -a \/etc\/modprobe.d\/blacklist-nouveau.conf<\/code><\/p>\n<p>Command: <code>sudo update-initramfs -u<\/code><\/p>\n<p>Command: <code>sudo reboot<\/code><\/p>\n<p>After the restart, <code>nvidia-smi<\/code> should work.<\/p>\n<p><strong>2. &#8220;docker permission denied&#8221; despite a correct installation.<\/strong> You were indeed added to the Docker group, but your current shell session doesn&#8217;t know about it yet. 
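You can check directly whether the group is already active in the current session: `id -nG` lists the groups of the running shell, which is exactly what Docker's socket permission check sees, while `/etc/group` only shows the on-disk state. A small sketch (the `newgrp docker` shortcut, which starts a subshell with the group applied, is an alternative to logging out):

```shell
# Show the groups of the *current session* (what Docker actually checks),
# as opposed to /etc/group, which only reflects the on-disk state.
if id -nG | grep -qw docker; then
  echo "docker group is active in this session"
else
  echo "docker group not active yet in this session"
fi
```
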
Solution: log out and log back in, or simply reboot the entire server, as recommended at the end of the script.<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"What_isnt_in_the_script_and_why\"><\/span>What isn&#8217;t in the script (and why)<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>A few things are deliberately not included, because every admin wants to decide them individually:<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Static IP configuration:<\/strong> depends on your home network; better done manually via Netplan<\/li>\n<li><strong>Firewall rules:<\/strong> depend on what you&#8217;ll be serving later (HTTP, SSH tunnel, OpenAI API)<\/li>\n<li><strong>Samba:<\/strong> I used to have it in there, but it&#8217;s superfluous for a pure inference server<\/li>\n<\/ul>\n<p>If you need these things, set them up yourself after the restart. The script thus stays lean, reproducible, and focused on its actual purpose: GPU + container platform.<\/p>\n<p>If you have improvements to the script or need a different CUDA\/Toolkit version, open a pull request on GitHub or leave a comment here; feedback is always welcome.<\/p>\n<p>&nbsp;<\/p>\n<br>\r\n\t<br>\r\n<h2>Article overview - TensorRT-LLM on the RTX A6000 Ada:<\/h2>\r\n<a title=\"Preparing an Ubuntu 24.04 Server for AI Inference: CUDA, Docker, NVIDIA Container Toolkit\" href=\"https:\/\/ai-box.eu\/en\/top-story-en\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\/2268\/\">Preparing an Ubuntu 24.04 Server for AI Inference: CUDA, Docker, NVIDIA Container Toolkit<\/a><br>\r\n<a title=\"TensorRT-LLM on the RTX A6000 Ada: Preparing for the Edge-LLM Ecosystem\" href=\"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-the-rtx-a6000-ada-preparing-for-the-edge-llm-ecosystem\/2255\/\">TensorRT-LLM on the RTX A6000 Ada: Preparing for the Edge-LLM Ecosystem<\/a><br>\r\n<a title=\"TensorRT-LLM on Ubuntu 24.04: Setup with Docker and Helper Scripts\" 
href=\"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\/2257\/\">TensorRT-LLM on Ubuntu 24.04: Setup with Docker and Helper Scripts<\/a><br>\r\n<a title=\"TensorRT-LLM Pipeline: Building Persistent Engines with FP16 and FP8\" href=\"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-pipeline-building-persistent-engines-with-fp16-and-fp8\/2262\/\">TensorRT-LLM Pipeline: Building Persistent Engines with FP16 and FP8<\/a><br>\r\n<a title=\"TensorRT-LLM in Numbers: FP16 vs. FP8 on the RTX A6000 Ada\" href=\"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-in-numbers-fp16-vs-fp8-on-the-rtx-a6000-ada\/2266\/\">TensorRT-LLM in Numbers: FP16 vs. FP8 on the RTX A6000 Ada<\/a><br>\r\n\t\r\n\t<br>\r\n\t<br>\n","protected":false},"excerpt":{"rendered":"<p>Whether I later want to run TensorRT-LLM, Ollama, vLLM, or any other container-based inference framework on my server, the base installation is always the same: an up-to-date Ubuntu, the matching NVIDIA driver, Docker, and the NVIDIA Container Toolkit so that containers can even access the GPU in the first place. 
In this post I&#8217;ll show [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":2233,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_et_pb_use_builder":"","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":""},"categories":[53,162,50],"tags":[1208,393,1213,1209,353,1187,1211,1215,1182,1216,1218,1199,1210,1214,1212,123,1198,1217],"class_list":["post-2268","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-hardware-en","category-large-language-models-en","category-top-story-en","tag-ai-inference-server","tag-cuda-en","tag-cuda-13-1","tag-cuda-toolkit","tag-docker","tag-docker-setup","tag-gpu-container","tag-gpu-passthrough-docker","tag-hardware-fp8","tag-llm-inference-setup","tag-nouveau-blacklist","tag-nvidia-container-toolkit","tag-nvidia-driver","tag-nvidia-ctk","tag-server_setup-sh","tag-setup","tag-ubuntu-24-04","tag-ubuntu-gpu-server","et-has-post-format-content","et_post_format-et-post-format-standard"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.6 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Preparing an Ubuntu 24.04 Server for AI Inference: CUDA, Docker, NVIDIA Container Toolkit - Exploring the Future: Inside the AI Box<\/title>\n<meta name=\"description\" content=\"Ubuntu 24.04 AI inference setup with CUDA, Docker and NVIDIA Container Toolkit \u2014 the foundation for TensorRT-LLM, Ollama, vLLM and friends.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/ai-box.eu\/en\/top-story-en\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\/2268\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Preparing an Ubuntu 24.04 Server for AI Inference: 
CUDA, Docker, NVIDIA Container Toolkit - Exploring the Future: Inside the AI Box\" \/>\n<meta property=\"og:description\" content=\"Ubuntu 24.04 AI inference setup with CUDA, Docker and NVIDIA Container Toolkit \u2014 the foundation for TensorRT-LLM, Ollama, vLLM and friends.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/ai-box.eu\/en\/top-story-en\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\/2268\/\" \/>\n<meta property=\"og:site_name\" content=\"Exploring the Future: Inside the AI Box\" \/>\n<meta property=\"article:published_time\" content=\"2026-05-16T10:23:22+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-05-16T11:31:06+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/Tensor_RT_LLM_container_test.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"781\" \/>\n\t<meta property=\"og:image:height\" content=\"418\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Maker\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Ingmar_Stapel\" \/>\n<meta name=\"twitter:site\" content=\"@Ingmar_Stapel\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Maker\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/top-story-en\\\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\\\/2268\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/top-story-en\\\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\\\/2268\\\/\"},\"author\":{\"name\":\"Maker\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#\\\/schema\\\/person\\\/cc91d08618b3feeef6926591b465eab1\"},\"headline\":\"Preparing an Ubuntu 24.04 Server for AI Inference: CUDA, Docker, NVIDIA Container Toolkit\",\"datePublished\":\"2026-05-16T10:23:22+00:00\",\"dateModified\":\"2026-05-16T11:31:06+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/top-story-en\\\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\\\/2268\\\/\"},\"wordCount\":1046,\"commentCount\":0,\"image\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/top-story-en\\\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\\\/2268\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/05\\\/Tensor_RT_LLM_container_test.jpg\",\"keywords\":[\"AI inference server\",\"CUDA\",\"CUDA 13.1\",\"CUDA Toolkit\",\"Docker\",\"Docker setup\",\"GPU container\",\"GPU passthrough Docker\",\"Hardware FP8\",\"LLM inference setup\",\"Nouveau blacklist\",\"NVIDIA Container Toolkit\",\"NVIDIA driver\",\"nvidia-ctk\",\"server_setup.sh\",\"setup\",\"Ubuntu 24.04\",\"Ubuntu GPU server\"],\"articleSection\":[\"Hardware\",\"Large Language Models\",\"Top 
story\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/ai-box.eu\\\/en\\\/top-story-en\\\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\\\/2268\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/top-story-en\\\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\\\/2268\\\/\",\"url\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/top-story-en\\\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\\\/2268\\\/\",\"name\":\"Preparing an Ubuntu 24.04 Server for AI Inference: CUDA, Docker, NVIDIA Container Toolkit - Exploring the Future: Inside the AI Box\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/top-story-en\\\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\\\/2268\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/top-story-en\\\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\\\/2268\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/05\\\/Tensor_RT_LLM_container_test.jpg\",\"datePublished\":\"2026-05-16T10:23:22+00:00\",\"dateModified\":\"2026-05-16T11:31:06+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#\\\/schema\\\/person\\\/cc91d08618b3feeef6926591b465eab1\"},\"description\":\"Ubuntu 24.04 AI inference setup with CUDA, Docker and NVIDIA Container Toolkit \u2014 the foundation for TensorRT-LLM, Ollama, vLLM and 
friends.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/top-story-en\\\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\\\/2268\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/ai-box.eu\\\/en\\\/top-story-en\\\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\\\/2268\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/top-story-en\\\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\\\/2268\\\/#primaryimage\",\"url\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/05\\\/Tensor_RT_LLM_container_test.jpg\",\"contentUrl\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/05\\\/Tensor_RT_LLM_container_test.jpg\",\"width\":781,\"height\":418,\"caption\":\"Tensor RT LLM - container test\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/top-story-en\\\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\\\/2268\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Start\",\"item\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Preparing an Ubuntu 24.04 Server for AI Inference: CUDA, Docker, NVIDIA Container Toolkit\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#website\",\"url\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/\",\"name\":\"Exploring the Future: Inside the AI Box\",\"description\":\"Inside the AI Box, we share our experiences and discoveries in the world of artificial 
intelligence.\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#\\\/schema\\\/person\\\/cc91d08618b3feeef6926591b465eab1\",\"name\":\"Maker\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g\",\"caption\":\"Maker\"},\"description\":\"I live in Bavaria near Munich. In my head I always have many topics and try out especially in the field of Internet new media much in my spare time. I write on the blog because it makes me fun to report about the things that inspire me. I am happy about every comment, about suggestion and very about questions.\",\"sameAs\":[\"https:\\\/\\\/ai-box.eu\"],\"url\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/author\\\/ingmars\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Preparing an Ubuntu 24.04 Server for AI Inference: CUDA, Docker, NVIDIA Container Toolkit - Exploring the Future: Inside the AI Box","description":"Ubuntu 24.04 AI inference setup with CUDA, Docker and NVIDIA Container Toolkit \u2014 the foundation for TensorRT-LLM, Ollama, vLLM and friends.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/ai-box.eu\/en\/top-story-en\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\/2268\/","og_locale":"en_US","og_type":"article","og_title":"Preparing an Ubuntu 24.04 Server for AI Inference: CUDA, Docker, NVIDIA Container Toolkit - Exploring the Future: Inside the AI Box","og_description":"Ubuntu 24.04 AI inference setup with CUDA, Docker and NVIDIA Container Toolkit \u2014 the foundation for TensorRT-LLM, Ollama, vLLM and friends.","og_url":"https:\/\/ai-box.eu\/en\/top-story-en\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\/2268\/","og_site_name":"Exploring the Future: Inside the AI Box","article_published_time":"2026-05-16T10:23:22+00:00","article_modified_time":"2026-05-16T11:31:06+00:00","og_image":[{"width":781,"height":418,"url":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/Tensor_RT_LLM_container_test.jpg","type":"image\/jpeg"}],"author":"Maker","twitter_card":"summary_large_image","twitter_creator":"@Ingmar_Stapel","twitter_site":"@Ingmar_Stapel","twitter_misc":{"Written by":"Maker","Est. 
reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/ai-box.eu\/en\/top-story-en\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\/2268\/#article","isPartOf":{"@id":"https:\/\/ai-box.eu\/en\/top-story-en\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\/2268\/"},"author":{"name":"Maker","@id":"https:\/\/ai-box.eu\/en\/#\/schema\/person\/cc91d08618b3feeef6926591b465eab1"},"headline":"Preparing an Ubuntu 24.04 Server for AI Inference: CUDA, Docker, NVIDIA Container Toolkit","datePublished":"2026-05-16T10:23:22+00:00","dateModified":"2026-05-16T11:31:06+00:00","mainEntityOfPage":{"@id":"https:\/\/ai-box.eu\/en\/top-story-en\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\/2268\/"},"wordCount":1046,"commentCount":0,"image":{"@id":"https:\/\/ai-box.eu\/en\/top-story-en\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\/2268\/#primaryimage"},"thumbnailUrl":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/Tensor_RT_LLM_container_test.jpg","keywords":["AI inference server","CUDA","CUDA 13.1","CUDA Toolkit","Docker","Docker setup","GPU container","GPU passthrough Docker","Hardware FP8","LLM inference setup","Nouveau blacklist","NVIDIA Container Toolkit","NVIDIA driver","nvidia-ctk","server_setup.sh","setup","Ubuntu 24.04","Ubuntu GPU server"],"articleSection":["Hardware","Large Language Models","Top 
story"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/ai-box.eu\/en\/top-story-en\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\/2268\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/ai-box.eu\/en\/top-story-en\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\/2268\/","url":"https:\/\/ai-box.eu\/en\/top-story-en\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\/2268\/","name":"Preparing an Ubuntu 24.04 Server for AI Inference: CUDA, Docker, NVIDIA Container Toolkit - Exploring the Future: Inside the AI Box","isPartOf":{"@id":"https:\/\/ai-box.eu\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/ai-box.eu\/en\/top-story-en\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\/2268\/#primaryimage"},"image":{"@id":"https:\/\/ai-box.eu\/en\/top-story-en\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\/2268\/#primaryimage"},"thumbnailUrl":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/Tensor_RT_LLM_container_test.jpg","datePublished":"2026-05-16T10:23:22+00:00","dateModified":"2026-05-16T11:31:06+00:00","author":{"@id":"https:\/\/ai-box.eu\/en\/#\/schema\/person\/cc91d08618b3feeef6926591b465eab1"},"description":"Ubuntu 24.04 AI inference setup with CUDA, Docker and NVIDIA Container Toolkit \u2014 the foundation for TensorRT-LLM, Ollama, vLLM and 
friends.","breadcrumb":{"@id":"https:\/\/ai-box.eu\/en\/top-story-en\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\/2268\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/ai-box.eu\/en\/top-story-en\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\/2268\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ai-box.eu\/en\/top-story-en\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\/2268\/#primaryimage","url":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/Tensor_RT_LLM_container_test.jpg","contentUrl":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/Tensor_RT_LLM_container_test.jpg","width":781,"height":418,"caption":"Tensor RT LLM - container test"},{"@type":"BreadcrumbList","@id":"https:\/\/ai-box.eu\/en\/top-story-en\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\/2268\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Start","item":"https:\/\/ai-box.eu\/en\/"},{"@type":"ListItem","position":2,"name":"Preparing an Ubuntu 24.04 Server for AI Inference: CUDA, Docker, NVIDIA Container Toolkit"}]},{"@type":"WebSite","@id":"https:\/\/ai-box.eu\/en\/#website","url":"https:\/\/ai-box.eu\/en\/","name":"Exploring the Future: Inside the AI Box","description":"Inside the AI Box, we share our experiences and discoveries in the world of artificial 
intelligence.","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ai-box.eu\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/ai-box.eu\/en\/#\/schema\/person\/cc91d08618b3feeef6926591b465eab1","name":"Maker","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g","caption":"Maker"},"description":"I live in Bavaria near Munich. I always have plenty of ideas in my head, and in my spare time I experiment a lot, especially with new internet media. 
I write this blog because I enjoy reporting on the things that inspire me. I welcome every comment, every suggestion, and especially your questions.","sameAs":["https:\/\/ai-box.eu"],"url":"https:\/\/ai-box.eu\/en\/author\/ingmars\/"}]}},"_links":{"self":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts\/2268","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/comments?post=2268"}],"version-history":[{"count":1,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts\/2268\/revisions"}],"predecessor-version":[{"id":2269,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts\/2268\/revisions\/2269"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/media\/2233"}],"wp:attachment":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/media?parent=2268"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/categories?post=2268"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/tags?post=2268"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}