{"id":2257,"date":"2026-05-16T04:48:13","date_gmt":"2026-05-16T04:48:13","guid":{"rendered":"https:\/\/ai-box.eu\/?p=2257"},"modified":"2026-05-16T11:29:19","modified_gmt":"2026-05-16T11:29:19","slug":"tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts","status":"publish","type":"post","link":"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\/2257\/","title":{"rendered":"TensorRT-LLM on Ubuntu 24.04: Setup with Docker and Helper Scripts"},"content":{"rendered":"<p>In the <a href=\"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-the-rtx-a6000-ada-preparing-for-the-edge-llm-ecosystem\/2255\/\" target=\"_blank\" rel=\"noopener\">previous part<\/a> I described why I&#8217;m tackling TensorRT-LLM on my RTX A6000 Ada: as practical preparation for the Edge-LLM ecosystem, so that later on Jetson Thor and friends I don&#8217;t have to start from scratch. In this part it&#8217;s all about the <strong>concrete installation and configuration<\/strong> on an Ubuntu 24.04 server. So exactly the setup that will later make it possible to build out the build pipeline.<\/p>\n<p>If you&#8217;d like to follow along with this little weekend project, you should have the following prerequisites in place:<\/p>\n<ul>\n<li>Ubuntu as the operating system<\/li>\n<li>NVIDIA driver 545.x or newer<\/li>\n<li>Docker, NVIDIA Container Toolkit<\/li>\n<li>at least 100 GB of free storage for the container image and models.<\/li>\n<\/ul>\n<p>I&#8217;m assuming the base setup is already in place; if not, my own <code>server_setup.sh<\/code>, which I described here (<a href=\"https:\/\/ai-box.eu\/en\/top-story-en\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\/2268\/\" target=\"_blank\" rel=\"noopener\">Preparing an Ubuntu 24.04 server for AI inference: CUDA, Docker, NVIDIA Container Toolkit<\/a>), takes care of the preparation in one go.<\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_83 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle Table of Content\"><span class=\"ez-toc-js-icon-con\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/span><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" 
href=\"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\/2257\/#Why_the_container_route_instead_of_a_native_installation\" >Why the container route instead of a native installation<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\/2257\/#Sanity_check_bringing_your_GPU_to_light\" >Sanity check: bringing your GPU to light<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\/2257\/#Directory_layout\" >Directory layout<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\/2257\/#Setting_up_the_HuggingFace_token\" >Setting up the HuggingFace token<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\/2257\/#Helper_scripts_setup_trtllmsh_and_start_trtllmsh\" >Helper scripts: setup_trtllm.sh and start_trtllm.sh<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\/2257\/#setup_trtllmsh_%E2%80%94_for_the_one-time_installation\" >setup_trtllm.sh \u2014 for the one-time installation<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\/2257\/#start_trtllmsh_%E2%80%94_container_lifecycle\" >start_trtllm.sh \u2014 container lifecycle<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\/2257\/#First_model_test_with_TinyLlama\" >First model test with TinyLlama<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\/2257\/#The_MPI_trap_if_name_main_is_mandatory\" >The MPI trap: if __name__ == '__main__': is mandatory<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\/2257\/#The_two_backends_in_TRT-LLM_1x\" >The two backends in TRT-LLM 1.x<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-11\" href=\"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\/2257\/#Script_inventory_after_part_2\" >Script inventory after part 2<\/a><\/li><\/ul><\/nav><\/div>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Why_the_container_route_instead_of_a_native_installation\"><\/span>Why the container route instead of a native installation<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Before I start 
with the actual installation, a quick word on why I picked the container approach: TensorRT-LLM has very specific dependencies on the CUDA toolkit, cuDNN, the TensorRT version, and a PyTorch build with the matching ABI. The wheels on PyPI are built against PyTorch 2.9.1 with CUDA 13.0. Anyone trying to install this natively quickly ends up in a version swamp, where a single <code>pip install<\/code> tears the whole Python environment apart. Over time, and with quite a few new grey hairs, I decided to go the container route. It simply saves an enormous amount of frustration.<\/p>\n<p>The <strong>NGC release container<\/strong> solves this for you: NVIDIA packages everything that works together into a 20 GB Docker image. CUDA toolkit, cuDNN, TensorRT, PyTorch, TRT-LLM itself, all examples, the <code>trtllm-build<\/code> CLI. Everything we need for this project is inside that container. On your host machine all you need are the drivers, Docker, the NVIDIA Container Toolkit, and a GPU \u2014 ideally an Ada-generation card. The update strategy then boils down to &#8220;switch the image tag&#8221;, and the setup transfers easily from my system to yours if you follow this guide.<\/p>\n<p>The command is then:<\/p>\n<p>Command: <code>docker pull nvcr.io\/nvidia\/tensorrt-llm\/release:1.2.1<\/code><\/p>\n<p>Which tag is currently stable can be checked at <a href=\"https:\/\/catalog.ngc.nvidia.com\/orgs\/nvidia\/teams\/tensorrt-llm\/containers\/release\/tags\" target=\"_blank\" rel=\"noreferrer noopener\">catalog.ngc.nvidia.com<\/a>. Tags like <code>1.3.0rc14<\/code> are release candidates. With an rcXX tag: hands off for production use. Plain tags like <code>1.2.1<\/code> without a suffix are the official releases that we want to use.<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Sanity_check_bringing_your_GPU_to_light\"><\/span>Sanity check: bringing your GPU to light<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Before the 20 GB pull, it&#8217;s worth a quick check that everything is ready. Think back to my blog post about setting up the server:<\/p>\n<pre class=\"wp-block-code\"><code>nvidia-smi                                          # GPU + driver version\r\ndocker --version                                    # Docker installed?\r\ndocker run --rm --gpus all ubuntu:24.04 nvidia-smi  # GPU visible in container?\r\ndf -h \/var\/lib\/docker                               # at least 50 GB free<\/code><\/pre>\n<p>The third command is the important one: it proves that the NVIDIA Container Toolkit is set up correctly and that Docker containers have access to the GPU. If this fails, something in your setup isn&#8217;t right yet.<\/p>\n<p>In my case the first command shows <code>NVIDIA RTX 6000 Ada Generation, 49140 MiB<\/code> \u2014 the professional Ada card with 48 GB. The driver version should be at least 545.x.<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Directory_layout\"><\/span>Directory layout<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Before I start the container, I create a clean directory structure on the host, which I&#8217;ll later mount into the container. Where exactly the data lives is up to you. 
For simplicity I write everything directly onto the root partition at the top level:<\/p>\n<p>Command: <code>sudo mkdir -p \/data\/trtllm\/<\/code><\/p>\n<p>Command: <code>sudo chown -R $USER:$USER \/data\/trtllm<\/code><\/p>\n<p>The reason I created this directory is twofold:<\/p>\n<p><strong>Persistence.<\/strong> When the container is removed or recreated, model downloads and built engines would otherwise be lost. That would be especially annoying with model downloads. A Qwen-7B checkpoint alone is 14 GB, and a re-download takes another few minutes, depending on your connection.<\/p>\n<p><strong>Accessibility from the host.<\/strong> When the engines live on the host filesystem, I can easily inspect them through my operating system&#8217;s file manager, back them up, or push them over to another server with <code>scp<\/code> without even having to start and enter the container. That is exactly the build-vs-deploy pattern that Edge-LLM uses as well.<\/p>\n<p>The container later gets this mount mapping:<\/p>\n<pre class=\"wp-block-preformatted\">\/data\/trtllm (Host)   \u21c4   \/workspace (Container)<\/pre>\n<p>Inside the container everything therefore lives under <code>\/workspace<\/code>. Outside, on the host \u2014 that is, your filesystem \u2014 it lives under <code>\/data\/trtllm<\/code>. Same data, two perspectives.<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Setting_up_the_HuggingFace_token\"><\/span>Setting up the HuggingFace token<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>For most open models (Qwen, Mistral) no HuggingFace token is required. For Llama and Llama derivatives (including the FP8 variants from NVIDIA) you have to accept the respective model provider&#8217;s license once on HuggingFace and create a read token. In my case some of these license acceptances date back a few years already, and I had completely forgotten about them until the HF token was requested.<\/p>\n<p>Note: If you don&#8217;t yet have a token for the models, sign in once at HF and create one under <a href=\"https:\/\/huggingface.co\/settings\/tokens\" target=\"_blank\" rel=\"noreferrer noopener\">huggingface.co\/settings\/tokens<\/a>, then store it persistently:<\/p>\n<pre class=\"wp-block-code\"><code>echo 'export HF_TOKEN=\"hf_xxxxxxxxxxxxxxxxxxxxx\"' &gt;&gt; ~\/.env_trtllm\r\nchmod 600 ~\/.env_trtllm<\/code><\/pre>\n<p>Using a separate file instead of putting it directly into <code>.bashrc<\/code> makes it easier to rotate the token later, and protects against accidentally checking it into a repo. It&#8217;s simply the standard way to manage your keys.<\/p>\n
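<p>If you want to make sure the token is actually picked up, a few lines of Python are enough as an optional check; run them later inside the running container, where the <code>huggingface_hub<\/code> package is assumed to be available anyway as part of the TRT-LLM stack. The snippet is not part of the repository scripts:<\/p>\n<pre class=\"wp-block-code\"><code># Optional check (not part of the repo scripts): is HF_TOKEN set and valid?\r\nimport os\r\nfrom huggingface_hub import whoami\r\n\r\n# HF_TOKEN is expected as an environment variable; the container later\r\n# receives it via -e HF_TOKEN=\"${HF_TOKEN}\" in the docker run command.\r\ntoken = os.environ.get(\"HF_TOKEN\")\r\nif not token:\r\n    raise SystemExit(\"HF_TOKEN is not set in this environment\")\r\n\r\n# whoami() asks the HuggingFace API which account the token belongs to\r\n# and raises an error if the token is invalid.\r\nprint(\"Token belongs to:\", whoami(token=token)[\"name\"])<\/code><\/pre>\n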
<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Helper_scripts_setup_trtllmsh_and_start_trtllmsh\"><\/span>Helper scripts: setup_trtllm.sh and start_trtllm.sh<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Typing out all the required commands in the right order quickly gets unwieldy. That&#8217;s why I built two helper scripts. I needed them myself to find my way through the many stumbling blocks and to always have the commands and their order in front of me:<\/p>\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"setup_trtllmsh_%E2%80%94_for_the_one-time_installation\"><\/span>setup_trtllm.sh \u2014 for the one-time installation<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>The <code>setup_trtllm.sh<\/code> automates the full first run:<\/p>\n<ol class=\"wp-block-list\">\n<li><strong>Sanity check:<\/strong> driver version, GPU detection (with an FP8 note if Ada\/Hopper), Docker, GPU pass-through into the container, disk space<\/li>\n<li><strong>HuggingFace token setup:<\/strong> if not set, interactive prompt, save to <code>~\/.env_trtllm<\/code> with <code>chmod 600<\/code><\/li>\n<li><strong>Directories:<\/strong> create <code>\/data\/trtllm\/<\/code> with the right permissions<\/li>\n<li><strong>Container pull:<\/strong> <code>docker pull<\/code> with the configured version<\/li>\n<\/ol>\n<p>Idempotent: by the time I got to where I wanted to be, I had run this script umpteen times. Meaning: it&#8217;s built in such a way that running it multiple times is safe. If the image is already there, no new pull is attempted, since depending on your internet connection that can take a very long time. If the directories exist, they aren&#8217;t overwritten.<\/p>\n<p>Now please download and run the script so that your server is set up and prepared accordingly.<\/p>\n<p>Here&#8217;s the link to the script on GitHub: <a href=\"https:\/\/github.com\/custom-build-robots\/tensorrt-llm-edge-prep\/tree\/main\/script\" target=\"_blank\" rel=\"noopener\">tensorrt-llm-edge-prep-script<\/a><\/p>\n<p>From here on I assume that all the scripts you download in the course of this post live in your working directory, which in my case is <code>\/data\/trtllm\/<\/code>.<\/p>\n<p>You run the script itself as follows:<\/p>\n<p>Command: <code>chmod +x setup_trtllm.sh<\/code><\/p>\n<p>Command: <code>.\/setup_trtllm.sh<\/code><\/p>\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"start_trtllmsh_%E2%80%94_container_lifecycle\"><\/span>start_trtllm.sh \u2014 container lifecycle<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>The <code>start_trtllm.sh<\/code> is intended for the day-to-day operation of the container and is executed again whenever the container isn&#8217;t running.<\/p>\n<p>Here&#8217;s the link to the script on GitHub: <a href=\"https:\/\/github.com\/custom-build-robots\/tensorrt-llm-edge-prep\/tree\/main\/script\" target=\"_blank\" rel=\"noopener\">tensorrt-llm-edge-prep-script<\/a><\/p>\n<p>Command: <code>chmod +x start_trtllm.sh<\/code><\/p>\n<p>Command: <code>.\/start_trtllm.sh<\/code><\/p>\n<pre class=\"wp-block-code\"><code>.\/start_trtllm.sh           # start detached (default)\r\n.\/start_trtllm.sh shell     # interactive (--rm, bash) for quick tests\r\n.\/start_trtllm.sh exec      # jump into the running container\r\n.\/start_trtllm.sh status    # container status + GPU utilization\r\n.\/start_trtllm.sh logs      # docker logs -f\r\n.\/start_trtllm.sh stop      # stop the container<\/code><\/pre>\n<p><strong>Note:<\/strong> The default starts the container <strong>detached<\/strong> with <code>--restart unless-stopped<\/code>. 
That way it survives server reboots, and I can simply run <code>.\/start_trtllm.sh exec<\/code> to enter the running container whenever I have something to test.<\/p>\n<p>The important Docker run options, briefly explained:<\/p>\n<pre class=\"wp-block-code\"><code>docker run -d \\\r\n  --gpus all \\\r\n  --ipc=host \\\r\n  --ulimit memlock=-1 \\\r\n  --ulimit stack=67108864 \\\r\n  -p 8000:8000 \\\r\n  -v \/data\/trtllm:\/workspace \\\r\n  -e HF_TOKEN=\"${HF_TOKEN}\" \\\r\n  -e HF_HOME=\/workspace\/cache \\\r\n  --name trtllm \\\r\n  --restart unless-stopped \\\r\n  nvcr.io\/nvidia\/tensorrt-llm\/release:1.2.1 \\\r\n  sleep infinity<\/code><\/pre>\n<p><code>--ipc=host<\/code> and the <code>--ulimit<\/code> options are important for MPI communication between TRT-LLM workers \u2014 even for single-GPU inference, TRT-LLM uses MPI internally. More on that in a moment. The port <code>-p 8000:8000<\/code> is for later (when we start <code>trtllm-serve<\/code> as an OpenAI-compatible server). <code>HF_HOME<\/code> inside the volume makes the model cache persistent, so that you don&#8217;t have to re-download the models from HF every time.<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"First_model_test_with_TinyLlama\"><\/span>First model test with TinyLlama<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>After the first <code>.\/setup_trtllm.sh<\/code> and <code>.\/start_trtllm.sh<\/code> I&#8217;m inside the container and can verify the setup. To do so, please run the following command:<\/p>\n<p>Command: <code>.\/start_trtllm.sh exec <\/code><\/p>\n<pre class=\"wp-block-code\"><code>python3 -c \"import tensorrt_llm; print(tensorrt_llm.__version__)\"\r\n# 1.2.1\r\nnvidia-smi\r\n# shows the A6000 Ada with 48 GB<\/code><\/pre>\n<p>You should now see something like the following in your terminal window.<\/p>\n<div id=\"attachment_2232\" style=\"width: 791px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/Tensor_RT_LLM_container_test.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2232\" class=\"size-full wp-image-2232\" src=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/Tensor_RT_LLM_container_test.jpg\" alt=\"Tensor RT LLM - container test\" width=\"781\" height=\"418\" srcset=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/Tensor_RT_LLM_container_test.jpg 781w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/Tensor_RT_LLM_container_test-300x161.jpg 300w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/Tensor_RT_LLM_container_test-768x411.jpg 768w\" sizes=\"(max-width: 781px) 100vw, 781px\" \/><\/a><p id=\"caption-attachment-2232\" class=\"wp-caption-text\">Tensor RT LLM &#8211; container test<\/p><\/div>\n<p>As a smoke test we&#8217;ll now run a tiny model \u2014 namely TinyLlama with 1.1 billion parameters \u2014 inside the container. It&#8217;s about 2 GB in size and has to be downloaded. For this to work you now have to switch into the working directory &#8220;workspace&#8221; inside the container.<\/p>\n<p>Command: <code>cd \/workspace<\/code><\/p>\n<p>To be able to run the smoke test you also need the smoke.py Python program. 
You can find it here.<\/p>\n<p>Download: <a href=\"https:\/\/github.com\/custom-build-robots\/tensorrt-llm-edge-prep\/tree\/main\/script\" target=\"_blank\" rel=\"noopener\">tensorrt-llm-edge-prep-script<\/a><\/p>\n<p>Then execute the smoke.py Python file.<\/p>\n<p>Command: <code>python smoke.py<\/code><\/p>\n<p>On the first run of the script the model is downloaded, which is about 2 GB. Then it takes about a minute for the model init, after which two generated sentences are produced \u2014 anyone who has looked at the script will certainly have noticed: TinyLlama practically doesn&#8217;t speak German. So the outputs are often nonsense (&#8220;Peter und bin 25 Jahre alt\u2026&#8221;). That&#8217;s okay, the smoke test checks the <strong>pipeline<\/strong>, not the model quality. All we wanted to know was whether everything works.<\/p>\n<p>If the script ran through successfully, it prints the following success message at the end:<\/p>\n<p><code>=== Smoke Test erfolgreich abgeschlossen ===<\/code> (German for &#8220;smoke test completed successfully&#8221;)<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"The_MPI_trap_if_name_main_is_mandatory\"><\/span>The MPI trap: <code>if __name__ == '__main__':<\/code> is mandatory<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>One stumbling block I got stuck on during my first attempt was that TensorRT-LLM internally uses <strong>MPI<\/strong> (Message Passing Interface) for its worker processes. MPI is used even for single-GPU inference. That has consequences for any Python script that uses the <code>LLM<\/code> API directly.<\/p>\n<p>If you write the code the naive way:<\/p>\n<pre class=\"wp-block-code\"><code># WRONG \u2014 doesn't work!\r\nfrom tensorrt_llm import LLM, SamplingParams\r\n\r\nllm = LLM(model=\"TinyLlama\/TinyLlama-1.1B-Chat-v1.0\")\r\nsp = SamplingParams(max_tokens=64)\r\n# ...<\/code><\/pre>\n<p>\u2026you get an abrupt abort when running it:<\/p>\n<pre class=\"wp-block-code\"><code>The main script or module attempted to spawn new MPI worker processes.\r\nThis probably means that you have forgotten to use the proper idiom...\r\n    if __name__ == '__main__':<\/code><\/pre>\n<p>The reason: when the <code>LLM<\/code> object is initialized, TRT-LLM starts worker processes via <code>multiprocessing<\/code> in spawn mode. If the main script isn&#8217;t protected by <code>if __name__ == '__main__':<\/code>, the workers try to re-execute the entire module code (including the <code>LLM(...)<\/code> call), which in turn leads to an infinite spawn loop. Fortunately, the Python MPI bindings detect this behavior and abort the process directly.<\/p>\n<p>The fix was quickly done: wrap the <code>LLM<\/code> code in a <code>main()<\/code> function and call it behind the guard:<\/p>\n<pre class=\"wp-block-code\"><code>def main():\r\n    llm = LLM(...)\r\n    # ...\r\n\r\nif __name__ == '__main__':\r\n    main()<\/code><\/pre>\n
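<p>Putting the pieces together, a minimal guarded test script looks roughly like the sketch below. It follows the same pattern as NVIDIA&#8217;s LLM API quickstart examples and is only meant as an illustration, not as the <code>smoke.py<\/code> from the repository; model, prompt and sampling values are just examples:<\/p>\n<pre class=\"wp-block-code\"><code># Sketch of a guarded LLM API test (illustration only, not the repository's smoke.py)\r\nfrom tensorrt_llm import LLM, SamplingParams\r\n\r\ndef main():\r\n    # Any HF model id works the same way; TinyLlama keeps the download small.\r\n    llm = LLM(model=\"TinyLlama\/TinyLlama-1.1B-Chat-v1.0\")\r\n    sp = SamplingParams(max_tokens=64)\r\n\r\n    # generate() returns one result object per prompt.\r\n    for result in llm.generate([\"The capital of France is\"], sp):\r\n        print(result.outputs[0].text)\r\n\r\nif __name__ == '__main__':\r\n    main()<\/code><\/pre>\n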
<p>That isn&#8217;t TRT-LLM-specific, it&#8217;s standard Python for anything that uses <code>multiprocessing<\/code>. But if you&#8217;re working through the topic for the first time, like me, and come from the Ollama-style user perspective, it&#8217;s easy to run into.<\/p>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"The_two_backends_in_TRT-LLM_1x\"><\/span>The two backends in TRT-LLM 1.x<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>While reading the logs, the following line also caught my eye, and I had to read up on it a little more closely.<\/p>\n<pre class=\"wp-block-code\"><code>[TRT-LLM] [I] Using LLM with PyTorch backend<\/code><\/pre>\n<p>Meaning: TRT-LLM 1.x has <strong>two parallel backends<\/strong>, and the default API now uses the <strong>PyTorch backend<\/strong>, which doesn&#8217;t build a TensorRT engine at all. Instead it runs the model with PyTorch + optimized kernels \u2014 similar to <a href=\"https:\/\/github.com\/vllm-project\/vllm\" target=\"_blank\" rel=\"noreferrer noopener\">vLLM<\/a>. Quick to set up, good for iteration, but not a deployable engine artifact. That&#8217;s why I never found a file ending in <code>.engine<\/code>.<\/p>\n<p>The <strong>classical TensorRT backend<\/strong> \u2014 which is the one we want, the one Edge-LLM uses, and the one that actually produces an <code>.engine<\/code> file \u2014 can be obtained through a different import:<\/p>\n<pre class=\"wp-block-code\"><code>from tensorrt_llm._tensorrt_engine import LLM   # TensorRT backend\r\n# vs.\r\nfrom tensorrt_llm import LLM                     # PyTorch backend (default)<\/code><\/pre>\n<p>That is an architectural fork I hadn&#8217;t noticed, and I really had to dig a bit for it. For the goal I&#8217;ve set myself \u2014 playing through the Edge-LLM-equivalent pipeline on the A6000 Ada \u2014 the TensorRT backend is exactly the right one. For quick experiments the PyTorch backend is enough.<\/p>\n
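<p>For orientation, this is roughly how the import swap slots into the same guarded pattern. I&#8217;m assuming here that the classical backend exposes the same <code>generate()<\/code> surface as the default one, so treat the snippet as a sketch for quick experiments; the proper engine builds with <code>convert_checkpoint.py<\/code> and <code>trtllm-build<\/code> follow in the next part:<\/p>\n<pre class=\"wp-block-code\"><code># Sketch only: the same guarded pattern with the classical TensorRT backend import.\r\nfrom tensorrt_llm._tensorrt_engine import LLM   # instead of: from tensorrt_llm import LLM\r\nfrom tensorrt_llm import SamplingParams\r\n\r\ndef main():\r\n    # With this import TRT-LLM builds a real TensorRT engine under the hood,\r\n    # so initialization takes noticeably longer than with the PyTorch backend.\r\n    llm = LLM(model=\"TinyLlama\/TinyLlama-1.1B-Chat-v1.0\")\r\n    for result in llm.generate([\"Hello\"], SamplingParams(max_tokens=32)):\r\n        print(result.outputs[0].text)\r\n\r\nif __name__ == '__main__':\r\n    main()<\/code><\/pre>\n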
<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Script_inventory_after_part_2\"><\/span>Script inventory after part 2<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>After this part, you should \u2014 like me \u2014 have the following three scripts sitting on your server, all of which you&#8217;ve successfully executed:<\/p>\n<ul class=\"wp-block-list\">\n<li><code>\/data\/trtllm\/setup_trtllm.sh<\/code> \u2014 one-time setup<\/li>\n<li><code>\/data\/trtllm\/start_trtllm.sh<\/code> \u2014 container lifecycle<\/li>\n<li><code>\/data\/trtllm\/smoke.py<\/code> \u2014 validation test with TinyLlama<\/li>\n<\/ul>\n<p>Inside the container these scripts are visible \u2014 thanks to the persistent mount to your local filesystem \u2014 as <code>\/workspace\/setup_trtllm.sh<\/code>, <code>\/workspace\/start_trtllm.sh<\/code>, <code>\/workspace\/smoke.py<\/code>. The scripts are freely available under the MIT license and uploaded to GitHub.<\/p>\n<p>In the next part we get to the heart of it: the two-stage <strong>build pipeline<\/strong> with <code>convert_checkpoint.py<\/code> and <code>trtllm-build<\/code>, custom build scripts for FP16 and FP8, and the story of how my first FP8 engine turned into a perfectly fast, completely unreadable token-salad monster.<\/p>\n<br>\r\n\t<br>\r\n<h2>Article overview - TensorRT-LLM on the RTX A6000 Ada:<\/h2>\r\n<a title=\"Preparing an Ubuntu 24.04 Server for AI Inference: CUDA, Docker, NVIDIA Container Toolkit\" href=\"https:\/\/ai-box.eu\/en\/top-story-en\/preparing-an-ubuntu-24-04-server-for-ai-inference-cuda-docker-nvidia-container-toolkit\/2268\/\">Preparing an Ubuntu 24.04 Server for AI Inference: CUDA, Docker, NVIDIA Container Toolkit<\/a><br>\r\n<a title=\"TensorRT-LLM on the RTX A6000 Ada: Preparing for the Edge-LLM Ecosystem\" href=\"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-the-rtx-a6000-ada-preparing-for-the-edge-llm-ecosystem\/2255\/\">TensorRT-LLM on the RTX A6000 Ada: Preparing for the Edge-LLM Ecosystem<\/a><br>\r\n<a title=\"TensorRT-LLM on Ubuntu 24.04: Setup with Docker and Helper Scripts\" href=\"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\/2257\/\">TensorRT-LLM on Ubuntu 24.04: Setup with Docker and Helper Scripts<\/a><br>\r\n<a title=\"TensorRT-LLM Pipeline: Building Persistent Engines with FP16 and FP8\" href=\"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-pipeline-building-persistent-engines-with-fp16-and-fp8\/2262\/\">TensorRT-LLM Pipeline: Building Persistent Engines with FP16 and FP8<\/a><br>\r\n<a title=\"TensorRT-LLM in Numbers: FP16 vs. FP8 on the RTX A6000 Ada\" href=\"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-in-numbers-fp16-vs-fp8-on-the-rtx-a6000-ada\/2266\/\">TensorRT-LLM in Numbers: FP16 vs. FP8 on the RTX A6000 Ada<\/a><br>\r\n\t\r\n\t<br>\r\n\t<br>\n","protected":false},"excerpt":{"rendered":"<p>In the previous part I described why I&#8217;m tackling TensorRT-LLM on my RTX A6000 Ada: as practical preparation for the Edge-LLM ecosystem, so that later on Jetson Thor and friends I don&#8217;t have to start from scratch. In this part it&#8217;s all about the concrete installation and configuration on an Ubuntu 24.04 server. 
So exactly [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":2233,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_et_pb_use_builder":"","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":""},"categories":[162,50],"tags":[1187,1197,1192,1194,1188,1199,1195,1176,1189,1190,1196,1175,1191,1193,1198],"class_list":["post-2257","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-large-language-models-en","category-top-story-en","tag-docker-setup","tag-helper-scripts","tag-huggingface-token","tag-mpi-worker","tag-ngc-container","tag-nvidia-container-toolkit","tag-pytorch-backend","tag-rtx-a6000-ada","tag-setup_trtllm-sh","tag-start_trtllm-sh","tag-tensorrt-backend","tag-tensorrt-llm","tag-tensorrt-llm-installation","tag-tinyllama-smoke-test","tag-ubuntu-24-04","et-has-post-format-content","et_post_format-et-post-format-standard"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.6 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>TensorRT-LLM on Ubuntu 24.04: Setup with Docker and Helper Scripts - Exploring the Future: Inside the AI Box<\/title>\n<meta name=\"description\" content=\"Step-by-step TensorRT-LLM Docker setup on Ubuntu 24.04 with NGC container, helper scripts and TinyLlama smoke test \u2014 Part 2 of the series.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\/2257\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"TensorRT-LLM on Ubuntu 24.04: Setup with Docker and Helper Scripts - Exploring the Future: Inside the AI Box\" \/>\n<meta property=\"og:description\" content=\"Step-by-step TensorRT-LLM Docker setup on Ubuntu 24.04 with NGC container, helper scripts and TinyLlama smoke test \u2014 Part 2 of the series.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\/2257\/\" \/>\n<meta property=\"og:site_name\" content=\"Exploring the Future: Inside the AI Box\" \/>\n<meta property=\"article:published_time\" content=\"2026-05-16T04:48:13+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-05-16T11:29:19+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/Tensor_RT_LLM_container_test.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"781\" \/>\n\t<meta property=\"og:image:height\" content=\"418\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Maker\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Ingmar_Stapel\" \/>\n<meta name=\"twitter:site\" content=\"@Ingmar_Stapel\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Maker\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"11 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/top-story-en\\\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\\\/2257\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/top-story-en\\\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\\\/2257\\\/\"},\"author\":{\"name\":\"Maker\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#\\\/schema\\\/person\\\/cc91d08618b3feeef6926591b465eab1\"},\"headline\":\"TensorRT-LLM on Ubuntu 24.04: Setup with Docker and Helper Scripts\",\"datePublished\":\"2026-05-16T04:48:13+00:00\",\"dateModified\":\"2026-05-16T11:29:19+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/top-story-en\\\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\\\/2257\\\/\"},\"wordCount\":1880,\"commentCount\":0,\"image\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/top-story-en\\\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\\\/2257\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/05\\\/Tensor_RT_LLM_container_test.jpg\",\"keywords\":[\"Docker setup\",\"helper scripts\",\"HuggingFace token\",\"MPI worker\",\"NGC container\",\"NVIDIA Container Toolkit\",\"PyTorch backend\",\"RTX A6000 Ada\",\"setup_trtllm.sh\",\"start_trtllm.sh\",\"TensorRT backend\",\"TensorRT-LLM\",\"TensorRT-LLM installation\",\"TinyLlama smoke test\",\"Ubuntu 24.04\"],\"articleSection\":[\"Large Language Models\",\"Top story\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/ai-box.eu\\\/en\\\/top-story-en\\\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\\\/2257\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/top-story-en\\\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\\\/2257\\\/\",\"url\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/top-story-en\\\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\\\/2257\\\/\",\"name\":\"TensorRT-LLM on Ubuntu 24.04: Setup with Docker and Helper Scripts - Exploring the Future: Inside the AI Box\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/top-story-en\\\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\\\/2257\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/top-story-en\\\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\\\/2257\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/05\\\/Tensor_RT_LLM_container_test.jpg\",\"datePublished\":\"2026-05-16T04:48:13+00:00\",\"dateModified\":\"2026-05-16T11:29:19+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#\\\/schema\\\/person\\\/cc91d08618b3feeef6926591b465eab1\"},\"description\":\"Step-by-step TensorRT-LLM Docker setup on Ubuntu 24.04 with NGC container, helper scripts and TinyLlama smoke test \u2014 Part 2 of the 
series.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/top-story-en\\\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\\\/2257\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/ai-box.eu\\\/en\\\/top-story-en\\\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\\\/2257\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/top-story-en\\\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\\\/2257\\\/#primaryimage\",\"url\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/05\\\/Tensor_RT_LLM_container_test.jpg\",\"contentUrl\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/05\\\/Tensor_RT_LLM_container_test.jpg\",\"width\":781,\"height\":418,\"caption\":\"Tensor RT LLM - container test\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/top-story-en\\\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\\\/2257\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Start\",\"item\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"TensorRT-LLM on Ubuntu 24.04: Setup with Docker and Helper Scripts\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#website\",\"url\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/\",\"name\":\"Exploring the Future: Inside the AI Box\",\"description\":\"Inside the AI Box, we share our experiences and discoveries in the world of artificial intelligence.\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#\\\/schema\\\/person\\\/cc91d08618b3feeef6926591b465eab1\",\"name\":\"Maker\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g\",\"caption\":\"Maker\"},\"description\":\"I live in Bavaria near Munich. In my head I always have many topics and try out especially in the field of Internet new media much in my spare time. I write on the blog because it makes me fun to report about the things that inspire me. I am happy about every comment, about suggestion and very about questions.\",\"sameAs\":[\"https:\\\/\\\/ai-box.eu\"],\"url\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/author\\\/ingmars\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"TensorRT-LLM on Ubuntu 24.04: Setup with Docker and Helper Scripts - Exploring the Future: Inside the AI Box","description":"Step-by-step TensorRT-LLM Docker setup on Ubuntu 24.04 with NGC container, helper scripts and TinyLlama smoke test \u2014 Part 2 of the series.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\/2257\/","og_locale":"en_US","og_type":"article","og_title":"TensorRT-LLM on Ubuntu 24.04: Setup with Docker and Helper Scripts - Exploring the Future: Inside the AI Box","og_description":"Step-by-step TensorRT-LLM Docker setup on Ubuntu 24.04 with NGC container, helper scripts and TinyLlama smoke test \u2014 Part 2 of the series.","og_url":"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\/2257\/","og_site_name":"Exploring the Future: Inside the AI Box","article_published_time":"2026-05-16T04:48:13+00:00","article_modified_time":"2026-05-16T11:29:19+00:00","og_image":[{"width":781,"height":418,"url":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/Tensor_RT_LLM_container_test.jpg","type":"image\/jpeg"}],"author":"Maker","twitter_card":"summary_large_image","twitter_creator":"@Ingmar_Stapel","twitter_site":"@Ingmar_Stapel","twitter_misc":{"Written by":"Maker","Est. reading time":"11 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\/2257\/#article","isPartOf":{"@id":"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\/2257\/"},"author":{"name":"Maker","@id":"https:\/\/ai-box.eu\/en\/#\/schema\/person\/cc91d08618b3feeef6926591b465eab1"},"headline":"TensorRT-LLM on Ubuntu 24.04: Setup with Docker and Helper Scripts","datePublished":"2026-05-16T04:48:13+00:00","dateModified":"2026-05-16T11:29:19+00:00","mainEntityOfPage":{"@id":"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\/2257\/"},"wordCount":1880,"commentCount":0,"image":{"@id":"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\/2257\/#primaryimage"},"thumbnailUrl":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/Tensor_RT_LLM_container_test.jpg","keywords":["Docker setup","helper scripts","HuggingFace token","MPI worker","NGC container","NVIDIA Container Toolkit","PyTorch backend","RTX A6000 Ada","setup_trtllm.sh","start_trtllm.sh","TensorRT backend","TensorRT-LLM","TensorRT-LLM installation","TinyLlama smoke test","Ubuntu 24.04"],"articleSection":["Large Language Models","Top story"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\/2257\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\/2257\/","url":"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\/2257\/","name":"TensorRT-LLM on Ubuntu 24.04: Setup with Docker and Helper Scripts - Exploring the Future: Inside the AI 
Box","isPartOf":{"@id":"https:\/\/ai-box.eu\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\/2257\/#primaryimage"},"image":{"@id":"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\/2257\/#primaryimage"},"thumbnailUrl":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/Tensor_RT_LLM_container_test.jpg","datePublished":"2026-05-16T04:48:13+00:00","dateModified":"2026-05-16T11:29:19+00:00","author":{"@id":"https:\/\/ai-box.eu\/en\/#\/schema\/person\/cc91d08618b3feeef6926591b465eab1"},"description":"Step-by-step TensorRT-LLM Docker setup on Ubuntu 24.04 with NGC container, helper scripts and TinyLlama smoke test \u2014 Part 2 of the series.","breadcrumb":{"@id":"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\/2257\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\/2257\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\/2257\/#primaryimage","url":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/Tensor_RT_LLM_container_test.jpg","contentUrl":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/05\/Tensor_RT_LLM_container_test.jpg","width":781,"height":418,"caption":"Tensor RT LLM - container test"},{"@type":"BreadcrumbList","@id":"https:\/\/ai-box.eu\/en\/top-story-en\/tensorrt-llm-on-ubuntu-24-04-setup-with-docker-and-helper-scripts\/2257\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Start","item":"https:\/\/ai-box.eu\/en\/"},{"@type":"ListItem","position":2,"name":"TensorRT-LLM on Ubuntu 24.04: Setup with Docker and Helper Scripts"}]},{"@type":"WebSite","@id":"https:\/\/ai-box.eu\/en\/#website","url":"https:\/\/ai-box.eu\/en\/","name":"Exploring the Future: Inside the AI Box","description":"Inside the AI Box, we share our experiences and discoveries in the world of artificial intelligence.","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ai-box.eu\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/ai-box.eu\/en\/#\/schema\/person\/cc91d08618b3feeef6926591b465eab1","name":"Maker","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g","caption":"Maker"},"description":"I live in Bavaria near Munich. In my head I always have many topics and try out especially in the field of Internet new media much in my spare time. I write on the blog because it makes me fun to report about the things that inspire me. 
I am happy about every comment, about suggestion and very about questions.","sameAs":["https:\/\/ai-box.eu"],"url":"https:\/\/ai-box.eu\/en\/author\/ingmars\/"}]}},"_links":{"self":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts\/2257","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/comments?post=2257"}],"version-history":[{"count":2,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts\/2257\/revisions"}],"predecessor-version":[{"id":2271,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts\/2257\/revisions\/2271"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/media\/2233"}],"wp:attachment":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/media?parent=2257"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/categories?post=2257"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/tags?post=2257"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}