{"id":1803,"date":"2025-11-28T19:01:54","date_gmt":"2025-11-28T19:01:54","guid":{"rendered":"https:\/\/ai-box.eu\/?p=1803"},"modified":"2025-11-28T19:22:26","modified_gmt":"2025-11-28T19:22:26","slug":"how-to-scale-ollama-with-two-or-more-gpus","status":"publish","type":"post","link":"https:\/\/ai-box.eu\/en\/large-language-models-en\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\/how-to-scale-ollama-with-two-or-more-gpus\/1803\/","title":{"rendered":"How to Scale Ollama with Two or More GPUs \u2013 Parallel Instances for Maximum Performance"},"content":{"rendered":"<p data-path-to-node=\"3\">Anyone operating a powerful server with two NVIDIA GPUs (such as two RTX A6000s) quickly encounters a problem with the standard installation of <b>Ollama<\/b>: Ollama manages requests in a queue by default. Even with 96 GB of VRAM, parallel jobs often wait for each other instead of being calculated simultaneously.<\/p>\n<p data-path-to-node=\"4\">The solution? We run <b>two separate Ollama instances<\/b>!<\/p>\n<p data-path-to-node=\"5\">In this tutorial, I will show you how to configure Ollama so that Instance A runs exclusively on GPU 0 (Port 11434) and Instance B exclusively on GPU 1 (Port 11435). The result: True parallelism and clean memory separation.<\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_82_2 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle Table of Content\"><span class=\"ez-toc-js-icon-con\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/span><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/ai-box.eu\/en\/large-language-models-en\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\/how-to-scale-ollama-with-two-or-more-gpus\/1803\/#Prerequisites\" >Prerequisites<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/ai-box.eu\/en\/large-language-models-en\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\/how-to-scale-ollama-with-two-or-more-gpus\/1803\/#Step_1_Identify_GPUs\" >Step 1: Identify GPUs<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" 
href=\"https:\/\/ai-box.eu\/en\/large-language-models-en\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\/how-to-scale-ollama-with-two-or-more-gpus\/1803\/#Step_2_Restrict_the_First_Instance_GPU_0\" >Step 2: Restrict the First Instance (GPU 0)<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/ai-box.eu\/en\/large-language-models-en\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\/how-to-scale-ollama-with-two-or-more-gpus\/1803\/#Step_3_Create_the_Second_Instance_GPU_1\" >Step 3: Create the Second Instance (GPU 1)<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/ai-box.eu\/en\/large-language-models-en\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\/how-to-scale-ollama-with-two-or-more-gpus\/1803\/#Step_4_Activation\" >Step 4: Activation<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/ai-box.eu\/en\/large-language-models-en\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\/how-to-scale-ollama-with-two-or-more-gpus\/1803\/#Step_5_Testing_Both_Ollama_Instances\" >Step 5: Testing Both Ollama Instances<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/ai-box.eu\/en\/large-language-models-en\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\/how-to-scale-ollama-with-two-or-more-gpus\/1803\/#Pro_Tip_for_Verification\" >Pro Tip for Verification<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/ai-box.eu\/en\/large-language-models-en\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\/how-to-scale-ollama-with-two-or-more-gpus\/1803\/#Conclusion\" >Conclusion<\/a><\/li><\/ul><\/nav><\/div>\n<h2><span class=\"ez-toc-section\" id=\"Prerequisites\"><\/span>Prerequisites<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<ul data-path-to-node=\"7\">\n<li>\n<p data-path-to-node=\"7,0,0\">An Ubuntu server with installed NVIDIA drivers.<\/p>\n<\/li>\n<li>\n<p data-path-to-node=\"7,1,0\">Two NVIDIA GPUs.<\/p>\n<\/li>\n<li>\n<p data-path-to-node=\"7,2,0\">Ollama is already installed (standard installation).<\/p>\n<\/li>\n<li>\n<p data-path-to-node=\"7,3,0\">Optional: A dedicated path for models (to save storage space).<\/p>\n<\/li>\n<\/ul>\n<h2><span class=\"ez-toc-section\" id=\"Step_1_Identify_GPUs\"><\/span>Step 1: Identify GPUs<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p data-path-to-node=\"10\">First, let&#8217;s get an overview of the hardware IDs.<\/p>\n<p style=\"padding-left: 40px;\" data-path-to-node=\"10\"><strong>Command: <\/strong><code>nvidia-smi<\/code><\/p>\n<p data-path-to-node=\"12\">Usually, the GPUs are numbered from <code>0<\/code> to <code>1<\/code>. 
## Step 2: Restrict the First Instance (GPU 0)

The standard Ollama service (`ollama.service`) often tries to grab all resources. We force it onto the first card using a systemd override.

Run the following command:

```bash
sudo systemctl edit ollama.service
```

Paste the following content into the editor. This overrides the environment variables without touching the original unit file:

```ini
[Service]
# Make only GPU 0 visible
Environment="CUDA_VISIBLE_DEVICES=0"
# Explicitly set the standard port (optional, but clean)
Environment="OLLAMA_HOST=0.0.0.0:11434"
# Optional: if you have a custom model path
Environment="OLLAMA_MODELS=/mnt/your/path/to/models"
```

![Ollama multi instance set variables](https://ai-box.eu/wp-content/uploads/2025/11/Ollama_multi_instance_02.jpg)
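Before moving on, you can confirm that systemd actually picked up the drop-in. A quick check using standard `systemctl` subcommands, nothing Ollama-specific:

```bash
# Show the unit together with all drop-in overrides;
# the [Service] lines you just pasted should appear at the bottom.
systemctl cat ollama.service

# Alternatively, inspect the effective environment of the unit.
systemctl show ollama.service -p Environment
```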
id=\"Step_3_Create_the_Second_Instance_GPU_1\"><\/span>Step 3: Create the Second Instance (GPU 1)<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p data-path-to-node=\"24\">For the second GPU, we create a completely new service. We basically copy the logic of the first one, but change the port and the GPU ID.<\/p>\n<p data-path-to-node=\"25\">Create the file <code>\/etc\/systemd\/system\/ollama-2.service<\/code>:<\/p>\n<p style=\"padding-left: 40px;\" data-path-to-node=\"25\"><strong>Command: <\/strong><code>sudo nano \/etc\/systemd\/system\/ollama-2.service<\/code><\/p>\n<p><code>[Unit]<\/code><br \/>\n<code>Description=Ollama Service (GPU 1 - Port 11435)<\/code><br \/>\n<code>After=network-online.target<\/code><\/p>\n<p><code>[Service]<\/code><br \/>\n<code>ExecStart=\/usr\/local\/bin\/ollama serve<\/code><br \/>\n<code>User=ollama<\/code><br \/>\n<code>Group=ollama<\/code><br \/>\n<code>Type=simple<\/code><br \/>\n<code>Restart=always<\/code><br \/>\n<code>RestartSec=3<\/code><\/p>\n<p><code># IMPORTANT: New port and second GPU<\/code><br \/>\n<code>Environment=\"OLLAMA_HOST=0.0.0.0:11435\"<\/code><br \/>\n<code>Environment=\"CUDA_VISIBLE_DEVICES=1\"<\/code><\/p>\n<p><code># IMPORTANT: Use the same model path as Instance 1!<\/code><br \/>\n<code># This way, models only need to be downloaded once.<\/code><br \/>\n<code>Environment=\"OLLAMA_MODELS=\/mnt\/your\/path\/to\/models\"<\/code><\/p>\n<p><code># Optional: Path variable (if necessary)<\/code><br \/>\n<code>Environment=\"PATH=\/usr\/local\/sbin:\/usr\/local\/bin:\/usr\/sbin:\/usr\/bin:\/sbin:\/bin\"<\/code><\/p>\n<p><code>[Install]<\/code><br \/>\n<code>WantedBy=default.target<\/code><\/p>\n<p><b>Note:<\/b> Make sure that the path for <code>OLLAMA_MODELS<\/code> is identical in both services so that the instances share the storage space.<\/p>\n<div id=\"attachment_1793\" style=\"width: 959px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2025\/11\/Ollama_multi_instance_03.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-1793\" class=\"wp-image-1793 size-full\" src=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2025\/11\/Ollama_multi_instance_03.jpg\" alt=\"Ollama second instance set variables\" width=\"949\" height=\"514\" srcset=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2025\/11\/Ollama_multi_instance_03.jpg 949w, https:\/\/ai-box.eu\/wp-content\/uploads\/2025\/11\/Ollama_multi_instance_03-300x162.jpg 300w, https:\/\/ai-box.eu\/wp-content\/uploads\/2025\/11\/Ollama_multi_instance_03-768x416.jpg 768w\" sizes=\"(max-width: 949px) 100vw, 949px\" \/><\/a><p id=\"caption-attachment-1793\" class=\"wp-caption-text\">Ollama second instance set variables<\/p><\/div>\n<h2><span class=\"ez-toc-section\" id=\"Step_4_Activation\"><\/span>Step 4: Activation<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p data-path-to-node=\"32\">Now we need to reload Systemd and start the services.<\/p>\n<p><code># Reload Systemd configs<\/code><br \/>\n<code>sudo systemctl daemon-reload<\/code><\/p>\n<p><code># Restart first instance (so it releases GPU 1)<\/code><br \/>\n<code>sudo systemctl restart ollama<\/code><\/p>\n<p><code># Enable second instance (autostart) and start<\/code><br \/>\n<code>sudo systemctl enable ollama-2<\/code><br \/>\n<code>sudo systemctl start ollama-2<\/code><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Step_5_Testing_Both_Ollama_Instances\"><\/span>Step 5: Testing Both Ollama Instances<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p 
data-path-to-node=\"36\">Are both instances running?<\/p>\n<p style=\"padding-left: 40px;\" data-path-to-node=\"36\"><strong>Command:<\/strong> <code>sudo systemctl status ollama ollama-2<\/code><\/p>\n<p data-path-to-node=\"38\">If both are green (<code>active<\/code>), you can test them.<\/p>\n<div id=\"attachment_1796\" style=\"width: 1034px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2025\/11\/Ollama_multi_instance_04-1024x646.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-1796\" class=\"wp-image-1796 size-large\" src=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2025\/11\/Ollama_multi_instance_04-1024x646.jpg\" alt=\"Ollama check if all instances are running\" width=\"1024\" height=\"646\" srcset=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2025\/11\/Ollama_multi_instance_04-1024x646.jpg 1024w, https:\/\/ai-box.eu\/wp-content\/uploads\/2025\/11\/Ollama_multi_instance_04-300x189.jpg 300w, https:\/\/ai-box.eu\/wp-content\/uploads\/2025\/11\/Ollama_multi_instance_04-768x485.jpg 768w, https:\/\/ai-box.eu\/wp-content\/uploads\/2025\/11\/Ollama_multi_instance_04-1080x682.jpg 1080w, https:\/\/ai-box.eu\/wp-content\/uploads\/2025\/11\/Ollama_multi_instance_04.jpg 1245w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/a><p id=\"caption-attachment-1796\" class=\"wp-caption-text\">Ollama check if all instances are running<\/p><\/div>\n<p data-path-to-node=\"39\"><b>Address Instance 1 (GPU 0):<\/b> As normal via the standard command:<\/p>\n<p style=\"padding-left: 40px;\" data-path-to-node=\"39\"><strong>Command:<\/strong> <code>ollama run llama3<\/code><\/p>\n<p data-path-to-node=\"39\"><b>Address Instance 2 (GPU 1):<\/b> Here we pass the host\/port along:<\/p>\n<p style=\"padding-left: 40px;\" data-path-to-node=\"39\"><strong>Command:<\/strong> <code>OLLAMA_HOST=127.0.0.1:11435 ollama run llama3<\/code><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Pro_Tip_for_Verification\"><\/span>Pro Tip for Verification<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p data-path-to-node=\"44\">Open a second terminal and start <code>watch -n 0.5 nvidia-smi<\/code>. If you now send requests to the different ports, you will see how the VRAM and GPU load are distributed exactly to the card you addressed.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Conclusion\"><\/span>Conclusion<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p data-path-to-node=\"46\">With just a few lines of configuration, you have transformed your server into a parallel inference cluster. You can now run agents, RAG pipelines, or training jobs clearly separated from each other without them blocking one another. Another advantage is that you can now specifically distribute the load across the two or more GPUs in your system.<\/p>\n<p data-path-to-node=\"47\">Good luck with the setup!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Anyone operating a powerful server with two NVIDIA GPUs (such as two RTX A6000s) quickly encounters a problem with the standard installation of Ollama: Ollama manages requests in a queue by default. Even with 96 GB of VRAM, parallel jobs often wait for each other instead of being calculated simultaneously. The solution? 
We run two [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1802,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_et_pb_use_builder":"","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":""},"categories":[567,50],"tags":[393,306],"class_list":["post-1803","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-empowering-process-automation-with-n8n-ollama-and-open-source-llms","category-top-story-en","tag-cuda-en","tag-ollama-en","et-has-post-format-content","et_post_format-et-post-format-standard"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>How to Scale Ollama with Two or More GPUs \u2013 Parallel Instances for Maximum Performance - Exploring the Future: Inside the AI Box<\/title>\n<meta name=\"description\" content=\"Ollama with two GPUs in parallelUnlock Maximum Performance: Setting Up Ollama for Dual NVIDIA GPUs. Includes Separate Instances, Custom Ports, and Real Concurrency.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/ai-box.eu\/en\/large-language-models-en\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\/how-to-scale-ollama-with-two-or-more-gpus\/1803\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How to Scale Ollama with Two or More GPUs \u2013 Parallel Instances for Maximum Performance - Exploring the Future: Inside the AI Box\" \/>\n<meta property=\"og:description\" content=\"Ollama with two GPUs in parallelUnlock Maximum Performance: Setting Up Ollama for Dual NVIDIA GPUs. Includes Separate Instances, Custom Ports, and Real Concurrency.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/ai-box.eu\/en\/large-language-models-en\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\/how-to-scale-ollama-with-two-or-more-gpus\/1803\/\" \/>\n<meta property=\"og:site_name\" content=\"Exploring the Future: Inside the AI Box\" \/>\n<meta property=\"article:published_time\" content=\"2025-11-28T19:01:54+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-11-28T19:22:26+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2025\/11\/Ollama_multi_instance_logo-scaled.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"2560\" \/>\n\t<meta property=\"og:image:height\" content=\"1396\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Maker\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Ingmar_Stapel\" \/>\n<meta name=\"twitter:site\" content=\"@Ingmar_Stapel\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Maker\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/large-language-models-en\\\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\\\/how-to-scale-ollama-with-two-or-more-gpus\\\/1803\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/large-language-models-en\\\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\\\/how-to-scale-ollama-with-two-or-more-gpus\\\/1803\\\/\"},\"author\":{\"name\":\"Maker\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#\\\/schema\\\/person\\\/cc91d08618b3feeef6926591b465eab1\"},\"headline\":\"How to Scale Ollama with Two or More GPUs \u2013 Parallel Instances for Maximum Performance\",\"datePublished\":\"2025-11-28T19:01:54+00:00\",\"dateModified\":\"2025-11-28T19:22:26+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/large-language-models-en\\\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\\\/how-to-scale-ollama-with-two-or-more-gpus\\\/1803\\\/\"},\"wordCount\":463,\"commentCount\":0,\"image\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/large-language-models-en\\\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\\\/how-to-scale-ollama-with-two-or-more-gpus\\\/1803\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2025\\\/11\\\/Ollama_multi_instance_logo-scaled.jpg\",\"keywords\":[\"CUDA\",\"Ollama\"],\"articleSection\":[\"Process Automation with n8n, ollama &amp; Open Source LLMs\",\"Top story\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/ai-box.eu\\\/en\\\/large-language-models-en\\\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\\\/how-to-scale-ollama-with-two-or-more-gpus\\\/1803\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/large-language-models-en\\\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\\\/how-to-scale-ollama-with-two-or-more-gpus\\\/1803\\\/\",\"url\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/large-language-models-en\\\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\\\/how-to-scale-ollama-with-two-or-more-gpus\\\/1803\\\/\",\"name\":\"How to Scale Ollama with Two or More GPUs \u2013 Parallel Instances for Maximum Performance - Exploring the Future: Inside the AI Box\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/large-language-models-en\\\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\\\/how-to-scale-ollama-with-two-or-more-gpus\\\/1803\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/large-language-models-en\\\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\\\/how-to-scale-ollama-with-two-or-more-gpus\\\/1803\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2025\\\/11\\\/Ollama_multi_instance_logo-scaled.jpg\",\"datePublished\":\"2025-11-28T19:01:54+00:00\",\"dateModified\":\"2025-11-28T19:22:26+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#\\\/schema\\\/person\\\/cc91d08618b3feeef6926591b465eab1\"},\"description\":\"Ollama with two GPUs in parallelUnlock Maximum Performance: Setting Up Ollama for 
## Conclusion

With just a few lines of configuration, you have transformed your server into a parallel inference cluster. You can now run agents, RAG pipelines, or training jobs cleanly separated from each other, without them blocking one another. Another advantage: you can now deliberately distribute the load across the two or more GPUs in your system.
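If a client should spread many independent requests across both instances itself, a simple round-robin over the two ports is often enough. A hypothetical sketch, alternating between the ports configured above:

```bash
#!/usr/bin/env bash
# Hypothetical round-robin: send each prompt to the next port in turn.
PORTS=(11434 11435)
i=0
for prompt in "Summarize A" "Summarize B" "Summarize C" "Summarize D"; do
  port=${PORTS[$((i % 2))]}
  curl -s "http://127.0.0.1:${port}/api/generate" \
    -d "{\"model\":\"llama3\",\"prompt\":\"${prompt}\",\"stream\":false}" &
  i=$((i + 1))
done
wait   # collect all responses; the load was split evenly across both GPUs
```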
-->","yoast_head_json":{"title":"How to Scale Ollama with Two or More GPUs \u2013 Parallel Instances for Maximum Performance - Exploring the Future: Inside the AI Box","description":"Ollama with two GPUs in parallelUnlock Maximum Performance: Setting Up Ollama for Dual NVIDIA GPUs. Includes Separate Instances, Custom Ports, and Real Concurrency.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/ai-box.eu\/en\/large-language-models-en\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\/how-to-scale-ollama-with-two-or-more-gpus\/1803\/","og_locale":"en_US","og_type":"article","og_title":"How to Scale Ollama with Two or More GPUs \u2013 Parallel Instances for Maximum Performance - Exploring the Future: Inside the AI Box","og_description":"Ollama with two GPUs in parallelUnlock Maximum Performance: Setting Up Ollama for Dual NVIDIA GPUs. Includes Separate Instances, Custom Ports, and Real Concurrency.","og_url":"https:\/\/ai-box.eu\/en\/large-language-models-en\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\/how-to-scale-ollama-with-two-or-more-gpus\/1803\/","og_site_name":"Exploring the Future: Inside the AI Box","article_published_time":"2025-11-28T19:01:54+00:00","article_modified_time":"2025-11-28T19:22:26+00:00","og_image":[{"width":2560,"height":1396,"url":"https:\/\/ai-box.eu\/wp-content\/uploads\/2025\/11\/Ollama_multi_instance_logo-scaled.jpg","type":"image\/jpeg"}],"author":"Maker","twitter_card":"summary_large_image","twitter_creator":"@Ingmar_Stapel","twitter_site":"@Ingmar_Stapel","twitter_misc":{"Written by":"Maker","Est. reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/ai-box.eu\/en\/large-language-models-en\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\/how-to-scale-ollama-with-two-or-more-gpus\/1803\/#article","isPartOf":{"@id":"https:\/\/ai-box.eu\/en\/large-language-models-en\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\/how-to-scale-ollama-with-two-or-more-gpus\/1803\/"},"author":{"name":"Maker","@id":"https:\/\/ai-box.eu\/en\/#\/schema\/person\/cc91d08618b3feeef6926591b465eab1"},"headline":"How to Scale Ollama with Two or More GPUs \u2013 Parallel Instances for Maximum Performance","datePublished":"2025-11-28T19:01:54+00:00","dateModified":"2025-11-28T19:22:26+00:00","mainEntityOfPage":{"@id":"https:\/\/ai-box.eu\/en\/large-language-models-en\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\/how-to-scale-ollama-with-two-or-more-gpus\/1803\/"},"wordCount":463,"commentCount":0,"image":{"@id":"https:\/\/ai-box.eu\/en\/large-language-models-en\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\/how-to-scale-ollama-with-two-or-more-gpus\/1803\/#primaryimage"},"thumbnailUrl":"https:\/\/ai-box.eu\/wp-content\/uploads\/2025\/11\/Ollama_multi_instance_logo-scaled.jpg","keywords":["CUDA","Ollama"],"articleSection":["Process Automation with n8n, ollama &amp; Open Source LLMs","Top 
story"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/ai-box.eu\/en\/large-language-models-en\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\/how-to-scale-ollama-with-two-or-more-gpus\/1803\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/ai-box.eu\/en\/large-language-models-en\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\/how-to-scale-ollama-with-two-or-more-gpus\/1803\/","url":"https:\/\/ai-box.eu\/en\/large-language-models-en\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\/how-to-scale-ollama-with-two-or-more-gpus\/1803\/","name":"How to Scale Ollama with Two or More GPUs \u2013 Parallel Instances for Maximum Performance - Exploring the Future: Inside the AI Box","isPartOf":{"@id":"https:\/\/ai-box.eu\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/ai-box.eu\/en\/large-language-models-en\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\/how-to-scale-ollama-with-two-or-more-gpus\/1803\/#primaryimage"},"image":{"@id":"https:\/\/ai-box.eu\/en\/large-language-models-en\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\/how-to-scale-ollama-with-two-or-more-gpus\/1803\/#primaryimage"},"thumbnailUrl":"https:\/\/ai-box.eu\/wp-content\/uploads\/2025\/11\/Ollama_multi_instance_logo-scaled.jpg","datePublished":"2025-11-28T19:01:54+00:00","dateModified":"2025-11-28T19:22:26+00:00","author":{"@id":"https:\/\/ai-box.eu\/en\/#\/schema\/person\/cc91d08618b3feeef6926591b465eab1"},"description":"Ollama with two GPUs in parallelUnlock Maximum Performance: Setting Up Ollama for Dual NVIDIA GPUs. Includes Separate Instances, Custom Ports, and Real Concurrency.","breadcrumb":{"@id":"https:\/\/ai-box.eu\/en\/large-language-models-en\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\/how-to-scale-ollama-with-two-or-more-gpus\/1803\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/ai-box.eu\/en\/large-language-models-en\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\/how-to-scale-ollama-with-two-or-more-gpus\/1803\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ai-box.eu\/en\/large-language-models-en\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\/how-to-scale-ollama-with-two-or-more-gpus\/1803\/#primaryimage","url":"https:\/\/ai-box.eu\/wp-content\/uploads\/2025\/11\/Ollama_multi_instance_logo-scaled.jpg","contentUrl":"https:\/\/ai-box.eu\/wp-content\/uploads\/2025\/11\/Ollama_multi_instance_logo-scaled.jpg","width":2560,"height":1396,"caption":"Ollama multi instance logo"},{"@type":"BreadcrumbList","@id":"https:\/\/ai-box.eu\/en\/large-language-models-en\/empowering-process-automation-with-n8n-ollama-and-open-source-llms\/how-to-scale-ollama-with-two-or-more-gpus\/1803\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Start","item":"https:\/\/ai-box.eu\/en\/"},{"@type":"ListItem","position":2,"name":"How to Scale Ollama with Two or More GPUs \u2013 Parallel Instances for Maximum Performance"}]},{"@type":"WebSite","@id":"https:\/\/ai-box.eu\/en\/#website","url":"https:\/\/ai-box.eu\/en\/","name":"Exploring the Future: Inside the AI Box","description":"Inside the AI Box, we share our experiences and discoveries in the world of artificial 
intelligence.","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ai-box.eu\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/ai-box.eu\/en\/#\/schema\/person\/cc91d08618b3feeef6926591b465eab1","name":"Maker","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g","caption":"Maker"},"description":"I live in Bavaria near Munich. In my head I always have many topics and try out especially in the field of Internet new media much in my spare time. I write on the blog because it makes me fun to report about the things that inspire me. I am happy about every comment, about suggestion and very about questions.","sameAs":["https:\/\/ai-box.eu"],"url":"https:\/\/ai-box.eu\/en\/author\/ingmars\/"}]}},"_links":{"self":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts\/1803","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/comments?post=1803"}],"version-history":[{"count":0,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts\/1803\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/media\/1802"}],"wp:attachment":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/media?parent=1803"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/categories?post=1803"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/tags?post=1803"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}