{"id":2556,"date":"2026-06-14T12:16:07","date_gmt":"2026-06-14T12:16:07","guid":{"rendered":"https:\/\/ai-box.eu\/?p=2556"},"modified":"2026-06-14T12:35:41","modified_gmt":"2026-06-14T12:35:41","slug":"nvidia-nim-locally-running-german-speech-recognition-as-a-microservice","status":"publish","type":"post","link":"https:\/\/ai-box.eu\/en\/news\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\/2556\/","title":{"rendered":"NVIDIA NIM locally: running German speech recognition as a microservice"},"content":{"rendered":"<p>In my last post I ran <a href=\"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/\" target=\"_blank\" rel=\"noopener\">NVIDIA Nemotron ASR Streaming directly with NeMo<\/a> locally. That was the &#8220;bare&#8221; route via the framework. In this post I go one step further and dive into <strong>NVIDIA NIM<\/strong>. NIM stands for NVIDIA Inference Microservices, NVIDIA&#8217;s way of shipping its models as ready-made, optimized containers. The goal: run a <strong>German speech recognition service<\/strong> as a local microservice that can later slot cleanly into a complete voice agent. 
And, as always, the aim is for everything to run on my own hardware, locally, without the cloud.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"What_is_NVIDIA_NIM\"><\/span>What is NVIDIA NIM?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><strong>NIM<\/strong> (NVIDIA Inference Microservices) packages a model together with an optimized inference engine into a 
single Docker container. Instead of building an environment with PyTorch, NeMo and the matching dependencies yourself as in Part 1, you pull a ready-made container from the NVIDIA registry (<code>nvcr.io<\/code>) and start it. The container exposes a standardized API (HTTP and gRPC), so you can address the service just like a cloud API, only locally.<\/p>\n<p><strong>For me this is exciting for two reasons:<\/strong><\/p>\n<ul>\n<li>First, bringing it up is much faster and more reproducible than a manual framework build.<\/li>\n<li>Second, NIM is the building block NVIDIA itself uses as the deployment layer in its voice-agent architecture.<\/li>\n<\/ul>\n<p>So anyone who wants to build a complete voice agent later can hardly avoid NIM.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"The_goal_of_this_post\"><\/span>The goal of this post<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>We run an <strong>ASR NIM for German<\/strong> based on the streaming-capable, multilingual <strong>Parakeet<\/strong> model. Unlike a plain file upload, this time it is about real <strong>live streaming<\/strong>: in the end you speak into the microphone and watch your transcript appear live in the browser. And, to stress it once more, all of it runs completely locally; that is my goal.<\/p>\n<p>This ties directly into Part 1, and that is exactly where the interesting difference lies:<\/p>\n<ul>\n<li><strong>In Part 1<\/strong> the model ran <em>inside<\/em> your Python process (NeMo, cache-aware). The small Gradio web app loaded the model itself.<\/li>\n<li><strong>In this part<\/strong> the model sits behind a <strong>microservice<\/strong>. Our web app becomes a <strong>thin streaming client<\/strong> that sends the microphone audio chunk by chunk to the local NIM endpoint.<\/li>\n<\/ul>\n<p>Same UX in the browser, but a completely different foundation, and that is exactly what makes the microservice idea tangible. 
At the end of the post we connect the matching app (<code>nim_asr_gradio_app.py<\/code>) to the service.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Does_it_really_stay_local_Riva_%E2%89%A0_cloud\"><\/span>Does it really stay local? Riva \u2260 cloud<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>A question that comes up right away, and one that matters a lot to me: we are about to talk to the service through the <strong>Riva client<\/strong>. Does using Riva send my voice data out of my network? <strong>No.<\/strong> &#8220;Riva&#8221; is not a cloud service but NVIDIA&#8217;s speech framework, i.e. the <strong>API layer<\/strong>. There is a Riva <em>client<\/em> and a Riva <em>server<\/em>. Where the data goes depends solely on which server the client points to:<\/p>\n<ul>\n<li><code>--server 0.0.0.0:50051<\/code> \u2192 your <strong>local NIM container<\/strong> on your own GPU. The audio stays on localhost and never leaves the machine.<\/li>\n<li>If you point the client at an NVIDIA-hosted endpoint instead, <em>then<\/em> the audio would go to the cloud. We deliberately do not do that here.<\/li>\n<\/ul>\n<p>Once the container is running, you could even disconnect the machine from the internet (air-gapped) and transcription would keep working. The only outbound traffic is the one-time model download from <code>nvcr.io<\/code> on first start and the license check via the API key. Your voice data is never part of it.<\/p>\n<p><strong>Note:<\/strong> Parakeet is a great starting point because it supports streaming and is therefore closer to the later agent. If you need maximum accuracy or additional translation, you can use the <strong>Canary<\/strong> NIM instead. The steps are almost identical; only the container ID and profile change. 
That is exactly the topic of Part 3.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Requirements\"><\/span>Requirements<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<ul>\n<li><strong>Operating system:<\/strong> Linux (I use Ubuntu Server)<\/li>\n<li><strong>GPU:<\/strong> NVIDIA GPU with compute capability \u2265 8.0 and at least 16 GB VRAM. My RTX A6000 Ada (48 GB) clears that easily.<\/li>\n<li><strong>Docker<\/strong> and the <strong>NVIDIA Container Toolkit<\/strong> (you already have this if you run containers with GPU access)<\/li>\n<li>A <strong>current NVIDIA driver<\/strong><\/li>\n<li>A free <strong>NGC account<\/strong> and an <strong>NGC API key<\/strong> to pull the NIM containers<\/li>\n<li>A <strong>Python venv<\/strong> for the client and the web app (as in Part 1)<\/li>\n<\/ul>\n<h2><span class=\"ez-toc-section\" id=\"Step_1_Create_an_NGC_account_and_API_key\"><\/span>Step 1: Create an NGC account and API key<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The NIM containers live in the NVIDIA registry and are access-restricted. 
You therefore need a free NGC account and a personal API key.<\/p>\n<p>First, register at <code>ngc.nvidia.com<\/code> so you can then generate an API key.<\/p>\n<p><strong>URL:<\/strong> <code>ngc.nvidia.com<\/code><\/p>\n<p>After successful registration, open the following page.<\/p>\n<p><strong>URL:<\/strong> <code>org.ngc.nvidia.com<\/code><\/p>\n<p>You should now see the following dashboard.<\/p>\n<div id=\"attachment_2546\" style=\"width: 794px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_ngc_dashboard_01-1024x737.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2546\" class=\" wp-image-2546\" src=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_ngc_dashboard_01-1024x737.jpg\" alt=\"NVIDIA NGC dashboard organization management\" width=\"784\" height=\"564\" srcset=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_ngc_dashboard_01-1024x737.jpg 1024w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_ngc_dashboard_01-300x216.jpg 300w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_ngc_dashboard_01-768x553.jpg 768w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_ngc_dashboard_01-1080x777.jpg 1080w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_ngc_dashboard_01.jpg 1156w\" sizes=\"(max-width: 784px) 100vw, 784px\" \/><\/a><p id=\"caption-attachment-2546\" class=\"wp-caption-text\">NVIDIA NGC dashboard organization management<\/p><\/div>\n<p>Now open the following URL to create the personal key we need.<\/p>\n<p><strong>URL:<\/strong> <code>org.ngc.nvidia.com\/setup\/personal-keys<\/code><\/p>\n<p>On the Setup page you will find the option for a <strong>Personal Key<\/strong>. That is exactly the key type we need for local NIM use, so click the <strong>&#8220;Generate Personal Key&#8221;<\/strong> button. 
In the dialog:<\/p>\n<ul>\n<li>Give it a <strong>name<\/strong> and choose a generous <strong>expiration<\/strong><\/li>\n<li>Under <strong>Services Included<\/strong> make sure to tick <strong>NGC Catalog<\/strong> (otherwise the container pull will not work later)<\/li>\n<\/ul>\n<div id=\"attachment_2548\" style=\"width: 1034px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_ngc_dashboard_personal_key_config_03-1024x676.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2548\" class=\"size-large wp-image-2548\" src=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_ngc_dashboard_personal_key_config_03-1024x676.jpg\" alt=\"NVIDIA NGC dashboard personal key config\" width=\"1024\" height=\"676\" srcset=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_ngc_dashboard_personal_key_config_03-1024x676.jpg 1024w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_ngc_dashboard_personal_key_config_03-300x198.jpg 300w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_ngc_dashboard_personal_key_config_03-768x507.jpg 768w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_ngc_dashboard_personal_key_config_03-1536x1013.jpg 1536w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_ngc_dashboard_personal_key_config_03-1080x712.jpg 1080w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_ngc_dashboard_personal_key_config_03.jpg 1769w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/a><p id=\"caption-attachment-2548\" class=\"wp-caption-text\">NVIDIA NGC dashboard personal key config<\/p><\/div>\n<p>You do not need the <strong>Secrets Manager<\/strong> entry for our purpose, so I did not select it.<\/p>\n<p>Then generate the key and, yes, store it somewhere safe.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Step_2_Log_Docker_in_to_the_NVIDIA_registry\"><\/span>Step 2: Log Docker in to the NVIDIA registry<span 
class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>You use the API key to log Docker in to the registry <code>nvcr.io<\/code>. The username is the fixed value <code>$oauthtoken<\/code>, the password is your API key.<\/p>\n<p><strong>Command:<\/strong> <code>docker login nvcr.io<\/code><\/p>\n<p>At the prompt, enter <code>$oauthtoken<\/code> as the username and your NGC API key as the password.<\/p>\n<p>For me the terminal showed the following:<\/p>\n<blockquote><p>ingmar@A6000Ada:~$ docker login nvcr.io<br \/>\nUsername: $oauthtoken<br \/>\nPassword:<\/p>\n<p>WARNING! Your credentials are stored unencrypted in &#8216;\/home\/ingmar\/.docker\/config.json&#8217;.<br \/>\nConfigure a credential helper to remove this warning. See<br \/>\nhttps:\/\/docs.docker.com\/go\/credential-store\/<\/p>\n<p>Login Succeeded<br \/>\ningmar@A6000Ada:~$<\/p><\/blockquote>\n<p><strong>Note:<\/strong> You only really need the stored login to <strong>pull the container image<\/strong>. Once the image is on disk locally (after the first <code>docker run<\/code>), you can remove the credentials again with the following command.<\/p>\n<p><strong>Command:<\/strong> <code>docker logout nvcr.io<\/code><\/p>\n<p>This removes the credentials from the file. The image stays in the local cache, and the running container authenticates via the <code>NGC_API_KEY<\/code> environment variable, not via the Docker login. Only when you want to update the image do you log in again briefly.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Step_3_Prepare_the_API_key_and_cache\"><\/span>Step 3: Prepare the API key and cache<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>So the container can use the key, we store it as an environment variable. We also set up a local cache directory so the model is not downloaded again on every start.<\/p>\n<p><strong>Command:<\/strong> <code>export NGC_API_KEY=\"nvapi-xxxxxxxxxxxxxxxxxxxxx\"<\/code><\/p>\n<p>In addition we create a local cache directory. 
The background: on first start the NIM downloads the model (several GB) from the NGC registry. Since we start the container with <code>--rm<\/code>, it is completely deleted when it stops. If the cache lived only inside the container, the model would be re-downloaded on every restart. That is why in Step 4 we mount this host directory into the container (<code>-v ~\/.cache\/nim:\/opt\/nim\/.cache<\/code>). The model then stays on your machine&#8217;s disk, and the service is ready in seconds on the second start instead of after a minutes-long download.<\/p>\n<p>We create the directory ourselves beforehand so it belongs to your user and is not created by Docker as <code>root<\/code>:<\/p>\n<p><strong>Command:<\/strong> <code>mkdir -p ~\/.cache\/nim<\/code><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Step_4_Start_the_German_ASR_NIM_in_streaming_mode\"><\/span>Step 4: Start the German ASR NIM in streaming mode<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Now we start the actual microservice. The basic structure of a NIM start looks like this: we pass the GPU through, set the API key, open the HTTP and gRPC ports and pick a model profile via <code>NIM_TAGS_SELECTOR<\/code>.<\/p>\n<p>Since our goal is the <strong>live voice path<\/strong>, we take the <strong>streaming profile with low latency<\/strong>: <code>mode=str<\/code>. 
As the model we use the multilingual Parakeet, which includes German out of the box:<\/p>\n<p><strong>Command:<\/strong> <code>export CONTAINER_ID=parakeet-1-1b-rnnt-multilingual<\/code><\/p>\n<p><strong>Command:<\/strong> <code>export NIM_TAGS_SELECTOR=\"mode=str\"<\/code><\/p>\n<p><strong>Command:<\/strong> <code>docker run -it --rm --name=$CONTAINER_ID --runtime=nvidia --gpus '\"device=0\"' --shm-size=8GB -e NGC_API_KEY -e NIM_HTTP_API_PORT=9000 -e NIM_GRPC_API_PORT=50051 -p 9000:9000 -p 50051:50051 -e NIM_TAGS_SELECTOR -v ~\/.cache\/nim:\/opt\/nim\/.cache nvcr.io\/nim\/nvidia\/$CONTAINER_ID:latest<\/code><\/p>\n<p>The first start takes a while because the container downloads the model and prepares the inference engine. The service is ready once a corresponding &#8220;running&#8221; line appears in the logs.<\/p>\n<div id=\"attachment_2550\" style=\"width: 999px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_nim_container_setup_04.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2550\" class=\"size-full wp-image-2550\" src=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_nim_container_setup_04.jpg\" alt=\"NVIDIA NIM container setup\" width=\"989\" height=\"530\" srcset=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_nim_container_setup_04.jpg 989w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_nim_container_setup_04-300x161.jpg 300w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_nim_container_setup_04-768x412.jpg 768w\" sizes=\"(max-width: 989px) 100vw, 989px\" \/><\/a><p id=\"caption-attachment-2550\" class=\"wp-caption-text\">NVIDIA NIM container setup<\/p><\/div>\n<p><strong>Note on container ID and profile:<\/strong> Both come from the current NIM ASR support matrix. Should NVIDIA rename the model or change profiles, you will find the valid values there or on the model page at <code>build.nvidia.com<\/code>. 
We look at the available profiles (<code>str<\/code>, <code>str-thr<\/code>, <code>ofl<\/code>) more closely in Step 8.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Step_5_Check_the_container_status\"><\/span>Step 5: Check the container status<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>In a second terminal you can check whether the service is running.<\/p>\n<p><strong>Command:<\/strong> <code>docker ps<\/code><\/p>\n<p><strong>Command:<\/strong> <code>curl http:\/\/localhost:9000\/v1\/health\/ready<\/code><\/p>\n<p>If the health check answers as shown below, your ASR microservice is up.<\/p>\n<blockquote><p><code>ingmar@A6000Ada:~$ curl http:\/\/localhost:9000\/v1\/health\/ready<\/code><br \/>\n<code>{\"object\":\"health.response\",\"message\":\"ready\",\"status\":\"ready\"}ingmar@A6000Ada:~$<\/code><\/p><\/blockquote>\n<h2><span class=\"ez-toc-section\" id=\"Step_6_Quick_functional_test_on_the_command_line\"><\/span>Step 6: Quick functional test on the command line<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Before we hook up the web app, we quickly check with the official Riva client whether the service transcribes correctly. Since the service runs locally, we do not need cloud authentication &#8211; we simply point at <code>localhost<\/code> and the gRPC port.<\/p>\n<p>First we create another virtual environment with the following command.<\/p>\n<p><strong>Command:<\/strong> <code>python3 -m venv ~\/venvs\/riva-client<\/code><\/p>\n<p>Now we activate this virtual environment.<\/p>\n<p><strong>Command:<\/strong> <code>source ~\/venvs\/riva-client\/bin\/activate<\/code><\/p>\n<p>Important: from now on we always run the following commands inside this venv. 
In other words, it must always be active, or be re-activated when, for example, the machine was rebooted or the SSH connection had to be re-established.<\/p>\n<p><strong>Command:<\/strong> <code>pip install nvidia-riva-client<\/code><\/p>\n<p>The following command clones NVIDIA&#8217;s <code>python-clients<\/code> repository, which contains the example scripts, onto the machine.<\/p>\n<p><strong>Command:<\/strong> <code>git clone https:\/\/github.com\/nvidia-riva\/python-clients.git<\/code><\/p>\n<p>Because we run the streaming profile, we use the <strong>streaming script<\/strong> <code>transcribe_file.py<\/code> that ships with the repository. It sends the file to the service chunk by chunk, which is exactly how the live app will behave later.<\/p>\n<p>Note: as in Part 1, the rule is <strong>mono WAV, 16 kHz<\/strong> &#8211; if in doubt, convert beforehand with ffmpeg, e.g. <code>ffmpeg -i input.m4a -ac 1 -ar 16000 audio.wav<\/code>.<\/p>\n<p>Make sure that a mono (not stereo) file <code>audio.wav<\/code> sits in a folder such as <code>\/home\/ingmar\/audio\/<\/code>.<\/p>\n<p><strong>Command:<\/strong> <code>python python-clients\/scripts\/asr\/transcribe_file.py --server 0.0.0.0:50051 --language-code de-DE --automatic-punctuation --input-file \/home\/ingmar\/audio\/audio.wav<\/code><\/p>\n<p>In my case, the recognized text was then printed in the terminal.<\/p>\n<p><strong>Check the language code:<\/strong> The multilingual Parakeet can partly detect the spoken language automatically, and the exact language codes it expects depend on the loaded profile. The following call lists which models and codes your container actually offers. 
We then enter the matching value in a single place in the web app (Step 7).<\/p>\n<pre><code class=\"language-bash\">python python-clients\/scripts\/asr\/transcribe_file.py \\\r\n    --server 0.0.0.0:50051 --list-models<\/code><\/pre>\n<p>If a clean German transcript with punctuation comes back, the foundation is in place and we can connect the live app.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Step_7_Dock_the_live_app_to_the_NIM\"><\/span>Step 7: Dock the live app to the NIM<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Now comes the bridge to Part 1. Instead of loading the model itself as before, our Gradio app becomes a <strong>streaming client<\/strong>: it captures the microphone, converts the audio to mono\/16 kHz and sends it chunk by chunk via the Riva gRPC streaming API to <code>localhost:50051<\/code>. The transcripts flow back into the browser live.<\/p>\n<p>We install the required packages in the <code>riva-client<\/code> venv:<\/p>\n<p><strong>Command:<\/strong> <code>pip install gradio nvidia-riva-client numpy scipy<\/code><\/p>\n<p>Then you start the app. 
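<\/p>
<p>First, though, a quick look under the hood. The downloadable <code>nim_asr_gradio_app.py<\/code> is the reference; what follows is only my own minimal sketch of the two pieces such a client needs, not code taken from the app, and the function names <code>to_pcm16_mono_16k<\/code> and <code>stream_transcripts<\/code> are hypothetical. The conversion step assumes, as described in the troubleshooting tips, that each microphone chunk already arrives on the int16 scale, and it resamples with simple linear interpolation. The streaming step uses the <code>riva.client<\/code> module from <code>nvidia-riva-client<\/code>; the default language code <code>de-DE<\/code> is an assumption that you should check against the <code>--list-models<\/code> output from Step 6.<\/p>
```python
import numpy as np

TARGET_RATE = 16000  # mono, 16 kHz, 16-bit PCM as in Step 6


def to_pcm16_mono_16k(sample_rate: int, audio) -> bytes:
    """Convert one microphone chunk to mono 16 kHz little-endian PCM16 bytes.

    Assumes the samples are already on the int16 scale (up to +/-32767),
    possibly stored with a float dtype, as Gradio delivers them.
    """
    samples = np.asarray(audio, dtype=np.float64)
    if samples.ndim == 2:  # (n_samples, n_channels) -> average to mono
        samples = samples.mean(axis=1)
    if sample_rate != TARGET_RATE and samples.size > 1:
        # Linear interpolation onto the 16 kHz time grid; a polyphase
        # filter such as scipy.signal.resample_poly sounds cleaner.
        n_out = int(round(samples.size * TARGET_RATE / sample_rate))
        t_in = np.arange(samples.size) / sample_rate
        t_out = np.arange(n_out) / TARGET_RATE
        samples = np.interp(t_out, t_in, samples)
    # Clip to the int16 range, NOT to [-1, 1] (see the troubleshooting tips).
    return np.clip(samples, -32768, 32767).astype("<i2").tobytes()


def stream_transcripts(chunks, server="localhost:50051", language_code="de-DE"):
    """Stream PCM16 byte chunks to the local NIM and yield final transcripts."""
    import riva.client  # from nvidia-riva-client; imported when streaming starts

    auth = riva.client.Auth(uri=server)  # plain gRPC to the local container
    asr = riva.client.ASRService(auth)
    config = riva.client.StreamingRecognitionConfig(
        config=riva.client.RecognitionConfig(
            encoding=riva.client.AudioEncoding.LINEAR_PCM,
            language_code=language_code,  # verify with --list-models
            sample_rate_hertz=TARGET_RATE,
            audio_channel_count=1,
            max_alternatives=1,
            enable_automatic_punctuation=True,
        ),
        interim_results=True,  # partial results are sent too; only finals are yielded
    )
    responses = asr.streaming_response_generator(
        audio_chunks=chunks, streaming_config=config)
    for response in responses:
        for result in response.results:
            if result.is_final and result.alternatives:
                yield result.alternatives[0].transcript
```
<p>In a real app, each converted chunk would go into a queue that feeds the <code>chunks<\/code> iterator, while the yielded transcripts are written to the text box in the browser.<\/p>
<p>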
Enter the language code from Step 6 at the top of <code>nim_asr_gradio_app.py<\/code> under <code>LANGUAGE_CODE<\/code>; <code>NIM_SERVER<\/code> stays at <code>localhost:50051<\/code>.<\/p>\n<p><strong>Command:<\/strong> <code>python nim_asr_gradio_app.py<\/code><\/p>\n<p><strong>Download:<\/strong> <a href=\"https:\/\/github.com\/custom-build-robots\/nemotron-asr-local-streaming-demo\" target=\"_blank\" rel=\"noopener\">You can download the Python program here.<\/a><\/p>\n<p>As in Part 1, for microphone access you need either an <strong>SSH tunnel<\/strong> to the server or the <strong>Gradio share URL<\/strong> (<code>share=True<\/code>), because browsers only grant microphone access over HTTPS or on <code>localhost<\/code>, which is what the SSH tunnel gives you.<\/p>\n<p><strong>Web app screenshot:<\/strong><\/p>\n<div id=\"attachment_2553\" style=\"width: 1034px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_nim_web_app_05-1024x702.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2553\" class=\"size-large wp-image-2553\" src=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_nim_web_app_05-1024x702.jpg\" alt=\"NVIDIA NIM ASR web app\" width=\"1024\" height=\"702\" srcset=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_nim_web_app_05-1024x702.jpg 1024w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_nim_web_app_05-300x206.jpg 300w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_nim_web_app_05-768x526.jpg 768w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_nim_web_app_05-1080x740.jpg 1080w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_nim_web_app_05.jpg 1112w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/a><p id=\"caption-attachment-2553\" class=\"wp-caption-text\">NVIDIA NIM ASR web app<\/p><\/div>\n<h2><span class=\"ez-toc-section\" id=\"Step_8_The_NIM_profiles_at_a_glance\"><\/span>Step 8: The NIM profiles at a glance<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>NIM 
provides several profiles for a model, which you select via <code>NIM_TAGS_SELECTOR<\/code>. For the later voice agent, the low-latency streaming profile is the one that matters. That is why I chose <code>mode=str<\/code> above.<\/p>\n<table>\n<thead>\n<tr>\n<th>NIM_TAGS_SELECTOR<\/th>\n<th>Mode<\/th>\n<th>When to use?<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><code>mode=str<\/code><\/td>\n<td>Streaming, low latency<\/td>\n<td>Live transcription, voice agent <strong>(our choice)<\/strong><\/td>\n<\/tr>\n<tr>\n<td><code>mode=str-thr<\/code><\/td>\n<td>Streaming, high throughput<\/td>\n<td>Many parallel streams at once<\/td>\n<\/tr>\n<tr>\n<td><code>mode=ofl<\/code><\/td>\n<td>Offline \/ batch<\/td>\n<td>Whole files at once, highest accuracy<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2><span class=\"ez-toc-section\" id=\"Tips_and_troubleshooting\"><\/span>Tips and troubleshooting<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<ul>\n<li><strong>Keep an eye on VRAM:<\/strong> Use <code>nvidia-smi<\/code> to check that the container occupies the GPU and that enough memory is free.<\/li>\n<li><strong>Wrong GPU:<\/strong> With <code>--gpus '\"device=0\"'<\/code> you pin the container to one specific card (index as listed by <code>nvidia-smi<\/code>). On a multi-GPU machine, set the index of the card you actually want to use.<\/li>\n<li><strong>Audio format:<\/strong> Mono, 16 kHz &#8211; the most common source of errors, just like with Nemotron in Part 1. The web app normalizes the microphone audio automatically before sending.<\/li>\n<li><strong>Microphone in the browser:<\/strong> Browsers only grant microphone access over HTTPS or on <code>localhost<\/code>. 
Either set up an SSH tunnel to the server or use the Gradio share URL.<\/li>\n<li><strong>Ports in use:<\/strong> If 9000 or 50051 are already taken, change the host side of the port mappings in the <code>docker run<\/code> command (for example <code>-p 9001:9000<\/code>) and point the client at the new port.<\/li>\n<li><strong>Empty transcript despite a working microphone:<\/strong> If you adapt the app, note that Gradio delivers the microphone audio already on the <strong>int16 scale<\/strong> (values up to \u00b132767), often as a float dtype, <em>not<\/em> in the [-1, 1] range. Clipping it to [-1, 1] before sending crushes the signal into full-scale noise and the service recognizes nothing. The audio must stay on the int16 scale (exactly what the <code>to_pcm16_16k<\/code> function in the app does).<\/li>\n<\/ul>\n<h2><span class=\"ez-toc-section\" id=\"Conclusion\"><\/span>Conclusion<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>With NIM, the manual framework build from Part 1 becomes a clean, reproducible microservice: one container, a standardized API, German live speech recognition on your own hardware. The same web app as in Part 1, but on a different foundation. The key change is that the model is now a service and the app is merely its client. This fits exactly the idea of <strong>sovereign AI<\/strong> that runs through my entire stack. The data is mine and stays with me, and so does control.<\/p>\n<p>In the next part I take on the <strong>sister model<\/strong>: <strong>NVIDIA Canary<\/strong> as a NIM. Canary shines at accuracy and can additionally translate, producing English text directly from German audio. After that, Magpie TTS adds the voice output, and step by step the complete local voice agent takes shape.<\/p>\n<p>If you rebuild the setup, drop me a comment: which model profile gives you the best balance of latency and accuracy, and on which hardware?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In my last post I ran NVIDIA Nemotron ASR Streaming directly with NeMo locally. 
That was the &#8220;bare&#8221; route via the framework. In this post I go one step further and dive into NVIDIA NIM. NIM stands for NVIDIA Inference Microservices, the microservice variant NVIDIA uses to ship its models as ready-made, optimized containers. The [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":2554,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_et_pb_use_builder":"","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":""},"categories":[162,8,50],"tags":[353,1613,1614,692,1031,1585,1418,1623,1625,1620,1617,1612,1621,1618,1619,1616,315,1032,1615,1622,1624],"class_list":["post-2556","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-large-language-models-en","category-news","category-top-story-en","tag-docker","tag-german-speech-recognition","tag-german-speech-to-text","tag-gradio","tag-local-ai","tag-local-asr","tag-lokale-ki","tag-ngc-api-key","tag-nvcr-io","tag-nvidia-inference-microservices","tag-nvidia-nim","tag-nvidia-nim-locally","tag-nvidia-riva","tag-parakeet","tag-parakeet-nim","tag-real-time-transcription","tag-rtx-a6000-en","tag-sovereign-ai","tag-speech-recognition-microservice","tag-streaming-asr","tag-voice-agent","et-has-post-format-content","et_post_format-et-post-format-standard"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>NVIDIA NIM locally: running German speech recognition as a microservice - Exploring the Future: Inside the AI Box<\/title>\n<meta name=\"description\" content=\"Run NVIDIA NIM locally: German speech recognition as a microservice with Parakeet \u2014 streaming-capable, live in the browser and entirely on your own hardware.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" 
href=\"https:\/\/ai-box.eu\/en\/news\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\/2556\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"NVIDIA NIM locally: running German speech recognition as a microservice - Exploring the Future: Inside the AI Box\" \/>\n<meta property=\"og:description\" content=\"Run NVIDIA NIM locally: German speech recognition as a microservice with Parakeet \u2014 streaming-capable, live in the browser and entirely on your own hardware.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/ai-box.eu\/en\/news\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\/2556\/\" \/>\n<meta property=\"og:site_name\" content=\"Exploring the Future: Inside the AI Box\" \/>\n<meta property=\"article:published_time\" content=\"2026-06-14T12:16:07+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-06-14T12:35:41+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_nim_web_app_05.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1112\" \/>\n\t<meta property=\"og:image:height\" content=\"762\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Maker\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Ingmar_Stapel\" \/>\n<meta name=\"twitter:site\" content=\"@Ingmar_Stapel\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Maker\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"12 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\\\/2556\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\\\/2556\\\/\"},\"author\":{\"name\":\"Maker\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#\\\/schema\\\/person\\\/cc91d08618b3feeef6926591b465eab1\"},\"headline\":\"NVIDIA NIM locally: running German speech recognition as a microservice\",\"datePublished\":\"2026-06-14T12:16:07+00:00\",\"dateModified\":\"2026-06-14T12:35:41+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\\\/2556\\\/\"},\"wordCount\":2208,\"commentCount\":0,\"image\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\\\/2556\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/06\\\/NVIDIA_nim_web_app_05.jpg\",\"keywords\":[\"Docker\",\"German speech recognition\",\"German speech-to-text\",\"Gradio\",\"local AI\",\"local ASR\",\"lokale KI\",\"NGC API-Key\",\"nvcr.io\",\"NVIDIA Inference Microservices\",\"NVIDIA NIM\",\"NVIDIA NIM locally\",\"NVIDIA Riva\",\"Parakeet\",\"Parakeet NIM\",\"real-time transcription\",\"RTX A6000\",\"sovereign AI\",\"speech recognition microservice\",\"Streaming ASR\",\"Voice Agent\"],\"articleSection\":[\"Large Language Models\",\"News\",\"Top 
story\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\\\/2556\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\\\/2556\\\/\",\"url\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\\\/2556\\\/\",\"name\":\"NVIDIA NIM locally: running German speech recognition as a microservice - Exploring the Future: Inside the AI Box\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\\\/2556\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\\\/2556\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/06\\\/NVIDIA_nim_web_app_05.jpg\",\"datePublished\":\"2026-06-14T12:16:07+00:00\",\"dateModified\":\"2026-06-14T12:35:41+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#\\\/schema\\\/person\\\/cc91d08618b3feeef6926591b465eab1\"},\"description\":\"Run NVIDIA NIM locally: German speech recognition as a microservice with Parakeet \u2014 streaming-capable, live in the browser and entirely on your own 
hardware.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\\\/2556\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\\\/2556\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\\\/2556\\\/#primaryimage\",\"url\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/06\\\/NVIDIA_nim_web_app_05.jpg\",\"contentUrl\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/06\\\/NVIDIA_nim_web_app_05.jpg\",\"width\":1112,\"height\":762,\"caption\":\"NVIDIA NIM ASR Web-App\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\\\/2556\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Start\",\"item\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"NVIDIA NIM locally: running German speech recognition as a microservice\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#website\",\"url\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/\",\"name\":\"Exploring the Future: Inside the AI Box\",\"description\":\"Inside the AI Box, we share our experiences and discoveries in the world of artificial 
intelligence.\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#\\\/schema\\\/person\\\/cc91d08618b3feeef6926591b465eab1\",\"name\":\"Maker\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g\",\"caption\":\"Maker\"},\"description\":\"I live in Bavaria near Munich. I always have lots of topics on my mind and spend much of my spare time trying things out, especially in the field of the Internet and new media. I write this blog because I enjoy reporting on the things that inspire me. I am happy about every comment and suggestion, and especially about questions.\",\"sameAs\":[\"https:\\\/\\\/ai-box.eu\"],\"url\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/author\\\/ingmars\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin.
-->","yoast_head_json":{"title":"NVIDIA NIM locally: running German speech recognition as a microservice - Exploring the Future: Inside the AI Box","description":"Run NVIDIA NIM locally: German speech recognition as a microservice with Parakeet \u2014 streaming-capable, live in the browser and entirely on your own hardware.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/ai-box.eu\/en\/news\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\/2556\/","og_locale":"en_US","og_type":"article","og_title":"NVIDIA NIM locally: running German speech recognition as a microservice - Exploring the Future: Inside the AI Box","og_description":"Run NVIDIA NIM locally: German speech recognition as a microservice with Parakeet \u2014 streaming-capable, live in the browser and entirely on your own hardware.","og_url":"https:\/\/ai-box.eu\/en\/news\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\/2556\/","og_site_name":"Exploring the Future: Inside the AI Box","article_published_time":"2026-06-14T12:16:07+00:00","article_modified_time":"2026-06-14T12:35:41+00:00","og_image":[{"width":1112,"height":762,"url":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_nim_web_app_05.jpg","type":"image\/jpeg"}],"author":"Maker","twitter_card":"summary_large_image","twitter_creator":"@Ingmar_Stapel","twitter_site":"@Ingmar_Stapel","twitter_misc":{"Written by":"Maker","Est. 
reading time":"12 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/ai-box.eu\/en\/news\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\/2556\/#article","isPartOf":{"@id":"https:\/\/ai-box.eu\/en\/news\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\/2556\/"},"author":{"name":"Maker","@id":"https:\/\/ai-box.eu\/en\/#\/schema\/person\/cc91d08618b3feeef6926591b465eab1"},"headline":"NVIDIA NIM locally: running German speech recognition as a microservice","datePublished":"2026-06-14T12:16:07+00:00","dateModified":"2026-06-14T12:35:41+00:00","mainEntityOfPage":{"@id":"https:\/\/ai-box.eu\/en\/news\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\/2556\/"},"wordCount":2208,"commentCount":0,"image":{"@id":"https:\/\/ai-box.eu\/en\/news\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\/2556\/#primaryimage"},"thumbnailUrl":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_nim_web_app_05.jpg","keywords":["Docker","German speech recognition","German speech-to-text","Gradio","local AI","local ASR","lokale KI","NGC API-Key","nvcr.io","NVIDIA Inference Microservices","NVIDIA NIM","NVIDIA NIM locally","NVIDIA Riva","Parakeet","Parakeet NIM","real-time transcription","RTX A6000","sovereign AI","speech recognition microservice","Streaming ASR","Voice Agent"],"articleSection":["Large Language Models","News","Top story"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/ai-box.eu\/en\/news\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\/2556\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/ai-box.eu\/en\/news\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\/2556\/","url":"https:\/\/ai-box.eu\/en\/news\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\/2556\/","name":"NVIDIA NIM locally: running German speech 
recognition as a microservice - Exploring the Future: Inside the AI Box","isPartOf":{"@id":"https:\/\/ai-box.eu\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/ai-box.eu\/en\/news\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\/2556\/#primaryimage"},"image":{"@id":"https:\/\/ai-box.eu\/en\/news\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\/2556\/#primaryimage"},"thumbnailUrl":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_nim_web_app_05.jpg","datePublished":"2026-06-14T12:16:07+00:00","dateModified":"2026-06-14T12:35:41+00:00","author":{"@id":"https:\/\/ai-box.eu\/en\/#\/schema\/person\/cc91d08618b3feeef6926591b465eab1"},"description":"Run NVIDIA NIM locally: German speech recognition as a microservice with Parakeet \u2014 streaming-capable, live in the browser and entirely on your own hardware.","breadcrumb":{"@id":"https:\/\/ai-box.eu\/en\/news\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\/2556\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/ai-box.eu\/en\/news\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\/2556\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ai-box.eu\/en\/news\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\/2556\/#primaryimage","url":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_nim_web_app_05.jpg","contentUrl":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_nim_web_app_05.jpg","width":1112,"height":762,"caption":"NVIDIA NIM ASR Web-App"},{"@type":"BreadcrumbList","@id":"https:\/\/ai-box.eu\/en\/news\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\/2556\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Start","item":"https:\/\/ai-box.eu\/en\/"},{"@type":"ListItem","position":2,"name":"NVIDIA NIM locally: running German speech recognition as a 
microservice"}]},{"@type":"WebSite","@id":"https:\/\/ai-box.eu\/en\/#website","url":"https:\/\/ai-box.eu\/en\/","name":"Exploring the Future: Inside the AI Box","description":"Inside the AI Box, we share our experiences and discoveries in the world of artificial intelligence.","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ai-box.eu\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/ai-box.eu\/en\/#\/schema\/person\/cc91d08618b3feeef6926591b465eab1","name":"Maker","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g","caption":"Maker"},"description":"I live in Bavaria near Munich. I always have lots of topics on my mind and spend much of my spare time trying things out, especially in the field of the Internet and new media. I write this blog because I enjoy reporting on the things that inspire me. 
I am happy about every comment and suggestion, and especially about questions.","sameAs":["https:\/\/ai-box.eu"],"url":"https:\/\/ai-box.eu\/en\/author\/ingmars\/"}]}},"_links":{"self":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts\/2556","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/comments?post=2556"}],"version-history":[{"count":1,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts\/2556\/revisions"}],"predecessor-version":[{"id":2557,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts\/2556\/revisions\/2557"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/media\/2554"}],"wp:attachment":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/media?parent=2556"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/categories?post=2556"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/tags?post=2556"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}