{"id":2562,"date":"2026-06-14T14:46:17","date_gmt":"2026-06-14T14:46:17","guid":{"rendered":"https:\/\/ai-box.eu\/?p=2562"},"modified":"2026-06-14T14:52:20","modified_gmt":"2026-06-14T14:52:20","slug":"nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim","status":"publish","type":"post","link":"https:\/\/ai-box.eu\/en\/news\/nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim\/2562\/","title":{"rendered":"NVIDIA Canary locally: multilingual speech recognition and translation as a NIM"},"content":{"rendered":"<p>In <a href=\"https:\/\/ai-box.eu\/en\/news\/nvidia-nim-locally-running-german-speech-recognition-as-a-microservice\/2556\/\" target=\"_blank\" rel=\"noopener\">Part 2<\/a> I set up Parakeet as a streaming-capable ASR NIM for German and ran it as a live service with low latency. In this post I take on the <strong>sister model<\/strong>: <strong>NVIDIA Canary<\/strong> as a NIM. Canary does not shine at latency but at <strong>accuracy<\/strong>, and it can do something Parakeet cannot: <strong>translate<\/strong>. It produces English text directly from German audio, with no intermediate step. This rounds out the ASR side of my local stack natively on NVIDIA. 
As always in this little series: everything local, on my own hardware, without the cloud.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"What_is_NVIDIA_Canary\"><\/span>What is NVIDIA Canary?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Canary is an <strong>attention encoder-decoder model<\/strong> (FastConformer encoder plus Transformer decoder), multilingual and <strong>multi-task<\/strong>: in a single model it handles both speech recognition (ASR) and speech translation (AST, Automatic Speech Translation). Its strengths are high accuracy and, precisely, translation.<\/p>\n<p>An important difference from Parakeet: <strong>Canary runs in offline mode only.<\/strong> So there is no continuous streaming like with Parakeet. Canary takes the whole audio file (or the fully recorded buffer) and delivers the result afterwards. 
That is no good for a live voice agent, but it is ideal for maximum accuracy and for translation.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Canary_or_Parakeet_%E2%80%93_when_to_use_which\"><\/span>Canary or Parakeet &#8211; when to use which?<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>A quick comparison to make clear when to use which model:<\/p>\n<table>\n<thead>\n<tr>\n<th>Criterion<\/th>\n<th>Parakeet (Part 2)<\/th>\n<th>Canary (this part)<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Mode<\/td>\n<td>streaming + offline<\/td>\n<td><strong>offline \/ batch only<\/strong><\/td>\n<\/tr>\n<tr>\n<td>Latency<\/td>\n<td>very low (live)<\/td>\n<td>higher (whole file)<\/td>\n<\/tr>\n<tr>\n<td>Strength<\/td>\n<td>live transcription<\/td>\n<td>accuracy<\/td>\n<\/tr>\n<tr>\n<td>Translation<\/td>\n<td>no<\/td>\n<td><strong>yes (AST, de\u2194en)<\/strong><\/td>\n<\/tr>\n<tr>\n<td>Typical use<\/td>\n<td>voice agent, live captions<\/td>\n<td>transcripts, highest accuracy, translation<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2><span class=\"ez-toc-section\" id=\"The_goal_of_this_post\"><\/span>The goal of this post<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>We run the Canary NIM locally and use it to translate a German recording directly into <strong>English<\/strong>.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Requirements\"><\/span>Requirements<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>If you worked through Part 2, the groundwork is already in place, so it is only listed briefly here:<\/p>\n<ul>\n<li>NGC account and API key, <code>docker login<\/code> to <code>nvcr.io<\/code> (see Part 2)<\/li>\n<li>the <code>riva-client<\/code> venv with <code>nvidia-riva-client<\/code> installed<\/li>\n<li>the cloned <code>python-clients<\/code> repository<\/li>\n<li>GPU with compute capability \u2265 8.0; Canary 1B is small, so it needs little VRAM<\/li>\n<\/ul>\n<p>So you don&#8217;t need to install anything new. 
<strong>One important point:<\/strong> Canary and Parakeet are configured with the same ports (9000\/50051), so they cannot run side by side as is. Before this step, stop the Parakeet container (<code>Ctrl + C<\/code> in its terminal) or map Canary to different host ports.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Step_1_Start_the_Canary_NIM\"><\/span>Step 1: Start the Canary NIM<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>If your API key is no longer set in the current terminal session, set it again:<\/p>\n<p><strong>Command:<\/strong> <code>export NGC_API_KEY=\"nvapi-xxxxxxxxxxxxxxxxxxxxx\"<\/code><\/p>\n<p>Then pick the container and profile. Unlike Parakeet (<code>mode=str<\/code>), Canary only offers the offline mode <code>mode=ofl<\/code>:<\/p>\n<p><strong>Command:<\/strong> <code>export CONTAINER_ID=canary-1b<\/code><\/p>\n<p><strong>Command:<\/strong> <code>export NIM_TAGS_SELECTOR=\"mode=ofl\"<\/code><\/p>\n<p><strong>Command:<\/strong> <code>docker run -it --rm --name=$CONTAINER_ID --runtime=nvidia --gpus '\"device=0\"' --shm-size=8GB -e NGC_API_KEY -e NIM_HTTP_API_PORT=9000 -e NIM_GRPC_API_PORT=50051 -p 9000:9000 -p 50051:50051 -e NIM_TAGS_SELECTOR -v ~\/.cache\/nim:\/opt\/nim\/.cache nvcr.io\/nim\/nvidia\/$CONTAINER_ID:latest<\/code><\/p>\n<p>You can reuse the cache directory <code>~\/.cache\/nim<\/code> from Part 2. The first start takes a while here too (model download and TensorRT engine build, up to ~30 minutes depending on network and GPU). The service is ready once &#8220;Application is ready to receive API requests&#8221; appears in the logs.<\/p>\n<p>After you submit the <code>docker run<\/code> command, several things happen in sequence on the very first start, and that is exactly what scrolls past in the (admittedly very black) terminal. 
In order:<\/p>\n<ul>\n<li><strong>Image download:<\/strong> Docker pulls the Canary container from <code>nvcr.io<\/code> (the many &#8220;Pull complete&#8221; lines).<\/li>\n<li><strong>Profile selection:<\/strong> the NIM detects my GPU (RTX 6000 Ada, compute capability 8.9) and automatically picks the matching offline profile (<code>mode=ofl<\/code>, RMIR format).<\/li>\n<li><strong>Model download:<\/strong> it downloads the actual model <code>asr-canary-1b-flash-multi-offline-trtllm.rmir<\/code> (about 3.4 GB) into the cache.<\/li>\n<li><strong>Engine build:<\/strong> from the RMIR the container builds the optimized inference engines. For Canary this runs via <strong>TensorRT-LLM<\/strong> (encoder as a TensorRT engine, decoder via TensorRT-LLM) \u2013 this is the step that takes longest.<\/li>\n<li><strong>Server start:<\/strong> Riva loads the models into Triton (&#8220;\u2026waiting for Triton server\u2026&#8221;) and then starts the gRPC server on port 50051 and the HTTP server on port 9000.<\/li>\n<\/ul>\n<p>The service is ready once the line <strong>&#8220;Welcome! Application is ready to receive API requests.&#8221;<\/strong> appears. For me the complete first start took about <strong>ten minutes<\/strong> (download ~5 min, engine build ~5 min). The next start is much faster because the model is already cached.<\/p>\n<p>The many warnings that scroll by (e.g. 
about <code>pynvml<\/code>, <code>transformers<\/code>\/<code>modelopt<\/code> or a skipped TRT tactic) are cosmetic \u2013 all that matters is that the &#8220;ready&#8221; line comes at the end.<\/p>\n<div id=\"attachment_2559\" style=\"width: 999px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_nim_container_Canary_setup_06.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2559\" class=\"size-full wp-image-2559\" src=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_nim_container_Canary_setup_06.jpg\" alt=\"NVIDIA NIM container Canary setup\" width=\"989\" height=\"530\" srcset=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_nim_container_Canary_setup_06.jpg 989w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_nim_container_Canary_setup_06-300x161.jpg 300w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_nim_container_Canary_setup_06-768x412.jpg 768w\" sizes=\"(max-width: 989px) 100vw, 989px\" \/><\/a><p id=\"caption-attachment-2559\" class=\"wp-caption-text\">NVIDIA NIM container Canary setup<\/p><\/div>\n<p><strong>Note on container ID and profile:<\/strong><\/p>\n<p>Both containers we have met so far come from the current NIM ASR support matrix. 
Should NVIDIA rename the model or change profiles, you will find the valid values there or on the model page at <code>build.nvidia.com<\/code>.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Step_2_Check_the_container_status\"><\/span>Step 2: Check the container status<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>In a second terminal you check as usual whether the service is running.<\/p>\n<p><strong>Command:<\/strong> <code>docker ps<\/code><\/p>\n<p><strong>Command:<\/strong> <code>curl http:\/\/localhost:9000\/v1\/health\/ready<\/code><\/p>\n<p>If the health check answers with <code>{\"object\":\"health.response\",\"message\":\"ready\",\"status\":\"ready\"}<\/code>, your Canary microservice is up.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Step_3_Transcribe_German_offline\"><\/span>Step 3: Transcribe German (offline)<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Because Canary works offline, we use the <strong>offline script<\/strong> <code>transcribe_file_offline.py<\/code>, not the streaming script <code>transcribe_file.py<\/code> from Part 2. Two Canary specifics: the <strong>input language must be specified explicitly<\/strong>, and punctuation is on by default.<\/p>\n<p>First activate the venv again (if needed):<\/p>\n<p><strong>Command:<\/strong> <code>source ~\/venvs\/riva-client\/bin\/activate<\/code><\/p>\n<p>Then transcribe your German WAV (mono, 16 kHz):<\/p>\n<p><strong>Command:<\/strong> <code>python python-clients\/scripts\/asr\/transcribe_file_offline.py --server 0.0.0.0:50051 --language-code de-DE --input-file \/home\/ingmar\/audio\/audio.wav<\/code><\/p>\n<p><strong>Note on the language flag:<\/strong> Depending on the client version, the flag is <code>--language-code<\/code> or <code>--language<\/code>. 
If in doubt, <code>python python-clients\/scripts\/asr\/transcribe_file_offline.py --help<\/code> shows the exact names, and <code>--list-models<\/code> shows the loaded models and languages.<\/p>\n<p><strong>Comparison with Parakeet:<\/strong> Run the same German WAV once through Parakeet (Part 2) and once through Canary and compare the transcripts. On clean audio the two are close \u2013 on difficult material (technical terms, accent, background noise) Canary usually delivers the cleaner output.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Step_4_Translate_instead_of_just_transcribing_AST\"><\/span>Step 4: Translate instead of just transcribing (AST)<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Now for what makes Canary special: <strong>translation<\/strong>. Canary translates between the supported non-English languages and English in both directions, so for us: German recording in, <strong>English text<\/strong> out, directly and without a separate intermediate transcription.<\/p>\n<p>The most convenient route is the <strong>OpenAI-compatible HTTP interface<\/strong> of the NIM. 
Besides <code>\/v1\/audio\/transcriptions<\/code>, the service also provides <code>\/v1\/audio\/translations<\/code>, and the latter translates to English.<\/p>\n<p>As a cross-check, first the plain transcription via HTTP:<\/p>\n<p><strong>Command:<\/strong> <code>curl -s http:\/\/0.0.0.0:9000\/v1\/audio\/transcriptions -F language=de-DE -F file=\"@\/home\/ingmar\/audio\/audio.wav\"<\/code><\/p>\n<p>And now the translation of the German recording into English:<\/p>\n<p><strong>Command:<\/strong> <code>curl -s http:\/\/0.0.0.0:9000\/v1\/audio\/translations -F language=de-DE -F file=\"@\/home\/ingmar\/audio\/audio.wav\"<\/code><\/p>\n<p>This way you get the same spoken content once as a German transcript and once as an English translation.<\/p>\n<blockquote><p>Here is the plain transcript:<br \/>\n<code>(riva-client) ingmar@A6000Ada:~$ curl -s http:\/\/0.0.0.0:9000\/v1\/audio\/transcriptions -F language=de-DE -F file=\"@\/home\/ingmar\/audio\/audio.wav\"<\/code><br \/>\n<code>{\"text\":\"ein Buch zu schreiben, ist gar nicht so eine leichte Aufgabe. Man muss sich das Thema gut \u00fcberlegen, sich einen roten Faden ausdenken und die Kapitelstruktur aufsetzen. Dann beginnt die eigentliche Arbeit, die einzelnen Kapitel mit spannenden Inhalten zu bef\u00fcllen, die den Leser auch wirklich mitreissen. Auf dem Weg durch das Abenteuer, durch die Geschichte, durch die Bastelanleitung oder auch super spannend und wichtig. Und ich finde dadurch, dass wenn ich B\u00fccher schreibe, lerne ich auch f\u00fcr mich selbst sehr viel dazu. \",\"language_code\":\"de-DE\"}<\/code><\/p>\n<p>Here is the translation:<br \/>\n<code>(riva-client) ingmar@A6000Ada:~$ curl -s http:\/\/0.0.0.0:9000\/v1\/audio\/translations -F language=de-DE -F file=\"@\/home\/ingmar\/audio\/audio.wav\"<\/code><br \/>\n<code>{\"text\":\"Writing a book is not an easy task. You have to think about the subject, think about a red thread and put up the chapter structure. 
Then the actual work begins to fill the individual chapters with exciting content that really carry the reader along. I find that when I write books, I learn a lot about it for myself. \",\"language_code\":\"de-DE\"}<\/code><br \/>\n<code>(riva-client) ingmar@A6000Ada:~$<\/code><\/p><\/blockquote>\n<p>I&#8217;m skipping a screenshot of the terminal here because the output is much easier to read as plain text.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Tips_and_troubleshooting\"><\/span>Tips and troubleshooting<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<ul>\n<li><strong>Port conflict with Parakeet:<\/strong> Both NIMs occupy 9000\/50051. Stop the other model before you start this one, or assign different ports in the <code>docker run<\/code> command.<\/li>\n<li><strong>Offline means higher latency:<\/strong> The result only arrives after the complete file has been processed. That is expected behavior, not an error.<\/li>\n<li><strong>Input language is mandatory:<\/strong> Canary needs the source language explicitly (e.g. <code>de-DE<\/code>). Without it you get an error.<\/li>\n<li><strong>Audio format:<\/strong> Mono, 16 kHz (WAV, OPUS or FLAC). A mismatched format is the most common source of errors; if in doubt, convert the file with <code>ffmpeg<\/code> beforehand.<\/li>\n<li><strong>Wrong flag:<\/strong> If a flag is not recognized, <code>--help<\/code> of the respective script gives you the exact names.<\/li>\n<li><strong>VRAM:<\/strong> Canary 1B is small; use <code>nvidia-smi<\/code> to check the actual GPU memory usage.<\/li>\n<\/ul>\n<h2><span class=\"ez-toc-section\" id=\"Conclusion\"><\/span>Conclusion<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>With <strong>Parakeet<\/strong> (streaming, Part 2) and <strong>Canary<\/strong> (accuracy plus translation, this part), the ASR side of my stack is covered natively on NVIDIA: live and offline, transcription and translation, all local on my own hardware. 
The two models complement rather than replace each other.<\/p>\n<p>The next part covers the opposite direction: <strong>voice output<\/strong> via the German <strong>Magpie TTS NIM<\/strong>. After that I will have both ASR and TTS running as NVIDIA microservices. That forms the basis for connecting them step by step into a complete local voice agent with the orchestrator (NeMo Agent Toolkit) and Pipecat.<\/p>\n<p>If you rebuild the setup, drop me a comment on how big the accuracy and quality difference between Parakeet and Canary turns out to be on your own audio material.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In Part 2 I set up Parakeet as a streaming-capable ASR NIM for German and ran it as a live service with low latency. In this post I take on the sister model: NVIDIA Canary as a NIM. Canary does not shine at latency but at accuracy, and it can do something Parakeet cannot: translate. [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":2560,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_et_pb_use_builder":"","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":""},"categories":[162,8,50],"tags":[1643,1641,1645,353,1638,1031,1418,1639,1608,1623,1625,1640,1636,1620,1617,1621,1644,315,1032,1637,1642,1646],"class_list":["post-2562","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-large-language-models-en","category-news","category-top-story-en","tag-ast","tag-canary-nim","tag-canary-1b","tag-docker","tag-german-to-english-translation","tag-local-ai","tag-lokale-ki","tag-multilingual-speech-recognition","tag-ngc-api-key","tag-nvcr-io","tag-nvidia-canary","tag-nvidia-canary-locally","tag-nvidia-inference-microservices","tag-nvidia-nim","tag-nvidia-riva","tag-offline-asr","tag-rtx-a6000-en","tag-sovereign-ai","tag-speech-translation","tag-speech-to-text-translation","tag-transcribe_file_offline","et-has-post-format-conten
t","et_post_format-et-post-format-standard"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>NVIDIA Canary locally: multilingual speech recognition and translation as a NIM - Exploring the Future: Inside the AI Box<\/title>\n<meta name=\"description\" content=\"Run NVIDIA Canary locally: transcribe German audio as a NIM and translate it straight into English offline, accurate and entirely on your own hardware.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/ai-box.eu\/en\/news\/nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim\/2562\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"NVIDIA Canary locally: multilingual speech recognition and translation as a NIM - Exploring the Future: Inside the AI Box\" \/>\n<meta property=\"og:description\" content=\"Run NVIDIA Canary locally: transcribe German audio as a NIM and translate it straight into English offline, accurate and entirely on your own hardware.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/ai-box.eu\/en\/news\/nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim\/2562\/\" \/>\n<meta property=\"og:site_name\" content=\"Exploring the Future: Inside the AI Box\" \/>\n<meta property=\"article:published_time\" content=\"2026-06-14T14:46:17+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-06-14T14:52:20+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_nim_container_Canary_setup_06.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"989\" \/>\n\t<meta property=\"og:image:height\" content=\"530\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" 
\/>\n<meta name=\"author\" content=\"Maker\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Ingmar_Stapel\" \/>\n<meta name=\"twitter:site\" content=\"@Ingmar_Stapel\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Maker\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"8 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim\\\/2562\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim\\\/2562\\\/\"},\"author\":{\"name\":\"Maker\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#\\\/schema\\\/person\\\/cc91d08618b3feeef6926591b465eab1\"},\"headline\":\"NVIDIA Canary locally: multilingual speech recognition and translation as a NIM\",\"datePublished\":\"2026-06-14T14:46:17+00:00\",\"dateModified\":\"2026-06-14T14:52:20+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim\\\/2562\\\/\"},\"wordCount\":1247,\"commentCount\":0,\"image\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim\\\/2562\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/06\\\/NVIDIA_nim_container_Canary_setup_06.jpg\",\"keywords\":[\"AST\",\"Canary NIM\",\"canary-1b\",\"Docker\",\"German to English translation\",\"local AI\",\"lokale KI\",\"multilingual speech recognition\",\"NGC API-Key\",\"NGC API-Key\",\"nvcr.io\",\"NVIDIA Canary\",\"NVIDIA Canary 
locally\",\"NVIDIA Inference Microservices\",\"NVIDIA NIM\",\"NVIDIA Riva\",\"Offline ASR\",\"RTX A6000\",\"sovereign AI\",\"speech translation\",\"Speech-to-Text Translation\",\"transcribe_file_offline\"],\"articleSection\":[\"Large Language Models\",\"News\",\"Top story\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim\\\/2562\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim\\\/2562\\\/\",\"url\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim\\\/2562\\\/\",\"name\":\"NVIDIA Canary locally: multilingual speech recognition and translation as a NIM - Exploring the Future: Inside the AI Box\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim\\\/2562\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim\\\/2562\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/06\\\/NVIDIA_nim_container_Canary_setup_06.jpg\",\"datePublished\":\"2026-06-14T14:46:17+00:00\",\"dateModified\":\"2026-06-14T14:52:20+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#\\\/schema\\\/person\\\/cc91d08618b3feeef6926591b465eab1\"},\"description\":\"Run NVIDIA Canary locally: transcribe German audio as a NIM and translate it straight into English offline, accurate and entirely on your own 
hardware.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim\\\/2562\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim\\\/2562\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim\\\/2562\\\/#primaryimage\",\"url\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/06\\\/NVIDIA_nim_container_Canary_setup_06.jpg\",\"contentUrl\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/06\\\/NVIDIA_nim_container_Canary_setup_06.jpg\",\"width\":989,\"height\":530,\"caption\":\"NVIDIA NIM Container Canary setup\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim\\\/2562\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Start\",\"item\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"NVIDIA Canary locally: multilingual speech recognition and translation as a NIM\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#website\",\"url\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/\",\"name\":\"Exploring the Future: Inside the AI Box\",\"description\":\"Inside the AI Box, we share our experiences and discoveries in the world of artificial 
intelligence.\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#\\\/schema\\\/person\\\/cc91d08618b3feeef6926591b465eab1\",\"name\":\"Maker\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g\",\"caption\":\"Maker\"},\"description\":\"I live in Bavaria near Munich. In my head I always have many topics and try out especially in the field of Internet new media much in my spare time. I write on the blog because it makes me fun to report about the things that inspire me. I am happy about every comment, about suggestion and very about questions.\",\"sameAs\":[\"https:\\\/\\\/ai-box.eu\"],\"url\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/author\\\/ingmars\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"NVIDIA Canary locally: multilingual speech recognition and translation as a NIM - Exploring the Future: Inside the AI Box","description":"Run NVIDIA Canary locally: transcribe German audio as a NIM and translate it straight into English offline, accurate and entirely on your own hardware.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/ai-box.eu\/en\/news\/nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim\/2562\/","og_locale":"en_US","og_type":"article","og_title":"NVIDIA Canary locally: multilingual speech recognition and translation as a NIM - Exploring the Future: Inside the AI Box","og_description":"Run NVIDIA Canary locally: transcribe German audio as a NIM and translate it straight into English offline, accurate and entirely on your own hardware.","og_url":"https:\/\/ai-box.eu\/en\/news\/nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim\/2562\/","og_site_name":"Exploring the Future: Inside the AI Box","article_published_time":"2026-06-14T14:46:17+00:00","article_modified_time":"2026-06-14T14:52:20+00:00","og_image":[{"width":989,"height":530,"url":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_nim_container_Canary_setup_06.jpg","type":"image\/jpeg"}],"author":"Maker","twitter_card":"summary_large_image","twitter_creator":"@Ingmar_Stapel","twitter_site":"@Ingmar_Stapel","twitter_misc":{"Written by":"Maker","Est. 
reading time":"8 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/ai-box.eu\/en\/news\/nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim\/2562\/#article","isPartOf":{"@id":"https:\/\/ai-box.eu\/en\/news\/nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim\/2562\/"},"author":{"name":"Maker","@id":"https:\/\/ai-box.eu\/en\/#\/schema\/person\/cc91d08618b3feeef6926591b465eab1"},"headline":"NVIDIA Canary locally: multilingual speech recognition and translation as a NIM","datePublished":"2026-06-14T14:46:17+00:00","dateModified":"2026-06-14T14:52:20+00:00","mainEntityOfPage":{"@id":"https:\/\/ai-box.eu\/en\/news\/nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim\/2562\/"},"wordCount":1247,"commentCount":0,"image":{"@id":"https:\/\/ai-box.eu\/en\/news\/nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim\/2562\/#primaryimage"},"thumbnailUrl":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_nim_container_Canary_setup_06.jpg","keywords":["AST","Canary NIM","canary-1b","Docker","German to English translation","local AI","lokale KI","multilingual speech recognition","NGC API-Key","NGC API-Key","nvcr.io","NVIDIA Canary","NVIDIA Canary locally","NVIDIA Inference Microservices","NVIDIA NIM","NVIDIA Riva","Offline ASR","RTX A6000","sovereign AI","speech translation","Speech-to-Text Translation","transcribe_file_offline"],"articleSection":["Large Language Models","News","Top 
story"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/ai-box.eu\/en\/news\/nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim\/2562\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/ai-box.eu\/en\/news\/nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim\/2562\/","url":"https:\/\/ai-box.eu\/en\/news\/nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim\/2562\/","name":"NVIDIA Canary locally: multilingual speech recognition and translation as a NIM - Exploring the Future: Inside the AI Box","isPartOf":{"@id":"https:\/\/ai-box.eu\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/ai-box.eu\/en\/news\/nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim\/2562\/#primaryimage"},"image":{"@id":"https:\/\/ai-box.eu\/en\/news\/nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim\/2562\/#primaryimage"},"thumbnailUrl":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_nim_container_Canary_setup_06.jpg","datePublished":"2026-06-14T14:46:17+00:00","dateModified":"2026-06-14T14:52:20+00:00","author":{"@id":"https:\/\/ai-box.eu\/en\/#\/schema\/person\/cc91d08618b3feeef6926591b465eab1"},"description":"Run NVIDIA Canary locally: transcribe German audio as a NIM and translate it straight into English offline, accurate and entirely on your own 
hardware.","breadcrumb":{"@id":"https:\/\/ai-box.eu\/en\/news\/nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim\/2562\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/ai-box.eu\/en\/news\/nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim\/2562\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ai-box.eu\/en\/news\/nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim\/2562\/#primaryimage","url":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_nim_container_Canary_setup_06.jpg","contentUrl":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_nim_container_Canary_setup_06.jpg","width":989,"height":530,"caption":"NVIDIA NIM Container Canary setup"},{"@type":"BreadcrumbList","@id":"https:\/\/ai-box.eu\/en\/news\/nvidia-canary-locally-multilingual-speech-recognition-and-translation-as-a-nim\/2562\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Start","item":"https:\/\/ai-box.eu\/en\/"},{"@type":"ListItem","position":2,"name":"NVIDIA Canary locally: multilingual speech recognition and translation as a NIM"}]},{"@type":"WebSite","@id":"https:\/\/ai-box.eu\/en\/#website","url":"https:\/\/ai-box.eu\/en\/","name":"Exploring the Future: Inside the AI Box","description":"Inside the AI Box, we share our experiences and discoveries in the world of artificial 
intelligence.","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ai-box.eu\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/ai-box.eu\/en\/#\/schema\/person\/cc91d08618b3feeef6926591b465eab1","name":"Maker","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g","caption":"Maker"},"description":"I live in Bavaria, near Munich. I always have many topics on my mind and, in my spare time, try out a lot, especially in the field of Internet and new media. I write this blog because I enjoy reporting on the things that inspire me. 
I am happy about every comment and suggestion, and especially about questions.","sameAs":["https:\/\/ai-box.eu"],"url":"https:\/\/ai-box.eu\/en\/author\/ingmars\/"}]}},"_links":{"self":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts\/2562","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/comments?post=2562"}],"version-history":[{"count":1,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts\/2562\/revisions"}],"predecessor-version":[{"id":2563,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts\/2562\/revisions\/2563"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/media\/2560"}],"wp:attachment":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/media?parent=2562"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/categories?post=2562"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/tags?post=2562"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}