{"id":2530,"date":"2026-06-13T19:25:49","date_gmt":"2026-06-13T19:25:49","guid":{"rendered":"https:\/\/ai-box.eu\/?p=2530"},"modified":"2026-06-13T19:31:08","modified_gmt":"2026-06-13T19:31:08","slug":"install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide","status":"publish","type":"post","link":"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/","title":{"rendered":"Install NVIDIA Nemotron ASR Streaming Locally &#8211; Step-by-Step Guide"},"content":{"rendered":"<p>Real-time speech recognition is one of the building blocks I absolutely want to self-host for sovereign voice agents. My vision is always to run everything locally &#8211; without a cloud API, without my audio recording ever leaving my network. With the model updated in March 2026, <strong>NVIDIA Nemotron ASR Streaming (0.6B)<\/strong>, there is now a very attractive option for exactly that: a compact 600-million-parameter model that transcribes English speech with <strong>punctuation and capitalization<\/strong> and is designed for both <strong>low-latency streaming<\/strong> and <strong>batch processing<\/strong>.<\/p>\n<p>In this post I&#8217;ll show you step by step how I got the model running on my local inference server (dual <strong>RTX A6000<\/strong>, Ubuntu). Conveniently, NVIDIA even officially lists the A6000 as tested hardware. 
That makes the ASR model a perfect fit for my homelab setup &#8211; or your workstation, if it has comparable hardware.<\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_85 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/#What_is_Nemotron_ASR_Streaming\" >What is Nemotron ASR Streaming?<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-2\" 
href=\"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/#Key_facts_at_a_glance\" >Key facts at a glance<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/#Requirements\" >Requirements<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/#Step_1_Install_system_packages\" >Step 1: Install system packages<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/#Step_2_Create_a_Python_venv\" >Step 2: Create a Python venv<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/#Step_3_Install_PyTorch_and_NeMo\" >Step 3: Install PyTorch and NeMo<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/#Step_4_First_test_%E2%80%93_load_the_model_and_transcribe\" >Step 4: First test &#8211; load the model and transcribe<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/#Step_5_Prepare_the_audio_correctly\" >Step 5: Prepare the audio correctly<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-9\" 
href=\"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/#Step_6_Streaming_inference_with_the_cache-aware_script\" >Step 6: Streaming inference with the cache-aware script<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/#Step_7_Understanding_chunk_sizes_and_the_latency_control\" >Step 7: Understanding chunk sizes and the latency control<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-11\" href=\"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/#Step_8_End-to-end_pipeline_with_punctuation_ITN_and_translation\" >Step 8: End-to-end pipeline with punctuation, ITN and translation<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-12\" href=\"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/#Performance_What_can_you_expect\" >Performance: What can you expect?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-13\" href=\"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/#Tips_for_production_use\" >Tips for production use<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-14\" href=\"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/#Live_Demo_Web_App\" >Live Demo Web App<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-15\" href=\"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/#Option_1_%E2%80%93_SSH_tunnel\" >Option 1 &#8211; SSH 
tunnel:<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-16\" href=\"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/#Option_2_%E2%80%93_Gradio_public_share_URL\" >Option 2 &#8211; Gradio public share URL:<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-17\" href=\"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/#Starting_the_web_app\" >Starting the web app<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-18\" href=\"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/#Conclusion\" >Conclusion<\/a><\/li><\/ul><\/nav><\/div>\n<h2><span class=\"ez-toc-section\" id=\"What_is_Nemotron_ASR_Streaming\"><\/span>What is Nemotron ASR Streaming?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Nemotron-ASR-Streaming is based on the <strong>Cache-Aware FastConformer-RNNT<\/strong> architecture with 24 encoder layers and an RNN-Transducer decoder. The key difference from classic &#8220;buffered&#8221; streaming: instead of repeatedly recomputing overlapping audio windows, the model keeps <strong>caches for all self-attention and convolution layers<\/strong>. Each new audio chunk is therefore processed exactly once. This approach saves compute time and reduces latency without sacrificing accuracy.<\/p>\n<p>What I find especially exciting is the <strong>runtime flexibility<\/strong>: the chunk size can be chosen at inference time, with no retraining at all. So you move freely along the Pareto curve between latency and accuracy. 
This makes it possible to tune the model &#8211; and your own setup &#8211; anywhere from <strong>80 ms<\/strong> for highly reactive voice agents up to <strong>1120 ms<\/strong> for maximum transcription quality.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Key_facts_at_a_glance\"><\/span>Key facts at a glance<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<ul>\n<li><strong>Architecture:<\/strong> FastConformer-CacheAware-RNNT (24 encoder layers, RNNT decoder)<\/li>\n<li><strong>Parameters:<\/strong> 600M<\/li>\n<li><strong>Language:<\/strong> English (en-US), trained on roughly 530k hours of audio<\/li>\n<li><strong>Chunk sizes:<\/strong> 80 ms, 160 ms, 560 ms, 1120 ms<\/li>\n<li><strong>Features:<\/strong> punctuation &amp; capitalization natively, optional ITN and translation via the pipeline<\/li>\n<li><strong>License:<\/strong> NVIDIA Open Model License (usable commercially and non-commercially)<\/li>\n<\/ul>\n<h2><span class=\"ez-toc-section\" id=\"Requirements\"><\/span>Requirements<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Before we get started, a quick look at the requirements. You don&#8217;t need much hardware, because the model is small:<\/p>\n<ul>\n<li><strong>Operating system:<\/strong> Linux (I use Ubuntu Server)<\/li>\n<li><strong>GPU:<\/strong> NVIDIA GPU of the Volta, Ampere, Hopper or Blackwell architecture. Tested on V100, A100, A6000 and DGX Spark, among others. 
With 600M parameters, a few GB of VRAM are easily enough.<\/li>\n<li><strong>Driver &amp; CUDA:<\/strong> a current NVIDIA driver, matching the installed PyTorch CUDA version<\/li>\n<li><strong>Python:<\/strong> 3.10 or 3.11 (I recommend a clean venv environment)<\/li>\n<li><strong>Runtime:<\/strong> NeMo 25.11 or newer<\/li>\n<\/ul>\n<h2><span class=\"ez-toc-section\" id=\"Step_1_Install_system_packages\"><\/span>Step 1: Install system packages<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>NeMo needs <strong>libsndfile<\/strong> and <strong>ffmpeg<\/strong> for audio processing. We install these first at the system level. We also need the <code>venv<\/code> module for our virtual Python environment. It is very likely already present, but on many distributions it lives in the <code>python3-venv<\/code> package:<\/p>\n<p><strong>Command: <\/strong><code class=\"language-bash\">sudo apt-get update &amp;&amp; sudo apt-get install -y libsndfile1 ffmpeg python3-venv python3-pip<\/code><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Step_2_Create_a_Python_venv\"><\/span>Step 2: Create a Python venv<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>So that the NeMo dependencies don&#8217;t collide with other projects or the system Python, I always work in an isolated <strong>virtual environment (venv)<\/strong>. 
This keeps the setup clean, and the environment can be removed without any residue at any time &#8211; just delete the folder.<\/p>\n<p>First create the venv with Python 3.11 (adjust the path as you like) and then activate it.<\/p>\n<p>The following command creates a virtual environment named <code>nemotron-asr<\/code> in the <code>~\/venvs\/<\/code> folder of your home directory using Python 3.11.<\/p>\n<p><strong>Command: <\/strong><code>python3.11 -m venv ~\/venvs\/nemotron-asr<\/code><\/p>\n<p>The new environment then has to be activated with the following command.<\/p>\n<p><strong>Command: <\/strong><code>source ~\/venvs\/nemotron-asr\/bin\/activate<\/code><\/p>\n<p>Once the venv is active, you&#8217;ll see its name in your prompt, e.g. <code>(nemotron-asr)<\/code>. Next, bring <strong>pip<\/strong> and the build tools up to date &#8211; this prevents many build errors later on:<\/p>\n<p><strong>Command: <\/strong><code class=\"language-bash\">pip install --upgrade pip setuptools wheel<\/code><\/p>\n<p><strong>Tip:<\/strong> For every new terminal session you have to activate the environment again with <code>source ~\/venvs\/nemotron-asr\/bin\/activate<\/code>. You can leave it at any time with <code>deactivate<\/code>.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Step_3_Install_PyTorch_and_NeMo\"><\/span>Step 3: Install PyTorch and NeMo<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>NVIDIA explicitly recommends this order: <strong>first Cython and a current PyTorch<\/strong>, then the NeMo toolkit. Install PyTorch with the CUDA version that matches your driver (the example here uses CUDA 13.0, i.e. the <code>cu130<\/code> wheels).<\/p>\n<p>All of the following commands run inside the activated venv:<\/p>\n<p>First, install Cython and packaging.<br \/>\n<strong>Command:<\/strong> <code>pip install Cython packaging<\/code><\/p>\n<p>Install a current PyTorch with CUDA support inside the virtual environment. 
Important: check the CUDA version and adjust the index URL if necessary!<br \/>\n<strong>Command: <\/strong><code>pip install torch torchvision torchaudio --index-url https:\/\/download.pytorch.org\/whl\/cu130<\/code><\/p>\n<p>Then install the NeMo toolkit with the ASR extras directly from the <code>main<\/code> branch.<br \/>\n<strong>Command: <\/strong><code>pip install \"nemo_toolkit[asr] @ git+https:\/\/github.com\/NVIDIA\/NeMo.git@main\"<\/code><\/p>\n<p><strong>Important note:<\/strong><br \/>\nThe updated checkpoint from March 2026 requires the <code>main<\/code> branch of NeMo. A release package installed via <code>pip<\/code> is often not current enough for it. That&#8217;s why I prefer installing directly from GitHub.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Step_4_First_test_%E2%80%93_load_the_model_and_transcribe\"><\/span>Step 4: First test &#8211; load the model and transcribe<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Now comes the exciting moment. With just a few lines of Python, the model loads directly from Hugging Face and transcribes an audio file in offline mode (the whole file at once). This is the fastest way to check whether everything is installed correctly.<\/p>\n<p>Download the small Python program from the repository below and unzip it. I saved it locally in the folder <code>~\/asr\/<\/code>.<\/p>\n<p><strong>Download:<\/strong> <a href=\"https:\/\/github.com\/custom-build-robots\/nemotron-asr-local-streaming-demo\" target=\"_blank\" rel=\"noopener\">https:\/\/github.com\/custom-build-robots\/nemotron-asr-local-streaming-demo<\/a><\/p>\n<p>So that the download from Hugging Face doesn&#8217;t fail with an error, set your Hugging Face token once in the shell.<\/p>\n<p><strong>Command:<\/strong> <code>export HF_TOKEN=\"hf_xxxxxxxxxxxxxxxxxxxxx\"<\/code><\/p>\n<p>Now run the downloaded script <code>transcribe.py<\/code> and pass the audio file as an argument.<\/p>\n<p><strong>Command:<\/strong> <code>python transcribe.py audio.wav<\/code><\/p>\n<p>If correctly transcribed text with punctuation appears here, your setup is working. 
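<\/p>\n<p>For orientation, an offline transcription script of this kind can be very short. The following is only a minimal sketch under assumptions and not necessarily identical to the <code>transcribe.py<\/code> in the repository: the helper name <code>hypothesis_text<\/code> is hypothetical, and the model ID is the one used for the download in Step 6.<\/p>\n<pre><code class=\"language-python\"># Minimal sketch (assumption): not the transcribe.py from the repository\r\nimport sys\r\n\r\n\r\ndef hypothesis_text(result):\r\n    # Hypothetical helper: newer NeMo versions return Hypothesis objects\r\n    # with a .text field, older ones plain strings; handle both\r\n    return getattr(result, 'text', result)\r\n\r\n\r\ndef main(audio_path):\r\n    # Imported here so that the helper above also works without NeMo\r\n    import nemo.collections.asr as nemo_asr\r\n\r\n    # Loads the checkpoint from Hugging Face (cached after the first run)\r\n    model = nemo_asr.models.ASRModel.from_pretrained(\r\n        model_name='nvidia\/nemotron-speech-streaming-en-0.6b'\r\n    )\r\n    # Offline mode: the whole file is transcribed in one pass\r\n    results = model.transcribe([audio_path])\r\n    print(hypothesis_text(results[0]))\r\n\r\n\r\nif __name__ == '__main__':\r\n    main(sys.argv[1])<\/code><\/pre>\n<p>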
With that, the real hurdle &#8211; the installation &#8211; is already cleared.<\/p>\n<p>I made the small mistake of using a stereo WAV file and got the following error.<\/p>\n<p><code>Input shape expected = (batch, time)<\/code><br \/>\n<code>Input shape found : torch.Size([1, 2, 631520])<\/code><\/p>\n<p>The shape <code>[1, 2, 631520]<\/code> shows two audio channels where the model expects a single one, so my stereo file cannot be processed as it is. Step 5 shows how to fix that.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Step_5_Prepare_the_audio_correctly\"><\/span>Step 5: Prepare the audio correctly<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The model expects <strong>mono audio in WAV format<\/strong>. The easiest way to convert stereo recordings or other sample rates is ffmpeg. <strong>16 kHz, mono, 16-bit PCM<\/strong> has proven reliable (<code>-ar 16000<\/code> sets the sample rate, <code>-ac 1<\/code> downmixes to mono, <code>-c:a pcm_s16le<\/code> selects 16-bit PCM):<\/p>\n<pre><code class=\"language-bash\">ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav<\/code><\/pre>\n<p>There is no fixed maximum length &#8211; the upper limit depends solely on the available GPU memory. On my A6000 with 48 GB this is not an issue in practice.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Step_6_Streaming_inference_with_the_cache-aware_script\"><\/span>Step 6: Streaming inference with the cache-aware script<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>For real streaming, i.e. continuous processing in chunks, NeMo ships a ready-made example script. Important upfront: this script needs <strong>neither a screen nor a microphone<\/strong> and can be used via SSH. It reads audio files from disk and writes the transcripts to a file. So it&#8217;s exactly right for my headless A6000 server, on which I set everything up via SSH. 
The script simulates <strong>streaming on already existing files<\/strong> &#8211; it feeds them in chunk by chunk to test the streaming behavior &#8211; and does not capture a live microphone.<\/p>\n<p>First we clone the NeMo repository:<\/p>\n<p><strong>Command:<\/strong> <code>cd ~<\/code><\/p>\n<p><strong>Command:<\/strong> <code>git clone https:\/\/github.com\/NVIDIA\/NeMo.git<\/code><\/p>\n<p><strong>Command:<\/strong> <code>cd NeMo<\/code><\/p>\n<p>Then you start the cache-aware streaming inference. The central parameter is <code>att_context_size<\/code>, which controls the latency:<\/p>\n<pre><code class=\"language-bash\">python examples\/asr\/asr_cache_aware_streaming\/speech_to_text_cache_aware_streaming_infer.py \\\r\n    model_path=&lt;model_path&gt; \\\r\n    dataset_manifest=&lt;dataset_manifest&gt; \\\r\n    batch_size=&lt;batch_size&gt; \\\r\n    att_context_size=\"[70,13]\" \\\r\n    output_path=&lt;output_folder&gt;<\/code><\/pre>\n<p>This command looks more cryptic at first glance than it is. The <code>&lt;...&gt;<\/code> entries are pure <strong>placeholders<\/strong> that you have to replace with real values:<\/p>\n<ul>\n<li><strong>model_path<\/strong> &#8211; path to the local <code>.nemo<\/code> model file (exactly the file that was downloaded by <code>from_pretrained<\/code> in Step 4).<\/li>\n<li><strong>dataset_manifest<\/strong> &#8211; a JSONL file listing which audio files should be transcribed (NeMo manifest format).<\/li>\n<li><strong>batch_size<\/strong> &#8211; simply a number, e.g. <code>16<\/code>. On the A6000 (48 GB) you can go higher.<\/li>\n<li><strong>att_context_size<\/strong> &#8211; your latency control (more on that in Step 7), here <code>[70,13]<\/code> = 1.12 s chunk size.<\/li>\n<li><strong>output_path<\/strong> &#8211; a folder into which the results are written.<\/li>\n<\/ul>\n<p>To make this concrete, I&#8217;ll walk through it step by step.<\/p>\n<p><strong>1. 
Place the .nemo file locally.<\/strong> If you don&#8217;t want to search for the path in the HF cache, you can download the model file directly:<\/p>\n<pre><code class=\"language-bash\">hf download nvidia\/nemotron-speech-streaming-en-0.6b \\\r\n    nemotron-speech-streaming-en-0.6b.nemo \\\r\n    --local-dir ~\/asr\/models<\/code><\/pre>\n<p>For me, though, the file was quite easy to find in my home directory in the folder <code>~\/.cache\/huggingface\/...<\/code>, as shown in the path below:<\/p>\n<p><strong>Path:<\/strong> <code>~\/.cache\/huggingface\/hub\/models--nvidia--nemotron-speech-streaming-en-0.6b\/snapshots\/&lt;commit-hash&gt;\/<\/code><\/p>\n<p><strong>2. Build a manifest.<\/strong><\/p>\n<p>The manifest is a simple text file in which each line contains a JSON object with the audio path:<\/p>\n<p><strong>Command:<\/strong> <code class=\"language-bash\">nano ~\/asr\/manifest.jsonl<\/code><\/p>\n<p>Then insert the following line into the file, here using my home directory as an example.<\/p>\n<p><code>{\"audio_filepath\": \"\/home\/ingmar\/asr\/audio.wav\", \"duration\": 25.0, \"text\": \"\"}<\/code><\/p>\n<p>Save the file with Ctrl + X, then Y, and confirm the file name with Enter.<\/p>\n<p>The value for <code>duration<\/code> (in seconds) can be rough; <code>text<\/code> stays empty for pure inference. For multiple files, simply append more lines.<\/p>\n<p><strong>3. The finished command with real values.<\/strong><\/p>\n<p>Now you replace the placeholders with your actual paths. 
For me it looked like this.<\/p>\n<p><strong>Command:<\/strong> <code>python ~\/NeMo\/examples\/asr\/asr_cache_aware_streaming\/speech_to_text_cache_aware_streaming_infer.py \\<\/code><\/p>\n<p><code>model_path=\/home\/ingmar\/.cache\/huggingface\/hub\/models--nvidia--nemotron-speech-streaming-en-0.6b\/snapshots\/7a9b763e6c5fb103da690219c049fac917aa50b1\/nemotron-speech-streaming-en-0.6b.nemo \\<\/code><\/p>\n<p><code>dataset_manifest=\/home\/ingmar\/asr\/manifest.jsonl \\<\/code><\/p>\n<p><code>batch_size=16 att_context_size=\"[70,13]\" \\<\/code><\/p>\n<p><code>output_path=\/home\/ingmar\/asr\/results<\/code><\/p>\n<p>It&#8217;s best to write the command <strong>on a single line without backslashes<\/strong> so that nothing can go wrong when pasting:<\/p>\n<p><strong>Command:<\/strong> <code>python ~\/NeMo\/examples\/asr\/asr_cache_aware_streaming\/speech_to_text_cache_aware_streaming_infer.py model_path=\/home\/ingmar\/.cache\/huggingface\/hub\/models--nvidia--nemotron-speech-streaming-en-0.6b\/snapshots\/7a9b763e6c5fb103da690219c049fac917aa50b1\/nemotron-speech-streaming-en-0.6b.nemo dataset_manifest=\/home\/ingmar\/asr\/manifest.jsonl batch_size=16 att_context_size=\"[70,13]\" output_path=\/home\/ingmar\/asr\/results<\/code><\/p>\n<p>You&#8217;ll then find the finished transcripts in the folder <code>~\/asr\/results<\/code>.<\/p>\n<p><strong>Note on the NeMo version:<\/strong> Which Hydra keys the script expects exactly (<code>model_path=<\/code> versus possibly <code>pretrained_name=<\/code>) and whether <code>duration<\/code> is mandatory in the manifest both depend on the current state of the <code>main<\/code> branch. 
Before you guess, it&#8217;s worth a quick look at the script&#8217;s built-in help:<\/p>\n<p><strong>Command:<\/strong> <code>python examples\/asr\/asr_cache_aware_streaming\/speech_to_text_cache_aware_streaming_infer.py --help<\/code><\/p>\n<p>The second value in <code>att_context_size<\/code> is the <strong>right context<\/strong> and may take one of the values <code>{0, 1, 6, 13}<\/code>. That is exactly your latency control.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Step_7_Understanding_chunk_sizes_and_the_latency_control\"><\/span>Step 7: Understanding chunk sizes and the latency control<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The latency is defined via <code>att_context_size = [left_context, right_context]<\/code>, measured in <strong>80 ms frames<\/strong>. The chunk size is the current frame plus the right context, i.e. (right_context + 1) &#215; 80 ms; <code>[70, 13]<\/code> therefore gives 14 &#215; 80 ms = 1.12 s:<\/p>\n<table>\n<thead>\n<tr>\n<th>att_context_size<\/th>\n<th>Chunk size<\/th>\n<th>Latency<\/th>\n<th>Recommended use<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>[70, 0]<\/td>\n<td>1 frame<\/td>\n<td><strong>0.08 s<\/strong><\/td>\n<td>Maximum reactivity (voice agent)<\/td>\n<\/tr>\n<tr>\n<td>[70, 1]<\/td>\n<td>2 frames<\/td>\n<td><strong>0.16 s<\/strong><\/td>\n<td>Highly reactive live applications<\/td>\n<\/tr>\n<tr>\n<td>[70, 6]<\/td>\n<td>7 frames<\/td>\n<td><strong>0.56 s<\/strong><\/td>\n<td>Good compromise<\/td>\n<\/tr>\n<tr>\n<td>[70, 13]<\/td>\n<td>14 frames<\/td>\n<td><strong>1.12 s<\/strong><\/td>\n<td>Best accuracy \/ live captioning<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The chunks are processed strictly <strong>without overlap<\/strong> &#8211; this is the efficiency gain of the cache-aware architecture compared to classic buffering.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Step_8_End-to-end_pipeline_with_punctuation_ITN_and_translation\"><\/span>Step 8: End-to-end pipeline with punctuation, ITN and translation<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>If you want more than just plain 
transcription, use the <strong>pipeline method<\/strong>. It builds on the configuration file <code>cache_aware_rnnt.yaml<\/code> and provides complete workflows including <strong>punctuation &amp; capitalization (PnC)<\/strong>, <strong>inverse text normalization (ITN)<\/strong> and optional translation.<\/p>\n<p>You&#8217;ll find the configuration file in the NeMo repository under <code>examples\/asr\/conf\/asr_streaming_inference\/cache_aware_rnnt.yaml<\/code>. After that, the following code is enough:<\/p>\n<pre><code class=\"language-python\">from nemo.collections.asr.inference.factory.pipeline_builder import PipelineBuilder\r\nfrom omegaconf import OmegaConf\r\n\r\n# Path to the cache_aware_rnnt.yaml\r\ncfg_path = 'cache_aware_rnnt.yaml'\r\ncfg = OmegaConf.load(cfg_path)\r\n\r\n# Paths of all audio files to be transcribed\r\naudios = ['\/path\/to\/your\/audio.wav']\r\n\r\n# Create the pipeline object and run inference\r\npipeline = PipelineBuilder.build_pipeline(cfg)\r\noutput = pipeline.run(audios)\r\n\r\n# Output\r\nfor entry in output:\r\n    print(entry['text'])<\/code><\/pre>\n<h2><span class=\"ez-toc-section\" id=\"Performance_What_can_you_expect\"><\/span>Performance: What can you expect?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>NVIDIA measures accuracy via the <strong>Word Error Rate (WER)<\/strong> on the datasets of the HuggingFace OpenASR leaderboard. 
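<\/p>\n<p>As a reminder of what the numbers below mean: WER is the word-level edit distance between reference and hypothesis, divided by the number of reference words. Here is a minimal sketch; the function name <code>word_error_rate<\/code> is a hypothetical choice, and the text normalization that leaderboard evaluations typically apply before scoring is left out:<\/p>\n<pre><code class=\"language-python\">def word_error_rate(reference, hypothesis):\r\n    # WER = word-level edit distance \/ number of reference words\r\n    ref = reference.split()\r\n    hyp = hypothesis.split()\r\n    # prev[j] = edit distance between the ref words seen so far and hyp[:j]\r\n    prev = list(range(len(hyp) + 1))\r\n    for i, ref_word in enumerate(ref, start=1):\r\n        curr = [i]\r\n        for j, hyp_word in enumerate(hyp, start=1):\r\n            cost = 0 if ref_word == hyp_word else 1\r\n            curr.append(min(\r\n                prev[j] + 1,         # reference word missing in hypothesis\r\n                curr[j - 1] + 1,     # extra word in hypothesis\r\n                prev[j - 1] + cost,  # match or substitution\r\n            ))\r\n        prev = curr\r\n    # Guard against an empty reference\r\n    return prev[-1] \/ max(len(ref), 1)\r\n\r\n\r\nprint(word_error_rate('the cat sat on the mat', 'the cat sat on mat'))<\/code><\/pre>\n<p>In this example one of the six reference words is missing, which gives a WER of 1\/6, roughly 16.7 %.<\/p>\n<p>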
The latency-accuracy trade-off is clearly visible here: the larger the chunk, the lower the error rate.<\/p>\n<table>\n<thead>\n<tr>\n<th>Chunk size<\/th>\n<th>\u00d8 WER<\/th>\n<th>LS test-clean<\/th>\n<th>LS test-other<\/th>\n<th>AMI<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>1.12 s<\/strong><\/td>\n<td><strong>6.93 %<\/strong><\/td>\n<td>2.32 %<\/td>\n<td>4.84 %<\/td>\n<td>11.73 %<\/td>\n<\/tr>\n<tr>\n<td><strong>0.56 s<\/strong><\/td>\n<td><strong>7.07 %<\/strong><\/td>\n<td>2.46 %<\/td>\n<td>5.07 %<\/td>\n<td>11.88 %<\/td>\n<\/tr>\n<tr>\n<td><strong>0.16 s<\/strong><\/td>\n<td><strong>7.67 %<\/strong><\/td>\n<td>2.56 %<\/td>\n<td>5.57 %<\/td>\n<td>14.71 %<\/td>\n<\/tr>\n<tr>\n<td><strong>0.08 s<\/strong><\/td>\n<td><strong>8.43 %<\/strong><\/td>\n<td>2.80 %<\/td>\n<td>6.01 %<\/td>\n<td>18.29 %<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Remarkable: even at the most aggressive latency of 80 ms, the average WER is still below 9% &#8211; and for clean audio (LibriSpeech test-clean) even below 3%. For live applications, that&#8217;s an excellent value.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Tips_for_production_use\"><\/span>Tips for production use<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<ul>\n<li><strong>Parallel streams:<\/strong> Thanks to the efficient cache architecture, far more simultaneous streams fit on the GPU at the same VRAM than with buffered approaches. The good thing about this: it lowers the cost per transcription.<\/li>\n<li><strong>Choosing the right chunk size:<\/strong> For an interactive voice agent I&#8217;d start with [70, 1] or [70, 6]; for captioning or transcripts go with [70, 13] instead.<\/li>\n<li><strong>Check the audio format:<\/strong> Mono is mandatory. Wrong sample rates are the most common source of errors. 
When in doubt, always convert the audio file with ffmpeg beforehand.<\/li>\n<li><strong>Hosted variant for testing:<\/strong> If you first want to experiment without a local GPU, you can call the model via the NVIDIA NIM API on build.nvidia.com using <code>nvidia-riva-client<\/code>. For sovereign operation, though, the local installation remains my favorite.<\/li>\n<\/ul>\n<h2><span class=\"ez-toc-section\" id=\"Live_Demo_Web_App\"><\/span>Live Demo Web App<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Now we&#8217;ve set everything up, and basically the only thing still missing is a small web application that shows what NVIDIA Nemotron ASR is capable of. The Python program has to be started in the virtual environment as usual.<\/p>\n<p><strong>Command: <\/strong><code>source ~\/venvs\/nemotron-asr\/bin\/activate<\/code><\/p>\n<p>The Python program <code>asr_gradio_app.py<\/code> is available for download here.<\/p>\n<p><strong>Download:<\/strong> <a href=\"https:\/\/github.com\/custom-build-robots\/nemotron-asr-local-streaming-demo\" target=\"_blank\" rel=\"noopener\">https:\/\/github.com\/custom-build-robots\/nemotron-asr-local-streaming-demo<\/a><\/p>\n<p>Now save the Python program in a folder on your computer.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Option_1_%E2%80%93_SSH_tunnel\"><\/span>Option 1 &#8211; SSH tunnel:<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Next I had to set up an SSH tunnel to my server, because browsers only allow microphone access over HTTPS or on <code>localhost<\/code>, and my server does not serve HTTPS. To do this, I ran the following command in my Windows PowerShell; it forwards local port 3000 to port 7860 on the server, the port Gradio listens on by default.<\/p>\n<p><strong>Command:<\/strong> <code>ssh -L 3000:localhost:7860 ingmar@192.168.2.119<\/code><\/p>\n<p>After that I could simply enter the following URL in the browser and open the Gradio web app. 
It was then also possible to use the microphone for recording.<\/p>\n<p><strong>URL: <\/strong><code>http:\/\/localhost:3000\/<\/code><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Option_2_%E2%80%93_Gradio_public_share_URL\"><\/span>Option 2 &#8211; Gradio public share URL:<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>In the last line of the script I set share=True. This makes the Gradio app generate an https link that can be called from anywhere on the internet if you know the URL. Then you don&#8217;t have to work with the SSH tunnel and can also use the microphone directly.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Starting_the_web_app\"><\/span>Starting the web app<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Now start the Python program as follows.<\/p>\n<p><strong>Command:<\/strong> <code>python asr_gradio_app.py<\/code><\/p>\n<p>For me, the web interface looks like this.<\/p>\n<div id=\"attachment_2527\" style=\"width: 1034px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_Nemotron_ASR_Streaming-1024x430.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2527\" class=\"size-large wp-image-2527\" src=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_Nemotron_ASR_Streaming-1024x430.jpg\" alt=\"NVIDIA Nemotron ASR Streaming\" width=\"1024\" height=\"430\" srcset=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_Nemotron_ASR_Streaming-1024x430.jpg 1024w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_Nemotron_ASR_Streaming-300x126.jpg 300w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_Nemotron_ASR_Streaming-768x322.jpg 768w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_Nemotron_ASR_Streaming-1080x453.jpg 1080w, https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_Nemotron_ASR_Streaming.jpg 1380w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/a><p id=\"caption-attachment-2527\" 
class=\"wp-caption-text\">NVIDIA Nemotron ASR Streaming<\/p><\/div>\n<h2><span class=\"ez-toc-section\" id=\"Conclusion\"><\/span>Conclusion<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>With Nemotron ASR Streaming you get a surprisingly powerful, compact speech recognition model that can be set up locally in under an hour. The <strong>Cache-Aware FastConformer-RNNT<\/strong> architecture delivers exactly what I need for a sovereign voice agent: low latency, good accuracy and the freedom to decide the trade-off at runtime myself. And all of that on your own hardware, without speech data wandering off into someone else&#8217;s cloud.<\/p>\n<p>For me, this is another building block in building a fully self-hosted AI stack. As a next step, I&#8217;d like to integrate the model into Open WebUI as an STT backend and combine it with a local TTS into an end-to-end voice loop. But that&#8217;s material for a separate post, if I get around to it.<\/p>\n<p>If you build this setup yourself: feel free to write in the comments which chunk size gives you the best results for your use case.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Real-time speech recognition is one of the building blocks I absolutely want to self-host for sovereign voice agents. My vision is always to run everything locally &#8211; without a cloud API, without my audio recording ever leaving my network. 
With the model updated in March 2026, NVIDIA Nemotron ASR Streaming (0.6B), there is now a [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":2528,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_et_pb_use_builder":"","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":""},"categories":[162,8,50],"tags":[1596,1594,1597,1585,1587,1588,1593,1589,1590,1586,1595,315,1591,1032,1592],"class_list":["post-2530","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-large-language-models-en","category-news","category-top-story-en","tag-cache-aware-streaming","tag-fastconformer","tag-gradio-asr","tag-local-asr","tag-local-stt","tag-low-latency-asr","tag-nemo-toolkit","tag-nemotron-asr-streaming","tag-nvidia-nemotron","tag-real-time-speech-recognition","tag-rnnt","tag-rtx-a6000-en","tag-self-hosted-asr","tag-sovereign-ai","tag-speech-to-text","et-has-post-format-content","et_post_format-et-post-format-standard"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v28.0 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Install NVIDIA Nemotron ASR Streaming Locally - Step-by-Step Guide - Exploring the Future: Inside the AI Box<\/title>\n<meta name=\"description\" content=\"Install NVIDIA Nemotron ASR Streaming locally: a step-by-step guide to sovereign real-time speech recognition on your own GPU \u2014 incl. 
a Gradio live demo.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Install NVIDIA Nemotron ASR Streaming Locally - Step-by-Step Guide - Exploring the Future: Inside the AI Box\" \/>\n<meta property=\"og:description\" content=\"Install NVIDIA Nemotron ASR Streaming locally: a step-by-step guide to sovereign real-time speech recognition on your own GPU \u2014 incl. a Gradio live demo.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/\" \/>\n<meta property=\"og:site_name\" content=\"Exploring the Future: Inside the AI Box\" \/>\n<meta property=\"article:published_time\" content=\"2026-06-13T19:25:49+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-06-13T19:31:08+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_Nemotron_ASR_Streaming.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1380\" \/>\n\t<meta property=\"og:image:height\" content=\"579\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Maker\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Ingmar_Stapel\" \/>\n<meta name=\"twitter:site\" content=\"@Ingmar_Stapel\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Maker\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"13 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\\\/2530\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\\\/2530\\\/\"},\"author\":{\"name\":\"Maker\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#\\\/schema\\\/person\\\/cc91d08618b3feeef6926591b465eab1\"},\"headline\":\"Install NVIDIA Nemotron ASR Streaming Locally &#8211; Step-by-Step Guide\",\"datePublished\":\"2026-06-13T19:25:49+00:00\",\"dateModified\":\"2026-06-13T19:31:08+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\\\/2530\\\/\"},\"wordCount\":2161,\"commentCount\":0,\"image\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\\\/2530\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/06\\\/NVIDIA_Nemotron_ASR_Streaming.jpg\",\"keywords\":[\"Cache-Aware Streaming\",\"FastConformer\",\"Gradio ASR\",\"local ASR\",\"local STT\",\"low-latency ASR\",\"NeMo Toolkit\",\"Nemotron ASR Streaming\",\"NVIDIA Nemotron\",\"real-time speech recognition\",\"RNNT\",\"RTX A6000\",\"self-hosted ASR\",\"sovereign AI\",\"Speech-to-Text\"],\"articleSection\":[\"Large Language Models\",\"News\",\"Top 
story\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\\\/2530\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\\\/2530\\\/\",\"url\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\\\/2530\\\/\",\"name\":\"Install NVIDIA Nemotron ASR Streaming Locally - Step-by-Step Guide - Exploring the Future: Inside the AI Box\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\\\/2530\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\\\/2530\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/06\\\/NVIDIA_Nemotron_ASR_Streaming.jpg\",\"datePublished\":\"2026-06-13T19:25:49+00:00\",\"dateModified\":\"2026-06-13T19:31:08+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#\\\/schema\\\/person\\\/cc91d08618b3feeef6926591b465eab1\"},\"description\":\"Install NVIDIA Nemotron ASR Streaming locally: a step-by-step guide to sovereign real-time speech recognition on your own GPU \u2014 incl. 
a Gradio live demo.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\\\/2530\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\\\/2530\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\\\/2530\\\/#primaryimage\",\"url\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/06\\\/NVIDIA_Nemotron_ASR_Streaming.jpg\",\"contentUrl\":\"https:\\\/\\\/ai-box.eu\\\/wp-content\\\/uploads\\\/2026\\\/06\\\/NVIDIA_Nemotron_ASR_Streaming.jpg\",\"width\":1380,\"height\":579,\"caption\":\"NVIDIA Nemotron ASR Streaming\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/news\\\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\\\/2530\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Start\",\"item\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Install NVIDIA Nemotron ASR Streaming Locally &#8211; Step-by-Step Guide\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#website\",\"url\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/\",\"name\":\"Exploring the Future: Inside the AI Box\",\"description\":\"Inside the AI Box, we share our experiences and discoveries in the world of artificial 
intelligence.\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/#\\\/schema\\\/person\\\/cc91d08618b3feeef6926591b465eab1\",\"name\":\"Maker\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g\",\"caption\":\"Maker\"},\"description\":\"I live in Bavaria near Munich. In my head I always have many topics and try out especially in the field of Internet new media much in my spare time. I write on the blog because it makes me fun to report about the things that inspire me. I am happy about every comment, about suggestion and very about questions.\",\"sameAs\":[\"https:\\\/\\\/ai-box.eu\"],\"url\":\"https:\\\/\\\/ai-box.eu\\\/en\\\/author\\\/ingmars\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Install NVIDIA Nemotron ASR Streaming Locally - Step-by-Step Guide - Exploring the Future: Inside the AI Box","description":"Install NVIDIA Nemotron ASR Streaming locally: a step-by-step guide to sovereign real-time speech recognition on your own GPU \u2014 incl. 
a Gradio live demo.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/","og_locale":"en_US","og_type":"article","og_title":"Install NVIDIA Nemotron ASR Streaming Locally - Step-by-Step Guide - Exploring the Future: Inside the AI Box","og_description":"Install NVIDIA Nemotron ASR Streaming locally: a step-by-step guide to sovereign real-time speech recognition on your own GPU \u2014 incl. a Gradio live demo.","og_url":"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/","og_site_name":"Exploring the Future: Inside the AI Box","article_published_time":"2026-06-13T19:25:49+00:00","article_modified_time":"2026-06-13T19:31:08+00:00","og_image":[{"width":1380,"height":579,"url":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_Nemotron_ASR_Streaming.jpg","type":"image\/jpeg"}],"author":"Maker","twitter_card":"summary_large_image","twitter_creator":"@Ingmar_Stapel","twitter_site":"@Ingmar_Stapel","twitter_misc":{"Written by":"Maker","Est. 
reading time":"13 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/#article","isPartOf":{"@id":"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/"},"author":{"name":"Maker","@id":"https:\/\/ai-box.eu\/en\/#\/schema\/person\/cc91d08618b3feeef6926591b465eab1"},"headline":"Install NVIDIA Nemotron ASR Streaming Locally &#8211; Step-by-Step Guide","datePublished":"2026-06-13T19:25:49+00:00","dateModified":"2026-06-13T19:31:08+00:00","mainEntityOfPage":{"@id":"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/"},"wordCount":2161,"commentCount":0,"image":{"@id":"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/#primaryimage"},"thumbnailUrl":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_Nemotron_ASR_Streaming.jpg","keywords":["Cache-Aware Streaming","FastConformer","Gradio ASR","local ASR","local STT","low-latency ASR","NeMo Toolkit","Nemotron ASR Streaming","NVIDIA Nemotron","real-time speech recognition","RNNT","RTX A6000","self-hosted ASR","sovereign AI","Speech-to-Text"],"articleSection":["Large Language Models","News","Top story"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/","url":"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/","name":"Install NVIDIA Nemotron ASR Streaming Locally - Step-by-Step Guide - Exploring the Future: Inside the AI 
Box","isPartOf":{"@id":"https:\/\/ai-box.eu\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/#primaryimage"},"image":{"@id":"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/#primaryimage"},"thumbnailUrl":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_Nemotron_ASR_Streaming.jpg","datePublished":"2026-06-13T19:25:49+00:00","dateModified":"2026-06-13T19:31:08+00:00","author":{"@id":"https:\/\/ai-box.eu\/en\/#\/schema\/person\/cc91d08618b3feeef6926591b465eab1"},"description":"Install NVIDIA Nemotron ASR Streaming locally: a step-by-step guide to sovereign real-time speech recognition on your own GPU \u2014 incl. a Gradio live demo.","breadcrumb":{"@id":"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/#primaryimage","url":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_Nemotron_ASR_Streaming.jpg","contentUrl":"https:\/\/ai-box.eu\/wp-content\/uploads\/2026\/06\/NVIDIA_Nemotron_ASR_Streaming.jpg","width":1380,"height":579,"caption":"NVIDIA Nemotron ASR Streaming"},{"@type":"BreadcrumbList","@id":"https:\/\/ai-box.eu\/en\/news\/install-nvidia-nemotron-asr-streaming-locally-step-by-step-guide\/2530\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Start","item":"https:\/\/ai-box.eu\/en\/"},{"@type":"ListItem","position":2,"name":"Install NVIDIA Nemotron ASR Streaming Locally &#8211; Step-by-Step 
Guide"}]},{"@type":"WebSite","@id":"https:\/\/ai-box.eu\/en\/#website","url":"https:\/\/ai-box.eu\/en\/","name":"Exploring the Future: Inside the AI Box","description":"Inside the AI Box, we share our experiences and discoveries in the world of artificial intelligence.","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ai-box.eu\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/ai-box.eu\/en\/#\/schema\/person\/cc91d08618b3feeef6926591b465eab1","name":"Maker","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/e96b93fc3c7e50c1f21c5c6b1f146dc4867936141360830b328947b32cacf93a?s=96&d=mm&r=g","caption":"Maker"},"description":"I live in Bavaria near Munich. In my head I always have many topics and try out especially in the field of Internet new media much in my spare time. I write on the blog because it makes me fun to report about the things that inspire me. 
I am happy about every comment, about suggestion and very about questions.","sameAs":["https:\/\/ai-box.eu"],"url":"https:\/\/ai-box.eu\/en\/author\/ingmars\/"}]}},"_links":{"self":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts\/2530","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/comments?post=2530"}],"version-history":[{"count":1,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts\/2530\/revisions"}],"predecessor-version":[{"id":2531,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/posts\/2530\/revisions\/2531"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/media\/2528"}],"wp:attachment":[{"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/media?parent=2530"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/categories?post=2530"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ai-box.eu\/en\/wp-json\/wp\/v2\/tags?post=2530"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}