My Hermes Agent has been able to search the web and extract page content for a while now. Until recently I handled that with Tavily for search and a self-hosted Firecrawl for extraction. The setup worked flawlessly, right up until my Hermes Agent had burned through Tavily’s free tier of 1,000 search queries per month. That is what really started to bother me: every web search ran through Tavily’s external API. For an agent that likes to research autonomously and runs in cron jobs, 1,000 queries is a number you have to keep in the back of your mind, and one you hit when there are still plenty of days left in the month.
So I moved the search to where most things on my side already run anyway: my own application server. I now use SearXNG for it. In this post I’ll show you how easy it is to set up, and why the combination ends up bringing together the best of three worlds.
I followed the official guide for my Ubuntu system here.
Why self-host in the first place?
Anyone who knows my posts here on ai-box.eu can already guess the answer: digital sovereignty. My inference setup with the two RTX A6000s runs locally, the models live with me, and the power comes in good part from my own roof. So it feels inconsistent to delegate, of all things, the web search, the part that reveals the most about my interests and research topics, to an external service.
SearXNG is the ideal solution here. It’s a privacy-friendly, open-source metasearch engine that aggregates results from over 70 search engines. No trackers, no profiling, no API key and, above all: no rate limit when you host it yourself.
SearXNG in five minutes with Docker
The setup is pleasantly straightforward. I created a working directory and wrote a docker-compose.yml.
Command: mkdir -p ~/searxng/searxng
Command: cd ~/searxng/searxng
Command: nano docker-compose.yml
Now paste the following configuration into the docker-compose.yml file.
services:
  searxng:
    image: searxng/searxng:latest
    container_name: searxng
    ports:
      - "8888:8080"
    volumes:
      - ./searxng:/etc/searxng:rw
    environment:
      - SEARXNG_BASE_URL=http://localhost:8888/
    restart: unless-stopped
Save the docker-compose.yml file in nano with Ctrl + X, then Y, then Enter.
A docker compose up -d later, the container was running. So far, so easy.
Command: docker compose up -d
The one pitfall you need to know about: by default SearXNG only serves HTML. But the Hermes Agent needs the JSON format. You have to enable that first. To do so, I copied the generated configuration out of the container:
Command: docker cp searxng:/etc/searxng/settings.yml ~/searxng/searxng/settings.yml
Now open the file:
Command: nano ~/searxng/searxng/settings.yml
In the settings.yml you’ll then find the formats block (around line 84 for me). There you simply add json. It should then look like the example below.
formats:
  - html
  - json
Save the change with Ctrl + X, then Y, then Enter.
After that, copy the file back into the container and restart it:
Command: docker cp ~/searxng/searxng/settings.yml searxng:/etc/searxng/settings.yml
Command: docker restart searxng
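If you want to rule out an indentation slip in the edited file, a few lines of Python can read the formats block back for you. This is only a minimal sketch: the function name listed_formats is my own, it uses no YAML library, and it only understands the simple block-list layout shown above (one "- item" per line under formats:).

```python
def listed_formats(settings_text: str) -> list[str]:
    """Return the entries of the first active `formats:` list in the text.

    Deliberately line-based (standard library only). It assumes the simple
    layout from this post and does not handle '#' inside values.
    """
    formats: list[str] = []
    key_indent = None  # indentation of the `formats:` key once found
    for raw in settings_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments
        if not line.strip():
            continue  # skip blank and comment-only lines
        indent = len(line) - len(line.lstrip())
        stripped = line.strip()
        if key_indent is None:
            if stripped == "formats:":
                key_indent = indent
            continue
        # Inside the block: collect list items until something else appears
        if stripped.startswith("- ") and indent >= key_indent:
            formats.append(stripped[2:].strip().strip("'\""))
        else:
            break
    return formats
```

Called on the text of ~/searxng/searxng/settings.yml (for example via pathlib.Path(...).expanduser().read_text()), the returned list should contain json. If it comes back without json, the edit did not land in the active formats block.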
To check that everything works, you run a quick curl test:
curl -s "http://localhost:8888/search?q=test&format=json" | python3 -c \
"import sys,json; d=json.load(sys.stdin); print(f'{len(d[\"results\"])} results')"
For me, 21 results came back right away. If you see a 403 Forbidden instead, the JSON format isn’t active yet. In that case it can help to double-check the indentation in the settings.yml.
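The same check also works as a small Python script, which is handy if you want to look at the actual hits rather than just count them. This is a sketch under assumptions: the function names are mine, and it relies on each entry in results carrying a title and a url field, which is what my instance returns.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(base_url: str, query: str) -> str:
    """Build the same /search?q=...&format=json URL the curl test uses."""
    params = urlencode({"q": query, "format": "json"})
    return f"{base_url.rstrip('/')}/search?{params}"


def top_results(payload: dict, limit: int = 5) -> list[tuple[str, str]]:
    """Pick (title, url) pairs from a decoded SearXNG JSON response."""
    results = payload.get("results", [])[:limit]
    return [(r.get("title", ""), r.get("url", "")) for r in results]


def search(base_url: str, query: str, limit: int = 5) -> list[tuple[str, str]]:
    """Query the instance; a disabled JSON format surfaces as an HTTPError (403)."""
    with urlopen(build_search_url(base_url, query), timeout=10) as resp:
        return top_results(json.load(resp), limit)
```

With the container running, search("http://localhost:8888", "test") returns up to five (title, url) pairs. Splitting URL building and result parsing from the network call keeps the two pure parts easy to test without a running instance.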
Hermes Agent: configuring search and extraction separately
Now comes the genuinely elegant part. Hermes lets you use different providers for search and extraction. That’s exactly what I need: SearXNG can only search, not extract. So extraction stays with my self-hosted Firecrawl.
In the ~/.hermes/.env I added the SearXNG URL. The Tavily key and the Firecrawl base URL were already in there:
SEARXNG_URL=http://localhost:8888
FIRECRAWL_API_URL=http://localhost:3002
TAVILY_API_KEY=tvly-...
And in the ~/.hermes/config.yaml I switched over the web section:
web:
  backend: ''
  search_backend: 'searxng'
  extract_backend: 'firecrawl'
That was it. From now on, every web search runs through my local SearXNG, and the extraction of page content through my local Firecrawl. Both on my own server, without an external API call.
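To double-check that the configured endpoints really point at my own machine, I find a tiny script useful. It is only a sketch: the helper names are hypothetical, the parser handles plain KEY=VALUE lines only, and treating every variable that ends in _URL as an endpoint is my assumption, based on the two variables shown above.

```python
from urllib.parse import urlparse

# Hostnames that count as "this machine" for the purpose of this check
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip().strip("'\"")
    return env


def non_local_urls(env: dict[str, str]) -> dict[str, str]:
    """Return the *_URL entries whose host is not the local machine."""
    return {
        key: value
        for key, value in env.items()
        if key.endswith("_URL") and urlparse(value).hostname not in LOCAL_HOSTS
    }
```

Fed with the contents of ~/.hermes/.env, non_local_urls(parse_env(text)) should come back empty for the setup described here. If SearXNG or Firecrawl runs on another box in your network, add that hostname to LOCAL_HOSTS.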
Important to know: Hermes has no automatic fallback between two search backends. But Tavily doesn’t disappear because of that. The API key stays in the .env as a reserve. If I do happen to need Tavily’s AI-optimized search for a particular task, I just switch search_backend over briefly.
The last cloud path: summarizing locally with Ollama
While reading through the docs, I noticed a mechanism that rounds off the cloud-free approach perfectly. Before long pages (forum threads, docs sites, news articles with endless comments) are handed to the agent, Hermes runs them through an auxiliary model that compresses the content. This keeps the context window clean and the costs low.
By default this helper uses the same model as the main chat. And this is exactly where it gets interesting for my setup: the task doesn’t need to go to the cloud at all. I simply let a locally running Ollama model handle it. In the config.yaml it looks like this:
auxiliary:
  web_extract:
    provider: dual-a6000 # my local Ollama provider
    model: qwen3.6:27b
    timeout: 360
I’m using qwen3.6:27b here, because the model already runs as a workhorse on my side and, at 17 GB, fits easily on the A6000. If you want it even snappier, go for a smaller model like mistral-small3.2:24b. For simply summarizing long pages, that’s more than enough. With that, the last remaining external path is closed: search, extraction, and summarization now run entirely on my own hardware.
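If you want to try out how a model copes with a long page before wiring it into Hermes, you can talk to Ollama directly. The sketch below is not how Hermes does it internally. It calls Ollama’s /api/generate endpoint on the default port 11434, which is an assumption about your setup, and the function names, the prompt wording and the max_chars cut-off are purely illustrative.

```python
import json
from urllib.request import Request, urlopen

# Ollama's default address; adjust if your instance listens elsewhere
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_summary_request(page_text: str, model: str = "qwen3.6:27b",
                          max_chars: int = 20000) -> dict:
    """Build a non-streaming generate request for a (truncated) page."""
    return {
        "model": model,
        "prompt": "Summarize the following web page in a few sentences:\n\n"
                  + page_text[:max_chars],
        "stream": False,  # ask for one JSON object instead of a token stream
    }


def summarize(page_text: str, model: str = "qwen3.6:27b", timeout: int = 360) -> str:
    """Send the page to the local Ollama instance and return the summary text."""
    body = json.dumps(build_summary_request(page_text, model)).encode("utf-8")
    req = Request(OLLAMA_URL, data=body, headers={"Content-Type": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["response"]
```

Calling summarize(long_text) with the text of an extracted page gives a feel for speed and quality; swapping the model argument makes it easy to compare a smaller model against the one configured above.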
Conclusion
With manageable effort, I’ve brought my Hermes Agent’s web search from an external service onto my own server. The result is a clean three-layer architecture: SearXNG for search, Firecrawl for extraction, a local Ollama model for summarization. Everything now runs locally, and I had always assumed this would be far more difficult. I keep Tavily only as an optional reserve.
What I like best about it: there’s no more search limit I have to keep in the back of my mind, my research topics don’t leave my own server, and the configuration is so transparent that I always know which request goes where. This is exactly how I picture sovereign AI. I don’t want sovereignty understood as a dogma, but as a practical architecture in which every external building block is a deliberate decision and I have full control.
If you run Hermes in a similar way: the conversion is a matter of fifteen minutes. And the good feeling of having your search back in your own hands lasts a lot longer.






