My Hermes Agent has been able to search the web and extract page content for a while now. Until recently I handled that with Tavily for search and a self-hosted Firecrawl for extraction. The setup worked flawlessly, right up until the agent had burned through Tavily’s free tier of 1,000 search queries per month. That is what really started to bother me: every web search ran through an external API service with a hard monthly cap. For an agent that likes to research autonomously and runs in cron jobs, that’s a number you have to keep in the back of your mind, and one you hit when there are still plenty of days left in the month.

So I moved the search to where most things on my side already run anyway: my own application server. I recently switched to SearXNG. In this post I’ll show you how easy it is to set up, and why the combination ends up bringing together the best of three worlds.

I followed the official guide for my Ubuntu system here.

URL: searxng-free-self-hosted

Why self-host in the first place?

Anyone who knows my posts here on ai-box.eu can already guess the answer: digital sovereignty. My inference setup with the two RTX A6000s runs locally, the models live with me, and the power comes in good part from my own roof. So it feels inconsistent to delegate, of all things, the web search, the part that reveals the most about my interests and research topics, to an external service.

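SearXNG is the ideal solution here. It’s a privacy-friendly, open-source metasearch engine that aggregates results from over 70 search engines. No trackers, no profiling, no API key and, above all: no monthly quota when you host it yourself. The upstream engines can still throttle an instance that queries them too aggressively, but there is no counter of its own to run down.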

SearXNG in five minutes with Docker

The setup is pleasantly straightforward. I created a working directory and wrote a docker-compose.yml.

Command: mkdir -p ~/searxng/searxng

Command: cd ~/searxng/searxng

Command: nano docker-compose.yml

Now paste the following configuration into the docker-compose.yml file.

services:
  searxng:
    image: searxng/searxng:latest
    container_name: searxng
    ports:
      - "8888:8080"
    volumes:
      - ./searxng:/etc/searxng:rw
    environment:
      - SEARXNG_BASE_URL=http://localhost:8888/
    restart: unless-stopped

Save the docker-compose.yml file with Ctrl + X, then confirm with Y and Enter.

A docker compose up -d later, the container was running. So far, so easy.

Command: docker compose up -d

The one pitfall you need to know about: by default SearXNG only serves HTML. But the Hermes Agent needs the JSON format. You have to enable that first. To do so, I copied the generated configuration out of the container:

Command: docker cp searxng:/etc/searxng/settings.yml ~/searxng/searxng/settings.yml

Now open the file:

Command: nano ~/searxng/searxng/settings.yml

In the settings.yml you’ll find the formats block under the search: key (around line 84 for me). Add json there, with the same indentation as the existing html entry. It should then look like the example below.

formats:
  - html
  - json

Save the change with Ctrl + X, then confirm with Y and Enter.

After that, copy the file back into the container and restart it:

Command: docker cp ~/searxng/searxng/settings.yml searxng:/etc/searxng/settings.yml

Command: docker restart searxng

To check that everything works, you run a quick curl test:

curl -s "http://localhost:8888/search?q=test&format=json" | python3 -c \
  "import sys,json; d=json.load(sys.stdin); print(f'{len(d[\"results\"])} results')"

For me, 21 results came back right away. If you see a 403 Forbidden instead, the JSON format isn’t active yet. In that case it can help to double-check the indentation in the settings.yml.
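If you’d rather run the same check from Python, for example inside a cron job, a minimal sketch could look like the following. It uses only the standard library and assumes the port mapping from the docker-compose.yml above. The function names are my own for illustration and have nothing to do with how Hermes talks to SearXNG internally.

```python
"""Query a local SearXNG instance via its JSON format (illustrative sketch)."""
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumption: SearXNG is reachable on the host port mapped in docker-compose.yml.
SEARXNG_URL = "http://localhost:8888"


def build_search_url(base_url: str, query: str) -> str:
    """Return the /search URL that asks SearXNG for JSON instead of HTML."""
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    return f"{base_url.rstrip('/')}/search?{params}"


def top_results(payload: dict, limit: int = 5) -> list:
    """Pull (title, url) pairs out of a decoded SearXNG JSON response."""
    results = payload.get("results", [])
    return [(r.get("title", ""), r.get("url", "")) for r in results[:limit]]


def search(query: str, base_url: str = SEARXNG_URL) -> list:
    """Run one search and return the top hits as (title, url) pairs."""
    url = build_search_url(base_url, query)
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return top_results(json.load(resp))
    except urllib.error.HTTPError as err:
        if err.code == 403:
            # Same symptom as in the curl test: json is not in the formats list.
            raise RuntimeError(
                "403 Forbidden: check the formats block in settings.yml"
            ) from err
        raise
```

Calling search("test") should then return a short list of title and URL pairs, as long as the container is running and the JSON format is enabled.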

Hermes Agent: configuring search and extraction separately

Now comes the genuinely elegant part. Hermes lets you use different providers for search and extraction. That’s exactly what I need: SearXNG can only search, not extract. So extraction stays with my self-hosted Firecrawl.

In the ~/.hermes/.env I added the SearXNG URL. The Tavily key and the Firecrawl base URL were already in there:

SEARXNG_URL=http://localhost:8888
FIRECRAWL_API_URL=http://localhost:3002
TAVILY_API_KEY=tvly-...

And in the ~/.hermes/config.yaml I switched over the web section:

web:
  backend: ''
  search_backend: 'searxng'
  extract_backend: 'firecrawl'

That was it. From now on, every web search runs through my local SearXNG, and the extraction of page content through my local Firecrawl. Both on my own server, without an external API call.
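To make the split tangible, here is a rough sketch of what the extraction half looks like when you talk to Firecrawl directly. This is not Hermes’ own code. It assumes that the self-hosted instance exposes Firecrawl’s v1 scrape endpoint at /v1/scrape, that it accepts requests without an API key, and that the response carries the page as Markdown under data.markdown. If your Firecrawl version differs, adjust the path and fields accordingly.

```python
"""Fetch one page as Markdown from a self-hosted Firecrawl (illustrative sketch)."""
import json
import urllib.request

# Value taken from the .env above.
FIRECRAWL_API_URL = "http://localhost:3002"


def build_scrape_request(page_url: str, base_url: str = FIRECRAWL_API_URL):
    """Build the POST request for the scrape endpoint (assumed: /v1/scrape)."""
    body = json.dumps({"url": page_url, "formats": ["markdown"]}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/scrape",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def markdown_from_response(payload: dict) -> str:
    """Return the Markdown body, or raise if Firecrawl reported a failure."""
    if not payload.get("success"):
        raise RuntimeError(f"scrape failed: {payload.get('error', 'unknown error')}")
    return payload.get("data", {}).get("markdown", "")


def extract(page_url: str) -> str:
    """Scrape a single URL; pages can be slow, hence the generous timeout."""
    with urllib.request.urlopen(build_scrape_request(page_url), timeout=60) as resp:
        return markdown_from_response(json.load(resp))
```

In this picture, SearXNG supplies the URLs and extract() turns one of them into text. Hermes wires those two steps together for you once search_backend and extract_backend are set.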

Important to know: Hermes has no automatic fallback between two search backends. Tavily doesn’t disappear, though: the API key stays in the .env as a reserve. If I do happen to need Tavily’s AI-optimized search for a particular task, I just switch search_backend over temporarily.
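Because the switch is manual, a small pre-check can tell you when it is time to flip it. The sketch below is a hypothetical helper of my own, not a Hermes feature. It probes the local SearXNG and names the backend you would set in config.yaml. The value 'tavily' for search_backend is an assumption here; check the Hermes docs for the exact spelling.

```python
"""Decide which search backend to configure by hand (hypothetical helper)."""
import urllib.error
import urllib.request
from typing import Optional


def searxng_is_up(base_url: str = "http://localhost:8888", timeout: float = 5.0) -> bool:
    """Return True if a JSON search against the local SearXNG answers with 200."""
    url = f"{base_url.rstrip('/')}/search?q=ping&format=json"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP errors such as 403.
        return False


def recommended_search_backend(searxng_ok: bool, tavily_key: Optional[str]) -> str:
    """Prefer the local instance; fall back to Tavily only if a key exists."""
    if searxng_ok:
        return "searxng"
    if tavily_key:
        return "tavily"  # assumed config value, not confirmed by the post
    raise RuntimeError("SearXNG is unreachable and no Tavily key is configured")
```

A cron wrapper could call recommended_search_backend(searxng_is_up(), os.environ.get("TAVILY_API_KEY")) and alert you when the answer is not searxng.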

The last cloud path: summarizing locally with Ollama

While reading through the docs, I noticed a mechanism that rounds off the cloud-free approach perfectly. Before handing long pages to the agent (forum threads, docs sites, news articles with endless comments), Hermes runs them through an auxiliary model that compresses the content. This keeps the context window clean and the costs low.

By default this auxiliary step uses the same model as the main chat. And this is exactly where it gets interesting for my setup: this task doesn’t need to go to the cloud at all. I simply let a locally running Ollama model handle it. In the config.yaml it looks like this:

auxiliary:
  web_extract:
    provider: dual-a6000          # my local Ollama provider
    model: qwen3.6:27b
    timeout: 360

I’m using qwen3.6:27b here, because the model already runs as a workhorse on my side and, at 17 GB, fits easily on the A6000. If you want it even snappier, go for a smaller model like mistral-small3.2:24b. For simply summarizing long pages, that’s more than enough. With that, the last remaining external path is closed: search, extraction, and summarization now run entirely on my own hardware.
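If you want to see what such a summarization call amounts to, you can try one against Ollama by hand. The sketch below is not what Hermes does internally; it only shows the general idea. It assumes Ollama listens on its default port 11434 and uses the /api/generate endpoint with streaming turned off. The prompt wording and the character cap are my own illustrative choices.

```python
"""Summarize a long page with a local Ollama model (illustrative sketch)."""
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default port
MODEL = "qwen3.6:27b"                  # model name from the config above


def build_summary_request(text: str, model: str = MODEL, max_chars: int = 20000) -> dict:
    """Build the JSON body for /api/generate, clipping very long input."""
    clipped = text[:max_chars]
    prompt = (
        "Summarize the following web page in a few concise bullet points:\n\n"
        + clipped
    )
    # stream=False asks Ollama for one JSON object instead of a token stream.
    return {"model": model, "prompt": prompt, "stream": False}


def summarize(text: str, base_url: str = OLLAMA_URL, timeout: int = 360) -> str:
    """Send the page text to Ollama and return the generated summary."""
    body = json.dumps(build_summary_request(text)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # The timeout mirrors the 360 seconds configured for web_extract above.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["response"]
```

Feeding it the text of a long page gives a quick feel for how fast a given model summarizes on your hardware before you commit to it in the config.yaml.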

Conclusion

With manageable effort, I’ve brought my Hermes Agent’s web search from an external service onto my own server. The result is a clean three-layer architecture: SearXNG for search, Firecrawl for extraction, a local Ollama model for summarization. Everything now runs locally, which I had always assumed would be far more difficult. I keep Tavily only as an optional reserve.

What I like best about it: there’s no more search limit I have to keep in the back of my mind, my research topics don’t leave my own server, and the configuration is so transparent that I always know which request goes where. This is exactly how I picture sovereign AI. I don’t see sovereignty as a dogma, but as a practical architecture in which every external building block is a deliberate decision and I stay in full control.

If you run Hermes in a similar way: the switch takes about fifteen minutes. And the good feeling of having your search back in your own hands lasts a lot longer.