Install AI Toolkit on Gigabyte AI TOP ATOM - Step-by-Step Guide - Part 2-2

Phase 8: Compiling and Starting the Web UI

Now I can compile and start the web UI. First, I switch to the UI directory of the AI Toolkit:

Command: cd ui

If you haven’t cloned the AI Toolkit repository yet, you need to do that first. The UI directory should be located in the root directory of the AI Toolkit repository.

Now I build and start the UI in one step:

Command: npm run build_and_start

This command does two things: it first installs all Node.js dependencies and compiles the UI (build), then it starts the server (start). The first run in particular can take a few minutes, because all dependencies have to be downloaded.

During the build process, you will see a lot of output in the terminal. This is normal. If errors occur, read the error messages carefully; they are often caused by missing dependencies or network problems.

After a successful start, you will see a message that looks something like this:

Server running on http://0.0.0.0:8675

The web UI is now accessible! Open http://<IP-Address-AI-TOP-ATOM>:8675 in your browser (replace <IP-Address-AI-TOP-ATOM> with the IP address of your AI TOP ATOM).

[Image: GIGABYTE AI TOP ATOM, AI Toolkit by Ostris]

To find the IP address of your AI TOP ATOM, use:

Command: hostname -I
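hostname -I can print several addresses separated by spaces, for example when Docker or a second network interface is present. As a minimal sketch, the helper below picks the first IPv4 address from such output and builds the UI URL from it. The function names first_ipv4, ui_url, and read_hostname_output are my own choices for illustration; port 8675 is the one from the server message above.

```python
import ipaddress
import subprocess
from typing import Optional


def first_ipv4(hostname_output: str) -> Optional[str]:
    """Return the first non-loopback IPv4 address found in `hostname -I` style output."""
    for token in hostname_output.split():
        try:
            addr = ipaddress.ip_address(token)
        except ValueError:
            continue  # skip anything that is not an IP address
        if addr.version == 4 and not addr.is_loopback:
            return token
    return None


def ui_url(ip: str, port: int = 8675) -> str:
    """Build the address to open in the browser."""
    return f"http://{ip}:{port}"


def read_hostname_output() -> str:
    """Run `hostname -I`; return an empty string if the command is unavailable."""
    try:
        result = subprocess.run(
            ["hostname", "-I"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return ""
    return result.stdout
```

On the AI TOP ATOM itself, ui_url(first_ipv4(read_hostname_output())) should then give the address to type into the browser on another computer, provided an IPv4 address was found.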

If a firewall is active, you must open port 8675:

Command: sudo ufw allow 8675
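To confirm from another computer that the port is actually reachable, a plain TCP connection test is enough. This is a small sketch using only the Python standard library; the function name port_is_open is my own, and the host you pass in is the IP address of your AI TOP ATOM.

```python
import socket


def port_is_open(host: str, port: int = 8675, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A result of False does not tell you why the connection failed: the firewall may be blocking the port, the UI may not be running, or the two computers may be on different networks.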

Important Note: The Conda environment ai-toolkit must be activated when you start the UI. If the UI does not start or errors occur, check if the environment is activated.

Phase 9: Starting the First Training Job

Ostris’s YouTube channel offers a very good introduction to training with the AI Toolkit, with videos on different models and on how to train your own LoRA, for example: AI Toolkit on YouTube

After the web UI has been successfully started, you can configure your first training job. The web interface offers various options for configuration:

Model selection and configuration
Dataset management
Training parameters (Learning Rate, Batch Size, etc.)
Extended configurations via the Advanced Config

Configure your training job in the web UI and start it. You can track progress directly in the UI. You can monitor GPU usage in parallel with nvidia-smi or via the DGX Dashboard.

Tip: For your first test, I recommend using a small dataset and a smaller model to verify functionality before starting larger training jobs.

[Image: GIGABYTE AI TOP ATOM, AI Toolkit by Ostris]

Troubleshooting: Common Problems and Solutions

During my time with the AI Toolkit on the AI TOP ATOM, I encountered some typical problems. Here are the most common ones and how I solved them:

“Command not found” with conda commands: Miniconda is not in the PATH or the bash session was not restarted. Run source ~/.bashrc or open a new terminal window.
Conda environment is not activating: Check if the environment exists with conda env list. If not, recreate it with conda create --name ai-toolkit python=3.11.
PyTorch CUDA support not available: Check with python -c "import torch; print(torch.cuda.is_available())". If False, check the CUDA drivers with nvidia-smi and ensure the correct PyTorch version was installed.
npm command not found: npm was not installed. Install it with sudo apt install npm or ensure that Node.js was correctly installed and is in the PATH.
UI does not start or shows errors: Check if the Conda environment is activated. The UI must be started within the activated environment. Also check the logs in the terminal for detailed error messages.
Training job does not start or crashes: Training normally runs via a Python script. If the UI shows no output, you can start the training directly via the command line. To do this, copy the configuration from the Advanced Config into a YAML file and run: python run.py path/to/train.yaml (with the Conda environment activated).
Access to the UI from the network does not work: Check the firewall settings and ensure that port 8675 is open. Also check if both computers are on the same network.
Node.js version not compatible: Ensure you are using an ARM64 version of Node.js. The x86_64 version does not work on the AI TOP ATOM.
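Several of the checks in this list can be collected in one small preflight script. This is only a sketch of how I would automate them, not part of the AI Toolkit: the function name preflight and the report keys are my own. It relies on conda activate setting the CONDA_DEFAULT_ENV environment variable, and on platform.machine() returning aarch64 on ARM64 Linux systems.

```python
import importlib.util
import os
import platform
import shutil


def preflight() -> dict:
    """Collect the facts that the troubleshooting list asks you to check."""
    report = {
        "conda_on_path": shutil.which("conda") is not None,
        "active_conda_env": os.environ.get("CONDA_DEFAULT_ENV"),  # expected: ai-toolkit
        "npm_on_path": shutil.which("npm") is not None,
        "node_on_path": shutil.which("node") is not None,
        "machine": platform.machine(),  # expected on the AI TOP ATOM: aarch64
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": None,  # stays None if PyTorch cannot be imported
    }
    if report["torch_installed"]:
        try:
            import torch

            report["cuda_available"] = torch.cuda.is_available()
        except Exception:
            pass  # a broken PyTorch install is reported as cuda_available=None
    return report


if __name__ == "__main__":
    for key, value in preflight().items():
        print(f"{key}: {value}")
```

Run it with the same interpreter you use for training, because the PyTorch check only says something about the Python environment the script itself runs in.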

Starting Training Directly via the Command Line

If you have problems with the web UI or want to start the training directly via the command line, you can copy the configuration from the Advanced Config of the web UI and save it in a YAML file. Then start the training with:

Command: python run.py path/to/train.yaml

Important: The Conda environment ai-toolkit must be activated before you run this command. The training then runs directly in the terminal, and you will see all output in real time.

This method is particularly useful for debugging as you can see all error messages directly.
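When debugging, it helps to keep the terminal output in a file as well. The following sketch wraps the command in a small Python helper that echoes every line and writes it to a log at the same time. The helper name run_and_log and the log file name are my own; the run.py call itself is the one shown above.

```python
import subprocess
import sys
from typing import List, Optional


def run_and_log(cmd: List[str], log_path: str, cwd: Optional[str] = None) -> int:
    """Run `cmd`, echo its output live, and also write it to `log_path`.

    Returns the exit code of the process.
    """
    with open(log_path, "w", encoding="utf-8") as log:
        proc = subprocess.Popen(
            cmd,
            cwd=cwd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge error messages into the same stream
            text=True,
        )
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
        return proc.wait()
```

From the root directory of the AI Toolkit repository, with the Conda environment activated, a call could look like run_and_log([sys.executable, "run.py", "path/to/train.yaml"], "train.log"). Using sys.executable ensures that the training runs with the Python interpreter of the active environment.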

Reactivating the Conda Environment After a Restart

After a system restart, you must reactivate the Conda environment before starting the UI:

Command: conda activate ai-toolkit

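If you want the UI to start automatically at system startup, you can create a systemd service. A startup script in ~/.bashrc is an alternative, but it only runs when an interactive shell is opened (for example, when you log in), not at boot. In both cases, make sure the Conda environment is activated before the UI is started.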
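As a rough sketch of the systemd variant, the snippet below renders a unit file as text. Everything specific in it is an assumption on my side: the unit layout, the user name, the repository path, and the use of conda run to start npm inside the ai-toolkit environment. The Node.js path is taken from the install location used in this guide. I have not tested this unit on the AI TOP ATOM, so treat it as a starting point.

```python
UNIT_TEMPLATE = """\
[Unit]
Description=AI Toolkit web UI
After=network-online.target
Wants=network-online.target

[Service]
User={user}
WorkingDirectory={repo_dir}/ui
Environment=PATH={node_bin}:/usr/local/bin:/usr/bin:/bin
ExecStart={conda_bin} run -n ai-toolkit --no-capture-output npm run build_and_start
Restart=on-failure

[Install]
WantedBy=multi-user.target
"""


def render_unit(
    user: str,
    repo_dir: str,
    conda_bin: str,
    node_bin: str = "/opt/node-v24.11.1-linux-arm64/bin",
) -> str:
    """Fill in the unit template. systemd expects an absolute path in ExecStart."""
    if not conda_bin.startswith("/"):
        raise ValueError("conda_bin must be an absolute path")
    return UNIT_TEMPLATE.format(
        user=user, repo_dir=repo_dir.rstrip("/"), conda_bin=conda_bin, node_bin=node_bin
    )
```

A call such as render_unit("youruser", "/home/youruser/ai-toolkit", "/home/youruser/miniconda3/bin/conda") uses placeholder values that you need to adapt. The result could be saved as /etc/systemd/system/ai-toolkit-ui.service (a file name I made up) and enabled with sudo systemctl daemon-reload followed by sudo systemctl enable --now ai-toolkit-ui.service. Note that build_and_start rebuilds the UI on every start, so the service takes a while to come up.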

Rollback: Removing the AI Toolkit Again

If you want to completely remove the AI Toolkit from the AI TOP ATOM, run the following commands on the system:

First, stop the UI (if it’s running) with Ctrl+C in the terminal where it was started.

Remove the Conda environment:

Command: conda deactivate

Command: conda env remove --name ai-toolkit

If you also want to remove Miniconda:

Command: rm -rf ~/miniconda3

Remove the Miniconda entries from the ~/.bashrc file, if present.

If you want to remove Node.js:

Command: sudo rm -rf /opt/node-v24.11.1-linux-arm64

Remove the Node.js entries from the ~/.bashrc file.

If npm was installed via apt:

Command: sudo apt remove npm

Important Note: These commands permanently delete the Conda environment and everything stored in it, including any training data, models, and configurations kept there. Make sure you really want to remove everything before running them, and back up any training checkpoints and models you want to keep first, because deleted files cannot be easily restored.
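Before deleting anything, it can be reassuring to see which of these directories actually exist and how large they are. The sketch below only reads the file system and removes nothing. The function names dir_size_bytes and removal_report are my own; the paths in the usage note are the ones from the commands above.

```python
import os
from pathlib import Path
from typing import Dict, Iterable, Optional


def dir_size_bytes(path) -> int:
    """Sum the sizes of all regular files below `path` (symlinks are not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def removal_report(paths: Iterable[str]) -> Dict[str, Optional[int]]:
    """Map each existing directory to its size in bytes; missing paths map to None."""
    report = {}
    for raw in paths:
        path = Path(raw).expanduser()
        report[str(path)] = dir_size_bytes(path) if path.is_dir() else None
    return report
```

For example, removal_report(["~/miniconda3", "/opt/node-v24.11.1-linux-arm64"]) shows at a glance what the rm -rf commands would delete.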

Summary & Conclusion

The installation of the AI Toolkit on the Gigabyte AI TOP ATOM is surprisingly straightforward thanks to compatibility with the NVIDIA DGX instructions. In about 30-45 minutes, I had the AI Toolkit set up, and I can now train my own models via the user-friendly web interface.

What particularly excites me: The Conda environment allows for a clean isolation from the system Python installation, and the web UI makes configuring and starting training jobs significantly easier than via the command line. The performance of the Blackwell GPU is fully utilized, and I can also start and monitor training jobs from other computers on the network.

I also find it particularly practical that the web UI offers an Advanced Config, which makes it possible to create complex configurations and export them directly as YAML files. This makes it easy to reproduce or share training jobs.

For teams or developers who want to train their own AI models, this is a perfect solution: A central server with full GPU power, on which training jobs can be started and monitored via an intuitive web interface. The Conda environment ensures that everything is cleanly isolated and can be easily removed.

If you have questions or encounter problems, feel free to look at the official NVIDIA DGX Spark documentation or the documentation of the AI Toolkit by Ostris. The community is very helpful, and most problems can be solved quickly.

Next Step: Preparing Your Own Datasets and Training Models

You have now successfully installed the AI Toolkit and the web UI is running. The basic installation works, but that is just the beginning. The next step is to prepare your own datasets and configure training jobs for your specific use cases.

The AI Toolkit supports various dataset formats and training methods. Experiment with different configurations to achieve the best results for your models. The web UI makes it easy to try out different parameters and compare the results.

A very good entry point is Ostris’s YouTube channel, where he covers different models and extends them with his own LoRAs, for example: AI Toolkit on YouTube

Good luck experimenting with the AI Toolkit on your Gigabyte AI TOP ATOM. I am excited to see which models you train with it! Let me and my readers know here in the comments.