BEFORE ANYONE ESCALATES: ALL AI LLM EXPERIMENTS WERE CONDUCTED IN A KVM VM USING SOLAR POWER 😎☀️🙌
after a lot of trouble with airllm
- has the user heard of https://www.nomic.ai/gpt4all ?
- a desktop gui that makes selecting and downloading models easy
- and https://unsloth.ai/docs/new/studio ?
- a beautiful web gui that makes selecting and downloading models easy
OMG finally something that not only looks BEAUTIFUL BUT ALSO ACTUALLY WORKS 😀
AND it can do web search too? 🤔
it works on CPU but (of course) runs better on GPU (depending on available VRAM)
it does not seem to be able to slice a big LLM into chunks that are fed to the VRAM one after another (that was the idea behind airllm)
so any model that does not fit into the GPU’s VRAM will most likely run on CPU + RAM
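A rough way to think about "does it fit": llama.cpp (which Unsloth Studio starts as llama-server) has a -ngl / --n-gpu-layers option that sets how many layers are placed on the GPU. The sketch below is only a back-of-the-envelope estimate, not how Unsloth or llama.cpp decide anything. The function name, the 24 GiB / 48 layer model and the 2 GiB reserve are made-up illustration values; only the 15858 MiB VRAM figure is what llama.cpp reports for the RTX 5060 Ti in this setup. Real layers are not all the same size and the context (KV cache) needs extra room.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def layers_that_fit(model_bytes, n_layers, vram_bytes, reserve_bytes=0):
    """Estimate how many layers fit into VRAM.

    Assumption (hypothetical heuristic): every layer has the same size,
    namely model_bytes / n_layers. reserve_bytes is VRAM kept free for
    context, buffers and the desktop.
    """
    if model_bytes <= 0 or n_layers <= 0:
        raise ValueError("model_bytes and n_layers must be positive")
    per_layer = model_bytes / n_layers
    usable = max(vram_bytes - reserve_bytes, 0)
    return min(n_layers, int(usable // per_layer))


if __name__ == "__main__":
    # made-up model: 24 GiB file, 48 layers; VRAM figure from this machine
    n = layers_that_fit(24 * GIB, 48, 15858 * MIB, reserve_bytes=2 * GIB)
    print(f"roughly {n} of 48 layers would fit into VRAM")
```

If the result is smaller than the layer count, the remaining layers would have to live in CPU + RAM, which is exactly the mixed mode this article is looking for.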
sample output of the install.sh bash script:
su - root; # become root
apt install libcurl4-openssl-dev; # install this dependency
# press Ctrl+D to log off root and become the non-root user again
curl -fsSL https://unsloth.ai/install.sh | sh
🦥 Unsloth Studio Installer
────────────────────────────────────────────────────
platform linux (<- should be gnu linux but ok)
deps all system dependencies found
venv creating Python 3.13 virtual environment
/home/user/.unsloth/studio/unsloth_studio
venv using environment
/home/user/.unsloth/studio/unsloth_studio
installing PyTorch (https://download.pytorch.org/whl/cu128)...
installing unsloth (this may take a few minutes)...
setup running unsloth studio update...
🦥 Unsloth Studio Setup
────────────────────────────────────────────────────
frontend up to date
deps [==========----------] 7/13 flash-attn
deps [====================] 13/13 ROCm torch (final)
deps installed
transformers 5.3.0 pre-installed
transformers 5.5.0 pre-installed
installing prebuilt llama.cpp...
existing install detected -- validating update
llama.cpp prebuilt up to date and validated
installed release: unslothai/llama.cpp@b8955
────────────────────────────────────────────────────
Unsloth Studio Installed
────────────────────────────────────────────────────
launch unsloth studio -H 0.0.0.0 -p 8888
Created Unsloth Studio shortcut
Unsloth Studio installed!
────────────────────────────────────────────────────
launch starting Unsloth Studio...
Starting Unsloth Studio on http://145.224.73.99:8888 <- not sure how it determined this IP address (it is clearly wrong, just use ip -c a)
[OK] Frontend loaded from /home/user/.unsloth/studio/unsloth_studio/lib/python3.13/site-packages/studio/frontend/dist
INFO: Started server process [2559]
INFO: Waiting for application startup.
Hardware detected: CUDA -- NVIDIA GeForce RTX 5060 Ti
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8888 (Press CTRL+C to quit)
{"timestamp": "2026-04-30T11:40:37.094668Z", "level": "info", "event": "Pre-caching helper GGUF: unsloth/gemma-4-E2B-it-GGUF/gemma-4-E2B-it-UD-Q4_K_XL.gguf"}
{"timestamp": "2026-04-30T11:40:37.244890Z", "level": "info", "event": "Helper GGUF cached: 1 file(s)"}
🦥 Unsloth Studio is running
────────────────────────────────────────────────────
On this machine -- open this in your browser:
http://127.0.0.1:8888
(same as http://localhost:8888)
From another device on your network / to share:
http://145.224.73.99:8888
API & health:
http://127.0.0.1:8888/api
http://127.0.0.1:8888/api/health
────────────────────────────────────────────────────
Tip: if you are on this computer, open http://localhost:8888/ in your browser.
{"timestamp": "2026-04-30T11:40:43.721650Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/chat", "status_code": 200, "process_time_ms": 75.09}
{"timestamp": "2026-04-30T11:40:46.072342Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/", "status_code": 200, "process_time_ms": 1.49}
{"timestamp": "2026-04-30T11:40:48.976195Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/health", "status_code": 200, "process_time_ms": 0.49}
{"timestamp": "2026-04-30T11:40:48.983286Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/auth/status", "status_code": 200, "process_time_ms": 1.28}
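Since the "From another device on your network" address in the output was not usable here, this is a small sketch of one common way to ask the OS which local IPv4 address it would use for outgoing traffic (an alternative to reading ip -c a by eye). It is not how Unsloth Studio detects its address; the function name and the probe address 192.0.2.1 (a documentation-only address) are choices made for this example. Inside a NATed VM it returns the VM's own address, not the host's.

```python
import socket


def lan_ipv4(probe_host="192.0.2.1", probe_port=80):
    """Return the local IPv4 address the kernel would pick for outgoing traffic.

    connect() on a UDP socket sends no packet; it only makes the kernel
    choose a route and a source address. Falls back to 127.0.0.1 when
    there is no usable route at all.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            s.connect((probe_host, probe_port))
            return s.getsockname()[0]
        except OSError:
            return "127.0.0.1"


if __name__ == "__main__":
    print(f"try http://{lan_ipv4()}:8888 from another device on the LAN")
```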
it is so refreshing when beautiful software “just works” and does not end in (almost) endless Python errors X-D
start stop:
how to stop it: Ctrl+C
how to (re)start it:
# activate the virtual environment the installer created, then start the server
source /home/user/.unsloth/studio/unsloth_studio/bin/activate; unsloth studio -H 0.0.0.0 -p 8888;
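After a (re)start it is handy to check that the server really came up. The install output lists http://127.0.0.1:8888/api/health and the request log shows it answering with status 200, so a minimal sketch can simply poll that URL. The function name and the timeout are assumptions of this example; nothing is assumed about the response body, only the status code is checked.

```python
import http.client
import urllib.error
import urllib.request


def is_healthy(url="http://127.0.0.1:8888/api/health", timeout=2.0):
    """Return True if the URL answers with HTTP 200, False otherwise."""
    # empty ProxyHandler: talk to the local server directly, ignore proxy env vars
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # connection refused, timeout, non-2xx status, broken reply ...
        return False


if __name__ == "__main__":
    print("Unsloth Studio is up" if is_healthy() else "Unsloth Studio is NOT reachable")
```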
nerdingoff:
- just btw. OF COURSE all CPUs and GPUs used in these tests are running on 100% solar power
- the trouble with running LLM AI locally, and especially airllm: it uses an insane amount of resources: CPU, GPU and also DISK SPACE
- and OF COURSE this dev admin user is building a MASSIVE 1 PETABYTE storage server (not there yet, but past the theory stage: right now 5x 2TB disks in a RAID10 are working nicely and have been mounted via sshfs)
- network transfer speed is okayish 😀 (sometimes 100 MByte/sec can be observed… the limit is most likely the 1 GBit switches)
- of course better speeds could be achieved with NFS over a direct pcA <-LAN-> pcB connection (for security reasons this is the only way NFS and SMB should be used?)
- airllm managed to consume approximately 1TB and is still not working correctly X-D
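To put the network numbers from the list above into perspective, here is a small arithmetic sketch. It only uses figures from the text (1 GBit link, about 100 MByte/sec observed, about 1 TB of data); the function names are made up, decimal units are assumed (1 TB = 10^12 bytes) and protocol overhead is ignored.

```python
def link_ceiling_mb_per_s(link_bits_per_s=1_000_000_000):
    """Theoretical ceiling of a link in MByte/sec (8 bits per byte, no overhead)."""
    return link_bits_per_s / 8 / 1_000_000


def transfer_hours(size_bytes, rate_bytes_per_s):
    """Hours needed to move size_bytes at a constant rate."""
    return size_bytes / rate_bytes_per_s / 3600


if __name__ == "__main__":
    print(f"1 GBit ceiling: {link_ceiling_mb_per_s():.0f} MByte/sec")
    print(f"1 TB at 100 MByte/sec: {transfer_hours(10**12, 100 * 10**6):.1f} hours")
```

So 100 MByte/sec is already close to the 125 MByte/sec that a 1 GBit link can carry at best, which supports the guess that the switches are the limit, and copying 1 TB of model files at that speed takes a bit under 3 hours.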
problems:
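unsloth does not seem to have any capability to combine CPU + RAM + GPU + VRAM
first impression: if the model does not fit into VRAM, it just quits.
BUT look closely at the log: llama-server exits with code 1 right at startup because the argument --spec-ngram-size-n “has been removed”, so this particular failure is about a command-line flag that the installed llama.cpp no longer accepts and not about VRAM (the “you have enough memory” hint in the last error line is only a generic message)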
{"timestamp": "2026-05-01T11:01:56.744300Z", "level": "info", "event": "Starting llama-server: /home/user/.unsloth/llama.cpp/llama-server -m /home/user/.cache/huggingface/hub/models--unsloth--gpt-oss-20b-GGUF/snapshots/d449b42d93e1c2c7bda5312f5c25c8fb91dfa9b4/gpt-oss-20b-UD-Q4_K_XL.gguf --port 51347 -c 131072 --parallel 1 --flash-attn on --no-context-shift -ngl -1 --jinja --spec-type ngram-mod --spec-ngram-size-n 24 --draft-min 48 --draft-max 64 --chat-template-kwargs {\"reasoning_effort\": \"high\"}"}
{"timestamp": "2026-05-01T11:01:57.253576Z", "level": "error", "event": "llama-server exited with code 1. Output: ggml_cuda_init: found 1 CUDA devices (Total VRAM: 15858 MiB):\n Device 0: NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes, VRAM: 15858 MiB\nload_backend: loaded CUDA backend from /home/user/.unsloth/llama.cpp/build/bin/libggml-cuda.so\nload_backend: loaded CPU backend from /home/user/.unsloth/llama.cpp/build/bin/libggml-cpu-haswell.so\nerror while handling argument \"--spec-ngram-size-n\": the argument has been removed. use the respective --spec-ngram-*-size-n\nusage:\n--spec-ngram-size-n N the argument has been removed. use the respective\n --spec-ngram-*-size-n or --spec-ngram-mod-n-match\nto show complete usage, run with -h"}
{"timestamp": "2026-05-01T11:01:57.254273Z", "level": "error", "event": "Error loading model: llama-server failed to start. Check that the GGUF file is valid and you have enough memory.", "exc_info": true}
{"timestamp": "2026-05-01T11:01:57.254524Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/load", "status_code": 500, "process_time_ms": 1679379.66}


