BEFORE ANYONE ESCALATES: ALL AI LLM EXPERIMENTS WERE CONDUCTED IN A KVM VM USING SOLAR POWER 😎☀️🙌

after a lot of trouble with airllm

OMG finally something that not only looks BEAUTIFUL BUT ALSO ACTUALLY WORKS 😀

AND it can do web search too? 🤔

it works on CPU, but (of course) it runs even better on GPU (depending on available VRAM)

it does not seem to be able to slice big LLMs into chunks that are fed into the VRAM one after another (that was the idea behind airllm)

so any model that does not fit into the GPU’s VRAM will most likely be run on CPU + RAM
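whether a model will fit can be estimated roughly before loading it: compare the GGUF file size with the free VRAM that nvidia-smi reports (the RTX 5060 Ti in this VM reports 15858 MiB total, see the log further down). this is a minimal sketch, assuming an NVIDIA GPU with nvidia-smi on the PATH and GNU stat. the function names and the 1024 MiB safety margin for context/KV cache are made up for illustration; real memory use grows with the context size.

```shell
#!/bin/sh
# Rough sketch: will a GGUF model probably fit into free VRAM?
# Assumption: weights + a fixed safety margin (hypothetical default
# 1024 MiB) for context/KV cache; real usage grows with the context size.

# fits_in_vram MODEL_MIB FREE_MIB [MARGIN_MIB] -> exit status 0 if it fits
fits_in_vram() {
    model_mib=$1
    free_mib=$2
    margin_mib=${3:-1024}
    [ $((model_mib + margin_mib)) -le "$free_mib" ]
}

# free_vram_mib: free memory of GPU 0 in MiB (needs NVIDIA's nvidia-smi)
free_vram_mib() {
    nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits -i 0
}

# file_mib FILE: file size in MiB, rounded up (GNU stat)
file_mib() {
    bytes=$(stat -c %s "$1")
    echo $(( (bytes + 1048575) / 1048576 ))
}
```

usage (the path is a placeholder): `fits_in_vram "$(file_mib path/to/model.gguf)" "$(free_vram_mib)" && echo "probably fits into VRAM"`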

install commands and sample output of the install.sh script:

su - root; # become root
apt install libcurl4-openssl-dev; # install this dependency
exit; # log off root (same as Ctrl+D), continue as non-root user
curl -fsSL https://unsloth.ai/install.sh | sh

  🦥 Unsloth Studio Installer
  ────────────────────────────────────────────────────

  platform       linux (<- should be gnu linux but ok)
  deps           all system dependencies found
  venv           creating Python 3.13 virtual environment
                 /home/user/.unsloth/studio/unsloth_studio
  venv           using environment
                 /home/user/.unsloth/studio/unsloth_studio
                 installing PyTorch (https://download.pytorch.org/whl/cu128)...
                 installing unsloth (this may take a few minutes)...
  setup          running unsloth studio update...

  🦥 Unsloth Studio Setup
  ────────────────────────────────────────────────────
  frontend       up to date
  deps           [====================] 13/13  ROCm torch (final)  
  deps           installed
  transformers   5.3.0 pre-installed
  transformers   5.5.0 pre-installed
                 installing prebuilt llama.cpp...
                 existing install detected -- validating update
  llama.cpp      prebuilt up to date and validated
                 installed release: unslothai/llama.cpp@b8955
  ────────────────────────────────────────────────────
  Unsloth Studio Installed
  ────────────────────────────────────────────────────
  launch         unsloth studio -H 0.0.0.0 -p 8888

                 Created Unsloth Studio shortcut

  Unsloth Studio installed!
  ────────────────────────────────────────────────────

  launch         starting Unsloth Studio...
Starting Unsloth Studio on http://145.224.73.99:8888 <- not sure how it determined this IP address (it is clearly wrong; just use ip -c a to look up the right one)


[OK] Frontend loaded from /home/user/.unsloth/studio/unsloth_studio/lib/python3.13/site-packages/studio/frontend/dist
INFO:     Started server process [2559]
INFO:     Waiting for application startup.
Hardware detected: CUDA -- NVIDIA GeForce RTX 5060 Ti
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8888 (Press CTRL+C to quit)
{"timestamp": "2026-04-30T11:40:37.094668Z", "level": "info", "event": "Pre-caching helper GGUF: unsloth/gemma-4-E2B-it-GGUF/gemma-4-E2B-it-UD-Q4_K_XL.gguf"}
{"timestamp": "2026-04-30T11:40:37.244890Z", "level": "info", "event": "Helper GGUF cached: 1 file(s)"}

🦥 Unsloth Studio is running
────────────────────────────────────────────────────
  On this machine -- open this in your browser:
    http://127.0.0.1:8888
    (same as http://localhost:8888)

  From another device on your network / to share:
    http://145.224.73.99:8888

  API & health:
    http://127.0.0.1:8888/api
    http://127.0.0.1:8888/api/health
────────────────────────────────────────────────────
  Tip: if you are on this computer, open http://localhost:8888/ in your browser.

{"timestamp": "2026-04-30T11:40:43.721650Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/chat", "status_code": 200, "process_time_ms": 75.09}
{"timestamp": "2026-04-30T11:40:46.072342Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/", "status_code": 200, "process_time_ms": 1.49}
{"timestamp": "2026-04-30T11:40:48.976195Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/health", "status_code": 200, "process_time_ms": 0.49}
{"timestamp": "2026-04-30T11:40:48.983286Z", "level": "info", "event": "request_completed", "method": "GET", "path": "/api/auth/status", "status_code": 200, "process_time_ms": 1.28}

it is so refreshing when beautiful software “just works” and does not end in (almost) endless Python errors X-D
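about the wrong 145.224.73.99 address in the startup output: to open the Studio from another device, use the LAN address that ip -c a shows for this machine (the Studio listens on all interfaces because of -H 0.0.0.0). a minimal sketch that filters the ip output down to global IPv4 addresses and prints ready-to-use URLs, assuming iproute2's ip command; the function names are hypothetical and the address in the test data is a made-up example.

```shell
#!/bin/sh
# Sketch: list this machine's IPv4 LAN addresses (what "ip -c a" shows,
# minus loopback) and print the matching Unsloth Studio URLs.

# parse_ipv4: read "ip -4 -o addr" lines on stdin, print "IFACE ADDRESS"
parse_ipv4() {
    awk '$3 == "inet" { split($4, a, "/"); print $2, a[1] }'
}

# studio_urls [PORT]: one URL per global-scope IPv4 address
studio_urls() {
    port=${1:-8888}
    ip -4 -o addr show scope global | parse_ipv4 | while read -r iface addr; do
        echo "$iface: http://$addr:$port"
    done
}
```

usage: `studio_urls 8888`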

start/stop:

how to stop it: Ctrl+C

how to (re)start it (activate the Python venv the installer created, then launch):

source /home/user/.unsloth/studio/unsloth_studio/bin/activate; unsloth studio -H 0.0.0.0 -p 8888;
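the startup banner lists a health endpoint (http://127.0.0.1:8888/api/health) and the request log shows it answering with status 200. for scripts that need to wait until a (re)started Studio is ready, a minimal sketch that polls that URL, assuming curl is installed; only the HTTP status is checked, because the format of the response body is not shown above. the function name and the default retry values are hypothetical.

```shell
#!/bin/sh
# Sketch: poll the health URL printed at startup until it answers.
# wait_for_health URL [TRIES] [DELAY_SECONDS] -> exit status 0 once it is up
wait_for_health() {
    url=$1
    tries=${2:-30}
    delay=${3:-2}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -f: treat HTTP errors (4xx/5xx) as failure; only the status matters
        if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
            echo "up: $url"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "no answer from $url after $tries tries" >&2
    return 1
}
```

usage, e.g. from a second terminal: `wait_for_health http://127.0.0.1:8888/api/health`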

nerdingoff:

  • just btw. OF COURSE all CPUs and GPUs used in these tests are running on 100% solar power
  • the trouble with running LLM AI locally, and especially airllm: an insane amount of resources used (CPU, GPU and also DISK SPACE)
  • and OF COURSE this dev admin user is building a MASSIVE 1 PETABYTE storage server (not there yet, but past the theory stage: right now 5x 2TB disks in a RAID10 are working nicely and have been mounted via sshfs)
  • airllm managed to consume approximately 1TB and still does not work correctly X-D
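to keep an eye on how much disk space the downloaded models eat, a minimal sketch using du. the two default directories are the ones that show up in the logs of this post (the Hugging Face hub cache and ~/.unsloth, both under $HOME); any other model directory (airllm, an sshfs mount, ...) can be passed as an argument. the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: how much disk space do the local model caches use?
# Defaults are the cache paths seen in the logs above; pass other
# directories as arguments to check them instead.
cache_usage() {
    [ "$#" -gt 0 ] || set -- "$HOME/.cache/huggingface/hub" "$HOME/.unsloth"
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            du -sh "$dir"      # total size, human readable
        else
            echo "not found: $dir" >&2
        fi
    done
}
```

usage: `cache_usage` (defaults) or `cache_usage /path/to/other/models`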

problems:

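unsloth does not seem to have any capability to use CPU + RAM and GPU + VRAM at the same time (splitting one model between them)

if the model does not fit into VRAM, it just quits. (note: the final error message below blames an invalid GGUF file or a lack of memory, but the llama-server output right before it actually says that the argument "--spec-ngram-size-n" has been removed, so at least this particular crash looks like unsloth passing an outdated argument to its bundled llama-server)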

{"timestamp": "2026-05-01T11:01:56.744300Z", "level": "info", "event": "Starting llama-server: /home/user/.unsloth/llama.cpp/llama-server -m /home/user/.cache/huggingface/hub/models--unsloth--gpt-oss-20b-GGUF/snapshots/d449b42d93e1c2c7bda5312f5c25c8fb91dfa9b4/gpt-oss-20b-UD-Q4_K_XL.gguf --port 51347 -c 131072 --parallel 1 --flash-attn on --no-context-shift -ngl -1 --jinja --spec-type ngram-mod --spec-ngram-size-n 24 --draft-min 48 --draft-max 64 --chat-template-kwargs {\"reasoning_effort\": \"high\"}"}
{"timestamp": "2026-05-01T11:01:57.253576Z", "level": "error", "event": "llama-server exited with code 1. Output: ggml_cuda_init: found 1 CUDA devices (Total VRAM: 15858 MiB):\n Device 0: NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes, VRAM: 15858 MiB\nload_backend: loaded CUDA backend from /home/user/.unsloth/llama.cpp/build/bin/libggml-cuda.so\nload_backend: loaded CPU backend from /home/user/.unsloth/llama.cpp/build/bin/libggml-cpu-haswell.so\nerror while handling argument \"--spec-ngram-size-n\": the argument has been removed. use the respective --spec-ngram-*-size-n\nusage:\n--spec-ngram-size-n N the argument has been removed. use the respective\n --spec-ngram-*-size-n or --spec-ngram-mod-n-match\nto show complete usage, run with -h"}
{"timestamp": "2026-05-01T11:01:57.254273Z", "level": "error", "event": "Error loading model: llama-server failed to start. Check that the GGUF file is valid and you have enough memory.", "exc_info": true}
{"timestamp": "2026-05-01T11:01:57.254524Z", "level": "info", "event": "request_completed", "method": "POST", "path": "/api/inference/load", "status_code": 500, "process_time_ms": 1679379.66}
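the generic "Check that the GGUF file is valid and you have enough memory" message hides the real reason, which is buried inside the escaped JSON "Output" of the previous log line. a minimal sketch that pulls the exit code and the "error while handling argument" line out of such JSON log lines, assuming the Studio output was saved to a file (e.g. via `unsloth studio -H 0.0.0.0 -p 8888 2>&1 | tee studio.log`, the file name is just an example); the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: extract the actual llama-server complaint from unsloth's JSON
# log lines on stdin (unescape \" first, then cut at the next \n escape).
llama_server_errors() {
    sed 's/\\"/"/g' |
        grep -o -e 'llama-server exited with code [0-9]*' \
                -e 'error while handling argument [^\\]*'
}
```

usage: `llama_server_errors < studio.log`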
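llama.cpp itself (which unsloth launches as llama-server, see the log above) can split a model between VRAM and RAM: -ngl (--n-gpu-layers) sets how many layers are stored on the GPU, the rest stays on CPU + RAM, and a smaller -c (context size) shrinks the KV cache. a minimal sketch for starting the bundled binary by hand, outside the Studio, assuming the binary path from the log; -m, -ngl, -c, --port and --jinja are flags unsloth itself passes above, while the --spec-* flags are left out on purpose because the server rejected one of them. the function name, the default port and the numbers in the usage line are placeholders to tune for your VRAM.

```shell
#!/bin/sh
# Sketch: run unsloth's bundled llama-server by hand with partial GPU
# offload. LLAMA_SERVER can be overridden; the default is the path from
# the log above (hypothetical for other setups).
LLAMA_SERVER=${LLAMA_SERVER:-/home/user/.unsloth/llama.cpp/llama-server}

# run_llama MODEL_GGUF GPU_LAYERS CONTEXT [PORT]
run_llama() {
    "$LLAMA_SERVER" -m "$1" -ngl "$2" -c "$3" --port "${4:-8080}" --jinja
}
```

usage (placeholder path and numbers): `run_llama path/to/gpt-oss-20b-UD-Q4_K_XL.gguf 20 8192`, then lower or raise 20 until the model loads without running out of VRAM.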

 

liked this article?

  • only together we can create a truly free world
  • plz support dwaves to keep it up & running!
  • (yes the info on the internet is (mostly) free but beer is still not free (still have to work on that))
  • really really hate advertisement
  • contribute: whenever a solution was found, blog about it for others to find!
  • talk about, recommend & link to this blog and articles
  • thanks to all who contribute!
admin