KoboldAI does not support GGML/GGUF models. For GGML/GGUF support, see KoboldCPP.
KoboldAI is a text-generation backend that serves as a gateway for writing with AI models.
Make sure you don't have any spaces in your installation path.
Download KoboldAI from the link below and run the Windows installer.
Once KoboldAI finishes installing, run the shortcut that has been placed on your Desktop or in your Start Menu to launch KoboldAI.
Make sure you have git installed on your system.
Do not install KoboldAI using administrative permissions.
git clone https://github.com/henk717/KoboldAI && cd KoboldAI
./install_requirements.sh cuda   # For NVIDIA
./install_requirements.sh rocm   # For AMD
./play.sh                    # Running via Localhost
./play.sh --host <ip addr>   # Running via Local Network
./play.sh --remote           # Running via Cloudflare Links (outside home)
Replace <ip addr> with the IP you want to whitelist so your KoboldAI instance stays secure.
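For example, if the machine you want to allow were at 192.168.1.50 (an illustrative LAN address, not one from this guide), the Local Network invocation would look like:

```shell
# Hypothetical example: serve KoboldAI on the local network while
# whitelisting the address 192.168.1.50 (substitute your own IP).
./play.sh --host 192.168.1.50
```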
To use KoboldAI as a backend for frontend systems like SillyTavern:
If you have not yet downloaded a model, you may download one via the several options below, starting with Adventure Models.
The general idea here is that you should allocate all the layers you can on your GPU (GPU Layers) before resorting to CPU Layers. This will take some trial and error, but once you stop hitting CUDA_OUT_OF_MEMORY errors, you should be good to go.
Do not put any layers on disk.
Go back to SillyTavern, go to API Connections, select KoboldAI (not KoboldAI Horde), set the link to your KoboldAI location, and click Connect.
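If the connection fails, you can sanity-check the backend from a terminal. This sketch assumes KoboldAI's default local address (http://127.0.0.1:5000); adjust the host and port if you launched with --host or --remote.

```shell
# Ask the running KoboldAI instance which model is loaded.
# A JSON response such as {"result": "<model name>"} means the API
# is reachable and SillyTavern should be able to connect to it.
curl -s http://127.0.0.1:5000/api/v1/model
```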
KoboldAI supports both automatic downloads and manual downloads. See either section for your use case.
At the time of writing, KoboldAI's automatic downloader only supports 16-bit HF (HuggingFace) models which it will automatically convert to 4-bit as it loads. If you wish to download a GPTQ (4-bit) model see Manual Download.
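As a sketch of what a manual download can look like, assuming you have git-lfs available and want a 16-bit HF model (the model name below is illustrative, not a recommendation from this guide):

```shell
# Hypothetical manual download: clone a HuggingFace model repository
# into KoboldAI's models folder (run from your KoboldAI directory).
git lfs install
git clone https://huggingface.co/KoboldAI/OPT-6.7B-Nerys-v2 models/OPT-6.7B-Nerys-v2
```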
The same suggestions explained for the API apply here as well.