To find out which LLMs you can run on your system, refer to the recommendations below.
At the time of writing, this list only covers systems running NVIDIA GPUs and AMD GPUs with ROCm support. If you are using an Intel GPU, Apple Silicon, or an AMD GPU without ROCm support, our recommendation is to either use KoboldCPP or use cloud hosting.
For AMD Users with ROCm support
If you have an AMD GPU that supports ROCm, you are limited to running YellowRoseCx's fork of KoboldCPP or using cloud hosting.
Open Task Manager, go to the Performance tab, select your GPU, and check Dedicated GPU memory.
Alternatively, type dxdiag into the Run dialog (Win+R) and hit Enter; the Display tab will show your GPU and its memory.
Note: this part of the guide covers NVIDIA GPUs only.
Install nvtop onto your system and run it (using the nvtop command). You should see how much VRAM you have in the top-right of the nvtop session.
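If you would rather check programmatically, here is a minimal sketch using PyTorch (an assumption: a PyTorch build with CUDA or ROCm support is installed, and device index 0 is your GPU):

```python
# Minimal sketch: query total and free VRAM with PyTorch.
# Assumes a PyTorch build with CUDA (or ROCm) support and at least one GPU.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)      # device 0; adjust for multi-GPU systems
    free_bytes, total_bytes = torch.cuda.mem_get_info(0)
    print(f"{props.name}: {total_bytes / 1024**3:.1f} GB total, "
          f"{free_bytes / 1024**3:.1f} GB free")
else:
    print("No CUDA/ROCm-capable GPU detected by PyTorch.")
```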
The bigger, the better (usually). Use this calculator to figure out which models you can run; it's not perfectly accurate, but it gives you a good idea.
When picking which quant to use, you should generally aim to run as large a quant as your hardware allows, but Q5_K_M / 6.0bpw is the sweet spot where you get both speed and quality. Q4_K_M / 5.0bpw is a good alternative if Q5_K_M / 6.0bpw is too slow. Going below IQ4KS / 4.0bpw is not really recommended because the quality loss grows rapidly the smaller the quant gets. Bigger models (>30B) are more forgiving when it comes to small quants, allowing you to get good outputs even at Q3_K_M / 3.75bpw.
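As a rough sanity check alongside the calculator, model weights take approximately (parameter count × bits per weight ÷ 8) bytes of VRAM, plus overhead for the KV cache and context. The sketch below is only an approximation; the 1.2 overhead factor is an assumption, and real usage varies with context length and backend.

```python
# Rough rule-of-thumb estimate of the VRAM needed to fully load a quantized model.
# The 1.2 overhead factor (KV cache, context, backend buffers) is an assumption;
# real usage depends on context length, backend, and model architecture.
def estimate_vram_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    weight_gb = params_billion * 1e9 * bits_per_weight / 8 / 1024**3
    return weight_gb * overhead

# Example: a 13B model at ~5.0 bits per weight (roughly Q4_K_M)
print(f"{estimate_vram_gb(13, 5.0):.1f} GB")   # ~9.1 GB
```

If the estimate lands above your dedicated VRAM, expect to drop to a smaller quant, a smaller model, or partial offloading to system RAM.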
Running models on 6 GB of VRAM or less is rough: you'll be limited to ≤8B models at very low quants, which will heavily impact quality. You have a few choices:
This depends on how much RAM your system has; check Task Manager to find out.
If your system has 8GB of VRAM, it becomes possible to run models locally with reasonable comfort. It's recommended to:
12GB of VRAM allows you to run medium models very comfortably and is usually the sweet spot for how much VRAM you want. Same as before, it's recommended to: