To find out whether your system can run an LLM, refer to the recommendations below.
At the time of writing, this list only covers systems running NVIDIA GPUs or AMD GPUs with ROCm support. If you are using an Intel GPU, Apple Silicon, or an AMD GPU that does not support ROCm, our recommendation is to either use KoboldCPP or use cloud hosting.
For AMD Users with ROCm support
If you have an AMD GPU that supports ROCm, you are limited to running YellowRoseCx's fork of KoboldCPP or using cloud hosting.
If you are on Windows 10/11, you can open Task Manager, go to Performance, select your NVIDIA GPU, and see how much VRAM you have in total under Dedicated GPU memory.
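If you would rather check from a script than from Task Manager, here is a minimal, purely optional Python sketch; it assumes a CUDA-enabled build of PyTorch is already installed.

```python
# Optional: query total VRAM from Python instead of Task Manager.
# Assumes a CUDA-enabled build of PyTorch is installed.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    total_gb = props.total_memory / (1024 ** 3)  # bytes -> GiB
    print(f"{props.name}: {total_gb:.1f} GB of dedicated GPU memory")
else:
    print("No CUDA-capable GPU detected.")
```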
Alternatively, to find how much VRAM you have, press Win+R on your keyboard, type dxdiag, and hit Enter. You should see something similar to the screen below.
Click on Display 1 (or any display, really) and you should see something similar to the screen below.
The part to pay attention to is Display Memory (VRAM). If you put this MB total into Google and convert it to GB, you should get your total VRAM size (which is 11 GB in my case).
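If you prefer not to use Google for the conversion, the math is just dividing by 1024. The 11264 MB figure below is only an illustration; substitute whatever dxdiag reports.

```python
# Convert the Display Memory value dxdiag reports (in MB) to GB.
# 11264 MB is an example figure; use the number your system shows.
display_memory_mb = 11264
print(display_memory_mb / 1024)  # -> 11.0 (GB)
```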
With this information, you can now go ahead and see our recommendations about what to use given your total amount of VRAM.
If your system uses an integrated GPU (typically Intel or AMD) or has a GPU with less than 6 GB of VRAM, you can:
This depends on how much RAM your system has; check Task Manager to find out.
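If you want to check your RAM without opening Task Manager, here is a small optional sketch; it assumes the third-party psutil package is installed (pip install psutil).

```python
# Optional: report total system RAM without opening Task Manager.
# Assumes psutil is installed: pip install psutil
import psutil

total_ram_gb = psutil.virtual_memory().total / (1024 ** 3)  # bytes -> GiB
print(f"Total RAM: {total_ram_gb:.1f} GB")
```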
If your system has more than 6 GB but less than 10 GB of VRAM, you can either use the steps mentioned previously or you can:
Run a model using exllama_HF (or a lower parameter model).
If your system has more than 10 GB but less than 16 GB of VRAM, you can either use the steps mentioned previously or you can:
Run a model using exllama_HF (or a lower parameter model).
If your system has 16 GB of VRAM or more, you can either use the steps mentioned previously or use the following options:
All minimums and formulas are approximate. Minimum VRAM Required accounts for the maximum context size the model supports: it is calculated from the size of the model on your hard drive and a context multiplier derived from the max context size divided by 1024.
Lowering the context size can reduce VRAM requirements, but it also reduces how much of the conversation the AI can remember.
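To make that estimate concrete, here is an illustrative Python sketch. The overhead factor of 0.2 per 1024 tokens of max context is an assumed placeholder for demonstration, not the exact multiplier used to build the tables below.

```python
# Illustrative only: estimate minimum VRAM as the model's on-disk size
# times a context-dependent multiplier.
# The 0.2 overhead per 1024 tokens of max context is an assumed placeholder,
# not the exact value used to produce the tables in this guide.
def estimate_min_vram_gb(model_size_gb: float, max_context: int) -> float:
    context_multiplier = 1 + 0.2 * (max_context / 1024)
    return model_size_gb * context_multiplier

# Example: a model that is ~7.3 GB on disk with a 2048-token max context.
print(round(estimate_min_vram_gb(7.3, 2048), 1))  # -> 10.2
```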
Parameter Size | Min. VRAM Required |
---|---|
6B | ~17 GB |
7B | ~18 GB |
13B | ~26 GB |
Parameter Size | Min. VRAM Required |
---|---|
6B | ~9 GB |
7B | ~10 GB |
13B | ~20 GB |
Parameter Size | Min. VRAM Required |
---|---|
6B | ~5 GB |
7B | ~6 GB |
13B | ~10 GB |