Please keep in mind that Vast.ai is a paid service. Before you dive into the guide, add some funds to your account; even just $5 will do.
For this section, we will be using Pygmalion 13B (4-bit) and an RTX 3060. The GPU you rent should match the model requirements listed here.
Create an account on Vast.ai, then use the template linked here!
On the left, use the slider to allocate at least 50GB of disk space.
1. From the Any GPU dropdown menu, go ahead and pick RTX 3060.
2. Now search for a host with 1x RTX 3060 with 12GB of VRAM.
3. Click on Rent.
The price listed below for each GPU will fluctuate over time. Do not use the price in the image below as an expected cost. Rent the cheapest available host, or pick another GPU option.
On the left panel, click on Instances.
Now wait until the Open button appears, then click on it.
This may take anywhere from 10 to 30 minutes, depending on the internet connection of the provider you have selected.
Once you do that, the TextGen WebUI interface will pop up in your browser. Look up at the upper left corner and click on the Model tab.
The panel to download the model of your choice is on the right. With 12GB of VRAM, you can load any 13B model with 4-bit quantization, or a smaller one. The choice is up to you.
Example:
notstoic/pygmalion-13b-4bit-128g
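To see why a 4-bit 13B model fits on a 12GB card, here is a rough back-of-the-envelope sketch. It only counts the weights. The `overhead_gb` allowance for context and runtime buffers is an assumed figure for illustration, not a measured one, and both function names are hypothetical.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory needed for the weights alone, in GB (10^9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits_in_vram(params_billions: float, bits_per_weight: int,
                 vram_gb: float, overhead_gb: float = 2.0) -> bool:
    """Rough check: weights plus an ASSUMED overhead must fit in VRAM."""
    needed = weight_memory_gb(params_billions, bits_per_weight) + overhead_gb
    return needed <= vram_gb


print(weight_memory_gb(13, 4))    # 6.5  (13B parameters at 4 bits each)
print(weight_memory_gb(13, 16))   # 26.0 (the same model at 16 bits)
print(fits_in_vram(13, 4, 12))    # True
print(fits_in_vram(13, 16, 12))   # False
```

Real usage will be somewhat higher than the weight-only figure, so treat this as a sanity check rather than a guarantee.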
Once the model is downloaded, look at the top right panel.
If you are not using a LLaMA model, use Transformers instead.
Go back to Instances on Vast and copy the IP address.
For this example, this would be 65.130.193.108
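If you want to double-check that what you copied is a clean IP address (no stray spaces or line breaks), a small optional sketch using Python's standard `ipaddress` module could look like this. The helper name `clean_ip` is made up for illustration, and it does not handle a port number if one was copied along with the address.

```python
import ipaddress


def clean_ip(copied_text: str) -> str:
    """Strip surrounding whitespace and confirm the text is a valid IP address."""
    candidate = copied_text.strip()
    # ip_address raises ValueError if the text is not a valid IPv4/IPv6 address.
    return str(ipaddress.ip_address(candidate))


print(clean_ip(" 65.130.193.108\n"))  # 65.130.193.108
```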
Go into your SillyTavern tab and follow these steps.
Congratulations! You can now chat with your favorite character freely. Remember to end your Vast.ai session when you stop using it, or it will keep running and use up your credits.
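To get a feel for how quickly a running instance drains your balance, here is a minimal sketch. The hourly rate below is a placeholder chosen only to make the arithmetic easy, not an actual Vast.ai price, so substitute the rate shown for the host you rented.

```python
def hours_of_credit(balance_usd: float, hourly_rate_usd: float) -> float:
    """How many hours an instance can run before the balance reaches zero."""
    if hourly_rate_usd <= 0:
        raise ValueError("hourly_rate_usd must be positive")
    return balance_usd / hourly_rate_usd


# Placeholder numbers: a $5.00 balance at a hypothetical $0.25 per hour.
print(hours_of_credit(5.00, 0.25))  # 20.0
```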