Posted on September 18, 2023
The OpenAI API is expensive. But did you know it is possible to run a ChatGPT-level AI model for free on your OWN computer?
Quantisation is a process that shrinks a very big language model (like GPT-3.5) so that it fits on consumer-grade hardware, typically by storing the model's weights at lower numerical precision. This means that you can run the model for free on your very own computer.
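To make the idea concrete, here is a minimal sketch of the simplest kind of quantisation: symmetric round-to-nearest onto 4-bit signed integers. It is an illustration only, not what GGML or GPTQ actually do (both use more elaborate schemes), and the function names and sample weights are made up for the example.

```python
def quantize(weights, bits=4):
    """Map floats to signed integers of `bits` bits (symmetric, round-to-nearest)."""
    qmax = 2 ** (bits - 1) - 1                    # 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Each weight becomes a small integer in [-qmax, qmax].
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the integers and the shared scale."""
    return [v * scale for v in q]


# Hypothetical weights, chosen only for illustration.
weights = [0.12, -0.53, 0.97, -1.40, 0.05]
q, scale = quantize(weights)
restored = dequantize(q, scale)

print("quantised integers:", q)
print("restored weights:  ", [round(r, 2) for r in restored])
```

Each weight now needs only 4 bits plus one shared scale, at the cost of a small rounding error (at most half of `scale` per weight in this scheme).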
LLM quantisation means that GPT-style technology will soon be ubiquitous & (almost) free. Imagine a world where you can be offline and still have an LLM on your phone!
If you are looking to get started with quantisation, here are 3 projects with pros & cons. Try them out yourself:
Pros: Use GGML if you cannot fit the model entirely in VRAM
Pros: Newest framework, ease of use
Pros: Fast. If you can fit the model entirely in GPU VRAM, GPTQ is the faster option
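Whether a model "fits entirely in VRAM" comes down to simple arithmetic, which the rough sketch below works through. The 7-billion-parameter model, the 8 GB GPU, and the 1 GB allowance for activations and other overhead are all assumptions picked for illustration. Real memory use also depends on context length and the runtime.

```python
def model_size_gb(n_params, bits_per_weight):
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def fits_in_vram(n_params, bits_per_weight, vram_gb, overhead_gb=1.0):
    """Crude check: weights plus an assumed fixed overhead must fit in VRAM."""
    return model_size_gb(n_params, bits_per_weight) + overhead_gb <= vram_gb


n_params = 7e9   # hypothetical 7-billion-parameter model
vram_gb = 8      # hypothetical consumer GPU

for bits in (16, 8, 4):
    size = model_size_gb(n_params, bits)
    ok = fits_in_vram(n_params, bits, vram_gb)
    print(f"{bits:>2}-bit: {size:.1f} GB of weights, fits in {vram_gb} GB VRAM: {ok}")
```

If the quantised model fits, a GPU-only route such as GPTQ is an option. If it does not, the GGML route described above is the fallback.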