The OpenAI API is expensive. But did you know it is possible to run a ChatGPT-level AI model for free on your OWN computer?

Quantisation is a process that shrinks a very large language model (an open model on the scale of GPT-3.5) by storing its weights at lower precision, so it can fit on consumer-grade hardware. This means you can run the model for free on your very own computer.
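To make the idea concrete, here is a minimal sketch of the core trick, using NumPy and a made-up 4096 x 4096 weight matrix (both purely for illustration): store the weights as low-precision integers plus a scale factor instead of 32-bit floats.

import numpy as np

# Hypothetical weight matrix stored as 32-bit floats (4 bytes per value).
weights_fp32 = np.random.randn(4096, 4096).astype(np.float32)

# Symmetric 8-bit quantisation: map each value onto the integer range [-127, 127]
# using a single per-tensor scale factor.
scale = np.abs(weights_fp32).max() / 127.0
weights_int8 = np.round(weights_fp32 / scale).astype(np.int8)

# Dequantise back to floats when the values are needed for computation.
weights_dequant = weights_int8.astype(np.float32) * scale

print("fp32 size:", weights_fp32.nbytes / 1e6, "MB")  # ~67 MB
print("int8 size:", weights_int8.nbytes / 1e6, "MB")  # ~17 MB
print("max error:", np.abs(weights_fp32 - weights_dequant).max())

Real quantisation schemes are smarter than this (per-group scales, 4-bit packing, outlier handling), but the memory saving comes from exactly this substitution.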

LLM quantisation means GPT-style technology is on its way to being ubiquitous & (almost) free. Imagine a world where you can be offline and still have an LLM on your phone!

If you are looking to get started with quantisation, here are 3 projects, with pros & cons, so you can try it out yourself:

Technique: GGML

Pros: Runs on the CPU, so use GGML if you cannot fit the model entirely in VRAM (sketch below)

Cons: Slower than GPU-only inference
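As a taste of what running a GGML-style model looks like, here is a rough sketch using the llama-cpp-python bindings for llama.cpp. The model filename is a placeholder for a quantised model you have downloaded yourself, and note that llama.cpp has since moved from the GGML format to GGUF.

from llama_cpp import Llama

# Path to a locally downloaded quantised model file (placeholder -- supply your own).
llm = Llama(
    model_path="./llama-2-7b-chat.Q4_K_M.gguf",
    n_ctx=2048,       # context window size
    n_gpu_layers=0,   # 0 = run fully on the CPU; raise to offload some layers to the GPU
)

output = llm("Q: What is quantisation in one sentence? A:", max_tokens=64)
print(output["choices"][0]["text"])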

Technique: Bitsandbytes

Pros: Newer framework, easy to use: quantisation happens on the fly when loading a model through Hugging Face transformers (sketch below)

Cons: Slowest inference of the three
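Here is a rough sketch of how bitsandbytes is typically used through Hugging Face transformers. The model name is just an example, and it assumes a CUDA GPU plus the transformers, accelerate and bitsandbytes packages installed.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "facebook/opt-1.3b"  # example model; swap in any causal LM from the Hub

# Ask transformers to quantise the weights to 4-bit on the fly while loading.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Quantisation lets you", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))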

Technique: GPTQ

Pros: Fast: if you can fit the model entirely in GPU VRAM, GPTQ gives the quickest inference of the three (sketch below)

Cons: The model must fit in VRAM, and quantising a model yourself requires calibration data and a one-off quantisation run
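And a rough sketch of the GPTQ route: the easiest way in is to load a checkpoint that someone has already quantised. The Hub model ID below is just one example of such a checkpoint, and recent transformers versions also need the optimum and auto-gptq packages installed for GPTQ support.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Example of a pre-quantised 4-bit GPTQ checkpoint on the Hugging Face Hub.
model_id = "TheBloke/Llama-2-7B-Chat-GPTQ"

model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("The fastest way to run an LLM locally is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))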