The GPT Quantum Realm – Getting LLMs on Laptops
Posted on September 18, 2023
The OpenAI API is expensive. But did you know it is possible to run a ChatGPT-level AI model for free on your OWN computer?
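Quantisation is a process that shrinks a very big language model (like GPT-3.5) so that it fits on consumer-grade hardware. It does this by storing the model's weights at lower numerical precision, for example as 8-bit or 4-bit integers instead of 16-bit floats. This means that you can run the model for free on your very own computer.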
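To make the idea concrete, here is a minimal sketch of one simple scheme, often called "absmax" quantisation: every weight is divided by a shared scale and rounded to a small signed integer. This is an illustration only, not what GGML, bitsandbytes or GPTQ actually do internally. The function names and the sample weights are made up for this example.

```python
def quantize_absmax(weights, bits=8):
    """Map floats to signed integers using one shared scale (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1                    # largest positive integer, 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs else 1.0    # avoid dividing by zero for all-zero input
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integers and the shared scale."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only for the demo.
weights = [0.12, -0.5, 0.33, 1.27, -1.0]
q, scale = quantize_absmax(weights)
restored = dequantize(q, scale)

print(q)         # small integers that fit in one byte each
print(restored)  # close to the original weights, but not identical
```

Each weight now needs one byte instead of two or four, at the cost of a small rounding error. Real quantisation libraries use more careful variants of this idea (per-block scales, 4-bit formats, error correction), but the trade of precision for memory is the same.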
LLM quantisation means that GPT-style technology will soon be ubiquitous and (almost) free. Imagine a world where you can be offline and still have an LLM on your phone!
If you are looking to get started with quantisation, here are 3 projects with their pros and cons. Try them out yourself:
Technique: GGML
Pros: Use GGML if you cannot fit the model entirely in VRAM
Cons: Slow
Technique: bitsandbytes
Pros: Newest framework, easy to use
Cons: Slowest of the three
Technique: GPTQ
Pros: Fast. If you can fit the model entirely in GPU VRAM, GPTQ is the faster option
Cons: ?
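The GGML versus GPTQ choice above comes down to whether the quantised weights fit in VRAM, which you can roughly estimate yourself. The sketch below uses the back-of-the-envelope rule "parameters × bits per weight ÷ 8 = bytes". It assumes weights only and ignores activations, the context cache and framework overhead, so treat the numbers as a lower bound. The function names and the 7-billion-parameter example are hypothetical, and bitsandbytes is left out because the list above gives no sizing rule for it.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Rough size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def suggest_technique(n_params, bits_per_weight, vram_gb):
    """Apply the rule of thumb from the list: GPTQ if it fits in VRAM, else GGML."""
    needed = weight_memory_gb(n_params, bits_per_weight)
    return "GPTQ" if needed <= vram_gb else "GGML"


# Hypothetical example: a 7-billion-parameter model on an 8 GB GPU.
n_params = 7e9
fp16_gb = weight_memory_gb(n_params, 16)   # unquantised 16-bit weights
int4_gb = weight_memory_gb(n_params, 4)    # 4-bit quantised weights

print(f"16-bit: {fp16_gb:.1f} GB, 4-bit: {int4_gb:.1f} GB")
print(suggest_technique(n_params, 4, vram_gb=8))
```

In this example the 16-bit weights would not fit on the 8 GB card, while the 4-bit version does, which is exactly the gap quantisation closes.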