QLoRA: Efficient Finetuning of Quantized LLMs

Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer
We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance...
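The memory savings rest on quantizing the frozen base weights to 4 bits with blockwise absmax scaling and a small fixed codebook (NF4 in the paper). The sketch below illustrates that mechanic in plain Python; the codebook values and block handling here are illustrative placeholders, not the paper's actual NF4 constants.

```python
# Illustrative sketch of blockwise 4-bit codebook quantization, in the spirit
# of QLoRA's NF4. The 16 codebook levels below are hypothetical, NOT the
# paper's information-theoretically derived NormalFloat quantiles.
CODEBOOK = [-1.0, -0.7, -0.53, -0.39, -0.28, -0.18, -0.09, 0.0,
            0.08, 0.16, 0.25, 0.34, 0.44, 0.56, 0.72, 1.0]  # 16 levels = 4 bits

def quantize_block(block):
    """Scale a block by its absolute maximum, then map each value to the
    index of the nearest codebook level. Returns (indices, scale)."""
    absmax = max(abs(x) for x in block) or 1.0
    idxs = [min(range(len(CODEBOOK)), key=lambda i: abs(x / absmax - CODEBOOK[i]))
            for x in block]
    return idxs, absmax

def dequantize_block(idxs, absmax):
    """Invert quantization: look up codebook levels and rescale by absmax."""
    return [CODEBOOK[i] * absmax for i in idxs]

weights = [0.5, -0.25, 1.2, 0.0, -0.9, 0.3, 0.05, -0.6]
idxs, scale = quantize_block(weights)
w_hat = dequantize_block(idxs, scale)
err = max(abs(a - b) for a, b in zip(weights, w_hat))
```

In QLoRA the base model stays frozen in this quantized form while small Low Rank Adapter (LoRA) matrices, kept in higher precision, receive the gradients; only the per-block scale and 4-bit indices need to be stored for the base weights.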
Published 2023-05-23 on arXiv. Keywords: 4-bit NormalFloat (NF4), Low Rank Adapters (LoRA), Efficient Finetuning of Quantized LLMs.