You can enable 4-bit quantization in a Transformer with QLoRA by setting bnb_4bit_use_double_quant and the other quantization parameters when the model is loaded. Strictly speaking, QLoRA quantizes the frozen base-model weights rather than the activations: the weights are stored in 4-bit NF4 and dequantized on the fly, while activations and the LoRA adapters run in a higher-precision compute dtype.
Here is the code snippet you can refer to:

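Below is a minimal sketch of that setup using Hugging Face `transformers`, `peft`, and `bitsandbytes`. The model name, LoRA hyperparameters, and target module names (which match LLaMA-style attention projections) are illustrative placeholders, so adjust them for your own checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit quantization settings: weights are stored in NF4, compute runs in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,       # also quantize the quantization constants
    bnb_4bit_quant_type="nf4",            # NormalFloat 4-bit data type
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_name = "meta-llama/Llama-2-7b-hf"   # placeholder; swap in your own model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

# Prepare the quantized model for k-bit training and attach LoRA adapters.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```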
In the above code we are using the following key strategies:

- `bnb_4bit_use_double_quant=True` enables double quantization, which quantizes the quantization constants themselves and reduces their memory overhead (roughly 0.4 bits per parameter in the QLoRA paper).
- `bnb_4bit_quant_type="nf4"` uses NF4 (NormalFloat 4-bit), which represents normally distributed pretrained weights more accurately than plain 4-bit integers.
- `target_modules` specifies exactly where the LoRA adapters are inserted, giving fine-grained control over which layers are fine-tuned.
- The whole setup is compatible with the Hugging Face `transformers` + `bitsandbytes` quantization backend.
Hence, quantization in QLoRA is achieved by configuring quantization-aware loading parameters that compress the frozen Transformer weights to 4 bits, while only the small LoRA adapters are trained on top of them.
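As a quick follow-up, the sketch below (assuming the `model` object from the snippet above) verifies that the linear layers were replaced by 4-bit bitsandbytes modules and reports the resulting memory footprint.

```python
import bitsandbytes as bnb

# List the modules that were swapped in as 4-bit quantized linear layers.
quantized = [name for name, module in model.named_modules()
             if isinstance(module, bnb.nn.Linear4bit)]
print(f"{len(quantized)} Linear4bit modules, e.g. {quantized[:2]}")

# Rough memory footprint of the quantized model in GB.
print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
```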