You can use Rotary Positional Embedding (RoPE) to encode relative positions directly in attention scores, improving generalization to longer sequences.
The snippet below sketches one way to implement this.

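This is an illustrative sketch rather than a specific library's implementation; it assumes PyTorch, query/key tensors shaped `(batch, seq_len, num_heads, head_dim)`, and placeholder names such as `build_rope_cache`, `apply_rope`, and `rotary_dim`:

```python
# Illustrative sketch only -- not taken from a specific library.
# Assumes PyTorch and tensors shaped (batch, seq_len, num_heads, head_dim);
# `build_rope_cache`, `apply_rope`, and `rotary_dim` are placeholder names.
import torch


def build_rope_cache(seq_len: int, rotary_dim: int, base: float = 10000.0):
    """Precompute per-position cosine/sine angles for each frequency pair."""
    # Frequency-based angle generation: one inverse frequency per channel pair.
    inv_freq = 1.0 / (base ** (torch.arange(0, rotary_dim, 2).float() / rotary_dim))
    positions = torch.arange(seq_len).float()
    angles = torch.outer(positions, inv_freq)      # (seq_len, rotary_dim // 2)
    return angles.cos(), angles.sin()


def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    """Rotate the first `rotary_dim` channels of x; pass the rest through unchanged."""
    rotary_dim = 2 * cos.shape[-1]
    x_rot, x_pass = x[..., :rotary_dim], x[..., rotary_dim:]

    # Treat consecutive channels as (x1, x2) pairs and rotate each pair by its
    # position-dependent angle (element-wise sine/cosine rotation).
    x1, x2 = x_rot[..., 0::2], x_rot[..., 1::2]
    cos = cos[:, None, :]                          # broadcast over the head axis
    sin = sin[:, None, :]
    out1 = x1 * cos - x2 * sin
    out2 = x1 * sin + x2 * cos
    rotated = torch.stack((out1, out2), dim=-1).flatten(-2)   # re-interleave pairs

    # Only the rotary portion of the embedding is modified; the rest is untouched.
    return torch.cat((rotated, x_pass), dim=-1)
```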
The code above relies on the following key points (a short usage example follows this list):
- Frequency-based angle generation for encoding relative positions
- Element-wise rotation of the input embeddings using sine and cosine
- Rotation applied only to a specific portion of the embedding (the rotary dimension)
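
As a rough shape check, here is how the sketch above might be applied to query and key tensors before computing attention; the sizes are hypothetical and the functions are the ones defined earlier:

```python
batch, seq_len, num_heads, head_dim, rotary_dim = 2, 16, 4, 64, 32

q = torch.randn(batch, seq_len, num_heads, head_dim)
k = torch.randn(batch, seq_len, num_heads, head_dim)

cos, sin = build_rope_cache(seq_len, rotary_dim)
q_rot, k_rot = apply_rope(q, cos, sin), apply_rope(k, cos, sin)

print(q_rot.shape)   # torch.Size([2, 16, 4, 64]) -- same shape; only the first 32 channels are rotated
```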
As a result, rotary embeddings tend to improve the robustness of generative models on longer-context inputs.