How Transformers Handle Variable-length Sequences
“Transformer models don’t require a fixed sequence length.” Since most of my projects revolve around computer vision, this claim was very confusing to me at first. In computer vision, images are always preprocessed to a fixed size before being fed into a deep learning model; otherwise, you will run into a matrix multiplication error. In this post, we will learn how transformers handle variable-length sequences.

Self-attention - Q, K, V

Linear Projection into Embedding Space

Let’s start with a basic CNN code example.
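Here is a minimal PyTorch sketch (the model and its names are my own illustration, not a specific architecture): the fully connected head hard-codes the flattened feature size, so any input resolution other than the one it was built for triggers the matrix multiplication error mentioned above.

```python
import torch
import torch.nn as nn

# A tiny CNN whose classifier head bakes in the flattened feature size.
# It therefore only accepts 32x32 inputs.
class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2)                     # 32x32 -> 16x16
        self.fc = nn.Linear(8 * 16 * 16, num_classes)   # fixed size baked in here

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pool(torch.relu(self.conv(x)))
        x = x.flatten(1)    # shape (batch, 8*16*16) only if input was 32x32
        return self.fc(x)

model = TinyCNN()
print(model(torch.randn(1, 3, 32, 32)).shape)  # works: torch.Size([1, 10])
# model(torch.randn(1, 3, 64, 64))  # RuntimeError: mat1 and mat2 shapes cannot be multiplied
```

The spatial size leaks into the weight shape of `fc`, which is exactly the constraint that transformers avoid.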
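For contrast, here is a sketch of why the Q/K/V projection step in self-attention is length-agnostic (again PyTorch; names like `w_q` are my own): `nn.Linear` multiplies along the last, embedding dimension of each token independently, so the sequence length never appears in any weight shape.

```python
import torch
import torch.nn as nn

d_model = 64
w_q = nn.Linear(d_model, d_model)  # the same projection serves any sequence length

for seq_len in (5, 17, 128):
    tokens = torch.randn(1, seq_len, d_model)  # (batch, seq_len, d_model)
    q = w_q(tokens)                            # applied per token, along the last dim
    print(q.shape)                             # (1, seq_len, 64) every time
```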