<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Transformer on Baam's Techlog</title><link>https://baampark.github.io/tags/transformer/</link><description>Recent content in Transformer on Baam's Techlog</description><generator>Hugo -- 0.128.0</generator><language>en-us</language><lastBuildDate>Mon, 27 Jan 2025 13:49:47 -0500</lastBuildDate><atom:link href="https://baampark.github.io/tags/transformer/index.xml" rel="self" type="application/rss+xml"/><item><title>How Transformers Handle Variable-length Sequences</title><link>https://baampark.github.io/posts/2025-01-28_variable_sequence/</link><pubDate>Mon, 27 Jan 2025 13:49:47 -0500</pubDate><guid>https://baampark.github.io/posts/2025-01-28_variable_sequence/</guid><description>&amp;ldquo;Transformer models don&amp;rsquo;t require a fixed sequence length.&amp;rdquo; Since most of my projects revolve around computer vision, this confused me at first. In computer vision, images are typically resized to a fixed size before being fed into a deep learning model; otherwise, you run into a matrix multiplication (shape mismatch) error. In this post, we will look at how transformers handle variable-length sequences.
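A minimal sketch of the contrast, assuming NumPy and illustrative sizes (the names fc_forward and self_attention, d_model = 8, and the 4-token input size are hypothetical, not taken from the post): in a CNN, the shape error usually comes from flattening the feature map into a fully connected layer whose weight matrix is sized for one input resolution. The Q, K, V projections in self-attention instead multiply each token embedding by a (d_model, d_model) matrix, so the same weights accept any number of tokens.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8  # embedding size (illustrative value, an assumption)

# Fully connected layer on a flattened input: the weight shape is tied to
# one specific input size (here 4 tokens times d_model features).
W_fc = rng.standard_normal((4 * d_model, 16))

def fc_forward(x):
    # x has shape (seq_len, d_model); flattening makes seq_len part of the shape
    return x.reshape(-1) @ W_fc

# Self-attention projections act on each token embedding independently,
# so their weights are (d_model, d_model) no matter how many tokens arrive.
W_q = rng.standard_normal((d_model, d_model))
W_k = rng.standard_normal((d_model, d_model))
W_v = rng.standard_normal((d_model, d_model))

def self_attention(x):
    q, k, v = x @ W_q, x @ W_k, x @ W_v             # each (seq_len, d_model)
    scores = q @ k.T / np.sqrt(d_model)              # (seq_len, seq_len)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v                               # (seq_len, d_model)

for seq_len in (4, 7):
    x = rng.standard_normal((seq_len, d_model))
    print(seq_len, self_attention(x).shape)
    try:
        fc_forward(x)
        print("fc ok")
    except ValueError as err:
        print("fc failed:", err)
```

Running the loop, self_attention returns shapes (4, 8) and (7, 8), while fc_forward succeeds on the 4-token input and raises a ValueError on the 7-token one.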
Self-attention: Q, K, V Linear Projection into Embedding Space. Let&amp;rsquo;s start with a basic CNN code example.</description></item></channel></rss>