<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Tokenization on Baam's Techlog</title><link>https://baampark.github.io/tags/tokenization/</link><description>Recent content in Tokenization on Baam's Techlog</description><generator>Hugo -- 0.128.0</generator><language>en-us</language><lastBuildDate>Tue, 08 Jul 2025 21:40:50 -0400</lastBuildDate><atom:link href="https://baampark.github.io/tags/tokenization/index.xml" rel="self" type="application/rss+xml"/><item><title>Why and When to Add New Special Tokens in LLMs and VLMs</title><link>https://baampark.github.io/posts/2025-07-08_special_token/</link><pubDate>Tue, 08 Jul 2025 21:40:50 -0400</pubDate><guid>https://baampark.github.io/posts/2025-07-08_special_token/</guid><description>A tokenizer converts natural language into a sequence of tokens. Among these tokens are special tokens, which are not regular words but serve specific functions for the model (e.g., &amp;lt;BOS&amp;gt; and &amp;lt;EOS&amp;gt;). While reviewing the academic literature on LLMs and VLMs, I came across several studies that introduce new special tokens to enhance model capabilities. In this blog post, we’ll explore what special tokens are in LLM tokenization and, more importantly, examine when and why researchers choose to add new ones.</description></item></channel></rss>