Slim-Llama: LLM Processor with Binary/Ternary Weights for Billion-Parameter Models

Overview

Slim-Llama is an ASIC designed to address the high energy consumption that external memory access imposes on large language models. By combining binary/ternary weight quantization with a Sparsity-aware Look-up Table (S-LUT), Slim-Llama significantly improves energy efficiency. An output reuse scheme and index vector reordering further enhance performance, achieving up to 4.59× better benchmark energy efficiency than the previous state-of-the-art. It is the first ASIC to run a billion-parameter Llama model efficiently, consuming only 4.69 mW.
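
To make the energy argument concrete, the sketch below shows how binary/ternary weight quantization turns a matrix-vector product into additions, subtractions, and skips. This is a minimal Python illustration assuming an absmean-style ternary quantizer (as popularized by BitNet b1.58); the function names and quantizer details are illustrative, not taken from the Slim-Llama paper.

    import numpy as np

    def ternarize(w: np.ndarray):
        """Quantize weights to {-1, 0, +1} with one per-tensor scale.

        Assumes an absmean threshold (BitNet-b1.58 style); Slim-Llama's
        actual quantizer may differ. The point is that every multiply in
        the matvec collapses to an add, a subtract, or a skip.
        """
        scale = np.abs(w).mean() + 1e-8               # assumed per-tensor scale
        q = np.clip(np.round(w / scale), -1, 1).astype(np.int8)
        return q, scale

    def ternary_matvec(q: np.ndarray, scale: float, x: np.ndarray) -> np.ndarray:
        """Matrix-vector product with ternary weights: no multiplications."""
        pos = (q == 1).astype(x.dtype) @ x            # +x where the weight is +1
        neg = (q == -1).astype(x.dtype) @ x           # -x where the weight is -1
        return scale * (pos - neg)                    # zero weights contribute nothing

Because each weight needs only one or two bits, model storage and bandwidth shrink dramatically, directly cutting the external memory traffic that the overview identifies as the dominant energy cost.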

Implementation results

[Figure: chip implementation results]

Performance Comparison

[Figure: performance comparison]

Architecture

[Figure: overall architecture]

Features

  - Output reuse in the S-LUT based BMM Core (SBC); see the first sketch after this list

  - Dual-mode (LUT-mode & Buffer-mode) sparsity-aware look-up table (S-LUT) with sparsity-aware workload allocation

  - Index vector reordering for bit-transition reduction; see the second sketch after this list
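
The sketches below illustrate the first and third features in Python under stated assumptions: the group width g, the table layout, and the ordering heuristic are illustrative guesses, not the silicon implementation. In the first sketch, each activation group's subset-sum table is built once and reused by every output row (output reuse), while all-zero weight groups are skipped without touching the table (sparsity awareness).

    import numpy as np

    def build_slut(x_group: np.ndarray) -> np.ndarray:
        """All 2^g subset sums of a g-wide activation group.

        Entry k holds the sum of activations whose bit is set in k, so a
        g-wide dot product with 0/1 weights becomes a single lookup.
        """
        g = len(x_group)
        lut = np.zeros(2 ** g)
        for k in range(1, 2 ** g):
            lsb = k & -k                              # lowest set bit of k
            lut[k] = lut[k ^ lsb] + x_group[lsb.bit_length() - 1]
        return lut

    def slut_matvec(W: np.ndarray, x: np.ndarray, g: int = 4) -> np.ndarray:
        """Ternary matvec via per-group LUTs; W entries are in {-1, 0, +1}."""
        n_out, n_in = W.shape
        assert n_in % g == 0, "group width must divide the input dimension"
        y = np.zeros(n_out)
        for s in range(0, n_in, g):
            lut = build_slut(x[s:s + g])              # built once, reused by all rows
            for r in range(n_out):
                wg = W[r, s:s + g]
                pos = sum(1 << i for i in range(g) if wg[i] == 1)
                neg = sum(1 << i for i in range(g) if wg[i] == -1)
                if pos == 0 and neg == 0:             # all-zero group: skipped entirely
                    continue
                y[r] += lut[pos] - lut[neg]           # two lookups replace g MACs
        return y

A quick check such as np.allclose(W @ x, slut_matvec(W, x)) confirms the lookups reproduce the dense ternary result exactly.

The third feature targets switching power: if consecutive index vectors presented to the table differ in few bits, the index path toggles less. A greedy nearest-neighbor ordering, shown below, is one simple way to get that effect; the paper's actual reordering algorithm is not reproduced here.

    def reorder_indices(indices: list[int]) -> list[int]:
        """Order index vectors so consecutive ones have small Hamming
        distance, reducing bit transitions. Heuristic sketch only."""
        remaining = list(indices)
        order = [remaining.pop(0)]
        while remaining:
            nxt = min(remaining, key=lambda v: bin(order[-1] ^ v).count("1"))
            remaining.remove(nxt)
            order.append(nxt)
        return order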

Related Papers

  - ISSCC 2025
