Slim-Llama: LLM Processor with Binary/Ternary Weights for Billion-Parameter Llama
Overview
Slim-Llama is an ASIC designed to address the high energy cost of external memory access in large language model inference. It combines binary/ternary weight quantization with a Sparsity-aware Look-up Table (S-LUT) to cut both computation and data movement. An output reuse scheme and index vector reordering further improve efficiency, achieving up to 4.59× higher benchmark energy efficiency than the previous state-of-the-art. Slim-Llama is the first ASIC to efficiently run a billion-parameter Llama model, consuming 4.69 mW.
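To make the quantization concrete, below is a minimal Python sketch of ternary weight quantization. The `ternarize` function, the 0.7·mean(|w|) threshold, and the per-tensor scale are illustrative assumptions in the style of common ternary-weight schemes, not Slim-Llama's documented quantizer.

```python
import numpy as np

def ternarize(weights: np.ndarray, threshold_ratio: float = 0.7):
    """Quantize FP weights to {-1, 0, +1} plus one per-tensor scale.

    The threshold_ratio * mean(|w|) rule is a common ternarization
    heuristic; Slim-Llama's actual quantizer is not specified here.
    """
    delta = threshold_ratio * np.abs(weights).mean()  # ternary threshold
    ternary = np.where(weights > delta, 1,
              np.where(weights < -delta, -1, 0)).astype(np.int8)
    mask = ternary != 0
    # Scale chosen to approximate the magnitude of the non-zero weights.
    scale = np.abs(weights[mask]).mean() if mask.any() else 0.0
    return ternary, scale

w = np.random.randn(4, 8).astype(np.float32)
t, s = ternarize(w)
w_hat = s * t  # dequantized approximation; storage drops to ~1.58 bits/weight
```

Beyond shrinking weight storage, the zero state introduces sparsity that the S-LUT can exploit, which is where the features below come in.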
Features
- Output reuse in the S-LUT-based BMM core (SBC); see the BMM sketch after this list
- Dual-mode (LUT-mode & Buffer-mode) sparsity-aware look-up table (S-LUT) with sparsity-aware workload allocation
- Index vector reordering for bit-transition reduction; see the reordering sketch after this list
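The following sketch illustrates the look-up-table idea behind the SBC: with binary {+1, −1} weights, each group of activations admits only 2^GROUP possible signed partial sums, so one table is built per activation group and then shared by every output row, which is the output-reuse intuition. The group size, function names, and 0/1 bit encoding are assumptions for illustration, not the chip's microarchitecture.

```python
import numpy as np

GROUP = 4  # activations per look-up group; the hardware group size is an assumption

def lut_bmm(acts: np.ndarray, w_bits: np.ndarray) -> np.ndarray:
    """Binary matrix multiply via look-up tables.

    acts:   (n,) activation vector, n divisible by GROUP.
    w_bits: (m, n) int array of 0/1, bit 1 meaning weight +1 and 0 meaning -1.
    For each activation group, all 2**GROUP signed partial sums are
    precomputed once and then reused by all m output rows, replacing
    per-row multiply-accumulates with table look-ups.
    """
    m, n = w_bits.shape
    out = np.zeros(m)
    for g in range(0, n, GROUP):
        chunk = acts[g:g + GROUP]
        # Partial sum for every possible weight pattern of this group.
        table = np.array([
            sum((1.0 if (p >> i) & 1 else -1.0) * chunk[i] for i in range(GROUP))
            for p in range(2 ** GROUP)
        ])
        # Each row's weight bits form the table index for this group.
        idx = (w_bits[:, g:g + GROUP] << np.arange(GROUP)).sum(axis=1)
        out += table[idx]
    return out

# Matches the direct computation with signs = 2*bits - 1:
acts = np.random.randn(8)
w_bits = np.random.randint(0, 2, size=(3, 8))
assert np.allclose(lut_bmm(acts, w_bits), (2 * w_bits - 1) @ acts)
```

A ternary weight adds a zero state on top of this; the sparsity-aware part of the S-LUT skips zero entries entirely, which this binary sketch omits.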
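Index vector reordering can be sketched as ordering the LUT addresses so that consecutive indices differ in as few bits as possible, reducing toggling on the table's address lines and hence dynamic power. The greedy nearest-neighbor ordering below is a stand-in assumption; the paper's actual reordering algorithm may differ.

```python
def reorder_indices(indices: list[int]) -> list[int]:
    """Greedily reorder LUT indices to minimize bit flips between neighbors.

    Each step picks the remaining index with the smallest Hamming
    distance to the previous one; greedy, so not guaranteed optimal.
    """
    remaining = list(indices)
    order = [remaining.pop(0)]
    while remaining:
        last = order[-1]
        nxt = min(remaining, key=lambda v: bin(last ^ v).count("1"))
        remaining.remove(nxt)
        order.append(nxt)
    return order

def transitions(seq: list[int]) -> int:
    """Total number of bit flips across consecutive indices."""
    return sum(bin(a ^ b).count("1") for a, b in zip(seq, seq[1:]))

idx = [0b1010, 0b0101, 0b1011, 0b0100, 0b1110]
print(transitions(idx), transitions(reorder_indices(idx)))  # e.g. 13 -> 6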
Related Papers
- ISSCC 2025