OmniDRL: Energy-efficient Deep Reinforcement Learning Processor
Overview
We present an energy-efficient deep reinforcement learning (DRL) processor, OmniDRL, for DRL training on edge devices. The need for on-device DRL training is growing because of DRL's distinct ability to adapt to each individual user. However, the massive amount of external and internal memory access limits the implementation of DRL training on resource-constrained platforms. OmniDRL introduces four key features that reduce external memory access by compressing as much data as possible and reduce internal memory access by directly processing the compressed data. Group-sparse training (GST) enables a high weight compression ratio in every DRL iteration. A group-sparse training core is proposed to take full advantage of the weights compressed by GST. Exponent-mean delta encoding additionally compresses the exponents of both weights and feature maps. A world-first on-chip sparse weight transposer enables DRL training on compressed weights without an off-chip transposer. OmniDRL is fabricated in a 28 nm CMOS technology and occupies a 3.6 x 3.6 mm2 die area. It achieves 7.42 TFLOPS/W energy efficiency when training a robot agent (MuJoCo HalfCheetah, TD3), which is 2.4x higher than the previous state-of-the-art.
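To make the group-sparse training idea concrete, the sketch below re-imposes a group-wise magnitude mask on a weight matrix after every update, so the surviving weights stay clustered in dense groups that are cheap to store and process in compressed form. The group size, the L2-norm scoring, and the keep ratio are illustrative assumptions for this sketch, not the processor's actual GST algorithm.

```python
import numpy as np

def group_sparse_mask(weight, group_size=8, keep_ratio=0.25):
    # Score each contiguous group of weights and keep only the strongest groups,
    # so the remaining nonzeros form dense groups (illustrative pruning criterion).
    # Assumes the number of weights is divisible by group_size.
    groups = weight.reshape(-1, group_size)
    scores = np.linalg.norm(groups, axis=1)
    k = max(1, int(len(scores) * keep_ratio))
    keep = np.argsort(scores)[-k:]
    mask = np.zeros_like(groups)
    mask[keep] = 1.0
    return mask.reshape(weight.shape)

# Toy training loop: group sparsity is re-imposed after every update, so each
# DRL iteration works on a highly compressed weight matrix.
w = np.random.randn(128, 128).astype(np.float32)
for step in range(3):
    grad = np.random.randn(*w.shape).astype(np.float32)  # stand-in for a DRL gradient
    w -= 1e-3 * grad
    w *= group_sparse_mask(w)
```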
Features
- Group-sparse training
- Group-sparse training core
- Exponent-mean delta encoding
- Sparse weight transposer
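The exponent-mean delta encoding listed above exploits the fact that floating-point values within one weight or feature-map group tend to have similar exponents, so each exponent can be stored as a small delta from the group mean. The NumPy sketch below illustrates this on float32 values and reconstructs them bit-exactly; the group size and the way the sign and mantissa bits are carried along are assumptions for illustration, not the chip's on-chip format.

```python
import numpy as np

def emd_encode(values, group_size=16):
    # Split float32 bits into (sign + mantissa) and 8-bit exponents, then store
    # each exponent as a small delta from its group's mean exponent.
    bits = np.asarray(values, dtype=np.float32).view(np.uint32)
    exponents = ((bits >> 23) & 0xFF).astype(np.int32)
    sign_mantissa = bits & np.uint32(0x807FFFFF)
    means, deltas = [], []
    for start in range(0, len(exponents), group_size):
        group = exponents[start:start + group_size]
        mean_exp = int(np.round(group.mean()))
        means.append(mean_exp)
        deltas.append(group - mean_exp)  # small deltas fit in a few bits
    return means, deltas, sign_mantissa

def emd_decode(means, deltas, sign_mantissa):
    # Rebuild every exponent and reassemble the original float32 bit pattern.
    exponents = np.concatenate([np.asarray(d) + m for m, d in zip(means, deltas)])
    bits = sign_mantissa | (exponents.astype(np.uint32) << 23)
    return bits.view(np.float32)

w = (np.random.randn(64) * 0.01).astype(np.float32)  # weights with similar magnitudes
means, deltas, rest = emd_encode(w)
assert np.array_equal(w, emd_decode(means, deltas, rest))  # bit-exact reconstruction
```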
Related Papers
- S.VLSI 2021
- HOTCHIPS 2021
- AICAS 2021
- AICAS 2022