HI SYSTEMS
Neural Radiance Field (NeRF) is an emerging computer graphics task that is used for 3D modeling and rendering in the metaverse, providing a user-friendly and immersive experience. However, it is difficult to accelerate on mobile AR/VR devices because of its memory-intensive hash encoding and heavy computational load. This paper presents NeuGPU to achieve NeRF-based instant 3D modeling and real-time rendering with 3 k…
A low-power and real-time 3D neural rendering processor, MetaVRain, is proposed with 3 key features: 1) a visual perception core for 1120× faster rendering, 2) 1D and 2D hybrid neural engines for 3.7× higher throughput with 2.4× higher energy efficiency during DNN inference, and 3) a modulo-based positional encoding unit to minimize the hardware needed to realize the sinusoidal function. It finally achieves a maximum of 118 FPS whil…
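The abstract above does not describe how the modulo-based positional encoding unit works internally. As a rough illustration only, the sketch below shows one way the periodicity of the sinusoid can be exploited: because sin and cos repeat every 2π, the term sin(2^k·π·p) depends only on (2^k·p) mod 2, so the argument can be folded into a small fixed range before the sinusoid is evaluated. The function names and the NeRF-style frequency schedule are assumptions for this example, not details taken from MetaVRain.

```python
import math


def positional_encoding(p, num_freqs):
    """Reference NeRF-style encoding: sin/cos of (2^k * pi * p) for each k."""
    out = []
    for k in range(num_freqs):
        angle = (2 ** k) * math.pi * p
        out.extend([math.sin(angle), math.cos(angle)])
    return out


def positional_encoding_modulo(p, num_freqs):
    """Same encoding, with the argument folded into [0, 2) first.

    Hypothetical illustration: reducing (2^k * p) modulo 2 keeps the input
    to sin/cos inside one period, so a sinusoid evaluator only has to cover
    the range [0, 2*pi).
    """
    out = []
    for k in range(num_freqs):
        t = ((2 ** k) * p) % 2.0  # Python's % returns a value in [0, 2)
        angle = math.pi * t
        out.extend([math.sin(angle), math.cos(angle)])
    return out
```

Both functions return the same values up to floating-point rounding; the second only ever evaluates the sinusoid on one period, which is the property a compact hardware unit could rely on.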
We present an eDRAM-based CIM processor called DynaPlasia with a novel triple-mode 3T2C cell and a dynamic reconfigurable core architecture that enables high system efficiency for ML workloads. For ResNet-18 (ImageNet dataset), DynaPlasia achieves system energy efficiency of 37.2 TOPS/W and compute density of 2.03 TOPS/mm² at 1.0 V and 250 MHz for INT4/INT5 activation/weight precision.
A low-latency and low-power dense RGB-D acquisition and 3D bounding-box extraction system-on-chip, DSPU, is proposed. The DSPU produces accurate dense RGB-D data through CNN-based monocular depth estimation and sensor fusion with a low-power ToF sensor. Furthermore, it performs a 3D point cloud-based neural network for 3D bounding-box extraction. The architecture of the DSPU accelerates the system by alleviating the data-in…
An effective and high-speed 3D point cloud-based neural network processing unit (PNNPU) is proposed using block-based point processing. It has three key features: 1) a page-based block memory management unit (PMMU) with a linked list-based page table (LLPT) for on-chip memory footprint reduction, 2) hierarchical block-wise farthest point sampling (HFPS) and block-skipping ball query (BSBQ) for fast and efficient point proc…
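For readers unfamiliar with farthest point sampling, the sketch below shows the standard, non-hierarchical algorithm that the HFPS named above builds on. It is not the block-wise variant used in the PNNPU, whose details are not given in this abstract; the function names and the `start_index` parameter are assumptions for this example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_sampling(points, num_samples, start_index=0):
    """Return indices of num_samples points chosen by plain FPS.

    Each step picks the point whose distance to the already selected set
    is largest, which spreads the samples evenly over the point cloud.
    """
    n = len(points)
    if not 0 < num_samples <= n:
        raise ValueError("num_samples must be between 1 and len(points)")

    selected = [start_index]
    # min_dist[i] is the squared distance from point i to its nearest selected point.
    min_dist = [squared_distance(p, points[start_index]) for p in points]

    while len(selected) < num_samples:
        next_index = max(range(n), key=min_dist.__getitem__)
        selected.append(next_index)
        for i, p in enumerate(points):
            d = squared_distance(p, points[next_index])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected
```

Every iteration scans all points, so the cost grows as the number of points times the number of samples. That full scan is the kind of work a block-wise scheme would aim to reduce.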
We present an energy-efficient deep reinforcement learning (DRL) processor, OmniDRL, for DRL training on edge devices. Recently, the need for DRL training has been growing because of DRL's distinct characteristics, which can be adapted to each user. However, a massive amount of external and internal memory access limits the implementation of DRL training on resource-constrained platforms. OmniDRL proposes 4 key features that can red…
The authors propose a heterogeneous floating-point (FP) computing architecture to maximize energy efficiency by separately optimizing exponent processing and mantissa processing. The proposed exponent-computing-in-memory (ECIM) architecture and mantissa-free-exponent-computing (MFEC) algorithm reduce the power consumption of both memory and FP MAC while resolving previous FP computing-in-memory processors' limitations. Also, …
This paper presents HNPU, an energy-efficient DNN training processor that adopts algorithm-hardware co-design. The HNPU supports a stochastic dynamic fixed-point representation and a layer-wise adaptive precision searching unit for low-bit-precision training. It additionally utilizes slice-level reconfigurability and sparsity to maximize its efficiency both in DNN inference and training. Adaptive-bandwidth reconfigurab…
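The abstract above does not specify the HNPU's number format. As a hedged illustration of the general idea behind stochastic fixed-point quantization, the sketch below rounds a real value onto a fixed-point grid, rounding up with probability equal to the fractional remainder so that the result is unbiased on average. The function name, the `frac_bits` and `total_bits` parameters, and the saturation behavior are assumptions for this example. The "dynamic" part (choosing the fraction length per layer or tensor) is not modeled here.

```python
import math
import random


def stochastic_round_fixed(x, frac_bits, total_bits=8, rng=random):
    """Quantize x to signed fixed-point with stochastic rounding.

    frac_bits  -- number of fractional bits (grid step is 2**-frac_bits)
    total_bits -- word length including sign; out-of-range values saturate
    rng        -- any object with a random() method returning [0, 1)
    """
    scale = 1 << frac_bits
    scaled = x * scale
    lower = math.floor(scaled)
    remainder = scaled - lower  # in [0, 1)

    # Round up with probability equal to the remainder, so E[q] == scaled.
    q = lower + (1 if rng.random() < remainder else 0)

    # Saturate to the signed integer range of the word.
    q_min = -(1 << (total_bits - 1))
    q_max = (1 << (total_bits - 1)) - 1
    q = max(q_min, min(q_max, q))
    return q / scale
```

With `frac_bits=4`, a value such as 0.3 lands on either 0.25 or 0.3125, and averaging many quantized copies approaches 0.3. This is why stochastic rounding is often preferred over round-to-nearest when small gradient updates must survive low-bit training.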
Generative adversarial networks (GAN) have a wide range of applications, from image style transfer to synthetic voice generation [1]. GAN applications on mobile devices, such as face-to-Emoji conversion and super-resolution imaging, enable more engaging user interaction. As shown in Fig. 7.4.1, a GAN consists of 2 competing deep neural networks (DNN): a generator and a discriminator. The discriminator is trained, while the …
Recently, deep neural network (DNN) hardware accelerators have been reported for energy-efficient deep learning (DL) acceleration [1-6]. Most prior DNN inference accelerators are trained in the cloud using public datasets; parameters are then downloaded to implement AI [1-5]. However, local DNN learning with domain-specific and private data is required to meet various user preferences on edge or mobile devices. Since edge and …
Recently, deep neural networks (DNNs) have been actively used not only for object recognition but also for action control, so that an autonomous system, such as a robot, can perform human-like behaviors and operations. Unlike recognition tasks, real-time operation is important in action control, and remote learning on a server reached over a network is too slow. New learning techniques, such as reinforcement learning…
Recently, 3D hand-gesture recognition (HGR) has become an important feature in smart mobile devices, such as head-mounted displays (HMDs) or smartphones for AR/VR applications. The 3D HGR system in Fig. 13.4.1 enables users to interact with virtual 3D objects using depth sensing and hand tracking. However, a previous 3D HGR system, such as Hololens [1], utilized a power-consuming time-of-flight (ToF) depth sensor (>2W) lim…
Deep neural network (DNN) accelerators [1-3] have been proposed to accelerate deep learning algorithms from face recognition to emotion recognition in mobile or embedded environments [3]. However, most works accelerate only the convolutional layers (CLs) or fully-connected layers (FCLs), and different DNNs, such as those containing recurrent layers (RLs) (useful for emotion recognition) have not been supported in hardware. …
Recently, deep learning with convolutional neural networks (CNNs) and recurrent neural networks (RNNs) has become universal across applications. CNNs are used to support vision recognition and processing, and RNNs are able to recognize time-varying entities and support generative models. Also, combining CNNs and RNNs makes it possible to recognize time-varying visual entities, such as actions and gestures, and to support image …
Recently, face recognition (FR) based on always-on CIS has been investigated for the next-generation UI/UX of wearable devices. An FR system, shown in Fig. 14.6.1, was developed as a life-cycle analyzer or a personal black box, constantly recording the people we meet, along with time and place information. In addition, FR with always-on capability can be used for user authentication for secure access to his or her smart phon…