# A Low Power Reconfigurable I/O DRAM Macro with Single Bit line Writing Scheme

Jeonghoon Kook and Hoi-Jun Yoo Dept. of EECS, Korea Advanced Institute of Science and Technology 373-1, Kusong-dong, Yusong-ku, Taejon, Korea email : jhkook@eeinfo.kaist.ac.kr

# Abstract

A novel bit line control scheme, the Single Bit line Writing Scheme, is proposed for low power DRAM. The scheme, which is applied to the folded bit line and shared sense amplifier structure, suppresses the voltage swing of the bit line that does not need to be driven. It reduces the power consumed during bit line sensing by 22% with negligible area penalty. To provide I/O reconfigurability, a flexible I/O scheme is also proposed. Starting from the widest I/O configuration, an I/O decoder and a 4-to-1 multiplexer reduce the I/O width to 1/4 of its current value. Low power DRAM Macros are designed for frame buffers in a 0.18µm embedded DRAM technology.

#### 1. Introduction

Recently, mobile applications such as cellular phones and portable PCs have been growing rapidly. In addition, multimedia processing has become an essential function in such applications.

Embedded DRAM technology is well known to have an advantage in multimedia processing like 3D CG and MPEG decoding, because of its high data bandwidth and relatively low power consumption [1, 2]. For achieving low power consumption, several techniques have been proposed such as sequential block activation [1], partial segment activation [3], low voltage swing in I/O [4] and so on.

An embedded DRAM Macro can satisfy the bandwidth requirements of high-end applications through its intrinsically wide I/O. The bandwidth required by the system, however, has to match the bandwidth supplied by the DRAM Macro. Therefore, a DRAM Macro with I/O reconfigurability is required in the ASIC environment.

In this paper, a new bit line control scheme called Single Bit line Writing (SBW) and a flexible I/O scheme (FIS) are proposed for low power and optimal bandwidth operation. With the SBW scheme, the power dissipation during bit line sensing is reduced by 22%.

SBW is also easy to implement, because it needs only a small modification to the conventional bit line structure and no additional transistors in the cell array area. Moreover, the amount of power saved grows as the number of activated columns increases. Because it is compatible with other low-power techniques, SBW is a very efficient low power scheme.

FIS is very useful for low-end applications that do not need wide I/O. The DRAM Macro cell core stays optimised while the I/O configuration is matched to the bandwidth that the system requires. Without changing the cell array size, FIS can be applied to a wide I/O DRAM Macro to narrow its I/O configuration for low-end applications, using an I/O decoder and a 4-to-1 multiplexer.

Using a 0.18µm embedded DRAM technology, a 6Mb DRAM Macro with 12 banks has been designed for a 3D rendering engine and a 2Mb DRAM Macro with 1 bank for an MPEG-4 decoder.

#### 2. Single Bit line Writing (SBW) Scheme

In conventional DRAM, the folded bit line architecture is prevalent because it cancels the noise coupled by word line activation. In this architecture, the bit line pair is first precharged to the half Vcc level, and the small voltage difference developed by the cell data is then amplified to the full Vcc level by differential sense amplifiers (SA). Because the read operation is destructive, the amplified data is written back to the memory cell after sensing has finished. Whenever bit line sensing occurs, a considerable amount of power is consumed in charging and discharging the bit lines, because the bit line pairs swing from half Vcc to full Vcc.

Figure 1 shows the folded bit line cell array structure. The cells are arranged in a regular pattern because vertically adjacent cells share a bit line contact to keep the cell area small. All cells activated by a given word line are connected to either BL or /BL. The bit line connected to the activated cells is defined as the *real bit line*, and the bit line not connected to them is defined as the *dummy bit line*. The sense amplifier does not need to amplify the voltage of the dummy bit line, because the dummy bit line only has to provide the reference for differential sensing.

In the conventional scheme, the SA develops both BL (real) and /BL (dummy) to the full Vcc level during every read or write operation. Because the bit line capacitance is relatively large, driving the dummy bit line consumes power unnecessarily.



Fig 1. Folded bit line cell array structure

In this paper, an intelligent bit line control scheme called Single Bit line Writing (SBW) is proposed. The scheme suppresses the transition of the dummy bit line during the writing period and uses the dummy bit line only as a reference for differential sensing. In the read operation, a word line is activated to transfer the cell data to the real bit line, and the SA is then isolated from the bit line pair. Next, the SA is enabled and differential sensing begins. Because the heavy load capacitance of the bit lines is isolated from the SA, the differential nodes of the SA develop to the full Vcc level very quickly. To rewrite the data, only the real bit line is reconnected to the SA. Figure 2 shows the SBW scheme in the read operation.

In the write operation, while the SA is isolated from the bit line pair, new data from the I/O lines are amplified at the SA nodes. The data are then written only through the real bit line.



Fig 2. Read operation with SBW (a) Charge sharing (b) Differential sensing (c) Rewriting

#### 3. Flexible I/O scheme (FIS)

A DRAM Macro requires reconfigurability not only for a fast time to market but also for bandwidth matching between the ASIC system requirements and the fill frequency of the DRAM Macro. Changing the I/O configuration, however, often requires changing the size of the cell array. Keeping the core operation optimised is then difficult, because the physical parameters change as the core size varies.

In this paper, a flexible I/O scheme (FIS) is proposed. It provides I/O reconfigurability not by changing the cell core but by narrowing the widest I/O configuration. Figure 3 shows the structure of the flexible I/O. The maximum number of I/Os is 1/4 of a full page. If a smaller I/O configuration is required, an I/O decoder and a 4-to-1 multiplexer can be cascaded in up to three stages under the I/O SA. Each added I/O reduction stage also adds a 2-bit address to the column address field. The I/O decoder not only selects one I/O SA but also suppresses the activation of the unselected I/O SAs through the IOSA enable signals, saving the power that would be dissipated in unused I/O lines, as shown in Figure 4.
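The arithmetic of the reduction stages can be sketched in a few lines of Python. This is an illustrative model only, not the authors' circuit: the function names `fis_config` and `iosa_enables` are hypothetical, and the one-hot enable pattern is an assumed reading of "suppresses the activation of unselected I/O SAs".

```python
def fis_config(widest_io: int, stages: int) -> tuple[int, int]:
    """Return (I/O width, extra column address bits) after `stages` FIS stages.

    Each stage places a 4-to-1 multiplexer under the I/O SAs, so the width is
    divided by 4 and the column address grows by 2 bits per stage.
    """
    if not 0 <= stages <= 3:
        raise ValueError("FIS cascades at most three reduction stages")
    return widest_io // (4 ** stages), 2 * stages


def iosa_enables(select: int) -> list[int]:
    """Assumed one-hot IOSA enable pattern for one 4-to-1 stage.

    `select` is the 2-bit address added by the stage; only the selected
    I/O SA is enabled, the other three stay inactive.
    """
    if not 0 <= select <= 3:
        raise ValueError("select must be a 2-bit value")
    return [1 if i == select else 0 for i in range(4)]


if __name__ == "__main__":
    # Widest I/O of a single macro module (256) through all reduction stages.
    for stages in range(4):
        width, extra_bits = fis_config(256, stages)
        print(f"stages={stages}: I/O width={width}, extra column bits={extra_bits}")
```

Starting from 256 I/Os, the model walks through the widths 256, 64, 16 and 4, which are the single macro module configurations listed in Table 1.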





Fig 3. 1Mb dual macro module with (widest, 1/16, 1/64) I/O configuration

reduction by 1/16

#### 4. Implementation of SBW

SBW can be easily implemented by adding a control signal to the conventional shared SA structure [8]. There are two bit line isolation gates to selectively connect SA to the upper or lower cell array. Those isolation gates can be used for the control of SBW as shown in Figure 5. Therefore, there is no area penalty in the cell core region.

For each row address, the real and dummy bit lines are predetermined in the design phase, since the cell array is regular (Figure 1).

The two least significant bits (LSBs) of the row address are sufficient to identify the real and dummy bit lines, because the regular pattern in the cell array is found every two word lines. Figure 6 shows the timing relationship of the bit line isolation signals.
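The row-address decode can be sketched as a small lookup on the two LSBs. The paper does not give the actual assignment of word lines to BL and /BL, so the table `ASSUMED_REAL_LINE` below is a hypothetical example in which the connected line alternates every two word lines; the real mapping depends on the cell array layout.

```python
# Hypothetical mapping from the two row-address LSBs to the bit line that
# carries the activated cell. The true assignment is fixed by the layout.
ASSUMED_REAL_LINE = {0b00: "BL", 0b01: "BL", 0b10: "/BL", 0b11: "/BL"}


def real_and_dummy(row_address: int, table=ASSUMED_REAL_LINE) -> tuple[str, str]:
    """Return (real bit line, dummy bit line) for a row address."""
    real = table[row_address & 0b11]  # only the two LSBs are inspected
    dummy = "/BL" if real == "BL" else "BL"
    return real, dummy


if __name__ == "__main__":
    for row in range(8):
        print(row, real_and_dummy(row))
```

Because only the two LSBs are used, rows 0 and 4 decode identically, so the decision logic does not grow with the number of word lines.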

The rising edge of the SA enable signal generates a short negative-going pulse on the real bit line control signal (BIS\_0) and turns off the dummy bit line control signal (BIS\_1). BIS\_1 is turned on again when the SA is disabled. The bit line pair is then precharged.
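The control sequence described above can be written down as a table of phases. This is a minimal sketch: the phase names follow Figure 2, a value of 1 is assumed to mean that the isolation gate conducts, and the helper names are hypothetical.

```python
def sbw_read_phases():
    """Return (phase, SA_EN, BIS_0, BIS_1) tuples for one SBW read cycle.

    Assumption: an isolation gate conducts when its control signal is 1.
    BIS_0 controls the real bit line, BIS_1 the dummy bit line.
    """
    return [
        ("charge sharing",       0, 1, 1),  # word line on, both lines see the SA
        ("differential sensing", 1, 0, 0),  # SA enabled, BIS_0 pulsed low, BIS_1 off
        ("rewriting",            1, 1, 0),  # BIS_0 back high, dummy stays isolated
        ("precharge",            0, 1, 1),  # SA disabled, BIS_1 on again
    ]


def lines_connected_to_sa(bis_0: int, bis_1: int) -> list[str]:
    """List which bit lines are connected to the SA for given control levels."""
    connected = []
    if bis_0:
        connected.append("real")
    if bis_1:
        connected.append("dummy")
    return connected


if __name__ == "__main__":
    for phase, sa_en, bis_0, bis_1 in sbw_read_phases():
        print(f"{phase:22s} SA_EN={sa_en} connected={lines_connected_to_sa(bis_0, bis_1)}")
```

In this model the dummy bit line is never connected to the SA while the SA is enabled, which is why it stays near the precharge level instead of swinging to a full level.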

In the read operation, the rewriting period of SBW is shorter than that of the conventional scheme (Figure 6). However, the differential nodes of the SA develop to the full Vcc level very quickly. As a result, the timing margin for rewriting the data to the cell is hardly affected, which is the same situation as in references [3, 4].



Fig 5. Bit line isolation gate (a) conventional (b) SBW



Fig 6. Timing relationship of SBW control signals

#### 5. Simulation Results

Figures 7 and 8 show simulated bit line waveforms of the conventional scheme and the SBW scheme, respectively. From the SPICE simulation results, the power consumption of SBW is observed to be reduced by 22% per column, as shown in Figure 9.

When the SBW scheme is used, the power dissipation of the periphery and the sub-array control increases slightly because of the additional SBW control. However, as the number of columns increases, this overhead is shared by more columns and the net reduction becomes larger.
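A first-order model illustrates why the net saving grows with the number of columns. Only the 22% per-column saving comes from the simulation results; the per-column power, periphery power and SBW control overhead below are arbitrary placeholder values in arbitrary units, chosen purely for illustration.

```python
def net_reduction(columns: int,
                  per_column: float = 1.0,     # placeholder, arbitrary units
                  periphery: float = 200.0,    # placeholder, arbitrary units
                  sbw_overhead: float = 20.0,  # placeholder extra control power
                  saving: float = 0.22) -> float:
    """Fractional power reduction of SBW relative to the conventional scheme."""
    conventional = columns * per_column + periphery
    sbw = columns * per_column * (1.0 - saving) + periphery + sbw_overhead
    return (conventional - sbw) / conventional


if __name__ == "__main__":
    for n in (256, 1024, 4096):
        print(f"{n:5d} columns: net reduction = {net_reduction(n):.1%}")
```

Under these assumptions the net reduction rises monotonically with the column count and approaches, but never reaches, the per-column figure, because the fixed periphery and control terms are spread over more columns.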

The simulation is performed on an 8kb segment (4 WLs and 4k BLs) at Vcc = 2.2V, Vpp = 3.8V and a temperature of 85°C.



Fig 7. Conventional bit line waveforms





Fig 9. Comparison of the power consumption

#### 6. DRAM Macro Description

Using a 0.18µm 6-metal embedded DRAM technology, the DRAM Macro has been designed. The memory capacity is customized by cascading 512Kb segments for single macro modules and 1Mb segments for dual macro modules. In a dual macro module, the control circuits in the center are shared by the macro modules located on each side of them. A 512Kb segment has 1024 bit line pairs, 512 word lines and 256 global I/O line pairs. Up to 64 segments can be cascaded, and they can be divided into up to 16 banks with additional bank control logic if needed. For a smaller I/O configuration, FIS is used. For low power consumption, the I/O decoders disable unnecessary I/O SAs, as mentioned before. A 1Mb dual module layout is shown in Figure 10.
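The segment and capacity figures can be checked with simple arithmetic. The sketch below assumes one cell per bit line pair per word line, as in a folded bit line array; the function names are hypothetical.

```python
KB = 1024          # bits per Kb
MB = 1024 * KB     # bits per Mb


def segment_bits(bit_line_pairs: int = 1024, word_lines: int = 512) -> int:
    """Capacity of one segment, assuming one cell per bit line pair per word line."""
    return bit_line_pairs * word_lines


def segments_needed(capacity_bits: int, seg_bits: int) -> int:
    """Number of cascaded segments for a capacity (rounded up)."""
    return -(-capacity_bits // seg_bits)


if __name__ == "__main__":
    single = segment_bits()   # single macro module segment
    dual = 2 * single         # dual macro module segment
    print("single segment (Kb):", single // KB)
    print("max single capacity (Mb):", 64 * single // MB)
    print("max dual capacity (Mb):", 64 * dual // MB)
    print("segments for 6Mb (single):", segments_needed(6 * MB, single))
    print("segments for 2Mb (dual):", segments_needed(2 * MB, dual))
```

The results agree with Table 1: 64 cascaded segments give the 32Mb and 64Mb maximum capacities for single and dual macro modules, respectively.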



Fig 10. CAD plot of 1Mb dual macro module

#### 7. Conclusions

A novel bit line control scheme (SBW) has been proposed for low power DRAM Macro. It reduces the power consumption of bit line sensing by 22% compared to the conventional scheme. SBW can be easily implemented by adding a control signal to the conventional bit line structure without area penalty.

A flexible I/O scheme (FIS) is also proposed. It gives the DRAM Macro the capability of matching the I/O bandwidth to specific applications when they do not need the full I/O bandwidth. An I/O decoder and a 4-to-1 multiplexer are used to reduce the I/O width to 1/4. They can be cascaded in up to 3 stages under the I/O SA.

With the above schemes, two low power DRAM Macros are designed in a 0.18µm embedded DRAM technology. One is a 6Mb, 256-I/O frame buffer for a 3D rendering engine and the other is a 2Mb, 128-I/O frame buffer for an MPEG-4 decoder.

**Table 1. DRAM Macro features** 

| organization               | Single Macro Module                                                                                   | Dual Macro Module         |
|----------------------------|-------------------------------------------------------------------------------------------------------|---------------------------|
| granularity                | 512kb<br>(512 WLs & 2k BLs)                                                                           | 1Mb<br>(512 WLs & 4k BLs) |
| number of I/O              | 4, 16, 64, 256                                                                                        | 8, 32, 128, 512           |
| maximum capacity           | 32Mb                                                                                                  | 64Mb                      |
| maximum number of segments | 64                                                                                                    |                           |
| maximum number of<br>banks | 16                                                                                                    |                           |
| read latency               | 1, 2, 3                                                                                               |                           |
| technology                 | 0.18 µm, embedded DRAM technology, 6-metal                                                            |                           |
| cell size                  | stack, 0.44 x 0.82 µm <sup>2</sup>                                                                    |                           |
| macro size                 | 4.21 x 1.52 mm <sup>2</sup> @ 6Mb 12 bank 256 I/O<br>0.73 x 2.70 mm <sup>2</sup> @ 2Mb 1 bank 128 I/O |                           |

#### 8. Acknowledgment

The authors express appreciation to W. Lee, S. J. Lee, C. W. Yoon and other members of the project who have provided valuable assistance.

#### 9. References

[1] Park, Y. H. et al., "A 7.1GB/s Low-Power 3D Rendering Engine in 2D Array-Embedded Memory Logic CMOS", *Dig. of Technical Paper, International Solid-State Circuits Conference*, pp.242-243, 2000.

[2] Nishikawa, T. et al., "A 60MHz 240mW MPEG-4 Video-Phone LSI with 16Mb Embedded DRAM", *Dig. of Technical Paper, International Solid-State Circuits Conference*, pp.230-231, 2000.

[3] Sugibayashi, T. et al., "A 30ns 256Mb DRAM with Multi-Divided Array Structure", *Dig. of Technical Paper, International Solid-State Circuits Conference*, pp.50-51, 1993.

[4] Aimoto, Y. et al., "Design of 1024-I/Os 3.84 GB/s High Bandwidth 600mW Low Power 16Mb DRAM Macros for Parallel Image Processing RAM", *IEICE Trans. Electron.*, pp. 759-767, May 1998.

[5] Kim, J. S. et al., "A Low Noise Folded Bit-line Sensing Architecture for Multi-Gb DRAM with Ultra High Density 6F<sup>2</sup> Cell", *Proceedings of the 23rd European Solid-State Circuits Conference*, pp. 192-195, 1997.

[6] Kim, J. S. et al., "A Low Noise Folded Bit-line Sensing Architecture for Multi-Gb DRAM with Ultra High Density 6F<sup>2</sup> Cell", *IEEE Journal of Solid State Circuits*, vol.33, no.7, pp.1096-1102, July 1998.

[7] Watanabe, T. et al., "A Modular Architecture for a 6.4GB/s, 8Mb DRAM-integrated Media Chip", *IEEE Journal of Solid State Circuits*, vol.32, no.5, pp.636-641, May 1997.

[8] Miyamoto, H. et al., "A Fast 256k x 4 CMOS DRAM with a Distributed Sense and Unique Restore Circuit", *IEEE Journal of Solid State Circuits*, vol.22, no.5, Oct 1987.