The rapid development of the Internet of Things and artificial intelligence has placed higher demands on the real-time data processing capability and energy efficiency of edge computing platforms. Non-volatile in-memory computing based on emerging memory devices stores and processes data in situ, minimizing the power and latency overhead of data movement and thereby greatly improving the data processing capability and energy efficiency of edge devices. However, its computing performance and energy efficiency are still limited by non-ideal characteristics of the basic cell, parasitic effects in the array, and the hardware overhead of analog-to-digital conversion circuits.
To address these issues, the team of Academician Liu Ming at the Institute of Microelectronics adopted a cross-level co-design approach and proposed a new RRAM in-memory computing structure with high parallelism and high energy efficiency.
At the device level, the research team proposed a compute-in-memory array structure based on a weighted two-transistor, one-memristor (WH-2T1R) cell. The WH-2T1R structure uses core transistors to form decoupled storage and computing data paths, reducing the impact of parasitic effects on the computing current, at an area overhead of only 30.3% over the 1T1R structure. The computing cell exploits the amplification of the second transistor in its sub-threshold region to raise the computing on/off ratio by 13.5 times while cutting the low-resistance-state computing current by 88%, which reduces the power consumption of multiply-accumulate operations by 63.4%. Thanks to the higher on/off ratio, the RRAM in-memory computing structure can support higher input parallelism and multi-bit multiply-accumulate operations.
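The link between on/off ratio and input parallelism can be illustrated with a rough back-of-the-envelope model. The sketch below is not taken from the team's work: it assumes a simple current-summing column in which the combined leakage of all activated high-resistance cells must stay below a chosen fraction of one low-resistance unit current. The function `max_parallel_rows`, the `margin` parameter, and the baseline ratio of 10 are all hypothetical; only the 13.5 times improvement comes from the text.

```python
import math


def max_parallel_rows(on_off_ratio: float, margin: float = 0.5) -> int:
    """Largest number of rows N that can be activated at once (toy model).

    Assumption: each high-resistance cell leaks I_lrs / on_off_ratio, and the
    summed leakage of N such cells must stay strictly below margin * I_lrs so
    that it cannot be mistaken for an extra low-resistance cell.
    """
    # N * (I_lrs / on_off_ratio) < margin * I_lrs  ->  N < margin * on_off_ratio
    return max(math.ceil(margin * on_off_ratio) - 1, 0)


BASELINE_RATIO = 10.0                   # hypothetical baseline on/off ratio
IMPROVED_RATIO = BASELINE_RATIO * 13.5  # 13.5x improvement reported in the text

print("rows at baseline ratio:", max_parallel_rows(BASELINE_RATIO))
print("rows at improved ratio:", max_parallel_rows(IMPROVED_RATIO))
```

Under this simplified criterion the number of rows that can be activated together scales roughly linearly with the on/off ratio, which is one way to read the claim that a higher ratio supports higher input parallelism.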
At the circuit level, the research team proposed a readout circuit built on a current-mode sense amplifier with reference current subtraction. The reference current subtraction branch first subtracts a reference from the input current according to the previous readout result, and the remaining current is then sent to the current mirror for readout. This halves the input current range of the current mirror and doubles the computing current range that the RRAM in-memory computing structure supports, enabling higher input parallelism and multi-bit multiply-accumulate operations while reducing readout circuit power consumption by 79.5%. By further optimizing the current subtraction configuration of the current-mode sense amplifier, the team improved the integral nonlinearity (INL) error by 5 times and the differential nonlinearity (DNL) error by 3.75 times.
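The idea of reference current subtraction can be sketched as a purely behavioral model. The code below is an illustration under stated assumptions, not the team's circuit: it assumes that the previous readout decides the most significant bit, that a half-scale reference current is subtracted when that bit is 1, and that the remaining bits are resolved on the residual. The function name `read_with_reference_subtraction` and the ideal, noise-free quantizer are hypothetical; the 4-bit output width matches the configuration mentioned in the text.

```python
def read_with_reference_subtraction(i_in: float, i_full: float, bits: int = 4):
    """Quantize i_in in [0, i_full) to a `bits`-wide code (behavioral sketch).

    Returns (code, residual), where residual is the current that the
    current mirror would actually see after reference subtraction.
    """
    if not 0.0 <= i_in < i_full:
        raise ValueError("i_in must lie in [0, i_full)")

    half = i_full / 2.0
    # Step 1: the earlier readout result decides the most significant bit.
    msb = 1 if i_in >= half else 0
    # Step 2: subtract the half-scale reference when the MSB is 1, so the
    # residual always lies in [0, half), i.e. half the original range.
    residual = i_in - msb * half
    # Step 3: resolve the remaining bits on the reduced-range residual.
    low_levels = 2 ** (bits - 1)
    lsb_step = half / low_levels
    low = min(int(residual // lsb_step), low_levels - 1)
    return (msb << (bits - 1)) | low, residual


code, residual = read_with_reference_subtraction(11.3, 16.0)
print("code:", code, "residual seen by the mirror:", residual)
```

In this model the stage after the subtraction never sees more than half of the full-scale current, which mirrors the statement that the current mirror's input range is halved while the supported computing current range doubles.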
At the algorithm mapping level, the research team proposed a high-order-bit data redundancy (MSB-RSM) mapping strategy. The RRAM in-memory computing structure contains multiple groups of arrays with different second-transistor multiplier parameters, plus one additional group of redundant arrays. The different second transistors map the different bits of a multi-bit weight. Because the impact of RRAM and transistor non-idealities on the computing current cannot be ignored, the redundant arrays are used to map additional copies of the weights and compensate for these non-idealities. Analysis of the compensation effect on different bits shows that MSB-RSM reduces the 1σ error by 40% when applied to the high-order weight bits. Thanks to the more stable computing current, accuracy on the CIFAR-10 and CIFAR-100 tasks with the ResNet-18 model improved by 0.96% and 2.83%, respectively.
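A small Monte Carlo sketch can show why redundancy on the most significant bit stabilizes the result. Everything in it is an assumption made for illustration: the binary-weighted multipliers (4, 2, 1) for a 3-bit weight, independent Gaussian variation on each cell, and a redundant array whose reading is simply averaged with the main MSB reading. The names `split_bits`, `mac_weight`, and `error_sigma` are hypothetical, and the toy model is not expected to reproduce the 40% figure reported above.

```python
import random
import statistics

MULTIPLIERS = (4, 2, 1)  # assumed transistor multipliers for bits 2, 1, 0


def split_bits(weight: int) -> list:
    """Split a 3-bit weight into its bits, most significant first."""
    return [(weight >> k) & 1 for k in (2, 1, 0)]


def noisy_read(bit: int, rng: random.Random, sigma: float) -> float:
    """Unit cell current with relative Gaussian variation (toy model)."""
    return bit * (1.0 + rng.gauss(0.0, sigma))


def mac_weight(weight: int, rng: random.Random, sigma: float,
               msb_redundant: bool) -> float:
    """Reconstruct the effective weight from noisy per-bit readings."""
    total = 0.0
    for idx, (bit, mult) in enumerate(zip(split_bits(weight), MULTIPLIERS)):
        value = noisy_read(bit, rng, sigma)
        if idx == 0 and msb_redundant:
            # Assumed compensation: average with a second, redundant MSB copy.
            value = 0.5 * (value + noisy_read(bit, rng, sigma))
        total += mult * value
    return total


def error_sigma(weight: int, msb_redundant: bool, trials: int = 20000,
                sigma: float = 0.05, seed: int = 0) -> float:
    """Standard deviation of the reconstruction error over many trials."""
    rng = random.Random(seed)
    errors = [mac_weight(weight, rng, sigma, msb_redundant) - weight
              for _ in range(trials)]
    return statistics.pstdev(errors)


print("1-sigma error without redundancy:", error_sigma(7, False))
print("1-sigma error with MSB redundancy:", error_sigma(7, True))
```

Because the most significant bit carries the largest multiplier, its variation dominates the total error, so spending the redundant array on that bit gives the largest reduction per extra cell in this model.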
The scheme has been verified on the 28 nm embedded process independently developed by the team. The new RRAM in-memory computing structure supports highly parallel analog-domain multiply-accumulate operations. On the ResNet-18 task with 1-bit input, 3-bit weight, and 4-bit output, the average energy efficiency reaches 30.34 TOPS/W, and it can be raised to 154.04 TOPS/W by further optimizing the readout timing. Through co-design across cells, circuits, and systems, this work offers a new approach to energy-efficient, high-precision analog in-memory computing.
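To put the two energy-efficiency figures in more tangible terms, the short calculation below converts TOPS/W into energy per operation and computes the gain from readout timing optimization. It uses only the two numbers reported in the text; the helper name `femtojoules_per_op` is hypothetical, and what counts as one operation follows whatever convention the reported figures use.

```python
def femtojoules_per_op(tops_per_watt: float) -> float:
    """Convert energy efficiency in TOPS/W to energy per operation in fJ.

    1 TOPS/W equals 1e12 operations per joule, so one operation costs
    1e-12 / tops_per_watt joules, which is 1000 / tops_per_watt femtojoules.
    """
    return 1000.0 / tops_per_watt


AVERAGE_TOPS_PER_W = 30.34     # reported average energy efficiency
OPTIMIZED_TOPS_PER_W = 154.04  # reported after readout timing optimization

print(f"average:   {femtojoules_per_op(AVERAGE_TOPS_PER_W):.2f} fJ per operation")
print(f"optimized: {femtojoules_per_op(OPTIMIZED_TOPS_PER_W):.2f} fJ per operation")
print(f"gain:      {OPTIMIZED_TOPS_PER_W / AVERAGE_TOPS_PER_W:.2f}x")
```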