To appear in GOMACTech Conference, March 12-15, 2018, Miami, Florida, USA

Systems on Nanoscale Information fabriCs Outcomes and Future Prospects

Naresh Shanbhag1, Yongjune Kim1, Andrew Singer1, Boris Murmann2, Naveen Verma3, Jan Rabaey4, and David Blaauw5

1University of Illinois at Urbana-Champaign, Urbana, IL, Email: {shanbhag, yongjune, acsinger}@illinois.edu 2Stanford University, Stanford, CA, Email: [email protected] 3Princeton University, Princeton, NJ, Email: [email protected]

4University of California at Berkeley, Berkeley, CA, Email: [email protected] 5University of Michigan, Ann Arbor, MI, Email: [email protected]

Abstract— This paper provides an overview of the SONIC Center’s research vision, outcomes, and future prospects. SONIC’s research focused on establishing a Shannon-inspired statistical foundation for computing platforms realized on nanoscale process technologies (both CMOS and beyond CMOS) to meet the needs of emerging data-centric, learning-based applications. SONIC’s outcomes include the architectural concepts of deep in-memory and deep in-sensor computing, the use of stochastic nanofunctions coupled with statistical models of computing to realize systems in beyond CMOS, and fundamental limits on the energy-latency-accuracy of statistical computing systems along with design principles to approach those limits. Numerous laboratory prototypes have validated these outcomes, thereby demonstrating the hallmark of SONIC’s research – the systems-to-devices journey. These outcomes also set the stage for developing design methodologies to realize complex autonomous cognitive agents that learn to observe, infer, reason, create, explain, and act by engaging with their immediate environment, while operating at the limits of the energy-latency-accuracy trade-off by leveraging statistical computing design principles.

Keywords—Shannon-inspired computing, nanoscale devices, in-memory computing, in-sensor computing, nanofunctions.

I. INTRODUCTION

SONIC was established in 2013 to take a fresh look at the design of computing platforms in deeply scaled CMOS and beyond-CMOS nanoscale fabrics in order to address the unique needs of emerging applications. These applications (Fig. 1) tend to be learning-based, requiring the sensing, storage, transportation, and processing of massive volumes of data. The value of today’s computing platforms is judged primarily in terms of their ability to extract information and actionable intelligence from data, rather than their raw processing capability as has traditionally been the case.

While today’s systems rely on the cloud in order to extract information from data captured by edge devices, SONIC chose to focus on embedding intelligence directly into the edge or the swarm. Edge devices that can process data locally avoid the high energy and latency costs involved in data transmission to and from the cloud, and alleviate concerns related to privacy and security.

Fig. 1. Emerging applications require information extraction from massive quantities of data.

Indeed, today’s machines have begun to approach and exceed human performance in many complex inference tasks, e.g., AlexNet [1] and ResNet [2], which outperformed humans in recognition tasks, and Google DeepMind’s AlphaGo [3], which beat human champion Lee Sedol in the ancient game of Go.

Despite these well-acclaimed successes, machines have a lot of catching up to do when energy costs are accounted for. It is estimated that the computing platforms employed to realize these wins (1202 CPUs and 176 GPUs for AlphaGo) consume roughly four orders of magnitude more energy than the human brain. Therefore, the fundamental question facing us today is: how can we design intelligent machines that proactively interpret and learn from data, solve unfamiliar problems using what they have learned, and operate with the energy efficiency of the human brain?
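
The four-orders-of-magnitude estimate can be reproduced with a back-of-the-envelope power comparison. The sketch below does so under assumed per-device power figures (the paper does not state them) and the commonly cited ~20 W for the human brain; it is an illustrative calculation, not data from the paper.

```python
# Rough power comparison behind the "four orders of magnitude" estimate.
# All per-device power numbers are assumptions for illustration only.
CPU_POWER_W = 100.0    # assumed average power per server CPU
GPU_POWER_W = 200.0    # assumed average power per GPU
BRAIN_POWER_W = 20.0   # commonly cited estimate for the human brain

alphago_power_w = 1202 * CPU_POWER_W + 176 * GPU_POWER_W   # ~155 kW
ratio = alphago_power_w / BRAIN_POWER_W                     # ~8e3, i.e., on the order of 10^4

print(f"AlphaGo platform: {alphago_power_w / 1e3:.0f} kW, "
      f"human brain: {BRAIN_POWER_W:.0f} W, ratio ~ {ratio:.0f}x")
```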

SONIC’s research was motivated by the realization that present-day computing platforms based on the von Neumann architecture will not be able to address the applications challenge shown in Fig. 1. The reasons are multifold: (a) the unique nature of modern-day cognitive applications/tasks, which emphasize exploration-, learning-, and association-based computations over statistical representations of massive volumes of data rather than conventional data processing; (b) the traditional von Neumann architecture [4] and its associated Turing model of computation [5], with their requirement of a deterministic, Boolean-switch-based semiconductor process technology that has become hard to realize because of (c) the realities of semiconductor feature-size scaling (Fig. 2(a)), e.g., diminishing energy-delay benefits of CMOS scaling and increased stochasticity (variations and failures) (Fig. 2(b)), which squarely oppose the deterministic requirements of today’s von Neumann-style computing. While many beyond-CMOS devices have been invented, none surpass the silicon MOSFET in terms of the metrics for a deterministic switch. Additionally, the data-centric nature of emerging applications places an undue burden on the memory-processor interface, aggravating the well-known ‘memory wall’ problem (Fig. 2(c)) in von Neumann architectures.

Fig. 2. Challenges facing present-day computing platforms: (a) diminishing energy-delay benefits from CMOS scaling [6], (b) stochastic nanoscale RRAM behavior [7], and (c) the aggravated ‘memory wall’ (data movement) problem [8] arising from the von Neumann architecture.

II. SONIC VISION AND APPROACH

Fig. 3. The SONIC vision.

SONIC’s vision (Fig. 3) was based on the understanding that information and nanoscale stochasticity (noise) need to be, and can be, addressed within a common statistical framework in order to design information processing systems in nanoscale technologies that approach fundamental limits on energy efficiency, latency, and accuracy. This is because both information and noise are statistical quantities and are therefore, in some sense, ‘compatible’, and because the feasibility and benefits of such a unified view have already been demonstrated in communications through the work of Claude Shannon [9] (Fig. 4). SONIC’s vision stands in contrast with traditional von Neumann computing, in which nanoscale noise and computational errors are viewed as problems to be heavily suppressed in order to present a deterministic fabric to computing platforms. This suppression is the root cause of the unfavorable energy-latency-accuracy trade-offs inherent in von Neumann computing.

Fig. 4. Shannon theory for communications.

Shannon’s 1948 paper established information as a statistical quantity and laid out a complete theory of communications (Fig. 4). A key contribution was to introduce a simple yet pragmatic abstraction of the problem of communicating information over a noisy channel. This abstraction enabled Shannon to define channel capacity as a function of the noise statistics and to obtain the surprising result that reliable communication, i.e., with probability of error approaching zero, can be achieved as long as the information transmission rate is less than the channel capacity.

Prior to Shannon’s work, it was commonly assumed that the only way to approach zero error probability was via a large (potentially infinite) number of repeated transmissions of the same information bit (repetition codes), which forces the information rate (the rate at which new bits are transmitted) to approach zero as well. Shannon also showed that error-control codes exist that approach channel capacity. Indeed, the discovery of turbo codes [10] and the re-discovery of low-density parity-check (LDPC) codes [11], [12] in the 1990s have all but eliminated the gap to Shannon capacity for point-to-point links. In this manner, Shannon theory completely revolutionized the area of communications.
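
As a concrete instance of these statements, consider a binary symmetric channel (BSC) with crossover probability p: its capacity is C = 1 - H2(p) bits per use, and an n-fold repetition code drives the error probability toward zero only as its rate 1/n itself goes to zero, whereas capacity-approaching codes operate at a fixed rate below C. The short numerical sketch below illustrates this point; it is not drawn from the paper.

```python
import math

def bsc_capacity(p):
    """Capacity of a binary symmetric channel: C = 1 - H2(p) bits per channel use."""
    if p in (0.0, 1.0):
        return 1.0
    h2 = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return 1.0 - h2

def repetition_error(p, n):
    """Error probability of an n-fold repetition code with majority decoding (n odd)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range((n + 1) // 2, n + 1))

p = 0.05  # illustrative crossover probability (assumed)
print(f"BSC(p={p}) capacity: {bsc_capacity(p):.3f} bits/use")
for n in (1, 3, 9, 27):
    print(f"repetition code n={n:2d}: rate = {1/n:.3f}, P_err = {repetition_error(p, n):.2e}")
```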

SONIC’s research agenda was designed to similarly revolutionize the area of computing, particularly for learning-based systems realized on nanoscale fabrics. The emergence of data-centric learning-based applications in conjunction with ‘noise’ in the nanoscale makes SONIC’s research highly relevant at the present time and for the foreseeable future.

Fig. 5. The SONIC approach is based on: (a) Shannon-inspired and brain-inspired principles, and is referred to as (b) statistical information processing.

SONIC’s research leveraged ideas from communications (Shannon-inspired) and the brain (neuro-inspired/principled) at the system level (Fig. 5(a)), translating the powerful insights offered by these domains into realizable architectures and circuits, followed by experimental demonstrations (design, fabrication, and test) of system prototypes in both scaled CMOS and beyond-CMOS fabrics. The principle underlying the SONIC approach was to employ statistical models of computation to compose reliable and energy-efficient systems out of stochastic components. This principle is shared by arguably the only two existing information processing systems that operate at the limits of energy, latency, and accuracy: a communication receiver operating over a stochastic channel, and the human brain, which is composed of stochastically behaving neurons and synapses. SONIC’s approach, referred to as statistical information processing or statistical computing, employed information-based metrics such as mutual information for system design and evaluation, and to fully explore the trade-offs between energy efficiency, latency, and robustness (Fig. 5(b)). An important goal of SONIC’s research was to develop principles of statistical information processing and to demonstrate their effectiveness in reducing energy, enhancing robustness, and increasing functional density via experimental realizations.

SONIC’s systems-driven, top-down approach enabled a principled embrace of component stochasticity. To develop a Shannon-inspired foundation for computing, SONIC brought together a diverse group of researchers from neuroscience, information and coding theory, machine learning, microarchitecture, integrated circuit design, and devices. SONIC’s vertically integrated research organization (spanning applications/systems through architectures and circuits down to nanoscale devices) was critical for developing a statistical foundation for computing in the nanoscale era, and has now become a de facto standard for all major semiconductor research programs, including the Joint University Microelectronics Program (JUMP), Nanoelectronic COmputing REsearch (nCORE), and DARPA’s Electronics Resurgence Initiative (ERI). This research organization is described next.

Fig. 6. The SONIC Center’s vertically integrated structure.

SONIC has three systems-driven research themes (Themes 1, 2, and 3) firmly anchored in a circuits/devices theme (Theme 4). Theme 1 focused on developing Shannon-inspired design principles and fundamental limits for realizing statistical information processing systems using stochastic components. Theme 1’s outcomes guided the practical realization of systems in the remaining SONIC themes. Theme 2 employed the design principles provided by Theme 1 to realize analog mixed-signal systems and circuits for inference applications. Theme 3 developed neuro-inspired approaches for the design of statistical information processing systems on nanoscale fabrics, encompassing computation, communication, and both discrete and mixed-signal representations. Thus, all three systems themes drove towards the development of statistical foundations for information processing. Theme 4 focused on experimental demonstrations of Shannon-inspired nanofunctions and systems in both CMOS and beyond-CMOS fabrics.

Theme 4 demonstrated that the Shannon- and neuro-inspired system design approaches from Themes 1, 2, and 3 enabled deeply scaled device/circuit fabrics to be deployed effectively in a manner not feasible today under the von Neumann model.

In the final year (YR5) of its tenure, SONIC’s Shannon- and brain-inspired approach to computing was consolidated into a set of four broad outcomes: deep in-memory computing, deep in-sensor computing, systems in beyond-CMOS, and fundamental limits/design principles.

III. SONIC OUTCOMES

A. Deep In-Memory Computing
SONIC’s deep in-memory architecture (DIMA) [13], [14] directly attacks the memory wall problem intrinsic to the von Neumann architecture. DIMA (Fig. 7) does so by reading functions of the data stored in multiple rows, and processing them via analog computations embedded in a pitch-matched manner in close proximity to the SRAM bit-cell array (BCA). DIMA has four sequential processing stages: 1) multi-row functional read (FR), which fetches a function of the bits stored in multiple rows (per read cycle) from each column; 2) bit-line processing (BLP), which performs word-level arithmetic operations in parallel via column pitch-matched analog processors; 3) cross bit-line processing (CBLP), which aggregates multiple BLP outputs via charge sharing to obtain a scalar output; and 4) ADC and residual digital processing, which generates the digital output (decision) from the preceding analog computations.

Fig. 7. The deep in-memory architecture (DIMA).
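
To make the four-stage pipeline concrete, the following behavioral sketch emulates the FR, BLP, CBLP, and ADC stages for a dot-product computation. It is a simplified numerical model, not SONIC’s circuit implementation; the bit-plane storage layout, Gaussian read-noise level, and full-scale normalization are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def dima_dot_product(weight_bits, x, read_noise=0.05, adc_bits=8):
    """Behavioral sketch of DIMA's four stages computing a dot product w.x.

    weight_bits: (B, N) binary array; column j holds the B bits of weight w_j,
                 stored across B rows of the bit-cell array (layout assumed).
    x:           (N,) input vector applied by the bit-line processors.
    """
    B, N = weight_bits.shape
    # 1) Multi-row functional read (FR): all B rows of a column are accessed in one
    #    read cycle, developing a binary-weighted analog sum of the stored bits.
    bit_weights = 2.0 ** np.arange(B - 1, -1, -1)
    w_analog = bit_weights @ weight_bits                      # ideal values in [0, 2^B - 1]
    w_analog = w_analog + rng.normal(0.0, read_noise * 2**B, size=N)  # assumed read noise
    # 2) Bit-line processing (BLP): per-column analog multiply with the input word.
    products = w_analog * x
    # 3) Cross bit-line processing (CBLP): charge-sharing aggregation into a scalar.
    aggregate = products.sum()
    # 4) ADC and residual digital: quantize the analog scalar into a digital decision.
    full_scale = N * (2**B - 1) * np.abs(x).max()
    return int(np.round(aggregate / full_scale * (2**(adc_bits - 1) - 1)))

# Example: a 16-dimensional, 4-bit weight vector and a random input.
W_bits = rng.integers(0, 2, size=(4, 16))
x = rng.uniform(-1.0, 1.0, size=16)
print("DIMA output code:", dima_dot_product(W_bits, x))
```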

By substantially eliminating the explicit processor-memory interface, DIMA simultaneously improves energy efficiency, latency, and throughput. In doing so, it trades off circuit signal-to-noise ratio (SNR), making it an ideal fabric for demonstrating Shannon-inspired concepts. Additionally, it turns out that DIMA is well matched to the data flow of machine learning and cognitive applications, which rely heavily on projection-based computations. However, DIMA poses multiple design challenges, primarily due to the need for analog computation under the severe area constraints imposed by the BCA pitch. The SONIC research team successfully overcame these design challenges and demonstrated several prototypes. SONIC’s DIMA integrated circuit (IC) prototypes [15]–[19] showed energy savings ranging from 10× to 113×, throughput gains of up to 6×, and energy-delay product (EDP) reductions ranging from 52× to 162× over custom von Neumann architectures. These gains were obtained prior to the application of design optimizations in DIMA, i.e., these designs still have room for improvement.

B. Deep In-Sensor Computing

Fig. 8. The deep in-sensor architecture (DISA).

Deep in-sensor computing aims to overcome the energy, scalability, and fidelity limitations that arise when the sensing functionality is separated from the feature-extraction and inference functions, because the resulting large-volume data transfer from sensor to processor dominates the energy and latency costs. Deep in-sensor computing extracts information by embedding computing functions into the sensory substrate, realizing them via devices native to that substrate, which tend to be slow and unreliable.

SONIC researchers developed a range of technologies for diverse, distributed, and form-fitting sensors. Examples include: system-level principles and demonstrations based on large-area electronics for image sensing using boosted embedded weak classifiers and embedded random projection-based compression, and for electroencephalogram (EEG) acquisition and processing via compressed-domain transformations [20]; demonstration of photoplethysmographic (PPG) acquisition via a sensing-and-processing epidermal-electronics patch [21]; design and test of a nano-power chip for potentiometric ion sensing that consumes only 5.5 nW for full functionality, by far the lowest power reported for any wireless ion sensing system [22]; and design and test of a 12.5 Gb/s (over 5 m) 130 GHz link achieving an energy efficiency of 7.8 pJ/bit and a link efficiency of 1.55 pJ/bit/m, a 40× improvement over the prior state of the art [23]. In addition, the team demonstrated an ultrasound front-end that compresses the image signals within the sensor interface, leading to a 30-50× data reduction [28].
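
The random projection-based compression used in several of these demonstrations can be illustrated numerically: a fixed random matrix compresses each raw sensor window, and classification is performed directly on the compressed features without reconstruction. The sketch below is a generic illustration; the toy signal model, dimensions, and nearest-centroid classifier are assumptions, not the published designs [20].

```python
import numpy as np

rng = np.random.default_rng(1)

N, M = 256, 32  # raw samples per window and compressed dimension (both assumed)
Phi = rng.choice([-1.0, 1.0], size=(M, N)) / np.sqrt(M)  # random +/-1 projection matrix

def make_window(cls):
    """Toy two-class 'sensor' signal: sinusoids at different frequencies plus noise."""
    t = np.arange(N)
    freq = 0.03 if cls == 0 else 0.07
    return np.sin(2 * np.pi * freq * t) + 0.5 * rng.standard_normal(N)

# Training: per-class centroids computed directly in the compressed domain.
centroids = {c: np.mean([Phi @ make_window(c) for _ in range(50)], axis=0) for c in (0, 1)}

# Inference: classify new compressed windows by nearest compressed-domain centroid.
correct = 0
for _ in range(200):
    c = int(rng.integers(0, 2))
    y = Phi @ make_window(c)  # only M values need to leave the sensor
    pred = min(centroids, key=lambda k: np.linalg.norm(y - centroids[k]))
    correct += int(pred == c)

print(f"compressed {N} -> {M} samples per window ({N // M}:1), accuracy = {correct / 200:.2f}")
```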

C. Systems in Beyond-CMOS
SONIC significantly accelerated the deployment of beyond-CMOS and CMOS nanoscale fabrics via the concept of nanofunctions. Nanofunctions are pervasively employed computational kernels in the inference application space that are sufficiently simple in structure (realizable with fewer than 10-20 devices) and yet functionally rich enough to enable useful systems research by supporting the composition of complex inference systems. Nanofunctions thus bridge the devices-to-systems gap in the SONIC way. SONIC researchers successfully developed the following set of nanofunctions: a graphene dot-product kernel [24], RRAM nano-oscillators for oscillatory neural networks (ONNs) [25], graphene mux-based logic [26], and magnetic tunnel junction (MTJ)-based random number generators [27], which form a key component in deep neural networks, hyper-dimensional (HD) computing, and spin-channel networks for inference applications.

SONIC researchers also created behavioral models of these nanofunctions and shared them with systems researchers, who developed methods for designing statistical information processing systems using nanofunctions.
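
Such behavioral models can be as simple as a switching-probability function of the programming conditions (cf. the stochastic RRAM SET behavior in Fig. 2(b)). The sketch below models a hypothetical stochastic switching nanofunction whose SET probability depends on pulse amplitude and width; the logistic/exponential form and all parameter values are illustrative assumptions, not SONIC-measured characteristics.

```python
import math
import random

def set_probability(v_pulse, t_pulse_ns, v0=0.9, t0=500.0, alpha=12.0, beta=1.5):
    """Illustrative SET-probability model for a stochastic switching device.

    Probability rises with pulse amplitude (logistic around v0) and saturates
    with pulse width (exponential in t_pulse_ns / t0). All parameters are
    assumed for illustration, not fitted to measured data.
    """
    p_v = 1.0 / (1.0 + math.exp(-alpha * (v_pulse - v0)))
    p_t = 1.0 - math.exp(-beta * t_pulse_ns / t0)
    return p_v * p_t

def program_cell(v_pulse, t_pulse_ns, rng=random):
    """Return 1 if the (stochastic) SET operation succeeds, else 0."""
    return 1 if rng.random() < set_probability(v_pulse, t_pulse_ns) else 0

# Example: sweep pulse amplitude at a fixed 1 us pulse width.
for v in (0.7, 0.8, 0.9, 1.0, 1.1):
    trials = [program_cell(v, 1000.0) for _ in range(10000)]
    print(f"V = {v:.1f} V: empirical SET rate = {sum(trials) / len(trials):.2f}")
```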

D. Fundamental Limits and Design Principles
SONIC explored the fundamental limits of, and design principles for, Shannon-inspired statistical information processing. SONIC researchers developed system models of stochastic nanofunctions in CMOS and beyond-CMOS, obtained bounds and limits on information processing capacity, and developed capacity-achieving design principles. SONIC researchers developed a number of principles for statistical information processing and demonstrated their broad applicability in guiding the design of systems. SONIC’s design principles took a system-level approach to statistical information processing, focusing on application-relevant metrics that expose statistical dependencies in sensory input data.

Specifically, SONIC researchers (1) leveraged machine learning (ML) algorithmic approaches that exploit these statistical relationships for efficiency and for robustness to nanoscale upsets and non-idealities; (2) developed architectural principles that provide robust building blocks for systematizing the design of reliable information kernels from unreliable nanofabrics; and (3) derived fundamental limits on performance, scaling laws, and performance predictors to provide insight into the “adjacent possible” reachable by leveraging today’s nanofunctional blocks.
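
One representative principle of this kind is algorithmic noise-tolerance (mentioned later in this paper as a key SONIC concept), in which an error-prone main computation is paired with a low-complexity, reliable estimator and the two outputs are fused statistically. The sketch below is a generic numerical illustration of that idea; the error model, reduced-precision estimator, and detection threshold are assumptions, not a SONIC design.

```python
import numpy as np

rng = np.random.default_rng(2)

def noisy_dot(w, x, p_err=0.05, err_mag=8.0):
    """Main computation on an unreliable fabric: rare large output errors (assumed model)."""
    y = float(w @ x)
    if rng.random() < p_err:
        y += float(rng.choice([-1.0, 1.0])) * err_mag
    return y

def estimator_dot(w, x, bits=4):
    """Low-complexity estimator: reliable but coarse (reduced-precision) replica."""
    scale = 2 ** (bits - 1)
    wq = np.round(w * scale) / scale
    xq = np.round(x * scale) / scale
    return float(wq @ xq)

def ant_output(w, x, threshold=2.0):
    """Statistical fusion: keep the main output unless it disagrees with the
    estimator by more than the threshold, in which case use the estimate."""
    y_main, y_est = noisy_dot(w, x), estimator_dot(w, x)
    return y_main if abs(y_main - y_est) < threshold else y_est

# Compare mean-squared error with and without statistical error compensation.
w = rng.standard_normal(64) / 8.0
mse_raw, mse_ant = [], []
for _ in range(5000):
    x = rng.standard_normal(64)
    y_true = float(w @ x)
    mse_raw.append((noisy_dot(w, x) - y_true) ** 2)
    mse_ant.append((ant_output(w, x) - y_true) ** 2)
print(f"MSE without compensation: {np.mean(mse_raw):.3f}, with: {np.mean(mse_ant):.3f}")
```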

IV. CONCLUSIONS & FUTURE PROSPECTS

SONIC’s Shannon-inspired statistical computing research is a complete and principled approach to non-von Neumann computing, encompassing principles of statistical information processing at the system level, mapped to architectures and circuits, and followed by experimental demonstrations of system prototypes in both scaled CMOS and beyond-CMOS fabrics. SONIC’s research has leveraged information-based metrics for design and evaluation, which enabled the exploration of trade-offs between energy efficiency, robustness, and application-level accuracy.

SONIC’s research outcomes have already had the following immediate impact:

• CMOS deep in-memory/in-sensor inference architectures are ready for deployment today, and are already called for in major semiconductor research programs.

• Statistical models of computation enabling systems in beyond CMOS have provided an alternative path for device scaling, one in which the device error rate is coupled with statistical approaches for error compensation.

• Fundamental limits, metrics, and design principles have provided a compelling framework for the principled design of AI systems in silicon and beyond.

SONIC’s research outcomes have set the stage for developing design methodologies to compose complex autonomous cognitive agents: sensor-rich autonomous entities, highly constrained in volume, energy, and computational resources, e.g., an unmanned aerial vehicle or untethered robot, that learn to observe, infer, reason, create, explain, and act by engaging with their immediate environment, including meaningful interactions with humans. In this way, SONIC research will contribute to a society that today stands at the cusp of a Machine Revolution, one expected to lead to a future in which cognitive machines, i.e., machines that think like humans, will collaborate with humans and augment their capabilities. Furthermore, SONIC’s multidisciplinary, systems-to-devices research agenda, in addition to establishing a statistical foundation for computing in the nanoscale era, has also set a framework for conducting advanced research in semiconductor-based computing systems for the foreseeable future.

SONIC’s impact also extends into the future of semiconductor research. Its vertically integrated research (systems research grounded in experimental prototypes) has become the standard for conducting advanced semiconductor research. JUMP sought to establish four (out of six) vertically integrated centers. Key SONIC concepts such as Shannon-inspired computing, algorithmic noise-tolerance, and in-memory architectures were called out as key areas for future research. DARPA’s ERI similarly has a vertically integrated structure geared to translate outcomes from STARnet and JUMP. Furthermore, nCORE also embraces the SONIC philosophy of providing systems pull on device research.

REFERENCES

[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Adv. Neural Inf. Process. Syst. (NIPS), Dec. 2012, pp. 1097–1105.

[2] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vision Pattern Recognition (CVPR), Jun. 2016, pp. 770–778.

[3] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, Jan. 2016.

[4] J. von Neumann, “First draft of a report on the EDVAC,” Tech. Rep., Jun. 1945.

[5] A. M. Turing, “On computable numbers, with an application to the Entscheidungsproblem,” Proc. London Math. Soc., vol. 2, no. 1, pp. 230–265, Nov. 1937.

[6] H.-S. P. Wong, C.-S. Lee, J. Luo, “CMOS Technology Scaling Trend,” https://nano.stanford.edu/cmos-technology-scaling-trend.

[7] H. Li, T. F. Wu, S. Mitra and H. S. P. Wong, "Resistive RAM-centric computing: Design and modeling methodology," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 64, no. 9, pp. 2263–2273, Sep. 2017.

[8] W. A. Wulf and S. A. McKee, “Hitting the memory wall: Implications of the obvious,” SIGARCH Comput. Archit. News, vol. 23, no. 1, pp. 20–24, Mar. 1995.

[9] C. Shannon, “A mathematical theory of communication,” Bell Syst. Tech. J., vol. 27, no. 3, pp. 379–423, Jul. 1948.

[10] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit error-correcting coding and decoding: Turbo-codes,” in Proc. IEEE Int. Conf. Commun. (ICC), May 1993, pp. 1064–1070.

[11] R. Gallager, “Low-density parity-check codes,” IRE Trans. Inf. Theory, vol. 8, no. 1, pp. 21–28, Jan. 1962.

[12] D. J. C. MacKay, “Good error-correcting codes based on very sparse matrices,” IEEE Trans. Inf. Theory, vol. 45, no. 2, pp. 399–431, Mar. 1999.

[13] M. Kang, M.-S. Keel, N. R. Shanbhag, S. Eilert, and K. Curewitz, “An energy-efficient VLSI architecture for pattern recognition via deep embedding of computation in SRAM,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), May 2014, pp. 8326–8330.

[14] N. Shanbhag, M. Kang, and M.-S. Keel, “Compute memory,” U.S. Patent 9 697 877, Feb. 5, 2015.

[15] M. Kang, S. K. Gonugondla, and N. R. Shanbhag, “A 19.4 nJ/decision 364K decisions/s in-memory random forest classifier in 6T SRAM array,” in Proc. European Solid-State Circuits Conf. (ESSCIRC), Sep. 2017, pp. 263–266.

[16] M. Kang, S. K. Gonugondla, A. D. Patil, and N. R. Shanbhag, “A multifunctional in-memory inference processor using a standard 6T SRAM array,” IEEE J. Solid-State Circuits, 2018, accepted.

[17] S. K. Gonugondla, M. Kang, and N. R. Shanbhag, “A 42pJ/decision 3.12 TOPS/W robust in-memory machine learning classifier with on-chip training,” in Proc. IEEE Int. Solid-State Circuits Conf. Dig. Tech. Pap. (ISSCC), 2018, accepted.

[18] J. Zhang, Z. Wang, and N. Verma, “In-memory computation of a machine-learning classifier in a standard 6T SRAM array,” IEEE J. Solid-State Circuits, vol. 52, no. 4, pp. 915–924, Apr. 2017.

[19] D. Bankman, L. Yang, B. Moons, M. Verhelst, and B. Murmann, “An always-on 3.8𝜇J/86% CIFAR-10 mixed-signal binary CNN processor with all memory on chip in 28nm CMOS,” in Proc. IEEE Int. Solid-State Circuits Conf. Dig. Tech. Pap. (ISSCC), 2018, accepted.

[20] M. Shoaib, K. H. Lee, N. K. Jha, and N. Verma, “A 0.6–107 𝜇W energy-scalable processor for directly analyzing compressively-sensed EEG,” IEEE Trans. Circuits Syst. I, vol. 61, no. 4, pp. 1105–1118, Apr. 2014.

[21] B. Xu, A. Akhtar, Y. Liu, H. Chen, W.-H. Yeo, S. I. Park, B. Boyce, H. Kim, J. Yu, H.-Y. Lai, S. Jung, Y. Zhou, J. Kim, S. Cho, Y. Huang, T. Bretl, and J. A. Rogers, “An epidermal stimulation and sensing platform for sensorimotor prosthetic control, management of lower back exertion, and electrical muscle activation,” Adv. Mater., vol. 28, no. 22, pp. 4462–4471, Jun. 2016.

[22] H. Wang, X. Wang, J. Park, A. Barfidokht, J. Wang, and P. P. Mercier, “A 5.5 nW battery-powered wireless ion sensing system,” in Proc. European Solid-State Circuits Conf. (ESSCIRC), Sep. 2017, pp. 364–367.

[23] N. Dolatsha, B. Grave, M. Sawaby, C. Chen, A. Babveyh, S. Kananian, A. Bisognin, C. Luxey, F. Gianesello, J. Costa, C. Fernandes, and A. Arbabian, “A compact 130GHz fully packaged point-to-point wireless system with 3D-printed 26dBi lens antenna achieving 12.5Gb/s at 1.55pJ/b/m,” in Proc. IEEE Int. Solid-State Circuits Conf. Dig. Tech. Pap. (ISSCC), Feb. 2017.

[24] N. C. Wang, S. K. Gonugondla, I. Nahlus, N. R. Shanbhag, and E. Pop, “GDOT: A graphene-based nanofunction for dot-product computation,” in Proc. IEEE Symp. on VLSI Tech. (VLSIT), Jun. 2016, pp. 1–2.

[25] T. C. Jackson, A. A. Sharma, J. A. Bain, J. A. Weldon, and L. Pileggi, “Oscillatory neural networks based on TMO nano-oscillators and multilevel RRAM cells,” IEEE Trans. Emerg. Sel. Topics Circuits Syst., vol. 5, no. 2, pp. 230–241, Jun. 2015.

[26] M. Darwish, V. Calayir, L. Pileggi, and J. A. Weldon, “Ultracompact graphene multigate variable resistor for neuromorphic computing,” IEEE Trans. Nanotechnol., vol. 15, no. 2, pp. 318–327, Mar. 2016.

[27] S. Bhuin, J. Sweeney, S. Pagliarini, A. K. Biswas, and L. Pileggi, “A self-calibrating sense amplifier for a true random number generator using hybrid FinFET-straintronic MTJ,” in Proc. IEEE/ACM Int. Symp. Nanoscale Archit. (NANOARCH), Jul. 2017, pp. 147–152.

[28] J. Spaulding, Y. C. Eldar, and B. Murmann, “A sub-Nyquist analog front-end with subarray beamforming for ultrasound imaging,” in Proc. IEEE Int. Ultrasonics Symp. (IUS), Oct. 2015, pp. 1–4.