Recent years have seen marked developments in deep neural networks (DNNs) stemming from advances in hardware and increasingly large datasets. DNNs are now routinely used in domains including computer vision and language processing. At their core, DNNs rely heavily on multiply-accumulate (MAC) operations making them well-suited for the highly parallel computational abilities of GPUs. GPUs, however, are von Neumann in architecture and physically separate memory blocks from computational blocks. This exacts an unavoidable time and energy cost associated with data transport known as the von-Neumann bottleneck. While incremental advances in digital hardware accelerators mitigating the von Neumann bottleneck will continue, we explore the potentially game-changing advantages of non-von Neumann architectures that perform MAC operations within the memory.

© 2019 IEEE

PDF Article


You do not have subscription access to this journal. Citation lists with outbound citation links are available to subscribers only. You may subscribe either as an OSA member, or as an authorized user of your institution.

Contact your librarian or system administrator
Login to access OSA Member Subscription