Recent years have seen marked developments in deep neural networks (DNNs) stemming from advances in hardware and increasingly large datasets. DNNs are now routinely used in domains including computer vision and language processing. At their core, DNNs rely heavily on multiply-accumulate (MAC) operations making them well-suited for the highly parallel computational abilities of GPUs. GPUs, however, are von Neumann in architecture and physically separate memory blocks from computational blocks. This exacts an unavoidable time and energy cost associated with data transport known as the von-Neumann bottleneck. While incremental advances in digital hardware accelerators mitigating the von Neumann bottleneck will continue, we explore the potentially game-changing advantages of non-von Neumann architectures that perform MAC operations within the memory.
© 2019 IEEEPDF Article