We propose a partial reconfiguration architecture for optically reconfigurable gate arrays and present an 11,424 gate dynamic optically reconfigurable gate array VLSI chip that was fabricated using a three-metal complementary metal oxide semiconductor process technology. The fabricated VLSI chip achieved partial reconfiguration.
© 2010 Optical Society of America
Currently, almost all computer systems use reduced instruction set computer (RISC) architectures [1, 2]. Such architectures offer higher clock frequency, smaller implementation area, and lower power consumption than conventional complex instruction set computer (CISC) architectures [3, 4]. Their success is based on a simple principle: the simplest circuit works at the highest clock frequency, in the smallest implementation area, and with the lowest power consumption, because it can be constructed with fewer selector passes, less load capacitance from fewer gates, and less capacitance from shorter metal wires. This principle is also applicable to programmable devices. The simplest possible processor type is a single instruction set computer. If clock-by-clock reconfiguration of a programmable device is possible, only the circuits necessary for a single clock cycle need to be implemented on the device. Such a processor can operate at the highest clock frequency, with the lowest power consumption, and in the smallest implementation area. Moreover, the smallest implementation area enables large parallel computation when using the same implementation area as that provided by conventional processors. Consequently, the overall performance can be increased dramatically. However, under such dynamic implementation, partial reconfigurations occur at every location of a gate array because the configuration times and lifetimes of the circuits on the gate array differ from one another. Therefore, high-speed dynamic partial reconfiguration capability is extremely important for next-generation programmable devices.
Field programmable gate arrays (FPGAs) are widely used for various applications [5, 6, 7]. Moreover, an FPGA with optical communication functions has been developed. However, since FPGA reconfiguration requires more than a few hundred milliseconds, FPGAs are unsuitable as dynamically reconfigurable devices.
Therefore, to realize such dynamic high-speed reconfiguration, optically reconfigurable gate arrays (ORGAs) have been developed [9, 10, 11, 12, 13, 14, 15, 16, 17, 18]. An ORGA consists of a holographic memory, a laser array, and an optically reconfigurable gate array VLSI. Circuit information, in the form of configuration contexts, can be stored on a holographic memory, from which it can be addressed using a laser array and then optically programmed onto an ORGA. The ORGA architecture realizes high-speed reconfiguration since the bandwidth of an optical bus between a holographic memory and a programmable gate array VLSI is extremely large. In addition, numerous reconfiguration contexts can be realized since the storage capacity of a three-dimensional holographic memory is larger than that of silicon memories.
The first proposed optical programmable gate array (OPGA) has demonstrated 50–100 reconfiguration contexts and reconfiguration time [9, 10, 11]. In another demonstration of an optically differential reconfigurable gate array (ODRGA), the reconfiguration time was improved to the maximum using a differential reconfiguration strategy. Furthermore, an ODRGA has achieved 100 reconfiguration contexts, just as OPGAs have. However, such demonstrations were executed using small-gate-count VLSI chips. For example, the first proposed OPGA used an 80 gate VLSI chip; the ODRGA used a 68 gate VLSI chip. A practical 51,272 gate-count ORGA-VLSI chip has also been reported. However, that paper shows only a layout of the 51,272 gate-count ORGA-VLSI chip using a standard complementary metal oxide semiconductor (CMOS) process technology. The demonstration described in that report was executed using a 68 gate-count ORGA-VLSI chip instead of the reported design. Therefore, to date, fabrication of such a high-gate-count ORGA-VLSI chip has never been reported.
Moreover, to adopt ORGAs for practical applications, in addition to realizing a high-gate-count VLSI, a partial reconfiguration capability must be implemented on the ORGA-VLSI. To date, ODRGAs with a bit-by-bit reconfigurable architecture have been proposed [12, 16]. However, that architecture prohibits realization of a high-gate-count VLSI since the configuration circuit requires a large implementation area. The largest designed ODRGA-VLSI chip has only 272 gates. Therefore, to realize a high-gate-count ORGA-VLSI, a dynamic optically reconfigurable gate array (DORGA) architecture was proposed [15, 18]. It completely removes the static configuration memory used to store a context and instead uses the photodiodes’ junction capacitance as dynamic configuration memory. The above 51,272 gate-count ORGA-VLSI chip also adopts this architecture. However, to the best of our knowledge, a DORGA-VLSI with a partial reconfiguration architecture has not been reported to date.
This paper, therefore, proposes a partial reconfiguration architecture for ORGAs and presents an 11,424 gate DORGA-VLSI chip fabricated using a three-metal CMOS process technology. The fabricated VLSI chip achieved partial reconfiguration.
2. DORGA Architecture with a Partial Reconfiguration Capability
In an FPGA, each programming point of a gate array is connected to a bit of a configuration static random access memory. In contrast, in an ORGA, each programming point is connected to a single-bit configuration circuit including a static configuration memory and a photodiode. Moreover, in the DORGA architecture, photodiodes are used not only for detecting optical configuration contexts, but also for maintaining the state of a programmable gate array. Therefore, in the DORGA architecture, the static configuration memory can be removed completely so that the architecture accommodates a high-gate-count VLSI. In addition, this paper proposes a partial block-by-block reconfiguration indication circuit, as shown in Fig. 1. One such circuit, including a block photodiode (BPD), is added to each logic block, each switching matrix, and each input/output (I/O) block. The timing diagram is presented in Fig. 2. First, the junction capacitance of the BPD is charged by activating the negative logic block refresh (nBREF) signal. If the BPD then detects illumination, a negative logic refresh (nREF) internal signal is activated, which charges the junction capacitances of all photodiodes inside the block including the BPD. Subsequently, the nBREF signal is reactivated to negate the nREF signal. Then a configuration context is applied from a holographic memory and stored on configuration photodiodes; it is then provided to programming points of a programmable gate array. When the BPD does not detect illumination, the previous state of the block is retained. Consequently, by adding the block reconfiguration indication circuit, each logic block, each switching matrix, and each I/O block can be reconfigured individually. To date, a 36 configuration context DORGA architecture has been demonstrated.
The number of configuration contexts on an ORGA depends only on the storage capacity of its holographic memory; it never depends on the architecture. Therefore, since a 100 reconfiguration context implementation has already been demonstrated and because no limitation exists for the increase, high-speed dynamic partial reconfiguration processes based on many configuration contexts can be executed on this architecture. Although current commercial FPGAs can also support partial reconfiguration, the reconfiguration time is limited to the millisecond order because of serial transfer [5, 6, 7]. Consequently, high-speed dynamic partial reconfiguration processes cannot be executed on an FPGA.
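The block-by-block behavior described in this section can be summarized with a small behavioral sketch. It is only an illustration under stated assumptions: the class `Block` and its method `configure` are hypothetical names, the model is purely logical (it ignores the analog charging of junction capacitances, signal polarity, and the nBREF/nREF timing of Fig. 2), and the block size of 48 configuration bits is chosen to match a switching matrix without its indication photodiode.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Block:
    """Logical model of one optically reconfigurable block (hypothetical)."""
    n_config_bits: int
    state: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.state:
            self.state = [0] * self.n_config_bits

    def configure(self, bpd_lit: bool, context: Optional[List[int]]) -> None:
        """Run one configuration cycle for this block.

        bpd_lit: True if the block photodiode (BPD) detects illumination.
        context: optical pattern aimed at the configuration photodiodes.
        """
        if not bpd_lit:
            # No indication: the block is not refreshed and keeps its state.
            return
        if context is None or len(context) != self.n_config_bits:
            raise ValueError("context must supply one bit per configuration photodiode")
        # Indication detected: the block is refreshed, then the new context is stored.
        self.state = list(context)


# Two blocks receive the same optical pattern, but only the first one
# has its BPD illuminated, so only the first one is reconfigured.
blocks = [Block(48), Block(48)]
new_context = [1] * 48
blocks[0].configure(bpd_lit=True, context=new_context)
blocks[1].configure(bpd_lit=False, context=new_context)
```

In this sketch the BPD acts as a per-block write enable, which is the property that lets each logic block, switching matrix, and I/O block be reconfigured individually.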
3. 11,424 Gate DORGA-VLSI
A new 11,424 gate-count DORGA-VLSI chip, shown in Fig. 3, was designed and fabricated using a standard CMOS process technology. Optically programmable selectors, an optically programmable lookup table (LUT), and an optically programmable transmission gate that includes a photodiode cell were designed as custom cells having the same height as a standard cell. The gate array design was synthesized by combining these custom cells with standard cells provided by Rohm Co. Ltd., using a logic synthesis tool (Design Compiler, Synopsys Inc.). Then, place and route for the synthesized gate array design was executed using Apollo (Synopsys Inc.). Finally, the DORGA-VLSI was fabricated at Rohm’s manufacturing facility. The specifications are presented in Table 1. Voltages of the core and I/O cells were designed identically using . Photodiodes were constructed between an diffusion layer and a P-substrate. The junction area of a photodiode was designed as . The photodiode cells are arranged at horizontal intervals and at vertical intervals. This design incorporates 37,856 photodiodes. The average aperture ratio of the overall VLSI is 4.24%. The top metal layer was used for guarding transistors from light irradiation; the other two layers were used for wiring.
The gate array of the DORGA-VLSI uses an island style. The basic functionality of the gate array is fundamentally identical to that of currently available FPGAs. In all, 336 optically reconfigurable logic blocks (ORLBs), 360 optically reconfigurable switching matrices (ORSMs), and eight optically reconfigurable I/O blocks (ORIOBs), which include four programmable I/O bits, were implemented in the gate array. The ORLBs, ORSMs, and ORIOBs are programmable block-by-block through 59, 49, and 49 optical connections, respectively. For each block, a partial block-by-block reconfiguration indication circuit was implemented, giving 704 such circuits in total. Because the additional partial reconfiguration circuit size for a block cell is estimated as , the percentage of the area of the partial reconfiguration architecture to the entire size of the DORGA-VLSI chip is estimated as less than 0.254%. Compared with an ORGA without the partial reconfiguration capability, the area increase caused by the partial reconfiguration architecture is very small.
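The block counts quoted above can be cross-checked with a short calculation. The sketch below assumes that each optical connection corresponds to exactly one photodiode; the dictionary name `BLOCKS` is only illustrative. Under that assumption the tally reproduces both the 704 indication circuits (one per block) and the 37,856 photodiodes stated for the design.

```python
# (number of blocks, optical connections per block), as given in the text
BLOCKS = {
    "ORLB": (336, 59),
    "ORSM": (360, 49),
    "ORIOB": (8, 49),
}

# One partial reconfiguration indication circuit per block.
total_blocks = sum(count for count, _ in BLOCKS.values())

# Assumption: one photodiode per optical connection.
total_photodiodes = sum(count * conns for count, conns in BLOCKS.values())

print(total_blocks)       # 704
print(total_photodiodes)  # 37856
```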
3A. Optically Reconfigurable Logic Block
The block diagram and CAD layout of an optically reconfigurable logic block are presented in Fig. 4. Each optically reconfigurable logic block consists of two four-input–one-output LUTs, ten multiplexers, eight tristate buffers, and two delay-type flip-flops with a reset function. The input signals from the wiring channel, which are applied through some switching matrices and wiring channels from ORIOBs, are transferred to LUTs through eight multiplexers. The LUTs are used for implementing Boolean functions. The outputs of an LUT and of a delay-type flip-flop connected to the LUT are connected to a multiplexer. A combinational circuit or a sequential circuit can be chosen by changing the multiplexer, as in FPGAs. Finally, outputs of the multiplexers are connected to the wiring channel again through eight tristate buffers. As a result, four-input–one-output LUTs, multiplexers, tristate buffers, and a partial block-by-block reconfiguration indication circuit, respectively, have 16 photodiodes, two photodiodes, one photodiode, and one photodiode. In all, 59 photodiodes are used for programming an ORLB. The ORLB can be reconfigured fully in parallel. The cell size is . Such an ORLB design is based on a standard cell design, except for the custom designs used for the transmission gate cells and photodiode cells.
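How 16 configuration bits define a four-input–one-output LUT, and how a multiplexer chooses between the combinational and registered outputs, can be illustrated with a minimal sketch. The function names (`lut4`, `truth_table`, `select_output`) and the bit ordering (input `a` as the least significant address bit) are assumptions made for illustration; the source does not specify how photodiodes map to truth-table entries.

```python
from typing import Callable, List, Sequence


def lut4(config_bits: Sequence[int], a: int, b: int, c: int, d: int) -> int:
    """Evaluate a 4-input LUT: the inputs form an address into 16 config bits."""
    if len(config_bits) != 16:
        raise ValueError("a four-input LUT needs exactly 16 configuration bits")
    index = a | (b << 1) | (c << 2) | (d << 3)  # assumed bit ordering
    return config_bits[index]


def truth_table(func: Callable[[int, int, int, int], int]) -> List[int]:
    """Build the 16 configuration bits that make the LUT compute func."""
    return [func(i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1) & 1
            for i in range(16)]


def select_output(lut_out: int, ff_out: int, use_flip_flop: bool) -> int:
    """Output multiplexer: registered (sequential) or direct (combinational)."""
    return ff_out if use_flip_flop else lut_out


# Example: a two-input NOR on inputs a and b; inputs c and d are ignored.
nor2 = truth_table(lambda a, b, c, d: 1 - (a | b))
print(lut4(nor2, 0, 0, 0, 0))  # 1
print(lut4(nor2, 1, 0, 0, 0))  # 0
```

Because every truth-table entry is an independent bit, any Boolean function of up to four inputs can be loaded in a single parallel optical write in this model.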
3B. Optically Reconfigurable Switching Matrix
The block diagram and CAD layout of the ORSM are shown in Fig. 5. Its basic construction is the same as that used by Xilinx Inc. Four-directional switching matrices with 48 transmission gates were implemented in the gate array. Each transmission gate can be considered as a bidirectional switch. A photodiode is connected to each transmission gate and controls whether the transmission gate is closed. Based on that capability, each four-direction switching matrix, including its partial block-by-block reconfiguration indication circuit, can be programmed through 49 optical connections. The cell size is . Such an ORSM was designed using custom cells of photodiode cells and transmission gate cells, except for some buffers.
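The idea that each photodiode-controlled transmission gate behaves as a bidirectional switch can be sketched as a connectivity computation. This is a generic model, not the actual ORSM topology: the function name `connected_nets`, the wire numbering, and the pairing of switch endpoints in the example are all hypothetical, and only a tiny four-wire crossing with six switches is shown rather than the full 48-gate matrix.

```python
from typing import Dict, List, Sequence, Tuple


def connected_nets(n_wires: int,
                   switches: Sequence[Tuple[int, int]],
                   closed: Sequence[int]) -> List[List[int]]:
    """Group wires into electrically connected nets.

    switches[i] names the two wires joined by transmission gate i;
    closed[i] is 1 if its photodiode state closes the gate, else 0.
    """
    if len(switches) != len(closed):
        raise ValueError("one configuration bit is required per transmission gate")
    parent = list(range(n_wires))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (u, v), bit in zip(switches, closed):
        if bit:
            parent[find(u)] = find(v)  # a closed gate merges two nets

    groups: Dict[int, List[int]] = {}
    for wire in range(n_wires):
        groups.setdefault(find(wire), []).append(wire)
    return sorted(groups.values())


# Hypothetical crossing point: wires 0..3 (north, east, south, west)
# with one switch per pair of directions.
pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
nets = connected_nets(4, pairs, closed=[1, 0, 0, 0, 0, 1])
print(nets)  # [[0, 1], [2, 3]]
```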
4. Holographic Memory Generation and Development Tool
4A. Calculation Method of a Holographic Memory
Here, a thin holographic medium is considered. The aperture plane of the target lasers, the holographic plane, and the DORGA-VLSI plane are arranged in parallel. The laser beam is assumed to be a collimated beam. The reference wave from the laser propagates into the holographic plane. The holographic medium comprises rectangular pixels of size on the holographic plane. The pixels are assumed to take analog values. On the other hand, the input object comprises rectangular pixels of size on the object plane. The pixels can be modulated to be either on or off. The intensity distribution of a holographic medium is calculable using the following equation:
4B. Tool Set and a Circuit Implementation Flow
Because the gate array of the DORGA-VLSI is fundamentally identical to that of currently available FPGAs, the same tool set as that for FPGAs is necessary for programming the DORGA-VLSI: a logic synthesis tool and a place and route tool. In addition, a holographic memory calculation tool and alignment monitoring software are necessary for a DORGA implementation. To date, a logic synthesis tool and a place and route tool for ORGAs have been developed based on Design Compiler and Apollo (Synopsys Inc.), respectively. However, those tools targeted a different, first-prototype ORGA, and adapting them to this chip is laborious. In this work, therefore, the logic design, the placement of logic blocks, and the routing through switching matrices were done manually. Such a manual design outputs a binary configuration context. The holographic memory calculation tool is presented in Fig. 6. The binary configuration context pattern is arranged into a two-dimensional configuration context pattern, as shown in Fig. 6a, inside the holographic memory converter tool. The holographic memory calculation tool then calculates the corresponding holographic memory pattern, as shown in Fig. 6b, by using Eqs. (1) and (2). Moreover, the holographic memory calculation tool can plot the simulation result of a configuration context pattern generated from the calculated holographic memory pattern, as shown in Fig. 6c. The holographic memory pattern is then programmed onto a holographic memory material. This is the circuit implementation flow. Moreover, a DORGA system package always requires high-accuracy alignment, so an alignment check must be executed after fabrication. Therefore, alignment monitoring software has been developed, as presented in Fig. 7. The DORGA-VLSI chip has nine adjustment-purpose photodiodes at each corner.
When the alignment confirmation process is executed, such adjustment-purpose photodiodes and/or photodiodes in I/O blocks are monitored using a test holographic memory pattern. The alignment error can be detected by sensing the light intensities received on the target photodiodes. In this experiment, the positions of components were adjusted manually using three adjustable stages while the alignment monitoring software was running.
5. Experimental Results and Discussion
5A. Retention Time Measurement of Photodiode Memory Architecture
The fabricated DORGA-VLSI adopts a photodiode memory architecture, which uses junction capacitances of photodiodes as dynamic configuration memory. Therefore, the state of a gate array cannot be maintained indefinitely. The demonstration presented in this section shows that the retention time of the fabricated DORGA-VLSI is sufficiently long to execute dynamic reconfiguration.
5A1. Optical System Setup
As depicted in Fig. 8, a holographic configuration system was constructed using a liquid crystal spatial light modulator (LC-SLM) as a holographic memory and a , He–Ne laser as a light source. The laser beam was collimated and was expanded to a diameter. The expanded beam is incident to the holographic memory on the LC-SLM. The laser power per unit area is . The LC-SLM used in this experiment is a projection television panel (L3P07X-31G0; Seiko Epson Corp.), which is a twisted nematic device with a thin film transistor. The panel consists of pixels, each having a size of . The LC-SLM is connected to an evaluation board (L3B07X-E10A; Seiko Epson Corp.); the video input of the board is connected to the external display terminal of a personal computer. Programming for the LC-SLM is executed by displaying a holographic memory pattern with 256 gradation levels on the personal computer display. The DORGA-VLSI was placed distant from the LC-SLM.
5A2. Experimental Results
First, the holographic configuration was confirmed using a holographic memory pattern of an AND circuit portrayed in Fig. 9a. The optical configuration context pattern at the DORGA-VLSI position, which was recorded using a CCD camera, is portrayed in Fig. 9b. The contrast ratio of light intensities of bright bits to dark area on the configuration context pattern was estimated as . Although background light reduces the retention time of the DORGA-VLSI photodiode memory architecture, holographic memory can generate a good contrast configuration context. Results show that the configuration procedure was successful. In addition, the retention time of the DORGA-VLSI was measured as when the light intensity of the background per unit area was . The retention time is much longer than that of current dynamic random access memories (DRAMs). The photodiode memory architecture is extremely useful as an ORGA architecture.
5B. High-Speed Reconfiguration
5B1. Optical System Setup
This optical system is the same as that of the previous system, except for the inclusion of a laser source (Fig. 10). This system used a laser (torus 532; Laser Quantum) as a light source. The laser power was about . The diameter beam from the laser source is expanded by 5 times to using two lenses with 50 and focal lengths.
5B2. Experimental Results
The holographic memory pattern of a NOR circuit was calculated using the method explained above. Figure 11a presents the holographic memory pattern of the NOR circuit. Figure 11b shows a CCD-captured context image reconstructed from the holographic memory of Fig. 11a displayed on the LC-SLM. The holographic memory generation result is used for later experiments. NOR circuit configuration experiments were then executed. The product of the photodiode response time and laser power for each photodiode was measured as . The reconfiguration period of the NOR circuit was measured as , meaning that the reconfiguration period of this DORGA architecture is as short as that of holographic configurations of conventional ORGAs. The partial reconfiguration indication overhead was measured as less than . Since the partial reconfiguration overhead is less than the reconfiguration period, the partial reconfiguration overhead is negligible if a previous configuration procedure generates the block indication. Even in the worst case, without a previous indication, partial reconfiguration is possible. Therefore, the world’s largest DORGA-VLSI was demonstrated as a useful device that presents no disadvantages compared with the static configuration technique of ORGAs.
5C. Implementation of a Ring Oscillator and a Full-Adder Circuit
To estimate the combinational circuit performance of a gate array on the fabricated DORGA-VLSI chip, a three-stage ring oscillator was programmed. The block diagram of the three-stage ring oscillator is shown in Fig. 12a. The three-stage ring oscillator is constructed using three inverters on LUTs of ORLBs and transmission gates on ORSMs. For the programming, a holographic memory pattern including a three-stage ring oscillator circuit was calculated, as shown in Fig. 12b. The holographic memory pattern was programmed on an LC-SLM. A CCD-captured configuration context pattern at the DORGA-VLSI position, as generated from the LC-SLM, is portrayed in Fig. 12c. This optical configuration context was programmed onto a DORGA-VLSI. The frequency of oscillation of the ring oscillator is . Results confirmed that an operation implemented on a single LUT can be executed at a clock frequency higher than .
Additionally, a single bit full-adder circuit was implemented onto two LUTs. The holographic memory pattern and CCD-captured configuration contexts are presented in Fig. 13. The performance was measured as . In comparison, for the commercially available XC4000XL-3 (Xilinx Inc.), which uses a similar CMOS process, the maximum delay of the implementation was presented as . Therefore, although the performance of commercial FPGAs is superior to that of the DORGA-VLSI, the difference is based only on design methods: commercial FPGAs are designed as full-custom VLSI, whereas the DORGA-VLSI’s logic block was designed using a standard cell-based design. If a DORGA-VLSI is designed as a full-custom VLSI with a logic block of the same structure, then the performance can be improved to the same level.
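A common way to relate a ring oscillator's frequency to per-stage delay is the textbook relation f = 1 / (2 N t_d) for N inverting stages with equal delay t_d. The sketch below applies that relation; it is an assumption that this relation is appropriate here, since the source does not state how its single-LUT clock bound was derived. The function names and the 10 MHz example frequency are hypothetical and are not measured values from this work.

```python
def stage_delay(ring_frequency_hz: float, n_stages: int = 3) -> float:
    """Per-stage delay t_d implied by f = 1 / (2 * N * t_d).

    Each stage here lumps together one LUT inverter and its routing.
    """
    if ring_frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    if n_stages < 1 or n_stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of inverting stages")
    return 1.0 / (2 * n_stages * ring_frequency_hz)


def single_stage_clock_estimate(ring_frequency_hz: float, n_stages: int = 3) -> float:
    """Clock frequency whose period equals one stage delay (rough estimate)."""
    return 1.0 / stage_delay(ring_frequency_hz, n_stages)


# Hypothetical example: a three-stage ring oscillating at 10 MHz.
example_delay = stage_delay(10e6, 3)
example_clock = single_stage_clock_estimate(10e6, 3)
```

Under this relation a three-stage ring implies a per-stage delay of one sixth of the oscillation period, so the estimated single-stage clock rate is six times the ring frequency.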
5D. Power Consumption Estimation
The power consumption of the DORGA architecture is divided mainly into the power consumption of reconfiguration and that of circuit execution on its gate array. Moreover, the power consumption of reconfiguration is divided into optical power consumption and VLSI power consumption. Therefore, the total power can be estimated using the following equation:
This paper has presented the world’s largest 11,424 gate-count DORGA-VLSI. The fabricated VLSI chip executed a partial reconfiguration. Furthermore, at that time, the retention time of the DORGA-VLSI was measured as . That retention time is much longer than that of current DRAMs. The DORGA-VLSI chip is the first practical VLSI chip to support dynamic reconfiguration.
This research is supported by the Ministry of Internal Affairs and Communications of Japan under the Strategic Information and Communications R & D Promotion Programme (SCOPE). This research was also supported by the Ministry of Education, Science, Sports and Culture under Grant-in-Aid for Scientific Research on Innovative Areas No. 20200027. The VLSI chip in this study was fabricated in the chip fabrication program of VLSI Design and Education Center (VDEC), the University of Tokyo, in collaboration with Rohm Co. Ltd. and Toppan Printing Co. Ltd.
1. Y. Pizhou and L. Chaodong, “A RISC CPU IP core,” in International Conference on Anti-Counterfeiting, Security and Identification (IEEE, 2008), pp. 356–359.
2. J. Goodacre and A. N. Sloss, “Parallelism and the ARM instruction set architecture,” Computer 38, 42–50 (2005).
3. T. Jamil, “RISC versus CISC,” IEEE Potentials 14, 13–16 (1995).
4. D. B. Tolley, “Analysis of CISC versus RISC microprocessors for FDDI network interfaces,” in Conference on Local Computer Networks (IEEE, 1991), pp. 485–493.
5. “Altera devices,” http://www.altera.com.
6. Xilinx Inc., “Xilinx product data sheets,” http://www.xilinx.com.
7. Lattice Semiconductor Corporation, “Lattice ECP and EC family data sheet” (2005), http://www.latticesemi.co.jp/products.
8. P. Mal, P. D. Patel, and F. R. Beyette, “Design and demonstration of a fully integrated multi-technology FPGA: a reconfigurable architecture for photonic and other multi-technology applications,” IEEE Trans. Circuits Syst. I 56, 1182–1191 (2009).
9. J. Mumbru, G. Panotopoulos, D. Psaltis, X. An, F. Mok, S. Ay, S. Barna, and E. Fossum, “Optically programmable gate array,” Proc. SPIE 4089, 763–771 (2000).
10. J. Mumbru, G. Zhou, X. An, W. Liu, G. Panotopoulos, F. Mok, and D. Psaltis, “Optical memory for computing and information processing,” Proc. SPIE 3804, 14–24 (1999).
11. J. Mumbru, G. Zhou, S. Ay, X. An, G. Panotopoulos, F. Mok, and D. Psaltis, “Optically reconfigurable processors,” in 1999 Euro-American Workshop on Optoelectronic Information Processing, Critical Review Vol. 74 (SPIE, 1999), pp. 265–288.
12. M. Nakajima and M. Watanabe, “A four-context optically differential reconfigurable gate array,” J. Lightwave Technol. 27, 4460–4470 (2009).
13. M. Nakajima and M. Watanabe, “A 100-context optically reconfigurable gate array,” in IEEE International Symposium on Circuits and Systems (IEEE, 2010), pp. 2884–2887.
14. M. Nakajima and M. Watanabe, “36-context dynamic optically reconfigurable gate array,” in IEEE International Symposium on System Integration (IEEE, 2009), pp. 19–23.
15. M. Watanabe and F. Kobayashi, “Dynamic optically reconfigurable gate array,” Jpn. J. Appl. Phys. 45, 3510–3515 (2006).
16. M. Miyano, M. Watanabe, and F. Kobayashi, “Optically differential reconfigurable gate array,” Electron. Comput. Jpn. Part II 90, 132–139 (2007).
17. M. Watanabe, T. Shiki, and F. Kobayashi, “Scaling prospect of optically differential reconfigurable gate array VLSIs,” Analog Integr. Circ. Sig. Process. 60, 137–143 (2009).
18. D. Seto and M. Watanabe, “A dynamic optically reconfigurable gate array—perfect emulation,” IEEE J. Quantum Electron. 44, 493–500 (2008).
19. M. Watanabe and F. Kobayashi, “A logic synthesis and place and route environment for ORGAs,” in International Conference on Engineering of Reconfigurable Systems and Algorithms (2006), pp. 237–238.