## Abstract

A heterodyne interferometer for highly sensitive vibration measurements in the range 100kHz − 1.3GHz is presented. The interferometer measures absolute amplitude and phase. The signal processing of the setup is analyzed and described in detail to optimize noise suppression. A noise floor of 7.1fm/Hz^{1/2} at 21MHz was achieved experimentally, where the bandwidth is the inverse of the total time needed for filter settling and signal sampling. To demonstrate the interferometer, measurements up to 220MHz were performed on arrays of capacitive micromachined ultrasonic transducers (CMUTs). The measurements provided detailed information about, e.g., the frequency response, vibration patterns and array uniformity. Such measurements are highly valuable in the design process of ultrasonic transducers.

© 2013 Optical Society of America

## 1. Introduction

Optical probing of high frequency surface vibrations has been an area of interest for several decades and is a very useful tool in the development of acoustic devices. The design of ultrasonic transducers for emission and reception of ultrasound in imaging can greatly benefit from such tools. Vibration measurements also provide a great advantage in the design of surface acoustic wave (SAW) and bulk acoustic wave (BAW) filters. Probing the actual acoustic field inside the components provides much better information about the devices compared to only measuring their electrical properties.

The discussion is here limited to interferometric setups, which can be highly sensitive. Knuuttila *et al.* [1] demonstrated measurements on a SAW filter using a homodyne interferometer with swept reference path length. Graebner *et al.* [2] presented a stabilized homodyne interferometer capable of measuring absolute vibration amplitude and phase, and showed measurements on both piezo and MEMS components. Fattinger and Tikka [3] demonstrated a very fast homodyne interferometer that is stabilized by both a piezo actuated mirror and an optical phase modulator. Monchalin [4] measured ultrasonic vibrations on a steel plate using a heterodyne interferometer. Martinussen *et al.* [5] and Kokkonen and Kaivola [6] reported heterodyne interferometers that additionally were sensitive to the vibration phase and had high lateral resolution. Polytec also manufactures the "UHF-120" vibrometer, which is a heterodyne interferometer where the demodulation is done digitally after sampling using a high frequency oscilloscope. Tachizaki *et al.* [7] demonstrated a Sagnac interferometer that uses a mode locked laser both for measurement and thermoelastic excitation. Fujikura *et al.* [8] also showed that it could be used with electrical excitation. Some of the best properties are the extremely high frequency capability of ∼ 1THz of [7], and the great sensitivity of 5.8fm/(Hz)^{1/2} of [8]. However, both of these are limited to exciting and measuring only at multiples of the pulsed laser repetition rate of 76MHz. This can be a significant drawback when characterizing high Q devices operating below a few gigahertz. The sensitivity reported in [1] of 11fm/(Hz)^{1/2} is also outstanding; however, the signal is not optimally utilized because it is measured only at points of optimum interferometer biasing while scanning the reference beam length. This causes a higher noise level than expected for the used measurement time.

Previously we have reported a heterodyne interferometer capable of measuring absolute vibration amplitude and phase up to the GHz-range [5]. Inherent resilience against frequency dependent detection, RF-cables and RF-electronics ensured absolute amplitude measurements without calibration. The reported noise floor was 6pm at 3.3Hz detection bandwidth, corresponding to ∼ 3.3pm/Hz^{1/2}. This paper presents modifications to the interferometer which improve the sensitivity to 7.1fm/Hz^{1/2} and make it less vulnerable to slow but large fluctuations in the optical path length. The signal processing is also explained in greater detail, with emphasis on optimal utilization of the measurement time. Improved sensitivity is always a desired property as it reveals more details about the device under test, and at least for ultrasonic transducers it reduces the necessary excitation voltages, effectively allowing less invasive measurements. Slow but large path length fluctuations might originate from building vibrations, air flows, laser cooling systems, and surface waves on the fluid when measuring immersed devices. Reducing the vulnerability to these effects can therefore allow measurements in harsher environments and with a wider range of equipment.

The setup is capable of reaching very low measurement bandwidths without transferring or processing excessive amounts of data. The noise floor is therefore demonstrated by showing measurements performed at 1Hz bandwidth. Systematic errors [9] can cause a limit for the minimum detectable vibration amplitude, which cannot be decreased by reducing the detection bandwidth. By demonstrating the sensitivity at the low bandwidth of 1Hz it is verified that such systematic errors do not inhibit the measurements of 7.1fm vibration amplitude. To the knowledge of the authors this is currently the best sensitivity demonstrated for interferometers where the measurement frequency can be chosen freely within the frequency range of the setup. The sensitivity measurement is performed on an array of capacitive micromachined ultrasonic transducers (CMUTs). It is also demonstrated that the interferometer can reveal adhesion problems in the wafer bonding process of CMUT fabrication.

Section 2 of this paper starts by describing the optical and electrical parts of the setup. The last part of the section presents a theoretical sensitivity limit and an analysis of the utilized signal processing. The latter is only briefly explained in Section 2, and is instead thoroughly described in Appendix A. The CMUT structure is briefly described in Section 3. The measurements are presented in Section 4, and their implications for the CMUT properties are also discussed there. Section 5 discusses the same results with regard to the properties and usability of the interferometer. Finally a conclusion is drawn in Section 6.

## 2. Theory and experimental setup

#### 2.1. Optical setup

The optical setup is illustrated in Fig. 1. Some of the details of the beam path are described in [5], and these will only be briefly presented here. The laser (Coherent Verdi V-2) emits vertically polarized light at 532nm. The beam first enters the half wave plate (*λ*/2) which is used to tune the power ratio between the reference and object beams that are separated in the subsequent polarizing beam splitter (PBS). The object beam is focused to a ∼1μm spot on the sample and reflected back to the PBS where most of the light is reflected towards the ordinary beam splitter (BS) due to the double pass through the quarter wave plate (*λ*/4). The reference beam is frequency shifted by the acousto-optic modulators (AOMs) with a total frequency shift of *F _{m}* before it is recombined with the object beam in the BS, whereupon the interference signal is detected by the photodiode (PD). The sample can be laterally scanned by an XY translation stage allowing measurements of vibration patterns. Two AOMs are used to cover a wide frequency range and to prevent frequency dependent beam deflection. This is discussed more in Section 2.2.

#### 2.2. AOM configuration

### 2.2.1. Frequency shift

The measurement setup relies on a variable optical frequency shift *F _{m}* that is approximately half of the measuring frequency *F _{a}*. A large frequency range is possible by using two AOMs as illustrated in Fig. 1. These can be configured to shift the frequency either in the same or in opposite directions. For measurements below 200MHz two Isomet 1250c with the range 200 ± 50MHz shift in opposite directions. These AOMs also cover the measurement frequencies 600 − 1000MHz when shifting in the same direction. The AOMs available to the setup are listed in Table 1. Using these, Table 2 shows different configurations that combined allow measurements in the complete range 0 − 1300MHz. The parameter *F _{0}* and the speed of sound *V _{a}* are explained in Section 2.2.3. The second configuration in Table 2 results in a combined negative frequency shift. This changes Eq. (4) such that the signal of interest becomes the amplitude of the third term instead of the second term, but equally precise vibration measurements are still achievable. Note that AOM 1 does not need to be swapped or rotated to produce a down shift, which considerably eases the process of changing between configurations.

### 2.2.2. Beam focusing

Previously [5, 10] the reference beam was directed straight into AOM 1. As indicated in Fig. 1, the laser beam is now focused at the center of each of the two AOMs. The lens L1 creates a focus inside AOM 1, L2 refocuses the beam into AOM 2, and L3 collimates the beam before the final beamsplitter. A narrow beam at the center of the AOMs leads to a significant increase in conversion efficiency, which has been measured to ∼ 70% for a single AOM at the center frequency. An increased efficiency leads to a stronger signal at the detector and, as indicated by Eq. (9), an improved sensitivity. This improvement contributed significantly to the enhanced sensitivity presented here compared to that reported in [5]. The focal length of L1 and L3 should be identical and short enough to cause a sufficiently narrow beam waist, but long enough such that the diffraction orders do not overlap due to large beam divergence angle. L2 should have a focal length identical to one fourth of the center-to-center distance between the AOMs, ensuring that the field in AOM 1 is imaged at the center of AOM 2. The chosen focal lengths are 250mm for L1 and L3, and 25mm for L2.
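The focal-length rule for L2 can be checked with the thin-lens equation. The sketch below assumes ideal thin lenses and that L2 sits midway between the AOMs; `image_distance` is a hypothetical helper, and the 100mm AOM spacing is inferred from the stated 25mm focal length rather than quoted from the setup.

```python
def image_distance(focal_length_mm, object_distance_mm):
    """Thin-lens equation 1/f = 1/s_o + 1/s_i, solved for the image distance s_i."""
    return 1.0 / (1.0 / focal_length_mm - 1.0 / object_distance_mm)


f_l2 = 25.0                    # focal length of L2 stated in the text (mm)
aom_spacing = 4.0 * f_l2       # center-to-center distance implied by f = d/4 (mm)
s_o = aom_spacing / 2.0        # assumption: L2 is placed midway between the AOMs
s_i = image_distance(f_l2, s_o)
magnification = -s_i / s_o     # -1 means an inverted image of equal size

print(f"AOM spacing {aom_spacing:.0f} mm, image distance {s_i:.1f} mm, "
      f"magnification {magnification:.2f}")
```

With the object at twice the focal length, the image also forms at twice the focal length with unit magnification, so the beam waist in AOM 1 is reproduced at the center of AOM 2.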

### 2.2.3. Deflection control

In addition to a frequency shift, AOMs cause a frequency dependent deflection [11] which can be a problem for automated interferometers that rely on a continuously variable shift. The problem is alleviated by the lenses as L2 images the field in AOM 1 to AOM 2 regardless of deflection, and since L3 collimates the light from AOM 2 to a direction that is independent of the output angle from AOM 2. However, a change in this output angle leads to a parallel shifted output from L3. To ensure proper alignment when changing the frequency shift it is therefore necessary to keep the output angle from AOM 2 constant. For the configuration specified in the first row of Table 2 the combined deflection angle of the AOMs is determined by the sum of their driving frequencies when they are oriented as indicated in Fig. 1. Since the combined frequency shift *F _{m}* is determined by the difference in their driving frequencies, it is possible to keep the sum constant and thereby maintain a constant deflection angle when changing *F _{m}*. If the speed of sound in the two AOMs is different, the relation between the driving frequency and the deflection angle is also different. Expressions for the driving frequencies of AOM 1 and AOM 2, *F _{1}* and *F _{2}*, respectively, that ensure constant combined deflection can be found from expressions for the deflection angle based on phase matching between the optical and the acoustic waves [11]. The driving frequencies found in this way are given by Eqs. (1) and (2), where *F _{0}* is a selectable frequency parameter and *V _{a1}* and *V _{a2}* are the respective speeds of sound in the acousto-optic interaction media of AOM 1 and AOM 2. The positive prefix in Eq. (2) is valid when the AOMs shift the optical frequency in the same direction, and the negative is valid when they shift it in opposite directions. Note that AOM 2 must be rotated ∼ 180° relative to the orientation in Fig. 1 when changing to a configuration where the AOMs shift the optical frequency in the same direction. The frequency ranges and the values for *F _{0}* in Table 2 all ensure that Eqs. (1) and (2) lead to values of *F _{1}* and *F _{2}* that are within the bandwidth of the respective AOMs. The combined effect of the lenses and of following Eqs. (1) and (2) thereby allows continuous frequency scans without disrupting the interferometer alignment.
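The constant-sum idea can be illustrated for the simplest case. The sketch below is not Eqs. (1) and (2); it assumes equal speeds of sound in both AOMs and opposite shift directions, so that the constraint reduces to *F _{1}* + *F _{2}* = 2*F _{0}* with *F _{1}* − *F _{2}* = *F _{m}*. The value *F _{0}* = 200MHz (the center frequency of the Isomet 1250c), the choice of which AOM gets the higher frequency, and the helper name `drive_frequencies` are all assumptions made for illustration.

```python
def drive_frequencies(f_m_hz, f_0_hz):
    """Return (f_1, f_2) for two AOMs shifting in opposite directions.

    Assumes equal speeds of sound in both AOMs, so that a constant combined
    deflection reduces to a constant sum f_1 + f_2 = 2 * f_0, while the net
    optical shift is the difference f_1 - f_2 = f_m.
    """
    return f_0_hz + f_m_hz / 2.0, f_0_hz - f_m_hz / 2.0


F_0 = 200e6              # assumed: center frequency of the Isomet 1250c (Hz)
BAND = (150e6, 250e6)    # 200 +/- 50 MHz range stated in the text (Hz)

results = []
for f_a in (10e6, 100e6, 200e6):
    f_m = f_a / 2.0      # the shift is roughly half the measurement frequency
    f_1, f_2 = drive_frequencies(f_m, F_0)
    in_band = BAND[0] <= f_2 <= f_1 <= BAND[1]
    results.append((f_m, f_1, f_2, in_band))
    print(f"F_a = {f_a / 1e6:.0f} MHz: F_1 = {f_1 / 1e6:.1f} MHz, "
          f"F_2 = {f_2 / 1e6:.1f} MHz, within AOM band: {in_band}")
```

Under these assumptions both driving frequencies stay inside the 200 ± 50MHz band for measurement frequencies up to 200MHz, which is consistent with the range given for the opposite-direction configuration.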

#### 2.3. Detector signal

It is assumed that the probed surface vibrates harmonically as described by Eq. (3), where *d*(*t*) is the instantaneous displacement, *a* is the amplitude, *F _{a}* is the frequency and *ϕ _{a}* is the phase. For vibration amplitudes *a* much smaller than the wavelength of the light *λ*, it can be found from [5] that the current in the photodiode *I _{D}*(*t*) to the first order is given by Eq. (4), where *α* is the photodiode responsivity, *P _{avg}* is the average optical power incident on the photodiode, *C* is the interference contrast, and *ϕ _{1}* and *ϕ _{2}* are the phases of the object and reference beam, respectively. The interference contrast holds information about the power ratios and the alignment between the two interfering beams. For optimal alignment with parallel phase fronts the interference contrast is equal to the fringe visibility [11]. If the complex amplitude of the first term in Eq. (4) with frequency *F _{m}* is denoted *R _{N}* and that of the second term with frequency *F _{a}* − *F _{m}* is denoted (*R _{I}*)^{*}, the complex vibration amplitude *a* exp(*iϕ _{a}*) can be found by Eq. (5), where * denotes complex conjugation.
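The relation between the two signal components and the vibration amplitude can be illustrated numerically. The sketch below does not reproduce Eq. (5); it assumes an ideal detector signal cos(2π*F _{m}t* + 4π*d*(*t*)/*λ*) and uses the standard small-amplitude phase-modulation result that the sideband-to-carrier magnitude ratio is approximately 2π*a*/*λ*. Only the magnitude is recovered, the frequencies are arbitrary integer cycle counts chosen for the simulation, and the function names are hypothetical.

```python
import cmath
import math

WAVELENGTH = 532e-9  # laser wavelength from the text (m)


def fourier_amplitude(samples, cycles):
    """Complex amplitude of the component with `cycles` periods in the record."""
    n = len(samples)
    acc = sum(s * cmath.exp(-2j * math.pi * cycles * k / n)
              for k, s in enumerate(samples))
    return 2.0 * acc / n


def simulate_detector(a, phi_a, n=512, m_cycles=10, a_cycles=25):
    """AC part of an idealized heterodyne signal with a vibrating object mirror."""
    beta = 4.0 * math.pi * a / WAVELENGTH  # round-trip phase modulation index
    out = []
    for k in range(n):
        t = k / n
        vib_phase = beta * math.sin(2.0 * math.pi * a_cycles * t + phi_a)
        out.append(math.cos(2.0 * math.pi * m_cycles * t + vib_phase))
    return out


a_true = 1e-12                                # 1 pm vibration amplitude
signal = simulate_detector(a_true, phi_a=0.3)
r_n = fourier_amplitude(signal, 10)           # carrier at F_m
r_i = fourier_amplitude(signal, 25 - 10)      # sideband at F_a - F_m
a_est = WAVELENGTH / (2.0 * math.pi) * abs(r_i) / abs(r_n)

print(f"true amplitude {a_true:.3e} m, recovered {a_est:.3e} m")
```

Because the estimate depends only on the ratio of the two components, any gain common to both cancels, which is the property that allows absolute amplitude measurements without calibration.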

#### 2.4. Signal extraction method

Figure 2 sketches the electronics used to extract *R _{I}* and *R _{N}*. The AOMs are both driven by amplified signal generators and the vibrating sample is excited by a generator without additional amplification. The photodiode labeled Si-PIN (Hamamatsu S5973-2) is connected directly to a 50Ω RF bias-tee (Mini-Circuits ZFBT-4R2GW+) for reverse biasing. The signal is amplified by an amplifier based on the broadband 50Ω GALI-51+ from Mini-Circuits. The detector signal is, after amplification, split in two by a 3dB splitter and each part is amplified and subsequently attenuated. The two parts are mixed with dedicated local oscillator signals with frequencies *F _{mix,I}* and *F _{mix,N}* such that they down convert the frequencies of the *R _{I}* and *R _{N}* signals to the same frequency *F _{LIA}*. As explained in [5], the initial amplifications and attenuations prevent local oscillator signals from leaking between the two signal paths. The mixer outputs are low pass filtered in passive fourth order Chebyshev filters to remove any remains of the local oscillator and the detector signal. These filters also remove the down mixed version of the *R _{N}*-signal in the *R _{I}*-path and vice versa. The filter outputs are led to two isolation transformers that eliminate ground currents, which otherwise could lead to additional noise. Finally the signals enter two Stanford Research Systems SR830 digital dual lock-in amplifiers (LIAs). The isolation transformers have a ratio that doubles the voltage, which reduces the impact of the noise generated by the LIAs. Both LIAs are locked to the reference frequency *F _{LIA}* generated by a common signal generator. The LIA outputs are first sampled and stored in the internal memory before they are transferred to the computer. Simultaneous acquisition of *R _{I}* and *R _{N}* is important to accurately calculate the vibration amplitude according to Eq. (5). To ensure this, the sampling process is initiated by a trigger pulse that is common to both LIAs. This pulse is generated by one of the auxiliary outputs of one of the LIAs. The down conversions are here performed in two steps by first using analog mixers, and secondly using digital dual LIAs. This allows the detector signal to have higher frequencies than the LIAs support, while keeping most of the ideal behavior of a digital dual LIA. The generators labeled *F _{1}*, *F _{2}*, and *F _{mix,I}* are Rohde & Schwarz SMB100A, the ones labeled *F _{a}* and *F _{mix,N}* are HP 8648D, and the one labeled *F _{LIA}* is Agilent 33250A. All these signal generators are synchronized with a common 10MHz reference frequency.

For a given vibration frequency *F _{a}*, the described system allows the two frequency parameters *F _{LIA}* and Δ*F* = *F _{a}* − 2*F _{m}* to be chosen. Δ*F* is the frequency difference between the *R _{I}* and *R _{N}* signals. The two local oscillator frequencies *F _{mix,I}* and *F _{mix,N}* become those given by Eqs. (6) and (7). Figure 3 shows the spectral positions of the *R _{I}* and *R _{N}* signals. The figure also shows the positions of the local oscillator signals *F _{mix,I}* and *F _{mix,N}*. Note that since the *R _{I}*-signal is down converted with a local oscillator signal of higher frequency the amplitude is complex conjugated, thus removing the initial * of the symbol. There are several considerations when choosing the frequency parameter Δ*F*. Since the vibration amplitude is determined by the ratio of the two frequency components separated by Δ*F* and since all electrical components have a frequency dependent efficiency, it is desirable to keep Δ*F* as small as possible. On the other hand, phase noise around the *R _{N}*-signal at *F _{m}* can limit the lowest detectable value of *R _{I}* if Δ*F* is too small. Experiments have indicated an improved sensitivity for increasing Δ*F* all the way up to approximately Δ*F* = 1.8MHz. This value is therefore currently used for vibration frequencies above 10MHz. Below this limit Δ*F* is decreased because 1.8MHz becomes significant compared to *F _{m}* and *F _{a}* − *F _{m}*. The frequency *F _{LIA}* is chosen to be 10kHz. This value is high enough to allow the use of short filter time constants in the LIA filters, and at the same time it does not lead to strict requirements for the isolation transformers in Fig. 2. The filter time constant for the LIAs is discussed more in Section 2.6.
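The frequency bookkeeping described here can be sketched as follows. The shift *F _{m}* follows directly from the definition Δ*F* = *F _{a}* − 2*F _{m}*. Placing *F _{mix,I}* above its signal follows the remark on complex conjugation, while placing *F _{mix,N}* below its signal is an assumption inferred from the fact that only the *R _{I}* amplitude is described as conjugated. The function and key names are hypothetical.

```python
def frequency_plan(f_a_hz, delta_f_hz=1.8e6, f_lia_hz=10e3):
    """Derive the AOM shift and local oscillator frequencies for one F_a."""
    f_m = (f_a_hz - delta_f_hz) / 2.0   # from delta_F = F_a - 2 * F_m
    f_sig_n = f_m                       # R_N component sits at F_m
    f_sig_i = f_a_hz - f_m              # R_I component sits at F_a - F_m
    return {
        "F_m": f_m,
        "F_sig_N": f_sig_n,
        "F_sig_I": f_sig_i,
        "F_mix_N": f_sig_n - f_lia_hz,  # assumed: local oscillator below the signal
        "F_mix_I": f_sig_i + f_lia_hz,  # local oscillator above the signal
    }


plan = frequency_plan(21e6)             # 21 MHz is the frequency used in Section 4.1
for name, value in plan.items():
    print(f"{name:8s} = {value / 1e6:.3f} MHz")
```

In this sketch both mixer outputs land at *F _{LIA}* = 10kHz, while the unwanted component in each path is offset by roughly Δ*F* and can be removed by the low pass filters.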

#### 2.5. Sensitivity

The minimum detectable vibration amplitude is limited by the noise in the system. The sensitivity is here defined as the vibration amplitude *a _{min}* where the average signal power of the (*F _{a}* − *F _{m}*)-frequency term in Eq. (4) equals the noise power. Such noise can in our system originate from laser noise, thermal noise, amplifier noise, noise in the AOM driver generators, noise in the local oscillators of the mixers in Fig. 2 and shot noise caused by random arrival time of photons. The effect of all these sources except the latter can theoretically be made insignificant by using high quality equipment or by sufficiently raising the signal level. In such a measuring system shot noise is the dominant noise source, with a root mean square (RMS) photodiode noise current *i _{s}* given by Eq. (8) [12], where *q* is the electron charge, *α* is the responsivity of the photodiode, and *ENBW* is the equivalent noise bandwidth of the system. As explained in [12], the resulting sensitivity is then given by Eq. (9). Note that [12] states the sensitivity as a function of total laser power, while it here is a function of the average power incident on the photodiode. A condition for shot noise limited operation is that the noise from thermal sources contributes less to the total noise than the shot noise. In the setup described here, thermal noise is introduced by the noise figure of the RF amplifier following the detector. The noise figure specified by the manufacturer is valid for a source impedance of 50Ω. The photodiode has a very high impedance which transforms to a complex frequency dependent impedance by the short transmission line to the amplifier input. The thermal noise level is therefore difficult to know a priori. Noise measurements have indicated that shot noise exceeds this thermal noise for *P _{avg}* > 1.3mW, which is approximately 70% higher than the specified noise figure of 3.7dB would suggest.

In this paper there will be a distinction between the *ENBW* of the system and the measurement bandwidth *MBW*. As implied by Eq. (8) the *ENBW* is the total effective bandwidth at which the system is sensitive to noise, and is a quantity that should be as low as possible to suppress noise. The disadvantage of a low *ENBW* is a long measuring time *T _{m}* which here is defined as all the time needed by the signal processing system to perform one measurement and includes filter settling time. The *MBW* is the reciprocal of *T _{m}* and it is the square root of this quantity that is used as normalization when specifying the sensitivity of the setup. Normalizing with the *MBW* leads to a more useful benchmark for the entire system as the *ENBW* can be larger than the *MBW* due to non-optimal signal processing or non-optimal utilization of the signal. Such non-ideal properties would not be reflected in the sensitivity specification if the *ENBW* was used for normalization.
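The normalization described here amounts to dividing the RMS noise amplitude by the square root of the bandwidth. The short sketch below applies it to the two figures quoted in this paper. It treats the 3.3Hz detection bandwidth reported for [5] as the normalizing bandwidth, which is an assumption, and `normalized_noise_floor` is a hypothetical helper.

```python
import math


def normalized_noise_floor(rms_amplitude_m, bandwidth_hz):
    """Noise floor per square root of bandwidth, in m/Hz^(1/2)."""
    return rms_amplitude_m / math.sqrt(bandwidth_hz)


earlier = normalized_noise_floor(6e-12, 3.3)    # figures quoted for [5]
current = normalized_noise_floor(7.1e-15, 1.0)  # figures quoted for this work
improvement = earlier / current

print(f"earlier setup: {earlier * 1e12:.1f} pm/Hz^(1/2)")
print(f"this setup:    {current * 1e15:.1f} fm/Hz^(1/2)")
print(f"improvement:   roughly a factor of {improvement:.0f}")
```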

#### 2.6. Signal processing

This section briefly describes the signal processing in the LIAs and the computer. A more detailed derivation of its manner of operation and properties is presented in Appendix A. LIAs operate by mixing the input and the reference signals and then low pass filtering the result. Figure 4 illustrates the signal processing following the mixer step. Note that all signal paths in Fig. 4 illustrate complex signals. The *R _{N}* and *R _{I}* signals are first individually low pass filtered inside the LIAs using the *H _{LIA}*(*F*)-filter, which is defined by a time constant *τ* and a filter order *p*. After filtering the signals are sampled and transferred to the computer that calculates the complex ratio between them. Subsequent filtering by simple averaging is used to limit the *ENBW* before the result is found by scaling and conjugating as stated in Eq. (5). According to Section A.5 the averaging is more efficient than the *H _{LIA}*(*F*)-filter at reducing *ENBW* for a given measuring time, yielding a better *MBW* to *ENBW* ratio. It is therefore beneficial to limit *ENBW* by averaging and using a short filter time constant *τ*. The *H _{LIA}*(*F*)-filter is only needed to prevent aliasing in the sampling process from increasing the noise. As stated in Section A.4, this is accomplished if *τ* and *p* are chosen such that Eq. (10) is fulfilled, where *F _{s}* is the sampling frequency. Note that this is less restrictive than the ordinary sampling theorem would suggest. If Eq. (10) is fulfilled and if the averaging time *T _{a}* is much longer than ∼ 10 × *τ*, then Eq. (26) can be approximated to yield simple expressions for the *ENBW* and *MBW*. The *ENBW* is twice as large as the *MBW* because the mixers in Fig. 2 down convert to *F _{LIA}* the power around the two frequencies *F _{mix,I}* ± *F _{LIA}* while there is signal only at *F _{mix,I}* − *F _{LIA}*.

The sampling frequency for the measurements in this paper was chosen to be *F _{s}* = 128Hz because a higher sampling frequency leads to a notable data transfer time due to our rather slow GPIB communication link. Using Eq. (10) we have found that *τ* = 3ms and *p* = 3 is suitable as it results in 2|*H _{LIA}*(*F _{s}*)|^{2} ≈ 0.0063. Inserting these values for *τ* and *p* and an averaging time of *T _{a}* = 1s in the exact Eq. (20) yields an *ENBW* of 2.003Hz. A lower value of *τ* could violate Eq. (10) and lead to a higher *ENBW*.
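The quoted aliasing term can be reproduced with a simple filter model. The sketch below assumes that *H _{LIA}*(*F*) behaves as *p* cascaded first-order low pass stages with time constant *τ*, so that |*H _{LIA}*(*F*)|^{2} = (1 + (2π*Fτ*)^{2})^{−*p*}. This model is an assumption and not taken from Appendix A, although it does reproduce the value of 0.0063 stated above. The function name is hypothetical.

```python
import math


def lia_power_response(freq_hz, tau_s, order):
    """|H(F)|^2 for `order` cascaded first-order low pass stages (assumed model)."""
    return (1.0 + (2.0 * math.pi * freq_hz * tau_s) ** 2) ** (-order)


F_S = 128.0   # sampling frequency from the text (Hz)
ORDER = 3     # filter order p from the text

alias_term = 2.0 * lia_power_response(F_S, 3e-3, ORDER)   # tau = 3 ms as chosen
alias_short = 2.0 * lia_power_response(F_S, 1e-3, ORDER)  # a shorter tau, for comparison

print(f"tau = 3 ms: 2|H(F_s)|^2 = {alias_term:.4f}")
print(f"tau = 1 ms: 2|H(F_s)|^2 = {alias_short:.4f}")
```

In this model a shorter time constant lets far more power through at the sampling frequency, which illustrates why reducing *τ* further could increase the *ENBW*.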

The ratio between the *R _{I}* and *R _{N}* signals is calculated before the averaging step. As explained in Section A.2, this has little impact on the sensitivity to weak noise, but it improves the resilience against low frequency phase noise in the interferometer. Such noise might originate from e.g. atmospheric fluctuations, building vibrations, and laser cooling systems. The measurements are insensitive to the optical path difference because it equally affects the phase of both the complex amplitudes *R _{I}* and *R _{N}* [5]. However, large fluctuations in the optical path difference while a measurement is ongoing could reduce the signal to noise ratio. When averaged or low pass filtered the signals could get heavily attenuated because of the random nature of their phase fluctuations. It is therefore beneficial that the time constant of *H _{LIA}*(*F*) is as short as possible such that the path difference is unable to change significantly within that period. The *R _{I}* to *R _{N}* complex ratio calculation suppresses these common phase variations, and additional filtering or averaging can subsequently be performed. This optimization was necessary in order to perform the measurements on fluid submerged CMUTs in [10].
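The benefit of forming the ratio before averaging can be illustrated with a small simulation. The sketch below is a deliberately simplified worst case and not the analysis of Section A.2: the common phase is drawn uniformly at random for every sample instead of drifting slowly, and the additive noise level, the complex ratio, and all names are arbitrary illustrative assumptions.

```python
import cmath
import math
import random


def simulate(ratio, n_samples=2000, noise_sigma=1e-3, seed=1):
    """Generate R_N and R_I samples sharing a random common phase plus noise."""
    rng = random.Random(seed)
    r_n, r_i = [], []
    for _ in range(n_samples):
        common = cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
        noise_n = complex(rng.gauss(0.0, noise_sigma), rng.gauss(0.0, noise_sigma))
        noise_i = complex(rng.gauss(0.0, noise_sigma), rng.gauss(0.0, noise_sigma))
        r_n.append(common + noise_n)           # carrier with unit magnitude
        r_i.append(ratio * common + noise_i)   # sideband scaled by the true ratio
    return r_n, r_i


def mean(values):
    return sum(values) / len(values)


true_ratio = 0.01 + 0.02j
r_n, r_i = simulate(true_ratio)

carrier_after_averaging = abs(mean(r_n))                       # averaged first
ratio_then_average = mean([i / n for i, n in zip(r_i, r_n)])   # ratio first

print(f"|mean(R_N)| = {carrier_after_averaging:.3f} (unit magnitude before averaging)")
print(f"mean(R_I / R_N) = {ratio_then_average:.4f}, true ratio = {true_ratio}")
```

Averaging each signal separately collapses the carrier toward zero, whereas the per-sample ratio cancels the common phase and can then be averaged without losing the signal.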

## 3. Measured sample

The interferometer is demonstrated by using it to characterize a 2D array of capacitive micro-machined ultrasonic transducers (CMUTs). CMUTs are MEMS ultrasound transducers that use electrostatic forces to excite ultrasonic waves. A benefit of CMUTs compared to piezo-electric transducers is inherently wider bandwidth. The CMUT array presented here is developed at NTNU, Department of Electronics and Telecommunications and Sintef MINALAB, and is fabricated at the latter location. It is intended for high resolution intravascular imaging of plaque in the coronary arteries. The array is made by wafer bonding a thin film to a wafer with etched holes. The bottoms of these holes are covered by doped poly-silicon which forms the bottom electrodes of the CMUTs. The poly-silicon is insulated from the substrate by a layer of silicon oxide and electrically connected to the back side of the wafer through vias. The wafer bonded film consists of a layer of silicon nitride and silicon oxide, and layers of titanium and another layer of silicon nitride are added after the wafer bonding process. This film stack is called the top plate and is illustrated together with the bottom wafer in Fig. 5. The gap between the bottom electrode and the top plate is approximately 60nm [13]. The top plate is unpatterned and the titanium layer in it is electrically grounded and serves as the top electrode for all the CMUTs in the array. Figure 6 illustrates the form of the etched holes. Four circular CMUTs together with connective ditches are etched out. The four CMUTs are denoted a cell, and they share a common via to the bottom side which is placed under one of the CMUTs. Aluminum traces at the back side of the wafer connect the electrodes of the cells to pads for wire bonding. The device examined here is a test structure where the cells are interconnected along lines in the array, effectively forming a 1D array. These interconnected lines will be denoted as transducer elements. 
The transducer array consists of 52 × 36 cells where the elements are formed along the longest direction such that the array has 36 electrically independent elements. The top plate of the CMUTs is electrostatically deflected by applying a DC voltage between the poly-silicon and the titanium layer, and vibrations are excited by additionally adding a lower AC voltage that modulates the deflection. More information about the structure and the fabrication of an almost identical device with aluminium instead of titanium in the top plate, can be found in [13].

## 4. Experiments

#### 4.1. System sensitivity

The system sensitivity was estimated by measuring the vibration amplitude at the center of a CMUT while exponentially increasing the excitation AC voltage. The DC voltage was 20V and the excitation frequency was 21MHz. For this measurement series the *MBW* was 1Hz, which according to Section A.5 and Eq. (27) corresponds to an averaging time *T _{a}* of 0.9664s and an *ENBW* of 2.07Hz. Note that the measurements presented in Sections 4.2 – 4.4 are performed with an *MBW* of 4.3Hz. Figure 7 illustrates the result, and indicates that for high excitation voltages the measured amplitude increases proportionally with AC excitation voltage. At the lower end of the plot the measured amplitude is dominated by noise, and the root mean square of the values with AC excitation voltage below 3μV is 7.1fm. This demonstrates that the noise floor of the interferometer, normalized to the *MBW*, here is 7.1fm/Hz^{1/2}. A total of 275 amplitude samples was used to estimate this value, which implies that its 95% confidence interval is within 7.1 ± 0.5fm. The laser wavelength *λ* was 532nm and the average optical power on the photodiode, *P _{avg}*, was during this measurement series found to be 6.7mW by measuring the average photo current. The interference contrast *C* was estimated to 47% and the photodiode responsivity, *α*, was specified to ∼ 0.37A/W by the manufacturer. Using these parameters Eq. (9) gives a theoretical sensitivity of 4.2fm. The measured sensitivity limit is therefore 70% worse than the shot noise limit. The measurement in Fig. 7 consisted of 666 measurement points and took 25min to complete, which corresponds to 2.3s per measurement point.
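The shot noise limit quoted above can be reproduced from the listed parameters. The sketch below is a reconstruction under stated assumptions and not a copy of Eqs. (8) and (9): it assumes an RMS shot noise current of (2*qαP _{avg}* · *ENBW*)^{1/2} and a sideband current amplitude of *αP _{avg}C* · 2π*a*/*λ*, and it equates the average signal power with the noise power as in the definition of *a _{min}* in Section 2.5. The function name is hypothetical.

```python
import math

ELECTRON_CHARGE = 1.602176634e-19  # C


def shot_noise_limited_amplitude(wavelength_m, contrast, responsivity_a_per_w,
                                 p_avg_w, enbw_hz):
    """Vibration amplitude at which sideband power equals shot noise power."""
    dc_current = responsivity_a_per_w * p_avg_w
    noise_rms = math.sqrt(2.0 * ELECTRON_CHARGE * dc_current * enbw_hz)
    # Sideband current amplitude per metre of vibration amplitude (assumed model).
    sideband_per_m = dc_current * contrast * 2.0 * math.pi / wavelength_m
    # The RMS of a sinusoid is amplitude / sqrt(2); set signal RMS equal to noise RMS.
    return math.sqrt(2.0) * noise_rms / sideband_per_m


a_min = shot_noise_limited_amplitude(
    wavelength_m=532e-9,        # laser wavelength
    contrast=0.47,              # interference contrast C
    responsivity_a_per_w=0.37,  # photodiode responsivity
    p_avg_w=6.7e-3,             # average optical power on the photodiode
    enbw_hz=2.07,               # ENBW for the 1 Hz MBW measurement
)
excess = 7.1e-15 / a_min        # measured noise floor relative to the limit

print(f"shot noise limit: {a_min * 1e15:.1f} fm")
print(f"measured / limit: {excess:.2f}")
```

Under these assumptions the result is close to the stated 4.2fm, and the measured 7.1fm lies about 70% above it.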

#### 4.2. Vibration patterns and frequency response

To get an overview of the vibration pattern of the CMUTs, four cells near the center of the array were measured when biased at 20V and excited by an RMS AC voltage of 5mV at 21MHz. This frequency is below the lowest resonance, which means the measured vibration pattern should resemble the static deflection of the CMUTs. This can give information about the uniformity and distribution of the electrical excitation force and the stiffness of the top plate. The measurement was performed with a raster scan resolution of 1μm, and linear interpolation was used to estimate vibration amplitudes between measurement points. Figure 8 shows the resulting vibration pattern of four cells with both a linear and a logarithmic color scale. The form is mostly consistent with the shape of the CMUT cell shown in Fig. 6, but a variation in CMUT vibration amplitude is revealed. In particular the upper left CMUT vibrates significantly weaker than the rest. Figure 8(b) also indicates vibrations between and in the center of the CMUT cell at the lower left. This is most likely because of acoustic cross coupling from the CMUTs to the relatively thin (∼ 20μm) substrate. The measurement of the vibration pattern in Fig. 8 took 82min to complete, which corresponds to 1.7s per measurement point.

The frequency response of the CMUT cell at the lower left in Fig. 8(a) was found by measuring the vibration amplitude at the two points indicated by black circles in the figure. The points are not located along any of the symmetry lines of the CMUT cell, which reduces the probability of missing resonances by accidentally probing only the nodes of the vibration pattern. The excitation frequency was scanned from 10 to 220MHz, the DC bias was 20V and the RMS AC excitation voltage was 20mV. The AOMs were configured as specified by the first row of Table 2 for the entire measurement series, although this configuration is formally limited to frequencies below 200MHz. The results shown in Figs. 9(a) and (b) indicate that the fundamental resonance frequency is at 35.62MHz. This resonance is sharp (*Q* ≈ 140), but will broaden considerably when the device is operated in fluids because of the better acoustic impedance match. The curve shape close to the resonance is not a single peak, but consists of several small peaks. In addition to the fundamental resonance, Fig. 9 also indicates the presence of higher order resonances, and significant amplitudes are found around 68.80 and 103.55MHz. Figure 9(c) indicates a declining signal level for increasing *F*_{a}, and the signal level at the highest frequency is ∼ 41% of the peak. To further investigate the shape of both the fundamental and the higher order resonances, the vibration pattern of the CMUT cell at the lower left of Fig. 8 was measured at the three resonances and at 196.30MHz. The latter was included because Fig. 9(a) indicates consistently high amplitude around this point. The lateral resolution of the raster scan was 0.5μm, and the voltages were the same as for the frequency response measurement. Figure 10 illustrates the vibration patterns for the excitation frequencies 35.62, 68.80, 103.55, and 196.30MHz. The left side plots show the measured absolute amplitude, and the right side plots show the vibration phase. The labels in Fig. 9(a) and (b) indicate which subfigure in Fig. 10 shows the vibration pattern of each peak.
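As a rough illustration of what the quoted quality factor implies, the following sketch converts *Q* ≈ 140 at 35.62MHz into a −3 dB bandwidth. It assumes the standard single-resonance relation in which the bandwidth equals the resonance frequency divided by *Q*; the helper name `bandwidth_from_q` is hypothetical. Since the measured curve consists of several small peaks rather than one clean peak, the number is only indicative.

```python
def bandwidth_from_q(f0_hz: float, q: float) -> float:
    """Return the -3 dB bandwidth f0/Q of a single resonance (assumed model)."""
    if q <= 0:
        raise ValueError("Q must be positive")
    return f0_hz / q


f0_hz = 35.62e6  # fundamental resonance quoted in the text
q = 140.0        # approximate quality factor quoted in the text
bandwidth_hz = bandwidth_from_q(f0_hz, q)
print(f"-3 dB bandwidth: about {bandwidth_hz / 1e3:.0f} kHz")
```

Under this assumption the bandwidth is a few hundred kilohertz, so a scan with a coarser frequency step can pass over the peak of an individual resonance.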

The vibration patterns of an ideal cell should have the same symmetry as the CMUT structure in Fig. 6, and most of Fig. 8 and Fig. 10(c) are such examples. Figures 10(a), (b), and (d) show vibration patterns without this symmetry, which indicates that antisymmetric and non-symmetric vibration modes are also excited. Measuring the presence, form and frequency of all these vibration patterns provides helpful feedback in the CMUT development process. It can for example provide information about the performance of the design, fabrication accuracy and material parameters.

#### 4.3. Array uniformity

To get an overview of the uniformity of the main resonance frequency, the amplitudes of all 104 × 72 CMUTs in the array were measured for excitation frequencies ranging from 29 to 36MHz with a frequency step of 0.5MHz. The vibration amplitude at the center of each CMUT was measured while exciting with a DC voltage of 20V and an RMS AC voltage of 5mV. The results indicated a regional dependence of the main resonance across the array. This dependence is best illustrated by the measurements at 33.5 and 35.5MHz, which are shown in Fig. 11. Each pixel in these figures represents the amplitude of one CMUT. Due to space constraints, measurements performed at other frequencies are not shown. Figure 11(a) shows highest amplitudes close to the bottom and side edges of the array, while Fig. 11(b) shows highest amplitudes close to the center and the top edge. This indicates that the resonance frequency is lower close to the bottom and side edges compared to the center and close to the top edge. Such a regionally dependent resonance might be due to non-uniform stress or thickness gradients in the wafer bonded top plate. Figure 11 also indicates four rows with significantly weaker amplitude, which is due to faulty wire bonding. The high vibration amplitudes of some of the CMUTs in these rows indicate that acoustic cross coupling from neighboring rows is able to notably excite these rows close to their resonance. In addition to the regional dependence of the resonance frequency, the plot in Fig. 11(b) indicates random variations. This might be because the resonances of the individual CMUTs have fluctuations similar to the one in Fig. 9(b). Some of the CMUTs could then have a low response at the excitation frequency despite it being close to their resonance.
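The regional dependence described above could be condensed into a single map by assigning each CMUT the excitation frequency at which its measured amplitude is largest. The following is a minimal sketch of that idea and not the processing used for Fig. 11; the function names `frequency_grid` and `peak_frequency_map` and the small synthetic data set are assumptions made for illustration.

```python
from typing import List, Sequence


def frequency_grid(start_mhz: float, stop_mhz: float, step_mhz: float) -> List[float]:
    """Inclusive grid of excitation frequencies in MHz."""
    count = int(round((stop_mhz - start_mhz) / step_mhz)) + 1
    return [start_mhz + i * step_mhz for i in range(count)]


def peak_frequency_map(
    amplitudes: Sequence[Sequence[Sequence[float]]],
    freqs_mhz: Sequence[float],
) -> List[List[float]]:
    """amplitudes[k][r][c] is the amplitude of CMUT (r, c) at freqs_mhz[k]."""
    if len(amplitudes) != len(freqs_mhz):
        raise ValueError("one amplitude map is required per frequency")
    rows, cols = len(amplitudes[0]), len(amplitudes[0][0])
    result = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Index of the frequency with the largest amplitude for this CMUT.
            best = max(range(len(freqs_mhz)), key=lambda k: amplitudes[k][r][c])
            row.append(freqs_mhz[best])
        result.append(row)
    return result


# Synthetic 2 x 2 example with made-up resonance frequencies (not measured data).
freqs = frequency_grid(29.0, 36.0, 0.5)
true_resonance = [[33.5, 35.5], [33.5, 35.0]]
stack = [
    [[1.0 / (1.0 + ((f - f0) / 0.25) ** 2) for f0 in row] for row in true_resonance]
    for f in freqs
]
resonance_map = peak_frequency_map(stack, freqs)
print(resonance_map)
```

The estimate can never be finer than the frequency grid, so with a 0.5MHz step a map of this kind would only resolve the coarse regional trend.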

#### 4.4. Top plate adhesion

Several vibration patterns similar to Fig. 8 have been measured, and Fig. 12 shows one from a CMUT cell that is close to the lower edge of the array. As for the measurement depicted in Fig. 8, the excitation frequency, *F*_{a}, was 21MHz, the DC voltage was 20V, and the AC RMS voltage was 5mV. In contrast to the vibration pattern in Fig. 8, the measurement indicates notable amplitude at the center of the CMUT cell, which is out of phase with the vibrations on the CMUTs. This implies that the top plate is not fixed at the center of the cell, but is loose and pivots along the edge of the support structure at the center. Figure 12 also indicates out of phase vibration in the area outside the CMUT cell, which could indicate a loose top plate here as well. However, since the amplitude is low, the cause might also be vibrations in the substrate.

## 5. Discussion

The presented measurements have provided precise information about the CMUT array under investigation. The sensitivity of the heterodyne interferometer was measured to be 7.1fm/Hz^{1/2} at 21MHz. This is 70% higher than a purely shot noise limited system which means that also other noise sources contribute. Taking into account the thermal amplifier noise mentioned in Section 2.5 only reduces the theoretical to measured discrepancy to 56%. It is believed that much of this noise is generated in the signal generators and is inserted into the signal path through the AOMs and the mixer for the *R*_{I}-signal in Fig. 2. Tests with signal generators of lower quality used as the local oscillator for this mixer have resulted in significantly higher noise levels. In addition the detector signal spectrum with only the reference beam incident on the photodiode has indicated elevated noise in the frequency band ∼ 0 – 50MHz. The laser is not a likely origin of this noise since it has not been observed in the same spectrum when only the object beam was detected. This can be a relevant noise source for many optical systems that use some sort of signal generator driven modulator. Such a modulator can for example be a part of heterodyne interferometers or systems that utilize some sort of beam chopping combined with lock-in techniques. The minimum detectable amplitude of measurement systems can be limited by systematic errors which are not reduced by decreasing the *MBW*. Since the sensitivity measurement was performed with *MBW* = 1Hz, it is clear that such errors do not inhibit measurements at 7.1fm.
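The percentages quoted in this paragraph can be turned into approximate noise levels. The sketch below is a back-of-the-envelope calculation, not a result from the measurements. It assumes that "70% higher" and "56%" mean that the measured floor equals 1.70 and 1.56 times the respective theoretical value, and that independent noise contributions add in quadrature; neither assumption is stated explicitly in the text, and the variable names are illustrative.

```python
import math

measured = 7.1  # measured noise floor at 21 MHz [fm/Hz^0.5]

# Read the quoted percentages as measured = (1 + x) * theoretical.
shot_limit = measured / 1.70           # shot noise only
shot_plus_amplifier = measured / 1.56  # shot noise and thermal amplifier noise

# Remaining contribution, assuming uncorrelated sources adding in quadrature.
excess = math.sqrt(measured**2 - shot_plus_amplifier**2)

print(f"shot noise limit:          {shot_limit:.2f} fm/Hz^0.5")
print(f"shot + amplifier noise:    {shot_plus_amplifier:.2f} fm/Hz^0.5")
print(f"unexplained excess (est.): {excess:.2f} fm/Hz^0.5")
```

Under these assumptions the unexplained contribution is comparable in size to the combined shot and amplifier noise, which is in line with the attention given to the signal generators as a noise source.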

In the sensitivity measurement in Fig. 7 the setup used on average 2.3s per measurement point, which is 1.3s more than was needed by the signal processing system. This additional time was used for GPIB communication, computation and settling of the signal generators after changing their settings. The settling time was most likely higher than required since the software was not optimized for a scan of AC voltage. It is reasonable to assume that this additional time can be significantly reduced by using more dedicated and integrated electronics.

The *R*_{N} signal level plotted in Fig. 9(c) indicated that the AOM deflection control technique described in Section 2.2.3 performed well. The signal level at the highest frequency was ∼ 41% of the peak, which can be attributed to the reduced conversion efficiency of the AOMs. This reduction does not affect the calculated vibration amplitude because the amplitude is determined by the ratio between *R*_{I} and *R*_{N}, but it implies that the sensitivity to some extent is reduced. Measurements were performed above the frequency limits of the AOMs, indicating that the actual overlap between the configurations in Table 2 can be stretched further than the table states. A potential improvement can be to reflect the beam out of the lens L3 in Fig. 1 back towards the AOMs. The light would then retrace its path and double the total frequency shift, and the beams would recombine on the PBS and could interfere on e.g. a 45° polarizer. As the combined frequency shift would double, so would the frequency range of all the configurations in Table 2. Alternatively the setup could be based on a fixed frequency shift in the reference beam and instead a variable Δ*F* in Eqs. (6) and (7). That would allow a simpler AOM configuration at the cost of losing the inherent resilience against frequency dependent RF components in the absolute amplitude calculations. Calibration of such frequency dependence is possible, but at least the calibration of the photodiode requires advanced equipment, and the calibration might be invalidated by simple modifications such as cable replacements. Another advantage of the variable frequency shift is that it allows measurements at frequencies that are approximately twice as large as the maximum frequency in the electrical setup in Fig. 2.

## 6. Conclusion

The heterodyne interferometer has proven a viable tool for characterizing detailed properties of ultrasonic transducer arrays. A noise floor of 7.1fm/Hz^{1/2} was demonstrated where the measurement bandwidth is the inverse of all time needed for filter settling and signal sampling. The most significant noise sources are believed to be shot noise and noise generated by three of the signal generators and injected into the signal path by the AOMs and one of the mixers. The sensitivity was achieved by optimizing the signal sampling and processing, and by increasing the efficiency of the AOMs by focusing the input light. The latter caused the most significant improvement and simultaneously made it easier to maintain interferometer alignment while scanning the optical frequency shift. This was demonstrated by measuring the frequency response of a transducer in a CMUT array. The measurements revealed the presence of two strongly excited higher order vibration modes at 68.80 and 103.55MHz. By swapping or rotating one of the AOMs, the interferometer is able to measure vibrations up to 1.3GHz. The interferometer has also proved to be a great tool for quality control in the pre-manufacturing phase by revealing faulty adhesion between the substrate and the top plate in the center of a CMUT cell.

## Appendix A. Signal processing

## A.1. Introduction

This appendix describes the details of the signal processing steps used to estimate the vibration amplitude. Understanding and modeling this process accurately is important to understand the noise sensitivity of the setup and to optimize signal acquisition. First the signal flow is modeled to find an expression for its equivalent noise bandwidth, *ENBW*. This is used to optimize lock-in amplifier (LIA) settings and measurement procedures for minimizing *ENBW* for a given measurement bandwidth, *MBW*. The model is specific for the heterodyne interferometer presented in this paper, but most of the derivations are general to systems that acquire signals using LIAs.

## A.2. Signal flow

The LIAs used in the setup are Stanford Research Systems model SR830. These are dual LIAs, meaning they have two outputs driven by individual mixers that operate in quadrature. The complex amplitude of the input, with phase relative to the reference frequency, is found using the two outputs as real and imaginary parts. Since the mixers inside the LIAs are digital, they do not suffer from typical analog mixer limitations such as DC offset and nonlinear behavior. A block diagram of the signal processing following the mixers in the LIAs is shown in Fig. 4. Note that all the signals in Fig. 4 are complex where the real and imaginary parts originate from the two mixers that operate in quadrature. The signals between the different blocks are denoted by a lower case *x* with appropriate indices. Upper case *X* with the same indices will, in the following sections, be used to denote the Fourier transform of the same signals. The Fourier transform of the digital signals is based on the Fourier transform of discrete time signals as defined in [14], but scaled such that the level matches the time continuous Fourier transform of the signal before sampling. The physical frequency *F* with unit *cycles per second* is used instead of the normalized frequency *f* [14] with unit *cycles per sample*. The resulting definition of the Fourier transform of a sampled signal *x*(*T*_{s}*n*), where *F*_{s} = 1/*T*_{s} is the sampling frequency, is

$$X(F) = T_s \sum_{n=-\infty}^{\infty} x(T_s n)\, e^{-j 2\pi F T_s n}, \tag{13}$$

where *n* is the sample number. After down converting the inputs at the frequency *F*_{LIA} to DC by the LIA mixers, the LIAs low pass filter the result. The filters, which are the first blocks in Fig. 4, remove the mirror frequency at 2*F*_{LIA} and also much of the noise. Our LIAs can have up to four cascaded filters with a time constant *τ*. The resulting filter transfer function [15] becomes

$$H_{LIA}(F) = \frac{1}{\left(1 + j 2\pi F \tau\right)^{p}}, \tag{14}$$

where *p* is the number of cascaded filters. Equation (14) has the form of a continuous time filter transfer function. It is used although the filters actually are time discrete. The error caused by this approximation is insignificant because the interesting part of the frequency spectrum is well below half the internal sampling frequency used in the LIAs, which is 256kHz. After filtering, the signals are down sampled before being sent to the computer. The down sampling step can, for the same reason, be regarded as a regular sampling step. The resulting down sampled signal, *x*_{Is}(*T*_{s}*n*), is the output of the corresponding LIA and has the spectrum [14]

$$X_{Is}(F) = \sum_{k=-\infty}^{\infty} H_{LIA}(F - kF_s)\, X_{Im}(F - kF_s), \tag{15}$$

where *X*_{Im}(*F*) is the spectrum of the signal *x*_{Im}(*t*) at the input of the filter. A similar expression can be made for the spectrum of the output of the other LIA. As illustrated in Fig. 4, the ratio between the samples of the two complex signals is determined after the signals are transferred to the computer. This is part of the vibration amplitude calculation in Eq. (5). The ratio is independent of phase fluctuations due to changes in the optical path length and of level fluctuations caused by variations in laser intensity and object reflectivity. This is shown and discussed in more detail in [5]. The exact treatment of the statistical properties of a ratio between random variables is very complicated. Since the *R*_{N}-signal will, in this case, be much stronger than both the *R*_{I}-signal and the noise, it will be treated as a constant in the statistical calculations. The result will therefore be a scaled version of the *R*_{I}-signal.

The two blocks following the ratio block in Fig. 4 model the averaging of a specific number of samples. The result is used to calculate the vibration amplitude in the last block of the figure. Accurate modeling of the averaging process is achieved by regarding it as a finite impulse response (FIR) filter [14] followed by a down sampling that only leaves one sample for the averaging period. The filter is formed by uniformly weighting a finite number of consecutive samples and weighting the rest by zero, resulting in a rectangular impulse response. For an averaging period *T*_{a} the number of averaged samples is *M*_{a} = *T*_{a}*F*_{s} and the filter transfer function *H*_{avg}(*F*) of the averaging filter becomes

$$H_{avg}(F) = \frac{1}{M_a} \sum_{n=0}^{M_a - 1} e^{-j 2\pi F T_s n}. \tag{16}$$

The subsequent down sampling keeps one out of every *M*_{a} original samples. With *F*_{s} = 1/*T*_{s} still being the sampling frequency before the down sampling, the resulting spectrum is given by Eq. (17). If *x*_{Im}(*t*) is constant and noise free, *X*_{Im}(*F*) has all its power at *F* = 0 and Eq. (18) becomes Eq. (5). This relies on *x*_{Im}(*t*) and *x*_{Nm}(*t*) having equal scaling to *R*_{I} and *R*_{N}, respectively.
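To make the two filter models concrete, the sketch below evaluates their responses numerically. It assumes that the LIA filter is a cascade of *p* identical first-order low-pass stages with time constant *τ*, and that the averaging filter weights *M*_{a} consecutive samples uniformly. The function names and the sampling frequency of 512Hz are illustrative choices, not values taken from the setup.

```python
import cmath
import math


def h_lia(f_hz: float, tau_s: float, p: int) -> complex:
    """Response of p cascaded first-order low-pass stages (assumed model)."""
    return 1.0 / (1.0 + 2j * math.pi * f_hz * tau_s) ** p


def h_avg(f_hz: float, fs_hz: float, m_a: int) -> complex:
    """Response of a uniform average over m_a samples taken at rate fs_hz."""
    ts = 1.0 / fs_hz
    total = sum(cmath.exp(-2j * math.pi * f_hz * ts * n) for n in range(m_a))
    return total / m_a


tau = 3e-3          # filter time constant used in the text [s]
fs = 512.0          # assumed sampling frequency [Hz]
m_a = 512           # assumed number of averaged samples, giving T_a = 1 s
t_a = m_a / fs

for f in (0.0, 0.5 / t_a, 1.0 / t_a, 10.0, 100.0):
    print(f"F = {f:7.2f} Hz  |H_LIA| = {abs(h_lia(f, tau, 3)):.4f}  "
          f"|H_avg| = {abs(h_avg(f, fs, m_a)):.4f}")
```

Both responses equal one at *F* = 0, and the averaging filter has its first null at *F* = 1/*T*_{a}, which is why a long averaging period gives a narrow main lobe.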

## A.3. Equivalent noise bandwidth

The *ENBW* of the system is determined by the filters *H*_{LIA} and *H*_{avg} as well as the sampling frequency *F*_{s}. In the limit where the actual vibration amplitude is zero, the measured value is determined by noise. Using Eq. (18) and assuming that the noise at the input of the *R*_{I}-branch in Fig. 4 has power spectral density [14] *N*_{Im}, the power spectral density of the measurement, *N*_{a}(*F*), is given by Eq. (19). The *ENBW* is the width of a rectangular filter that results in the same total noise power at the output as the effective filter in Eq. (19). Referred to the photodiode current, the *ENBW* is twice as large because the mixers in Fig. 2 down convert the noise at both ±*F*_{LIA} frequency offset from the local oscillators to *F*_{LIA}. The resulting *ENBW* therefore becomes

$$\mathit{ENBW} = 2 \int_{-F_s/2}^{F_s/2} \left|H_{avg}(F)\right|^2 \sum_{k=-\infty}^{\infty} \left|H_{LIA}(F - kF_s)\right|^2 \, dF. \tag{20}$$

The integral includes negative frequencies because the LIA mixers shift the input from *F* = +*F*_{LIA} down to *F* = 0. The negative side therefore corresponds to positive frequencies just below +*F*_{LIA} in the LIA input signal, and should therefore be included in the integral.
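The definition of the *ENBW* can be checked numerically. The sketch below integrates the squared magnitude of the combined filter over one period of the sampled spectrum, folds in the aliased copies of the LIA filter, and doubles the result to account for the mirror frequency. It is an illustration under stated assumptions rather than the computation behind the quoted values: the LIA filter is modeled as *p* cascaded first-order stages, the averaging filter as a uniform average of *M*_{a} samples, and the sampling frequency of 512Hz, the truncation of the alias sum and the integration step are arbitrary choices.

```python
import math


def h_lia_sq(f: float, tau: float, p: int) -> float:
    """|H_LIA(F)|^2 for p cascaded first-order stages (assumed model)."""
    return (1.0 + (2.0 * math.pi * f * tau) ** 2) ** (-p)


def h_avg_sq(f: float, fs: float, m_a: int) -> float:
    """|H_avg(F)|^2 for a uniform average over m_a samples."""
    den = m_a * math.sin(math.pi * f / fs)
    if abs(den) < 1e-12:  # limit at multiples of the sampling frequency
        return 1.0
    return (math.sin(math.pi * f * m_a / fs) / den) ** 2


def enbw(tau: float, p: int, fs: float, m_a: int,
         n_alias: int = 5, steps_per_lobe: int = 50) -> float:
    """Midpoint-rule estimate of the two-sided, mirror-doubled noise bandwidth."""
    n = m_a * steps_per_lobe  # grid points over one period of width fs
    df = fs / n
    total = 0.0
    for i in range(n):
        f = -fs / 2.0 + (i + 0.5) * df
        series = sum(h_lia_sq(f - k * fs, tau, p)
                     for k in range(-n_alias, n_alias + 1))
        total += h_avg_sq(f, fs, m_a) * series * df
    return 2.0 * total


fs = 512.0   # assumed sampling frequency [Hz]
m_a = 495    # assumed sample count, giving T_a of about 0.967 s
enbw_hz = enbw(tau=3e-3, p=3, fs=fs, m_a=m_a)
print(f"ENBW = {enbw_hz:.3f} Hz, 2/T_a = {2.0 * fs / m_a:.3f} Hz")
```

With these parameters the averaging filter dominates, so the result is expected to land close to 2/*T*_{a}; lowering the assumed sampling frequency increases the contribution of the aliased terms.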

## A.4. Lock-in amplifier parameters

The expression in Eq. (20) can be used to find a suitable filter time constant *τ* and sampling frequency *F*_{s} for the lock-in amplifiers. For an averaging time *T*_{a} significantly longer than the time constant *τ*, Eq. (20) can be approximated to

$$\mathit{ENBW} \approx \frac{2}{T_a} \sum_{k=-\infty}^{\infty} \left|H_{LIA}(kF_s)\right|^2. \tag{21}$$

The approximation assumes that |*H*_{avg}(*F*)| has a sharp peak at *F* = 0, and that it is otherwise small. The approximation is true in the limits where |*H*_{avg}(*F*)| approaches a Dirac delta function or where the series approaches a constant in the integral. The zeroth term in the series in Eq. (21) must be 1 to not affect the measurement, but all other terms should be as small as possible. In the limit where these approach zero, and if *p* > 1 in Eq. (14), only the *k* = ±1 terms are sufficiently large to notably contribute to the total series, leading to the requirement that

$$\left|H_{LIA}(F_s)\right|^2 \ll 1. \tag{22}$$

This means that the *H*_{LIA}(*F*)-filter must significantly attenuate the signal and noise at *F* = *F*_{s}, which is a less restrictive requirement than the ordinary sampling theorem [14] stating that all the signal and noise power for *F* > *F*_{s}/2 must be eliminated. The reason for this difference is that a slight violation of the sampling theorem will only affect frequencies that subsequently are attenuated by *H*_{avg}. Figure 13 compares the exact *ENBW* from Eq. (20) to the approximations which Eq. (22) is based on, and indicates that they are acceptable for the purpose of tuning the LIA-settings when Eq. (22) is fulfilled.

## A.5. Measurement time and MBW

This section looks at the time needed by the signal processing system to perform one measurement, and compares the two filters *H*_{LIA}(*F*) and *H*_{avg}(*F*) to determine which is the most efficient at reducing the total *ENBW* for a given measurement time. Finally a relation between the system *ENBW* and the *MBW* is established. The effects of sampling will be ignored, which is reasonable when Eq. (22) is sufficiently satisfied. The measurement time is here defined as the time needed after complete settling of the mechanical positioning system and the stimuli circuits, until the measurement is complete. It is therefore determined by the time response of the filters *H*_{LIA}(*F*) and *H*_{avg}(*F*) as it includes both the actual averaging time *T*_{a} and the necessary filter settling time, *T*_{w}, needed before the averaging can start.

The time domain impulse response from *H*_{LIA}(*F*) is

$$h_{LIA}(t) = \frac{t^{\,p-1}}{(p-1)!\,\tau^{p}}\, e^{-t/\tau}\, u(t), \tag{23}$$

where *t* is time in seconds and *u*(*t*) is the unit step function. Here *T*_{w} is defined by demanding that the input signal before the waiting time starts is damped by a factor of 1000. This is equivalent to satisfying the equation

$$\int_{T_w}^{\infty} h_{LIA}(t)\, dt = \frac{1}{1000}. \tag{24}$$

The solution for *T*_{w} to this equation and the corresponding *ENBW* for the *H*_{LIA}(*F*)-filter alone are stated in Table 3 for different values of *p*. The *ENBW* is found by setting |*H*_{avg}(*F*)| = 1 in Eq. (20). The table also includes a figure of merit, namely the product *T*_{w} × *ENBW*, which indicates how effective the filter is at reducing *ENBW* per unit of time. By replacing the series in Eq. (20) with 1, the *ENBW* of the averaging filter can be found to be 2/*T*_{a}. Since the total waiting time is *T*_{a}, the equivalent figure of merit becomes 2. This means that the averaging filter is more efficient at limiting the *ENBW* for a given measurement time. It is therefore beneficial that the averaging filter dominates the *ENBW*-limiting system, and that the filter time constant *τ* is as small as possible without violating Eq. (22). The total measurement time *T*_{m} is approximately the sum of the contributions of the two filters and the measurement bandwidth becomes

$$\mathit{MBW} = \frac{1}{T_m} = \frac{1}{T_w + T_a} \approx \frac{1}{T_a} \approx \frac{\mathit{ENBW}}{2}. \tag{26}$$

The approximation is valid when the averaging filter is dominating (*T*_{a} >> *T*_{w}) and Eq. (22) is sufficiently satisfied. Rearranging the exact version of Eq. (26) leads to an expression for the required averaging time *T*_{a} to achieve a specific *MBW*

$$T_a = \frac{1}{\mathit{MBW}} - T_w. \tag{27}$$

For the chosen *τ* = 3ms and *p* = 3, *T*_{w} becomes 33.6ms. If an *MBW* of 1Hz is desired, the required averaging time *T*_{a} becomes 0.9664s, and *ENBW* becomes 2.07Hz according to Eq. (20). The approximation in Eq. (26) would instead result in *T*_{a} = 1s and *ENBW* = 2Hz.
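The settling time and the figure of merit discussed in this section can be reproduced with a few lines of code. The sketch below assumes the cascaded first-order filter model, for which the fraction of a step input that has not yet settled after a time *t* is e^{−t/τ} times the sum of (*t*/*τ*)^{k}/*k*! for *k* < *p*, and it finds the time at which this fraction reaches 1/1000 by bisection. The closed-form *ENBW* of the LIA filter uses the gamma function and includes the factor of two for the mirror frequency. All function names are hypothetical. For *τ* = 3ms and *p* = 3 the sketch gives a settling time of roughly 33.7ms, close to the 33.6ms quoted above.

```python
import math


def residual(x: float, p: int) -> float:
    """Unsettled fraction of a step input after x = t / tau, p cascaded stages."""
    return math.exp(-x) * sum(x**k / math.factorial(k) for k in range(p))


def settling_time(tau: float, p: int, damping: float = 1e-3) -> float:
    """Time at which the unsettled fraction has dropped to `damping`."""
    lo, hi = 0.0, 100.0  # residual decreases monotonically from 1 towards 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if residual(mid, p) > damping:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) * tau


def enbw_lia(tau: float, p: int) -> float:
    """Two times the integral of |H_LIA(F)|^2 over all frequencies."""
    integral = math.sqrt(math.pi) * math.gamma(p - 0.5) / math.gamma(p)
    return 2.0 * integral / (2.0 * math.pi * tau)


def averaging_time(mbw: float, t_w: float) -> float:
    """Averaging time that leaves room for settling within 1 / MBW."""
    return 1.0 / mbw - t_w


tau = 3e-3
for p in (1, 2, 3, 4):
    t_w = settling_time(tau, p)
    merit = t_w * enbw_lia(tau, p)
    print(f"p = {p}: T_w = {t_w * 1e3:.2f} ms, ENBW = {enbw_lia(tau, p):.1f} Hz, "
          f"T_w x ENBW = {merit:.2f}")

print(f"T_a for MBW = 1 Hz with T_w = 33.6 ms: {averaging_time(1.0, 33.6e-3):.4f} s")
```

In this model the figure of merit of the LIA filter stays above the value of 2 obtained for the averaging filter for every available *p*, which agrees with the conclusion that the averaging filter should dominate.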

## Acknowledgments

The authors acknowledge financial support from the Norwegian Research Council for funding SMIDA (project no. 159559/130) and the Norwegian PhD Network on Nanotechnology for Microsystems (project no. 192456).

## References and links

**1.** J. Knuuttila, P. Tikka, and M. Salomaa, “Scanning Michelson interferometer for imaging surface acoustic wave fields,” Opt. Lett. **25**, 613–615 (2000).

**2.** J. Graebner, B. Barber, P. Gammel, D. Greywall, and S. Gopani, “Dynamic visualization of subangstrom high-frequency surface vibrations,” Appl. Phys. Lett. **78**, 159–161 (2001).

**3.** G. Fattinger and P. Tikka, “Modified Mach-Zender laser interferometer for probing bulk acoustic waves,” Appl. Phys. Lett. **79**, 290–292 (2001).

**4.** J.-P. Monchalin, “Heterodyne interferometric laser probe to measure continuous ultrasonic displacements,” Rev. Sci. Instrum. **56**, 543–546 (1985).

**5.** H. Martinussen, A. Aksnes, and H. E. Engan, “Wide frequency range measurements of absolute phase and amplitude of vibrations in micro- and nanostructures by optical interferometry,” Opt. Express **15**, 11370–11384 (2007).

**6.** K. Kokkonen and M. Kaivola, “Scanning heterodyne laser interferometer for phase-sensitive absolute-amplitude measurements of surface vibrations,” Appl. Phys. Lett. **92**, 063502 (2008).

**7.** T. Tachizaki, T. Muroya, O. Matsuda, Y. Sugawara, D. Hurley, and O. Wright, “Scanning ultrafast sagnac interferometry for imaging two-dimensional surface wave propagation,” Rev. Sci. Instrum. **77**, 043713 (2006).

**8.** T. Fujikura, O. Matsuda, D. Profunser, O. Wright, J. Masson, and S. Ballandras, “Real-time imaging of acoustic waves on a bulk acoustic resonator,” Appl. Phys. Lett. **93**, 261101 (2008).

**9.** J. Lawall and E. Kessler, “Michelson interferometry with 10 pm accuracy,” Rev. Sci. Instrum. **71**, 2669–2676 (2000).

**10.** E. Leirset and A. Aksnes, “Optical vibration measurements of cross coupling effects in capacitive micromachined ultrasonic transducer arrays,” Proc. SPIE **8082**, 80823N (2011).

**11.** B. E. A. Saleh and M. C. Teich, *Fundamentals of Photonics* (John Wiley & Sons, Inc., 1991).

**12.** R. L. Whitman and A. Korpel, “Probing of acoustic surface perturbations by coherent light,” Appl. Opt. **8**, 1567–1576 (1969).

**13.** J. Due-Hansen, K. Midtbø, E. Poppe, A. Summanwar, G. Jensen, L. Breivik, D. Wang, and K. Schjølberg-Henriksen, “Fabrication process for CMUT arrays with polysilicon electrodes, nanometre precision cavity gaps and through-silicon vias,” J. Micromech. Microeng. **22**, 074009 (2012).

**14.** J. G. Proakis and D. G. Manolakis, *Digital Signal Processing* (Prentice Hall, 1996), 3rd ed.

**15.** J. W. Nilsson and S. A. Riedel, *Electric Circuits* (Prentice Hall, 2001), 6th ed.