
High-throughput time-stretch imaging flow cytometry for multi-class classification of phytoplankton

Open Access

Abstract

Time-stretch imaging has been regarded as an attractive technique for high-throughput imaging flow cytometry, primarily owing to its real-time, continuous ultrafast operation. Nevertheless, two key challenges remain: (1) achieving time-stretch image resolution and contrast high enough to visualize the sub-cellular complexity of single cells, and (2) unraveling the heterogeneity and complexity of highly diverse cell populations – a central problem of single-cell analysis in the life sciences. We here demonstrate an optofluidic time-stretch imaging flow cytometer that delivers both capabilities, in the context of high-throughput multi-class (up to 14 classes) phytoplankton screening and classification. Based on comprehensive feature extraction and selection procedures, we show that the intracellular texture/morphology revealed by high-resolution time-stretch imaging plays a critical role in improving the accuracy of phytoplankton classification, reaching 94.7% with a multi-class support vector machine (SVM). We also demonstrate that high-resolution time-stretch images, which allow exploitation of additional feature domains, e.g. Fourier space, enable further sub-population identification – paving the way toward deeper learning and classification based on large-scale single-cell images. Beyond biomedical diagnostics, this work is anticipated to find immediate applications in marine and biofuel research.

© 2016 Optical Society of America

1. Introduction

Harnessing group velocity dispersion (GVD) without suffering from the associated dispersive loss, thanks to optical amplification, optical time-stretch (also known as dispersive Fourier transformation) enables continuous, single-shot spectrally-encoded measurements in real time at ultrafast rates beyond MHz [1]. Apart from the originally perceived applications in high-speed optical communication, its diverse potential has progressively invigorated many other arenas, notably high-throughput single-cell imaging [2–5]. A unique feature of optical time-stretch imaging is the generation of enormous amounts of real-time image data, from which a wealth of image-based information on individual cells – such as texture, geometry and morphology – can be retrieved at an unprecedented throughput. Such complex high-dimensional image information is innately favorable for automated classification, and thus for screening applications, with the use of machine learning. Prior work integrating machine learning and time-stretch imaging successfully demonstrated binary (two-class) classification, such as screening cancer cells from blood cells [6]. Nevertheless, biological cells are by nature highly heterogeneous and diverse in type, state and thus function. To fully exploit the potential of time-stretch imaging for single-cell analysis, which has proven impact in both clinical diagnostics and fundamental cell biology research, two requirements have to be met: (1) sufficiently high time-stretch image quality (resolution and contrast) for visualizing the complexity of cellular and subcellular structures; and (2) automated multi-class classification and analysis for revealing the heterogeneity of cell populations. Aiming to address these issues and thus further widen the scope of time-stretch imaging, we here demonstrate ultrafast label-free (bright-field) imaging flow cytometry of phytoplankton, combined with automated multi-class (14-class) support vector machine (SVM) classification, enabled by high-resolution time-stretch images from which up to 44 cellular features can be extracted for classification.

Phytoplankton (also called microalgae) are photosynthetic micro-organisms and the key primary building blocks of the aquatic ecosystem [7]. The ability to enumerate and characterize phytoplankton is of considerable value for environmental monitoring [8–10], e.g. the detection of harmful algal bloom (HAB) species [11], and for screening microalgal candidates as a renewable and sustainable source for biodiesel production [12]. Owing to their extreme diversity in genus and species, phytoplankton are highly heterogeneous in size, shape and morphology/texture. Current gold standards fall short of combining high throughput with high accuracy for large-scale phytoplankton detection, identification and characterization [9, 13]. For instance, traditional microscopy provides high-resolution images, and thus detailed inspection of individual phytoplankton, at the expense of imaging throughput [14]. On the other hand, standard flow cytometry provides high-throughput single-cell interrogation (up to 100,000 cells/s) but is insufficient for taxonomic classification because of its lack of imaging capability. This predicament is partially alleviated by incorporating image sensors into flow cytometers [15–21]. However, the imaging throughput is then compromised (~1,000 cells/s) by the inherent speed limitation of the image sensors. To this end, optical time-stretch imaging is ideal for addressing the unmet need for high-throughput imaging flow cytometry of phytoplankton [13]. To fully exploit this potential, as discussed earlier, the image resolution has to be sufficient for automated multi-class taxonomic classification without sacrificing imaging speed – the central motivation of this work. In addition to automated multi-class classification, we also exploit other feature domains, e.g. based on Fourier analysis, to identify the heterogeneity within a single population, i.e. sub-populations, of phytoplankton, exemplifying the necessity of high-quality time-stretch imaging.

2. Principle of microfluidic time-stretch imaging

The optical time-stretch imaging flow cytometer developed in this work is an integrated system consisting of a customized microscope and a microfluidic system, as shown in Fig. 1. The configuration is similar to that demonstrated in [3], except that we perform wavelength-time mapping before space-wavelength mapping. In the system, a broadband pulsed beam from a home-built fiber mode-locked laser [22] (center wavelength = 1064 nm, bandwidth = 10 nm, repetition rate = 11 MHz) is launched into a 10-km-long single-mode fiber, within which the pulses are time-stretched with a high GVD of 0.38 ns/nm, i.e. the single-shot spectra are transformed into temporal waveforms. An in-line home-built optical amplifier module based on ytterbium-doped fibers provides an optical power on-off gain as high as 30 dB to compensate for the dispersive loss. The time-stretched pulsed beam is then coupled into the microscope through a diffraction grating (groove density of 1200 lines/mm), which spatially disperses the beam into a one-dimensional (1D) spectral shower, i.e. line-scan illumination, that is focused by a high-numerical-aperture (NA) objective lens (NA = 0.75) onto the microfluidic channel. Individual phytoplankton flow at a high speed of ~2 m/s along the channel, orthogonal to the 1D line-scan beam (scanning at the laser repetition rate of 11 MHz). Made of polydimethylsiloxane (PDMS), the microfluidic channel is custom-designed with an asymmetric curved section that generates an inertial flow-focusing effect, such that the cells flow at high speed in a single focused stream in the imaging section (with a cross-section of 80 µm × 80 µm (height × width)) [3]. The time-stretch microscope is built in a double-pass configuration, i.e. the spectral shower double-passes two objective lenses, which preserves the space-wavelength mapping throughout the imaging path and thus the image resolution and quality. As a result, the spatial information of the imaged phytoplankton in each line scan is encoded in the wavelengths of the spectral shower, which is detected as a serial temporal waveform by a high-bandwidth single-pixel photodetector and a real-time oscilloscope (bandwidth = 16 GHz, sampling rate = 80 GS/s) for subsequent image reconstruction and off-line multi-class SVM classification.
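
Since the image arrives as a serial temporal waveform, reconstruction essentially amounts to slicing the digitized trace at the line-scan (pulse repetition) period and stacking the slices as rows. The following is a minimal sketch of this step, not the authors' actual code, assuming an ideal trigger-aligned waveform sampled at the rates quoted above; in practice background subtraction and sub-sample jitter correction would also be applied.

```python
import numpy as np

def reconstruct_image(waveform, fs=80e9, line_rate=11e6):
    """Reshape a digitized serial time-stretch waveform into a 2D image.

    Each laser pulse (one line scan) spans fs / line_rate samples;
    consecutive pulses are stacked as image rows (the flow direction).
    """
    samples_per_line = int(round(fs / line_rate))   # ~7273 samples per line
    n_lines = len(waveform) // samples_per_line
    return waveform[:n_lines * samples_per_line].reshape(n_lines,
                                                         samples_per_line)
```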


Fig. 1 Schematic of the optofluidic time-stretch imaging system for imaging flow cytometry of phytoplankton. BPL: broadband pulsed laser; SMF: 10 km single-mode fiber operating at 1060 nm; YDFA: home-built ytterbium-doped fiber amplifier; FC: fiber collimator; BP: beamsplitter; DG: diffraction grating with 1200 grooves/mm; OBJ1: 0.75-NA objective lens; OBJ2: 0.8-NA objective lens; M1, M2: mirrors; PD: 12 GHz single-pixel photodetector; OSC: oscilloscope with a bandwidth of 16 GHz and a sampling rate of 80 GSa/s. The top inset shows the pulse shapes in the wavelength and time domains at different stages (1, 2 and 3), from wavelength-time mapping to space-wavelength mapping.


In principle, time-stretch can be performed either before or after spectral encoding [23]. Nevertheless, there are two key differences between the two approaches. (1) The latter performs single-shot illumination, in which the exposure time is estimated from the time-bandwidth product of each spectrally-resolvable sub-pulse (typically ~10's of ps), whereas the former functions as an all-optical laser beam scanner, in which the exposure time is determined by the temporal width of the stretched waveform (~10's of ns). (2) The input intensity to the time-stretch module in the two configurations can differ by several orders of magnitude, implying that different amplifier designs are required to optimize the SNR performance. To maximize the SNR (critical for accurate image classification) while minimizing photodamage due to excessive illumination power, we configured the time-stretch module prior to spectral encoding. In this configuration, the input power to the time-stretch process is higher and thus yields a better amplification noise figure than the previous work, which implemented time-stretch after spectral encoding [3]. Here, the illumination power per resolvable point (~10 mW) creates negligible photodamage to living cells in the 1 μm wavelength regime.

When time-stretch is performed after spectral encoding, the exposure time of each resolvable spectral sub-pulse is determined by the time-bandwidth product of a transform-limited pulse, i.e. 4.4 ps. In contrast, when time-stretch is performed before spectral encoding, the exposure time of each resolvable spectral sub-pulse can be estimated as the product of the GVD and the bandwidth δλ of each sub-pulse. Considering a sub-pulse bandwidth of 0.23 nm and a GVD of 0.38 ns/nm, the exposure time of each point is now ~87.4 ps. Operating in a laser line-scanning mode, the effective exposure time of one line scan is 3.8 ns, given the total bandwidth of 10 nm. Despite the longer exposure time compared to [3], this creates essentially no motion blur even in the ultrafast flow at a speed of ~1 m/s, i.e. the motion blur (~4 nm) is far smaller than the image resolution (~1.59 µm) in our case. Note that as phytoplankton in general exhibit higher contrast than largely transparent mammalian cells, contrast-enhancement techniques, e.g. phase-gradient contrast [3], are not necessary in this work. The time-stretch imaging system presented here is thus operated in the bright-field mode.
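
These exposure-time and motion-blur figures follow from simple products of the quantities quoted above, as the short check below shows (all values are taken from the text):

```python
gvd = 0.38          # group velocity dispersion, ns/nm
d_lambda = 0.23     # bandwidth of one resolvable sub-pulse, nm
total_bw = 10.0     # total optical bandwidth, nm

t_point = gvd * d_lambda   # ~0.0874 ns = ~87.4 ps exposure per resolvable point
t_line = gvd * total_bw    # 3.8 ns effective exposure per line scan

flow = 1.0                             # flow speed, m/s (~7.6 nm even at 2 m/s)
blur_nm = flow * t_line * 1e-9 * 1e9   # distance moved during one line scan
print(t_point, t_line, blur_nm)        # blur ~3.8 nm << 1.59 um resolution
```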

3. Imaging performance

Our system is capable of imaging cells with sufficiently high resolution and contrast to resolve cellular structures. The image resolution of time-stretch imaging is generally governed by three limiting regimes [24]: (i) the spatial-dispersion-limited resolution δx_spatial, set by the spectral resolution of the diffraction grating; (ii) the stationary-phase-approximation limit δx_SPA, defined for the wavelength-time mapping process; and (iii) the photodetector-limited resolution δx_det. In the current system, the resolutions limited by these three regimes are evaluated as δx_spatial ≈ 1.59 μm, δx_SPA ≈ 0.89 μm and δx_det ≈ 0.54 μm. The resolution along the line scan is therefore currently limited by spatial dispersion, i.e. 1.59 µm. The image resolution along the flow direction is evaluated as δy ≈ 1.33 μm, following the analysis in [3]. We employed the optical time-stretch imaging flow cytometer to capture more than 10,000 images of cultured phytoplankton (14 species; Carolina Biological Supply), as highlighted in Fig. 2. Our system reveals a great variety of sizes, shapes and morphologies of algal cells, for instance the flotation spines of Scenedesmus sp. and the helical structure of Spirulina major. More significantly, characteristic intracellular textures of the phytoplankton, such as vacuoles, pyrenoids and chloroplasts, are clearly visualized, as highlighted in Fig. 2. Note again that single-cell imaging at this level of resolution and contrast is particularly necessary for the image-based classification discussed in the next section.


Fig. 2 (a) Selected time-stretch images of 14 species of microalgae at the genus level (flow speed = 2 m/s, line-scan rate = 11 MHz). The scale bar is 20 µm for all images. A: Synura; B: Thalassiosira; C: Scenedesmus; D: Selenastrum capricornutum; E: Chlorella; F: Gymnodium; G: Gymnodinium; H: Prorocentrum; I: Euglena; J: Lyngbya; K: Spirulina major; L: Merismopedia; M: Chaetoceros gracilis; N: Navicula. (b) Cropped and magnified sections (dashed red and green boxes) showing the sub-cellular structures of Gymnodium (F) and Euglena (I), and a cross-sectional profile of Spirulina major (K).


4. Image processing and feature extraction

Subsequent to image reconstruction, image segmentation and feature extraction are performed for phytoplankton classification. In the image segmentation, each grayscale image is first transformed into a local entropy map (B in Fig. 3) and then into a binary image by thresholding, followed by erosion (C in Fig. 3) and hole-filling (D in Fig. 3) operations. Geometrical parameters, e.g. the size and equivalent spherical diameter of the segmented binary mask, are extracted and used to identify debris/fragments, defined as events with diameters < 1 µm or less than one-fourth of the mean size of the imaged phytoplankton population, or occupying > 90% of the field-of-view (FOV) of the image. We further perform manual screening to remove residual debris. These non-phytoplankton events are filtered out prior to machine learning.
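
A minimal sketch of this segmentation and debris-gating chain, written here with scikit-image and SciPy purely as an illustration (the structuring-element sizes, Otsu threshold and pixel calibration are assumptions, not the authors' exact settings):

```python
import numpy as np
from scipy.ndimage import binary_fill_holes
from skimage.filters import threshold_otsu
from skimage.filters.rank import entropy
from skimage.measure import label, regionprops
from skimage.morphology import binary_erosion, disk
from skimage.util import img_as_ubyte

def segment(gray):
    """Grayscale image (A) -> entropy map (B) -> blob (C) -> mask (D)."""
    ent = entropy(img_as_ubyte(gray), disk(5))    # local entropy map
    blob = binary_erosion(ent > threshold_otsu(ent), disk(3))  # threshold + erode
    mask = binary_fill_holes(blob)                # hole filling
    return blob, mask

def is_debris(mask, mean_diam_um, um_per_px=0.5):
    """Gate out fragments by the size rules described in the text."""
    props = regionprops(label(mask))
    if not props:
        return True
    diam = max(p.equivalent_diameter for p in props) * um_per_px  # in um
    fov_fraction = mask.sum() / mask.size
    return diam < 1.0 or diam < 0.25 * mean_diam_um or fov_fraction > 0.9
```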


Fig. 3 (a) Flow chart of the image processing and classification pipeline. (b) Selected images of the intermediate stages, including (A) the original gray-scale image, (B) the local entropy map, (C) the blob image and (D) the binary mask.


For feature extraction, geometrical features of the phytoplankton, such as area, perimeter, major and minor axes, and thus circularity and elongation factor (see Table 1 and Table 3 in the Appendix for detailed descriptions), are extracted from the binary masks. In addition, seven image (pixel-intensity) moment invariants (also called Hu's moments), which are scale-, translation- and rotation-invariant [25], are calculated from both the binary mask and the blob image at the intermediate stage (C) (see Fig. 3), resulting in 14 feature parameters. Texture information of each image is retrieved by analyzing both the 1D (pixel-intensity) histogram and the gray-level co-occurrence matrix (GLCM), which are obtained from the gray-scale image considering only the pixels inside the smallest bounding box. The 1D probability density function (PDF) of the segmented gray-scale pixel intensities gives rise to five image properties: grey-level standard deviation, skewness, entropy, energy and mode. Compared to the 1D histogram analysis, the GLCM is a 2D statistical measure of the pixel intensities from which texture information can be characterized [26]. The GLCM is constructed by tallying how often each pixel pair with specific values occurs in an image (64 gray levels in our case). The pixel separation/offset in each pair is set to three pixels, corresponding to half of the diffraction-limited resolution, in four directions (vertical: 0°, horizontal: 90°, diagonal: 45° and 135°). The statistics (mean and dynamic range) of each image property – contrast, correlation, energy, homogeneity and entropy – are calculated from the GLCMs along the four directions, leading to a vector of 10 features. We further extract information about the granularity of the cells from the normalized local-entropy-filtered image, following the same 1D histogram and GLCM analysis. This yields 11 more features: six image properties from the 1D histogram (with the mean gray-level intensity in addition), and the five mean values of the image properties over the four directions of the GLCM. In total, each segmented cell generates up to 44 features, including 18 features describing its geometry and shape and 26 features describing its morphology. Detailed descriptions of the extracted features are given in Table 1 and Table 3 in the Appendix.
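
The moment and GLCM computations map directly onto standard library routines. Below is an illustrative sketch using scikit-image, assuming the 64-level quantization and 3-pixel offset from the text; GLCM entropy is computed manually since `graycoprops` does not provide it, and everything else (cropping, quantization) is assumed done upstream:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from skimage.measure import moments_central, moments_hu, moments_normalized

def hu_moments(img):
    """Seven scale-, translation- and rotation-invariant Hu moments."""
    return moments_hu(moments_normalized(moments_central(img)))

def glcm_texture(gray64):
    """Mean and range over four directions of five GLCM properties.

    gray64: bounding-box crop quantized to 64 grey levels (uint8, 0-63).
    Offset = 3 pixels; directions 0, 45, 90 and 135 degrees.
    """
    glcm = graycomatrix(gray64, distances=[3],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=64, symmetric=True, normed=True)
    feats = {}
    for prop in ("contrast", "correlation", "energy", "homogeneity"):
        v = graycoprops(glcm, prop).ravel()       # one value per direction
        feats[prop + "_mean"], feats[prop + "_range"] = v.mean(), np.ptp(v)
    p = glcm[:, :, 0, :]                          # (64, 64, n_angles)
    logp = np.zeros_like(p)
    np.log2(p, out=logp, where=p > 0)             # avoid log(0)
    ent = -(p * logp).sum(axis=(0, 1))            # entropy per direction
    feats["entropy_mean"], feats["entropy_range"] = ent.mean(), np.ptp(ent)
    return feats
```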


Table 1. Summary of the Extracted and Selected Features, as well as the Corresponding Image Processing Steps Involved for Feature Extraction

5. Feature selection

A total of 44 image features, each normalized to a score between 0 and 1, are extracted (see Table 1) to represent the geometry, morphology and texture of the cell images. Prior to supervised learning for multi-class classification, feature ranking and selection are conducted to avoid feeding duplicate and irrelevant features into the machine learning model, which could cause misclassification and extend the training time. In this work, we employ the bagging (bootstrap aggregation) approach based on the random forest algorithm to evaluate and rank the importance of the features [27,28]. In brief, a random forest is a large collection of decision trees (500 trees in this work), each of which is trained (or “grown”) with a random subset of the images (2/3 of the total image count), called a bootstrap sample. For each grown tree, the prediction error rate on the samples left out of the bootstrap sample, termed the out-of-bag (OOB) samples (1/3 of the total image count), is recorded. Next, we randomly permute the values of the kth image feature across the OOB samples, which are fed to the tree again to measure a new prediction error rate. The average increase in the error rate across all trees is used as a measure of the importance of the kth image feature in the random forest. The key merit of bagging in a random forest is the ability to take the ensemble average of decorrelated trees and thus reduce the variance – favoring a robust and unbiased feature selection process [27,28]. Our feature ranking shows that the low-resolution features, e.g. geometrical parameters (highlighted in yellow in Fig. 4(a)), have comparably lower importance. In contrast, the texture-rich features (shown in blue in Fig. 4(a)), which can only be retrieved with high-resolution imaging, generally dominate the high-rank positions. For feature selection, we then implement the multi-class SVM with 10-fold cross-validation and evaluate the classification accuracy against the number of selected features, taken in descending order of importance, i.e. starting from the most important feature. We note that the increase in classification accuracy generally slows down when more than 16 features are selected and reaches its maximum when the 26 most important features are chosen (Fig. 4(b)).
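
The ranking-and-selection loop can be sketched as follows with scikit-learn. Note one substitution: scikit-learn evaluates permutation importance on a held-out split rather than on each tree's own OOB samples, so this only approximates the procedure above. X (the n_images × 44 normalized feature matrix) and y (the species labels) are hypothetical names.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=1/3,
                                            stratify=y, random_state=0)
forest = RandomForestClassifier(n_estimators=500, oob_score=True,
                                random_state=0).fit(X_tr, y_tr)

# Permute each feature in turn and record the resulting drop in accuracy
imp = permutation_importance(forest, X_val, y_val, n_repeats=10,
                             random_state=0)
order = np.argsort(imp.importances_mean)[::-1]    # descending importance

# 10-fold cross-validated SVM accuracy vs. number of top-ranked features
curve = [cross_val_score(SVC(kernel="rbf"), X[:, order[:k]], y, cv=10).mean()
         for k in range(1, X.shape[1] + 1)]
```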


Fig. 4 (a) Bar plot of feature importance evaluated by the bagging (bootstrap aggregation) approach based on the random forest algorithm. For each feature (represented by a bar with an assigned number (1-44), as explained in Table 3 in the Appendix), the importance value is the increase in mean squared error (MSE) averaged over all 500 classification trees in the forest, divided by the standard deviation taken over the trees [27,28]. The low-resolution features are highlighted as yellow bars – most of which show low importance for classification. (b) 10-fold cross-validated accuracy of the multi-class SVM against the number of selected features, taken in descending order of importance, i.e. starting from the most important feature.


6. Multi-class SVM classification of phytoplankton

The multi-class SVM classification is implemented with the one-versus-one coding approach [29]. Specifically, we employ the error-correcting output codes (ECOC) multiclass model with a one-vs-one coding matrix [29–31], which constructs N = n(n−1)/2 binary SVM classifiers (n = 14 classes, i.e. N = 91). The one-vs-one ECOC approach is chosen for its high accuracy, at the expense of a marginal increase in computation time due to the large number of binary classifiers [29]. Each binary classifier is trained with two groups of data coded as “+1” and “−1”, and disregards the remaining data, coded as “0”. It then generates an output containing a sign, “+” for one class or “−” for the other, and a value representing the level of confidence, which is compared with the labeled code [31]. In the case of three-class classification, the one-vs-one strategy results in 3 binary SVM classifiers (represented by the columns in Table 2), each of which is assigned a different code word.


Table 2. Assigned Code Words in the One-Vs-One Strategy
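
The structure of Table 2 follows the standard one-vs-one coding convention: each column is one pairwise classifier, with +1 for the first class of the pair, −1 for the second, and 0 for the classes that classifier ignores. A short sketch of how such a coding matrix can be built (the exact sign convention of Table 2 is an assumption):

```python
import numpy as np
from itertools import combinations

def ovo_code_matrix(n_classes):
    """One row of code words per class, one column per pairwise classifier."""
    pairs = list(combinations(range(n_classes), 2))   # n(n-1)/2 classifiers
    M = np.zeros((n_classes, len(pairs)), dtype=int)
    for col, (a, b) in enumerate(pairs):
        M[a, col], M[b, col] = 1, -1
    return M

print(ovo_code_matrix(3))   # the 3-class case of Table 2: 3 binary classifiers
```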

With the loss-weighted decoding scheme based on the hinge loss function, an image is classified as the closest matching class [32], i.e. the one with the smallest difference between the generated output code and the originally assigned code word. The essence of the SVM is to find the best separating hyperplane in a high-dimensional space, with a margin controlled by the cost parameter. Owing to the nonlinear relationship between the features and the class label, each binary SVM classifier first transforms the data into a higher-dimensional space using the kernel function

$$\kappa(\mathbf{x}, \mathbf{y}) = e^{-\gamma \|\mathbf{x} - \mathbf{y}\|^{2}},$$
which is the radial basis function in this case, and then finds the hyperplane. The multiclass SVM model is therefore optimized with cross-validation by grid-searching two important parameters – the kernel scale γ in the kernel function and the cost parameter C – to avoid overfitting [30]. To measure the performance of the optimized SVM model, 10-fold cross-validation is implemented to compute the classification accuracy for each species. It involves three steps: (1) the data set is randomly partitioned into 10 subsets; (2) the SVM model is trained with 9 subsets and validated with the remaining subset to obtain the classification accuracy; and (3) step (2) is repeated for 10 rounds to evaluate the averaged accuracy. The key results of the multi-class one-vs-one SVM classification of the phytoplankton images are shown in Fig. 5.
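
A sketch of the two steps just described – loss-weighted hinge decoding of the binary outputs, and the cross-validated grid search over γ and C – written with scikit-learn. The search ranges are illustrative assumptions, and X_sel (the matrix of the 26 selected features) and y are hypothetical names:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

def lw_hinge_decode(M, f):
    """Pick the class whose code word best matches the classifier outputs f.

    M: (n_classes, N) coding matrix; f: N signed confidence scores.
    Loss-weighted decoding with the hinge loss g(m, f) = max(0, 1 - m*f).
    """
    hinge = np.maximum(0.0, 1.0 - M * f)
    # entries coded 0 carry zero weight, so ignored classes do not contribute
    loss = (np.abs(M) * hinge).sum(axis=1) / np.abs(M).sum(axis=1)
    return int(np.argmin(loss))

# Grid-search the RBF kernel scale (gamma) and cost (C) with cross-validation
grid = GridSearchCV(SVC(kernel="rbf", decision_function_shape="ovo"),
                    param_grid={"C": np.logspace(-2, 3, 6),
                                "gamma": np.logspace(-3, 2, 6)},
                    cv=10).fit(X_sel, y)

# 10-fold cross-validated accuracy of the tuned model
acc = cross_val_score(grid.best_estimator_, X_sel, y, cv=10)
print(grid.best_params_, acc.mean())
```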


Fig. 5 (a) The confusion matrix and (b) the per-species accuracy of the classification of time-stretch phytoplankton images based on the optimized SVM model. (c) Comparison of the multi-class SVM classification accuracy using only the low-resolution features (i.e. the geometry and shape features highlighted in yellow in Fig. 4(a)) versus the selected 26 features. The labels refer to 1: Selenastrum capricornutum; 2: Chlamydomonas; 3: Scenedesmus; 4: Chaetoceros gracilis; 5: Gymnodinium; 6: Navicula; 7: Prorocentrum; 8: Thalassiosira; 9: Merismopedia; 10: Spirulina major; 11: Chlorella; 12: Euglena; 13: Synura; 14: Lyngbya.


In general, the averaged 10-fold cross-validated classification accuracy across all 14 species is 94.7%, and the accuracies for individual species are all well above 80% (Figs. 5(a)-5(b)). It is noteworthy that classification based on only the low-resolution features, primarily extracted from the binary mask (highlighted in yellow in Fig. 4(a)), yields accuracies below 60% for most of the algal species. This clearly demonstrates the significance of high-resolution time-stretch imaging for accurate and precise multi-class classification at high measurement throughput. Several species show more than 5% misclassification and are confused with other species, notably Navicula with Chaetoceros gracilis, and Prorocentrum with Thalassiosira (Figs. 5(a)-5(b)). The misclassification may primarily be attributed to the small cell sizes of these species, e.g. Navicula and Chaetoceros gracilis (average diameter < 5 µm), which leave too few resolvable image pixels (~3 diffraction-limited resolvable points) to reveal rich morphological features. Moreover, some species have similar texture heterogeneities that render the present geometric and morphological descriptors insufficiently sensitive to distinguish them at high accuracy. Nevertheless, this problem could be mitigated by exploiting texture/feature information in other forms or domains (apart from the techniques described in this work, i.e. Table 1 and Fig. 4), namely simple Fourier analysis, wavelet texture analysis (WTA) [33], direct multivariate image analysis (MIA) [34], or even convolutional neural networks [35]. This could be significant for deeper multi-class classification, not only at the species or genus level, but also at the sub-type level within the same class of population, thus allowing higher classification accuracy and specificity.

To exemplify this idea, we examine the time-stretch images of the classified Scenedesmus and identify a small sub-population (~24%) showing hollow gelatinous structures, in stark contrast to the majority of the population, which is rich in texture (Fig. 6). However, these two sub-populations are not distinguished by the present SVM model, despite the high classification accuracy of Scenedesmus (95.3%). This implies that additional features are required to identify these sub-populations. To this end, we explore the spatial-frequency domain of the images, i.e. we perform 2D Fourier analysis of individual Scenedesmus images. By extracting the averaged power within three annular rings of the 2D power spectrum and using them as new classifiers, the dense and hollow sub-types become well distinguishable (Fig. 6). We anticipate that deeper learning could be achieved by further exploiting various feature domains, as mentioned in [33–36].
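
A minimal sketch of these spatial-frequency features: take the 2D power spectrum of each segmented image and average the power inside three annular bands. The ring radii below are illustrative assumptions, not the values used in Fig. 6:

```python
import numpy as np

def ring_power(img, bands=((2, 8), (8, 16), (16, 32))):
    """Mean log-power of the 2D Fourier spectrum in three annular bands."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    cy, cx = spec.shape[0] // 2, spec.shape[1] // 2
    yy, xx = np.ogrid[:spec.shape[0], :spec.shape[1]]
    r = np.hypot(yy - cy, xx - cx)                 # radius in frequency pixels
    return [float(np.log10(spec[(r >= lo) & (r < hi)].mean() + 1e-12))
            for lo, hi in bands]
```

Scattering the three band values per cell then separates the dense and hollow sub-types, as in the scatter plot of Fig. 6.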


Fig. 6 Scatter plot for sub-type classification of the dense (blue) and hollow (red) types of Scenedesmus based on the averaged power values of three annular-ring bands in the 2D Fourier transform (FT) of the images (highlighted as yellow circles in the power spectra; also indicated as 1, 2, 3 at the top left corner of the scatter plot). Selected images and the corresponding 2D Fourier transforms of the dense type and the hollow gelatinous type of Scenedesmus are also shown.


7. Concluding remarks

To fully exploit the capability of ultrafast single-cell imaging for practical high-throughput imaging flow cytometry, we have demonstrated an optofluidic time-stretch imaging system that achieves image resolution and contrast high enough to visualize the sub-cellular complexity of single cells – a critical feature enabling accurate multi-class classification and analysis that reveal the heterogeneity across, and even within, cell populations. Specifically, the system was tested on classifying 14 species of phytoplankton based on a multi-class SVM, with a high accuracy of 94.7% at an imaging throughput beyond 10,000 cells/s.

The current image-based classification relies entirely on bright-field image contrast. Other enhanced image contrasts can also be accessed to facilitate better classification, such as differential interference contrast [37], phase-gradient contrast [3] and quantitative phase contrast [38, 39]. Notably, quantitative phase imaging has been proven to quantify the protein content as well as the biophysical properties of cells [38–40]. Such additional parameters could also be extracted for single-cell analysis of intracellular biochemical content, e.g. lipid content, and for improving the classification accuracy. This is supported by a recent work demonstrating interferometry-based quantitative phase time-stretch imaging for binary classification of cells using a neural network algorithm [6]. Once high-resolution capability is enabled in quantitative phase time-stretch imaging, deep neural networks for multi-class classification could readily become feasible. In addition, recent efforts have shown that time-stretch imaging can be made even more versatile by incorporating fluorescence detection. As a result, not only the biophysical and morphological information but also the biomolecular signatures of single cells, e.g. chlorophyll or other fluorescent markers in the phytoplankton, can be extracted at high throughput [41] – further enriching the collection of features available for classification.

The current work could find immediate application in advancing environmental and toxicological studies of phytoplankton, which are intertwined with algal blooms on the local scale as well as climate change on the global scale. In contrast to traditional methods of microalgae enumeration, identification and sizing, which are laborious, time-consuming and prone to erroneous classification, time-stretch imaging flow cytometry could enable automated real-time monitoring and detection of HABs (or red tides) [11] based on deep learning over an enormous number of single-phytoplankton images. Moreover, it can also be utilized for large-scale screening and characterization of naturally occurring high-lipid microalgal species – a critical step in the design and optimization of biofuel production strategies [12].

Appendix


Table 3. Summary of Features Extracted from each Time-stretch Image

Remark

  • 1. N × N are the dimensions of the time-stretch image, and I_n(i,j) is the function describing the object label map; x and y are the spatial coordinates in the image.
  • 2. The probability density function is defined as

    $$\mathrm{pdf} = P(g) = h(g)/M$$

    where h(g) is the number of pixels with grey level g, which ranges from 0 to L−1, and M is the total number of pixels inside the mask.

  • 3. The grey-level co-occurrence matrix (GLCM) is defined as

    $$M_{CM}(i,j) = \sum_{x=1}^{N} \sum_{y=1}^{N} \begin{cases} 1 & \text{if } I(x,y) = i \text{ and } I(x+d_x, y+d_y) = j \\ 0 & \text{otherwise} \end{cases}$$

    where x, y are the spatial coordinates in the image, (d_x, d_y) is the specified offset (distance and direction) between the pixel pair, and the grey levels i, j range from 0 to L−1.

    µ is the mean, such that

    $$\mu_x = \sum_{i=0}^{L-1} i \sum_{j=0}^{L-1} M_{CM}(i,j) \quad \text{and} \quad \mu_y = \sum_{j=0}^{L-1} j \sum_{i=0}^{L-1} M_{CM}(i,j)$$

    and σ is the standard deviation, such that

    $$\sigma_x = \sqrt{\sum_{i=0}^{L-1} (i-\mu_x)^2 \sum_{j=0}^{L-1} M_{CM}(i,j)} \quad \text{and} \quad \sigma_y = \sqrt{\sum_{j=0}^{L-1} (j-\mu_y)^2 \sum_{i=0}^{L-1} M_{CM}(i,j)}$$

    where i, j are the coordinates in the GLCM.
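
As a numerical cross-check of the definitions above, the marginal means and standard deviations can be computed from a normalized GLCM in a few lines (a sketch, assuming P sums to 1):

```python
import numpy as np

def glcm_mean_std(P):
    """Marginal means/standard deviations of a normalized L x L GLCM P."""
    g = np.arange(P.shape[0])
    mu_x = (g * P.sum(axis=1)).sum()        # weight row index by row sums
    mu_y = (g * P.sum(axis=0)).sum()        # weight column index likewise
    sd_x = np.sqrt(((g - mu_x) ** 2 * P.sum(axis=1)).sum())
    sd_y = np.sqrt(((g - mu_y) ** 2 * P.sum(axis=0)).sum())
    return mu_x, mu_y, sd_x, sd_y
```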

Funding

Research Grants Council of the Hong Kong Special Administrative Region of China (HKU 17208414, HKU 7172/12E, HKU 717911E, HKU 720112E); the University Development Funds of HKU.

Acknowledgment

We would like to thank Dr. Jianglai Wu for fruitful discussions on phytoplankton culture and Dr. Andy K. S. Lau for discussions on the instrumentation. We would also like to thank Mr. Richard W. W. Yan and Mr. Bob M. F. Chung for designing and fabricating the custom microfluidic channel.

References and links

1. K. Goda and B. Jalali, “Dispersive Fourier transformation for fast continuous single-shot measurements,” Nat. Photonics 7(2), 102–112 (2013). [CrossRef]  

2. K. Goda, A. Ayazi, D. R. Gossett, J. Sadasivam, C. K. Lonappan, E. Sollier, A. M. Fard, S. C. Hur, J. Adam, C. Murray, C. Wang, N. Brackbill, D. Di Carlo, and B. Jalali, “High-throughput single-microparticle imaging flow analyzer,” Proc. Natl. Acad. Sci. U.S.A. 109(29), 11630–11635 (2012). [CrossRef]   [PubMed]  

3. T. T. Wong, A. K. Lau, K. K. Ho, M. Y. Tang, J. D. Robles, X. Wei, A. C. Chan, A. H. Tang, E. Y. Lam, K. K. Wong, G. C. Chan, H. C. Shum, and K. K. Tsia, “Asymmetric-detection time-stretch optical microscopy (ATOM) for ultrafast high-contrast cellular imaging in flow,” Sci. Rep. 4, 3656 (2014). [CrossRef]   [PubMed]  

4. C. Lei, B. Guo, Z. Cheng, and K. Goda, “Optical time-stretch imaging: Principles and applications,” Appl. Phys. Rev. 3(1), 011102 (2016). [CrossRef]  

5. A. K. Lau, H. C. Shum, K. K. Wong, and K. K. Tsia, “Optofluidic time-stretch imaging - an emerging tool for high-throughput imaging flow cytometry,” Lab Chip 16(10), 1743–1756 (2016). [CrossRef]   [PubMed]  

6. C. L. Chen, A. Mahjoubfar, L.-C. Tai, I. K. Blaby, A. Huang, K. R. Niazi, and B. Jalali, “Deep Learning in Label-free Cell Classification,” Sci. Rep. 6, 21471 (2016). [CrossRef]   [PubMed]  

7. L. Graham, J. Graham, and L. Wilcox, Algae (Benjamin Cummings (Pearson), 2009).

8. D. Søballe and B. Kimmel, “A large-scale comparison of factors influencing phytoplankton abundance in rivers, lakes, and impoundments,” Ecology 68(6), 1943–1954 (1987). [CrossRef]  

9. X. Irigoien, J. Huisman, and R. P. Harris, “Global biodiversity patterns of marine phytoplankton and zooplankton,” Nature 429(6994), 863–867 (2004). [CrossRef]   [PubMed]  

10. Z. V. Finkel, J. Beardall, K. J. Flynn, A. Quigg, T. A. V. Rees, and J. A. Raven, “Phytoplankton in a changing world: cell size and elemental stoichiometry,” J. Plankton Res. 32(1), 119–137 (2010). [CrossRef]  

11. M. Babin, J. C. Cullen, C. S. Roesler, P. L. Donaghay, G. J. Doucette, M. Kahru, M. R. Lewis, C. A. Scholin, M. E. Sieracki, and H. M. Sosik, “New approaches and technologies for observing harmful algal blooms,” Oceanography (Wash. D.C.) 18(2), 210–227 (2005). [CrossRef]  

12. T. M. Mata, A. A. Martins, and N. S. Caetano, “Microalgae for biodiesel production and other applications: a review,” Renew. Sustain. Energy Rev. 14(1), 217–232 (2010). [CrossRef]  

13. M. Benfield, P. Grosjean, P. Culverhouse, X. Irigolen, M. Sieracki, A. Lopez-Urrutia, H. Dam, Q. Hu, C. Davis, A. Hanson, C. Pilskaln, E. Riseman, H. Schulz, P. Utgoff, and G. Gorsky, “RAPID: research on automated plankton identification,” Oceanography (Wash. D.C.) 20(2), 172–187 (2007). [CrossRef]  

14. J. Lund, C. Kipling, and E. Le Cren, “The inverted microscope method of estimating algal numbers and the statistical basis of estimations by counting,” Hydrobiologia 11(2), 143–170 (1958). [CrossRef]  

15. N. S. Barteneva and I. A. Vorobjev, Imaging Flow Cytometry (Springer New York, 2016).

16. H. M. Sosik and R. J. Olson, “Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry,” Limnol. Oceanogr. Methods 5(6), e216 (2007). [CrossRef]  

17. R. J. Olson and H. M. Sosik, “A submersible imaging-in-flow instrument to analyze nano-and microplankton: Imaging FlowCytobot,” Limnol. Oceanogr. Methods 5(6), 195–203 (2007). [CrossRef]  

18. J. Wu, J. Li, and R. K. Chan, “A light sheet based high throughput 3D-imaging flow cytometer for phytoplankton analysis,” Opt. Express 21(12), 14474–14480 (2013). [CrossRef]   [PubMed]  

19. E. J. Buskey and C. J. Hyatt, “Use of the FlowCAM for semi-automated recognition and enumeration of red tide cells (Karenia brevis) in natural plankton samples,” Harmful Algae 5(6), 685–692 (2006). [CrossRef]  

20. N. J. Poulton, “FlowCam: Quantification and Classification of Phytoplankton by Imaging Flow Cytometry,” in Imaging Flow Cytometry: Methods and Protocols (Springer, 2016).

21. K. Rodenacker, B. Hense, U. Jütting, and P. Gais, “Automatic analysis of aqueous specimens for phytoplankton structure recognition and population estimation,” Microsc. Res. Tech. 69(9), 708–720 (2006). [CrossRef]   [PubMed]  

22. X. Wei, A. K. Lau, T. T. Wong, C. Zhang, K. M. Tsia, and K. K. Wong, “Coherent laser source for high frame-rate optical time-stretch microscopy at 1.0 μm,” IEEE J. Sel. Top. Quantum Electron. 20(5), 384–389 (2014). [CrossRef]  

23. A. K. Lau, A. H. Tang, J. Xu, X. Wei, K. K. Wong, and K. K. Tsia, “Optical Time Stretch for High-Speed and High-Throughput Imaging—From Single-Cell to Tissue-Wide Scales,” IEEE J. Sel. Top. Quantum Electron. 22(4), 1–15 (2016). [CrossRef]  

24. K. K. Tsia, K. Goda, D. Capewell, and B. Jalali, “Performance of serial time-encoded amplified microscope,” Opt. Express 18(10), 10016–10028 (2010). [CrossRef]   [PubMed]  

25. M.-K. Hu, “Visual pattern recognition by moment invariants,” IRE Trans. Inf. Theory 8(2), 179–187 (1962). [CrossRef]  

26. R. M. Haralick, K. Shanmugam, and I. H. Dinstein, “Textural features for image classification,” IEEE Trans. Syst. Man Cybern. 6(6), 610–621 (1973). [CrossRef]  

27. L. Breiman, “Random forests,” Mach. Learn. 45(1), 5–32 (2001). [CrossRef]  

28. I. Guyon, M. Nikravsh, S. Gunn, and L. A. Zadeh, Feature Extraction (Springer, 2006), Chap. 12.

29. C.-W. Hsu and C.-J. Lin, “A comparison of methods for multiclass support vector machines,” IEEE Trans. Neural Netw. 13(2), 415–425 (2002). [CrossRef]   [PubMed]  

30. N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods (Cambridge University Press, 2000).

31. E. L. Allwein, R. E. Schapire, and Y. Singer, “Reducing multiclass to binary: A unifying approach for margin classifiers,” J. Mach. Learn. Res. 1, 113–141 (2000).

32. S. Escalera, O. Pujol, and P. Radeva, “On the decoding process in ternary error-correcting output codes,” IEEE Trans. Pattern Anal. Mach. Intell. 32(1), 120–134 (2010). [CrossRef]   [PubMed]  

33. A. Laine and J. Fan, “Texture classification by wavelet packet signatures,” IEEE Trans. Pattern Anal. Mach. Intell. 15(11), 1186–1191 (1993). [CrossRef]  

34. M. Bharati and J. MacGregor, “Multivariate image analysis for real-time process monitoring and control,” Ind. Eng. Chem. Res. 37(12), 4715–4724 (1998). [CrossRef]  

35. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems 25 (NIPS, 2012), pp. 1097–1105.

36. T. Chang and C. J. Kuo, “Texture analysis and classification with tree-structured wavelet transform,” IEEE Trans. Image Process. 2(4), 429–441 (1993). [CrossRef]   [PubMed]  

37. A. M. Fard, A. Mahjoubfar, K. Goda, D. R. Gossett, D. Di Carlo, and B. Jalali, “Nomarski serial time-encoded amplified microscopy for high-speed contrast-enhanced imaging of transparent media,” Biomed. Opt. Express 2(12), 3387–3392 (2011). [CrossRef]   [PubMed]  

38. A. K. Lau, T. T. Wong, K. K. Ho, M. T. Tang, A. C. Chan, X. Wei, E. Y. Lam, H. C. Shum, K. K. Wong, and K. K. Tsia, “Interferometric time-stretch microscopy for ultrafast quantitative cellular and tissue imaging at 1 μm,” J. Biomed. Opt. 19(7), 076001 (2014). [CrossRef]   [PubMed]  

39. A. Mahjoubfar, C. Chen, K. R. Niazi, S. Rabizadeh, and B. Jalali, “Label-free high-throughput cell screening in flow,” Biomed. Opt. Express 4(9), 1618–1625 (2013). [CrossRef]   [PubMed]  

40. G. Popescu, Quantitative Phase Imaging of Cells and Tissues (McGraw Hill Professional, 2011).

41. M. Ugawa, C. Lei, T. Nozawa, T. Ideguchi, D. D. Carlo, S. Ota, Y. Ozeki, and K. Goda, “High-throughput optofluidic particle profiling with morphological and chemical specificity,” Opt. Lett. 40(20), 4803–4806 (2015). [CrossRef]   [PubMed]  




