## Abstract

Quantification of protein abundance and subcellular localization dynamics from fluorescence microscopy images is of high contemporary interest in cell and molecular biology. For large-scale studies of cell populations and for time-lapse studies, such quantitative analysis can not be performed effectively without some kind of automated image analysis tool. Here, we present fast algorithms for automatic cell contour recognition in bright field images, optimized to the model organism budding yeast (*Saccharomyces cerevisiae*). The cell contours can be used to effectively quantify cell morphology parameters as well as protein abundance and subcellular localization from overlaid fluorescence data.

© 2008 Optical Society of America

## 1. Introduction

Fluorescence microscopy is an exceptionally powerful technique to study protein processes *in vivo*. However, due to the qualitative and subjective nature of human interpretation of images, quantitative data analysis is not straightforward. In addition, for statistical or high throughput measurements, manual analysis becomes highly time consuming. Automated image analysis opens up for advanced studies in the diversity of cell populations, i.e. characterization of phenotype heterogeneity, or single cell studies *in vivo*. This type of studies have recently attracted significant attention, and have provided key insights in for example, gene expression noise [1, 2] and cell cycle regulation [3].

In this report, novel algorithms designed for automatic analysis of bright field and fluorescence images of budding yeast (*Saccharomyces cerevisiae*) are presented. Budding yeast is one of the most important model organisms for cell and molecular biology [4, 5], with advantages of being easy to handle and modify genetically. In this context, it also benefits from the library of chromosomally tagged Green Fluorescence Protein (GFP) fusion proteins that became publicly available in 2003 [6]. The main principle behind our approach is to use bright field images for cell recognition and then transfer the extracted contours to the corresponding fluorescence images for protein analysis. Hence, our method does not rely on staining of the cell membrane and it is therefore suitable for *in vivo* studies.

Automatic identification and recognition of cells in microscopy images has been reported before, but rather few authors have addressed the entire process from image to determination of precise cell contours of individual cells. Chen *et al.* [7] used a machine learning approach to automatically find approximate cell boundaries in a graphical model representation. de Carvalho *et al.* [8] used the watershed transform applied on a topology based on a combination of gray-scale images and various hierarchical and geometric properties of the cells. Niemistö [9] also used a watershed approach for separating yeast cells prior to localization of small fluorescent organelles. Rue and Husby [10] successfully used an advanced deformable template model for the near-circular cells of interest in order to identify fluorescently labeled contours of cells even when the borders are in part not visible. The drawback with their scheme is its huge computational complexity and the computations had to be carried out using a Markov chain Monte Carlo (MCMC) algorithm. Mardia *et al.* [11] used a similar approach to detect partially occluded objects. Software for automatic analysis of fluorescence microscopy images of budding yeast has been presented by other groups [12, 13]. Miroshita and co-workers developed an image analysis program for cell morphology characterization of gene-deletion strains [12]. However, their algorithm requires the cell membrane to be stained with a fluorescent dye. The open-source software Cell-ID [13], on the other hand, uses bright field images for cell recognition. The actual image analysis methods used have not yet been published.

We here present algorithms for the entire process of image analysis, from thresholding and segmentation of images, to refined precise single cell contours. The main novel contribution in this work is the method for refined cell contour extraction. We utilize a shape model similar to the deformable template model used by Rue and Husby [10], with constraints between three consecutive points on the cell contour. Our model, on the other hand, enables computations of much lower computational complexity and, hence, higher speed. Figure 1 illustrates the main steps of our method. The initial steps involve segmentation using a new adaptive thresholding technique, for finding suitable candidate cell centers, see Figs. 1(a)–(c). The final step is the cell contour extraction, where a more refined cell contour is searched for. The final resulting contours are shown in Fig. 1(d). The cell contour extraction algorithm computes the optimal solution in polynomial time, using a dynamic programming scheme. The high speed of the algorithm makes it suitable for studies where large data sets are involved, such as time-lapse or high-throughput studies of cell populations. Our method is shown to be fast and, for the kind of images considered, we found that 96% of the cells were satisfactorily recognized. The algorithms are also found to be robust and adaptive, in the sense that they work without human intervention and are able to identify cells with high precision even in the presence of several neighboring cells, at different illumination levels or if the cells are slightly out of focus.

The paper is organized as follows: first the experimental and biological setup, used for producing the images in this paper, is presented. The image analysis methods are described in the subsequent section which is the main part and focus of the paper. We then evaluate our image analysis method, and exemplify its use in a biological application. Details regarding the cell contour algorithm have been moved to the Appendix.

## 2. Methods

#### 2.1. Image acquisition

Images were collected using an inverted epi-fluorescence microscope (Nikon TE2000E) equipped with a back illuminated, electron multiplying gain CCD camera (Ixon DV887-DCSBV, Andor Technology). Bright field images, used to describe the image analysis method, were recorded using either a 60X objective (Plan Apo, NA=1.40) or a 100X objective (Plan Apo, NA=1.45). Images used in the rest of the paper were recorded with the 60X objective. For the biological study, images were recorded in bright field and fluorescence mode every 10 minutes for 6 hours. To ensure that all nuclei were recorded, the cells were imaged in three image planes with a separation of approximately 2 *µ*m. A mercury lamp was used for fluorescence excitation and a filter set (excitation: HQ470/40x, emission: HQ515/30m, dichroic: 505DCLP, Chroma Corporation, USA) were used for GFP imaging. The fluorescence images were recorded with an integration time of 500 ms using an excitation intensity of approximately 0.1 W/cm^{2} at the sample.

#### 2.2. Strains and cultivation

TheMCM4 strain was obtained from the GFP-library [6]. The cells were kept at agar plates with YPD medium containing 1% (w/v) yeast extract (Merck), 2% (w/v) Bacto-peptone (Merck), and 2% (w/v) glucose (Merck). For microscopy studies, cells were taken from the plate and placed in YNB medium that was composed of 0.17% (w/v) yeast nitrogen base without amino acids (Difco), 0.5% (w/v) ammonium sulfate (Merck), 2% (w/v) glucose and aminoacid supplement lacking histidine (BIO101 Inc.). The pH of all media was adjusted to 6.0. The cells were pre-grown using an orbital shaker at 30°C. At early log phase, 10 *µ*l of the suspension was placed between two cover slides separated by the secure seal spacer S24737 from Molecular Probes, creating an enclosed well. The cells were kept at approximately 20 °C during the experiments.

## 3. Cell contour recognition algorithm

#### 3.1. Adaptive thresholding and segmentation

The segmented image is computed via thresholding on the gradient image *I _{mag}*, which is computed from the original bright field image

*I*. The reasons for using

*I*are twofold. First of all, the distribution of gradient pixel values is uni-modal, which makes it easier to find a suitable threshold, see Fig. 2. Operating on

_{mag}*I*also has the benefit of making the segmentation invariant to differences in the overall illumination level of the image. In practice, this invariance also holds for a slowly varying background illumination level. Regions in the image where cell contours are located typically exhibit a large gradient magnitude. The opposite is true for background regions. This relationship between

_{mag}*I*and

*I*can be seen by comparing Figs. 1(a) and (b).

_{mag}The gradient image is computed using Prewitt’s method [14]. The original bright field image *I* is filtered with two masks, one for each direction, where each mask enhances differences in the corresponding direction. This results in two new images, *I _{x}* and

*I*, each being a measure of the pixelwise variation in the

_{y}*x*and

*y*directions, respectively. After this, we let

*I*=(

_{mag}*I*

^{2}

_{x}+

*I*

^{2}

_{y})

^{1/2}for each pixel location. Pixels with gradient values above a certain threshold

*β*will now be assigned as foreground regions. In Fig. 1(c) we show the thresholded gradient image after filling of holes and running a morphological opening [15].

The crucial part in the segmentation step is to automatically find the threshold *β*. It is particularly important to adapt to the illumination level and the density of cells in the image. Depending on the experimental setup, the preferred threshold level may vary substantially. Adaptive thresholding techniques found in the literature, e.g. Otsu’s nonparametric method [16] or the parametric method of fitting the histogram to a mixture of Gaussians [14], typically rely on bimodality of the histogram or that pixel values have a specific distribution. From the histograms in Fig. 2, we see that bi-modality is not the case here. Furthermore, the distribution for large gradient pixel values is hard to categorize. Although not displayed in the figure, the apparent uniformity for large values continuous up to values above 200 where it finally starts to decline. The maximum value of these particular histograms are 588 and 770. The method we present here does not make any assumptions on the distribution of gradient pixel values of large magnitude, the only assumption made is that the distribution of noise in background regions are approximately normal distributed.

Gradient values below the median, shown as vertical lines in Figs. 2(b) and (d), are fitted to a Rayleigh distribution function,

where *σ* is the parameter that is varied until the best fit is found and *x* is the pixel value. The rationale behind this procedure is that gradient pixel values of *I _{mag}* will be Rayleigh distributed under the assumptions that the values of

*I*and

_{x}*I*are identical, independent, and normally (Gaussian) distributed [17]. Even though this is a crude assumption, we see from the fitted curves in Figs. 2(b) and (d), that Eq. (1) gives an excellent description of the gradient value distribution below the median.

_{y}The parameter *σ* constitutes a good measure of the pixel intensity variation for low gradient pixel values, i.e. for pixels in the background regions. Therefore, pixels which have gradient values far larger than *σ* should be rejected as background and instead be considered as foreground. Here we let the threshold relate to *σ* through *β*=7.5*σ*. This corresponds to declaring pixels with gradient values more than 6 standard deviations larger than the mean of the estimated distribution of background pixels as foreground pixels. It should be mentioned that it is not necessary to specifically use the median as the delimiter for fitting the Rayleigh distribution.

To obtain the final segmented image, the holes (black regions encircled by white regions) in the thresholded image are filled. Furthermore, in order to remove small structures due to noise, we apply a morphological opening [15] with a circular structure element of radius 4 pixels. In Fig. 3, these two steps are illustrated for the image in Fig. 1.

#### 3.2. Finding candidate cell centers

Since each connected component in the segmented image represents a possible cluster of cells, these clusters need to be disconnected into individual cells. This is accomplished by searching for circle segments along the outer contours of objects in the segmented image, using a variation of the circular Hough transform [14]. The principle is illustrated in Fig. 4. While traversing the outer boundary of a connected region in the segmented image, an approximate normal to the boundary is computed at each point. Along each normal, we step-wise go inwards and register the distances to the current boundary point. In this way, an accumulation matrix is formed where each element holds the number of times it has been “hit” by a normal from the boundary and keeps track of the corresponding distances. When each point along a cell cluster boundary has been visited, we look for local maxima in this accumulation matrix. The resulting circles in Fig. 1(c) are displayed in green. The centers of the circles will then act as candidate cell centers for the final contour extraction.

A common approach for finding suitable object centers from segmented images in similar applications [9, 15] is to look for local maxima in the distance transform of the segmented image. However, the distance transform only works when cells are sparsely clustered, up to approximately three or four cells in each cluster. Although somewhat more computationally intensive, the method described above works for heavier clustering and is hence more generally applicable. Nevertheless, both approaches depend on that all cells must have at least a part of the boundary exposed to the background region in the segmented image.

#### 3.3. Cell contour extraction using dynamic programming

Reasonable yeast cell contours are fairly smooth and, at least to a certain degree, convex in shape. In order to find the “most suitable” cell contour, such shape constraints should be balanced with the data in the image, for example gradients and directional derivatives. In the literature, a shape model that is adapted according to surrounding edges is often referred to as a deformable template, see e.g. [18, 11, 10]. The reason behind using a shape model is that it will guide the solution to pick the edges which correspond best to the model, when image information is either missing or when it contains inconsistent and ambiguous edges. Examples are loss of contrast at a specific part of a cell boundary or nearby occluding cells. Concerning the image information part of the cell extraction, it turns out that the dark diffraction fringe that is evident around yeast cells imaged in bright field, is located at approximately the same distance to the true cell contour, irrespective of the distance from the focal plane. In this paper we utilize this feature to locate cell contours, but this can easily be modified depending on the application.

In principle, each closed sequence of connected pixels surrounding a candidate cell center is a candidate for a cell contour, and therefore the space of possible contours is immense. In order to regularize the problem, we introduce a reference system in the form of a polar plot relative to each candidate center. Figure 5(a) displays the directional derivatives *I _{x}* and

*I*at

_{y}*M*=32 rays emanating from a candidate cell center, where each ray is sampled at

*N*=30 equally spaced radial points. Here we let suitable contour points to be those that are located in between positions with large derivatives directed in opposite directions. This conforms with the notion that the true cell boundary lies at the dark fringe around the cell. Formally,

*f*(

*r*,

*ϕ*) denotes the contour point criterion function for a point at distance

*r*and angle

*ϕ*to be a contour point. The angle

*ϕ*is measured in the counter-clockwise direction, starting at the black arrow in Fig. 5(a). Let

for *i*∊{1,2,…,*N*} and *m*∊{1,2,…,*M*}. *I _{x}* (

*r*,

*ϕ*) and

*I*(

_{y}*r*,

*ϕ*) are interpolated values (where needed) at distance

*r*and angle

*ϕ*from the candidate cell center in the derivative images

*I*and

_{x}*I*, respectively. The cosine and sine operations correspond to a projection along the current ray. This will result in less impact of edges that are not oriented in the same direction as the ray.

_{y}In Fig. 5(b), the criterion function *f* (*r*,*ϕ*) is displayed for the candidate center in Fig. 5(a). Bright pixels in Fig. 5(b) represent high values of *f* and hence “good” positions. The idea is now to pick a path from left to right in this polar plot, visiting each column once. To find the optimal path, we use dynamic programming [19]. Note that simply taking the maximum point *r* for each ray in direction *ϕ* will in general not produce a continuous and closed contour.

When picking a path from left to right in the polar plot, restrictions on radial transitions will affect the variety of shapes of the final contour. In other words, the restrictions will define our deformable template model. When using dynamic programming [19], it is only possible to put local restrictions on these transitions; they cannot for example depend on the entire path up to the present angle *ϕ*. Global shape constraints, such as convexity, are therefore not possible to enforce directly. It is not even possible to guarantee that the final contour will be closed, i.e. that it starts and ends in the same radial position. In the special cases of enforcing convexity and closedness, there are however methods to circumvent this problem.

Here we use a method that penalizes transitions in the polar plot that correspond to right turns in the original image coordinates as we encircle the candidate center in a counter-clockwise direction. If we in addition can guarantee that the resulting contour will be closed, this local convexity condition will produce a globally convex contour, i.e. the contour of a shape that is geometrically convex. Closedness can in turn be enforced implicitly by running the algorithm separately for each of the *N* possible starting points. Nevertheless, we use a heuristic algorithm that runs the dynamic programming scheme for three consecutive laps and uses the middle part of the final path as the solution. Details regarding the dynamic programming scheme and the transition conditions are presented in the Appendix.

In Fig. 6, we demonstrate the contour extraction method. In the upper example, we used a simple transition rule that only allows straight or diagonal transitions between consecutive columns in the polar plot. In the lower example, we used a condition that penalizes transitions that corresponds to making right-turns, also called the local convexity condition. With the latter condition, the algorithm manages to retrieve the contour even though the candidate center is heavily off-centered. This is the typical behavior of the algorithm. By using the local convexity rule, the deformable template is flexible enough to handle heavily off-centered candidate cell centers while still being able to enforce the convexity of the resulting contour. Hence, besides being more theoretically appealing than the transition rule exemplified in the upper example of Fig. 6 which is commonly used in contour extraction using dynamic programming [20], the local convexity rule also has increased robustness towards poor positioning of candidate cell centers. In both examples, we used the heuristic three laps-methods for obtaining closed contours, where the middle part (between the two vertical red lines) is used as optimal path and displayed in the corresponding images on the right. Note that only using one lap in the polar plot of Fig. 6, the upper example would not have resulted in a closed contour, since the optimal path deviates upwards from the correct one as it approaches the boundary on the right.

## 4. Performance

The cell contour recognition algorithms were run on 1163 cells in 25 independent images, each containing between 24 to 65 cells. Typical images exhibited a relatively low degree of cell clustering and cell density and are well represented by the examples in Figs. 1 and 2. The density and clustering of the images in the study are comparable to the density and clustering in the example images of the references [13, 7, 8].

Visual examination of the defined cell contours showed that 96% were correctly defined, and the algorithms were proven to be robust both in regards to differences in cell sizes as well as the degree of off focus positioning. The remaining 4% of the cells were either missed, incorrectly defined, or were false hits. These classes are exemplified in Figs. 7(a)–(d). Cells are missed mainly as a consequence of being part of a cluster. Even if they are not completely surrounded by cells, the outer border of cells located inside clusters could be too short for a candidate cell center to be defined using the method illustrated in Fig. 4. This is case for the missed cells in Fig. 7(d). Incorrectly defined contours are generally only generated for buds. Here, it could be argued that buds should not be included as a separate cell at all. However, if they should, a more narrow range of possible radial distances in the refined contour extraction step should be used for buds. In either case, a last step could fairly easily be included, discarding unreasonable contours, e.g. those that are too small or have a too large overlap with other cells. In this way, the number of incorrect contours and false hits could be reduced.

To estimate the speed of the algorithms, cpu-times were measured with the algorithms running on a PC with an Intel Duo-Core CPU at 1.66GHz with 2GB RAM, in Windows XP and Matlab R2006a. Time-critical algorithms were implemented in C and linked to Matlab using the mex-utility. Cpu-times were measured by alternate use of the “tic” and “toc” commands in Matlab. For three example images, containing 49, 96 and 173 cells, the cpu-times for cell recognition from bright field image to refined cell contours were measured to be approximately 1.4, 2.0 and 2.8 seconds, respectively. The convex contour extraction part, using the heuristic three-laps method, takes about 10 ms per candidate center to compute for *M*=32 and *N*=30.

## 5. Example of application

After successful cell contour recognition, analysis of individual cells from transmission and fluorescence images becomes straightforward. This can be used to measure and quantify cell specific parameters, such as cell size or fluorescence intensity. Contour recognition can also be used as the basic step to more advanced image analysis, e.g. determining the subcellular localization of the fluorescence signal. By matching cell contours in time-lapse studies, individual cell quantities can be analyzed as a function of time.

We here illustrate the applicability and functionality of the method through some examples from a pilot study of protein translocation between cytoplasm and nucleus. Specifically, we analyzed the localization of the minichromosome-maintenance protein Mcm4p over the cell cycle for individual cells. As part of the prereplication complexes (pre-RC), which is assembled at DNA origins in G1 phase to initiate replication, Mcm4p is relatively well studied [21]. In particular, Mcm4p is a suitable test case for measurements and analyzes of dynamic nuclear localization processes of proteins [22, 23, 24]. Using the algorithms described above, various parameters of the Mcm4p translocation processes can be extracted. Here we illustrate this by measuring the temporal variation of Mcm4p nuclear signal and by measuring the residence time of Mcm4p in the nucleus and the cytoplasm. It should, however, be emphasized that the presented analysis only is attended as an inspirational example off application and solid biological conclusions require further study, possible involving synchronized cells.

Figure 8(a) shows the localization dynamics of Mcm4p over approximately one and a half cell cycle. As reported elsewhere [22, 23, 24], Mcm4p enters the nucleus in early G1 phase and is transported out back to the cytoplasm during S phase. Immunoblot analysis has shown that Mcm4p levels remain constant over the cell cycle [22]. As the localization is from the cytoplasm to a single volume within the cells, we can estimate the dynamics of the translocation processes by measuring the fluorescence intensity for individual cells over time. The cell recognition algorithms were first applied to the bright field images using the first image in the z-stack to find cell contours for each time step. Then the identified cell contours were matched throughout the time-stack. In the corresponding fluorescence images, the nucleus is typically found in one of the three z-planes, or sometimes faintly in two of them. We used the mean of the two most intense z-plane’s maximum intensities as a rough estimate of the fluorescence signal in the nucleus. In Fig. 8(b), the intensity variation is plotted for a mother cell and it’s two emerging daughter cells over a time span of 5 hours. The protein accumulation to the nucleus appears to be nearly identical for the mother and the first daughter cell, but the exportation back to the cytoplasm differ substantially. This is probably caused by the prolonged G1 phase known for daughter cells [25]. The cell cycle duration for the mother cell is found to be approximately 3 hours, which is roughly twice the time expected for cells grown under optimal conditions. This is probably due to the low growth temperature, i.e. 20 °C.

Given that the cell contour has been determined it is fairly straightforward to automatically detect fluorescence localization to the nucleus. Except during mitosis, the fluorescence in a nucleus appears as a single circular local maximum with diameter proportional to the cell size [26]. Here, we used algorithms searching for this feature within the detected cell contours as a means of determining Mcm4p nuclear localization at each time point in the time series. The duration of nuclear and cytoplasmic localization was then extracted by measuring the time between the first and second localization transitions. As seen in Fig. 8(c), the examined population, consisting of 90 cells, exhibits a broad distribution of duration times, in particular for nuclear localization. A plot of the duration times versus cell diameter, see Fig. 8(d), indicate that the most deviant duration times correspond to cells smaller than approximately 4.5 *µ*m in diameter, i.e. most probably daughter cells.

## 6. Concluding remarks

We have described a cell contour recognition method that is fast and exhibits excellent performance for yeast cells that are not too clustered. The novel contributions in this method are an adaptive thresholding technique for gradient images and a refined contour extraction algorithm for convex shapes. In the latter case, the use of dynamic programming enables particularly fast computations. The enhanced state space can incorporate transition rules and constraints involving three neighbouring (radial) positions. These rules define and specify a deformable template. By penalizing right-turns in the contour search, convex shapes are favored. This local convexity rule increases robustness towards poor positioning of candidate cell centers. Since the convexity is not related to size, the contour extraction step also works for a wide range of different cell sizes. “Snakes”, as proposed by Xu and Prince [18], among others, similarly specify constraints on three neighbouring positions. The present model, however, obtains the optimal solution in a single sweep through the possible radial positions, whereas “snakes” need several iterations. The method is further regularized through the use of candidate cell centers. This significantly increases both robustness and computational speed, which in turn facilitates automation.

We find that dynamic programming, in conjunction with the local convexity rule, is well suited as a general tool for finding the optimal path surrounding a candidate cell center. The objective function for candidate radial positions along each ray, i.e. Eq. (2), is flexible and can easily be modified. For example, it is possible to add a term that gives larger weights to radial positions located near the boundary of the segmented image. This can be an advantage for clustered cells, for which the resulting contour will rely more on external information through the gradient image at positions that are close to the background. For radial positions within a cluster, on the other hand, the resulting contour relies more on the specification given through the deformable template model.

As an inspirational example, we illustrated the method through an analysis of cell cycle dependent translocation of the protein Mcm4p. This kind of spatio-temporal analysis of protein localizations is substantially simplified by the determination of cell contours. We showed that by measuring the intensity in individual cells, we could follow the increase and decrease in fluorescence signal caused by Mcm4p accumulation to and dispersion from the cell nucleus. For non-migrating proteins, a similar measure of intensity levels in individual cells can be used to study the distribution of expression levels in different populations. This is similar to what can be done using flow cytometry, but with the advantage that each cell can be followed over time. In the last example, we measured the cell sizes to automatically discriminate between daughter and mother cells. This is useful for example when studying cell cycle regulation, aging or size phenotypes.

Even though the presented method exhibits excellent overall performance, possibilities for a number of future improvements exist. A refined algorithm for identifying candidate cell centers could, for example, improve performance for images that exhibit cell clustering. This might be achieved by comparing final contours with the segmented image obtained in the initial steps and then assigning candidate cell centers to foreground areas in the segmented image where no cells have been detected. Another possible improvement could be to include a final step that checks for unreasonable contours. An attractive subject for future development of the algorithms would be automatic detection of the advent of a bud on a mother cell. Protein expression levels or localization could then automatically be correlated to the cell cycle for each individual cell. Furthermore, such an algorithm would allow identification of mother/daughter relationships over the time span of an experiment and enable automatic quantification of protein expression levels across generations.

## A. Appendix: Details on the refined contour extraction

## A.1. Dynamic programming

The radial distances *r _{i}* will henceforth only be referred to through the row index

*i*∊{1,2,…,

*N*} and analogously for the angles

*ϕ*with column index

_{m}*m*∊{1,2,…,

*M*}. A radial distances

*r*at angle

_{i}*ϕ*will be referred to as state

_{m}*i*in column

*m*. We associate a cost matrix

*C*of size

*N*times

*M*for the cost of visiting row

*i*in column

*m*. Low values are preferable and here we let

*C*(

*i*,

*m*)=-

*f*(

*r*,

_{i}*ϕ*) where

_{m}*f*is defined in Eq. (2). The transition penalty function

*T*(

*k*,

*i*) defines rules for possible transitions between states

*k*and

*i*in two consecutive columns

*m*-1 and

*m*. Different

*T*relates to difference in shapes of the final contour. The cost

*S*(

*p*) for a contour or path

*p*=(

*p*

_{1},

*p*

_{2},…,

*p*) from left to right is $S\left(p\right)=C({p}_{1},1)+\sum _{m=2}^{M}\left\{C({p}_{m},m)+T({p}_{m-1},{p}_{m})\right\}$ .

_{M}The optimal path *p**, is the path that minimizes the sum *S*(*p*). Note that if *T*(*k*, *i*)=0 for all *k* and *i*, the optimal path is the trivial path achieved by letting each *p _{m}* be the column wise minimizer of

*C*(

*i*,

*m*) as

*i*∊{1,2,…,

*N*} for each

*m*. Dynamic programming [19] successively finds the optimal path from the leftmost column to each of the states in column

*m*as we increase

*m*from 2 to

*M*. The algorithm executes in

*𝒪*(

*N*

^{2}

*M*) operations as follows. First, define a cumulative cost matrix

*Q*and a matrix

*P*of pointers.

*Q*(

*i*,

*m*) is the cost for the optimal path from column 1 to

*m*ending in state

*i*.

*P*(

*i*,

*m*) holds the pointer to the state in the previous column. Then proceed with:

**Initialization**(*m*=1):*Q*(*i*,1)=*C*(*i*,1) for*i*∊{1,2,…,*N*}**Recursion**(*m*=2,*M*): For each*i*=1,…,*N*calculate the current optimal cumulative cost$Q(i,m)=C(i,m)\underset{k\in \{1,2,...,N\}}{\mathrm{min}}\left\{Q(k,m-1)+T(k,i)\right\},$

and save the pointer to the corresponding minimizer:

*P*(*i*,*m*)=argmin_{k}{*Q*(*k*,*m*-1)+*T*(*k*,*i*)}**Termination:**The minimal cost is$S\left({p}^{*}\right)=\underset{i}{\mathrm{min}}Q(i,M)$

and the last state of the corresponding path is

*p**_{M}=argmin_{i}*Q*(*i*,*M*)**Traceback:**The full optimal path is found by backtracing using the pointer matrix*P*: For*m*=*M*,…,2*p**_{m-1}=*P*(*p**_{m},*m*)

The local restriction on the transition penalty *T* makes it impossible to explicitly enforce global conditions on the final contour such as convexity and closedness of the contour. By closedness of a path *p* we mean that the last and first element in the path *p _{M}* and

*p*

_{1}, are compatible with the rule for transitions between states. There are however methods that can be used to enforce convexity and closedness.

## A.2. Convexity

A necessary condition on the contour of a convex shape is that the contour only makes left-turns as we go along the contour in a counter-clockwise direction. The opposite is not true in general, but if the contour is guaranteed to be closed, a left-turn condition will implicate convexity. However, a left-turn condition introduces restrictions on possible configurations involving three consecutive states in the polar plot. Therefore we have to enhance the state space so that each state includes two consecutive radial positions in the polar plot. Only by using an enhanced state space like this can we penalize right-turns.

Let each state be two dimensional *i*=(*i*
_{1},*i*
_{2}) where the components refer to radial indices in consecutive columns of the polar plot. A transition between state *k* in column *m*-1 to state *i* in column *m* is penalized according to

$T(k,i)=\{\begin{array}{cc}\infty \phantom{\rule{.2em}{0ex}}\phantom{\rule{.2em}{0ex}}\phantom{\rule{.2em}{0ex}}\phantom{\rule{.2em}{0ex}}\phantom{\rule{.2em}{0ex}}\phantom{\rule{.2em}{0ex}}\phantom{\rule{.2em}{0ex}}\phantom{\rule{.2em}{0ex}}\phantom{\rule{.2em}{0ex}}\phantom{\rule{.2em}{0ex}}& \mathrm{if}{k}_{2}\ne {i}_{1}\phantom{\rule{.2em}{0ex}}\phantom{\rule{.2em}{0ex}}\phantom{\rule{.2em}{0ex}}\phantom{\rule{.2em}{0ex}}\\ \alpha g\left(\theta \right)\phantom{\rule{.2em}{0ex}}\phantom{\rule{.2em}{0ex}}\phantom{\rule{.2em}{0ex}}\phantom{\rule{.2em}{0ex}}\phantom{\rule{.2em}{0ex}}& \mathrm{otherwise}\end{array}$

for some function *g*≥0 where *θ* is defined as the turning angle, see Fig. 9(a). The case with the infinite weight above is a consequence of the fact that two consecutive enhanced states share the same “middle” radial position. The tuning parameter *α* determines the relative weight of the transition penalty in relation to the cost matrix. In this paper, we let *g* be

where *θ*
_{0} is the maximal left-turn angle, taken to be twice the nominal turn rate per column 2*π*/*M*, e.g. *π*/8 when *M*=32. See Fig. 9(b) for a plot of *g*. This particular choice of *g* which penalizes right-turns and restricts too sharp left-turns has been chosen on empirical grounds to be suitable for our application. A hard penalty condition with *g*(*θ*)=∞ for *θ*<0 is not appropriate due to the discrete nature of the state space.

Since the state space is increased quadratically, we would expect the computational complexity of using this enhanced state space to be *𝒪*(*N*
^{4}
*M*). However, since the middle radial position is the same in two consecutive enhanced states, the computational complexity for solving this enhanced problem is *𝒪*(*N*
^{3}
*M*).

## A.3. Closedness

Closedness has to be guaranteed automatically, without human intervention, and therefore it needs to be inherent to the formulation of the problem. We will present two methods that can be used to produce a closed contour. The first method to enforce closedness is by finding the optimal path in an extended cost matrix *C̃* of size *N* times *M*+1, where the last column in *C̃* is the same as the first. The optimal path is then found for each possible initial position *k* separately using *C̃* as cost matrix where *C̃*(1,ℓ)=*C̃*(*M*+1,ℓ)=∞ for ℓ≠=*k* and *C̃*(1,*k*)=*C̃*(*M*+1,*k*)=*C*(1,*k*). When the optimal solution has been found for all *N* possible starting positions, we pick the one with overall lowest total cost. This method increases the computational complexity by a factor *N*.

An alternative, heuristic resort is instead to transverse the polar plot from left to right three times and find the optimal extended path of this extended cost matrix using the same method as explained above. We then take the middle part of the extended path as the optimal path. In this way the free boundary conditions at the leftmost and rightmost column in the extended matrix have less effect on the middle part and the optimal path. This heuristic method increases the complexity by a factor 3. Experience of using this method has shown that the resulting contour will practically always be closed. Therefore if for example *N*=30, the computational complexity is 10 times lower using this heuristic algorithm. This is a significant decrease in our application and therefore we use this method. The final contour is checked so that it is closed, and if not, the method in the previous paragraph will be used.

## Acknowledgement

This work was supported by the Swedish Research Council, the Swedish Foundation for Strategic Research and Chalmers Biocenter. The authors wish to thank two anonymous referees for comments that resulted in substantial improvements of the paper.

## References and links

**1. **S. Di Talia, J. Skotheim, J. Bean, E. Siggia, and F. Cross, “The effects of molecular noise and size control on variability in the budding yeast cell cycle,” Nature **448**, 947–951 (2007). [CrossRef] [PubMed]

**2. **A. Colman-Lerner, A. Gordon, E. Serra, T. Chin, O. Resnekov, D. Endy, C. Pesce, and R. Brent, “Regulated cell-to-cell variation in a cell-fate decision system,” Nature **437**, 699–706 (2005). [CrossRef] [PubMed]

**3. **J. Bean, E. Siggia, and F. Cross, “Coherence and timing of cell cycle start examined at single-cell resolution,” Molecular Cell **21**, 3–14 (2006). [CrossRef] [PubMed]

**4. **D. Botstein, S. Chervitz, and M. Cherry, “Yeast as a model organism,” Science **277**, 1259–1260 (1997). [CrossRef] [PubMed]

**5. **S. Forsburg, “The art and design of genetic screens: Yeast,” Nat. Rev. Genet. **2**, 659–668 (2001). [CrossRef] [PubMed]

**6. **W. Huh, J. Falvo, L. Gerke, A. Carroll, R. Howson, J. Weissman, and E. O’Shea, “Global analysis of protein localization in budding yeast,” Nature **425**, 686–691 (2003). [CrossRef] [PubMed]

**7. **S.-C. Chen, T. Zhao, G. J. Gordon, and R. F. Murphy, “Automated image analysis of protein localization in budding yeast,” Bioinformatics **23**, 66–71 (2007). [CrossRef]

**8. **M. A. de Carvalho, R. de A. Lotufo, and M. Couprie, “Morphological segmentation of yeast by image analysis,” Image Vis. Comput. **25**, 34–39 (2007). [CrossRef]

**9. **A. Niemistö, J. Selinummi, R. Saleem, I. Shmulevich, J. Aitchison, and O. Yli-Harja, “Extraction of the number of perozisomes in yeast cells by automated image analysis,” in *Proc. of the 28th IEEE EMBS*.

**10. **H. Rue and O. K. Husby, “Identification of partly destroyed objects using deformable templates,” Stat. Comput. **8**, 221–228 (1998). [CrossRef]

**11. **K. V. Mardia, W. Qian, D. Shah, and K. M. de Souza, “Deformable Template Recognition of Multiple Occluded Objects,” IEEE Trans. Pattern Anal. and Mach. Intell. **19**, 1035–1042 (1997). [CrossRef]

**12. **T. Saito, J. Sese, Y. Nakatani, F. Sano, M. Yukawa, Y. Ohya, and S. Morishita, “Data mining tools for the *Saccharomyces cerevisiae* morphological database,” Nucleic Acids Res. **33**, 753–757 (2005). [CrossRef]

**13. **A. Gordon, A. Colman-Lerner, T. Chin, K. Benjamin, R. Yu, and R. Brent, “Single-cell quantification of molecules and rates using open-source microscope-based cytometry,” Nat. Methods **4**, 175–181 (2007). [CrossRef] [PubMed]

**14. **R. Gonzales and R. Woods, *Digital Image Processing* (Prentice-Hall, Upper Saddle River, New Jersey, 2002).

**15. **P. Soille, *Morphological Image Analysis:principles and applications* (Springer, Berlin, 2003).

**16. **N. Otsu, “A threshold selection method from gray-level histograms,” IEEE Trans. on Systems, Man, and Cybernetics **9**, 62–66 (1979). [CrossRef]

**17. **J. A. Rice, *Mathematical Statistics and Data Analysis*, 2nd ed. (Duxbury Press, Belmont, California, 2006).

**18. **C. Xu and J. Prince, “Snakes, Shapes, and Gradient Vector Flow,” IEEE Trans. Image Proc.7, 359–369 (1998). [CrossRef]

**19. **P. Pedregal, *Introduction to Optimization* (Springer, New York, 2004).

**20. **M. Sonka, V. Hlavac, and R. Boyle, *Image Processing and Machine Vision*, 3rd ed. (Thomson Learning, London, 2007).

**21. **S. Forsburg, “Eukaryotic MCM proteins: Beyond replication initiation,” Microbiol. Mol. Biol. Rev. **68**, 109–131 (2004). [CrossRef] [PubMed]

**22. **K. Labib, J. Diffley, and S. Kearsey, “G1-phase and B-type cyclins exclude the DNA-replication factor Mcm4 from the nucleus,” Nat. Cell Biol. **1**, 415–422 (1999). [CrossRef] [PubMed]

**23. **V. Nguyen, C. Co, K. Irie, and J. Li, “Clb/Cdc28 kinases promote nuclear export of the replication initiator proteins Mcm2-7,” Curr. Biol. **10**, 195–205 (2000). [CrossRef] [PubMed]

**24. **Y. Sheu and B. Stillman, “Cdc7-Dbf4 phosphorylates MCM proteins via a docking site-mediated mechanism to promote S phase progression,” Mol. Cell **24**, 101–113 (2006). [CrossRef] [PubMed]

**25. **L. Hartwell and L. Unger, “Unequal division in *Saccharomyces cerevisiae* and its implications for the control of cell division,” J. Cell Biol. **75**, 422–435 (1977). [CrossRef] [PubMed]

**26. **P. Jorgensen, N. P. Edgington, B. L. Schneider, I. Rupes, M. Tyers, and B. Futcher, “The Size of the Nucleus Increases as Yeast Cells Grow,” Mol. Biol. Cell **18**, 3523–3532 (2007). [CrossRef] [PubMed]