Neu(t)ralMC: energy-efficient open source Monte Carlo algorithm for assessing photon transport in turbid media

Abigail Clennell; Vinh Nguyen; Vladislav V. Yakovlev; Alexander Doronin

doi:10.1364/OE.496516

1. Introduction

Optical-based technologies are extensively utilized for assessing object shape, visual appearance and material properties and have generated considerable interest spanning across a wide range of practical applications, spanning from space missions conducted by NASA to modern biology, medicine, and consumer goods/wearable devices markets [1–4]. Nevertheless, when applied to turbid mediums such as biological tissues, multiple scattering of light greatly hinders the ability of light-based methods to deliver high-quality imaging/treatment results [5]. In practice, conceptual design of a specific optical diagnostic system for non-invasive in vivo measurements of structural modifications in biological tissues, as well as alterations in their physiological properties, involves a meticulous in silico modelling of various technical parameters such as wavelength, coherence, polarization and intensity profile of the incident optical radiation, the sensitivity of the detector, the size, and the geometry and relative position of the source and detector, among others [6].

Formally, this requires solving the so-called Radiative Transfer Equation (RTE). The RTE is an integro-differential equation that describes the equilibrium of radiative energy in a turbid medium with a set of emission, absorption, and scattering properties:

(1)$$\begin{aligned}\frac{1}{c}\frac{\partial L(\textbf{p}, \vec{\omega}, t)}{\partial t} &+ \vec{\omega} \cdot \nabla L(\textbf{p}, \vec{\omega}, t) + \mu_t(\textbf{p}, \vec{\omega}, t) L(\textbf{p}, \vec{\omega}, t) \\ &= \mu_s(\textbf{p}, \vec{\omega}, t) \int_{4\pi} P(\textbf{p}, \vec{\omega}' \rightarrow \vec{\omega}, t) L(\textbf{p}, \vec{\omega}', t) d\vec{\omega}' + Q(\textbf{p}, \vec{\omega}, t)\end{aligned}$$

Here, $L(\textbf {p}, \vec {\omega }, t)$ corresponds to the radiance at position $\textbf {p}$, moving in direction $\vec {\omega }$, at time $t$; $\mu _t$ and $\mu _s$ are the total and scattering coefficients respectively; $P(\textbf {p}, \vec {\omega }' \rightarrow \vec {\omega }, t)$ is the phase function representing the probability that photon with $\vec {\omega }'$ will scatter into new direction $\vec {\omega }$; $Q(p, \vec {\omega }, t)$ is the source term representing any emission of radiation within the medium; and $c$ is the speed of light. Two major classes of numerical solutions are usually employed for providing solutions to the RTE: analytical (deterministic) and stochastic methods. Analytical solutions often fail due to the complexity of the equation, especially when considering inhomogeneous and irregular structure of turbid media where scattering and absorption properties vary volumetrically and stochastic alternatives, such as Monte Carlo methods, which can easily handle these complexities, are often used [5].

The Monte Carlo (MC) method itself has a rich history in the modelling of photon transport in turbid media, originating from the field of nuclear physics. The method itself was originally introduced during the Manhattan Project by Stanislaw Ulam, John von Neumann, and Nicholas Metropolis in the 1940s. It initially served as a tool to study neutron diffusion in nuclear fission and shielding problems [7]. The fundamental idea of the MC method is to employ random sampling techniques to solve deterministic problems by approximating the solution through the generation of random numbers. Over the years MC method found a number of applications in a variety of disciplines and the initial use of the MC method for simulating light propagation in biological tissues was proposed by Wilson and Adam in 1983 [8]. Their work laid the foundation for further development and adaptation of the MC method in the field of biomedical optics. Groundbreaking work by Steven Jacques, Marleen Keijzer, Scott Prahl and Lihong Wang resulted in the development of well-recognized MCML package which significantly contributed to the development of computational methods in biomedical optics [9,10].

The MC method has since been enhanced by numerous research groups and extensively applied in the analysis of photon transport for a variety of applications with several models being actively developed/supported at present [11–16]. MC has been so crucial in understanding the light-tissue interactions and the potential of optical methods for diagnosis and therapy that a “gold standard” term was coined. From the computational perspective, MC performs tracking of the paths of multiple photon packets in order to estimate quantities of interest, e.g., absorbed energy, polarization, flux of exiting photons and the radiance $L(p, \vec {\omega }, t)$. MC solutions are statistical in nature, and the accuracy (for unbiased solver) improves with the square root of the number of photon packets simulated following Central Limit Theorem (CLT) rule [17]. The intrinsic computational inefficiency of this process is its core drawback, yet it simultaneously presents an advantage when it comes to parallelization strategies [18].

In the early 2000s, Central Processing Unit (CPU) based MCs initially dominated the field, leveraging the processing power of single/multiple cores to handle complex calculations. These traditional CPU-based simulations provided reliable results, but at the expense of longer processing times due to limited parallelization capabilities. With the advent of general-purpose computing on graphics Processing Units (GPGPU) in the 2010s, MC simulations experienced a notable shift in performance. Graphics processing units (GPUs) offer massively parallel architectures, enabling efficient execution of multiple tasks simultaneously. This feature has greatly accelerated MC simulations, reducing the computation time to seconds [17–19]. Increased parallelization capabilities, enabled researchers to handle more complex and intricate light-tissue interaction problems. MC method for photon transport in turbid media has been continuously evolving, with research focusing on increasing computational efficiency, developing accurate and detailed tissue models, and exploring novel applications in biomedical optics, e.g., spectra, polarization and coherent properties, light’s angular momentum, fluorescence, training of Artificial-Intelligence (AI) based classifiers, etc. [6,20–27].

Nevertheless, despite impressive modern GPU computational capabilities, MCs can be associated with several "energy-heavy" calculations, e.g., beam focusing/shaping in turbid medium [28], propagation of coherent polarized and spatially-shaped light [6,23], spatially-offset spectroscopy and microscopy [29], nonlinear interactions of different colors [29], etc. It has been reported that the power consumption of GPUs can be several times higher than that of traditional CPUs [30]. While these technologies have significantly accelerated computational processes and enabled groundbreaking research, they have also raised concerns regarding their environmental impact [31]. The increasing energy consumption of these systems directly contributes to greenhouse gas emissions and exacerbates global warming [32]. The environmental impact of GPUs and (High-Performance Systems) HPC systems has prompted researchers and industries to explore energy-efficient designs and optimization strategies [33]. For instance, novel GPU architectures, such as NVIDIA Ampere, have been developed to minimize power consumption while maintaining high-performance capabilities [34]. Most recently, Apple’s M-family processors have made headlines with its remarkable power efficiency and high-performance capabilities. Introduced in 2020, the M1 chip utilizes Arm-based architecture, which has been a longstanding choice for power-efficient mobile devices [35]. It is built on a 5-nanometer process and integrates the CPU, GPU, Neural Engine, and other components together, resulting in substantial improvements in both performance and power consumption [35,36].

In light of the pressing concerns surrounding global warming and climate change, it is imperative to take action to address the environmental impact of computational methods, including the development of next-generation MC models. This research introduces an innovative, open-source Neu(t)ralMC algorithm for photon migration in biological tissues. The algorithm’s name subtly hints at two core principles: power efficiency and enhancement through Machine Learning (ML) techniques. Within the framework of this paper, we do not delve into the ML component and purely focus on optimizing the algorithm for the energy-conserving Apple processors. Our findings substantiate that the introduced algorithm stands as a simultaneously nearly real-time and power-efficient trade-off solution. It can serve as a viable addition to present techniques, facilitating eco-friendly and efficient computational strategies in biomedical optics while retaining previously developed capabilities. This research paves the way for wider utilization of energy-saving processors in the biomedical optical field, as well as greener and more sustainable modeling solutions.

2. MC method and its energy-efficient implementation

Classical MC technique simulates energy transmission within a medium in the form of so-called photon packets in order to provide solution to the RTE [5]. MC relies on stochastic processes and random sampling to imitate the interactions of photon packets with the medium, providing a comprehensive understanding of light propagation, and the influence of optical properties such as absorption $\mu _a$, scattering $\mu _s$ coefficients and anisotropy $g$ factor [9,18]. In practice, MC simulates the paths that individual photon packets take through the turbid medium with respect to probabilities of scattering, absorption, and reflection. A photon packet is associated with a few fundamental properties such as position $\mathbf {p}_{\text {i}}$, direction $\vec {\omega }_{i}$ and statistical weight $W_{i}$ at the i-th scattering event. Generation of a photon packet trajectory can be broken down into a series of distinct steps including initialization, path length determination, photon propagation, interaction with the media (scattering and absorption) and termination. This process is repeated multiple times to obtain statistical estimates of the quantities of interest, e.g., diffuse reflectance or distributions of absorbed energy within the medium (see Algorithm 1).

Algorithm 1. Photon transport in turbid media using MC method

View Table | View all tables in this article

In the proposed MC, the critical factor eventually contributing to its speed and power efficiency is the novel design of the algorithm programming pipeline specifically targeting M-family hardware (see Fig. 1). Internally, M-family processors feature both high-performance and high-efficiency CPU cores and intelligently allocates tasks to them, ensuring optimal performance while minimizing energy consumption. Apple’s chip also incorporates an integrated GPU that shares the same memory layout with the CPU, enabling much faster data access and reduced power usage as shown in Fig. 1 (two middle pictures). The GPU on the M1 processor offers considerable performance improvements over its predecessors and now caters to the needs of various general-purpose applications, including gaming, video editing, and rendering.

Fig. 1. The computational pipeline and execution flow of the developed MC method are schematically presented. In this pipeline, the input/output of simulations is processed by energy-efficient CPU cores. Meanwhile, the GPU, organized around the concept of execution units, runs the actual MC kernel code. This GPU architecture manages multiple MC threads and enables seamless reading/writing to the unified memory space, effectively eliminating the traditional overheads associated with CPU-GPU data transfers.

Download Full Size | PDF

For the actual programming, we utilize the Metal Compute pipeline, Apple-developed framework for GPU-accelerated computing, which plays a pivotal role in harnessing the capabilities of the M processor for MC simulations [37]. Metal offers a low-overhead hardware-accelerated graphics and compute Application Programming Interface (API) in C-style language and specifically designed for parallel data processing on the GPU. We utilize a key component of the Metal Compute framework, the compute shader, which is a programmable kernel function, to implement specific MC algorithms on the GPU.

The Apple M-series GPUs are split into multiple cores which are organized around the concept of execution units rather than traditional streaming multiprocessors (SMs) as observed in, e.g., NVIDIA chips. Each core consists of multiple execution units (E-units) and specialized hardware, enabling efficient simultaneous processing of thousands of threads. Apple GPUs employ a multi-threading model, which optimally utilizes the GPU’s resources and enhances power-efficient workload partitioning of jobs [37]. Ultimately, each thread executes the same MC compute shader but operates on distinct output data, facilitating concurrent processing of multiple elements in the problem domain, e.g., sampling photon propagation/trajectories. We have implemented and optimized MC kernels using Metal language, primarily aiming to maintain code structure and compatibility with previously developed solutions. Our shaders are structured into grids and subdivided into smaller $M \times N$ thread groups. Each thread group is further segmented into individual MC threads that execute the actual simulation algorithm. For example, the M1 Pro (given our specific memory constrains) has the capability to concurrently execute a group of $2\times 10^4$ of photons in parallel batches.

Memory management within Neu(t)ralMC is structured along three primary Metal memory spaces: threadgroup memory (hundreds of kilobytes), private memory (order of kilobytes), and device/unified memory (order of gigabytes) [37]. Threadgroup memory is a fast, shared memory space that is accessible by all threads within the same threadgroup, facilitating data sharing and communication among threads. Private memory is unique to each thread and is used for storing local variables such as photon position and direction or intermediate MC simulation results. For instance, $\vec {\omega }_{i}'$ is determined at each scattering event, using a Henyey–Greenstein phase function [38] introduced into MC by Prahl [10]:

(2)$$p_{HG}(cos(\theta)) = \frac{1}{4\pi}\frac{ 1-g^2}{{(1+g^2-2 g cos(\theta))}^{3/2}},$$

where $\theta$ is the polar scattering angle and $g$ is the anisotropy factor or the mean cosine of the scattering angle. On the other hand, device/unifed memory is accessible by all threads (including CPU) and typically used for storing input and output data such as tissue structure, diffuse reflectance, etc. By organizing MC compute shaders into grids and threadgroups, we can efficiently distribute the workload and harness the full power of current and future Apple chips, making the overall solution scalable. Our MC simulation is terminated when a desired number of photon packets have been processed, or when the remaining photon packets’ statistical weights or number of scattering events does not meet predefined thresholds.

We have conducted comprehensive tests to evaluate our Neu(t)ralMC implementation in terms of speed, power efficiency, and stability (see Fig. 2). Initially, we performed a comparison between the photon packet throughput of the developed model and the previously established NVIDIA CUDA and CPU-based algorithms. Our method is demonstrating exceptional results, with the capability of tracing 2 (M1 Pro chip) to 5 (M2 chip) million photons per second for homogeneous media. Pilot assessments indicate that the currently available solutions [15,17,18,39,40] utilizing high-end NVIDIA graphics cards generally outperform Apple’s GPUs in terms of the total number of traced photons per second (2-3 times). However, this increased speed comes at the cost of significant energy consumption. For instance, the GeForce RTX 3090 can process approximately twice as many photons per second but draws around 360 [W] at peak, in contrast to the M1 Pro GPU which utilizes only about 10 [W] under full load. On the other hand, the CPU versions of MCXYZ.C (skin vessel model) [16] and mcsub.c [41] achieve rates of only $\sim 5\times 10^{3}$ and $\sim 5\times 10^{4}$ photons per second using a single performance core of the M2 chip, respectively. It should also be noted that a consistent trend has been observed across all implementations. When an inhomogeneous voxelized representation of the media is utilized, it leads to a speed decrease of 8-10 times. The actual performance is contingent upon the substantial differences in algorithm complexity, hardware architecture, programming frameworks/languages, and development kits.

Fig. 2. Simulation time as a function number of photons performed for a semi-infinite scattering media with $\mu _s$ = 100 [cm-1], $\mu _a$ = 1 [cm-1], g = 0.9, n = 1.4 calculated for M1 Pro and M2 processors. (left). GPU power drawn by M2 processor for simulations involving $10^6$ photons as a function of scattering, absorption and anisotropy parameters for semi-infinite media with n = 1.4 (right). Estimated power consumed by various electronic devices/appliances in one second is shown for the visual reference.

Download Full Size | PDF

Therefore, our approach has been evaluated by performing MC simulations for a variety of scattering, absorption and anisotropy parameters (see Fig. 2). Notoriously, with an increase in the scattering, the number of interactions also increases, which subsequently results in higher energy consumption. On the opposite side, an increase in absorption leads to an earlier termination of photon packets, resulting in less energy being utilized. In 60% of the simulation cases, the power draw remains below the threshold of 1 Watt represented by an isosurface (red) in Fig. 2, right. This demonstrates highly-efficient power usage in a majority of our simulation scenarios. In contrast, many modern GPUs tend to use between 10 to 30 Watts of power while only idling, which rises to hundreds of watts when useful computational work is performed [42].

Impressive speed ensures that the simulation of light transport in complex biological tissues can be performed at a much faster rate compared to the previously developed approaches. On the other hand, the energy efficiency of our newly developed method is a more significant improvement with up to two orders of magnitude, compared to other GPU approaches [15,17,18,39]. Furthermore, in order to assess stability of our method, we conducted long-duration tests where simulation was run continuously for two weeks, with $5\times 10^{12}$ photons traced during this period. Zero recorded failures and continuous improvements in RTE solution accuracy indicate that the method maintains a high level of numerical and computational stability, showcasing its reliability for extended periods of use.

3. Validation and discussion

We validated our approach through comparisons with known analytical solutions and other techniques. At first we examined the values of total diffuse reflectance $R_d$ produced by analytical solutions for a semi-infinite medium, adding-doubling method, and other MC’s outcomes [9,18,43,44]. This metric is particularly relevant when studying the scattering of light by particles, molecules, or surfaces that exhibit diffuse reflection [2,39]. In the context of photon transport calculations, $R_d$ represents the proportion of light scattered by the medium compared to the incident light. Mathematically, $R_d$ is calculated by adding the specular component reflected at the air-medium interface $S$, the radiant flux of light that escaped the medium in all directions $E$, and then dividing the result by the total radiant flux of the incident light, which is represented by the total number of incident photon packets $N_{photons}$ and their initial statistical weight $W_{init} = 1.0$ (for non-polarized simulations) [5,9,41] such as:

(3)$$R_d = \frac{S+E}{W_{init} \times N_{photons} }$$

Our simulations outcomes have been found to be in excellent agreement for a standard benchmark (see, e.g., [9,18]) semi-infinite scattering medium with the following optical properties: $\mu _s$ = 90 [$cm^{-1}$], $\mu _a$ = 10 [$cm^{-1}$], g = 0 (isotropic scattering), n = 1.5 and are comparatively presented in Table 1. For each case, ten instances with $5\times 10^4$ photon packets have been run in order to compute the average/standard error. When simulations are performed for $10^6$ photons, Neu(t)ralMC error is reduced to 0.00013 with Rd = 0.26013.

Table 1. $R_d$ values derived from analytical solutions and relevant models in comparison with developed method.

View Table | View all tables in this article

From the practical point of view, a so-called fluence rate distribution $\Phi$ has always played an important role in understanding the behavior of light propagation in complex media such as biological tissues [9]. It provides valuable information on the density of photons within the medium, which is essential for applications such as photodynamic therapy and optical imaging. Therefore we performed further comparison of the fluence rate, fraction of escaping flux and calculated relative difference between developed method and standard mcsub.c benchmark [41] shown in Fig. 3. Moreover, in biomedical research and clinical applications, voxelized MC simulations of photon transport have recently become a tool of a more popular choice [45]. These simulations offer a powerful computational method for modelling light-media interactions in complex, heterogeneous biological tissues with embedded advanced geometric structures (such as blood vessels). Voxel-based MC models facilitate work with high-resolution, three-dimensional representations of tissue structures, enhancing the accuracy of the simulation results and assessment metrics. The improved understanding of light-tissue interactions provided by these simulations has critical implications for numerous modalities, such as photodynamic therapy, optical imaging, and diagnostic techniques in biophotonics and biomedical optics.

Fig. 3. Neu(t)ralMC validation. (a) Fluence rate distribution simulated by the developed model. (b) Fluence rate distribution counted by mcsub.c [41]. (c) Escaping flux for three different anisotropy parameters. (d) The relative difference between two models calculated as $Diff = (\Phi _{a} - \Phi _{b}) / (\Phi _{a} + \Phi _{b})$. MC simulations have been performed for a semi-infinite scattering media with $\mu _s$ = 100 [$cm^{-1}$], $\mu _a$ = 1 [$cm^{-1}$], g = 0.9, n = 1.4.

Download Full Size | PDF

In particular, MCXYZ.C program by Jacques, a voxel-based Monte Carlo simulation tool, stands out for its ability to model light transport in intricate 3D tissue structures [16]. This open-source software has been used widely to study photon migration in various biological tissues, and its inherent flexibility allows for customization and adaptation to a variety of applications. MCXYZ.C considers volumetrically inhomogeneous optical properties of the medium, such as absorption, scattering, and anisotropy, providing a robust and accurate representation of light propagation in complex media. Neu(t)ralMC has full support for the input/output data structures of MCXYZ.C and a direct comparison between two implementations for the skin vessel model provided is presented in Fig. 4. Skin vessel model incorporates all the essential elements of tissue, including the epidermis, dermis, and a blood vessel structure, capturing the complexities and heterogeneities of real biological media [16]. It is a valuable starting point for those looking to investigate skin tissue’s optical properties and light interactions, enabling the ease of extensions for novel diagnostic and therapeutic techniques in lungs, brain, vascular medicine, and other related fields.

Fig. 4. Fluence rate and corresponding energy deposition counted by Neu(t)ralMC (top row) and MCXYZ.C (bottom row) for blood vessel model included with MCXYZ.C.

Download Full Size | PDF

In addition, we demonstrate how computer graphics visualization techniques can be applied for MC generated outputs in the context of a practical optical therapy application (presented in Fig. 5). We implemented two widely-used approaches for visualizing 3D data: volume rendering [46,47] and slice planes [48]. Volume rendering is a comprehensive method that considers the entire MC output, including both fluence and reflectance information, to generate semitransparent visualizations of the underlying complex tissue structure and/or investigate delivery of therapeutic doses of optical radiation in applications such as Photodynamic therapy (PDT) [49,50]. In contrast, slice planes offer a simpler yet effective visualization approach by displaying cross-sectional views of the 3D volume data. These 2D slices can be interactively manipulated and positioned to explore different regions within the tissues, providing a clear and detailed view of the internal structure. Slice planes are particularly useful in biomedical imaging, where they have been employed to examine anatomical structures and malformations in modalities, e.g., Optical Coherence Tomography (OCT) and Photoacoustic tomography (PAT) scans [49,50].

Fig. 5. Schematic presentation of an optical therapy system where the delivered dose of light needs to be evaluated (left). 3D visualization of the full voxelized output produced by Neu(t)ralMC: here (a) represents volumetric rendering and (b) corresponds to slice plane of recorded fluence rate. Simulations have been performed using $10^{10}$ of photon packets for the blood vessel model included with MCXYZ.C.

Download Full Size | PDF

The data, code and related scripts for the developed model and previously developed NVIDIA CUDA-based MC models have been made readily accessible to the public through GitHub [40,51], ensuring that researchers and developers worldwide can benefit from their capabilities. Neu(t)ralMC code is released under the BSD 3-Clause License fostering a wide range of potential use cases and the growth of the research community. By making our model code available we commit to open science, enabling others to build upon, validate and verify our and their work and contribute to ongoing developments in the field.

4. Conclusions

In conclusion, we introduced a novel, open-source MC algorithm for photon transport in biological tissues, specifically designed for energy-efficient Apple processors. Our results confirm that the proposed method offers a competitive addition to existing techniques, enabling more sustainable and energy efficient (by at least two orders of magnitude) computational approaches in biomedical optics research. This concept work paves the way for the broader adaptation of energy-conserving processors in biomedical imaging and therapeutic applications, ultimately contributing to the ongoing efforts towards more environmentally friendly and cost-effective solutions in healthcare.

Neu(t)ralMC is continuously evolving, we are incorporating cutting-edge capabilities to simulate a wide range of optical phenomena, such as polarized light, orbital angular momentum, fluorescence, Raman [23,29,52] and advanced 3D graphics rendering capabilities, etc. The algorithm, is also set to incorporate recent ML-based denoising/sample reconstruction techniques [53], which is the topic of a subsequent paper. Furthermore, the model is being integrated into a previously developed online cloud-based MC simulation platform for the needs of Biophotonics, Biomedical Optics and Computer Graphics, which offers a user-friendly responsive HTML5/Python interface for performing simulations and analyzing results [15]. This seamless workflow and continuous expansion of the model’s capabilities ensures that it remains one of the state-of-the-art tools for advancing research in the biomedical optics community.

Funding

Royal Society Te Apārangi (E3929); Air Force Office of Scientific Research (FA9550-20-1-0366, FA9550-23-1-0599); U.S. Department of Defense (W81XWH2010777); National Science Foundation (CMMI-1826078); National Institutes of Health (1R21CA269099, 1R21GM142107, R01GM127696); U.S. Food and Drug Administration (80ARC023CA002E).

Acknowledgments

We would like to express our sincere gratitude to Professor Steven Jacques, the pioneer in the field of Monte Carlo methods for photon transport in biological tissues, for his valuable suggestions and early testing of this work. His expertise and insights have been instrumental in shaping our approach and ensuring the accuracy and performance of the presented method.

Disclosures

The authors declare no conflicts of interest

Data availability

Data and code underlying the results presented in this paper are readily available from [51].

References

1. V. V. Tuchin, Handbook of Optical Biomedical Diagnostics, Second Edition, Volume 2: Methods (SPIE, 2016).

2. V. V. Tuchin, Handbook of Optical Biomedical Diagnostics, Second Edition, Volume 1: Light-Tissue Interaction (SPIE, 2016).

3. H. H. Barrett and K. J. Myers, Foundations of Image Science (Wiley-Interscience, 2004).

4. D. Robinson, K. Hoong, W. Kleijn, A. Doronin, J. Rehbinder, J. Vizet, A. Pierangelo, and T. Novikova, “Polarimetric imaging for cervical pre-cancer screening aided by machine learning: ex vivo studies,” J. Biomed. Opt. 28(10), 1 (2023). [CrossRef]

5. I. V. Meglinski, A. Doronin, A. N. Bashkatov, E. A. Genina, and V. V. Tuchin, “Dermal component-based optical modeling of skin translucency: Impact on skin color,” in Computational Biophysics of the Skin (Routledge, 2014).

6. A. Doronin, C. M. Macdonald, and I. V. Meglinski, “Propagation of coherent polarized light in turbid highly scattering medium,” J. Biomed. Opt. 19(2), 025005 (2014). [CrossRef]

7. N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, “Equation of state calculations by fast computing machines,” J. Chem. Phys. 21(6), 1087–1092 (1953). [CrossRef]

8. B. C. Wilson and G. Adam, “A monte carlo model for the absorption and flux distributions of light in tissue,” Med. Phys. 10(6), 824–830 (1983). [CrossRef]

9. L. Wang, S. L. Jacques, and L. Zheng, “Mcml–monte carlo modeling of light transport in multi-layered tissues,” Comput. Methods Programs Biomedicine 47, 131 (1995). [CrossRef]

10. S. L. Jacques and G. Adam, “History of monte carlo modeling of light transport in tissues using mcml.c,” Biomed 27(08), 083002 (2022). [CrossRef]

11. D. Marti, R. N. Aasbjerg, P. E. Anersen, and A. K. Hansen, “Mcmatlab: an open-source, user-friendly, matlab-integrated three-dimensional monte carlo light transport solver with heat diffusion and tissue damage,” J. Biomed. Opt. 23(12), 1 (2018). [CrossRef]

12. J. Jönsson and E. Berrocal, “Multi-scattering software: part i: online accelerated monte carlo simulation of light transport through scattering media,” Opt. Express 28(25), 37612–37638 (2020). [CrossRef]

13. “Virtual photonics technology initiative,” https://virtualphotonics.org (2023).

14. “MCX, Monte Carlo extreme, a GPU-accelerated photon transport simulator,” http://mcx.space (2023).

15. “Cloud based Monte Carlo tool for photon transport,” http://www.lighttransport.net (2023).

16. “mcxyz.c: A Monte Carlo model of light transport in 3D turbid media,” https://omlc.org/software/mc/mcxyz/index.html, accessed: 2023-05-03.

17. A. Doronin and I. V. Meglinski, “Peer-to-peer monte carlo simulation of photon migration in topical applications of biomedical optics,” J. Biomed. Opt. 17(9), 0905041 (2012). [CrossRef]

18. I. V. Meglinski and A. Doronin, “Online monte carlo for biomedical optics,” Biomed. Opt. Express 2(9), 2461–2469 (2011). [CrossRef]

19. E. Alerstam, T. Svensson, and S. Andersson-Engels, “Parallel computing with graphics processing units for high-speed monte carlo simulation of photon migration,” J. Biomed. Opt. 13(6), 060504 (2008). [CrossRef]

20. I. V. Meglinski and S. J. Matcher, “Computer simulation of the skin reflectance spectra,” Comput. Methods Programs Biomedicine 70(2), 179–186 (2003). [CrossRef]

21. A. Doronin, A. J. Radosevich, V. Backman, and I. V. Meglinski, “Two electric field monte carlo models of coherent backscattering of polarized light,” J. Opt. Soc. Am. A 31(11), 2394 (2014). [CrossRef]

22. G. I. Petrov, A. Doronin, H. T. Whelan, I. V. Meglinski, and V. V. Yakovlev, “Human tissue color as viewed in high dynamic range optical spectral transmission measurements,” Biomed. Opt. Express 3(9), 2154–2161 (2012). [CrossRef]

23. A. Doronin, N. Vera, J. P. Staforelli, P. Coelho, and I. V. Meglinski, “Propagation of cylindrical vector laser beams in turbid tissue-like scattering media,” Photonics 6(2), 56 (2019). [CrossRef]

24. V. V. Dremin, Z. Marcinkevics, E. A. Zherebtsov, A. Popov, A. Grabovskis, H. Kronberga, K. Geldnere, A. Doronin, I. V. Meglinski, and A. V. Bykov, “Skin complications of diabetes mellitus revealed by polarized hyperspectral imaging and machine learning,” IEEE Trans. Med. Imaging 40(4), 1207–1216 (2021). [CrossRef]

25. J. C. Ramella-Roman, S. A. Prahl, and S. L. Jacques, “Three monte carlo programs of polarized light transport into scattering media: part i,” Opt. Express 13(12), 4420 (2005). [CrossRef]

26. J. C. Ramella-Roman, S. A. Prahl, and S. L. Jacques, “Three monte carlo programs of polarized light transport into scattering media: part ii,” Opt. Express 13(25), 10392 (2005). [CrossRef]

27. A. Doronin, L. Tchvialeva, I. Markhvida, T. K. Lee, and I. V. Meglinski, “Backscattering of linearly polarized light from turbid tissue-like scattering medium with rough surface,” J. Biomed. Opt. 21(7), 071117 (2016). [CrossRef]

28. B. H. Hokr, J. N. Bixler, G. Elpers, B. G. Zollars, R. J. Thomas, V. V. Yakovlev, and M. O. Scully, “Modeling focusing gaussian beams in a turbid medium with monte carlo simulations,” Opt. Express 23(7), 8699 (2015). [CrossRef]

29. Z. Di, B. H. Hokr, H. Cai, K. Wang, V. V. Yakovlev, A. V. Sokolov, and M. O. Scully, “Spatially offset raman microspectroscopy of highly scattering tissue: theory and experiment,” J. Mod. Opt. 62(2), 97–101 (2015). [CrossRef]

30. S. Hong and H. Kim, “An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness,” in International Symposium on Computer Architecture (2009).

31. K. Ma, X. Li, W. Chen, C. Zhang, and X. Wang, “Greengpu: A holistic approach to energy efficiency in GPU-CPU heterogeneous architectures,” 41st International Conference on Parallel Processing pp. 48–57 (2012).

32. A. Shehabi, S. J. Smith, D. Sartor, R. E. Brown, M. K. Herrlin, J. G. Koomey, E. R. Masanet, N. Horner, I. M. L. Azevedo, and W. Lintner, United States data center energy usage report (2016).

33. R. Chen, J. Shi, Y. Chen, B. Zang, H. Guan, and H. Chen, “Powerlyra: Differentiated graph computation and partitioning on skewed graphs,” ACM Trans. Parallel Comput. 5(3), 1–39 (2018). [CrossRef]

34. NVIDIA, “Nvidia ampere architecture,” https://www.nvidia.com/en-us/data-center/nvidia-ampere-architecture/ (2020).

35. Apple Inc., “Apple unleashes m1,” https://www.apple.com/nz/newsroom/2020/11/apple-unleashes-m1/ (2020).

36. J. Webb, “Inside Apple’s m1 chip,” AnandTech (2020).

37. Apple Inc., Metal Shading Language Specification (2021).

38. L. C. Henyey and J. L. Greenstein, “Diffuse radiation in the galaxy,” The Astrophys. J. 93, 70–83 (1941). [CrossRef]

39. I. Meglinski and A. Doronin, Monte Carlo Modeling of Photon Migration for the Needs of Biomedical Optics and Biophotonics (Taylor & Francis, 2012), pp. 1–72.

40. A. Doronin, “mcxyzCUDA: NVIDIA CUDA version of mcxyz Monte Carlo program,” https://github.com/aledoronin/mcxyzCUDA, accessed: 2023-08-23.

41. S. L. Jacques, “mcsub.c: a subroutine written in C language that can be called to run a Monte Carlo simulation,” https://omlc.org/software/mc/mcsub/index.html, accessed: 2023-05-03.

42. “List of GPUs by power consumption,” https://www.eatyourbytes.com/list-of-gpus-by-power-consumption (2023).

43. R. G. Giovanelli, “Reflection by semi-infinite diffusers,” J. Mod. Opt. 2(4), 153–162 (1955). [CrossRef]

44. S. A. Prahl, “The adding-doubling method,” in Optical-Thermal Response of Laser-Irradiated Tissue (Springer, 1995).

45. S. L. Jacques, “Tutorial on monte carlo simulation of photon transport in biological tissues [invited],” Biomed. Opt. Express 14(2), 559–576 (2023). [CrossRef]

46. M. Levoy, “Display of surfaces from volume data,” IEEE Comput. Grap. Appl. 8(3), 29–37 (1988). [CrossRef]

47. R. A. Drebin, L. C. Carpenter, and P. Hanrahan, “Volume rendering,” 15th Annual Conference on Computer Graphics and Interactive Techniques (1988).

48. B. Hamann, H.-C. Hege, and L. Linsen, Visualization in Medicine and Life Sciences III (Springer, 2016).

49. V. V. Tuchin, ed. Handbook of Optical Biomedical Diagnostics (SPIE Press, 2016), 2nd ed.

50. V. V. Tuchin, Tissue Optics: Light Scattering Methods and Instruments for Medical Diagnosis (SPIE Press, 2015), 3rd ed.

51. A. Doronin, “NMC,” GitHub (2023), https://github.com/aledoronin/NMC.

52. B. H. Hokr and V. V. Yakovlev, “Raman signal enhancement via elastic light scattering,” Opt. Express 21(10), 11757 (2013). [CrossRef]

53. K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising,” IEEE Trans. on Image Process. 26(7), 3142–3155 (2016). [CrossRef]

*Reference*	Rd	*Error*
Giovanelli (1955)	0.2600	-
Adding-doubling, Prahl (1995)	0.26079	0.00079
MCML, Wang (1995)	0.25907	0.00170
03MC, Doronin (2011)	0.25957	0.00043
Neu(t)ralMC	0.26048	0.00048

*Reference*	Rd	*Error*
Giovanelli (1955)	0.2600	-
Adding-doubling, Prahl (1995)	0.26079	0.00079
MCML, Wang (1995)	0.25907	0.00170
03MC, Doronin (2011)	0.25957	0.00043
Neu(t)ralMC	0.26048	0.00048

Neu(t)ralMC: energy-efficient open source Monte Carlo algorithm for assessing photon transport in turbid media

Abstract

1. Introduction

2. MC method and its energy-efficient implementation

3. Validation and discussion

4. Conclusions

Funding

Acknowledgments

Disclosures

Data availability

References

Data availability

Cited By

Figures (5)

Tables (2)

Equations (3)

Optics Express

Vladislav V. Yakovlev	https://orcid.org/0000-0002-4557-1013
Alexander Doronin	https://orcid.org/0000-0003-4615-9220