Expand this Topic clickable element to expand a topic
Skip to content
Optica Publishing Group

A generalized expression for accelerating beamlet decomposition simulations

Open Access Open Access

Abstract

Paraxial diffraction modeling based on the Fourier transform has seen widespread implementation for simulating the response of a diffraction-limited optical system. For systems where the paraxial assumption is not sufficient, a class of algorithms has been developed that employs hybrid propagation physics to compute the propagation of an elementary beamlet along geometric ray paths. These “beamlet decomposition” algorithms include the well-known Gaussian beamlet decomposition (GBD) algorithm, of which several variants have been created. To increase the computational efficiency of the GBD algorithm, we derive an alternative expression of the technique that utilizes the analytical propagation of beamlets to tilted planes. We then use this accelerated algorithm to conduct a parameter-space search to find the optimal combination of free parameters in GBD to construct the analytical Airy function. The experiment is conducted on a consumer-grade CPU, and a high-performance GPU, where the new algorithm is 34 times faster than the previously published algorithm on CPUs, and 67,513 times faster on GPUs.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Physical optics models are integral to the design and tolerancing of diffraction-limited optical systems. Traditional diffraction theory derived from the Huygens-Fresnel principle enforces the assumption that the optical system is scalar and paraxial in order to express the propagated field in terms of the Fourier transform. To support plane-to-plane propagation, we approximate the propagation of an optical field ($E(r)$) as a projection onto a parabolic phase kernel, resulting in the Fresnel diffraction integral given in Eq. (1),

$$E_{2}(\mathbf{r}_{2},z) = \frac{e^{ikz}}{i\lambda z} e^{\frac{ik}{2z}|\mathbf{r}_{2}|^{2}}\iint_{-\infty}^{\infty} E_{1}(\mathbf{r}_{1},0) e^{\frac{ik}{2z}|\mathbf{r}_{1}|^{2}} e^{\frac{-i2\pi}{\lambda z} (\mathbf{r}_{1} \cdot \mathbf{r}_{2})}d^{2}\mathbf{r}_{1}.$$

Here $\mathbf {r}_{2}$ is the radial coordinate at the plane of evaluation at distance $z$, $\mathbf {r}_{1}$ is the radial coordinate at the plane $z=0$ where the propagation begins, $d^{2}\mathbf {r}_{1}$ is the differential for the two dimensional integration, $k$ is the wavenumber of light and $\lambda$ is the wavelength. Equation (1) is expressed such that the Fourier kernel (rightmost exponential term in Eq. (1)) is separated from the parabolic phase term over the source coordinates $\mathbf {r}_{1}$. In the limit where $z \gg |\mathbf {r}_{1}|$, the parabolic phasor in the integrand disappears and we are left with the Fraunhofer diffraction integral in Eq. (2),

$$E_{2}(\mathbf{r}_{2},z) = \frac{e^{ikz}}{i\lambda z} e^{\frac{ik}{2z}|\mathbf{r}_{2}|^{2}}\iint_{-\infty}^{\infty} E_{1}(\mathbf{r}_{1},0) e^{\frac{-i2\pi}{\lambda z} (\mathbf{r}_{1} \cdot \mathbf{r}_{2})}d^{2}\mathbf{r}_{1},$$
which represents a simple Fourier transform of the field $E_{1}$. Equations (1) and (2) form the cornerstone of modern scalar diffraction modeling and have proven to be an excellent tool for the modern era in open-source diffraction modeling packages [14]. However, these formulations enforce that the optical system is paraxial and neglects “ray aberration” effects like wavefront and pupil aberration. These effects are captured best by ray trace models of optical systems, which ignore diffraction. To capture these effects simultaneously, one can employ a “hybrid” approach to propagation physics where several models are linked to form a more physical simulation. This can be done by computing the scalar wavefront error with a ray trace to the exit pupil and then using the diffraction integrals above to propagate to the image plane. However, linking multiple models simultaneously is error-prone, requiring considerable effort to ensure the propagation physics are appropriately synched. Furthermore, the scalar and paraxial assumption is insufficient to model some optical systems. One example is high-numerical aperture systems, like microscope objectives [5] or highly aspheric mirrors [6].

A more elegant approach would be to perform the diffraction and ray trace simulations simultaneously. To do so, we consider the Gaussian beamlet decomposition technique (GBD) [712]. GBD is a method of physical optics propagation where the propagated field is simulated as a finite summation of Gaussian beams (as shown in Eq. (3)) propagated normal to the local wavefront (as shown in Fig. 1)

$$E_{2}(\mathbf{r}_{2},z) = \sum_{j = 0}^{N} \frac{E_{o,j}}{q(z)_{j}} e^{(\frac{-ik|\mathbf{r}_{2}-\mathbf{r}_{j}|^{2}}{2q(z)_{j}})},$$
where the argument of the sum shows the field for the $j-th$ Gaussian beam shifted by $\mathbf {r}_{j}$ in the entrance pupil and whose complex beam parameter is $q(z)_{j}$. The complex beam parameter is related to the waist radius ($w(z)$) and wavefront radius of curvature ($R(z)$) of a Gaussian beam by Eq. (4)
$$q(z)^{{-}1} = \frac{1}{R(z)}+i\frac{\lambda}{\pi w(z)^2}.$$

 figure: Fig. 1.

Fig. 1. Illustration of Gaussian beamlet decomposition. An input wavefront (left) is decomposed into a sum of Gaussian beams. They are propagated through an optical system, and the resultant aberrated wavefront is represented by a change in optical path of each of the beamlets (right).

Download Full Size | PDF

The field propagation in Eq. (3) is valid as long as the beamlets that constructed it represent the initial field well. Traditional GBD employs solely the fundamental Gaussian mode, which is incapable of representing the sharp edges seen in conventional imaging systems well. However, GBD has recently seen substantive development by Worku and Gross, which generalized the propagation of a Gaussian beam to include curved [10], truncated [13], polarized [11], and spectrally chirped [12] Gaussian beamlets to use in decomposition to increase the accuracy of the simulation. This modified GBD (MGBD) approach illustrates the flexibility of beamlet decomposition for high-fidelity integrated modeling of ray aberration, diffraction, and polarization. In a prior work [14], we derived an implementation of MGBD and implemented it in an open-source propagation package [15] to determine if GBD was suitable for astronomical high-contrast imaging simulations. That study found the runtime required to conduct highly-sampled simulations was a major limitation. This was due to MGBD requiring the propagation of every Gaussian beamlet to a single point in the plane where the propagated field would be evaluated. Furthermore, since the decomposition of a wavefront into a discrete set of Gaussian beams does not have a unique solution, we need to explore the free parameters available in MGBD simulations in order to determine what combination delivers the most accurate result. This requires iterating on the different parameters, which can take a considerable amount of time.

In this study, we derive an alternative expression for MGBD inspired by the work of Weber configuring the general Collins integral for misaligned optical elements [16] that enables the propagation of a Gaussian beam to a tilted plane instead of a single point. We then validate the accuracy of the proposed “plane-evaluation” method against the “point-evaluation” method we previously derived [14,15], and perform a runtime comparison on CPUs and GPUs. In Section 2, we review the point-evaluation method derived for solutions of the Collins integral. In Section 3, we describe Weber’s expression of the Collins integral for misaligned elements and our modification to their derivation to support efficient MGBD without loss of generality. In Section 4, we perform self-consistency tests between the plane-evaluation algorithm, Fresnel diffraction, and the point-evaluation algorithm to demonstrate the accuracy of the propagation physics. In Section 5 we quantify the runtime gains achievable through using the plane-evaluation algorithm over the point-evaluation algorithm. In Section 6, we use the new algorithm to perform a parameter space search of the optimal sampling conditions for the various degrees of freedom used in GBD, including overlap factor and number of beamlets used.

2. Methods

2.1 General Collins integral

The general Collins integral [17] is a reformulation of second-order diffraction theory that relates the diffraction from one plane orthogonal to the propagation axis to another via a nonorthogonal ray transfer matrix whose elements are matrices $\mathbf {A},\mathbf {B},\mathbf {C},\mathbf {D}$. The rotationally symmetric version is a direct analog to the Fresnel diffraction integral in Eq. (1) and is expressed as,

$$E_{2}(\mathbf{r}_{2}) = \frac{1}{\lambda B} e^{D|\mathbf{r}_{2}|^{2}} \iint_{-\infty}^{\infty} E_{1}(\mathbf{r}_{1}) exp[\frac{i\pi}{\lambda B}(A|\mathbf{r}_{1}|^{2} - 2(\mathbf{r}_{1} \cdot \mathbf{r}_{2}) + D|\mathbf{r}_{2}^{2}|)d^{2}\mathbf{r}_{1},$$
where $A, B,$ and $D$ are elements of the paraxial 2$\times$2 ray transfer matrix [18]. The general integral for nonorthogonal optical systems is given by [13,17] Eq. (6),
$$E_{2}(\mathbf{r}_{2}) = \frac{i\ n_{1}}{\lambda\ det(\mathbf{B})^{1/2}} exp({-}ikl_{o}) \iint_{-\infty}^{\infty} E(\mathbf{r}_{1}) exp({-}ikl_{1}) d^{2}\mathbf{r}_{1},$$
where $n_{1},n_{2}$ are the refractive indices of the incident and exiting media, $k$ is the wave number, $l_{o}$ is the axial propagation distance, and $l_{1}$ represents the transformation of the field by the ABCD optical system, which is given by Eq. (7),
$$l_{1} = \frac{1}{2} \begin{pmatrix} \mathbf{r}_{1} \\ \mathbf{r}_{2} \\ \end{pmatrix}^{T} \begin{pmatrix} n_{1}\mathbf{B}^{{-}1} \mathbf{A} & -n_{1}\mathbf{B}^{{-}1} \\ n_{2}(\mathbf{C} - \mathbf{D} \mathbf{B}^{{-}1} \mathbf{A}) & \mathbf{D} \mathbf{B}^{{-}1} \\ \end{pmatrix} \begin{pmatrix} \mathbf{r}_{1} \\ \mathbf{r}_{2} \\ \end{pmatrix},$$
where the superscript $T$ denotes the transpose and $\mathbf {r}_{1},\mathbf {r}_{2}$ are the radial coordinates in the transverse plane before and after propagation, respectively. In free space, the refractive indices of the incident and exiting media are equal to unity $n_{1} = n_{2} = 1$. Furthermore, since the ray transfer matrix is symplectic, it obeys the following relations:
$$\mathbf{B}\mathbf{A}^{T} = \mathbf{A}\mathbf{B}^{T}$$
$$\mathbf{B}^{T}\mathbf{D} = \mathbf{D}^{T}\mathbf{B}$$
$$\mathbf{D}\mathbf{C}^{T} = \mathbf{C}\mathbf{D}^{T}$$
$$\mathbf{A}^{T}\mathbf{C} = \mathbf{C}^{T}\mathbf{A}$$
$$\mathbf{A}\mathbf{D}^{T} - \mathbf{B}\mathbf{C}^{T} = \mathbf{I},$$
where $\mathbf {I}$ is the identity matrix. Nazarathy showed that Eq. (6) can alternatively be expressed as Eq. (13) [19].
$$E_{2}(\mathbf{r}_{2}) = K \iint_{-\infty}^{\infty} E(\mathbf{r}_{1}) exp(\frac{-ik}{2}[\langle{\mathbf{r}_{2}}|\mathbf{D}\mathbf{B}^{{-}1}|{\mathbf{r}_{2}}\rangle + \langle{\mathbf{r}_{1}}|\mathbf{B}^{{-}1}\mathbf{A}|{\mathbf{r}_{1}}\rangle - 2\langle{\mathbf{r}_{1}}|\mathbf{B}^{{-}1}|{\mathbf{r}_{2}}\rangle]) d^{2}\mathbf{r}_{1},$$
where $K$ are the constants in front of the integral in Eq. (6), and the “Bra-Ket” notation is chosen to be consistent with Weber’s notation [16]. Since the vectors and matrices are real-valued, $\langle {\mathbf {v}}|$ is equivalent to $|{\mathbf {v}}\rangle ^{T}$. Equation (13) has been solved by several authors to determine the propagation laws for Hermite [20], Laguerre [21], truncated [13], and pulsed [12] Gaussian beams. All of these solutions can be employed in beamlet decomposition for diffraction simulation.

2.2 Point-evaluation approach

To perform GBD for an optical system, we decompose the wavefront in the optical system’s entrance pupil into a sum of Gaussian beams. The result of this decomposition is a set of beamlets whose propagation are described by 5 parabasal rays for each beamlet [7,9,13]. This list of rays in the entrance pupil are represented by $\mathbf{\mathcal{R}}_{EP}$. We use these rays in a GBD simulation by carrying out the algorithm below for each point ($\mathbf {r}_{eval}$) for which the field is evaluated:

  • 1. Propagate the rays $\mathbf{\mathcal{R}}_{EP}$ to the plane where the diffracted field will be evaluated $\mathbf{\mathcal{P}}_{eval}$ with ray tracing, resulting in a new collection of rays $\mathbf{\mathcal{R}}_{eval}$. A member of $\mathbf{\mathcal{R}}_{EP}$ is shown as a black arrow in Fig. 2.
  • 2. Determine the propagation distance from the rays at the evaluation plane $\mathbf{\mathcal{R}}_{eval}$ (shown in red on Fig. 2) to the plane normal to the rays and intersecting the point at which the field will be evaluated $\mathbf {r}_{eval}$, called the transversal plane $\mathbf{\mathcal{P}}_{trans}$.
  • 3. Propagate $\mathbf{\mathcal{R}}_{eval}$ to $\mathbf{\mathcal{P}}_{trans}$ and rotate them into the local coordinate system of the transversal plane, resulting in a new collection of rays $\mathbf{\mathcal{R}}_{trans}$ (shown in blue on Fig. 2).
  • 4. Compute the differential ABCD matrix from the propagation of $\mathbf{\mathcal{R}}_{EP}$ to $\mathbf{\mathcal{R}}_{trans}$ using differential ray tracing.
  • 5. Using the ABCD matrix and $\mathbf{\mathcal{R}}_{trans}$, evaluate the Gaussian field at $\mathbf {r}_{eval}$.
  • 6. Repeat steps 1-5 for every beamlet used in the decomposition of the entrance pupil wavefront, and coherently sum them at $\mathbf {r}_{eval}$.

 figure: Fig. 2.

Fig. 2. Schematic of the “traditional” GBD algorithm referred to in this work. The decomposition is referred to at the entrance pupil of the optical system $\mathbf{\mathcal{P}}_{EP}$, where rays that emanate normal to the center of a beamlet ($\mathbf{\mathcal{R}}_{EP}$) are propagated through an optical system to the plane where we desire to evaluate the field $\mathbf{\mathcal{R}}_{eval}$. In order to perform the field evaluation, the rays must be transformed to the transversal plane $\mathbf{\mathcal{P}}_{trans}$ and the Gaussian field evaluated at $\mathbf {r}_{eval}$ expressed in the coordinates of the transversal plane. The procedure to do so is described in [14].

Download Full Size | PDF

The precise mathematics of this procedure are illustrated in [14], but the simplification offered above reveals the inefficiency we wish to address. A substantial amount of computation is done in steps 2-4, where the traced rays are propagated from $\mathbf{\mathcal{P}}_{eval}$ to $\mathbf{\mathcal{P}}_{trans}$ for every $\mathbf {r}_{eval}$. It would be optimal to instead evaluate the contribution of a beamlet at the plane of points $\mathbf{\mathcal{P}}_{eval}$ from the outset since we are frequently interested in the distribution of fields aligned to a plane (e.g., a detector in an imaging system). However, recall from Section 2.1 that the use of the Collins integral requires that the fields before and after propagation be defined on planes orthogonal to propagation [17,19]. To solve for the influence of a beamlet on a plane tilted with respect to the propagation direction, we need an expression similar to the Collins integral to diffract to tilted planes.

3. Proposed plane-evaluation approach

Weber [16] proposed an alternative expression of the general Collins integral to model wave propagation between misaligned optical elements. Weber’s formulation generalizes the Collins integral to fields that propagate along a “center of gravity” vector $\mathbf {v}_{CG}$ that is generally not aligned to the optical axis $\mathbf {v}_{OA}$. A schematic of this propagation scheme is shown in Fig. 3. The misalignment vector is given by the difference between the center of gravity vector and the optical axis vector, as shown in Eq. (14).

$$\mathbf{v}_{jM} = \mathbf{v}_{jCG} - \mathbf{v}_{jOA} = \begin{pmatrix} \mathbf{r}_{jM} \\ \mathbf{\theta}_{jM} \\ \end{pmatrix} ; j = 1,2$$

 figure: Fig. 3.

Fig. 3. Illustration of Weber’s misaligned optical system given by an arbitrary ray transfer matrix. Nominally, the Collins integral relates the field at the object plane (left side, dashed vertical line) to the evaluation plane (right side, dashed vertical line). Weber’s formulation reconsiders the Collins integral propagation along the center of gravity vector in object space $\mathbf {v}_{CG,O}$ to the one in evaluation space $\mathbf {v}_{CG,I}$ by adding a misalignment vector in position ($\mathbf {r}_{1,2M}$) and angle ($\mathbf {\theta }_{1,2M}$) and evaluating the integral on the plane transverse to the centers of gravity $\mathbf {\tilde {r}}_{1,2}$.

Download Full Size | PDF

To generalize the propagation through the Collins integral, Weber performs a change of variables to integrate over the transversal plane in object space ($\mathbf {\tilde {r}}_{1}$) instead of the coordinates of the object plane ($\mathbf {r}_{1}$). Since the coordinates of the transversal plane are just a shift of the object plane coordinates by $\mathbf {r}_{1}$ and a rotation by $\mathbf {\theta }_{1}$, the transformed coordinates are given by Eq. (15).

$$\mathbf{v}_{1} = \tilde{\mathbf{v}}_{1} + \mathbf{v}_{1M}.$$

Weber performs this substitution in Eq. (13) which results in the Collins integral that maps a beam propagating along a center of gravity vector in object space ($\mathbf {v}_{CG,O}$) to the coordinates of the evaluation plane ($\mathbf {r}_{2}$), given by Eq. (16).

$$\begin{aligned} E_{2}(\mathbf{r}_{2}) = & K \iint_{-\infty}^{\infty} E(\mathbf{\tilde{r}}_{1}) exp(\frac{-ik}{2}[\langle{\mathbf{r}_{2}}|\mathbf{D}\mathbf{B}^{{-}1}|{\mathbf{r}_{2}}\rangle\\ &+\langle{\mathbf{\tilde{r}}_{1} + \mathbf{r}_{1M}}|\mathbf{B}^{{-}1}\mathbf{A}|{\mathbf{\tilde{r}}_{1} + \mathbf{r}_{1M}}\rangle\\ & -2\langle{\mathbf{\tilde{r}}_{1} + \mathbf{r}_{1M}}|\mathbf{B}^{{-}1}|{\mathbf{r}_{2}}\rangle]) d^{2}\mathbf{\tilde{r}}_{1} \label{eq:collins_misaligned} \end{aligned}$$

Weber moves on to make the same substitution for the radial coordinate on the transversal plane in evaluation space. To see the proof for this, we direct the reader to their work [16]. The result is rather elegant, and given in Eq. (17) (Eq. (10) in Weber’s paper). We have also reproduced part of the derivation in Appendix A of this work.

$$E_{2}(\tilde{\mathbf{r}_{2}}) = E_{2,aligned}(\tilde{\mathbf{r}_{2}}) exp(\frac{-ik}{2}[\langle{\mathbf{r}_{1M}}||{\mathbf{\theta}_{1M}}\rangle - \langle{\mathbf{r}_{2M}}||{\mathbf{\theta}_{2M}}\rangle]),$$

$E_{2,aligned}(\tilde {\mathbf {r}_{2}})$ is the solution to the original Collins integral given by Eq. (6). This relation is robust because it generalizes the propagation of all solutions to the integral to be generally misaligned with respect to the optical axis. However, for beamlet decomposition techniques it is not immediately useful. Since the field in Eq. (17) is expressed in coordinates of the plane normal to a beamlet’s propagation, all beamlets cannot be coherently summed at a common plane.

Thus, we desire an expression for the solution of the Collins integral where the field at all points of a common evaluation plane can be computed simultaneously for a single beamlet. To arrive at this expression, we return to Weber’s derivation starting at Eq. (16). We expand the terms of Eq. (16) and follow Weber’s assertion that the terms linear in $\mathbf {\tilde {r}}_{1}$ vanish, so the resulting integral is given by Eq. (18).

$$\begin{aligned} E_{2}(\mathbf{r}_{2}) = K \iint_{-\infty}^{\infty} E(\mathbf{\tilde{r}}_{1}) exp(\frac{-ik}{2}[&\langle{\mathbf{r}_{2}}|\mathbf{D}\mathbf{B}^{{-}1}|{\mathbf{r}_{2}}\rangle\\ + & \langle{\mathbf{\tilde{r}}_{1}}|\mathbf{B}^{{-}1}\mathbf{A}|{\mathbf{\tilde{r}}_{1}}\rangle\\ - & 2\langle{\mathbf{\tilde{r}}_{1}}|\mathbf{B}^{{-}1}|{\mathbf{r}_{2}}\rangle\\ + &\langle{\mathbf{r}_{1M}}| \mathbf{B}^{{-}1}\mathbf{A} |{\mathbf{r}_{1M}}\rangle\\ - & 2\langle{\mathbf{r}_{1M}}| \mathbf{B}^{{-}1} |{\mathbf{r}_{2}}\rangle]) d^{2}\mathbf{\tilde{r}}_{1}. \end{aligned}$$

Equation (18) is identical to the original Collins integral (Eq. (13)), with the addition of two phase terms (shown in Eq. (19)) that are not functions of the variable of integration ($\tilde {\mathbf {r}}_{1}$).

$$\Phi_{misaligned} = exp(-\frac{ik}{2}[\langle{\mathbf{r}_{1M}}| \mathbf{B}^{{-}1}\mathbf{A} |{\mathbf{r}_{1M}}\rangle - 2\langle{\mathbf{r}_{1M}}| \mathbf{B}^{{-}1} |{\mathbf{r}_{2}}\rangle]])$$

Factoring these terms out of the integrand reveals that we have arrived at an expression similar to Weber, where the field at the evaluation plane is proportional to the aligned Collins integral solution. However, the result (shown in Eq. (20)) is instead expressed in terms of the common evaluation plane ($\mathbf {r}_{2}$) orthogonal to the optical axis.

$$\begin{array}{r} E_{2}(\mathbf{r}_{2}) = \Phi_{misaligned}( K \iint_{-\infty}^{\infty} E(\mathbf{\tilde{r}}_{1}) exp(\frac{-ik}{2}[\langle{\mathbf{r}_{2}}|\mathbf{D}\mathbf{B}^{{-}1}|{\mathbf{r}_{2}}\rangle +\\ \langle{\mathbf{\tilde{r}}_{1}}|\mathbf{B}^{{-}1}\mathbf{A}|{\mathbf{\tilde{r}}_{1}}\rangle -\\ 2\langle{\mathbf{\tilde{r}}_{1}}|\mathbf{B}^{{-}1}|{\mathbf{r}_{2}}\rangle]) d^{2}\mathbf{\tilde{r}}_{1}.), \end{array}$$
$$E_{2}(\mathbf{r}_{2}) = \Phi_{misaligned}E_{2,aligned}(\mathbf{r}_{2}).$$

Using the relation in Eq. (20), a single beamlet can be propagated from a location in the entrance pupil of an optical system to every point in the evaluation plane simultaneously. Compared to the point-evaluation approach, this reduces the number of propagations done in beamlet decomposition algorithms by a factor equal to the number of points simulated in the evaluation plane and will result in favorable performance improvements. Equation (20) illustrates that the standard Collins integral is used to propagate the complex amplitude of the beamlet, and $\Phi _{misaligned}$ applies a phase factor that aligns the beamlet to the evaluation plane.

4. Validation of the plane-evaluation algorithm

The plane-evaluation algorithm described in Section 3 was implemented in Poke, an open-source ray-based physical optics package for Python 3.8+ [15]. Poke was, in part, inspired by the desire to develop an open-source platform to explore the limits of beamlet decomposition for use in diffraction simulation. Poke is capable of saving ray data from commercial and open-source ray tracers and compiling it into a writeable Rayfront object that serves as the interface for conducting beamlet decomposition experiments.

4.1 Test against Fresnel diffraction

Weber’s formulation of the Collins’ integral was originally for paraxial systems that could be represented by a single ray transfer matrix. To validate our approach, we first test the proposed formulation against a paraxial system composed of a single lens that focuses a decentered Gaussian beam. This can be equivalently modeled by the Fresnel diffraction integral by invoking Eq. (1) and setting $U_{1}(r_{1})$ to be a vertically shifted Gaussian beam. The plane-evaluation algorithm can model this by invoking the Gaussian solution to the Collins integral, given by Eq. (21).

$$E_{2}(\mathbf{r}_{2}) = \frac{e^{{-}ikl_{o}}}{\sqrt{|\mathbf{A} +\mathbf{B}\mathbf{Q}_{1}^{{-}1}}|} e^{-\frac{ik}{2}\langle{\mathbf{r}_{2}}|\mathbf{Q}_{2}^{{-}1}|{\mathbf{r}_{2}}\rangle}$$
where $\mathbf {Q}_{2}^{-1}$ is given by the well-known propagation law of the complex curvature matrix [19].
$$\mathbf{Q}_{2}^{{-}1} = (\mathbf{C} + \mathbf{D}\mathbf{Q}_{1}^{{-}1})(\mathbf{A} + \mathbf{B}\mathbf{Q}_{1}^{{-}1})^{{-}1}$$

The ray transfer matrix of interest is given by matrix multiplication of the ray transfer matrices that describe the refraction of a ray by a paraxial lens of focal length $f$ and propagating a distance equal to $f + \delta x$ and is given by Eq. (23)

$$\begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \\ \end{bmatrix} = \begin{bmatrix} 1 & 0 & f + \delta x & 0 \\ 0 & 1 & 0 & f + \delta x \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ -1/f & 0 & 1 & 0 \\ 0 & -1/f & 0 & 1 \\ \end{bmatrix}.$$

Here, we set up our lens to have a 12.7mm radius (half of the clear aperture) and a focal length of 100mm. The Gaussian beam is given a beam waist radius of 3mm, a wavelength of $1\mu m$, and shifted 10mm above the optical axis in the entrance pupil. We then propagate the Gaussian beam at a distance of 101mm so that it is outside of the focal region.

Shown in Fig. 4 are the results of this simulation. Upon inspection, the Gaussian beams appear to be identical. When we compare the RMS difference of the two irradiances as a function of the oversampling factor (how much the Fresnel simulation was zero-padded), we observe that the difference asymptotically approaches zero. This is a strong indicator that we have derived the physics describing the propagation of a decentered Gaussian beam and can move forward to test their use in beamlet decomposition simulations.

 figure: Fig. 4.

Fig. 4. Consistency test of traditional Fresnel diffraction (left) and the proposed plane-evaluation algorithm (middle) for the case of a decentered Gaussian beam propagating to 1mm outside of focus. The Fresnel diffraction simulation on the left is plotted with an oversampling of 2. The results show that the irradiance profiles are nearly identical. Shown on the right is the RMS difference of the plots on the left and middle as a function of the oversampling of the Fresnel simulation. We see that we asymptotically approach zero as a function of oversampling, indicating that our proposed algorithm can correctly propagate a shifted Gaussian beam through an ABCD optical system.

Download Full Size | PDF

4.2 Test against point-evaluation algorithm

In a prior study, we published the point-evaluation method of GBD to evaluate its suitability in simulations for astronomical high-contrast imaging. The methods are published in their entirety in Ashcraft et al. 2023 [14] but are also a part of the Poke Python package in which we developed the plane-evaluation algorithm for this work. Using Poke as a simulation platform, we are able to easily compare the two implementations of GBD. We begin by testing both the point-evaluation and plane-evaluation algorithms against an analytical solution to the far-field diffraction pattern from a circular aperture: the Airy disk. We perform this simulation on a Ritchey-Chretien telescope with a 2.4m primary mirror modeled after the Hubble Space Telescope (HST) but remove the obscuration by the spiders and secondary mirror so that we can compare against the analytical solution. The prescription for this observatory is given in Table 2 of Appendix B.

The analytical Airy pattern was simulated using the POPPY (Physical Optics Propagation in Python) optical propagation package [22] using the poppy.misc.airy_2D function. Shown in Fig. 5 are the results of computing the difference of the GBD PSFs with the analytical Airy function. The residuals shown in the right column are very similar, but note that the plane-evaluation algorithm actually reconstructs the core of the PSF slightly better and has a lower RMS error to the Airy function. We suspect that the decrease in RMS error is due to the fewer propagations required by the plane-evaluation approach. GBD is very sensitive to the differential computation of the ray transfer matrix, and the point-evaluation method computes a different ray transfer matrix for every evaluated point. The cumulative error from using differential ray tracing to propagate to each evaluated point likely results in a less accurate simulation. This is a strong indicator of the proposed algorithm’s suitability to be used in beamlet decomposition simulations, but the biggest improvement occurs in the runtime of the simulations. The proposed algorithm was computed $11.5$ times faster than the point-evaluation method, meaning that we’ve gained an order of magnitude reduction in computation time in a PSF simulation that achieves twice the accuracy.

 figure: Fig. 5.

Fig. 5. Comparison of the GBD PSFs (left) with their absolute differences with the analytical Airy function (right). The plane-evaluation algorithm produced the result in (a), and its absolute difference is in (b). The point-evaluation algorithm developed in [14] is shown in (c), and its absolute difference is in (d). All data are $Log_{10}$ scaled to highlight the faint structure in the PSF. The high-frequency PSF residuals are very similar and of low magnitude, but the plane-evaluation algorithm shows smaller residuals, particularly in the PSF core, than the point-evaluation result. The RMS ($\sigma$) of the differences are given in the titles on the right, where the plane-evaluation algorithm has a lower RMS difference by about a factor of 2. The runtimes of the plane-evaluation and point-evaluation simulations were 113.78s and 1309.11s, respectively, using a computer with a 3GHz CPU and 16GB of RAM.

Download Full Size | PDF

4.3 Aberrating the PSF

For a final test of the functionality of the proposed algorithm, we simulated the aberration of the GBD PSF through surface errors on the primary mirror of the Hubble Space Telescope. We added an aperture with secondary support spiders and an aspheric sag to the primary mirror to aberrate the PSF and compare the plane-evaluation algorithm with that produced by Zemax’s Huygens PSF simulation. The results are shown in Fig. 6.

 figure: Fig. 6.

Fig. 6. Comparison of the plane-evaluation algorithm’s response to an aberrated optical system (a) to the Zemax OpticStudio Huygens PSF simulation (b) and their difference (c). (a) and (b) are plotted on a log scale to better resolve the PSF structure. We apply an aspheric sag to the primary mirror of the HST model described by Noll Zernike polynomials ($\Phi = \alpha Z_{7} + \alpha Z_{10}$, where $\alpha =10^{-7}$) to maintain the best focus position of the PSFs. Upon inspection, the fields in (a) and (b) are nearly identical, with an RMS difference of $\sigma = 2.7e^{-5}$. The largest features in (c) are on the order of 0.3${\%}$ difference. They can be explained by the discrepancies between GBD and more exact scalar diffraction methods rather than inherent inaccuracies in the proposed algorithm.

Download Full Size | PDF

This tests the plane-evaluation algorithm’s ability to recreate the diffraction effects from both sharp-edges and scalar aberration, indicating that, indeed, the proposed algorithm is suitable for imaging simulations while being much faster than the point-evaluation algorithm. We can now move forward to compare the speed of the plane-evaluation algorithm versus the point-evaluation algorithm to empirically examine how the proposed algorithm’s efficiency scales on central processing units (CPUs) and graphical processing units (GPUs).

5. Runtime comparison

Developing propagation routines with widely-used languages like Python enables us to easily scale up our computational power by using GPUs and high-performance computing resources. This is particularly useful in optical system modeling for rapid optimization and tolerancing. In this Section we report the key result of this study, the decrease in runtime achievable through use of the plane-evaluation algorithm we derived in Section 3. In Table 1, we report on the two machines used to test the runtime of the beamlet decomposition algorithms.

Tables Icon

Table 1. Description of the computational resources used in this investigation. The CPU results represent what an “average" machine can do since it is performed on a consumer-grade laptop. The GPU results represent how the algorithms can scale to supercomputers. TFLOPS is teraFLOPS, or floating point operations per second, assuming double-precision floats.

The Apple M1 processor is a consumer-grade CPU that is representative of what a typical researcher is likely to have available to them. The NVIDIA V100S GPU better represents how the proposed algorithm can scale to more expensive computational architectures. To best take advantage of these processing architectures, we construct the new beamlet decomposition method using two open-source Python packages.

numexpr is a Python package that allows for efficient multi-threading of elementary operations in Python and allows us to accelerate the remainder of the algorithm using CPUs. This package was used as the “accelerated math” option for the POPPY optical propagation package [23] before GPUs were formally supported. On GPUs we anticipate even greater accelerations due to the parallelizability of the GBD algorithm. The problem can be broken up along each beamlet and each pixel, and distributed over the thousands of cores implemented in GPUs. The cupy Python package is a numpy-compatible library for scientific computing on NVIDIA GPUs. We integrated the interchangeable backend system of prysm (see Section 4A of Dube et al. [2]) into Poke for easy migration of our calculations onto GPUs.

The resulting algorithms are available on the Poke repository [15], which are vectorized to maximize computational efficiency. In Appendix C, we also review the accelerations that were made to the elementary linear algebra functions that improved the runtime by 100x on CPUs. The results of our runtime comparison study are shown in Fig. 7.

 figure: Fig. 7.

Fig. 7. Runtime comparison of the point-evaluation (blue) and plane-evaluation (red) beamlet decomposition algorithms and their runtimes on the CPU (left) and GPU (right) listed in Table 1. Each point represents a simulation where we compute the PSF of a telescope with a circular aperture with an entrance pupil diameter of 2.4m, a $\lambda = 551nm$, and a focal length of 57.6m on a grid of 256 x 256 pixels with a pixel scale of 2.8mas. The points are located at the runtime given a number of beamlets across the pupil. On the CPU, the mean runtime of the plane-evaluation algorithm is 34 times faster. On the GPU, the point-evaluation algorithm is 39 times faster, and the plane-evaluation algorithm is 1,760 times faster. The advancement made by this study is given by the relative runtime of the plane-evaluation algorithm on the GPU, which is 67,513 times faster than the point-evaluation algorithm on CPUs.

Download Full Size | PDF

Across all cases studied, the plane-evaluation algorithm is at least one order of magnitude faster than the point-evaluation method. This is in part due to the analytical propagation of each beamlet to every point on the evaluation plane. A large portion of the computational resources of the point-evaluation method is spent determining the propagation of a beamlet to each point on the evaluation plane. Representing this information as a phasor instead allows for the rapid propagation of Gaussian beams. This also improves the memory efficiency because less information needs to be held by the RAM per beamlet. Less RAM per beamlet means that more can be processed at any given time, which is advantageous for our vectorized approach to GBD. The total runtime improvement contributed by this study using the data from Fig. 7 is the relative runtime decrease between the point-evaluation algorithm on CPUs (left blue) and the plane-evaluation algorithm on GPUs (right red). The mean runtime improvement per point comes out to about 67,513 times faster (not including the accelerations discussed in Appendix C). Another interesting result is that the runtime of the plane-evaluation algorithm on commercial CPUs is very close to the point-evaluation algorithm on a high-performance computing center GPU. This comparison highlights the contributions of this study: we’ve constructed an expression of GBD that is more accessible to the average researcher. These algorithmic advances in an open-source environment make GBD an accessible physical optics propagation technique, which we can use to learn more about degrees of freedom in a GBD simulation.

6. Optimal beamlet parameter search

One area of research that is not formally addressed in the literature is the best spatial distribution to decompose a circular aperture with Gaussian beams. Fundamental Gaussian modes do not represent a complete set, so an analytical decomposition of a plane wave truncated by a circular aperture is not derivable. Harvey et al. [7] introduced the overlap factor (OF) as a parameter for adjusting the width of evenly-spaced Gaussian beams. The OF is given in Eq. (24)

$$OF = \frac{N_{g} 2 \omega_{o}}{W},$$
where $N_{g}$ is the number of beamlets across an aperture, and $W$ is the width of the aperture. The goal is to determine the beamlet waist $\omega _{o}$, which is a function of the number of beamlets and the overlap factor. The optimal solution of these combinations is notionally suggested in Harvey et al. to be 1.5 (see Section 5 of [7]), but an investigation into the tradeoffs between the number of beamlets and overlap factor was not explored for imaging systems. The proposed algorithm’s efficiency simplifies this task, which we present in Fig. 8.

 figure: Fig. 8.

Fig. 8. The $Log_{10}$ RMS difference to the analytical Airy function for a circular aperture as a function of the number of beamlets traced across the pupil and the overlap factor used. The white circle indicates the minimum error case for this parameter space. The total runtime to compute these PSFs on GPUs is 24 seconds.

Download Full Size | PDF

In this experiment we evaluate the RMS difference of the analytical Airy function and a GBD simulation that aims to construct the Airy function. We evaluate this for overlap factors between 1 and 2 to explore the cases where adjacent beamlets are capable of smoothly reconstructing an aperture function. Note that extreme cases (e.g. $OF < 1$), beamlets do not meaningfully overlap and would have a much larger bearing on the accuracy of this simulation. Note also that the actual values of the RMS difference strongly depend on the pixel scale of the array computed and the array size, so we are principally interested in finding the lowest relative value in this trade space. The parameters used for the simulations in this section are given in Table 3 of Appendix D.

For imaging applications, it is clear that the overlap factors in this range do not have the largest bearing on the accuracy of the simulation. More critical is the number of beamlets used in the simulation. The RMS of the difference also more heavily weighs GBD’s ability to capture the structure of the PSF core. Another way of saying this is that the result in Fig. 8 shows off the best combination of $OF$ and number of beamlets to simulate a radiometrically correct PSF. To better understand how a number of beamlets and overlap factors affect different spatial frequencies, we can compute the RMS difference of different annular zones in the result.

In Fig. 9, we compute the RMS difference of the simulated PSF with the analytical Airy function considering only certain regions in the PSF. We simulate the Airy function from 0 to 15 $\lambda /D$ for the model observatory and break it into three annular regions (0-5, 5-10, 10-15$\lambda /D$).

 figure: Fig. 9.

Fig. 9. Comparison of the RMS difference between the simulated GBD PSFs and the analytical Airy function while varying overlap factor and number of beamlets across the pupil. (Left) The RMS difference of the PSFs from 0-5 $\lambda /D$. (Middle) The RMS difference of the PSFs from 5-10 $\lambda /D$. (Right) The RMS difference of the PSFs from 10-15 $\lambda /D$. The leftmost simulation follows the contour from Fig. 8, but the higher-frequency information follows a different curve entirely. Generally, a PSF simulation is more accurate when more rays are used. However, in under-sampled cases, small overlap factors are better for low frequencies but worse for high frequencies. A white circle shows the minimum error case for a given parameter space.

Download Full Size | PDF

Figure 9 illustrates a peculiarity of the degrees of freedom available in GBD. The overlap factor and number of beamlets impact different structures in the PSF differently. Low-spatial frequency information generally favors lower overlap factors, with a minimum of around 1.2. High-frequency information generally favors larger overlap factors, likely due to the mitigation of the amplitude ripple that is left from the beamlet decomposition [7]. This example illustrates the necessity of more efficient algorithms for beamlet decomposition. We are able to rapidly explore the free parameters in the decomposition and determine that the optimal set of parameters is generally dependent on what structure in the PSF is critical in simulation. Of interest is that for the higher-frequency information, we’ve recovered the recommended overlap factor of 1.5 from Harvey et al. [7].

7. Discussion

We present a new perspective on Weber’s formulation of the general Collins integral that enables the expeditious computing of beamlet decomposition algorithms. The plane-evaluation method is tested against numerical and analytical results to verify its accuracy and is shown to outperform the point-evaluation method in both speed and accuracy. Moreover, our solution to the Collins integral maintains Weber’s generality. The accelerated algorithm can be applied to the beamlet decomposition problem using any known solution to the Collins integral, including higher-order Gaussian beams and Worku and Gross’ truncated and pulsed beam solutions. The algorithm is developed in the open-source Python package Poke, making it accessible to all investigators interested in developing beamlet decomposition as a viable physical optics propagation technique.

For the case of GBD, we determined that in general, more beamlets are critical to the accuracy of simulations and that the overlap factor has little bearing on the results once a sufficient number of beamlets are traced. However, this is solely for the case of propagating from a circular aperture to focus. Other applications, such as in non-imaging or illumination systems, may consider different figures of merit and degrees of freedom to optimize their beamlet distribution for maximum simulation accuracy.

The acceleration granted by the plane-evaluation algorithm enables the rapid computation of fields produced by GBD that is nearly as general as prior published methods. The only additional assumption imposed is that the points where we are evaluating the field fall on a plane ($\mathbf {r}_{2}$ in Fig. 3) that is orthogonal to the optical axis. Using the rapid evaluations enabled by this algorithm, we can use GBD propagation in optimization problems. We can also generate more highly sampled simulations in a shorter amount of time, which was one of the limitations of our prior study [14]. The critical result of this study is that our derived propagation technique makes highly-sampled GBD practical. A reasonably highly-sampled simulation can be conducted in under a minute. Should the user have access to a GPU, they can take advantage of the parallelizability of GBD for even more accelerated simulations. We develop this algorithm in the Poke open-source Python package [15] so that it is freely available to all interested in pursuing the development of GBD as a propagation technique.

7.1 Future work

We present a preliminary exploration into the “optimal” combination of a number of beamlets and overlap factor for simulating the PSF of a circular aperture. However, Worku and Gross have already proven that the accuracy of a simulation can be increased by employing truncated Gaussian beams to simulate aperture edges [13]. Performing a study similar to that of Section 4 that considers the distribution and number of truncated beams would be an optimal next step for using this algorithm in imaging simulations.

Non-imaging systems can also benefit from the proposed propagation method. High-energy laser systems, for example, require a precise understanding of the irradiance distribution of a given laser beam. With the proposed propagation technique, we can conduct highly sampled and rapid simulations of a higher-order Gaussian beam through an optical system without imposing the paraxial assumption on the system.

8. Appendix A: Summary of Weber’s formulation of the Collins integral for misaligned optical elements

In this work, we depart from Weber’s reformulation of the Collins Integral to arrive at the expression of the field common to several beamlets propagating through a given optical system. For a review of Weber’s initial derivation, we briefly reproduce the results of Section 2 of their paper [16] to illustrate the differences.

Weber continues from where we left off in Eq. (16) by also substituting the coordinate $\mathbf {r}_{2}$ with the sum of the coordinate on the transversal plane $\mathbf {\tilde {r}_{2}}$ and the misalignment vector $\mathbf {r_{2M}}$. The result of the fully-expanded Collins integral for misaligned elements is given by Eq. (25).

$$\begin{aligned} E_{2}(\mathbf{\tilde{r}}_{2}) = & K \iint_{-\infty}^{\infty} E(\mathbf{\tilde{r}}_{1}) exp(\frac{-ik}{2}[\langle{\mathbf{\tilde{r}_{2}} + \mathbf{r_{2M}}}|\mathbf{D}\mathbf{B}^{{-}1}|{\mathbf{\tilde{r}_{2}} + \mathbf{r_{2M}}}\rangle\\ &+\langle{\mathbf{\tilde{r}}_{1} + \mathbf{r}_{1M}}|\mathbf{B}^{{-}1}\mathbf{A}|{\mathbf{\tilde{r}}_{1} + \mathbf{r}_{1M}}\rangle\\ & -2\langle{\mathbf{\tilde{r}}_{1} + \mathbf{r}_{1M}}|\mathbf{B}^{{-}1}|{\mathbf{\tilde{r}_{2}} + \mathbf{r_{2M}}}\rangle]) d^{2}\mathbf{\tilde{r}}_{1} \end{aligned}$$

Expanding the phasor in Eq. (25) permits the separation of the terms that are linear in $\mathbf {\tilde {r}}_{1}$ and $\mathbf {\tilde {r}}_{2}$, as shown in Eq. (26),

$$\begin{aligned} E_{2}(\mathbf{\tilde{r}}_{2}) & = K \iint_{-\infty}^{\infty} E(\mathbf{\tilde{r}}_{1}) exp(-\frac{ik}{2}[\\ & \langle{\mathbf{\tilde{r}}_{2}}|\mathbf{D}\mathbf{B}^{{-}1}|{\mathbf{\tilde{r}}_{2}}\rangle + 2\langle{\mathbf{\tilde{r}}_{2}}|\mathbf{B}^{{-}1}|{\mathbf{\tilde{r}}_{1}}\rangle + \langle{\mathbf{\tilde{r}}_{1}}|\mathbf{B}^{{-}1}\mathbf{A}|{\mathbf{\tilde{r}}_{1}}\rangle +\\ & \langle{\mathbf{r}_{1M}}|\mathbf{B}^{{-}1}\mathbf{A}|{\mathbf{r}_{1M}}\rangle + \langle{\mathbf{r}_{2M}}|\mathbf{D}\mathbf{B}^{{-}1}|{\mathbf{r}_{2M}}\rangle + 2\langle{\mathbf{r}_{2M}}|\mathbf{B}^{{-}1}|{\mathbf{r}_{1M}}\rangle +\\ & \langle{\mathbf{\tilde{r}}_{2}}|\mathbf{D}\mathbf{B}^{{-}1}|{\mathbf{r}_{2M}}\rangle + \langle{\mathbf{r}_{2M}}|\mathbf{D}\mathbf{B}^{{-}1}|{\mathbf{\tilde{r}}_{2}}\rangle + \langle{\mathbf{\tilde{r}}_{1}}|\mathbf{B}^{{-}1}\mathbf{A}|{\mathbf{r}_{1M}}\rangle +\\ & \langle{\mathbf{r}_{1M}}|\mathbf{B}^{{-}1}\mathbf{A}|{\mathbf{\tilde{r}}_{1}}\rangle + 2\langle{\mathbf{\tilde{r}}_{1}}|\mathbf{B}^{{-}1}|{\mathbf{r}_{2M}}\rangle + 2\langle{\mathbf{r}_{1M}}|\mathbf{B}^{{-}1}|{\mathbf{\tilde{r}}_{2}}\rangle])d^{2}\mathbf{\tilde{r}}_{1}, \end{aligned}$$
where the top two rows of the phasor are the terms that aren’t linear in $\mathbf {\tilde {r}}_{1}$ and $\mathbf {\tilde {r}}_{2}$ and the bottom two rows are the terms that are. Using the relation $\langle {\mathbf {r}_{1M}}|\mathbf {B}^{-1}\mathbf {A}|{\mathbf {\tilde {r}}_{1}}\rangle = \langle {\mathbf {\tilde {r}}_{1}}|(\mathbf {A}\mathbf {B}^{-1})^{T}|{\mathbf {r}_{1M}}\rangle$ and the relations for symplectic matrices in Eqs. (8)–(12), Weber shows that the terms linear in $\mathbf {\tilde {r}}_{1}$ and $\mathbf {\tilde {r}}_{2}$ vanish. What remains are the top two rows of the phasor in Eq. (26). The first row corresponds to the phasor of the aligned Collins integral (Eq. (13)), and the phasor in the second row that isn’t a function of the variable of integration. Consequently, it can be factored out of the equation, resulting in the propagation law defined by Eq. (27).
$$E_{2}(\mathbf{\tilde{r}}_{2}) = E_{2}(\mathbf{\tilde{r}}_{2})_{aligned} exp(\frac{-ik}{2}[\langle{\mathbf{r}_{1M}}|\mathbf{B}^{{-}1}\mathbf{A}|{\mathbf{r}_{1M}}\rangle + \langle{\mathbf{r}_{2M}}|\mathbf{D}\mathbf{B}^{{-}1}|{\mathbf{r}_{2M}}\rangle - 2\langle{\mathbf{r}_{1M}}|\mathbf{B}^{{-}1}|{\mathbf{r}_{2M}}\rangle]),$$
where $E_{2}(\mathbf {\tilde {r}}_{2})_{aligned}$ is the aligned solution to the Collins integral in Eq. (13). Applying the propagation formulas given in Eqs. (28) and (29), this can be simplified to Weber’s original solution (Eq. (17)), which is the Collins integral expressed along the center of gravity vector that the beam propagates along.
$$\mathbf{r}_{2M} = \mathbf{A} \mathbf{r}_{1M} + \mathbf{B} \mathbf{\theta}_{1M},$$
$$\mathbf{\theta}_{2M} = \mathbf{C} \mathbf{r}_{1M} + \mathbf{D} \mathbf{\theta}_{1M}.$$

 figure: Fig. 10.

Fig. 10. Runtime comparison of elementary functions used in the GBD algorithm. Error bars represent the standard deviation of 25 trials on arrays of size $N_{pix} \times N_{pix} \times 2 \times 2$. numpy’s linear algebra library can operate on arrays of arbitrary shape, but the GBD algorithm only requires these operations on $2\times 2$ matrices. For large arrays, our approach is $\approx$50 times faster for computing the determinant and 150-250 times faster for the inverse and eigenvalue computation on an M1 CPU.

Download Full Size | PDF

9. Appendix B: Hubble Space Telescope model

Tables Icon

Table 2. Optical system prescription for the RC telescope based on the HST used in this investigation. All distances are given in meters. RoC stands for Radius of Curvature, and the sign convention is chosen such that negative values are concave and positive values are convex.

10. Appendix C: Notes on elementary function computation

GBD requires the evaluation of a determinant to compute the amplitude factor, a matrix inverse to compute the propagated $\mathbf {Q}_{2}^{-1}$, and the determination of eigenvalues to compute the Gouy phase $\mathbf {\Phi }_{gouy}$ [13]. In Python, one would typically call the numpy.linalg library to compute these values. But after profiling our algorithm using the Viztracer [24] Python profiler, we found that the largest contributors to our runtime were these functions.

Shown in Fig. 10 are the results of a runtime comparison between these functions calling numpy.linalg and our implementation of the same functions for 2x2 matrices, which can be found in college-level linear algebra textbooks. We only need these operations for 2x2 matrices, so they were trivial to implement in our algorithm. This shows that for large arrays (e.g. 500x500x2x2 complex-valued matrices), we can improve the determinant calculation by a factor of 50x and the inverse and eigenvalue calculations by a factor of 100-200x. These data were generated using numpy version 1.23.5.

11. Appendix D: PSF simulation parameters for the optimal beamlet parameter search

Tables Icon

Table 3. Parameters used in the optimal beamlet parameter search. These data were used with the Ansys Zemax OpticStudio ray tracing engine and the Poke open-source physical optics package. They can be found in the experiments/weber directory of the Poke GitHub repository.

Acknowledgments

This research made use of several open-source Python packages, including POPPY [22], HCIPy [4], prysm [3], numpy [25], matplotlib [26], ipython [27], scipy [28], and numexpr [29]. This research made use of high performance computing (HPC) resources supported by the University of Arizona TRIF (Technology and Research Initiative Fund), UITS, and Research, Innovation, and Impact (RII) and maintained by the UArizona Research Technologies department. This work was supported by a NASA Space Technology Graduate Research Opportunity.

Disclosures

The authors declare no conflicts of interest.

Data Availability

Data underlying the results presented in this paper are available at the Poke repository on GitHub [15,30].

References

1. M. Perrin, J. Long, E. Douglas, et al., “POPPY: Physical Optics Propagation in PYthon,” Astrophysics Source Code Library, record ascl:1602.018 (2016).

2. B. D. Dube, A. J. Riggs, B. D. Kern, et al., “Exascale integrated modeling of low-order wavefront sensing and control for the Roman Coronagraph instrument,” J. Opt. Soc. Am. A 39(12), C133 (2022). [CrossRef]  

3. B. Dube, “prysm: A python optics module,” J. Open Source Softw. 4(37), 1352 (2019). [CrossRef]  

4. E. H. Por, S. Y. Haffert, V. M. Radhakrishnan, et al., “High Contrast Imaging for Python (HCIPy): an open-source adaptive optics and coronagraph simulator,” in Adaptive Optics Systems VI, vol. 10703 of Proc. SPIE (2018).

5. M. Mansuripur, “Distribution of light at and near the focus of high-numerical-aperture objectives,” J. Opt. Soc. Am. A 3(12), 2086 (1986). [CrossRef]  

6. J. E. Krist, L. Pueyo, and S. B. Shaklan, “Practical numerical propagation of arbitrary wavefronts through PIAA optics,” (San Diego, California, USA, 2010), p. 77314N.

7. J. E. Harvey, R. G. Irvin, and R. N. Pfisterer, “Modeling physical optics phenomena by complex ray tracing,” Opt. Eng. 54(3), 035105 (2015). [CrossRef]  

8. A. W. Greynolds, “Fat rays revisited: a synthesis of physical and geometrical optics with Gaußlets,” in International Optical Design Conference 2014, vol. 9293M. Figueiro, S. Lerner, J. Muschaweck, et al., eds., International Society for Optics and Photonics (SPIE, 2014), pp. 521–529.

9. A. W. Greynolds, “Vector Formulation Of The Ray-Equivalent Method For General Gaussian Beam Propagation,” in Current Developments in Optical Engineering and Diffraction Phenomena, vol. 0679R. E. Fischer, J. E. Harvey, and W. J. Smith, eds., International Society for Optics and Photonics (SPIE, 1986), pp. 129–133.

10. N. G. Worku and H. Gross, “Vectorial field propagation through high NA objectives using polarized Gaussian beam decomposition,” in Optical Trapping and Optical Micromanipulation XIV, K. Dholakia and G. C Spalding, eds. (SPIE, San Diego, United States, 2017), p. 33.

11. N. G. Worku, R. Hambach, and H. Gross, “Decomposition of a field with smooth wavefront into a set of gaussian beams with non-zero curvatures,” J. Opt. Soc. Am. A 35(7), 1091–1102 (2018). [CrossRef]  

12. N. G. Worku and H. Gross, “Gaussian pulsed beam decomposition for propagation of ultrashort pulses through optical systems,” J. Opt. Soc. Am. A 37(1), 98–107 (2020). [CrossRef]  

13. N. G. Worku and H. Gross, “Propagation of truncated gaussian beams and their application in modeling sharp-edge diffraction,” J. Opt. Soc. Am. A 36(5), 859–868 (2019). [CrossRef]  

14. J. N. Ashcraft, E. S. Douglas, D. Kim, et al., “Hybrid propagation physics for the design and modeling of astronomical observatories: a coronagraphic example,” SPIE Journal of Astronomical Telescopes Insturments and Systems, arXiv, arXiv:2310.20026 (2023). [CrossRef]  

15. J. N. Ashcraft, E. S. Douglas, D. Kim, et al., “Poke: an open-source, ray-based physical optics platform,” in Optical Modeling and Performance Predictions XIII, vol. 12664M. A Kahan, ed., International Society for Optics and Photonics (SPIE, 2023), p. 1266404.

16. H. Weber, “Collins’ integral for misaligned optical elements,” J. Mod. Opt. 53(18), 2793–2801 (2006). [CrossRef]  

17. S. A. Collins, “Lens-system diffraction integral written in terms of matrix optics*,” J. Opt. Soc. Am. 60(9), 1168–1177 (1970). [CrossRef]  

18. J. W. Goodman, “Introduction to fourier optics,” Introduction to Fourier optics, 3rd ed., by JW Goodman. Englewood, CO: Roberts & Co. Publishers, 2005 1 (2005).

19. A. E. Siegman, Lasers (University Science Books, Mill Valley, CA, 1986). Includes bibliographical references and index.

20. Y. Cai and Q. Lin, “The elliptical hermite–gaussian beam and its propagation through paraxial systems,” Opt. Commun. 207(1-6), 139–147 (2002). [CrossRef]  

21. Z. Mei, J. Guo, and D. Zhao, “Decentred elliptical laguerre–gaussian beam,” J. Opt. A: Pure Appl. Opt. 8(1), 77–84 (2006). [CrossRef]  

22. M. D. Perrin, R. Soummer, E. M. Elliott, et al., “Simulating point spread functions for the James Webb Space Telescope with WebbPSF,” in Space Telescopes and Instrumentation 2012: Optical, Infrared, and Millimeter Wave, vol. 8442M. C. Clampin, G. G. Fazio, H. A. MacEwen, et al., eds., International Society for Optics and Photonics (SPIE, 2012), pp. 1193–1203.

23. E. S. DouglasM. D. Perrin, “Accelerated modeling of near and far-field diffraction for coronagraphic optical systems,” in Space Telescopes and Instrumentation 2018: Optical, Infrared, and Millimeter Wave, vol. 10698M. Lystrup, H. A. MacEwen, G. G. Fazio, et al., eds., International Society for Optics and Photonics (SPIE, 2018), pp. 864–877.

24. T. G., “Viztracer,” https://github.com/gaogaotiantian/viztracer (2022).

25. C. R. Harris, K. J. Millman, S. J. van der Walt, et al., “Array programming with NumPy,” Nature 585(7825), 357–362 (2020). [CrossRef]  

26. J. D. Hunter, “Matplotlib: A 2d graphics environment,” Comput. Sci. Eng. 9(3), 90–95 (2007). [CrossRef]  

27. F. Pérez and B. E. Granger, “IPython: a system for interactive scientific computing,” Comput. Sci. Eng. 9(3), 21–29 (2007). [CrossRef]  

28. P. Virtanen, R. Gommers, T. E. Oliphant, et al., “SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python,” Nat. Methods 17(3), 261–272 (2020). [CrossRef]  

29. R. McLeod, F. Alted, A. Valentino, et al., “pydata/numexpr: Numexpr v2.6.9,” (2018).

30. J. Ashcraft, “Jashcraf/poke: v1.0.1,” Github (2023) [accessed 10 Mar 2024], github.com/Jashcraf/poke

Data Availability

Data underlying the results presented in this paper are available at the Poke repository on GitHub [15,30].

15. J. N. Ashcraft, E. S. Douglas, D. Kim, et al., “Poke: an open-source, ray-based physical optics platform,” in Optical Modeling and Performance Predictions XIII, vol. 12664M. A Kahan, ed., International Society for Optics and Photonics (SPIE, 2023), p. 1266404.

30. J. Ashcraft, “Jashcraf/poke: v1.0.1,” Github (2023) [accessed 10 Mar 2024], github.com/Jashcraf/poke

Cited By

Optica participates in Crossref's Cited-By Linking service. Citing articles from Optica Publishing Group journals and other participating publishers are listed here.

Alert me when this article is cited.


Figures (10)

Fig. 1.
Fig. 1. Illustration of Gaussian beamlet decomposition. An input wavefront (left) is decomposed into a sum of Gaussian beams. They are propagated through an optical system, and the resultant aberrated wavefront is represented by a change in optical path of each of the beamlets (right).
Fig. 2.
Fig. 2. Schematic of the “traditional” GBD algorithm referred to in this work. The decomposition is referred to at the entrance pupil of the optical system $\mathbf{\mathcal{P}}_{EP}$, where rays that emanate normal to the center of a beamlet ($\mathbf{\mathcal{R}}_{EP}$) are propagated through an optical system to the plane where we desire to evaluate the field $\mathbf{\mathcal{R}}_{eval}$. In order to perform the field evaluation, the rays must be transformed to the transversal plane $\mathbf{\mathcal{P}}_{trans}$ and the Gaussian field evaluated at $\mathbf {r}_{eval}$ expressed in the coordinates of the transversal plane. The procedure to do so is described in [14].
Fig. 3.
Fig. 3. Illustration of Weber’s misaligned optical system given by an arbitrary ray transfer matrix. Nominally, the Collins integral relates the field at the object plane (left side, dashed vertical line) to the evaluation plane (right side, dashed vertical line). Weber’s formulation reconsiders the Collins integral propagation along the center of gravity vector in object space $\mathbf {v}_{CG,O}$ to the one in evaluation space $\mathbf {v}_{CG,I}$ by adding a misalignment vector in position ($\mathbf {r}_{1,2M}$) and angle ($\mathbf {\theta }_{1,2M}$) and evaluating the integral on the plane transverse to the centers of gravity $\mathbf {\tilde {r}}_{1,2}$.
Fig. 4.
Fig. 4. Consistency test of traditional Fresnel diffraction (left) and the proposed plane-evaluation algorithm (middle) for the case of a decentered Gaussian beam propagating to 1mm outside of focus. The Fresnel diffraction simulation on the left is plotted with an oversampling of 2. The results show that the irradiance profiles are nearly identical. Shown on the right is the RMS difference of the plots on the left and middle as a function of the oversampling of the Fresnel simulation. We see that we asymptotically approach zero as a function of oversampling, indicating that our proposed algorithm can correctly propagate a shifted Gaussian beam through an ABCD optical system.
Fig. 5.
Fig. 5. Comparison of the GBD PSFs (left) with their absolute differences with the analytical Airy function (right). The plane-evaluation algorithm produced the result in (a), and its absolute difference is in (b). The point-evaluation algorithm developed in [14] is shown in (c), and its absolute difference is in (d). All data are $Log_{10}$ scaled to highlight the faint structure in the PSF. The high-frequency PSF residuals are very similar and of low magnitude, but the plane-evaluation algorithm shows smaller residuals, particularly in the PSF core, than the point-evaluation result. The RMS ($\sigma$) of the differences are given in the titles on the right, where the plane-evaluation algorithm has a lower RMS difference by about a factor of 2. The runtimes of the plane-evaluation and point-evaluation simulations were 113.78s and 1309.11s, respectively, using a computer with a 3GHz CPU and 16GB of RAM.
Fig. 6.
Fig. 6. Comparison of the plane-evaluation algorithm’s response to an aberrated optical system (a) to the Zemax OpticStudio Huygens PSF simulation (b) and their difference (c). (a) and (b) are plotted on a log scale to better resolve the PSF structure. We apply an aspheric sag to the primary mirror of the HST model described by Noll Zernike polynomials ($\Phi = \alpha Z_{7} + \alpha Z_{10}$, where $\alpha =10^{-7}$) to maintain the best focus position of the PSFs. Upon inspection, the fields in (a) and (b) are nearly identical, with an RMS difference of $\sigma = 2.7e^{-5}$. The largest features in (c) are on the order of 0.3${\%}$ difference. They can be explained by the discrepancies between GBD and more exact scalar diffraction methods rather than inherent inaccuracies in the proposed algorithm.
Fig. 7.
Fig. 7. Runtime comparison of the point-evaluation (blue) and plane-evaluation (red) beamlet decomposition algorithms and their runtimes on the CPU (left) and GPU (right) listed in Table 1. Each point represents a simulation where we compute the PSF of a telescope with a circular aperture with an entrance pupil diameter of 2.4m, a $\lambda = 551nm$, and a focal length of 57.6m on a grid of 256 x 256 pixels with a pixel scale of 2.8mas. The points are located at the runtime given a number of beamlets across the pupil. On the CPU, the mean runtime of the plane-evaluation algorithm is 34 times faster. On the GPU, the point-evaluation algorithm is 39 times faster, and the plane-evaluation algorithm is 1,760 times faster. The advancement made by this study is given by the relative runtime of the plane-evaluation algorithm on the GPU, which is 67,513 times faster than the point-evaluation algorithm on CPUs.
Fig. 8.
Fig. 8. The $Log_{10}$ RMS difference to the analytical Airy function for a circular aperture as a function of the number of beamlets traced across the pupil and the overlap factor used. The white circle indicates the minimum error case for this parameter space. The total runtime to compute these PSFs on GPUs is 24 seconds.
Fig. 9.
Fig. 9. Comparison of the RMS difference between the simulated GBD PSFs and the analytical Airy function while varying overlap factor and number of beamlets across the pupil. (Left) The RMS difference of the PSFs from 0-5 $\lambda /D$. (Middle) The RMS difference of the PSFs from 5-10 $\lambda /D$. (Right) The RMS difference of the PSFs from 10-15 $\lambda /D$. The leftmost simulation follows the contour from Fig. 8, but the higher-frequency information follows a different curve entirely. Generally, a PSF simulation is more accurate when more rays are used. However, in under-sampled cases, small overlap factors are better for low frequencies but worse for high frequencies. A white circle shows the minimum error case for a given parameter space.
Fig. 10.
Fig. 10. Runtime comparison of elementary functions used in the GBD algorithm. Error bars represent the standard deviation of 25 trials on arrays of size $N_{pix} \times N_{pix} \times 2 \times 2$. numpy’s linear algebra library can operate on arrays of arbitrary shape, but the GBD algorithm only requires these operations on $2\times 2$ matrices. For large arrays, our approach is $\approx$50 times faster for computing the determinant and 150-250 times faster for the inverse and eigenvalue computation on an M1 CPU.

Tables (3)

Tables Icon

Table 1. Description of the computational resources used in this investigation. The CPU results represent what an “average" machine can do since it is performed on a consumer-grade laptop. The GPU results represent how the algorithms can scale to supercomputers. TFLOPS is teraFLOPS, or floating point operations per second, assuming double-precision floats.

Tables Icon

Table 2. Optical system prescription for the RC telescope based on the HST used in this investigation. All distances are given in meters. RoC stands for Radius of Curvature, and the sign convention is chosen such that negative values are concave and positive values are convex.

Tables Icon

Table 3. Parameters used in the optimal beamlet parameter search. These data were used with the Ansys Zemax OpticStudio ray tracing engine and the Poke open-source physical optics package. They can be found in the experiments/weber directory of the Poke GitHub repository.

Equations (30)

Equations on this page are rendered with MathJax. Learn more.

E 2 ( r 2 , z ) = e i k z i λ z e i k 2 z | r 2 | 2 E 1 ( r 1 , 0 ) e i k 2 z | r 1 | 2 e i 2 π λ z ( r 1 r 2 ) d 2 r 1 .
E 2 ( r 2 , z ) = e i k z i λ z e i k 2 z | r 2 | 2 E 1 ( r 1 , 0 ) e i 2 π λ z ( r 1 r 2 ) d 2 r 1 ,
E 2 ( r 2 , z ) = j = 0 N E o , j q ( z ) j e ( i k | r 2 r j | 2 2 q ( z ) j ) ,
q ( z ) 1 = 1 R ( z ) + i λ π w ( z ) 2 .
E 2 ( r 2 ) = 1 λ B e D | r 2 | 2 E 1 ( r 1 ) e x p [ i π λ B ( A | r 1 | 2 2 ( r 1 r 2 ) + D | r 2 2 | ) d 2 r 1 ,
E 2 ( r 2 ) = i   n 1 λ   d e t ( B ) 1 / 2 e x p ( i k l o ) E ( r 1 ) e x p ( i k l 1 ) d 2 r 1 ,
l 1 = 1 2 ( r 1 r 2 ) T ( n 1 B 1 A n 1 B 1 n 2 ( C D B 1 A ) D B 1 ) ( r 1 r 2 ) ,
B A T = A B T
B T D = D T B
D C T = C D T
A T C = C T A
A D T B C T = I ,
E 2 ( r 2 ) = K E ( r 1 ) e x p ( i k 2 [ r 2 | D B 1 | r 2 + r 1 | B 1 A | r 1 2 r 1 | B 1 | r 2 ] ) d 2 r 1 ,
v j M = v j C G v j O A = ( r j M θ j M ) ; j = 1 , 2
v 1 = v ~ 1 + v 1 M .
E 2 ( r 2 ) = K E ( r ~ 1 ) e x p ( i k 2 [ r 2 | D B 1 | r 2 + r ~ 1 + r 1 M | B 1 A | r ~ 1 + r 1 M 2 r ~ 1 + r 1 M | B 1 | r 2 ] ) d 2 r ~ 1
E 2 ( r 2 ~ ) = E 2 , a l i g n e d ( r 2 ~ ) e x p ( i k 2 [ r 1 M | | θ 1 M r 2 M | | θ 2 M ] ) ,
E 2 ( r 2 ) = K E ( r ~ 1 ) e x p ( i k 2 [ r 2 | D B 1 | r 2 + r ~ 1 | B 1 A | r ~ 1 2 r ~ 1 | B 1 | r 2 + r 1 M | B 1 A | r 1 M 2 r 1 M | B 1 | r 2 ] ) d 2 r ~ 1 .
Φ m i s a l i g n e d = e x p ( i k 2 [ r 1 M | B 1 A | r 1 M 2 r 1 M | B 1 | r 2 ] ] )
E 2 ( r 2 ) = Φ m i s a l i g n e d ( K E ( r ~ 1 ) e x p ( i k 2 [ r 2 | D B 1 | r 2 + r ~ 1 | B 1 A | r ~ 1 2 r ~ 1 | B 1 | r 2 ] ) d 2 r ~ 1 . ) ,
E 2 ( r 2 ) = Φ m i s a l i g n e d E 2 , a l i g n e d ( r 2 ) .
E 2 ( r 2 ) = e i k l o | A + B Q 1 1 | e i k 2 r 2 | Q 2 1 | r 2
Q 2 1 = ( C + D Q 1 1 ) ( A + B Q 1 1 ) 1
[ A B C D ] = [ 1 0 f + δ x 0 0 1 0 f + δ x 0 0 1 0 0 0 0 1 ] [ 1 0 0 0 0 1 0 0 1 / f 0 1 0 0 1 / f 0 1 ] .
O F = N g 2 ω o W ,
E 2 ( r ~ 2 ) = K E ( r ~ 1 ) e x p ( i k 2 [ r ~ 2 + r 2 M | D B 1 | r ~ 2 + r 2 M + r ~ 1 + r 1 M | B 1 A | r ~ 1 + r 1 M 2 r ~ 1 + r 1 M | B 1 | r ~ 2 + r 2 M ] ) d 2 r ~ 1
E 2 ( r ~ 2 ) = K E ( r ~ 1 ) e x p ( i k 2 [ r ~ 2 | D B 1 | r ~ 2 + 2 r ~ 2 | B 1 | r ~ 1 + r ~ 1 | B 1 A | r ~ 1 + r 1 M | B 1 A | r 1 M + r 2 M | D B 1 | r 2 M + 2 r 2 M | B 1 | r 1 M + r ~ 2 | D B 1 | r 2 M + r 2 M | D B 1 | r ~ 2 + r ~ 1 | B 1 A | r 1 M + r 1 M | B 1 A | r ~ 1 + 2 r ~ 1 | B 1 | r 2 M + 2 r 1 M | B 1 | r ~ 2 ] ) d 2 r ~ 1 ,
E 2 ( r ~ 2 ) = E 2 ( r ~ 2 ) a l i g n e d e x p ( i k 2 [ r 1 M | B 1 A | r 1 M + r 2 M | D B 1 | r 2 M 2 r 1 M | B 1 | r 2 M ] ) ,
r 2 M = A r 1 M + B θ 1 M ,
θ 2 M = C r 1 M + D θ 1 M .
Select as filters


Select Topics Cancel
© Copyright 2024 | Optica Publishing Group. All rights reserved, including rights for text and data mining and training of artificial technologies or similar technologies.