Optimize

Optimization Utilities for PyTorch-SciPy Integration

This submodule provides a set of tools for constrained and unconstrained optimization of PyTorch models using SciPy optimizers. It bridges the gap between SciPy’s powerful optimization routines and PyTorch’s autograd system, enabling flexible and efficient hybrid optimization workflows.

Key Features:

  • Seamless wrapping of PyTorch-based objective functions for use with SciPy.

  • Automatic gradient computation using PyTorch’s autograd.

  • Support for parameter bounds, including custom mask-based bounds.

  • Caching and reuse of recent function/gradient evaluations.

  • Integration with SciPy’s minimize.

  • Optional tracking of optimization history (function values and gradient norms).

  • Utility functions for flattening/unpacking tensor parameters.

  • Conversion of PyTorch parameters to SciPy-compatible formats with bounds.

  • Support for custom constraints and callback functions.

Optimization Constraints in Optical Systems

When optimization procedures are used to determine the parameters of an optical system, constraints are needed to ensure that the resulting system can be manufactured. The following demonstrates how different types of constraints are implemented in our library, with a specific focus on the positive air spacing and minimum glass thickness constraints.

Constrained optimization problems can often be expressed as a nonlinear program, which is defined as follows (see [GZ18]):

\[\min_{p} \quad m(p)\]
\[\text{subject to} \quad \hat{g}_i(p) \leq 0, \quad i = 1, \ldots, N_1,\]
\[\text{subject to} \quad \hat{h}_j(p) = 0, \quad j = 1, \ldots, N_2,\]

where:

  • \(p \in \mathbb{R}^n\) is the vector of parameters.

  • \(m: \mathbb{R}^n \to \mathbb{R}\) is the nonlinear objective (merit) function.

  • \(\hat{g}_i: \mathbb{R}^n \to \mathbb{R}\) are the inequality constraint functions.

  • \(\hat{h}_j: \mathbb{R}^n \to \mathbb{R}\) are the equality constraint functions.

For this type of problem, multiple numerical schemes are available in the Python library SciPy. Some optimization schemes also require derivative information for functions that describe constraints. For example, Sequential Least Squares Programming (SLSQP) uses the derivatives of the constraint functions \(\hat{g}_i\) and \(\hat{h}_j\) to find local minima.
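As a minimal sketch of such a nonlinear program, SciPy's SLSQP can be called directly with analytic derivatives. The toy merit function and constraints below are made up for illustration and are not part of the library. Note that SciPy's `"ineq"` convention is `fun(p) >= 0`, so a constraint written as \(\hat{g}(p) \leq 0\) is passed as \(-\hat{g}(p)\).

```python
import numpy as np
from scipy.optimize import minimize

# Toy merit function m(p) = (p0 - 1)^2 + (p1 - 2)^2 and its gradient.
def merit(p):
    return (p[0] - 1.0) ** 2 + (p[1] - 2.0) ** 2

def merit_grad(p):
    return np.array([2.0 * (p[0] - 1.0), 2.0 * (p[1] - 2.0)])

# Inequality constraint g(p) = p0 + p1 - 2 <= 0.
# SciPy expects fun(p) >= 0 for type "ineq", so -g(p) is passed.
ineq = {
    "type": "ineq",
    "fun": lambda p: -(p[0] + p[1] - 2.0),
    "jac": lambda p: np.array([-1.0, -1.0]),
}

# Equality constraint h(p) = p0 - 0.25 = 0.
eq = {
    "type": "eq",
    "fun": lambda p: p[0] - 0.25,
    "jac": lambda p: np.array([1.0, 0.0]),
}

result = minimize(merit, x0=np.zeros(2), jac=merit_grad,
                  method="SLSQP", constraints=[ineq, eq])
p_opt = result.x
```

With the equality constraint pinning the first component, the inequality becomes active and limits the second component to 1.75.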

By combining the libraries PyTorch and SciPy, we leverage the strengths of two sophisticated and established libraries:

  1. PyTorch: Efficiently calculates the derivatives of the merit function \(m\) and the constraint functions \(\hat{g}_i\) and \(\hat{h}_j\) using automatic differentiation. Additionally, it allows these functions and their derivatives to be evaluated on a graphics card, which can provide significant speedups.

  2. SciPy: Provides well-tested traditional algorithms to find local minima. While PyTorch also has a wide variety of optimization algorithms, its main application is stochastic gradient descent in deep learning, which may not be the best choice for optimizing optical systems.
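The division of labor described above can be sketched in a few lines of plain PyTorch and SciPy. This is an illustrative stand-alone example, not the library's implementation: the names `merit`, `target`, and `fun_and_grad` are made up, and the code assumes CPU tensors in double precision.

```python
import numpy as np
import torch
from scipy.optimize import minimize

# Parameter tensor that the merit function closes over.
p = torch.nn.Parameter(torch.zeros(3, dtype=torch.float64))
target = torch.tensor([1.0, -2.0, 0.5], dtype=torch.float64)

def merit():
    # Any differentiable PyTorch expression that returns a scalar.
    return torch.sum((p - target) ** 2)

def fun_and_grad(x):
    # Copy SciPy's current iterate into the PyTorch parameter.
    with torch.no_grad():
        p.copy_(torch.tensor(x, dtype=p.dtype))
    p.grad = None
    value = merit()
    value.backward()  # autograd fills p.grad
    return value.item(), p.grad.detach().numpy().copy()

# jac=True tells SciPy that the callable returns (value, gradient).
result = minimize(fun_and_grad, x0=np.zeros(3), jac=True, method="L-BFGS-B")
```

PyTorch only supplies function values and gradients here; the step selection and line search are left entirely to SciPy.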

Types of Constraints

In our library, we implemented three ways to define constraints:

  1. Bounds: Most numerical schemes in SciPy support bounding box constraints, which define a minimum and maximum value for each parameter. For a single parameter component \(p_k\), an upper bound \(C_i \in \mathbb{R}\) corresponds to the constraint \(\hat{g}_i(p) = p_k - C_i\) and a lower bound to \(\hat{g}_i(p) = C_i - p_k\). This is particularly useful for distance transformations, where a lower bound keeps the distance parameter from dropping below a minimum value such as 0. For example, the following sets a lower bound of 5.0:

    >>> import diffinytrace as dit
    >>> import torch
    >>> dist_transform = dit.transforms.Distance(10.)
    >>> dist_transform.distance.bounds = torch.tensor([5.0, torch.inf])
    

    Here, torch.inf indicates that the distance can be arbitrarily large, with no upper bound.

  2. Constant Variables: If a specific parameter should be fixed, PyTorch allows disabling gradient computation for that parameter. For example:

    >>> import diffinytrace as dit
    >>> distance_transform = dit.transforms.Distance(10.)
    >>> distance_transform.distance.requires_grad = False
    

    Note: requires_grad applies to a whole tensor, so a parameter that holds multiple values cannot have gradient computation disabled for only some of its elements. For instance, individual coefficients of a B-spline surface cannot be fixed this way.

  3. Arbitrary Constraint Functions: Our library also supports defining nonlinear inequality constraint functions \(\hat{g}_i\) and equality constraint functions \(\hat{h}_j\). Some local optimization methods require derivative information for these nonlinear constraint functions. To evaluate these derivatives efficiently, we use automatic differentiation: the constraint functions are defined with PyTorch, and their derivatives with respect to the parameters of the optical system are computed by autograd. This approach eliminates the need for finite differences, which could significantly slow down the optimization procedure.
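The idea of differentiating a constraint with autograd can be sketched without the library. The example below is a hypothetical minimum-thickness constraint on two surface positions; `T_MIN`, `thickness_margin`, `to_numpy_fun`, and `to_numpy_jac` are illustrative names, not library API, and the margin is written in SciPy's `fun(p) >= 0` convention.

```python
import numpy as np
import torch
from scipy.optimize import minimize

T_MIN = 1.0  # hypothetical minimum glass thickness

def thickness_margin(p):
    # p = (z_front, z_back); the margin must stay >= 0.
    return p[1] - p[0] - T_MIN

def merit(p):
    # Toy merit that pulls both surfaces toward z = 0.
    return torch.sum(p ** 2)

def to_numpy_fun(torch_fun):
    # Evaluate a scalar PyTorch function on a NumPy vector.
    def fun(x):
        return torch_fun(torch.tensor(x, dtype=torch.float64)).item()
    return fun

def to_numpy_jac(torch_fun):
    # Differentiate with autograd instead of finite differences.
    def jac(x):
        p = torch.tensor(x, dtype=torch.float64, requires_grad=True)
        (grad,) = torch.autograd.grad(torch_fun(p), p)
        return grad.numpy()
    return jac

constraint = {
    "type": "ineq",
    "fun": to_numpy_fun(thickness_margin),
    "jac": to_numpy_jac(thickness_margin),
}

result = minimize(to_numpy_fun(merit), x0=np.array([0.0, 2.0]),
                  jac=to_numpy_jac(merit), method="SLSQP",
                  constraints=[constraint])
```

The unconstrained minimum would collapse both surfaces onto z = 0, so the constraint is active at the solution and the surfaces end up exactly `T_MIN` apart.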

diffinytrace.optimize.make_bounds_from_param(param)

Creates default bounds (-∞, ∞) for each element of the input tensor.

This function returns a tensor of shape param.shape + (2,), where the last dimension holds the lower and upper bounds for each element in param.

Parameters:

param (torch.Tensor) – A tensor for which bounds should be created.

Returns:

A tensor of shape param.shape + (2,) where […, 0] = -inf (lower bounds) and […, 1] = inf (upper bounds), with the same dtype and device as param.

Return type:

torch.Tensor
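The described layout can be reproduced in plain PyTorch. The function below is a hypothetical stand-in written from the description above, not the library's source:

```python
import torch

def default_bounds(param):
    # Hypothetical stand-in for make_bounds_from_param.
    # full_like keeps the dtype and device of param.
    lower = torch.full_like(param, -float("inf"))
    upper = torch.full_like(param, float("inf"))
    # Stack along a new last dimension: [..., 0] lower, [..., 1] upper.
    return torch.stack((lower, upper), dim=-1)

param = torch.zeros(2, 3, dtype=torch.float64)
bounds = default_bounds(param)
```

For a parameter of shape (2, 3), the result has shape (2, 3, 2).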

diffinytrace.optimize.make_parameter_from_input(input, bounds=None, dtype=None, device=None, bounds_attr_name='bounds')

Converts input to a torch.nn.Parameter and attaches bounds as an attribute.

Parameters:
  • input (array-like or torch.Tensor) – Input data.

  • bounds (torch.Tensor, optional) – Bounds to attach to the parameter.

  • dtype (torch.dtype, optional) – Desired tensor data type.

  • device (torch.device, optional) – Device to store the parameter on.

  • bounds_attr_name (str) – Attribute name used to store bounds.

Returns:

The parameter with bounds attached as an attribute.

Return type:

torch.nn.Parameter

diffinytrace.optimize.pack_tensors(tensor_list: List[Tensor]) → Tensor

Flattens and concatenates a list of tensors into a single 1D tensor.

Parameters:

tensor_list (list of torch.Tensor or torch.Tensor) – Input tensor(s).

Returns:

A 1D tensor.

Return type:

torch.Tensor

diffinytrace.optimize.unpack_tensors(packed_tensor: Tensor, shapes: List[Tuple[int]]) → List[Tensor]

Unpacks a 1D tensor into a list of tensors with specified shapes.

Parameters:
  • packed_tensor (torch.Tensor) – The flat tensor to unpack.

  • shapes (list of tuple) – Target shapes for unpacked tensors.

Returns:

Unpacked tensors with original shapes.

Return type:

list of torch.Tensor
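The pack/unpack round trip can be illustrated with plain PyTorch. The helpers `pack` and `unpack` below are illustrative re-implementations based on the descriptions above and may differ from the library's code:

```python
import math
import torch

def pack(tensors):
    # Flatten each tensor and concatenate into one 1D tensor.
    return torch.cat([t.reshape(-1) for t in tensors])

def unpack(flat, shapes):
    # Split the flat tensor into chunks sized by each target shape.
    sizes = [math.prod(shape) for shape in shapes]
    chunks = torch.split(flat, sizes)
    return [chunk.reshape(shape) for chunk, shape in zip(chunks, shapes)]

tensors = [torch.arange(4.0).reshape(2, 2), torch.arange(3.0)]
flat = pack(tensors)
restored = unpack(flat, [tuple(t.shape) for t in tensors])
```

SciPy optimizers operate on a single flat vector, which is why the shapes have to be recorded before packing and supplied again when unpacking.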

diffinytrace.optimize.apply_vec_to_params(vec: ndarray, params: list[Parameter], device=None, dtype=None)

Updates PyTorch parameters with values from a flattened NumPy vector.

This function is used in optimization workflows to update parameter values during SciPy optimization. It takes a flat vector of parameter values and distributes them back to the original parameter tensors, preserving their original shapes.

Parameters:
  • vec (np.ndarray) – A 1D NumPy array containing new parameter values. The length must match the total number of elements across all parameters.

  • params (list[torch.nn.Parameter]) – List of PyTorch parameters to update. Each parameter will be reshaped from the corresponding portion of vec.

  • device (torch.device, optional) – Target device for the parameters. If None, uses the device of the first parameter. Defaults to None.

  • dtype (torch.dtype, optional) – Target data type for the parameters. If None, uses the dtype of the first parameter. Defaults to None.

Raises:

RuntimeError – If vec is not a NumPy array.

Example

>>> import torch
>>> import numpy as np
>>> import diffinytrace as dit
>>>
>>> # Create some parameters
>>> params = [
...     torch.nn.Parameter(torch.ones((2, 2)) * 0.25),
...     torch.nn.Parameter(torch.ones(3))
... ]
>>> # Flatten parameters to create a vector
>>> vec = dit.optimize.pack_tensors(params).detach().cpu().numpy()
>>> print(f"Vector length: {len(vec)}")  # Should be 2*2 + 3 = 7
>>> print(params)
>>>
>>> # Modify the vector
>>> vec_new = vec * 2.0
>>> # Update parameters with new values
>>> dit.optimize.apply_vec_to_params(vec_new, params)
>>>
>>> # Parameters are now updated with doubled values
>>> print(params)

Note

  • This function modifies parameters in-place using param.data = …

  • The function uses torch.no_grad() to avoid building computation graphs

  • Parameter shapes are preserved during the update process

  • Commonly used with pack_tensors() and unpack_tensors() for optimization

diffinytrace.optimize.set_full_if_nan(input: ndarray, fill_value: float) → ndarray

Replaces NaNs in input with a specified fill value.

Parameters:
  • input (np.ndarray) – A NumPy array or scalar.

  • fill_value (float) – Value to use in place of NaNs.

Returns:

Modified input with no NaNs.

Return type:

np.ndarray or float
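A minimal NumPy sketch of this kind of NaN replacement is shown below. The name `replace_nan` is hypothetical, and the library function may handle scalars or dtypes differently:

```python
import numpy as np

def replace_nan(values, fill_value):
    # Accept arrays as well as plain scalars.
    arr = np.asarray(values, dtype=float)
    # Keep finite and infinite entries, replace only NaNs.
    return np.where(np.isnan(arr), fill_value, arr)

cleaned = replace_nan(np.array([1.0, np.nan, 3.0]), np.inf)
scalar_cleaned = float(replace_nan(float("nan"), 0.0))
```

Filling with a large value such as inf is one way to make an optimizer treat a failed evaluation as a bad point rather than propagating NaN through its line search.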

class diffinytrace.optimize.ParameterFunHelper(orginal_fun, params, nan_fallback=inf)

Bases: object

Helper class for evaluating PyTorch functions and gradients in SciPy optimization.

This class bridges PyTorch’s automatic differentiation with SciPy’s optimization routines by providing function and gradient evaluations in NumPy format. It includes caching to avoid redundant computations and handles NaN values gracefully during optimization.

Parameters:
  • original_fun (Callable) – PyTorch function to be optimized. Should return a scalar tensor.

  • params (List[torch.nn.Parameter]) – List of PyTorch parameters to optimize over.

  • nan_fallback (float, optional) – Value to return if NaN is detected in function or gradient evaluation. Defaults to float("inf").

original_fun

The objective function being optimized.

Type:

Callable

params

Parameters for optimization.

Type:

List[torch.nn.Parameter]

nan_fallback

Fallback value for NaN handling.

Type:

float

last_x_fun_numpy

Cache of last input for function evaluation.

Type:

np.ndarray

last_fun_val_numpy

Cache of last function value in NumPy format.

Type:

float

last_fun_val_torch

Cache of last function value as PyTorch tensor.

Type:

torch.Tensor

last_x_grad_numpy

Cache of last input for gradient evaluation.

Type:

np.ndarray

last_grad_val_numpy

Cache of last gradient in NumPy format.

Type:

np.ndarray

Example

>>> import torch
>>> import diffinytrace as dit
>>> import numpy as np
>>>
>>> # Define parameters and objective function
>>> params = [torch.nn.Parameter(torch.randn(5))]
>>> def objective():
...     return torch.sum(params[0]**2)
>>>
>>> # Create helper for SciPy optimization
>>> helper = dit.optimize.ParameterFunHelper(objective, params)
>>>
>>> # Use with SciPy
>>> x0 = np.ones((5,))*3.
>>> fun_val = helper.fun(x0)        # Evaluate function 5*3^2 = 45
>>> grad_val = helper.jac(x0)       # Evaluate gradient 2*3 = 6
>>> fun_val, grad_val = helper.fun_jac(x0)  # Evaluate both
>>>
>>> print(fun_val, grad_val)  # fun_val is 45.0; every entry of grad_val is 6.0

Note

  • Function and gradient evaluations are cached to avoid redundant computations when SciPy requests the same point multiple times.

  • All NaN values in function outputs or gradients are replaced with nan_fallback.

  • Parameters are automatically updated with new values during evaluation.

fun(x)

Evaluates the objective function at a given input.

Parameters:

x (np.ndarray) – Flat input array.

Returns:

Function value with NaNs replaced if needed.

Return type:

float

jac(x)

Computes the gradient of the objective function at input x.

Parameters:

x (np.ndarray) – Flat input array.

Returns:

Gradient with NaNs replaced if needed.

Return type:

np.ndarray

fun_jac(x)

Evaluates both function value and gradient at once.

Parameters:

x (np.ndarray) – Flat input array.

Returns:

Function value and gradient.

Return type:

Tuple[float, np.ndarray]
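The caching behavior can be illustrated with a small NumPy-only class. This is a simplified sketch, not the library's implementation: it keeps a single cache for the last point, whereas the attributes listed above suggest separate caches for function and gradient evaluations. The name `CachedFunJac` is hypothetical.

```python
import numpy as np

class CachedFunJac:
    """Hypothetical sketch: cache (value, gradient) for the last point."""

    def __init__(self, fun_and_grad):
        self._fun_and_grad = fun_and_grad
        self._last_x = None
        self._last_result = None
        self.n_evaluations = 0

    def _evaluate(self, x):
        # Re-evaluate only if x differs from the cached point.
        if self._last_x is None or not np.array_equal(x, self._last_x):
            self._last_x = np.array(x, copy=True)
            self._last_result = self._fun_and_grad(x)
            self.n_evaluations += 1
        return self._last_result

    def fun(self, x):
        return self._evaluate(x)[0]

    def jac(self, x):
        return self._evaluate(x)[1]

def quadratic(x):
    # Returns the value sum(x^2) and its gradient 2x.
    return float(np.sum(x ** 2)), 2.0 * x

helper = CachedFunJac(quadratic)
x0 = np.full(5, 3.0)
value = helper.fun(x0)
grad = helper.jac(x0)  # served from the cache, no second evaluation
```

Storing a copy of `x` matters because SciPy may reuse and overwrite the array it passes in between calls.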

diffinytrace.optimize.create_fun_and_gradient(merit_fun, params, nan_fallback, device, dtype)

Wraps a PyTorch merit function and returns a callable that evaluates both the function and its gradient in NumPy format.

Parameters:
  • merit_fun (Callable) – PyTorch function to optimize.

  • params (list) – List of torch.nn.Parameter objects.

  • nan_fallback (float) – Value to use if NaNs are encountered.

  • device (torch.device) – Target device.

  • dtype (torch.dtype) – Target dtype.

Returns:

Function that returns (value, gradient) as NumPy arrays.

Return type:

Callable

diffinytrace.optimize.remove_bounds(params, bounds_attr_name) → None

Removes the bounds attribute from parameters if present.

Parameters:
  • params (list) – List of torch.nn.Parameter objects.

  • bounds_attr_name (str) – Attribute name of bounds to remove.

diffinytrace.optimize.get_bounds(params, bounds_attr_name='bounds')

Extracts and concatenates bounds for all parameters.

Parameters:
  • params (list) – List of torch.nn.Parameter objects.

  • bounds_attr_name (str) – Name of attribute storing bounds.

Returns:

Array of shape (N, 2) with all bounds.

Return type:

np.ndarray
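How per-parameter bounds might be collected into one (N, 2) array can be sketched as follows. The helper `collect_bounds` is a hypothetical re-implementation based on the description above, and it assumes every parameter already carries a bounds tensor whose last dimension has size 2:

```python
import numpy as np
import torch
from scipy.optimize import Bounds

# A scalar parameter with a lower bound of 5 and no upper bound.
p1 = torch.nn.Parameter(torch.tensor(10.0))
p1.bounds = torch.tensor([5.0, float("inf")])

# A 2x2 parameter where every element is limited to [-1, 1].
p2 = torch.nn.Parameter(torch.zeros(2, 2))
p2.bounds = torch.tensor([-1.0, 1.0]).expand(2, 2, 2)

def collect_bounds(params, attr_name="bounds"):
    # Flatten each bounds tensor to (n_elements, 2) and concatenate.
    rows = [getattr(p, attr_name).reshape(-1, 2) for p in params]
    return torch.cat(rows).numpy()

all_bounds = collect_bounds([p1, p2])
# SciPy accepts the lower and upper columns via a Bounds object.
scipy_bounds = Bounds(all_bounds[:, 0], all_bounds[:, 1])
```

The row order matches the order in which the parameters would be flattened into SciPy's parameter vector, so row k bounds element k of that vector.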

diffinytrace.optimize.get_scipy_constraint(constraint, params, nan_fallback)

Converts a constraint into SciPy-compatible format.

Parameters:
  • constraint (Constraint) – A custom constraint object.

  • params (list) – List of parameters for the optimization.

  • nan_fallback (float) – Fallback value for NaNs.

Returns:

A dictionary compatible with SciPy constraints.

Return type:

dict

diffinytrace.optimize.create_callback(callback_fun, params, device, dtype)

Wraps a PyTorch callback function for use in SciPy.

Parameters:
  • callback_fun (Callable) – A function taking no arguments.

  • params (list) – List of parameters to update before calling.

  • device (torch.device) – Device of the parameters.

  • dtype (torch.dtype) – Data type of the parameters.

Returns:

A callback function for SciPy optimizers.

Return type:

Callable

diffinytrace.optimize.minimize(fun, params, constraints: List = [], method=None, tol: float = 1e-09, callback: Callable = <function <lambda>>, options: dict | None = None, nan_fallback: float = inf, bounds_attr_name: str = 'bounds', save_history: bool = False, call_before_minimize: bool = False) → dict

Minimizes a function using SciPy’s minimize, supporting bounds and constraints.

Parameters:
  • fun (Callable) – Objective function.

  • params (list) – Parameters to optimize.

  • constraints (list) – List of constraints.

  • method (str) – SciPy optimization method (e.g., 'L-BFGS-B').

  • tol (float) – Tolerance for convergence.

  • callback (Callable) – Optional callback function.

  • options (dict) – Optimizer options.

  • nan_fallback (float) – Value to use if function returns NaN.

  • bounds_attr_name (str) – Name of bounds attribute.

  • save_history (bool) – If True, saves function values and gradient norms.

  • call_before_minimize (bool) – Whether to evaluate once before optimization.

Returns:

Dictionary containing optimization results (and optionally history).

Return type:

dict
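The kind of history that save_history refers to (function values and gradient norms) can be recorded with a plain SciPy callback. The sketch below uses only SciPy and NumPy and does not reproduce the library's internal bookkeeping; the `history` dictionary and its keys are assumptions chosen for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def fun_and_grad(x):
    # Toy quadratic with its minimum at x = 1.
    return float(np.sum((x - 1.0) ** 2)), 2.0 * (x - 1.0)

history = {"fun": [], "grad_norm": []}

def record(xk):
    # SciPy calls this once per iteration with the current iterate.
    value, grad = fun_and_grad(xk)
    history["fun"].append(value)
    history["grad_norm"].append(float(np.linalg.norm(grad)))

result = minimize(fun_and_grad, x0=np.zeros(4), jac=True,
                  method="L-BFGS-B", tol=1e-9, callback=record)
```

Re-evaluating the function inside the callback costs one extra evaluation per iteration; a cached helper avoids that overhead when the merit function is expensive.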

diffinytrace.optimize.copy_bounds_to_attr_name(params, bounds_attr_name_new, bounds_attr_name_old='bounds', replace_existing_once=True)

Copies bounds from one attribute name to another.

Parameters:
  • params (list) – List of parameters.

  • bounds_attr_name_new (str) – New attribute name.

  • bounds_attr_name_old (str) – Existing attribute name.

  • replace_existing_once (bool) – Whether to skip copying if new attribute exists.

diffinytrace.optimize.set_bounds_from_params_mask(params, mask: list | Tensor, bounds_attr_name_new, bounds_attr_name_old='bounds')

Sets bounds for parameters based on a mask. Parameters with mask=False get fixed bounds (equal lower and upper bounds).

Parameters:
  • params (list) – List of parameters.

  • mask (list or torch.Tensor) – Mask specifying which elements are free.

  • bounds_attr_name_new (str) – Attribute name to store new bounds.

  • bounds_attr_name_old (str) – Attribute name to read old bounds from.
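Fixing individual elements through equal lower and upper bounds can be sketched in plain PyTorch. The function `fix_masked_entries` below is a hypothetical illustration for a single parameter; it assumes the bounds tensor has shape param.shape + (2,) and that masked-out elements are pinned to their current values, which the description above does not state explicitly.

```python
import torch

def fix_masked_entries(param, bounds, mask):
    # mask is True for free elements and False for fixed ones.
    current = param.detach()
    # Free elements keep their bounds; fixed ones get lower == upper == value.
    lower = torch.where(mask, bounds[..., 0], current)
    upper = torch.where(mask, bounds[..., 1], current)
    return torch.stack((lower, upper), dim=-1)

coeffs = torch.nn.Parameter(torch.tensor([0.5, -0.2, 0.1]))
bounds = torch.tensor([[-1.0, 1.0]]).repeat(3, 1)
mask = torch.tensor([True, False, True])
masked_bounds = fix_masked_entries(coeffs, bounds, mask)
```

This gives per-element control that requires_grad cannot provide, for example to hold selected B-spline coefficients constant while the others are optimized.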