Physics as an Optimization Problem

The fundamental laws of physics can be understood through the lens of optimization theory. At the heart of classical mechanics lies the principle of least action, which states that physical trajectories minimize (or more precisely, extremize) a quantity called the action. This variational principle provides a unifying framework that connects the dynamics of physical systems to optimization problems, revealing deep connections between physics and numerical optimization.

The Variational Principle

The principle of least action, first formulated by Maupertuis and later refined by Euler and Lagrange, states that the actual path taken by a physical system between two points in configuration space is the one that extremizes the action functional. For Lagrangian systems, the action is defined as the integral of the Lagrangian \(L(q, \dot{q}, t)\) over time:

\[S[q] = \int_{t_1}^{t_2} L(q, \dot{q}, t) \, dt,\]

where \(q\) represents the configuration of the system and \(\dot{q}\) is its time derivative (velocity). The principle of least action asserts that physical trajectories satisfy \(\delta S = 0\), where \(\delta\) denotes the first variation. This variational condition leads directly to the Euler-Lagrange equations, which govern the system's evolution.
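Explicitly, the Euler-Lagrange equations read

\[\frac{d}{dt} \frac{\partial L}{\partial \dot{q}} - \frac{\partial L}{\partial q} = 0,\]

with one such equation for each generalized coordinate.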

For Hamiltonian systems, this variational principle takes a particularly elegant form. The Hamiltonian action is defined as:

\[S[q, p] = \int_{t_1}^{t_2} \left( p \dot{q} - H(q, p) \right) dt,\]

where \(H(q, p)\) is the Hamiltonian function and \(p\) is the conjugate momentum. The variation of this action with respect to both \(q\) and \(p\) independently yields Hamilton's equations:

\[\dot{q} = \frac{\partial H}{\partial p}, \quad \dot{p} = -\frac{\partial H}{\partial q}.\]

This formulation reveals that Hamilton's equations are the optimality conditions for the variational problem of extremizing the Hamiltonian action. The physical trajectories are precisely those paths in phase space that satisfy these optimality conditions, establishing a fundamental connection between physics and optimization.
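As a concrete sketch of this connection (the oscillator Hamiltonian and helper names below are illustrative assumptions, not SOFAx's API), Hamilton's equations can be generated mechanically from a Hamiltonian using automatic differentiation:

```python
import jax

def hamiltonian(q, p, m=1.0, k=1.0):
    # Total energy of a 1-D harmonic oscillator: kinetic + potential
    return p**2 / (2.0 * m) + 0.5 * k * q**2

def hamiltonian_vector_field(q, p):
    # Hamilton's equations: dq/dt = dH/dp, dp/dt = -dH/dq
    dH_dq = jax.grad(hamiltonian, argnums=0)(q, p)
    dH_dp = jax.grad(hamiltonian, argnums=1)(q, p)
    return dH_dp, -dH_dq

dq_dt, dp_dt = hamiltonian_vector_field(1.0, 0.0)
# At (q, p) = (1, 0) with m = k = 1: dq/dt = 0, dp/dt = -1
```

Nothing about the right-hand sides needs to be derived by hand: they fall out of differentiating the scalar function \(H\), exactly as the variational principle suggests.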

Variational Formulation of Hamiltonian Systems

The variational structure of Hamiltonian systems provides a geometric foundation for understanding their dynamics. The Hamiltonian action \(S[q, p]\) is a functional on the space of paths in phase space, and the condition \(\delta S = 0\) selects the physical trajectories.

To derive Hamilton's equations from the variational principle, we consider variations \(\delta q\) and \(\delta p\) that vanish at the endpoints. The first variation of the action is:

\[\delta S = \int_{t_1}^{t_2} \left( \delta p \cdot \dot{q} + p \cdot \delta \dot{q} - \frac{\partial H}{\partial q} \cdot \delta q - \frac{\partial H}{\partial p} \cdot \delta p \right) dt.\]

Integrating by parts the term \(p \cdot \delta \dot{q}\) and using the boundary conditions \(\delta q(t_1) = \delta q(t_2) = 0\), we obtain:

\[\delta S = \int_{t_1}^{t_2} \left[ \left( \dot{q} - \frac{\partial H}{\partial p} \right) \cdot \delta p - \left( \dot{p} + \frac{\partial H}{\partial q} \right) \cdot \delta q \right] dt.\]

For this variation to vanish for all admissible \(\delta q\) and \(\delta p\), the integrand must be zero, yielding Hamilton's equations:

\[\dot{q} = \frac{\partial H}{\partial p}, \quad \dot{p} = -\frac{\partial H}{\partial q}.\]

This derivation shows that Hamilton's equations are not arbitrary evolution equations, but rather the necessary and sufficient conditions for a path to be a stationary point of the Hamiltonian action. The physical interpretation is that nature "chooses" the trajectory that makes the action functional stationary, making physics fundamentally an optimization problem.

The variational structure also explains the symplectic geometry of Hamiltonian systems. The symplectic form \(\omega = \sum_i dq^i \wedge dp_i\) emerges naturally from the variational principle, and the preservation of this structure under the Hamiltonian flow ensures that the optimization problem maintains its geometric properties throughout the evolution.

Constraint Projections as Optimization Problems

In numerical integration of constrained Hamiltonian systems, the enforcement of holonomic constraints \(\Phi(q) = 0\) is achieved through projection operations that can be understood as optimization problems. The RATTLE algorithm, used in SOFAx, employs two types of projections that each solve a specific optimization problem.

Position Projection

The position projection step in RATTLE projects an unconstrained configuration \(q^*\) onto the constraint manifold \(\Phi(q) = 0\) to obtain \(q^{n+1}\). This projection can be formulated as an optimization problem:

\[\min_{q} \|q - q^*\|_M^2 \quad \text{subject to} \quad \Phi(q) = 0,\]

where \(\| \cdot \|_M\) denotes the mass-weighted norm defined by \(\|v\|_M^2 = v^\top M v\) for a vector \(v\) and mass matrix \(M\). The mass-weighted metric ensures that the projection preserves the symplectic structure of the system.

For small constraint violations, this optimization problem is solved approximately using Newton-Raphson iteration. The constraint violation \(\Phi(q)\) is minimized iteratively, with corrections computed using the constraint Jacobian \(G(q) = \partial \Phi / \partial q\). The iterative correction takes the form:

\[q_{k+1} = q_k - G^+(q_k) \Phi(q_k),\]

where \(G^+\) denotes a pseudo-inverse of the constraint Jacobian; for the mass-weighted metric, \(G^+ = M^{-1} G^\top \left( G M^{-1} G^\top \right)^{-1}\). This iterative process converges to a configuration \(q^{n+1}\) that satisfies \(\Phi(q^{n+1}) = 0\) to within numerical tolerance, effectively solving the constrained optimization problem.
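The iteration can be sketched as follows; `project_position` and the unit-circle constraint are hypothetical illustrations, not SOFAx's actual interface, and the correction uses the mass-weighted Newton step:

```python
import jax
import jax.numpy as jnp

def project_position(q_star, phi, M_inv, tol=1e-6, max_iter=20):
    """Project q_star onto the manifold phi(q) = 0 via Newton iteration,
    using the mass-weighted correction q <- q - M^{-1} G^T (G M^{-1} G^T)^{-1} phi(q)."""
    q = q_star
    for _ in range(max_iter):
        c = phi(q)
        if jnp.linalg.norm(c) < tol:
            break
        G = jax.jacobian(phi)(q)          # constraint Jacobian, shape (m, n)
        S = G @ M_inv @ G.T               # Schur complement, shape (m, m)
        lam = jnp.linalg.solve(S, c)
        q = q - M_inv @ G.T @ lam
    return q

# Example: project the point (2, 0) onto the unit circle ||q||^2 = 1
phi = lambda q: jnp.array([q @ q - 1.0])
M_inv = jnp.eye(2)
q = project_position(jnp.array([2.0, 0.0]), phi, M_inv)
# q converges to approximately [1, 0]
```

Each Newton step solves a linearized version of the constrained least-norm problem, so the converged point is the closest constraint-satisfying configuration in the mass-weighted metric (to first order per step).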

Velocity Projection

The velocity projection step projects the momentum \(p^*\) onto the tangent space of the constraint manifold, ensuring that the velocity \(v = M^{-1} p\) satisfies the velocity constraint \(G(q) \cdot v = 0\). This projection is formulated as a quadratic programming (QP) problem:

\[\min_{v} \|v - v^*\|_M^2 \quad \text{subject to} \quad G(q) \cdot v = 0,\]

where \(v^* = M^{-1} p^*\) is the unconstrained velocity. The objective function \(\|v - v^*\|_M^2 = (v - v^*)^\top M (v - v^*)\) measures the distance from the unconstrained velocity in the mass-weighted metric, and the constraint ensures that the projected velocity lies in the tangent space of the constraint manifold.

This QP problem has a closed-form solution using the method of Lagrange multipliers. Introducing the Lagrange multiplier vector \(\lambda\), the Lagrangian is:

\[\mathcal{L}(v, \lambda) = \frac{1}{2} (v - v^*)^\top M (v - v^*) + \lambda^\top G(q) v.\]

The optimality conditions \(\partial \mathcal{L} / \partial v = 0\) and \(\partial \mathcal{L} / \partial \lambda = 0\) yield:

\[M(v - v^*) + G(q)^\top \lambda = 0, \quad G(q) v = 0.\]

Solving for \(v\) from the first equation and substituting into the second, we obtain the linear system:

\[G(q) M^{-1} G(q)^\top \lambda = G(q) v^*.\]

Once \(\lambda\) is determined, the projected velocity is:

\[v = v^* - M^{-1} G(q)^\top \lambda,\]

and the corresponding projected momentum is:

\[p = M v = p^* - G(q)^\top \lambda.\]

This projection ensures that the velocity constraint \(G(q) \cdot v = 0\) is satisfied exactly while minimizing the distance from the unconstrained velocity in the mass-weighted metric, preserving the symplectic structure of the system.
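The closed-form solution above translates directly into code. The following is an illustrative sketch (the function name and example constraint are assumptions, not SOFAx's implementation):

```python
import jax.numpy as jnp

def project_velocity(v_star, G, M_inv):
    """Mass-weighted projection of v_star onto the tangent space {v : G v = 0}."""
    S = G @ M_inv @ G.T                   # G M^{-1} G^T
    lam = jnp.linalg.solve(S, G @ v_star)  # multipliers from the linear system
    return v_star - M_inv @ G.T @ lam      # v = v* - M^{-1} G^T lambda

# Example: at q = (1, 0) on the unit circle, G(q) = q^T, so the
# velocity constraint G v = 0 says the normal component v_x must vanish.
G = jnp.array([[1.0, 0.0]])
M_inv = jnp.eye(2)
v = project_velocity(jnp.array([0.3, 2.0]), G, M_inv)
# v = [0.0, 2.0]: the normal component is removed, the tangential one is kept
```

The projected velocity differs from \(v^*\) only along \(M^{-1} G^\top\) directions, which is exactly the statement that the correction is minimal in the mass-weighted norm.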

The Quadratic Programming Connection

The velocity projection in RATTLE is a canonical example of a quadratic programming (QP) problem. QP problems have the general form:

\[\min_{x} \frac{1}{2} x^\top Q x + c^\top x \quad \text{subject to} \quad A x = b,\]

where \(Q\) is a positive semi-definite matrix, \(c\) is a vector, \(A\) is a constraint matrix, and \(b\) is a constraint vector.

For the velocity projection, we can rewrite the problem in standard QP form. Setting \(x = v\), \(Q = M\), \(c = -M v^*\), \(A = G(q)\), and \(b = 0\), the velocity projection becomes:

\[\min_{v} \frac{1}{2} v^\top M v - (M v^*)^\top v \quad \text{subject to} \quad G(q) v = 0.\]

The solution of this QP problem, as derived above, involves solving the linear system:

\[G(q) M^{-1} G(q)^\top \lambda = G(q) v^*.\]

In SOFAx, this linear system is solved numerically using JAX's linear algebra routines. For scalar constraints, the solution simplifies to:

\[\lambda = \frac{G(q) v^*}{G(q) M^{-1} G(q)^\top},\]

while for vector constraints, the system is solved using regularized matrix inversion for numerical stability (see sofax/integrators/rattle.py, lines 178-192).
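A minimal sketch of such a regularized solve is shown below. The Tikhonov-style diagonal shift is one common choice; the precise scheme used in sofax/integrators/rattle.py may differ:

```python
import jax.numpy as jnp

def solve_multipliers(G, M_inv, v_star, eps=1e-8):
    # Regularize G M^{-1} G^T with a small diagonal shift so the solve
    # stays well-posed even when constraints are (nearly) redundant.
    S = G @ M_inv @ G.T + eps * jnp.eye(G.shape[0])
    return jnp.linalg.solve(S, G @ v_star)

lam = solve_multipliers(jnp.array([[1.0, 0.0]]), jnp.eye(2), jnp.array([0.3, 2.0]))
# For this well-conditioned example, lam is close to [0.3]
```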

The QP formulation of velocity projection highlights the optimization-theoretic nature of constraint enforcement in symplectic integration. The projection is not merely a geometric operation, but a principled optimization that respects both the constraint geometry and the symplectic structure of the phase space.

Gradient Descent and Hamiltonian Dynamics

There is a deep conceptual connection between Hamiltonian dynamics and gradient descent, though with important distinctions. Both can be understood as optimization algorithms, but they operate in fundamentally different ways.

Gradient Descent as Optimization

Classical gradient descent minimizes a function \(f(x)\) by following the negative gradient:

\[\dot{x} = -\nabla f(x).\]

This dynamics drives the system toward local minima of \(f\), with the energy \(f(x)\) decreasing monotonically along trajectories.
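A minimal sketch of this flow (names here are illustrative): discretizing \(\dot{x} = -\nabla f(x)\) with explicit Euler steps gives the familiar gradient descent update.

```python
import jax
import jax.numpy as jnp

def gradient_descent(f, x0, lr=0.1, steps=200):
    # Explicit-Euler discretization of the gradient flow dx/dt = -grad f(x)
    grad_f = jax.grad(f)
    x = x0
    for _ in range(steps):
        x = x - lr * grad_f(x)
    return x

f = lambda x: jnp.sum(x**2)              # convex, with minimum at the origin
x_min = gradient_descent(f, jnp.array([3.0, -2.0]))
# x_min is numerically close to [0, 0]; f decreases monotonically along the way
```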

Hamiltonian Dynamics as "Gradient Descent" in Phase Space

Hamilton's equations can be written in a form that resembles gradient descent, but in the symplectic structure of phase space:

\[\begin{pmatrix} \dot{q} \\ \dot{p} \end{pmatrix} = J \begin{pmatrix} \frac{\partial H}{\partial q} \\ \frac{\partial H}{\partial p} \end{pmatrix} = J \nabla H,\]

where \(J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}\) is the symplectic matrix. The key difference from standard gradient descent is the presence of the symplectic matrix \(J\), which rotates the gradient by 90 degrees in phase space.

This rotation has profound consequences:

  • Energy Conservation: Unlike gradient descent, which decreases energy monotonically, Hamiltonian dynamics preserves energy: \(\frac{dH}{dt} = \nabla H^\top J \nabla H = 0\) due to the skew-symmetry of \(J\).

  • Symplectic Structure: The dynamics preserves the symplectic form \(\omega\), ensuring that areas in phase space are conserved (Liouville's theorem).

  • Reversibility: Hamiltonian dynamics is time-reversible, while gradient descent is not.
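The rotated-gradient structure and the vanishing of \(dH/dt\) can be checked directly in code; the harmonic-oscillator Hamiltonian below is an illustrative assumption:

```python
import jax
import jax.numpy as jnp

def H(z):
    # Harmonic-oscillator Hamiltonian on phase space z = (q, p)
    q, p = z
    return 0.5 * (q**2 + p**2)

# Symplectic matrix J for one degree of freedom
J = jnp.array([[0.0, 1.0],
               [-1.0, 0.0]])

z = jnp.array([1.0, 0.0])
g = jax.grad(H)(z)        # gradient of H in phase space
dz_dt = J @ g             # Hamiltonian vector field: the rotated gradient
dH_dt = g @ dz_dt         # = grad H . J grad H, zero by skew-symmetry of J
```

Because \(J\) is skew-symmetric, \(\nabla H^\top J \nabla H = 0\) holds at every point, not just for this Hamiltonian: the flow moves along level sets of \(H\) rather than down them.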

The Optimization Perspective

From an optimization perspective, Hamiltonian dynamics can be viewed as a "symplectic gradient flow" that explores level sets of the Hamiltonian rather than descending to minima. This property makes Hamiltonian dynamics suitable for:

  • Sampling: Hamiltonian Monte Carlo (HMC) uses Hamiltonian dynamics to explore probability distributions, leveraging the energy-conserving property to maintain detailed balance.

  • Long-term Integration: The symplectic structure preservation ensures numerical stability for long-time simulations, unlike gradient descent which would converge to a single point.

  • Constrained Optimization: The projection steps in RATTLE can be understood as solving optimization subproblems (as discussed above), while the Hamiltonian flow provides the unconstrained dynamics.

The connection between gradient descent and Hamiltonian dynamics reveals that both are optimization algorithms operating under different geometric structures. Gradient descent minimizes energy in Euclidean space, whereas Hamiltonian dynamics conserves energy as it explores the symplectic geometry of phase space. This distinction is crucial for deciding when to use each approach: gradient descent for finding minima, and Hamiltonian dynamics for sampling, long-term integration, and structure-preserving simulation.