• Call for Social Sciences Papers
  • Sign-up for PNAS eTOC Alerts

Motion microscopy for visualizing and quantifying small motions

  1. William T. Freemana,e,f,2
  1. aComputer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139;
  2. bDepartment of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139;
  3. cHarvard-MIT Program in Health Sciences and Technology, Cambridge, MA 02139;
  4. dResearch Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA 02139;
  5. eGoogle Research, Google Inc. Cambridge, MA 02139;
  6. fDepartment of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139;
  7. gSchool of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138;
  8. hDepartment of Mechanical Engineering, Johns Hopkins University, Baltimore, MD 21218;
  9. iHopkins Extreme Materials Institute, Johns Hopkins University, Baltimore, MD 21218
  1. Edited by William H. Press, University of Texas at Austin, Austin, TX, and approved August 22, 2017 (received for review March 5, 2017)

Significance

Humans have difficulty seeing small motions with amplitudes below a threshold. Although there are optical techniques to visualize small static physical features (e.g., microscopes), visualization of small dynamic motions is extremely difficult. Here, we introduce a visualization tool, the motion microscope, that makes it possible to see and understand important biological and physical modes of motion. The motion microscope amplifies motions in a captured video sequence by rerendering small motions to make them large enough to see and quantifies those motions for analysis. Amplification of these tiny motions involves careful noise analysis to avoid the amplification of spurious signals. In the representative examples presented in this study, the visualizations reveal important motions that are invisible to the naked eye.

Abstract

Although the human visual system is remarkable at perceiving and interpreting motions, it has limited sensitivity, and we cannot see motions that are smaller than some threshold. Although difficult to visualize, tiny motions below this threshold are important and can reveal physical mechanisms, or be precursors to large motions in the case of mechanical failure. Here, we present a “motion microscope,” a computational tool that quantifies tiny motions in videos and then visualizes them by producing a new video in which the motions are made large enough to see. Three scientific visualizations are shown, spanning macroscopic to nanoscopic length scales. They are the resonant vibrations of a bridge demonstrating simultaneous spatial and temporal modal analysis, micrometer vibrations of a metamaterial demonstrating wave propagation through an elastic matrix with embedded resonating units, and nanometer motions of an extracellular tissue found in the inner ear demonstrating a mechanism of frequency separation in hearing. In these instances, the motion microscope uncovers hidden dynamics over a variety of length scales, leading to the discovery of previously unknown phenomena.

Motion microscopy is a computational technique to visualize and analyze meaningful but small motions. The motion microscope enables the inspection of tiny motions as optical microscopy enables the inspection of tiny forms. We demonstrate its utility in three disparate problems from biology and engineering: visualizing motions used in mammalian hearing, showing vibration modes of structures, and verifying the effectiveness of designed metamaterials.

The motion microscope is based on video magnification (1??4), which processes videos to amplify small motions of any kind in a specified temporal frequency band. We extend the visualization produced by video magnification to scientific and engineering analysis. In addition to visualizing tiny motions, we quantify both the object’s subpixel motions and the errors introduced by camera sensor noise (5). Thus, the user can see the magnified motions and obtain their values, with variances, allowing for both qualitative and quantitative analyses.

The motion microscope characterizes and amplifies tiny local displacements in a video by using spatial local phase. It does this by transforming the captured intensities of each frame’s pixels into a wavelet-like representation where displacements are represented by phase shifts of windowed complex sine waves. The representation is the complex steerable pyramid (6), an overcomplete linear wavelet transform, similar to a spatially localized Fourier transform. The transformed image is a sum of basis functions, approximated by windowed sinusoids (Fig. S1), that are simultaneously localized in spatial location (<mml:math><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:math>x,y), scale <mml:math><mml:mi>r</mml:mi></mml:math>r, and orientation <mml:math><mml:mi>θ</mml:mi></mml:math>θ. Each basis function coefficient gives spatially local frequency information and has an amplitude <mml:math><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>Ar,θ(x,y) and a phase <mml:math><mml:mrow><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>?r,θ(x,y).

Fig. S1.

Increasing the phase of complex steerable pyramid coefficients results in approximate local motion of the basis functions. (A) A 1D slice of a complex steerable pyramid basis function. (B) The basis function is multiplied by several complex coefficients of constant amplitude and increasing phase to produce the real part of a new basis function that is approximately translating. Copyright (2016) Association for Computing Machinery, Inc. Reprinted with permission from ref. 37.

To amplify motions, we compute the unwrapped phase difference of each coefficient of the transformed image at time <mml:math><mml:mi>t</mml:mi></mml:math>t from its corresponding value in the first frame,<mml:math display="block"><mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>:=</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>?</mml:mo><mml:mrow><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math>Δ?r,θ(x,y,t):=?r,θ(x,y,t)??r,θ(x,y,0).[1]We isolate motions of interest and remove components due to noise by temporally and spatially filtering <mml:math><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math>Δ?r,θ. We amplify the filtered phase shifts by the desired motion magnification factor to obtain modified phases for each basis function at each time <mml:math><mml:mi>t</mml:mi></mml:math>t. We then transform back each frame’s steerable pyramid to produce the motion-magnified output video (Fig. S2) (3).

Fig. S2.

A 1D example illustrating how the local phase of complex steerable pyramid coefficients is used to amplify the motion of a subtly translating step edge. (A) Frames (two shown) from the video. (B) Sample basis functions of the complex steerable pyramid. (C) Coefficients (one shown per frame) of the frames in the complex steerable pyramid representation. The phases of the resulting complex coefficients are computed. (D) The phase differences between corresponding coefficients are amplified. Only a coefficient corresponding to a single location and scale is shown; this processing is done to all coefficients. (E) The new coefficients are used to shift the basis functions. (F) A reconstructed video is produced by inverse transforming the complex steerable pyramid representation. The motion of the step edge is magnified. Copyright (2016) Association for Computing Machinery, Inc. Reprinted with permission from ref. 37.

We estimate motions under the assumption that there is a single, small motion at each spatial location. In this case, each coefficient’s phase difference, <mml:math><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math>Δ?r,θ, is approximately equal to the dot product of the corresponding basis function’s orientation and the 2D motion (7) (Relation Between Local Phase Differences and Motions). The reliability of spatial local phase varies across scale and orientations, in direct proportion to the coefficient’s amplitude (e.g., coefficients for basis functions orthogonal to an edge are more reliable than those along it) (Fig. S3 and Low-Amplitude Coefficients Have Noisy Phase). We combine information about the motion from multiple orientations by solving a weighted least squares problem with weights equal to the amplitude squared. The result is a 2D motion field. This processing is accurate, and we provide comparisons to other algorithms and sensors (Fig. 1, Synthetic Validation, and Figs. S4 and S5).

Fig. S3.

Noise model of local phase. (A) A frame from a synthetic video with noise. (B) The real part of a single level of the complex steerable pyramid representation of A. (C) The imaginary part of the same level of the complex steerable pyramid representation of A. (D) A point cloud over noisy values of the real and imaginary values of the complex steerable pyramid representation at the red, green, and blue points in B. (E) The corresponding histogram of phases.

Fig. S4.

A comparison of our quantitative motion estimation vs. a laser vibrometer. Several videos of a cantilevered beam excited by a shaker were taken, with varying focal length, exposure times, and excitation magnitude. (A) A frame from one video. (B) Motions from the motion microscope at the red point are compared with the integrated velocities from a laser vibrometer. (C) B from 0.5 s to 1.5 s. (D) B from 11 s to 12 s. (E) The correlation between the two signals across the videos vs. the RMS motion size in pixels, measured by the laser vibrometer. (F) The correlation between the two signals across the videos vs. the RMS motion size in pixels measured by the laser vibrometer. (G) The correlation between the signals vs. focal length (exposure time, <mml:math><mml:mn>490</mml:mn></mml:math>490 <mml:math><mml:mi>μ</mml:mi></mml:math>μs; excitation magnitude, 15). (H) Correlation vs. exposure time (focal length, <mml:math><mml:mn>85</mml:mn></mml:math>85 mm; excitation magnitude, 15). Cropped frames from the corresponding videos are shown above. (I) Correlation vs. relative excitation magnitude (focal length, <mml:math><mml:mn>85</mml:mn></mml:math>85 mm; exposure time, <mml:math><mml:mn>490</mml:mn></mml:math>490 <mml:math><mml:mi>μ</mml:mi></mml:math>μs). Only motions at the red point in A were used in our analysis.

Fig. S5.

An evaluation of our motion estimation method and Ncorr (30) on a synthetic dataset of images. (A) Sample frames from real videos used to create the dataset. (B) Sample of synthetic motion fields of various motion size and spatial scale used to create the dataset. (C) The motion microscope and Ncorr are used to estimate the motion field, and the average relative error is displayed for both methods as a function of motion size and spatial scale. Both methods are only accurate for spatially smooth motion fields. Our method is twice as accurate for spatially smooth, subpixel motion fields. Ncorr is more accurate for larger motions.

For a still camera, the sensitivity of the motion microscope is mostly limited by local contrast and camera noise—fluctuations of pixel intensities present in all videos (5). When the video is motion-magnified, this noise can lead to spurious motions, especially at low-contrast edges and textures (Fig. S6). We measure motion noise level by computing the covariance matrix of each estimated motion vector. Estimating this directly from the input video is usually impossible, because it requires observing the motions without noise. We solve this by creating a simulated noisy video with zero motion, replicating a static frame of the input video and adding realistic, independent noise to each frame. We compute the sample covariance of the estimated motion vectors in this simulated video (Fig. S7 and Noise Model and Creating Synthetic Video). We show analytically, and via experiments in which the motions in a temporal band are known to be zero, that these covariance matrices are accurate for real videos (Analytic Justification of Noise Analysis and Figs. S8 and S9). We also analyze the limits of our technique by comparing to a laser vibrometer and show that, with a Phantom V-10 camera, at a high-contrast edge, the smallest motion we can detect is on the order of 1/100th of a pixel (Fig. 1 and Fig. S4).

Fig. S6.

Magnification of a spatial smooth and temporally filtered noise can look like a real signal. (A) Frames and time slices from a synthetic 300-frame video created by replicating a single frame 300 times and adding a different realistic noise pattern to each frame. (B) Corresponding frame and time slices from the synthetic video motion magnified 600<mml:math><mml:mo>×</mml:mo></mml:math>× in a temporal band of 40 Hz to 60 Hz. Time slices from the same parts of each video are shown on the right for comparison.

Fig. S7.

Using a probabilistic simulation to compute the noise covariance of the motion estimate. (A) A single frame from an input video, in this case, of an elastic metamaterial. (B) Simulated, but realistic noise (contrast enhanced 80<mml:math><mml:mo>×</mml:mo></mml:math>×). (C) Synthetic video with no motions, consisting of the input frame replicated plus simulated noise (noise contrast-enhanced 80<mml:math><mml:mo>×</mml:mo></mml:math>×). (D) Estimated motions of this video. (E) Sample variances and sample covariance of the vertical and horizontal components of the motion are computed to give an estimate of how much noise is in the motion estimate.

Fig. S8.

Synthetic experiments showing that our noise covariance estimation, which assumes that the motions are zero, is also accurate for small nonzero motions. (A) The motion between synthetic frames with noise and slightly translated versions (not shown) are computed over 4,000 runs at the marked point in red for several different translation amounts. Each time, different but independent noise is added to the frames. (B) The sample covariance vs. motion size. (C) Relative error of horizontal variance vs. motion size. (D) Relative error of vertical variance vs. motion size. C and D are on the same color scale.

Fig. S9.

Validation of our noise estimation on real data. (A) A frame from a real video of an accelerometer attached to a beam. (B) The accelerometer shows there are no motions in the frequency band 600 Hz to 700 Hz. (C) The variance of our motion estimate in the 600- to 700-Hz band serves as a ground-truth measure of noise, as there are no motions. (D) The estimated noise level vs. intensity for a signal-dependent noise model and a constant noise model. (E) The noise estimate produced by our Monte Carlo simulation with a signal-dependent model. All variances are of the motions projected onto the direction of least variance. Textureless regions, where the motion estimation is not meaningful, have been masked out in black. (F) Difference in decibels between ground truth and E. (G) Noise estimate produced by the Monte Carlo simulation with a constant noise model. (H) Difference in decibels between ground truth and G.

Relation Between Local Phase Differences and Motions

Fleet and Jepson have shown that contours of constant phase in image subbands such as those in the complex steerable pyramid approximately track the motion of objects in a video (7). We make a similar phase constancy assumption, in which the following equation relates the phase of the frame at time <mml:math><mml:mn>0</mml:mn></mml:math>0 to the phase of future frames:<mml:math display="block"><mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>?</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mi>y</mml:mi><mml:mo>?</mml:mo><mml:mrow><mml:mi>v</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>?r,θ(x,y,0)=?r,θ(x?u(x,y,t),y?v(x,y,t),t),[S1]where <mml:math><mml:mrow><mml:mrow><mml:mi>??</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>:=</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mi>v</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>??(x,y,t):=(u(x,y,t),v(x,y,t)) is the motion we seek to compute. We Taylor-expand the right-hand side around <mml:math><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math>(x,y) to get<mml:math display="block"><mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mfrac><mml:mrow><mml:mo>?</mml:mo><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>?</mml:mo><mml:mi>x</mml:mi></mml:mrow></mml:mfrac><mml:mo>,</mml:mo><mml:mfrac><mml:mrow><mml:mo>?</mml:mo><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>?</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:mfrac><mml:mo>)</mml:mo></mml:mrow><mml:mo>?</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mi>O</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msup><mml:mi>u</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mi>v</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>Δ?r,θ=(??r,θ?x,??r,θ?y)?(u,v)+O(u2,v2),[S2]where <mml:math><mml:mrow><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>:=</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>?</mml:mo><mml:mrow><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:math>Δ?r,θ(x,y,t):=?r,θ(x,y,t)??r,θ(x,y,0), arguments have been suppressed and <mml:math><mml:mrow><mml:mi>O</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msup><mml:mi>u</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mi>v</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>O(u2,v2) represents higher order terms in the Taylor expansion. Because we assume the motions are small, higher order terms are negligible and the local phase variations are approximately equal to only the linear term,<mml:math display="block"><mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mfrac><mml:mrow><mml:mo>?</mml:mo><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>?</mml:mo><mml:mi>x</mml:mi></mml:mrow></mml:mfrac><mml:mo>,</mml:mo><mml:mfrac><mml:mrow><mml:mo>?</mml:mo><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>?</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:mfrac><mml:mo>)</mml:mo></mml:mrow><mml:mo>?</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math>Δ?r,θ=(??r,θ?x,??r,θ?y)?(u,v).[S3]

Fleet has shown that the spatial gradients of the local phase, <mml:math><mml:mrow><mml:mo>(</mml:mo><mml:mfrac><mml:mrow><mml:mo>?</mml:mo><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>?</mml:mo><mml:mi>x</mml:mi></mml:mrow></mml:mfrac><mml:mo>,</mml:mo><mml:mfrac><mml:mrow><mml:mo>?</mml:mo><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>?</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:mfrac><mml:mo>)</mml:mo></mml:mrow></mml:math>(??r,θ?x,??r,θ?y), are roughly constant within a subband and that they are approximately equal to the peak tuning frequency of the corresponding subband’s filter (33). This frequency is a 2D vector oriented orthogonal to the direction the subband selects for, which means that the local phase changes only provide information about the motions perpendicular to this direction.

Noise Model and Creating Synthetic Video

We adopt a signal-dependent noise model, in which each pixel is contaminated with spatially independent Gaussian noise with variance <mml:math><mml:mrow><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>I</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>f(I), where <mml:math><mml:mi>I</mml:mi></mml:math>I is the pixel’s mean intensity (26, 34). Liu et al. (26) refer to this function <mml:math><mml:mi>f</mml:mi></mml:math>f as a “noise level function,” and we do the same. This reflects that sensor noise is well modeled by the sum of zero-mean Gaussian noise sources, some of which have variances that depend on intensity (5). We show that this noise model is an improvement over a constant variance noise model in Fig. S9.

The noise level function <mml:math><mml:mi>f</mml:mi></mml:math>f is estimated from temporal variations in the input video, with observed intensities <mml:math><mml:mrow><mml:mi>I</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>I(x,y,t). Assuming that <mml:math><mml:mi>I</mml:mi></mml:math>I is the sum of noiseless intensity <mml:math><mml:msub><mml:mi>I</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:math>I0 and a zero-mean Gaussian noise term <mml:math><mml:msub><mml:mi>I</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:math>In with variance <mml:math><mml:mrow><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>I</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>f(I0), the temporal variations are given by the following Taylor expansion:<mml:math display="block"><mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>I</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:msub><mml:mi>I</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:math>I(x,y,t)=I0(x,y,t)+In(x,y,t)[S17]<mml:math display="block"><mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>I</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>?</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mi>y</mml:mi><mml:mo>?</mml:mo><mml:mrow><mml:mi>v</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:msub><mml:mi>I</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:math>=I0(x?u(x,y,t),y?v(x,y,t),0)+In(x,y,t)[S18]<mml:math display="block"><mml:mrow><mml:mrow><mml:mo>≈</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>I</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>?</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mo>?</mml:mo><mml:msub><mml:mi>I</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:mrow><mml:mrow><mml:mo>?</mml:mo><mml:mi>x</mml:mi></mml:mrow></mml:mfrac><mml:mi>u</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>?</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mo>?</mml:mo><mml:msub><mml:mi>I</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:mrow><mml:mrow><mml:mo>?</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:mfrac><mml:mi>v</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:msub><mml:mi>I</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math>≈I0(x,y,0)??I0?xu(x,y,t)??I0?yv(x,y,t)+In(x,y,t).[S19]The second equality is the brightness constancy assumption of optical flow (35, 36). We exclude pixels where the spatial gradient <mml:math><mml:mrow><mml:mo>(</mml:mo><mml:mfrac><mml:mrow><mml:mo>?</mml:mo><mml:msub><mml:mi>I</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:mrow><mml:mrow><mml:mo>?</mml:mo><mml:mi>x</mml:mi></mml:mrow></mml:mfrac><mml:mo>,</mml:mo><mml:mfrac><mml:mrow><mml:mo>?</mml:mo><mml:msub><mml:mi>I</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:mrow><mml:mrow><mml:mo>?</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:mfrac><mml:mo>)</mml:mo></mml:mrow></mml:math>(?I0?x,?I0?y) has high magnitude from our analysis. At the remaining pixel, temporal variations in <mml:math><mml:mi>I</mml:mi></mml:math>I are mostly due to noise,<mml:math display="block"><mml:mrow><mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>≈</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>I</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:msub><mml:mi>I</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math>I(x,y,t)≈I0(x,y,0)+In(x,y,t).[S20]At these pixels, we take the temporal variance and mean of <mml:math><mml:mi>I</mml:mi></mml:math>I, which, in expectation, are <mml:math><mml:mrow><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>I</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>f(I0) and <mml:math><mml:msub><mml:mi>I</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:math>I0, respectively. To increase robustness, we divide the intensity range into 64 equally sized bins. For each bin, we take all those pixels with mean inside that bin and take the mean of the corresponding temporal variances of <mml:math><mml:mi>I</mml:mi></mml:math>I to estimate the noise level function <mml:math><mml:mi>f</mml:mi></mml:math>f.

With <mml:math><mml:mi>f</mml:mi></mml:math>f in hand, we can take frames from existing videos and use them to create simulated videos with realistic noise, but with known, zero motion. In Fig. S6A, we take a frame <mml:math><mml:mrow><mml:msub><mml:mi>I</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>I0(x,y,0) from a video of the metamaterial, filmed with a Phantom V-10, and add noise to it via the equation<mml:math display="block"><mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>I</mml:mi><mml:mi>S</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>I</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:msub><mml:mi>I</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:msqrt><mml:mrow><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>I</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msqrt></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>IS(x,y,t)=I0(x,y,0)+In(x,y,t)f(I0(x,y,0)),[S21]where <mml:math><mml:msub><mml:mi>I</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:math>In now is Gaussian noise with unit variance. We motion-magnify the resulting video 600 times in a 20-Hz band centered at 50 Hz to show that motion-magnified noise can cause spurious motions (Fig. S6B).

We use the same simulation to create synthetic videos with which to estimate the covariance matrix of the motion vectors.

We quantify the noise in the motion vectors by estimating their covariance matrices <mml:math><mml:mrow><mml:msub><mml:mi mathvariant="normal">Σ</mml:mi><mml:mi>??</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>Σ??(x,y). These matrices reflect variations in the motion caused by noise. It is not usually possible to directly estimate them from the input video, because both motions and noise vary across frames and the true motions are unknown. Therefore, we create a noisy, synthetic video <mml:math><mml:mrow><mml:msub><mml:mi>I</mml:mi><mml:mi>S</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>IS(x,y,t) with known zero true motion (Eq. S21 and Fig. S7 AC).

We estimate the motions in <mml:math><mml:msub><mml:mi>I</mml:mi><mml:mi>S</mml:mi></mml:msub></mml:math>IS (Fig. S7D) using our technique with spatial smoothing, but without temporal filtering, which we handle in a later step. This results in a set of 2D motion vectors <mml:math><mml:mrow><mml:msub><mml:mi>??</mml:mi><mml:mi>S</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>??S(x,y,t), in which all temporal variations in <mml:math><mml:msub><mml:mi>??</mml:mi><mml:mi>S</mml:mi></mml:msub></mml:math>??S are due to noise. The sample covariance matrix over the time dimension is<mml:math display="block"><mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="normal">Σ</mml:mi><mml:mi>??</mml:mi></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mi>N</mml:mi><mml:mo>?</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mfrac><mml:mrow><mml:munder><mml:mo largeop="true" movablelimits="false" symmetric="true">∑</mml:mo><mml:mi>t</mml:mi></mml:munder><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>??</mml:mi><mml:mi>S</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>?</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mover accent="true"><mml:mi>??</mml:mi><mml:mo stretchy="false">ˉ</mml:mo></mml:mover></mml:mrow><mml:mi>S</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>??</mml:mi><mml:mi>S</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>?</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mover accent="true"><mml:mi>??</mml:mi><mml:mo stretchy="false">ˉ</mml:mo></mml:mover></mml:mrow><mml:mi>S</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mi>T</mml:mi></mml:msup></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>Σ??=1N?1∑t(??S(x,y,t)???ˉS(x,y))(??S(x,y,t)???ˉS(x,y))T,[S22]where <mml:math><mml:mrow><mml:msub><mml:mrow><mml:mover accent="true"><mml:mi>??</mml:mi><mml:mo stretchy="false">ˉ</mml:mo></mml:mover></mml:mrow><mml:mi>S</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>??ˉS(x,y) is the mean over <mml:math><mml:mi>t</mml:mi></mml:math>t of the motion vectors. <mml:math><mml:msub><mml:mi mathvariant="normal">Σ</mml:mi><mml:mi>??</mml:mi></mml:msub></mml:math>Σ?? is a <mml:math><mml:mrow><mml:mn>2</mml:mn><mml:mo>×</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:math>2×2 symmetric matrix, defined at every pixel, with only three unique components. In Fig. S7E, we show these components, the variances of the horizontal and vertical components of the motion, and their covariance.

The motion <mml:math><mml:mi>??</mml:mi></mml:math>?? projected onto a direction vector <mml:math><mml:mrow><mml:msub><mml:mi>??</mml:mi><mml:mi>θ</mml:mi></mml:msub><mml:mo>:=</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>cos</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>θ</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mi>sin</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>θ</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>??θ:=(cos(θ),sin(θ)) is <mml:math><mml:mrow><mml:mi>??</mml:mi><mml:mo>?</mml:mo><mml:msub><mml:mi>??</mml:mi><mml:mi>θ</mml:mi></mml:msub></mml:mrow></mml:math>?????θ and has variance <mml:math><mml:mrow><mml:mrow><mml:msubsup><mml:mi>σ</mml:mi><mml:mi>V</mml:mi><mml:mn>2</mml:mn></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>θ</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:msubsup><mml:mi>??</mml:mi><mml:mi>θ</mml:mi><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mi mathvariant="normal">Σ</mml:mi><mml:mi>??</mml:mi></mml:msub><mml:msub><mml:mi>??</mml:mi><mml:mi>θ</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:math>σV2(θ)=??θTΣ????θ. Of particular interest is the direction <mml:math><mml:mi>θ</mml:mi></mml:math>θ of least variance that minimizes <mml:math><mml:mrow><mml:msubsup><mml:mi>σ</mml:mi><mml:mi>V</mml:mi><mml:mn>2</mml:mn></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>θ</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>σV2(θ). In the case of an edge in the image, the direction of least variance is usually normal to the edge.

Analytic Justification of Noise Analysis

We analyze only the case when the amplitudes at a pixel in all subbands are large (<mml:math><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:mo>></mml:mo><mml:mo>></mml:mo><mml:mrow><mml:msup><mml:mi>σ</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mrow><mml:mo>∑</mml:mo><mml:msup><mml:mi>g</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow></mml:mrow></mml:math>Ar,θ>>σ2∑g2), because the local phases have a Gaussian distribution in this case. Such points intuitively correspond to places where there is image content in at least two directions. In this case, we show that the sample covariance matrix computed using a simulated video with no motions is accurate for videos with subpixel small motions.

We reproduce the linearization of phase constancy equation (Eq. S3) with noise terms added to the phase variations (<mml:math><mml:msub><mml:mi>n</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:math>nt) and phase gradient (<mml:math><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi>x</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>n</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow></mml:math>nx,ny),<mml:math display="block"><mml:mrow><mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>+</mml:mo><mml:msub><mml:mi>n</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>?</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mo>?</mml:mo><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>?</mml:mo><mml:mi>x</mml:mi></mml:mrow></mml:mfrac><mml:mo>+</mml:mo><mml:msub><mml:mi>n</mml:mi><mml:mi>x</mml:mi></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mo>?</mml:mo><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>?</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:mfrac><mml:mo>+</mml:mo><mml:msub><mml:mi>n</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math>Δ?r,θ+nt=(u,v)?(??r,θ?x+nx,??r,θ?y+ny).[S23]The total noise term in this equation is <mml:math><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:msub><mml:mi>n</mml:mi><mml:mi>x</mml:mi></mml:msub></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mi>v</mml:mi><mml:msub><mml:mi>n</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:math>nt+unx+vny. The noise terms <mml:math><mml:msub><mml:mi>n</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:math>nt, <mml:math><mml:msub><mml:mi>n</mml:mi><mml:mi>x</mml:mi></mml:msub></mml:math>nx and <mml:math><mml:msub><mml:mi>n</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:math>ny are of the same order of magnitude. Since <mml:math><mml:mi>u</mml:mi></mml:math>u and <mml:math><mml:mi>v</mml:mi></mml:math>v are much less than <mml:math><mml:mn>1</mml:mn></mml:math>1 pixel, the predominant source of noise is from <mml:math><mml:msub><mml:mi>n</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:math>nt, the effects of <mml:math><mml:msub><mml:mi>n</mml:mi><mml:mi>x</mml:mi></mml:msub></mml:math>nx and <mml:math><mml:msub><mml:mi>n</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:math>ny are negligible, and we can ignore them, allowing us to write the noisy version of the equation as<mml:math display="block"><mml:mrow><mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>+</mml:mo><mml:msub><mml:mi>n</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>?</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mfrac><mml:mrow><mml:mo>?</mml:mo><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>?</mml:mo><mml:mi>x</mml:mi></mml:mrow></mml:mfrac><mml:mo>,</mml:mo><mml:mfrac><mml:mrow><mml:mo>?</mml:mo><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>?</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:mfrac><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math>Δ?r,θ+nt=(u,v)?(??r,θ?x,??r,θ?y).[S24]

The motion estimate <mml:math><mml:mi>??</mml:mi></mml:math>?? is the solution to a weighted least squares problem, <mml:math><mml:mrow><mml:mi>??</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mi>??</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mi>????</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mo>?</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:msup><mml:mi>??</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mi>????</mml:mi></mml:mrow></mml:mrow></mml:math>??=(??T????)?1??T????. To simplify notation, let <mml:math><mml:mrow><mml:mi>??</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mi>??</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mi>????</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mo>?</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:msup><mml:mi>??</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mi>??</mml:mi></mml:mrow></mml:mrow></mml:math>??=(??T????)?1??T??, the parts of the equation that don’t depend on time. Then, the flow estimate is<mml:math display="block"><mml:mrow><mml:mrow><mml:mi>??</mml:mi><mml:mo>=</mml:mo><mml:mi>????</mml:mi></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math>??=????.[S25]where the elements of <mml:math><mml:mi>??</mml:mi></mml:math>?? are the local phase variations over time. <mml:math><mml:mi>??</mml:mi></mml:math>?? and <mml:math><mml:mi>??</mml:mi></mml:math>?? contain the spatial gradients of phase and amplitude, respectively. We have demonstrated that <mml:math><mml:mi>??</mml:mi></mml:math>?? is close to noiseless (Eq. S24), and our assumption about the amplitudes being large means <mml:math><mml:mi>??</mml:mi></mml:math>?? is also approximately noiseless, which means that <mml:math><mml:mi>??</mml:mi></mml:math>?? is noiseless.

We split <mml:math><mml:mi>??</mml:mi></mml:math>?? into the sum of its mean <mml:math><mml:msub><mml:mi>??</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:math>??0 and variance, a multivariate Gaussian random variable, denoted as <mml:math><mml:msub><mml:mi>??</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:math>??n, that has zero mean and variance that depends only on image noise and local image content. Then, the flow estimate is<mml:math display="block"><mml:mrow><mml:mrow><mml:mi>??</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:munder><mml:munder accentunder="true"><mml:msub><mml:mi>????</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mo movablelimits="false">?</mml:mo></mml:munder><mml:mtext>True flow</mml:mtext></mml:munder><mml:mo>+</mml:mo><mml:munder><mml:munder accentunder="true"><mml:msub><mml:mi>????</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo movablelimits="false">?</mml:mo></mml:munder><mml:mrow><mml:mtext>Noise Term</mml:mtext><mml:mo>?</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mtext>Covariance Matrix</mml:mtext><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:munder></mml:mrow></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math>??=????0?True flow+????n?Noise Term?(Covariance Matrix).[S26]The noise term doesn’t depend on the value of the true flow <mml:math><mml:msub><mml:mi>????</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:math>????0. Therefore, the estimated covariance matrix is valid even when the motions are nonzero but small.

From Eq. S15, we know that <mml:math><mml:msub><mml:mi>??</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:math>??n is proportional to the amount of sensor noise. Therefore, an interesting consequence of Eq. S26 is that the motion covariance matrix will be linearly proportional to the variance of the sensor noise.

Results and Discussion

We applied the motion microscope to several problems in biology and engineering. First, we used it to reveal one component of the mechanics of hearing. The mammalian cochlea is a remarkable sensor that can perform high-quality spectral analysis to discriminate as many as 30 frequencies in the interval of a semitone (8). These extraordinary properties of the hearing organ depend on traveling waves of motion that propagate along the cochlear spiral. These wave motions are coupled to the extremely sensitive sensory receptor cells via the tectorial membrane, a gelatinous structure that is 97% water (9).

To better understand the functional role of the tectorial membrane in hearing, we excised segments of the tectorial membrane from a mouse cochlea and stimulated it with audio frequency vibrations (Movie S1 and Fig. 2A). Prior work suggested that motions of the tectorial membrane would rapidly decay with distance from the point of stimulation (10). The unprocessed video of the tectorial membrane appeared static, making it difficult to verify this. However, when the motions were amplified 20 times, waves that persisted over hundreds of micrometers were revealed (Movie S1 and Fig. 2 BE).

Fig. 2.

Exploring the mechanical properties of a mammalian tectorial membrane with the motion microscope. (A) The experimental setup used to stroboscopically film a stimulated mammalian tectorial membrane (TectaY1870C/+). Subfigure Copyright (2007) National Academy of Sciences of the United States of America. Reproduced from ref. 12. (B) Two of the eight captured frames . (Movie S1, data previously published in ref. 13). (C) Corresponding frames from the motion-magnified video in which displacement from the mean was magnified 20<mml:math><mml:mo>×</mml:mo></mml:math>×. The orange and purple lines on top of the tectorial membrane in B are warped according to magnified motion vectors to produce the orange and purple lines in C. (D) The vertical displacement along the orange and purple lines in B is shown for three frames. (E) The power spectrum of the motion signal and noise power is shown in the direction of least variance at the magenta and green points in B.

Subpixel motion analysis suggests that these waves play a prominent role in determining the sensitivity and frequency selectivity of hearing (11??14). Magnifying motions has provided new insights into the underlying physical mechanisms of hearing. Ultimately, the motion microscope could be applied to see and interpret the nanoscale motions of a multitude of biological systems.

We also applied the motion microscope to the field of modal analysis, in which a structure’s resonant frequencies and mode shapes are measured to characterize its dynamic behavior (15). Common applications are to validate finite element models and to detect changes or damage in structures (16). Typically, this is done by measuring vibrations at many different locations on the structure in response to a known input excitation. However, approximate measurements can be made under operational conditions assuming broadband excitation (17). Contact accelerometers have been traditionally used for modal analysis, but densely instrumenting a structure can be difficult and tedious, and, for light structures, the accelerometers’ mass can affect the measurement.

The motion microscope offers many advantages over traditional sensors. The structure is unaltered by the measurement, the measurements are spatially dense, and the motion-magnified video allows for easy interpretation of the motions. While only structural motions in the image plane are visible, this can be mitigated by choosing the viewpoint carefully.

We applied the motion microscope to modal analysis by filming the left span of a suspension bridge from 80 m away (Fig. 3A). The central span was lowered and impacted the left span. Despite this, the left span looks completely still in the input video (Fig. 3B). Two of its modal shapes are revealed in Movie S2 when magnified 400<mml:math><mml:mo>×</mml:mo></mml:math>× (1.6 Hz to 1.8 Hz) and 250<mml:math><mml:mo>×</mml:mo></mml:math>× (2.4 Hz to 2.7 Hz). In Fig. 3 C and D, we show time slices from the motion-magnified videos, displacements versus time at three points, and the estimated noise standard deviations. We also used accelerometers to measure the motions of the bridge at two of those points (Fig. 3B). The motion microscope matches the accelerometers within error bars. In a second example, we show the modal shapes of a pipe after it is struck with a hammer (Modal Shapes of a Pipe, Fig. S10, and Movie S3).

Fig. 3.

The motion microscope reveals modal shapes of a lift bridge. (A) The outer spans of the bridge are fixed while the central span moves vertically. (B) The left span was filmed while the central span was lowered. A frame from the resulting video and a time slice at the red line are shown. (C) Displacement and noise SD from the motion microscope are shown for motions in a 1.6- to 1.8-Hz band at the cyan, green, and orange points in B. Doubly integrated data from accelerometers at the cyan and green points are also shown. A time slice from the motion-magnified video is shown (Movie S2). The time at which the central span is fully lowered is marked as “impact.” (D) Same as C, but for motions in a 2.4- to 2.7-Hz band.

Fig. S10.

The motion microscope is applied to a pipe being struck by a hammer. (A) A frame from the input video, recorded at 24,096 FPS. (BF) (Top) A frame is shown from five motion-magnified videos showing motions amplified in the specified frequency bands (Movie S3). (Middle) Modal shapes recovered from a quantitative analysis of the motions are shown in blue. The theoretically derived modal shapes, shown in dashed orange, are overlaid for comparison over a perfect circle in dotted black. (Bottom) Displacement vs. time and the estimated noise SD is shown at the green point marked in A.

In our final example, we used the motion microscope to verify the functioning of elastic metamaterials, artificially structured materials designed to manipulate and control the propagation of elastic waves. They have received much attention (18) because of both their rich physics and their potential applications, which include wave guiding (19), cloaking (20), acoustic imaging (21), and noise reduction (22). Several efforts have been made to experimentally characterize the elastic wave phenomena observed in these systems. However, as the small amplitude of the propagating waves makes it impossible to directly visualize them, the majority of the experimental investigations have focused on capturing the band gaps through the use of accelerometers, which only provide point measurements. Visualizing the mechanical motions everywhere in the metamaterials has only been possible using expensive and highly specialized setups like scanning laser vibrometers (23).

We focus on a metamaterial comprising an elastic matrix with embedded resonating units, which consists of copper cores connected to four elastic beams (24). Even when vibrated, this metamaterial appears stationary, making it difficult to determine if the metamaterial is functioning correctly (Movies S4 and S5). Previously, these miniscule vibrations were measured with two accelerometers (24). This method only provides point measurements, making it difficult to verify the successful attenuation of vibrations. We gain insight and understanding of the system by visually amplifying its motion.

The elastic metamaterial was forced at two frequencies, 50 Hz and 100 Hz, and, in each case, it was filmed at 500 frames per second (FPS) (Fig. 4A). The motions in 20-Hz bands around the forcing frequencies were amplified, revealing that the metamaterial functions as expected (24), passing 50-Hz waves and rapidly attenuating 100-Hz waves (Movies S4 and S5). We also compared our results with predictions from a finite element analysis simulation (Fig. 4 B and C). In Fig. 4D, we show heatmaps of the estimated displacement amplitudes overlaid on the motion-magnified frames. We interpolated displacements into textureless regions, which had noisy motion estimates. The agreement between the simulation (Fig. 4C) and the motion microscope (Fig. 4D) demonstrates the motion microscope’s usefulness in verifying the correct function of the metamaterial.

Fig. 4.

The motion microscope is used to investigate properties of a designed metamaterial. (A) The metamaterial is forced at 50 Hz and 100 Hz in two experiments, and a frame from the 50-Hz video is shown. (B) One-dimensional slices of the displacement amplitude along the red line in A are shown for both a finite element analysis simulation and the motion microscope. (C) A finite element analysis simulation of the displacement of the metamaterial. Color corresponds to displacement amplitude, and the material is warped according to magnified simulated displacement vectors. (D) Results from the motion microscope are shown. Displacement magnitudes are shown in color at every point on the metamaterial, overlayed on frames from the motion-magnified videos (Movies S4 and S5).

Modal Shapes of a Pipe

We made a measurement of a pipe being struck by a hammer, viewed end on by a camera, to capture its radial–circumferential vibration modes. A standard <mml:math><mml:msup><mml:mn>4</mml:mn><mml:mi>″</mml:mi></mml:msup></mml:math>4″ schedule 40 PVC pipe was recorded with a high-speed camera at 24,000 FPS, at a resolution of 192 <mml:math><mml:mo>×</mml:mo></mml:math>× 192 (Fig. S10A). Fig. S10 BF shows frames from the motion-magnified videos for different resonant frequencies showing the mode shapes, a comparison of the quantitatively measured mode shapes with the theoretically derived mode shapes, and the displacement vs. time of the specific frequency band and the estimated noise SD. The tiny modal motions are seen clearly. Obtaining vibration data with traditional sensors with the same spatial density would be extremely difficult, and accelerometers placed on the pipe would alter its resonant frequencies.

This sequence also demonstrates the accuracy of our noise analysis. The noise standard deviations show that the detected motions before impact, when the pipe is stationary, are likely spurious.

Synthetic Validation

We validate the accuracy of our motion estimation on a synthetic dataset and compare its accuracy to Ncorr, a digital image correlation technique (30) used by mechanical engineers (31). In this experiment, we did not use temporal filtering.

We created a synthetic dataset of frame pairs with known ground-truth motions between them. We took natural images from the frames of real videos (Fig. S5A) and warped them according to known motion fields using cubic b-spline interpolation (32). Sample motions fields, shown in Fig. S5B, were produced by Gaussian-blurring IID Gaussian random variables. We used Gaussian blurs with SDs, ranging from zero (no filtering) to infinite (a constant motion field). We also varied the RMS amplitude of the motion fields from 0.001 pixels to three pixels. For each set of motion field parameters, we sampled five different motion fields to produce a total of 155 motion fields with different amplitudes and spatial coherence. To test the accuracy of the algorithms rather than their sensitivity to noise, no noise was added to the image pairs.

We ran our motion estimation technique and Ncorr on each image pair. We then computed the mean absolute difference between the estimated and ground-truth motion fields. Then, for each set of motion field parameters, we averaged the mean absolute differences across image pairs and divided the result by the RMS motion amplitude to make the errors comparable over motion sizes. The result is the average relative error as a percentage of RMS motion amplitude (Fig. S5C).

Both Ncorr and our method perform best when the motions are spatially coherent (filter standard deviations greater than 10 pixels) with relative errors under 10%. This reflects the fact that both methods assume the motion field is spatially smooth. Across motion sizes, our method performs best for subpixel motions (5% relative error). This is probably because we assume that the motions are small when we linearize the phase constancy equation (Eq. S3). Ncorr has twice the relative error (10%) for the same motion fields.

The relative errors reported in Fig. S5C are computed over all pixels, including those that are in smooth, textureless regions where it is difficult to estimate the motions. If we restrict the error metric to only take into account pixels at edges and corners, the average relative errors for small (<mml:math><mml:mrow><mml:mo><</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math><1 pixel RMS), spatially coherent (filter SD <mml:math><mml:mrow><mml:mo>></mml:mo><mml:mn>10</mml:mn></mml:mrow></mml:math>>10 pixels) motions drops by a factor of 2.5 for both methods.

We generated synthetic images that are slight translations of each other and added Gaussian noise to the frames (Fig. S8A). For each translation amount, we compute the motion between the two frames over 4,000 runs. We compute the sample covariance matrix over the runs as a measure of the ground-truth noise level. We also used our noise analysis to estimate the covariance matrix at the points denoted in red.

The off-diagonal term of the covariance matrix should be zero for the synthetic frames in Fig. S8A. For both examples, it is within <mml:math><mml:msup><mml:mn>10</mml:mn><mml:mrow><mml:mo>?</mml:mo><mml:mn>5</mml:mn></mml:mrow></mml:msup></mml:math>10?5 square pixels of zero for all translation amounts (Fig. S8B).

The relative errors of the horizontal and vertical variances vs. translation (Fig. S8 C and D) are less than <mml:math><mml:mrow><mml:mn>5</mml:mn><mml:mo>%</mml:mo></mml:mrow></mml:math>5% for subpixel motions. This is likely due to the random nature of the simulation. For motions greater than one pixel, the covariance matrix has a relative error of less than <mml:math><mml:mrow><mml:mn>25</mml:mn><mml:mo>%</mml:mo></mml:mrow></mml:math>25%.

Low-Amplitude Coefficients Have Noisy Phase

Each frame of the input video <mml:math><mml:mrow><mml:mi>I</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>I(x,y,t) is transformed to the complex steerable pyramid representation by being spatially band-passed by a bank of quadrature pairs of filters <mml:math><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:math>gr,θ and <mml:math><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:math>hr,θ, where <mml:math><mml:mi>r</mml:mi></mml:math>r corresponds to different spatial scales of the pyramid and <mml:math><mml:mi>θ</mml:mi></mml:math>θ corresponds to different orientations. We use the filters of Portilla and Simoncelli, which are specified and applied in the frequency domain (25). For one such filter pair, the result is a set of complex coefficients <mml:math><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:math>Sr,θ+iTr,θ whose real and imaginary part are given by<mml:math display="block"><mml:mrow><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:mo>?</mml:mo><mml:mi>I</mml:mi></mml:mrow><mml:mo>?</mml:mo><mml:mtext>and</mml:mtext><mml:mo>?</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:mo>?</mml:mo><mml:mi>I</mml:mi></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>Sr,θ=gr,θ?I?and?Tr,θ=hr,θ?I,[S4]where the convolution is applied spatially at each time instant <mml:math><mml:mi>t</mml:mi></mml:math>t. This filter pair is converted to amplitude <mml:math><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:math>Ar,θ and phase <mml:math><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:math>?r,θ by the operations<mml:math display="block"><mml:mrow><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:msqrt><mml:mrow><mml:msubsup><mml:mi>S</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow><mml:mn>2</mml:mn></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>T</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:msqrt><mml:mo>?</mml:mo><mml:mtext>and</mml:mtext><mml:mo>?</mml:mo><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:msup><mml:mi>tan</mml:mi><mml:mrow><mml:mo>?</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:mo>/</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math>Ar,θ=Sr,θ2+Tr,θ2?and??r,θ=tan?1(Tr,θ/Sr,θ).[S5]

Filters <mml:math><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:math>gr,θ and <mml:math><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:math>hr,θ are in quadrature relationship, which means that they select for the same frequencies, but are 90 degrees out of phase like <mml:math><mml:mi>sin</mml:mi></mml:math>sin and <mml:math><mml:mi>cos</mml:mi></mml:math>cos. A consequence is that they are uncorrelated and have equal RMS values. Complex coefficients at antipodal orientations are conjugate symmetric and contain redundant information. Therefore, we only use a half-circle of orientations.

Suppose the observed video <mml:math><mml:mrow><mml:mi>I</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>I(x,y,t) is contaminated with IID noise <mml:math><mml:mrow><mml:msub><mml:mi>I</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>In(x,y,t) of variance <mml:math><mml:msup><mml:mi>σ</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:math>σ2,<mml:math display="block"><mml:mrow><mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>I</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:msub><mml:mi>I</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>I(x,y,t)=I0(x,y,t)+In(x,y,t),[S6]where <mml:math><mml:mrow><mml:msub><mml:mi>I</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>I0(x,y,t) is the underlying noiseless video. This noise causes the complex steerable pyramid coefficients to be noisy, which causes the local phase to be noisy. We show that the local phase at a point has an approximate Gaussian distribution when the amplitude is high and is approximately uniformly distributed when the amplitude is low.

The transformed representation has response<mml:math display="block"><mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:mo>?</mml:mo><mml:msub><mml:mi>I</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:mo>?</mml:mo><mml:msub><mml:mi>I</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow><mml:mo>?</mml:mo><mml:mtext>and</mml:mtext><mml:mo>?</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>?</mml:mo><mml:msub><mml:mi>I</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:mo>?</mml:mo><mml:msub><mml:mi>I</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math>gr,θ?I0+gr,θ?In?and?hr,θ?I0+hr,θ?In.[S7]The first term in each expression is the noiseless filter response, which we denote <mml:math><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:mo>?</mml:mo><mml:msub><mml:mi>I</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:mrow></mml:mrow></mml:math>S0,r,θ=gr,θ?I0 for the real part and <mml:math><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:mo>?</mml:mo><mml:msub><mml:mi>I</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:mrow></mml:mrow></mml:math>T0,r,θ=hr,θ?I0 for the imaginary part. The second term in each expression is filtered noise, which we denote as <mml:math><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>,</mml:mo><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:math>Sn,r,θ and <mml:math><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>,</mml:mo><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:math>Tn,r,θ. At a single point, <mml:math><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>,</mml:mo><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:math>Sn,r,θ and <mml:math><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>,</mml:mo><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:math>Tn,r,θ are Gaussian random variables with covariance matrix equal to<mml:math display="block"><mml:mrow><mml:mrow><mml:mrow><mml:msup><mml:mi>σ</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mtable displaystyle="true"><mml:mtr><mml:mtd columnalign="center"><mml:mrow><mml:msub><mml:mo largeop="true" movablelimits="false" symmetric="true">∑</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow></mml:mtd><mml:mtd columnalign="center"><mml:mrow><mml:msub><mml:mo largeop="true" movablelimits="false" symmetric="true">∑</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd columnalign="center"><mml:mrow><mml:msub><mml:mo largeop="true" movablelimits="false" symmetric="true">∑</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mtd><mml:mtd columnalign="center"><mml:mrow><mml:msub><mml:mo largeop="true" movablelimits="false" symmetric="true">∑</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:msup><mml:mi>σ</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mrow><mml:munder><mml:mo largeop="true" movablelimits="false" symmetric="true">∑</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mi>I</mml:mi></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>σ2(∑x,ygr,θ(x,y)2∑x,ygr,θ(x,y)hr,θ(x,y)∑x,ygr,θ(x,y)hr,θ(x,y)∑x,yhr,θ(x,y)2)=σ2∑x,ygr,θ(x,y)2I,[S8]where <mml:math><mml:mi>I</mml:mi></mml:math>I is the identity matrix and equality follows from the fact that <mml:math><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:math>gr,θ and <mml:math><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub></mml:math>hr,θ are quadrature pairs.

We suppress the indices <mml:math><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:math>r,θ in this section for readability. From Eq. S5, the noiseless and noisy phases are given by<mml:math display="block"><mml:mrow><mml:mrow><mml:msub><mml:mi>?</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:msup><mml:mi>tan</mml:mi><mml:mrow><mml:mo>?</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mo>/</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mtext>and</mml:mtext><mml:mo>?</mml:mo><mml:mi>?</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:msup><mml:mi>tan</mml:mi><mml:mrow><mml:mo>?</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>/</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math>?0=tan?1(T0/S0)and??=tan?1((T0+Tn)/(S0+Sn)).[S9]Their difference linearized around <mml:math><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math>(S0,T0) is<mml:math display="block"><mml:mrow><mml:mrow><mml:mrow><mml:mrow><mml:mrow><mml:msup><mml:mi>tan</mml:mi><mml:mrow><mml:mo>?</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:mfrac><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>?</mml:mo><mml:mrow><mml:mrow><mml:msup><mml:mi>tan</mml:mi><mml:mrow><mml:mo>?</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mfrac><mml:msub><mml:mi>T</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:msub><mml:mi>S</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:mfrac><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:msub><mml:mi>S</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:mrow><mml:mo>?</mml:mo><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:msub><mml:mi>T</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:mrow></mml:mrow><mml:msubsup><mml:mi>A</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mfrac><mml:mo>+</mml:mo><mml:mrow><mml:mi>O</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mfrac><mml:mrow><mml:msubsup><mml:mi>S</mml:mi><mml:mi>n</mml:mi><mml:mn>2</mml:mn></mml:msubsup><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:msub><mml:mi>T</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:msubsup><mml:mi>T</mml:mi><mml:mi>n</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow><mml:msubsup><mml:mi>A</mml:mi><mml:mn>0</mml:mn><mml:mn>4</mml:mn></mml:msubsup></mml:mfrac><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math>tan?1(T0+TnS0+Sn)?tan?1(T0S0)=SnS0?TnT0A02+O(Sn2,SnTn,Tn2A04).[S10]The terms <mml:math><mml:msubsup><mml:mi>S</mml:mi><mml:mi>n</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:math>Sn2 and <mml:math><mml:msubsup><mml:mi>T</mml:mi><mml:mi>n</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:math>Tn2 are expected to be equal to their variance <mml:math><mml:mrow><mml:msup><mml:mi>σ</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mrow><mml:mo>∑</mml:mo><mml:mrow><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow></mml:mrow></mml:math>σ2∑gr,θ(x,y)2. Therefore, if <mml:math><mml:mrow><mml:msubsup><mml:mi>A</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mo>></mml:mo><mml:mo>></mml:mo><mml:mrow><mml:msup><mml:mi>σ</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mrow><mml:mo>∑</mml:mo><mml:mrow><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:math>A02>>σ2∑gr,θ(x,y)2, higher order terms are negligible. In this case, we see that the phase is approximately a linear combination of Gaussian random variables and is therefore Gaussian. This is illustrated empirically by local phase histograms of the green and blue points in Fig. S3 AE.

For these high-amplitude points, we compute the variance of the phase of a coefficient,<mml:math display="block"><mml:mrow><mml:mi>E</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msup><mml:mi>tan</mml:mi><mml:mrow><mml:mo>?</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:mfrac><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>?</mml:mo><mml:mrow><mml:msup><mml:mi>tan</mml:mi><mml:mrow><mml:mo>?</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mfrac><mml:msub><mml:mi>T</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:msub><mml:mi>S</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:mfrac><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:math>E[(tan?1(T0+TnS0+Sn)?tan?1(T0S0))2][S11]<mml:math display="block"><mml:mrow><mml:mo>≈</mml:mo><mml:mrow><mml:mi>E</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mfrac><mml:mrow><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow><mml:mo>?</mml:mo><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:msub><mml:mi>T</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:msubsup><mml:mi>A</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mfrac><mml:mo>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math>≈E[(T0Sn?S0TnA02)2][S12]<mml:math display="block"><mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mi>E</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mfrac><mml:mrow><mml:mrow><mml:mrow><mml:msubsup><mml:mi>T</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:msubsup><mml:mi>S</mml:mi><mml:mi>n</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow><mml:mo>?</mml:mo><mml:mrow><mml:mn>2</mml:mn><mml:msub><mml:mi>T</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:msub><mml:mi>S</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:msub><mml:mi>T</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:msubsup><mml:mi>S</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:msubsup><mml:mi>T</mml:mi><mml:mi>n</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mrow><mml:msubsup><mml:mi>A</mml:mi><mml:mn>0</mml:mn><mml:mn>4</mml:mn></mml:msubsup></mml:mfrac><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math>=E[T02Sn2?2T0S0SnTn+S02Tn2A04][S13]<mml:math display="block"><mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mi>σ</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mrow><mml:mo>∑</mml:mo><mml:mrow><mml:msubsup><mml:mi>g</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow><mml:mn>2</mml:mn></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mi>T</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>S</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:msubsup><mml:mi>A</mml:mi><mml:mn>0</mml:mn><mml:mn>4</mml:mn></mml:msubsup></mml:mfrac></mml:mrow></mml:math>=σ2∑gr,θ2(T02+S02)A04[S14]<mml:math display="block"><mml:mrow><mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mi>σ</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mrow><mml:mo>∑</mml:mo><mml:msubsup><mml:mi>g</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mrow><mml:msubsup><mml:mi>A</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mfrac></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math>=σ2∑gr,θ2A02.[S15]The first approximation follows from the linearization of Eq. S10.

When the amplitude is low compared with the noise level (<mml:math><mml:mrow><mml:msubsup><mml:mi>A</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mo><</mml:mo><mml:mo><</mml:mo><mml:mrow><mml:msup><mml:mi>σ</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mrow><mml:mo largeop="true" symmetric="true">∑</mml:mo><mml:mrow><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:math>A02<<σ2∑gr,θ(x,y)2), the linearization of Eq. S10 is not accurate. In this case, <mml:math><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mo>≈</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math>S0≈0 and <mml:math><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mo>≈</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math>T0≈0, and phase is given by<mml:math display="block"><mml:mrow><mml:mrow><mml:msup><mml:mi>tan</mml:mi><mml:mrow><mml:mo>?</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mfrac><mml:msub><mml:mi>T</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mfrac><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math>tan?1(TnSn).[S16]<mml:math><mml:msub><mml:mi>T</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:math>Tn and <mml:math><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:math>Sn are uncorrelated Gaussian random variables with equal variance, which means that the phase is a uniformly random number. The phase at such points contains no information and intuitively corresponds to places where there is no image content in a given pyramid level (Fig. S3E, red point).

Conclusion

Small motions can reveal important dynamics in a system under study, or can foreshadow large-scale motions to come. Motion microscopy facilitates their visualization, and has been demonstrated here for motion amplification factors from 20<mml:math><mml:mo>×</mml:mo></mml:math>× to 400<mml:math><mml:mo>×</mml:mo></mml:math>× across length scales ranging from 100 nm to 0.3 mm.

Materials and Methods

Quantitative Motion Estimation.

For every pixel at location <mml:math><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math>(x,y) and time <mml:math><mml:mi>t</mml:mi></mml:math>t, we combine spatial local phase information in different subbands of the frames of the input video using the least squares objective function,<mml:math display="block"><mml:mrow><mml:mrow><mml:munder><mml:mrow><mml:mi>arg</mml:mi><mml:mi>min</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:munder><mml:mo largeop="true" movablelimits="false" symmetric="true">∑</mml:mo><mml:mi>i</mml:mi></mml:munder><mml:mrow><mml:msubsup><mml:mi>A</mml:mi><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>θ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mn>2</mml:mn></mml:msubsup><mml:msup><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mo>?</mml:mo><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>θ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>?</mml:mo><mml:mi>x</mml:mi></mml:mrow></mml:mfrac><mml:mo>,</mml:mo><mml:mfrac><mml:mrow><mml:mo>?</mml:mo><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>θ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>?</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>?</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>?</mml:mo><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>θ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:msub></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math>argminu,v∑iAri,θi2[(??ri,θi?x,??ri,θi?y)?(u,v)?Δ?ri,θi]2.[2]Arguments have been suppressed for readability; <mml:math><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>θ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>Ari,θi(x,y,t) and <mml:math><mml:mrow><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>θ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>?ri,θi(x,y,t) are the spatial local amplitude and phase of a steerable pyramid representation of the image, and <mml:math><mml:mrow><mml:mi>u</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>u(x,y,t) and <mml:math><mml:mrow><mml:mi>v</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>v(x,y,t) are the horizontal and vertical motions, respectively, at every pixel. The solution (<mml:math><mml:mrow><mml:mi>??</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>??=(u,v)) is our motion estimate and is equal to<mml:math display="block"><mml:mrow><mml:mi>??</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mi>??</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mi>????</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mo>?</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mi>??</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mi>????</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:mrow></mml:math>??=(??T????)?1(??T????),[3]where <mml:math><mml:mi>??</mml:mi></mml:math>?? is <mml:math><mml:mrow><mml:mi>N</mml:mi><mml:mo>×</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:math>N×2 with <mml:math><mml:mi>i</mml:mi></mml:math>ith row <mml:math><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mfrac><mml:mo>?</mml:mo><mml:mrow><mml:mo>?</mml:mo><mml:mi>x</mml:mi></mml:mrow></mml:mfrac><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>θ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mfrac><mml:mo>?</mml:mo><mml:mrow><mml:mo>?</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:mfrac><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>θ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math>(??x?ri,θi,??y?ri,θi), <mml:math><mml:mi>??</mml:mi></mml:math>?? is <mml:math><mml:mrow><mml:mi>N</mml:mi><mml:mo>×</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math>N×1 with <mml:math><mml:mi>i</mml:mi></mml:math>ith row <mml:math><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>θ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:msub></mml:mrow></mml:math>Δ?ri,θi, and <mml:math><mml:mi>??</mml:mi></mml:math>?? is a diagonal <mml:math><mml:mrow><mml:mi>N</mml:mi><mml:mo>×</mml:mo><mml:mi>N</mml:mi></mml:mrow></mml:math>N×N matrix with <mml:math><mml:mi>i</mml:mi></mml:math>ith diagonal element <mml:math><mml:msubsup><mml:mi>A</mml:mi><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>θ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mn>2</mml:mn></mml:msubsup></mml:math>Ari,θi2.

To increase the signal-to-noise ratio, we assume the motion field is constant in a small window around each pixel. This gives additional constraints from neighboring pixels, weighted by both their amplitude squared and the corresponding value in a smoothing kernel <mml:math><mml:mi>K</mml:mi></mml:math>K, to the objective described in Eq. 3. To handle temporal filtering, we replace the local phase variations <mml:math><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:msub><mml:mi>?</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>θ</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mpadded width="+1.7pt"><mml:mo>,</mml:mo></mml:mpadded><mml:mi>y</mml:mi><mml:mpadded width="+1.7pt"><mml:mo>,</mml:mo></mml:mpadded><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>Δ?r,θ(x,y,t) with temporally filtered local phase variations.

We use a four-orientation complex steerable pyramid specified by Portilla and Simoncelli (25). We use only the two highest-frequency scales of the complex steerable pyramid, for a total of eight subbands. We use a Gaussian spatial smoothing kernel with a SD of 3 pixels and a support of <mml:math><mml:mrow><mml:mn>19</mml:mn><mml:mo>×</mml:mo><mml:mn>19</mml:mn></mml:mrow></mml:math>19×19 pixels. The temporal filter depends on the application.

Noise Model and Creating Synthetic Video.

We estimate the noise level function (26) of a video. We apply derivative of Gaussian filters to the image in the <mml:math><mml:mi>x</mml:mi></mml:math>x and <mml:math><mml:mi>y</mml:mi></mml:math>y directions and use them to compute the gradient magnitude. We exclude pixels where the gradient magnitude is above <mml:math><mml:mn>0.05</mml:mn></mml:math>0.05 on a <mml:math><mml:mn>0</mml:mn></mml:math>0 to <mml:math><mml:mn>1</mml:mn></mml:math>1 intensity scale. At the remaining pixels, we take the temporal variance and mean of the image. We divide the intensity range into 64 equally sized bins. For each bin, we take all pixels with mean inside that bin and take the mean of the corresponding temporal variances of <mml:math><mml:mi>I</mml:mi></mml:math>I to form 64 points that are linearly interpolated to estimate the noise level function <mml:math><mml:mi>f</mml:mi></mml:math>f.

Estimating Covariance Matrices of Motion Vectors.

For an input video <mml:math><mml:mrow><mml:mi>I</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>I(x,y,t), we use the noise level function <mml:math><mml:mi>f</mml:mi></mml:math>f to create a synthetic video<mml:math display="block"><mml:mrow><mml:mrow><mml:msub><mml:mi>I</mml:mi><mml:mi>S</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>I</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:msub><mml:mi>I</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:msqrt><mml:mrow><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>I</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msqrt></mml:mrow></mml:mrow></mml:mrow></mml:math>IS(x,y,t)=I0(x,y,0)+In(x,y,t)f(I0(x,y,0))[4]that is <mml:math><mml:mi>N</mml:mi></mml:math>N frames long. We estimate the covariance matrices of the motion vectors by taking the temporal sample covariance of <mml:math><mml:msub><mml:mi>I</mml:mi><mml:mi>S</mml:mi></mml:msub></mml:math>IS,<mml:math display="block"><mml:mrow><mml:msub><mml:mi mathvariant="normal">Σ</mml:mi><mml:mi>??</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mi>N</mml:mi><mml:mo>?</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mfrac><mml:mrow><mml:munder><mml:mo largeop="true" movablelimits="false" symmetric="true">∑</mml:mo><mml:mi>t</mml:mi></mml:munder><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>??</mml:mi><mml:mi>S</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>?</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mover accent="true"><mml:mi>??</mml:mi><mml:mo stretchy="false">ˉ</mml:mo></mml:mover></mml:mrow><mml:mi>S</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>??</mml:mi><mml:mi>S</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>?</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mover accent="true"><mml:mi>??</mml:mi><mml:mo stretchy="false">ˉ</mml:mo></mml:mover></mml:mrow><mml:mi>S</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mi>T</mml:mi></mml:msup><mml:mo>,</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:math>Σ??=1N?1∑t(??S(x,y,t)???ˉS(x,y))(??S(x,y,t)???ˉS(x,y))T,[5]where <mml:math><mml:mrow><mml:msub><mml:mrow><mml:mover accent="true"><mml:mi>??</mml:mi><mml:mo stretchy="false">ˉ</mml:mo></mml:mover></mml:mrow><mml:mi>S</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>??ˉS(x,y) is the mean over <mml:math><mml:mi>t</mml:mi></mml:math>t of the motion vectors.

The temporal filter reduces noise and decreases the covariance matrix. Oppenheim and Schafer (27) show that a signal with independent and identically distributed (IID) noise of variance <mml:math><mml:msup><mml:mi>σ</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:math>σ2, when filtered with a filter with impulse response <mml:math><mml:mrow><mml:mi>T</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>T(t), has variance <mml:math><mml:mrow><mml:msub><mml:mo largeop="true" symmetric="true">∑</mml:mo><mml:mi>t</mml:mi></mml:msub><mml:mrow><mml:mi>T</mml:mi><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:msup><mml:mi>σ</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow></mml:math>∑tT(t)2σ2. Therefore, when a temporal filter is used, we multiply the covariance matrix by <mml:math><mml:mrow><mml:msub><mml:mo largeop="true" symmetric="true">∑</mml:mo><mml:mi>t</mml:mi></mml:msub><mml:mrow><mml:mi>T</mml:mi><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow></mml:math>∑tT(t)2.

Comparison of Our Motion Estimation to a Laser Vibrometer.

We compare the results of our motion estimation algorithm to that of a laser vibrometer, which measures velocity using Doppler shift (28). In the first experiment, a cantilevered beam was shaken by a mechanical shaker at <mml:math><mml:mn>7.3</mml:mn></mml:math>7.3 Hz, <mml:math><mml:mn>58.3</mml:mn></mml:math>58.3 Hz, <mml:math><mml:mn>128</mml:mn></mml:math>128 Hz, and <mml:math><mml:mn>264</mml:mn></mml:math>264 Hz, the measured modal frequencies of the beam. The relative amplitude of the shaking signal was varied between a factor of <mml:math><mml:mn>5</mml:mn></mml:math>5 and <mml:math><mml:mn>25</mml:mn></mml:math>25 in <mml:math><mml:mn>2.5</mml:mn></mml:math>2.5 increments. We simultaneously recorded a 2,000 FPS video of the beam with a high-speed camera (VisionResearch Phantom V-10) and measured its horizontal velocity with a laser vibrometer (Polytec PDV100). We repeated this experiment for nine different excitation magnitudes, three focal lengths (<mml:math><mml:mn>24</mml:mn></mml:math>24 mm, <mml:math><mml:mn>50</mml:mn></mml:math>50 mm, <mml:math><mml:mn>85</mml:mn></mml:math>85 mm) and eight exposure times (<mml:math><mml:mn>12.5</mml:mn></mml:math>12.5 <mml:math><mml:mi>μ</mml:mi></mml:math>μs, <mml:math><mml:mn>25</mml:mn></mml:math>25 <mml:math><mml:mi>μ</mml:mi></mml:math>μs, <mml:math><mml:mn>50</mml:mn></mml:math>50 <mml:math><mml:mi>μ</mml:mi></mml:math>μs, <mml:math><mml:mn>100</mml:mn></mml:math>100 <mml:math><mml:mi>μ</mml:mi></mml:math>μs, <mml:math><mml:mn>200</mml:mn></mml:math>200 <mml:math><mml:mi>μ</mml:mi></mml:math>μs, <mml:math><mml:mn>300</mml:mn></mml:math>300 <mml:math><mml:mi>μ</mml:mi></mml:math>μs, <mml:math><mml:mn>400</mml:mn></mml:math>400 <mml:math><mml:mi>μ</mml:mi></mml:math>μs, <mml:math><mml:mn>490</mml:mn></mml:math>490 <mml:math><mml:mi>μ</mml:mi></mml:math>μs), for a total of 20 high-speed videos. The beam had an accelerometer mounted on it (white object in Fig. 1A), but we did not use it in this experiment.

We used our motion estimation method to compute the horizontal displacement of the marked, red point on the left side of the accelerometer from the video (Fig. 1A). We applied a temporal band-stop filter to remove motions between 67 Hz and 80 Hz that corresponded to camera motions caused by its cooling fan’s rotation. The laser vibrometer signal was integrated using discrete, trapezoidal integration. Before integration, both signals were high-passed above 2.5 Hz to reduce low-frequency noise in the integrated vibrometer signal. The motion signals from each video were manually aligned. For one video (exposure, <mml:math><mml:mn>490</mml:mn></mml:math>490 <mml:math><mml:mi>μ</mml:mi></mml:math>μs; excitation, <mml:math><mml:mn>25</mml:mn></mml:math>25; and focal length, <mml:math><mml:mn>85</mml:mn></mml:math>85 mm), we plot the two motion signals (Fig. S4 BD). They agree remarkably well, with higher modes well aligned and a correlation of <mml:math><mml:mn>0.997</mml:mn></mml:math>0.997.

To show the sensitivity of the motion microscope, we plot the correlation of our motion estimate and the integrated velocities from the laser vibrometer vs. motion size (RMS displacement). Because the motion’s average size varies over time, we divide each video’s motion signal into eight equal pieces and plot the correlations of each piece in each video in Fig. S4 E and F. For RMS displacements on the order of <mml:math><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>100</mml:mn></mml:mrow></mml:math>1/100th of a pixel, the correlation between the two signals varies between <mml:math><mml:mn>0.87</mml:mn></mml:math>0.87 and <mml:math><mml:mn>0.94</mml:mn></mml:math>0.94. For motions larger than <mml:math><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>20</mml:mn></mml:mrow></mml:math>1/20th of a pixel, the correlation is between <mml:math><mml:mn>0.95</mml:mn></mml:math>0.95 and <mml:math><mml:mn>0.999</mml:mn></mml:math>0.999. Possible sources of discrepancy are noise in the motion microscope signal, integrated low-frequency noise in the vibrometer signal, and slight misalignment between the signals. Displacements with RMS smaller than <mml:math><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>100</mml:mn></mml:mrow></mml:math>1/100th of a pixel were noisier and had lower correlations, indicating that noise in the video prevents the two signals from matching.

As expected, correlation increases with focal length and excitation magnitude, two things that positively correlate with motion size (in pixels) (Fig. S4 G and H). The correlation also increases with exposure, because videos with lower exposure times are noisier (Fig. S4I).

Filming Bridge Sequence.

The bridge was filmed with a monochrome Point Gray Grasshopper3 camera (model GS3-U3-23S6M-C) at 30 FPS with a resolution of 800?<mml:math><mml:mo>×</mml:mo></mml:math>×?600. The central span of the bridge lifted to accommodate marine traffic. Filming was started about 5 s before the central span was lowered to its lowest point.

The accelerometer data were doubly integrated using trapezoidal integration to displacement. In Fig. 3 C and D, both the motion microscope displacement and the doubly integrated acceleration were band-passed with a first-order band-pass Butterworth filter with the specified parameters.

Motion Field Interpolation.

In textureless regions, it may not be possible to estimate the motion at all, and, at one-dimensional structures like edges, the motion field will only be accurate in the direction perpendicular to the edge. These inaccuracies are reflected in the motion covariance matrix. We show how to interpolate the motion field from accurate regions to inaccurate regions, assuming that adjacent pixels have similar motions.

We minimize the following objective function:<mml:math display="block"><mml:mrow><mml:mrow><mml:mstyle displaystyle="true"><mml:munder><mml:mo largeop="true" movablelimits="false" symmetric="true">∑</mml:mo><mml:mi>??</mml:mi></mml:munder></mml:mstyle><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>??</mml:mi><mml:mi>S</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>??</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>?</mml:mo><mml:mrow><mml:mi>??</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>??</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:msubsup><mml:mi mathvariant="normal">Σ</mml:mi><mml:mi>??</mml:mi><mml:mrow><mml:mo>?</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>??</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>??</mml:mi><mml:mi>S</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>??</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>?</mml:mo><mml:mrow><mml:mi>??</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>??</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mi>T</mml:mi></mml:msup></mml:mrow></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:msub><mml:mi>λ</mml:mi><mml:mi>S</mml:mi></mml:msub><mml:mrow><mml:mstyle displaystyle="true"><mml:munder><mml:mo largeop="true" movablelimits="false" symmetric="true">∑</mml:mo><mml:mrow><mml:mi>??</mml:mi><mml:mo>∈</mml:mo><mml:mrow><mml:mi mathvariant="script">N</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>??</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:munder></mml:mstyle><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>??</mml:mi><mml:mi>S</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>??</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>?</mml:mo><mml:mrow><mml:msub><mml:mi>??</mml:mi><mml:mi>S</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>??</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>??</mml:mi><mml:mi>S</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>??</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>?</mml:mo><mml:mrow><mml:msub><mml:mi>??</mml:mi><mml:mi>S</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>??</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mi>T</mml:mi></mml:msup><mml:mo>,</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math>∑??(??S(??)???(??))Σ???1(??)(??S(??)???(??))T+λS∑??∈N(??)(??S(??)???S(??))(??S(??)???S(??))T,[6]where <mml:math><mml:msub><mml:mi>??</mml:mi><mml:mi>S</mml:mi></mml:msub></mml:math>??S is the desired interpolated field, <mml:math><mml:mi>??</mml:mi></mml:math>?? is the estimated motion field, <mml:math><mml:msub><mml:mi mathvariant="normal">Σ</mml:mi><mml:mi>??</mml:mi></mml:msub></mml:math>Σ?? is its covariance, <mml:math><mml:mrow><mml:mi mathvariant="script">N</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>??</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>N(??) is the four-pixel neighborhood of <mml:math><mml:mi>x</mml:mi></mml:math>x, and <mml:math><mml:msub><mml:mi>λ</mml:mi><mml:mi>S</mml:mi></mml:msub></mml:math>λS is a user-specified constant that specifies the relative importance of matching the estimated motion field vs. making adjacent pixels have similar motion fields. The first term seeks to ensure that <mml:math><mml:msub><mml:mi>??</mml:mi><mml:mi>S</mml:mi></mml:msub></mml:math>??S is close to <mml:math><mml:mi>??</mml:mi></mml:math>??, weighted by the expected amount of noise at each pixel. The second term seeks to ensure that adjacent pixels have similar motion fields.

In Fig. 4D, we produce the color overlays by applying the above processing to the estimated motion field with <mml:math><mml:mrow><mml:mpadded width="+1.7pt"><mml:msub><mml:mi>λ</mml:mi><mml:mi>S</mml:mi></mml:msub></mml:mpadded><mml:mo>=</mml:mo><mml:mn>300</mml:mn></mml:mrow></mml:math>λS=300 and then taking the amplitude of each motion vector. We also set components of the covariance matrix that were larger than <mml:math><mml:mn>0.1</mml:mn></mml:math>0.1 square pixels to be an arbitrarily large number (we used <mml:math><mml:mrow><mml:mn>10</mml:mn><mml:mo>,</mml:mo><mml:mn>000</mml:mn></mml:mrow></mml:math>10,000 square pixels).

Finite Element Analysis of Acoustic Metamaterial.

We use Abaqus/Standard (29), a commercial finite-element analyzer, to simulate the metamaterial’s response to forcing. We constructed a 2D model with 37,660 nodes and 11,809 eight-node plane strain quadrilateral elements (Abaqus element type CPE8H). We modeled the rubber as Neo-Hookean, with shear modulus <mml:math><mml:mn>443.4</mml:mn></mml:math>443.4 kPa, bulk modulus <mml:math><mml:mrow><mml:mpadded width="+1.7pt"><mml:mn>7.39</mml:mn></mml:mpadded><mml:mo>×</mml:mo><mml:msup><mml:mn>10</mml:mn><mml:mn>5</mml:mn></mml:msup></mml:mrow></mml:math>7.39×105 kPa, and density <mml:math><mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>050</mml:mn><mml:mtext>kg</mml:mtext></mml:mrow><mml:mo>?</mml:mo><mml:msup><mml:mi mathvariant="normal">m</mml:mi><mml:mn>3</mml:mn></mml:msup></mml:mrow></mml:math>1,050kg?m3 (Abaqus parameters <mml:math><mml:mrow><mml:mpadded width="+1.7pt"><mml:mi mathvariant="normal">C</mml:mi><mml:mn>10</mml:mn></mml:mpadded><mml:mo>=</mml:mo><mml:mn>221.7</mml:mn></mml:mrow></mml:math>C10=221.7 kPa, <mml:math><mml:mrow><mml:mpadded width="+1.7pt"><mml:mi mathvariant="normal">D</mml:mi><mml:mn>1</mml:mn></mml:mpadded><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mpadded width="+1.7pt"><mml:mn>2.71</mml:mn></mml:mpadded><mml:mo>×</mml:mo><mml:msup><mml:mn>10</mml:mn><mml:mrow><mml:mo>?</mml:mo><mml:mn>9</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mo>?</mml:mo><mml:msup><mml:mtext>Pa</mml:mtext><mml:mrow><mml:mo>?</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mrow></mml:math>D1=2.71×10?9?Pa?1). We modeled the copper core with shear modulus <mml:math><mml:mrow><mml:mrow><mml:mpadded width="+1.7pt"><mml:mn>4.78</mml:mn></mml:mpadded><mml:mo>×</mml:mo><mml:msup><mml:mn>10</mml:mn><mml:mn>7</mml:mn></mml:msup></mml:mrow><mml:mo>?</mml:mo><mml:mtext>kPa</mml:mtext></mml:mrow></mml:math>4.78×107?kPa, bulk modulus <mml:math><mml:mrow><mml:mpadded width="+1.7pt"><mml:mn>1.33</mml:mn></mml:mpadded><mml:mpadded width="+1.7pt"><mml:msup><mml:mo>×</mml:mo><mml:mn>8</mml:mn></mml:msup></mml:mpadded><mml:mo>?</mml:mo><mml:mtext>kPa</mml:mtext></mml:mrow></mml:math>1.33×8?kPa, and density <mml:math><mml:mrow><mml:mrow><mml:mn>8</mml:mn><mml:mo>,</mml:mo><mml:mn>960</mml:mn><mml:mtext>kg</mml:mtext></mml:mrow><mml:mo>?</mml:mo><mml:msup><mml:mi mathvariant="normal">m</mml:mi><mml:mn>3</mml:mn></mml:msup></mml:mrow></mml:math>8,960kg?m3 (Abaqus parameters <mml:math><mml:mrow><mml:mpadded width="+1.7pt"><mml:mi mathvariant="normal">C</mml:mi><mml:mn>10</mml:mn></mml:mpadded><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mpadded width="+1.7pt"><mml:mn>2.39</mml:mn></mml:mpadded><mml:mo>×</mml:mo><mml:msup><mml:mn>10</mml:mn><mml:mn>7</mml:mn></mml:msup></mml:mrow><mml:mo>?</mml:mo><mml:mtext>kPa</mml:mtext></mml:mrow></mml:mrow></mml:math>C10=2.39×107?kPa, <mml:math><mml:mrow><mml:mpadded width="+1.7pt"><mml:mi mathvariant="normal">D</mml:mi><mml:mn>1</mml:mn></mml:mpadded><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mpadded width="+1.7pt"><mml:mn>1.5</mml:mn></mml:mpadded><mml:mo>×</mml:mo><mml:msup><mml:mn>10</mml:mn><mml:mrow><mml:mo>?</mml:mo><mml:mn>11</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mo>?</mml:mo><mml:msup><mml:mtext>Pa</mml:mtext><mml:mrow><mml:mo>?</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mrow></mml:math>D1=1.5×10?11?Pa?1. Geometry and material properties are specified in Wang et al. (24). The bottom of the metamaterial was given a zero-displacement boundary condition. A sinusoidal displacement loading condition at the forcing frequency was applied to a node located halfway between the top and bottom of the metamaterial.

Validation of Noise Analysis with Real Video Data.

We took a video of an accelerometer attached to a beam (Fig. S9A). We used the accelerometer to verify that the beam had no motions between 600 Hz and 700 Hz (Fig. S9B). We then estimated the in-band motions from a video of the beam. Because the beam is stationary in this band, these motions are entirely due to noise, and their temporal sample covariance gives us a ground-truth measure of the noise level (Fig. S9C). We used our simulation with a signal-dependent noise model to estimate the covariance matrix from the first frame of the video, the specific parameters of which are shown in Fig. S9D. The resulting covariance matrices closely match the ground truth (Fig. S9 E and F), showing that our simulation can accurately estimate noise level and error bars.

We also verify that the signal-dependent noise model performs better than the simpler constant variance noise model, in which noise is IID. The result of the constant noise model simulation produced results that are much less accurate than the signal-dependent noise model (Fig. S9 G and H).

In Fig. S9, we only show the component of the covariance matrix corresponding to the direction of least variance, and only at points corresponding to edges or corners.

Using the Motion Microscope and Limitations

The motion microscope works best when the camera is stable, such as when mounted on a tripod. If the camera is handheld or the tripod is not sturdy, the motion microscope may only detect camera motions instead of subject motions. The easiest way to solve this problem is to stabilize the camera. If the camera motions are caused by some periodic source, such as camera fans, that occur at a temporal frequency that the subject is not moving at, they can be removed with a band-stop filter.

If adjacent individual pixels have different motions, it is unlikely that the motion microscope will be able to discern or differentiate them. The motion microscope assumes motions are locally constant, and it won’t be able to properly process videos with high spatial frequency motions. The best way to solve this problem is to increase the resolving power or optical zoom of the imaging system. If noise is not a concern, reducing the width of the spatial Gaussian used to smooth the motions may also help.

It is not possible for the motion microscope to quantify the movement of textureless regions or the full 2D movement of 1D edges. However, the motion covariance matrices produced by the motion microscope will alert the user as to when this is happening. The visualization will still be reasonable, as it is not possible to tell if a textureless region was motion-amplified in the wrong way.

Saturated, pure white pixels may pose a problem for the motion microscope. Slight color changes are the underlying signal used to detect tiny motions. For pixels with clipped intensities, this signal will be missing, and it may not be possible to detect motion. The best ways to mitigate this problem are to reduce the exposure time, change the position of lights, or use a camera with a higher dynamic range.

The motion microscope only quantifies lateral, 2D motions in the imaging plane. Motions of the subject toward or away from the camera will not be properly reported. For a reasonable camera, these 2D motions can be multiplied by a constant to convert from pixels to millimeters or other units of interest. However, if the lens has severe distortion, the constant will vary across the scene. If quantification of the motions is important, the images will need to be lens distortion-corrected before the motion microscope is applied.

The motion microscope visualization is unable to push pixels beyond the support of the basis functions of the steerable pyramid. In practice, this results in a maximum amplified motion of around four pixels. If the amplification is high enough to push an image feature past this, ringing artifacts may occur.

Acknowledgments

We thank Professor Erin Bell and Travis Adams at University of New Hampshire and New Hampshire Department of Transportation for their assistance with filming the Portsmouth lift bridge. This work was supported, in part, by Shell Research, Quanta Computer, National Science Foundation Grants CGV-1111415 and CGV-1122374, and National Institutes of Health Grant R01-DC00238.

Footnotes

  • ?1Present address: Google Research, Google Inc. Mountain View, CA 94043.

  • ?2To whom correspondence should be addressed. Email: billf{at}mit.edu.
  • Author contributions: N.W., J.G.C., J.B.S., D.W., M.R., R.G., D.M.F., O.B., S.H.K., K.B., F.D., and W.T.F. designed research; N.W., J.G.C., J.B.S., D.W., R.G., P.W., S.S., S.H.K., and W.T.F. performed research; N.W., J.G.C., J.B.S., and D.W. analyzed data; and N.W., J.G.C., J.B.S., D.W., R.G., D.M.F., O.B., P.W., S.S., S.H.K., K.B., F.D., and W.T.F. wrote the paper.

  • The authors declare no conflict of interest.

  • This article is a PNAS Direct Submission.

  • This article contains supporting information online at www.danielhellerman.com/lookup/suppl/doi:10.1073/pnas.1703715114/-/DCSupplemental.

This is an open access article distributed under the PNAS license.

References

  1. ?
    .
  2. ?
    .
  3. ?
    .
  4. ?
    .
  5. ?
    .
  6. ?
    .
  7. ?
    .
  8. ?
    .
  9. ?
    .
  10. ?
    .
  11. ?
    .
  12. ?
    .
  13. ?
    .
  14. ?
    .
  15. ?
    .
  16. ?
    .
  17. ?
    .
  18. ?
    .
  19. ?
    .
  20. ?
    .
  21. ?
    .
  22. ?
    .
  23. ?
    .
  24. ?
    .
  25. ?
    .
  26. ?
    .
  27. ?
    .
  28. ?
    .
  29. ?
    .
  30. ?
    .
  31. ?
    .
  32. ?
    .
  33. ?
    .
  34. ?
    .
  35. ?
    .
  36. ?
    .
  37. ?
    .
  38. .

Online Impact

                                      1. 99132880 2018-01-23
                                      2. 802899879 2018-01-23
                                      3. 295573878 2018-01-23
                                      4. 352668877 2018-01-23
                                      5. 984633876 2018-01-23
                                      6. 545928875 2018-01-23
                                      7. 976569874 2018-01-23
                                      8. 871324873 2018-01-23
                                      9. 263462872 2018-01-23
                                      10. 577161871 2018-01-23
                                      11. 255603870 2018-01-23
                                      12. 117346869 2018-01-23
                                      13. 90982868 2018-01-23
                                      14. 663415867 2018-01-23
                                      15. 793874866 2018-01-23
                                      16. 843582865 2018-01-23
                                      17. 864971864 2018-01-22
                                      18. 258841863 2018-01-22
                                      19. 957295862 2018-01-22
                                      20. 553518861 2018-01-22