Respiratory motion in living organisms is known to cause image blurring and loss of resolution, chiefly because of the long acquisition times of the imaging methods employed. Optoacoustic tomography can effectively eliminate in vivo motion artifacts owing to its inherent capacity to collect image data from the entire imaged region with a single nanosecond-duration laser pulse. However, multi-frame image analysis is often essential in applications that rely on spectroscopic data acquisition and in scanning-based systems. Efficient methods to correct for motion-induced image distortions are therefore imperative. Herein, we demonstrate that efficient motion rejection in optoacoustic tomography can readily be accomplished by clustering frames during image acquisition, thus averting excessive data acquisition and post-processing. The algorithm’s efficiency for two- and three-dimensional imaging was validated with experimental whole-body mouse data acquired by spiral volumetric optoacoustic tomography (SVOT) and full-ring cross-sectional imaging scanners.
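As an illustration only, the idea of rejecting motion-affected frames by clustering can be sketched as below. This is a minimal sketch under stated assumptions, not the algorithm validated in this work: the function name `cluster_frames`, the use of frame-to-frame correlation coefficients as features, the choice of two clusters, and the rule of keeping the larger cluster are all hypothetical choices made for the example.

```python
import numpy as np


def cluster_frames(frames, n_iter=50):
    """Split frames into two groups with a basic 2-means on correlation features.

    frames: array of shape (n_frames, n_samples), one flattened frame per row.
    Returns a boolean mask that is True for frames in the larger cluster.
    """
    x = np.asarray(frames, dtype=float)
    # Feature vector of each frame: its correlation with every frame in the set.
    features = np.corrcoef(x)
    # Deterministic initialisation: start from the two least-correlated frames.
    i, j = np.unravel_index(np.argmin(features), features.shape)
    centers = np.stack([features[i], features[j]])
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Assign each frame to the nearest cluster center (Euclidean distance).
        dist = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dist, axis=1)
        # Recompute each center as the mean of its members (skip empty clusters).
        new_centers = centers.copy()
        for k in range(2):
            if np.any(labels == k):
                new_centers[k] = features[labels == k].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Assumption: the larger cluster holds the frames taken between breaths.
    return labels == np.argmax(np.bincount(labels, minlength=2))
```

With a stack of frames in hand, the retained frames could then be averaged, for example `frames[cluster_frames(frames)].mean(axis=0)`. Keeping the larger cluster reflects the assumption that the animal spends most of the respiratory cycle at rest, which would need to be checked for a given setup.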