Gastrointestinal (GI) endoscopy is a common diagnostic procedure used for esophageal cancer detection. Emerging capsule optoacoustic endoscopes, however, suffer from low pulse repetition rates and slow scanning units, which limit attainable imaging frame rates. Consequently, motion artifacts result in inaccurate spatial mapping and misinterpretation of data. To overcome these limitations, we report a 360°, distal-scanning capsule optoacoustic endoscope operating at a 50 Hz frame rate. The translational capability of the instrument for human GI tract imaging was characterized with an Archimedean spiral phantom consisting of twelve 100 µm sutures, a stainless steel mesh with a pitch of 3 mm, and an ex vivo pig esophagus sample. We estimated an in vivo imaging penetration depth of ~0.84 mm by immersing the mesh phantom in intralipid solution to simulate light scattering in human esophageal tissue, and validated our findings ex vivo using pig esophagus. This proof-of-concept study demonstrates the translational potential of the proposed video-rate endoscope for human GI tract imaging.