The gamma-ray sky as seen by the Large Area Telescope (LAT) on board the Fermi satellite is a superposition of emissions from many processes. To study them, a rich toolkit of analysis methods for gamma-ray observations has been developed, most of which rely on emission templates to model foreground emissions. Here, we aim to complement these methods by presenting a template-free spatio-spectral imaging approach for the gamma-ray sky, based on phenomenological modeling of its emission components. It is formulated in a Bayesian variational inference framework and allows simultaneous reconstruction and decomposition of the sky into multiple emission components, enabled by a self-consistent inference of their spatial and spectral correlation structures. Additionally, we formulate an extension of our imaging approach to template-informed imaging, which adds emission templates to our component models while retaining the "data-drivenness" of the reconstruction. We demonstrate the performance of the presented approach on the ten-year Fermi LAT data set. With both template-free and template-informed imaging, we achieve a high quality of fit and show good agreement of our diffuse emission reconstructions with the current diffuse emission model published by the Fermi Collaboration. We quantitatively analyze the obtained data-driven reconstructions and critically evaluate the performance of our models, highlighting strengths, weaknesses, and potential improvements. All reconstructions have been released as data products.