Objective: Fluorescence molecular imaging (FMI) has emerged as a promising tool for surgical guidance in oncology; one of the few remaining challenges is the ability to offer quality control and data referencing. This paper investigates the use of a novel composite phantom to correct and benchmark FMI systems. Methods: This paper extends previous work by describing a phantom design that can provide a more complete assessment of FMI systems through quantification of dynamic range and determination of spatial illumination patterns for both reflectance and fluorescence imaging. Various performance metrics are combined into a robust and descriptive "system benchmarking score," enabling not only comprehensive comparison of different systems but also, for the first time, correction of the acquired data. Results: We show that systems developed for targeted fluorescence imaging can achieve benchmarking scores of up to 70%, while clinically available systems optimized for indocyanine green are limited to 50%, mostly due to greater leakage of ambient and excitation illumination and lower resolution. The image uniformity can also be approximated and employed for image flat-fielding, an important milestone toward data referencing. In addition, we demonstrate the use of the composite phantom in assessing the performance of a surgical microscope and of a raster-scan imaging system. Conclusion: Our results suggest that the new phantom has the potential to support high-fidelity FMI through benchmarking and image correction. Significance: Standardization of FMI is necessary for establishing good imaging practices in clinical environments and for enabling high-fidelity imaging across patients and multi-center imaging studies.