Comprehensive two-dimensional gas chromatography (GC × GC) is among the most powerful separation technologies currently available. Since its advent in the early 1990s, it has become an established and readily available method. However, one of its most challenging aspects, especially when hyphenated with mass spectrometry, is the large amount of chemical information that each measurement provides. The GC × GC community agrees that this is where the need for action is greatest. In response, the number of software packages allowing in-depth processing of GC × GC data has risen over the last few years. These packages provide sophisticated tools and algorithms for more streamlined data evaluation. However, these tools, algorithms, and their specific functionalities differ drastically between the available software packages and may lead to differing findings if end users do not apply them appropriately. This study has two main objectives: first, to propose a data analysis framework, and second, to propose an open-source dataset for benchmarking software options and their specificities, thus allowing a unified and comprehensive evaluation of GC × GC software. The benchmark data include a set of standard compound measurements and a set of chocolate aroma profiles. On this foundation, eight readily available GC × GC software packages were investigated anonymously for fundamental and advanced functionalities, such as retention- and detector-derived parameters, revealing differences in the determination of, e.g., retention times and mass spectra.