We analysed the robustness of species identification based on proteomic composition to data processing and intraspecific variability, the specificity and sensitivity of species markers, and the discriminatory power of proteomic fingerprinting and its sensitivity to phylogenetic distance. Our analysis is based on MALDI-TOF MS (matrix-assisted laser desorption/ionization time-of-flight mass spectrometry) data from 32 marine copepod species from 13 regions (North and Central Atlantic and adjacent seas). A random forest (RF) model correctly classified all specimens to species level with only minor sensitivity to data processing, demonstrating the robustness of the method. Compounds with high specificity showed low sensitivity; that is, identification was based on complex pattern differences rather than on the presence of single markers. Proteomic distance was not consistently related to phylogenetic distance. A species gap in proteome composition appeared at a Euclidean distance of 0.7 when only specimens from the same sample were used. When other regions or seasons were included, intraspecific variability increased, resulting in overlap of intra- and interspecific distances. The highest intraspecific distances (>0.7) were observed between specimens from brackish and marine habitats, suggesting that salinity probably affects proteomic patterns. When testing the sensitivity of the RF model to the regional composition of the reference library, strong misidentification was detected only between two congener pairs. Still, the choice of reference library may affect the identification of closely related species and should be tested before routine application. We envisage high relevance of this time- and cost-efficient method for future zooplankton monitoring, as it provides not only in-depth taxonomic resolution for counted specimens but also add-on information, such as developmental stage or environmental conditions.
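The species-gap idea can be illustrated with a minimal sketch: compute all pairwise Euclidean distances between specimen profiles, split them into intraspecific and interspecific pairs, and check whether the largest intraspecific distance stays below the smallest interspecific one. The function name `species_gap`, the tuple layout, and the toy intensity vectors are assumptions made for illustration only; they are not the study's pipeline or data, and the toy distances are not on the same scale as the 0.7 threshold reported here.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def species_gap(profiles):
    """Return (max intraspecific distance, min interspecific distance).

    profiles: list of (species_label, intensity_vector) tuples.
    Hypothetical helper; assumes at least one same-species pair and
    one between-species pair, otherwise max()/min() raise ValueError.
    """
    intra, inter = [], []
    for (sp_a, vec_a), (sp_b, vec_b) in combinations(profiles, 2):
        d = dist(vec_a, vec_b)
        # Same label: intraspecific pair; different label: interspecific pair
        (intra if sp_a == sp_b else inter).append(d)
    return max(intra), min(inter)


# Toy, made-up profiles (NOT data from the study)
toy = [
    ("species_A", [0.0, 0.0, 1.0]),
    ("species_A", [0.0, 0.3, 1.0]),
    ("species_B", [1.0, 0.0, 0.0]),
    ("species_B", [1.0, 0.4, 0.0]),
]

max_intra, min_inter = species_gap(toy)
# A gap exists when no intraspecific distance reaches the interspecific range
has_gap = max_intra < min_inter
print(f"max intra = {max_intra:.3f}, min inter = {min_inter:.3f}, gap = {has_gap}")
```

Under this framing, adding specimens from other regions or seasons corresponds to appending more profiles per species: if intraspecific distances grow until `max_intra >= min_inter`, the gap closes and the two distance ranges overlap.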