Recent decades have seen an explosion in the amount of data available on all aspects of biodiversity, which has led to data-driven approaches to understanding how and why diversity varies in time and space. Global repositories facilitate access to various classes of species-level data, including biogeography, genetics and conservation status, which are in turn required to study different dimensions of diversity. Ensuring that these different data sources are interoperable is a challenge as we aim to create synthetic data products to monitor the state of the world's biodiversity. One way to approach this is to link data of different classes, and to inventory the availability of data across multiple sources. Here, we use a comprehensive list of more than 200 000 marine animal species, and quantify the availability of data on geographical occurrences, genetic sequences, conservation assessments and DNA barcodes across all phyla and broad functional groups. This reveals a very uneven picture: 44% of species are represented by no record other than their taxonomy, but some species are rich in data. Although these data-rich species are concentrated in a few taxonomic and functional groups, especially vertebrates, data are spread widely across marine animals, with members of all 32 phyla represented in at least one database. By highlighting gaps in current knowledge, our census of marine diversity data helps to prioritize future data collection activities and emphasizes the importance of ongoing sustained observations and of archiving existing data in global repositories.