An article recently published in Nature by researchers from Berlin proposes a new way to evaluate data quality for artificial intelligence used in healthcare.
Several documentation efforts and frameworks already exist to evaluate AI models, such as FactSheets, Model Cards and Dataset Nutrition Labels. However, the authors write that none comprehensively assesses the content of data sets and their suitability for use in machine learning (ML).
The German researchers Daniel Schwabe, Katinka Becker, Martin Seyferth, Andreas Klaß and Tobias Schaeffter sought to identify which characteristics should be used to evaluate data quality for trustworthy AI in medicine. The factors can also help explain why a model behaves a certain way.
The authors developed the METRIC framework, a specialized data quality framework for medical training data. It has five categories and 15 sub-dimensions through which researchers and healthcare entities can evaluate their data's fitness for the task at hand.
The five categories are measurement process, timeliness, representativeness, informativeness and consistency. Together they are used to assess the appropriateness of a data set with respect to a specific use case.
The researchers note that developers should familiarize themselves with the aspects of the framework and begin to use them to evaluate their data. They say more work needs to be done to establish quantitative and qualitative measures for each dimension.
The researchers performed a literature review using Web of Science, PubMed and ACM Digital Library and found 120 papers that met their criteria. Within those papers, they found over 450 data quality measures for healthcare data.
The authors distilled the terms down to 15 considerations, or “dimensions,” that they say healthcare entities should use to determine the quality of their data.
“Data quality plays a decisive role in the creation of trustworthy AI and assessing the quality of a data set is of utmost importance to AI developers, as well as regulators and notified bodies,” the study says.
The first of the five categories, measurement process, assesses uncertainty during data collection. It takes account of missing data, device error, human error and noisy labeling. The measurement process category considers […]
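As an illustration of what a quantitative check for one of these concerns, missing data, might look like, here is a minimal Python sketch. It is not taken from the study, which says measures for each dimension still need to be established. The function name `missing_fraction`, the toy patient records and their field names are all hypothetical.

```python
from typing import Sequence


def missing_fraction(records: Sequence[dict], field: str) -> float:
    """Return the share of records in which `field` is absent or None."""
    if not records:
        raise ValueError("records must not be empty")
    missing = sum(1 for record in records if record.get(field) is None)
    return missing / len(records)


# Hypothetical toy records; the field names are invented for illustration.
patients = [
    {"age": 54, "systolic_bp": 131},
    {"age": 61, "systolic_bp": None},   # value recorded as missing
    {"age": None, "systolic_bp": 118},  # value recorded as missing
    {"age": 47},                        # field absent altogether
]

# One missing-data rate per field of interest.
report = {field: missing_fraction(patients, field) for field in ("age", "systolic_bp")}
print(report)  # {'age': 0.25, 'systolic_bp': 0.5}
```

A per-field rate like this covers only one aspect of the measurement process category. Device error, human error and noisy labeling would each need their own checks.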