Ensuring Dataset Quality for Machine Learning Certification

Sylvaine Picard, Camille Chapdelaine, Cyril Cappi, L. Gardes, E. Jenn, Baptiste Lefèvre, Thomas Soumarmon

Published 2020 in 2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)

ABSTRACT

In this paper, we address the problem of dataset quality in the context of Machine Learning (ML)-based critical systems. We briefly analyse the applicability of some existing standards dealing with data and show that they neither properly capture nor take into account the specificities of the ML context. As a first response to this gap, we propose a dataset specification and verification process and apply it to a signal recognition system from the railway domain. We also give a list of recommendations for the collection and management of datasets. This work is one step towards the dataset engineering process that will be required for ML to be used in safety-critical systems.
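
The abstract does not detail how the dataset specification and verification process works. Purely as an illustration, and not as the authors' method, one common way to make a dataset specification checkable is to state, for each metadata attribute, the values it may take and a minimum number of samples per value, then verify the collected samples against it. The names `AttributeSpec`, `DatasetSpec`, `verify_dataset`, and the `weather` attribute below are hypothetical, and the sketch assumes Python 3.9 or later.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class AttributeSpec:
    """Hypothetical requirement on one metadata attribute of every sample."""
    allowed: set[str]   # values the attribute is permitted to take
    min_count: int = 1  # minimum number of samples required for each allowed value


@dataclass
class DatasetSpec:
    """Hypothetical dataset specification: one AttributeSpec per attribute name."""
    attributes: dict[str, AttributeSpec] = field(default_factory=dict)


def verify_dataset(samples: list[dict[str, str]], spec: DatasetSpec) -> list[str]:
    """Check sample metadata against the specification.

    Returns human-readable violations; an empty list means every check passed.
    """
    violations: list[str] = []
    for name, attr in spec.attributes.items():
        counts: Counter = Counter()
        for i, sample in enumerate(samples):
            value = sample.get(name)
            if value is None:
                violations.append(f"sample {i}: missing attribute '{name}'")
            elif value not in attr.allowed:
                violations.append(f"sample {i}: '{name}'={value!r} not in specification")
            else:
                counts[value] += 1
        # Coverage check: every allowed value must be represented often enough.
        for value in sorted(attr.allowed):
            if counts[value] < attr.min_count:
                violations.append(
                    f"attribute '{name}': value {value!r} has {counts[value]} samples, "
                    f"expected at least {attr.min_count}"
                )
    return violations


if __name__ == "__main__":
    spec = DatasetSpec({"weather": AttributeSpec(allowed={"clear", "rain", "fog"}, min_count=2)})
    samples = [{"weather": "clear"}, {"weather": "clear"}, {"weather": "rain"}, {"weather": "snow"}]
    for violation in verify_dataset(samples, spec):
        print(violation)
```

On the four toy samples, this reports the out-of-specification value `snow` and the under-represented values `fog` (0 samples) and `rain` (1 sample). A real specification would derive its attributes and thresholds from the system's operational domain; that derivation is outside this sketch.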

PUBLICATION RECORD

Publication year: 2020
Venue: 2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)
Publication date: 2020-10-01
Fields of study: Mathematics, Computer Science, Engineering
Identifiers: DOI 10.1109/ISSREW51248.2020.00085; arXiv 2011.01799
Source metadata: Semantic Scholar


REFERENCES

FRSign: A Large-Scale Traffic Light Dataset for Autonomous Trains (2020)
Data Quality Model for Machine Learning (2019)
Data Quality Considerations for Big Data and Machine Learning: Going Beyond Data Cleaning and Transformations (2017)
An analysis of data quality dimensions (2015)
Signalisation ferroviaire: principales fonctions [Railway signalling: main functions] (2015)
Formal Implementation of Data Validation for Railway Safety-Related Systems with OVADO (2013)
Small Airplane Considerations for the Guidelines for Development of Civil Aircraft and Systems (2013)
Data quality requirements analysis and modeling (2011)
Data sets and data quality in software engineering (2008)
Executing Data Quality Projects: Ten Steps to Quality Data and Trusted Information™ (2008)
Empirical Bernstein stopping (2008)
Quality Management Systems: Fundamentals and Vocabulary (Icelandic title: Gæðastjórnunarkerfi: grunnatriði og íðorðasafn) (2006)
Software Considerations in Airborne Systems and Equipment Certification (2001)
Data quality for the information age (1996)

CITED BY

From Reflection to Repair: A Scoping Review of Dataset Documentation Tools (2026)
LabelBuddy: An Open Source Music and Audio Language Annotation Tagging Tool Using AI Assistance (2026)
Genes, shells, and AI: using computer vision to detect cryptic morphological divergence between genetically distinct populations of limpets (2025)
An explainable artificial intelligence feature selection framework for transparent, trustworthy, and cost-efficient energy forecasting (2025)
An entropy-driven method for LLM dataset evaluation and optimization (2025)
AI Impermanence: Achilles’ Heel for AI Assessment? (2025)
Colorectal Polyp Segmentation with Different Dataset Combinations (2025)
Hybrid-CNN Intrusion Detection Framework for CAN Networks in Connected and Autonomous Vehicles (2025)
Explainable AI and Random Forest based reliable intrusion detection system (2025)
Continuous prediction of human knee joint angle using a sparrow search algorithm optimized random forest model based on sEMG signals (2025)
Bi-Modal Multiperspective Percussive (BiMP) Dataset for Visual and Audio Human Fall Detection (2025)
RADIANT: Reactive Autoencoder Defense for Industrial Adversarial Network Threats (2025)
PVD4RCV: A Photo-realistic Multi-Distortion Video Dataset for Benchmarking and Developing Robust Computer Vision Models (2025)
Standardness Clouds Meaning: A Position Regarding the Informed Usage of Standard Datasets (2024)
Alternate pathway for regional flood frequency analysis in data-sparse region (2024)
Certification of avionic software based on machine learning: the case for formal monotony analysis (2024)
Acapella-based music generation with sequential models utilizing discrete cosine transform (2024)
The Contribution of XAI for the Safe Development and Certification of AI: An Expert-Based Analysis (2024)
DQFed: A Federated Learning Strategy for Non-IID Data based on a Quality-Driven Perspective (2024)
A comprehensive analysis of the machine learning pose estimation models used in human movement and posture analyses: A narrative review (2024)
Analysis and Evaluation of Open Source Scientific Entity Datasets (2024)
A roadmap for improving data quality through standards for collaborative intelligence in human-robot applications (2024)
An Exploratory Analysis of Effect of Adversarial Machine Learning Attack on IoT-enabled Industrial Control Systems (2023)
HeartWave: A Multiclass Dataset of Heart Sounds for Cardiovascular Diseases Detection (2023)
Design and Implementation of Industrial Accident Detection Model Based on YOLOv4 (2023)
Advancements in SARS-CoV-2 Testing: Enhancing Accessibility through Machine Learning-Enhanced Biosensors (2023)
How to enhance hydrological predictions in hydrologically distinct watersheds of the Indian subcontinent? (2023)
Towards an Organically Growing Hate Speech Dataset in Hate Speech Detection Systems in a Smart Mobility Application (2023)
The FormAI Dataset: Generative AI in Software Security through the Lens of Formal Verification (2023)
Framework for Improving the Accuracy of the Machine Learning Model in Predicting Future Values (2023)
Smart Communication System for Human Life Safety (2023)
Modeling Data Requirements for Machine Learning Systems (2022)
A Data Modeling Method for Machine Learning Systems (2022)
Automated Detection and Classification of Returnable Packaging Based on YOLOV4 Algorithm (2022)
SPRSound: Open-Source SJTU Paediatric Respiratory Sound Database (2022)
Quantifying Dataset Quality in Radio Frequency Machine Learning (2021)
Dataset Definition Standard (DDS) (2021)
Certification of embedded systems based on Machine Learning: A survey (2021)
Prediction Models for Agonists and Antagonists of Molecular Initiation Events for Toxicity Pathways Using an Improved Deep-Learning-Based Quantitative Structure–Activity Relationship System (2021)
Data Quality and Network Considerations for Mobile Contact Tracing and Health Monitoring (2021)