The Sensitivity of Mapping Methods to Reference Data Quality: Training Supervised Image Classifications with Imperfect Reference Data

G. Foody, M. Pal, D. Rocchini, Carol X. Garzon‐Lopez, L. Bastin

Published 2016 in ISPRS Int. J. Geo Inf.

ABSTRACT

The accuracy of a map is dependent on the reference dataset used in its construction. Classification analyses used in thematic mapping can, for example, be sensitive to a range of sampling and data quality concerns. With particular focus on the latter, the effects of reference data quality on land cover classifications from airborne thematic mapper data are explored. Variations in sampling intensity and effort are highlighted in a dataset that is widely used in mapping and modelling studies; these may need accounting for in analyses. The quality of the labelling in the reference dataset was also a key variable influencing mapping accuracy. Accuracy varied with the amount and nature of mislabelled training cases, with the nature of the effects varying between classifiers. The largest impacts on accuracy occurred when mislabelling involved confusion between similar classes. Accuracy was also typically negatively related to the magnitude of mislabelled cases. The support vector machine (SVM), which has been claimed to be relatively insensitive to training data error, was the most sensitive of the set of classifiers investigated, with overall classification accuracy declining by 8% (significant at the 95% level of confidence) with the use of a training set containing 20% mislabelled cases.

PUBLICATION RECORD

Publication year
2016
Venue
ISPRS Int. J. Geo Inf.
Publication date
2016-11-01
Fields of study
Geography, Computer Science, Environmental Science
Identifiers
DOI 10.3390/IJGI5110199
Source metadata
Semantic Scholar

REFERENCES

Computer Processing Of Remotely Sensed Images An Introduction
2016influential reference
Are species occurrence data in global online repositories fit for modeling species distributions? The case of the Global Biodiversity Information Facility (GBIF). Final Report of the Task Group on GBIF Data Fitness for Use in Distribution Modelling.
2016cited by this paper
Tree Species Abundance Predictions in a Tropical Agricultural Landscape with a Supervised Classification Model and Imbalanced Data
2016influential reference
Land use mapping error introduces strongly-localised, scale-dependent uncertainty into land use and ecosystem services modelling
2015cited by this paper
Valuing map validation: The need for rigorous land cover map accuracy assessment in economic valuations of ecosystem services
2015cited by this paper
The effect of mis-labeled training data on the accuracy of supervised image classification by SVM
2015cited by this paper
Impacts of Species Misidentification on Species Distribution Modeling with Presence-Only Data
2015cited by this paper
Integrating User Needs on Misclassification Error Sensitivity into Image Segmentation Quality Assessment
2015cited by this paper
A survey of image classification methods and techniques
2014cited by this paper
Automated Training Sample Extraction for Global Land Cover Mapping
2014cited by this paper
Ground reference data error and the mis-estimation of the area of land cover change as a function of its abundance
2013cited by this paper
Assessing the Accuracy of Volunteered Geographic Information arising from Multiple Contributors to an Internet Based Collaborative Project
2013cited by this paper
Fuzzy support vector machine based on within-class scatter for classification problems with outliers or noises
2013cited by this paper
Global characterization and monitoring of forest cover using Landsat data: opportunities and challenges
2012cited by this paper
Evaluation of SVM, RVM and SMLR for Accurate Image Classification With Limited Ground Data
2012influential reference
Robust Hyperspectral Classification Using Relevance Vector Machine
2011cited by this paper
The Elements of Statistical Learning: Data Mining, Inference, and Prediction
2010cited by this paper
Assessing the accuracy of land cover change with imperfect ground reference data
2010cited by this paper
Feature Selection for Classification of Hyperspectral Data by SVM
2010cited by this paper
Increasing the accuracy of neural network classification using refined training data
2009cited by this paper
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition
2009cited by this paper
Effect of errors in ground truth on classification accuracy
2009cited by this paper
Kernel methods for remote sensing data analysis
2009cited by this paper
A Novel Context-Sensitive Semisupervised SVM Classifier Robust to Mislabeled Training Samples
2009cited by this paper
Commentary: whither VGI?
2008cited by this paper
RVM‐based multi‐class classification of remotely sensed data
2008cited by this paper
Supervised Machine Learning: A Review of Classification Techniques
2007cited by this paper
Hyperspectral Image Classification Using Relevance Vector Machines
2007cited by this paper
The Global Biodiversity Information Facility (GBIF)
2007cited by this paper
Citizens as sensors: the world of volunteered geography
2007cited by this paper
Enrichment of High-Throughput Screening Data with Increasing Levels of Noise Using Support Vector Machines, Recursive Partitioning, and Laplacian-Modified Naive Bayesian Classifiers
2006cited by this paper
Sparse multinomial logistic regression: fast algorithms and generalization bounds
2005cited by this paper
Results and implications of a study of fifteen years of satellite image classification experiments
2005cited by this paper
Comparison of land cover maps using fuzzy agreement
2005cited by this paper
Support vector machines for classification in remote sensing
2005cited by this paper
Sources of error in accuracy assessment of thematic land-cover maps in the Brazilian Amazon
2004cited by this paper
Classification of hyperspectral remote sensing images with support vector machines
2004cited by this paper
Thematic map comparison: Evaluating the statistical significance of differences in classification accuracy
2004cited by this paper
Toward intelligent training of supervised image classifications: directing training data acquisition for SVM classification
2004cited by this paper
Assessing species misidentification rates through quality assurance of vegetation monitoring
2003cited by this paper
Support vector machines for hyperspectral image classification with spectral-based kernels
2003cited by this paper
An assessment of support vector machines for land cover classification
2002cited by this paper
Status of land cover classification accuracy assessment
2002cited by this paper
Sparse Bayesian Learning and the Relevance Vector Machine
2001influential reference
The Nature of Statistical Learning Theory
2000cited by this paper
The significance of border training patterns in classification by a feedforward neural network using back propagation learning
1999cited by this paper
An evaluation of some factors affecting the accuracy of classification by an artificial neural network
1997cited by this paper
Support-Vector Networks
1995cited by this paper
The Nature of Statistical Learning
1995influential reference
Components of accuracy of maps with special reference to discriminant analysis on remote sensor data
1995cited by this paper
Multispectral classification of Landsat-images using neural networks
1992cited by this paper
An automated land-use mapping comparison of the Bayesian maximum likelihood and linear discriminant analysis algorithms
1984cited by this paper
ISPRS Journal of Photogrammetry and Remote Sensing
year unknowncited by this paper

CITED BY

Mapping Nationwide Subfield Division Dynamics in Saudi Arabia Using Temporal Patterns of Sentinel-2 NDVI and Machine Learning
2025cites this paper
Comparative Analysis of Deep Learning and Traditional Methods for High-Resolution Cropland Extraction with Different Training Data Characteristics
2025cites this paper
Characterising the Thematic Content of Image Pixels with Topologically Structured Clustering
2025cites this paper
Standardised Drone Procedures for Phytosociological Data Collection
2025cites this paper
Integrating Drone Truthing and Functional Classification of Remote Sensing Time Series for Supervised Vegetation Mapping
2025cites this paper
Ecosystem services provided by green areas and their implications for human health in Brazil
2024cites this paper
Geochemistry of Terrestrial Plants in the Central African Copperbelt: Implications for Sediment Hosted Copper-Cobalt Exploration
2024cites this paper
Ground Truth in Classification Accuracy Assessment: Myth and Reality
2024cites this paper
Critical Assessment of Cocoa Classification with Limited Reference Data: A Study in Côte d'Ivoire and Ghana Using Sentinel-2 and Random Forest Model
2024cites this paper
Long-term land cover changes assessment in the Jiului Valley mining basin in Romania
2024cites this paper
‘Uncertainty audit’ for ecosystem accounting: Satellite-based ecosystem extent is biased without design-based area estimation and accuracy assessment
2024cites this paper
An assessment of training data for agricultural land cover classification: a case study of Bafra, Türkiye
2024cites this paper
The feasibility of using national‐scale datasets for classifying wetlands in Arizona with machine learning
2024cites this paper
Comparative Study of Different Classification Methods and Winner Takes All Approach
2024cites this paper
Is Your Training Data Really Ground Truth? A Quality Assessment of Manual Annotation for Individual Tree Crown Delineation
2024cites this paper
National wetland mapping using remote-sensing-derived environmental variables, archive field data, and artificial intelligence
2023cites this paper
Improving Spatial and Temporal Variation of Ammonia Emissions for the Netherlands Using Livestock Housing Information and a Sentinel-2-Derived Crop Map
2023cites this paper
A global land cover training dataset from 1984 to 2020
2023cites this paper
Challenges in the real world use of classification accuracy metrics: From recall and precision to the Matthews correlation coefficient
2023cites this paper
Assessment of herbaceous vegetation classification using orthophotos produced from the image acquired with unmanned aerial systems
2023cites this paper
Examining energy and nutrient production across the different agroecological zones in rural Ethiopia using statistical methods
2023cites this paper
The impact of selection of reference samples and DEM on the accuracy of land cover classification based on Sentinel-2 data
2023cites this paper
Evaluating the Effect of Training Data Size and Composition on the Accuracy of Smallholder Irrigated Agriculture Mapping in Mozambique Using Remote Sensing and Machine Learning Algorithms
2023cites this paper
Tree Species Diversity Mapping - Success Stories and Possible Ways Forward
2023cites this paper
Improving Crop Mapping by Using Bidirectional Reflectance Distribution Function (BRDF) Signatures with Google Earth Engine
2023cites this paper
Advancing High-Resolution Land Cover Mapping in Colombia: The Importance of a Locally Appropriate Legend
2023cites this paper
Woody Plant Encroachment in a Seasonal Tropical Savanna: Lessons about Classifiers and Accuracy from UAV Images
2023cites this paper
Multi‐Source Mapping of Peatland Types Using Sentinel‐1, Sentinel‐2, and Terrain Derivatives—A Comparison Between Five High‐Latitude Landscapes
2023cites this paper
Comparison between Parametric and Non-Parametric Supervised Land Cover Classifications of Sentinel-2 MSI and Landsat-8 OLI Data
2023cites this paper
The Spectral Species Concept in Living Color
2022cites this paper
An Automatic Procedure for Forest Fire Fuel Mapping Using Hyperspectral (PRISMA) Imagery: A Semi-Supervised Classification Approach
2022cites this paper
Land cover classification in an era of big and open data: Optimizing localized implementation and training data selection to improve mapping outcomes
2022cites this paper
Double down on remote sensing for biodiversity estimation: a biological mindset
2022cites this paper
Unbiased Area Estimation Using Copernicus High Resolution Layers and Reference Data
2022cites this paper
CALC-2020: a new baseline land cover map at 10 m resolution for the circumpolar Arctic
2022cites this paper
RID - Roof Information Dataset for Computer Vision-Based Photovoltaic Potential Assessment
2022cites this paper
Factors shaping economics of land use change in Gilgit Baltistan, Pakistan
2021cites this paper
The Role of Earth Observation in Achieving Sustainable Agricultural Production in Arid and Semi-Arid Regions of the World
2021cites this paper
Performance Improvement of Encoder/Decoder-Based CNN Architectures for Change Detection from Very High-Resolution Satellite Imagery
2021cites this paper
Exploring Google Street View with deep learning for crop type mapping
2021cites this paper
Assessing the Effect of Training Sampling Design on the Performance of Machine Learning Classifiers for Land Cover Mapping Using Multi-Temporal Remote Sensing Data and Google Earth Engine
2021cites this paper
Data-Driven Signal–Noise Classification for Microseismic Data Using Machine Learning
2021cites this paper
Mask R-CNN and OBIA Fusion Improves the Segmentation of Scattered Vegetation in Very High-Resolution Optical Sensors
2021cites this paper
Deep Learning for Land Cover Change Detection
2020cites this paper
Mapping the functional dimension of vegetation series in the Mediterranean region using multitemporal MODIS data
2020cites this paper
Optimal Soybean (Glycine max L.) Land Suitability Using GIS-Based Multicriteria Analysis and Sentinel-2 Multitemporal Images
2020cites this paper
Assessing the Repeatability of Automated Seafloor Classification Algorithms, with Application in Marine Protected Area Monitoring
2020cites this paper
Assessment of volunteered geographic information for vegetation mapping
2020cites this paper
Automatic classification of fine-scale mountain vegetation based on mountain altitudinal belt
2020cites this paper
A new framework to map fine resolution cropping intensity across the globe: Algorithm, validation, and implication
2020cites this paper
Development and Applications of Machine Learning Methods for Hyperspectral Data
2020cites this paper
Probabilistic Mapping and Spatial Pattern Analysis of Grazing Lawns in Southern African Savannahs Using WorldView-3 Imagery and Machine Learning Techniques
2020cites this paper
Accounting for training data error in machine learning applied to Earth observations
2020influential citation
The t-SNE Algorithm as a Tool to Improve the Quality of Reference Data Used in Accurate Mapping of Heterogeneous Non-Forest Vegetation
2019cites this paper
Evaluation of Sampling and Cross-Validation Tuning Strategies for Regional-Scale Machine Learning Classification
2019cites this paper
Mapping spatial accuracy of the forest type classification in JAXA’s high-resolution land use and land cover map
2019cites this paper
MAPPING SPATIAL ACCURACY OF FOREST TYPE CLASSIFICATION IN JAXA’s HIGH-RESOLUTION LAND USE AND LAND COVER MAP
2019cites this paper
Google street view and deep learning: a new ground truthing approach for crop mapping
2019cites this paper
Trends in Remote Sensing Accuracy Assessment Approaches in the Context of Natural Resources
2019cites this paper
Using Multi-Sensor Satellite Images and Auxiliary Data in Updating and Assessing the Accuracies of Urban Land Products in Different Landscape Patterns
2019cites this paper
The Truth About Ground Truth: Label Noise in Human-Generated Reference Data
2019cites this paper
Key issues in rigorous accuracy assessment of land cover products
2019cites this paper
Advanced Techniques for Unsupervised Classification of Remote Sensing Hyperspectral Images
2019cites this paper
An efficient approach to capture continuous impervious surface dynamics using spatial-temporal rules and dense Landsat time series stacks
2019cites this paper
Accounting for Training Data Error in Machine Learning Applied to Earth Observations
2019cites this paper
Implementation of machine-learning classification in remote sensing: an applied review
2018cites this paper
Land-cover change in the Wulagai grassland, Inner Mongolia of China between 1986 and 2014 analysed using multi-temporal Landsat images
2018cites this paper
Utilizing publicly available satellite data for urban research: Mapping built-up land cover and land use in Ho Chi Minh City, Vietnam
2018cites this paper
Using volunteered geographic information (VGI) in design-based statistical inference for area estimation and accuracy assessment of land cover
2018cites this paper
High-resolution multi-temporal mapping of global urban land using Landsat images based on the Google Earth Engine Platform
2018cites this paper
Sensitivity of the subspace method for land cover classification
2018cites this paper
An Explorative Study on Estimating Local Accuracies in Land-Cover Information Using Logistic Regression and Class-Heterogeneity-Stratified Data
2018cites this paper
Mapping the Individual Trees in Urban Orchards by Incorporating Volunteered Geographic Information and Very High Resolution Optical Remotely Sensed Data: A Template Matching-Based Approach
2018cites this paper
Comparing the classification performances of supervised classifiers with balanced and imbalanced SAR data sets
2018cites this paper
Building up user confidence for the spaceborne derived global and continental land cover products for the Mediterranean region: the case of Thessaly
2017influential citation
Applied One-Class Classification of Remote Sensing Data
2017cites this paper
An Open-Source Semi-Automated Processing Chain for Urban Object-Based Classification
2017cites this paper
Impacts of sample design for validation data on the accuracy of feedforward neural network classification
2017cites this paper
Land Cover Information Extraction Based on Daily NDVI Time Series and Multiclassifier Combination
2017cites this paper
Filtering mislabeled data for improving time series classification
2017cites this paper
A semi-supervised Genetic Programming method for dealing with noisy labels and hidden overfitting
2017cites this paper
Validation and Inter-Comparison of Spaceborne Derived Global and Continental Land Cover Products for the Mediterranean Region: The Case of Thessaly
2017cites this paper
Cartographie de l’occupation des sols à partir de séries temporelles d’images satellitaires à hautes résolutions Identification et traitement des données mal étiquetées
2017cites this paper
The impact of training data characteristics on ensemble classification of land cover
2017cites this paper
Exploring diversity in ensemble classification: Applications in large area land cover mapping
2017cites this paper
Predicting biodiverse semi-natural grasslands through satellite imagery and machine learning
year unknowncites this paper
Analysis of Machine Learning Classifiers for LULC Classification on Google Earth Engine
year unknowncites this paper
Accounting for training data error in machine learning applied to earth observations
year unknowninfluential citation