Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection
Emre Çakir, Giambattista Parascandolo, Toni Heittola, H. Huttunen, Tuomas Virtanen
Published 2017 in IEEE/ACM Transactions on Audio, Speech, and Language Processing
ABSTRACT
Sound events often occur in unstructured environments, where they exhibit wide variations in frequency content and temporal structure. Convolutional neural networks (CNNs) can extract higher-level features that are invariant to local spectral and temporal variations, while recurrent neural networks (RNNs) are powerful in learning the longer-term temporal context of audio signals. As classifiers, CNNs and RNNs have recently shown improved performance over established methods in various sound recognition tasks. We combine these two approaches in a convolutional recurrent neural network (CRNN) and apply it to a polyphonic sound event detection task. We compare the proposed CRNN with CNN, RNN, and other established methods, and observe a considerable improvement on four different datasets of everyday sound events.
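The CRNN pipeline the abstract describes — convolutional feature extraction over the spectrogram, a recurrence over time frames, and per-frame multi-label outputs for polyphonic (overlapping) events — can be illustrated with a minimal NumPy sketch. This is not the paper's actual architecture (which stacks multiple convolutional layers with frequency max-pooling and gated recurrent units, trained on log-mel features); the single filter, plain tanh recurrence, and all shapes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_valid(x, w):
    """Naive 2-D valid cross-correlation of a single-channel
    spectrogram x (time, freq) with one kernel w."""
    T, F = x.shape
    kt, kf = w.shape
    out = np.empty((T - kt + 1, F - kf + 1))
    for t in range(out.shape[0]):
        for f in range(out.shape[1]):
            out[t, f] = np.sum(x[t:t + kt, f:f + kf] * w)
    return out

def crnn_forward(spec, conv_w, rnn_wx, rnn_wh, out_w):
    # 1) CNN stage: local time-frequency features, ReLU,
    #    then max-pooling over the frequency axis (factor 2).
    h = np.maximum(conv2d_valid(spec, conv_w), 0.0)
    h = h.reshape(h.shape[0], -1, 2).max(axis=2)
    # 2) RNN stage: plain tanh recurrence over time frames
    #    (the paper uses GRUs; tanh keeps the sketch short).
    states = []
    s = np.zeros(rnn_wh.shape[0])
    for t in range(h.shape[0]):
        s = np.tanh(h[t] @ rnn_wx + s @ rnn_wh)
        states.append(s)
    states = np.stack(states)
    # 3) Output stage: per-frame sigmoids, one per event class,
    #    so several events can be active at once (polyphony).
    return 1.0 / (1.0 + np.exp(-(states @ out_w)))

# Hypothetical sizes: 100 frames x 40 frequency bins, 5x5 kernel,
# 16 recurrent units, 6 event classes.
spec = rng.standard_normal((100, 40))
conv_w = rng.standard_normal((5, 5)) * 0.1
rnn_wx = rng.standard_normal((18, 16)) * 0.1   # 18 = (40-5+1)//2
rnn_wh = rng.standard_normal((16, 16)) * 0.1
out_w = rng.standard_normal((16, 6)) * 0.1
probs = crnn_forward(spec, conv_w, rnn_wx, rnn_wh, out_w)
```

Thresholding `probs` (e.g. at 0.5) per frame and class would yield the binary event-activity matrix that sound event detection systems are evaluated on.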
PUBLICATION RECORD
- Publication date: 2017-02-21
- Venue: IEEE/ACM Transactions on Audio, Speech, and Language Processing
- Fields of study: Computer Science