Describing Video With Attention-Based Bidirectional LSTM

Yi Bin, Yang Yang, Fumin Shen, Ning Xie, Heng Tao Shen, Xuelong Li

Published 2019 in IEEE Transactions on Cybernetics

ABSTRACT

Video captioning has been attracting broad research attention in the multimedia community. However, most existing approaches rely heavily on static visual information or capture only local temporal knowledge (e.g., within 16 frames), and thus can hardly describe motions accurately from a global view. In this paper, we propose a novel video captioning framework that integrates bidirectional long short-term memory (BiLSTM) and a soft attention mechanism to generate better global representations for videos and to enhance the recognition of lasting motions in videos. To generate video captions, we exploit another long short-term memory (LSTM) network as a decoder to fully explore global contextual information. The benefits of our proposed method are twofold: 1) the BiLSTM structure comprehensively preserves global temporal and visual information and 2) the soft attention mechanism enables the language decoder to recognize and focus on principal targets in the complex content. We verify the effectiveness of our proposed video captioning framework on two widely used benchmarks, the Microsoft Video Description Corpus (MSVD) and MSR-Video to Text (MSR-VTT), and the experimental results demonstrate the superiority of the proposed approach over several state-of-the-art methods.
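
The soft attention step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes dot-product scoring (the paper's own scoring function is not given in this record), assumes the per-frame encoder states and the decoder state have the same dimension, and uses the hypothetical names `soft_attention`, `frame_states`, and `decoder_state`.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(frame_states, decoder_state):
    """Return (weights, context) for one decoding step.

    frame_states: list of T vectors, standing in for per-frame encoder
        outputs (e.g. concatenated forward and backward BiLSTM states).
    decoder_state: vector standing in for the decoder's current hidden state.
    Scoring is a plain dot product, an assumption made for brevity.
    """
    # One relevance score per frame.
    scores = [
        sum(h_i * s_i for h_i, s_i in zip(h, decoder_state))
        for h in frame_states
    ]
    # Normalize the scores into attention weights that sum to 1.
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the frame states.
    dim = len(frame_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, frame_states))
        for d in range(dim)
    ]
    return weights, context


# Toy example: three "frames" with 2-dimensional states.
frame_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.0, 2.0]
weights, context = soft_attention(frame_states, decoder_state)
```

In this toy input the second and third frames align equally well with the decoder state, so they receive equal weight and dominate the context vector. A decoder would recompute the weights at every word it generates.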

PUBLICATION RECORD

Publication year: 2019
Venue: IEEE Transactions on Cybernetics
Publication date: 2019-07-01
Fields of study: Medicine, Computer Science
Identifiers: DOI 10.1109/TCYB.2018.2831447; PMID 29993730
Source metadata: Semantic Scholar, PubMed

REFERENCES

Action Recognition in Video Sequences using Deep Bi-Directional LSTM With CNN Features
2018, cited by this paper
Robust discrete code modeling for supervised hashing
2018, cited by this paper
Unsupervised Deep Hashing with Similarity-Adaptive and Discrete Optimization
2018, cited by this paper
Hashing with Angular Reconstructive Embeddings
2018, cited by this paper
Recurrent attention network using spatial-temporal relations for action recognition
2018, cited by this paper
Discrete Nonnegative Spectral Clustering
2017, cited by this paper
Robust Web Image Annotation via Exploring Multi-Facet and Structural Knowledge
2017, cited by this paper
Adversarial Cross-Modal Retrieval
2017, cited by this paper
Perceptually Guided Photo Retargeting
2017, cited by this paper
Asymmetric Binary Coding for Image Search
2017, cited by this paper
Video Captioning With Attention-Based LSTM and Semantic Consistency
2017, cited by this paper
Coherent Semantic-Visual Indexing for Large-Scale Image Retrieval in the Cloud
2017, cited by this paper
Video-Based Pedestrian Re-Identification by Adaptive Spatio-Temporal Appearance Model
2017, cited by this paper
Robust Joint Graph Sparse Coding for Unsupervised Spectral Feature Selection
2017, cited by this paper
Adaptively Attending to Visual Attributes and Linguistic Knowledge for Captioning
2017, cited by this paper
Block-Row Sparse Multiview Multilabel Learning for Image Classification
2016, cited by this paper
Optimized Graph Learning Using Partial Tags and Multiple Features for Image and Video Annotation
2016, cited by this paper
Frame- and Segment-Level Features and Candidate Pool Evaluation for Video Caption Generation
2016, cited by this paper
MSR-VTT: A Large Video Description Dataset for Bridging Video and Language
2016, cited by this paper
Theano: A Python framework for fast computation of mathematical expressions
2016, cited by this paper
Jointly Modeling Deep Video and Compositional Text to Bridge Vision and Language in a Unified Framework
2015, cited by this paper
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
2015, cited by this paper
Deep Residual Learning for Image Recognition
2015, cited by this paper
Training Very Deep Networks
2015, cited by this paper
Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning
2015, influential reference
Sequence to Sequence -- Video to Text
2015, influential reference
Multitask Spectral Clustering by Exploring Intertask Correlation
2015, cited by this paper
Describing Videos by Exploiting Temporal Structure
2015, influential reference
Robust Discrete Spectral Hashing for Large-Scale Image Semantic Indexing
2015, cited by this paper
A Multi-scale Multiple Instance Video Description Network
2015, cited by this paper
Visual-Patch-Attention-Aware Saliency Detection
2015, cited by this paper
Jointly Modeling Embedding and Translation to Bridge Video and Language
2015, influential reference
DenseCap: Fully Convolutional Localization Networks for Dense Captioning
2015, cited by this paper
Neural Machine Translation by Jointly Learning to Align and Translate
2014, cited by this paper
CIDEr: Consensus-based image description evaluation
2014, cited by this paper
Deep visual-semantic alignments for generating image descriptions
2014, cited by this paper
Microsoft COCO: Common Objects in Context
2014, cited by this paper
Very Deep Convolutional Networks for Large-Scale Image Recognition
2014, cited by this paper
Sequence to Sequence Learning with Neural Networks
2014, influential reference
Long-term recurrent convolutional networks for visual recognition and description
2014, cited by this paper
Show and tell: A neural image caption generator
2014, cited by this paper
Integrating Language and Vision to Generate Natural Language Descriptions of Videos in the Wild
2014, cited by this paper
Meteor Universal: Language Specific Translation Evaluation for Any Target Language
2014, cited by this paper
Exploiting Web Images for Semantic Video Indexing Via Robust Sample-Specific Loss
2014, cited by this paper
Translating Videos to Natural Language Using Deep Recurrent Neural Networks
2014, influential reference
YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-Shot Recognition
2013, cited by this paper
Joint Attention by Gaze Interpolation and Saliency
2013, cited by this paper
Generating Natural-Language Video Descriptions Using Text-Mined Knowledge
2013, influential reference
ADADELTA: An Adaptive Learning Rate Method
2012, cited by this paper
ImageNet classification with deep convolutional neural networks
2012, influential reference
Collecting Highly Parallel Data for Paraphrase Evaluation
2011, cited by this paper
ROUGE: A Package for Automatic Evaluation of Summaries
2004, cited by this paper
Bleu: a Method for Automatic Evaluation of Machine Translation
2002, cited by this paper
Bidirectional recurrent neural networks
1997, cited by this paper
Long Short-Term Memory
1997, cited by this paper
Induction of Multiscale Temporal Structure
1991, cited by this paper

CITED BY

Group-Aware Personalized Stress Detection Based on Surveillance Videos
2026, cites this paper
TCCCL: Transformer-based cross-modal contextual correlation learning networks for web video event mining
2026, cites this paper
A Comparison of Interdependent Deep Learning Models and Exponential Smoothing Method for Predicting Bitcoin Price
2025, cites this paper
3D long time spatiotemporal convolution for complex transfer sequence prediction
2025, cites this paper
Attention-based Bi-LSTM for Continuous Authentication Using Touch Dynamics
2025, cites this paper
Study of three-dimensional distribution of chloride in coral aggregate concrete: A CNN-BiGRU-attention data-intelligence model driven by beluga whale optimization algorithm
2025, cites this paper
NeuroFed-ResBiLSTM: A Federated, Neuromorphic-aware ResNet-BiLSTM Hybrid for Occlusion-Robust Edge Vehicle Detection
2025, cites this paper
Video saliency prediction via single feature enhancement and temporal recurrence
2025, cites this paper
Localized Weather Prediction Using Kolmogorov-Arnold Network-Based Models and Deep RNNs
2025, cites this paper
Modeling and Visualization of Public Opinion Sentiment Evolution after Sichuan Luding MS6.8 Earthquake Based on LSTM Neural Networks
2025, cites this paper
Application of cross-modal contrastive learning for semantic consistency optimization in video captioning
2025, cites this paper
Long Short-Term Memory Networks: A Comprehensive Survey
2025, cites this paper
Deep Learning Techniques For Option Price Prediction: A Comparative Analysis
2025, cites this paper
A UAV flight time prediction model for substation noise detection based on attention-enhanced LSTM
2025, cites this paper
APPBoost: an adaptive parameter pair boosting algorithm for enhanced robustness against noise and imbalance
2025, cites this paper
Prediction of non-uniform reactions in PEMFC based on the multi-physics quantity fusion graph auto-encoder network
2025, cites this paper
Leveraging multi-agent framework for root cause analysis
2025, cites this paper
An adaptive weighted boosting framework for enhanced cardiovascular disease diagnosis
2025, cites this paper
A comprehensive review of AI-Based detection of Arrhythmia using Electrocardiogram (ECG)
2025, cites this paper
A Dual-Pathway Driver Emotion Classification Network Using Multitask Learning Strategy: A Joint Verification
2025, influential citation
SATrans-Net: Sparse Attention Transformer for EEG-based motor imagery decoding
2025, cites this paper
Image captioning for life ecological experiment of China’s space station
2025, cites this paper
Instance-Dictionary Learning for Open-World Object Detection in Autonomous Driving Scenarios
2024, cites this paper
DAPNet: A Dual-Attention Parallel Network for the Prediction of Ship Fuel Consumption Based on Multi-Source Data
2024, cites this paper
Hybrid Crow Search Algorithm–LSTM System for Enhanced Stock Price Forecasting
2024, cites this paper
An Analysis of Bitcoin Price Prediction Using Parametric Time-Series Forecasting Models
2024, cites this paper
DECNet: A Non-Contacting Dual-Modality Emotion Classification Network for Driver Health Monitoring
2024, cites this paper
Human gait recognition using joint spatiotemporal modulation in deep convolutional neural networks
2024, cites this paper
CVLP-NaVD: Contrastive Visual-language Pre-training Models for Non-annotated Visual Description
2024, cites this paper
Discourse Element Identification Integrated with Attention
2024, cites this paper
Structurally Tuned LSTM Networks to Nowcast Photovoltaic Power Production
2024, cites this paper
Efficient prediction framework for large-scale nonlinear petrochemical process based on feature selection and temporal-attention LSTM: Applied to fluid catalytic cracking
2024, cites this paper
Unsupervised quantitative structural damage identification method based on BiLSTM networks and probability distribution model
2024, cites this paper
Forecasting corn NDVI through AI-based approaches using sentinel 2 image time series
2024, cites this paper
An MSDCNN-LSTM framework for video frame deletion forensics
2024, cites this paper
A short-term wind power forecasting method based on multivariate signal decomposition and variable selection
2024, cites this paper
RNSC: A hierarchical deep learning model for net promoter scoring understanding by combining review and note through semantic consistency
2024, cites this paper
Quality of Experience Optimization for Real-Time XR Video Transmission With Energy Constraints
2024, cites this paper
Spatio-temporal feature interpretable model for air quality forecasting
2024, cites this paper
Deep learning approaches to predict sea surface height above geoid in Pekalongan
2024, cites this paper
Time-Aware Knowledge Representations of Dynamic Objects with Multidimensional Persistence
2024, cites this paper
Speech synthesis from three-axis accelerometer signals using conformer-based deep neural network
2024, cites this paper
High precision temperature measurement for cryogenic temperature sensors based on deep learning technology
2024, cites this paper
Estimating reference crop evapotranspiration using improved convolutional bidirectional long short-term memory network by multi-head attention mechanism in the four climatic zones of China
2024, cites this paper
Video Annotation & Descriptions using Machine Learning & Deep learning: Critical Survey of methods
2023, cites this paper
Performance Analysis of Deep Learning Techniques for Time Series Forecasting
2023, cites this paper
Application of a Short Video Caption Generation Algorithm in International Chinese Education and Teaching
2023, cites this paper
Focusing on Relevant Responses for Multi-Modal Rumor Detection
2023, cites this paper
Savitar: an intelligent sign language translation approach for deafness and dysphonia in the COVID-19 era
2023, cites this paper
A novel blind action quality assessment based on multi-headed GRU network and attention mechanism
2023, cites this paper
A Hybrid Deep Learning Method Based on CEEMDAN and Attention Mechanism for Network Traffic Prediction
2023, cites this paper
Panel-Page-Aware Comic Genre Understanding
2023, cites this paper
Refinement of ensemble strategy for acute lymphoblastic leukemia microscopic images using hybrid CNN-GRU-BiLSTM and MSVM classifier
2023, influential citation
A hybrid forecasting model based on deep learning feature extraction and statistical arbitrage methods for stock trading strategies
2023, cites this paper
Deep sequential collaborative cognition of vision and language based model for video description
2023, cites this paper
A hybrid CNN-LSTM machine learning model for rock mechanical parameters evaluation
2023, cites this paper
Multi-sentence video captioning using spatial saliency of video frames and content-oriented beam search algorithm
2023, cites this paper
Novel three-axis accelerometer-based silent speech interface using deep neural network
2023, cites this paper
Oil Logging Reservoir Recognition Based on TCN and SA-BiLSTM Deep Learning Method
2023, cites this paper
What captures attention in the risk communication process: Exploring streaming video attractiveness during the first wave of the COVID-19 pandemic in China
2023, cites this paper
Multi-Modal Transformer With Global-Local Alignment for Composed Query Image Retrieval
2023, cites this paper
Oil well production prediction based on CNN-LSTM model with self-attention mechanism
2023, cites this paper
Robust recurrent neural networks for time series forecasting
2023, cites this paper
Predicting in-hospital outcomes of patients with acute kidney injury
2023, cites this paper
SOVC: Subject-Oriented Video Captioning
2023, cites this paper
Neural Networks for Aircraft Trajectory Prediction: Answering Open Questions About Their Performance
2023, cites this paper
A Novel Penetration State Recognition Method Based on LSTM With Auditory Attention During Pulsed GTAW
2023, cites this paper
Recurrent Interaction Network for Stereoscopic Image Super-Resolution
2023, cites this paper
Towards better transition modeling in recurrent neural networks: The case of sign language tokenization
2023, cites this paper
A novel Gaussian process regression-based stock index interval forecasting model integrating optimal variables screening with bidirectional long short-term memory
2023, cites this paper
Mental Stress Detection using EEG and Recurrent Deep Learning
2023, influential citation
Joint multi-scale information and long-range dependence for video captioning
2023, cites this paper
Assessing fruit hardness in robot hands using electric gripper actuators with tactile sensors
2023, cites this paper
Video Captioning: a comparative review of where we are and which could be the route
2022, cites this paper
Automatic Multichannel Electrocardiogram Record Classification Using XGBoost Fusion Model
2022, cites this paper
V2T: video to text framework using a novel automatic shot boundary detection algorithm
2022, cites this paper
Risk Prediction for Internet Financial Enterprises by Deep Learning Algorithm and Sustainable Development of Business Transformation
2022, cites this paper
Privacy-preserving household load forecasting based on non-intrusive load monitoring: A federated deep learning approach
2022, cites this paper
Semantic context driven language descriptions of videos using deep neural network
2022, influential citation
Deteksi Sarkasme Pada Judul Berita Berbahasa Inggris Menggunakan Algoritme Bidirectional LSTM
2022, cites this paper
Covid-19 Pandemic Predictive System Using Machine Learning
2022, cites this paper
LVE-S2D: Low-Light Video Enhancement From Static to Dynamic
2022, cites this paper
Screening and functional prediction of differentially expressed genes in walnut endocarp during hardening period based on deep neural network under agricultural internet of things
2022, cites this paper
Sequential Memory Modelling for Video Captioning
2022, cites this paper
An attention-based hybrid deep learning approach for bengali video captioning
2022, cites this paper
Efficient Video Summarization for Smart Surveillance Systems
2022, cites this paper
Rethinking Open-World Object Detection in Autonomous Driving Scenarios
2022, cites this paper
A novel Multi-Layer Attention Framework for visual description prediction using bidirectional LSTM
2022, influential citation
Inferential Visual Question Generation
2022, cites this paper
DISNet: A sequential learning framework to handle occlusion in human action recognition with video acquisition sensors
2022, cites this paper
GLCM: Global–Local Captioning Model for Remote Sensing Image Captioning
2022, cites this paper
MS²-GNN: Exploring GNN-Based Multimodal Fusion Network for Depression Detection
2022, cites this paper
A CTR prediction model based on session interest
2022, cites this paper
Deep learning approaches based improved light weight U-Net with attention module for optic disc segmentation
2022, cites this paper
QSAN: A Near-Term Achievable Quantum Self-Attention Network
2022, cites this paper
Utility-Based Route Choice Behavior Modeling Using Deep Sequential Models
2022, cites this paper
Joint User and Data Detection in Grant-Free NOMA With Attention-Based BiLSTM Network
2022, cites this paper
Video Captioning based on Augmented Semantic Alignment
2022, cites this paper
Visual and language semantic hybrid enhancement and complementary for video description
2022, cites this paper
Video Monitoring Queries
2022, cites this paper