English-Arabic Phonetic Dataset construction

Published 2024 in BIO Web of Conferences

ABSTRACT

In natural language processing, performance on semantic similarity tasks depends heavily on the availability of a large corpus. While numerous monolingual corpora exist, predominantly in English, multilingual resources remain scarce. In this study, we present a semi-automated framework for generating a multilingual English-Arabic phonetic corpus, tailored for multilingual phonetic and semantic similarity tasks. The proposed framework consists of four phases: data gathering, preprocessing and translation, IPA representation extraction, and manual correction. Four datasets were used, one of which was constructed from multiple sources. Manual correction was applied at every level of the system to produce a gold-standard dataset. Each entry in the final dataset has the form (English Word, English Phonetic, equivalent Arabic Word, Arabic Phonetic). In addition, a deep learning approach was used to extract International Phonetic Alphabet (IPA) representations. Results on 13,400 samples show a Phonetic Error Rate (PER) of 11.96% and an accuracy of 88.04%, which are good results for producing IPA representations of unknown English and Arabic names.
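The abstract reports PER and accuracy but does not define them. The sketch below shows one common way such a metric is computed, under two assumptions that are not confirmed by the source: PER is taken as the Levenshtein edit distance between predicted and reference phoneme sequences divided by the total number of reference phonemes, and accuracy is taken as 1 - PER (which is at least consistent with the reported 11.96% and 88.04% summing to 100%). The function names `edit_distance` and `phonetic_error_rate` and the toy phoneme sequences are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    # prev[j] holds the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phonetic_error_rate(pairs: List[Tuple[Sequence[str], Sequence[str]]]) -> float:
    """Assumed definition: total edits / total reference phonemes."""
    total_edits = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total_ref = sum(len(ref) for ref, _ in pairs)
    if total_ref == 0:
        raise ValueError("reference sequences contain no phonemes")
    return total_edits / total_ref


# Toy (reference, prediction) pairs; these are NOT data from the paper.
toy_pairs = [
    (["s", "a", "m", "i", "r"], ["s", "a", "m", "e", "r"]),  # one substitution
    (["l", "i", "n", "a"], ["l", "i", "n", "a"]),            # exact match
]
per = phonetic_error_rate(toy_pairs)
accuracy = 1.0 - per  # assumed relationship between accuracy and PER
print(f"PER = {per:.2%}, accuracy = {accuracy:.2%}")
```

Aggregating edits over the whole test set, rather than averaging per-word rates, keeps long names from being under-weighted; whether the authors aggregate this way is not stated in the abstract.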

PUBLICATION RECORD

Publication year
2024
Venue
BIO Web of Conferences
Identifiers
DOI 10.1051/bioconf/20249700057
