Restoring Arabic syntactic diacritics with Long Short-Term Memory (LSTM) networks yields state-of-the-art performance. These LSTM taggers are commonly augmented with Maximum Entropy (MaxEnt) sparse direct connections between the input and output layers. One way to improve tagger performance is to use an ensemble of taggers, but an ensemble may require substantial computational and memory resources. In this paper, we apply a knowledge distillation technique in which an ensemble of teacher taggers is used to train a single student tagger. Separately, Arabic is a morphologically rich language with a high Out-Of-Vocabulary (OOV) rate. To address this, we propose augmenting word embeddings with character embeddings encoded by an LSTM for each word. On the Arabic Treebank task, our hybrid LSTM/MaxEnt tagger using these two techniques achieves a 1.0% absolute WER improvement over a strong baseline.
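The abstract does not spell out either technique, but both follow standard recipes. As a rough illustration only, a minimal PyTorch sketch of the two ideas might look as follows; all class names, dimensions, and the temperature/interpolation parameters are assumptions for illustration, and this is not the paper's actual hybrid LSTM/MaxEnt tagger.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CharWordEncoder(nn.Module):
    """Encode each word from its characters with an LSTM, then concatenate
    the result with the word embedding (a common way to mitigate OOV words)."""
    def __init__(self, n_chars, n_words, char_dim=32, word_dim=128, char_hidden=64):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.word_emb = nn.Embedding(n_words, word_dim)
        self.char_lstm = nn.LSTM(char_dim, char_hidden, batch_first=True)

    def forward(self, word_ids, char_ids):
        # char_ids: (batch, word_len) character indices, one word per row
        _, (h_n, _) = self.char_lstm(self.char_emb(char_ids))
        char_vec = h_n[-1]  # final LSTM hidden state summarizes the word's characters
        return torch.cat([self.word_emb(word_ids), char_vec], dim=-1)

def distillation_loss(student_logits, teacher_logits_list, gold, T=2.0, alpha=0.5):
    """Knowledge distillation: train the student on the averaged soft targets
    of a teacher ensemble, mixed with the usual hard-label cross-entropy.
    T and alpha are illustrative hyperparameters, not values from the paper."""
    teacher_probs = torch.stack(
        [F.softmax(t / T, dim=-1) for t in teacher_logits_list]).mean(0)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    teacher_probs, reduction="batchmean") * T * T
    hard = F.cross_entropy(student_logits, gold)
    return alpha * soft + (1 - alpha) * hard
```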
Chairs: Eric Fosler-Lussier