EUDetector

Leveraging language model to identify EU-related news

Authors
Koustav Rudra, Danny Tran, Miroslav Shaltev
Abstract

News media reflect the present state of a country or region to their audiences. Media outlets of a region publish different kinds of news for their local and global audiences. In this paper, we focus on Europe (more precisely, the EU) and propose a method to identify news that has an impact on Europe in any respect, such as finance, business, crime, or politics. Predicting the location of a news item is itself a challenging task. Most approaches restrict themselves to named entities or handcrafted features. We try to overcome that limitation: instead of focusing only on named entities (European locations, politicians, etc.) and handcrafted rules, we also explore the context of news articles with the help of the pre-trained language model BERT. The BERT-based European news detector shows an improvement of about 9-19% in F-score over baseline models. Interestingly, we observe that such models automatically capture named entities, their origin, etc.; hence, no separate information is required. We also evaluate the role of such entities in the prediction and explore the tokens that BERT actually looks at when deciding the news category. Entities such as persons, locations, and organizations turn out to be good rationale tokens for the prediction.

Organisational unit(s)
Forschungszentrum L3S
External organisation(s)
Leibniz-Zentrum für Marine Tropenökologie GmbH
Type
Paper in conference proceedings
Pages
380-384
Number of pages
5
Publication date
3 June 2021
Publication status
Published
Peer-reviewed
Yes
ASJC Scopus subject areas
Computer Networks and Communications, Software
Sustainable Development Goals
SDG 16 – Peace, Justice and Strong Institutions
Electronic version(s)
https://zenodo.org/record/6705970 (Access: Open)
https://doi.org/10.1145/3442442.3452324 (Access: Closed)