JADT 2016 - Sciencesconf.org

Journées internationales d'Analyse statistique des Données Textuelles

7-10 June 2016, Nice (France)

sciencesconf.org:jadt2016:85807

Big data and textual analysis: a corpus selection from Twitter. Rome between the fear of terrorism and the Jubilee

Francesca Della Ratta 1, Maria Elena Pontecorvo 1, Antonino Virgillito 1, *, Carlo Vaccari 1, *

1 : Istituto Nazionale di Statistica (ISTAT)

Istat - Istituto nazionale di statistica, Via Cesare Balbo 16, 00184 Roma, Italy

* : Corresponding author

The exponential growth of web technologies makes Big Data a field of great interest for textual analysis. Among social media, Twitter is best suited to the analysis of ideas and contents because of its "openness" and "horizontality". However, extracting a textual corpus from Twitter is not a straightforward task. The Big Data Sandbox project, promoted as part of the High Level Group at UNECE, aims to test the feasibility of using Big Data in official statistics. The project, which started in 2014 and involves about twenty national and international statistical organizations, focused in 2015 on the analysis of four different Big Data sources. In particular, one group focused on the collection of geo-located tweets. The public interface provided by Twitter is used to extract tweets generated within defined geographic coordinates. Within this project, all tweets generated in the territory of Rome from November 2015 onwards are stored, in order to monitor activities related to the Jubilee. The dramatic events of November 13 in Paris quickly attracted the attention of users in Rome: in the context of the global threat of terrorism, the attack on a European city deeply affected the imagination of Twitter users, also in view of the forthcoming Jubilee, which increased the worldwide media exposure of the city. This suggested the opportunity to investigate the connections between the Jubilee and terrorism, to understand whether, among Twitter users, the global threat of terrorism could affect the way the Jubilee is narrated.
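As an illustration of selecting tweets "within defined geographic coordinates", the following minimal Python sketch filters already-collected records against a bounding box. It does not call any Twitter interface. The record layout (a dict with a `coordinates` pair in longitude, latitude order), the names `BoundingBox`, `ROME_BOX`, and `tweets_in_box`, the box limits, and the sample records are all assumptions made for the example, not details taken from the project.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class BoundingBox:
    """Rectangular area in decimal degrees (hypothetical helper)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, lon: float, lat: float) -> bool:
        # A point is inside when both coordinates fall within the limits.
        return (self.min_lon <= lon <= self.max_lon
                and self.min_lat <= lat <= self.max_lat)


# Illustrative box roughly around Rome; NOT the coordinates used by the project.
ROME_BOX = BoundingBox(min_lon=12.2, min_lat=41.6, max_lon=12.9, max_lat=42.1)


def tweets_in_box(tweets: Iterable[dict], box: BoundingBox) -> Iterator[dict]:
    """Yield only the records whose coordinates fall inside the box."""
    for tweet in tweets:
        coords = tweet.get("coordinates")
        if coords is None:
            # Records without a geotag cannot be placed, so they are skipped.
            continue
        lon, lat = coords
        if box.contains(lon, lat):
            yield tweet


# Made-up sample records, used only to exercise the filter.
sample = [
    {"text": "Giubileo a San Pietro", "coordinates": (12.4534, 41.9029)},
    {"text": "Bonjour de Paris", "coordinates": (2.3522, 48.8566)},
    {"text": "no location", "coordinates": None},
]

rome_tweets = list(tweets_in_box(sample, ROME_BOX))
```

A rectangular box only approximates a municipal territory, so a real selection would probably need a finer boundary check on top of this first filter.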

The aim of this work is to apply some textual analysis techniques to a corpus extracted from Twitter, to describe its contents, and to investigate possible links between technologies for Big Data analysis and Text Mining. Although the selected corpus shows only a weak connection between the two phenomena in the period of analysis, the analysis opened up interesting possibilities.

Type	:	oral
Full-text language	:	English
Topics	:	Web genres and CMC
Keywords	:	big data ; text mining ; roma ; Paris attacks ; Giubileo ; sentiment analysis ; social media
