Journées internationales d'Analyse statistique des Données Textuelles
7-10 June 2016, Nice (France)
How to explore conflicts in French Wikipedia talk pages?
Céline Poudat 1, Laurent Vanni, Natalia Grabar 2
1 : UMR 7320 Bases, Corpus, Langage (BCL)
Université de Nice Sophia-Antipolis, CNRS : UMR7320
2 : Savoir, Textes, Langage  (STL)
CNRS : UMR8163, Université Lille III - Sciences humaines et sociales

With the exponential development of the Internet, new discourse genres and situations have emerged. These new web genres, which remain under-described, are complex objects that challenge our methodologies and analysis tools. The encyclopedic project Wikipedia is one such object, and it belongs to the field of computer-mediated communication (CMC).

The present article explores conflicts in Wikipedia talk pages using Hyperbase Web. Wikipedia data and CMC corpora have so far received little attention in French linguistics, and they still pose challenges for text statistics, notably because of the complexity of such data (multiple annotations, extensive metadata, references between postings, and user networks).

Based on the Wikiconflits corpus, which is already available and freely usable by researchers, we propose some methodological avenues for exploring Wikipedia data and CMC corpora.

