dc.contributor.author | Paun, S | en_US |
dc.contributor.author | Carpenter, B | en_US |
dc.contributor.author | Chamberlain, J | en_US |
dc.contributor.author | Hovy, D | en_US |
dc.contributor.author | Kruschwitz, U | en_US |
dc.contributor.author | Poesio, M | en_US |
dc.date.accessioned | 2019-02-04T15:31:46Z | |
dc.date.available | 2018-11-30 | en_US |
dc.date.submitted | 2018-12-07T17:15:42.808Z | |
dc.identifier.uri | https://qmro.qmul.ac.uk/xmlui/handle/123456789/55140 | |
dc.description.abstract | The analysis of crowdsourced annotations in NLP is concerned with identifying 1) gold standard labels, 2) annotator accuracies and biases, and 3) item difficulties and error patterns. Traditionally, majority voting was used for 1), and coefficients of agreement for 2) and 3). Lately, model-based analysis of corpus annotations has proven better at all three tasks. But there has been relatively little work comparing them on the same datasets. This paper aims to fill this gap by analyzing six models of annotation, covering different approaches to annotator ability, item difficulty, and parameter pooling (tying) across annotators and items. We evaluate these models along four aspects: comparison to gold labels, predictive accuracy for new annotations, annotator characterization, and item difficulty, using four datasets with varying degrees of noise in the form of random (spammy) annotators. We conclude with guidelines for model selection, application, and implementation. | en_US |
dc.language | English | en_US |
dc.publisher | Association for Computational Linguistics and MIT Press | en_US |
dc.relation.ispartof | Transactions of the Association for Computational Linguistics | en_US |
dc.rights | This is a pre-copyedited, author-produced version of an article accepted for publication in Transactions of the Association for Computational Linguistics following peer review. | |
dc.subject | probabilistic annotation models | en_US |
dc.subject | computational linguistics | en_US |
dc.subject | crowdsourcing | en_US |
dc.title | Comparing Bayesian Models of Annotation | en_US |
dc.type | Article | |
dc.rights.holder | © 2019 Association for Computational Linguistics and MIT Press | |
pubs.notes | No embargo | en_US |
pubs.publication-status | Accepted | en_US |
pubs.publisher-url | https://www.mitpressjournals.org/loi/tacl | en_US |
dcterms.dateAccepted | 2018-11-30 | en_US |
rioxxterms.funder | Default funder | en_US |
rioxxterms.identifier.project | Default project | en_US |
qmul.funder | Disagreements in Language Interpretation (DALI)::European Research Council | en_US |