Word Sense Distance and Similarity Patterns in Regular Polysemy - Insights Gained from Human Annotations of Graded Word Sense Similarity and an Investigation of Contextualised Language Models

Haber, J

dc.contributor.author	Haber, J	en_US
dc.date.accessioned	2022-12-19T17:19:17Z
dc.date.issued	2022
dc.identifier.uri	https://qmro.qmul.ac.uk/xmlui/handle/123456789/83280
dc.description.abstract	This thesis investigates the notion of distance between different interpretations of polysemic words. It presents a novel, large-scale dataset of close to 18,000 human annotations that rate both the nuanced sense similarity of lexically ambiguous word forms and the acceptability of combining their different sense interpretations in a single co-predication structure. The collected data suggests that different polysemic sense extensions can be perceived as significantly dissimilar in meaning, and that they form patterns of word sense similarity in some types of regular metonymic alternations. These observations call into question traditional theories that postulate a fully under-specified mental representation of polysemic sense. Instead, the collected data supports more recent hypotheses of a structured representation of polysemy in the mental lexicon, suggesting some form of sense grouping, clustering, or hierarchical ordering based on word sense similarity. The new dataset is then used to evaluate the performance of a range of contextualised language models in predicting graded word sense similarity. Our findings suggest that, without any dedicated fine-tuning, BERT Large in particular shows a relatively high correlation with the collected judgements. The model, however, struggles to consistently reproduce the similarity patterns observed in the human data, or to cluster word senses solely based on their contextualised embeddings. Finally, this thesis presents a pilot algorithm for automatically detecting words that exhibit a given polysemic sense alternation. Formulated in an unsupervised fashion, this algorithm is intended to bootstrap the collection of an even larger dataset of ambiguous language use that could be used in the fine-tuning or evaluation of computational language models for (graded) word sense disambiguation tasks.	en_US
dc.language.iso	en	en_US
dc.title	Word Sense Distance and Similarity Patterns in Regular Polysemy - Insights Gained from Human Annotations of Graded Word Sense Similarity and an Investigation of Contextualised Language Models	en_US
qmul.funder	Disagreements and Language Interpretation::European Research Council	en_US

Files in this item

Name: PhD_dissertation_11_2022_corre ...
Size: 6.401Mb
Format: application/
Description: PhD Thesis

This item appears in the following Collection(s)

Theses [4223]
Theses Awarded by Queen Mary University of London