Mirror of https://github.com/thinkst/zippy
Abstract
Computational Humour (CH) has attracted the interest of the Natural Language Processing and Computational Linguistics communities. Creating datasets for the automatic measurement of humour quotient is difficult because the content can be interpreted in multiple ways. In this work, we create a multi-modal humour-annotated dataset (approximately 40 hours) using stand-up comedy clips. We devise a novel scoring mechanism that annotates the training data with a humour quotient score based on the audience's laughter. The normalized duration of laughter in each clip (laughter duration divided by clip duration) is used to compute this humour quotient score on a five-point scale (0-4). This scoring method is validated by comparing it with manually annotated scores, which yields a quadratic weighted kappa of 0.6. We use this dataset to train a model that provides a "funniness" score, on a five-point scale, given the audio and its corresponding text. We compare various neural language models for the task of humour rating and achieve a Quadratic Weighted Kappa (QWK) of 0.813. Our "Open Mic" dataset is released for further research along with the code.

Introduction
Humour is one of the most important lubricants of communication between people. Humour is subjective and, at times, also requires cultural knowledge, as humour often depends on the stereotypes of a culture or a country. At times, even cultural appropriation is used to convey humour, which can be offensive to minority cultures (Rosenthal et al., 2015; Kuipers, 2017). These factors, along with the underlying subjectivity of humour, make rating humour difficult for machines (Meaney, 2020). The task of humour classification suffers from this subjectivity and from the lack of datasets that rate the "funniness" of content. In this paper, we propose rating humour on a scale of zero to four.
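The scoring step described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' released code: it assumes laughter has already been detected as non-overlapping (start, end) segments in seconds, and the bin edges in the hypothetical `DEFAULT_EDGES` are placeholders, since the text does not say how the normalized duration is mapped onto the 0-4 scale.

```python
from bisect import bisect_right
from typing import Sequence, Tuple

# Hypothetical thresholds: the paper does not publish how ratios map to 0-4.
DEFAULT_EDGES: Tuple[float, ...] = (0.05, 0.10, 0.20, 0.30)


def normalized_laughter(
    laugh_segments: Sequence[Tuple[float, float]], clip_duration: float
) -> float:
    """Total laughter time divided by clip duration.

    Assumes the (start, end) segments, in seconds, do not overlap.
    """
    if clip_duration <= 0:
        raise ValueError("clip_duration must be positive")
    total = sum(end - start for start, end in laugh_segments)
    return total / clip_duration


def humour_score(ratio: float, edges: Sequence[float] = DEFAULT_EDGES) -> int:
    """Map a normalized laughter ratio to an integer score in 0..len(edges).

    The score is the number of edges that the ratio reaches or exceeds.
    """
    return bisect_right(edges, ratio)
```

With these placeholder edges, a 60-second clip containing 9 seconds of laughter has a ratio of 0.15 and receives a score of 2. Data-driven edges, such as quantiles of the ratio over the whole dataset, would be an equally plausible choice.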
We create the first multi-modal dataset using stand-up comedy clips and compute the humour quotient of each clip from the audience laughter. We verify the validity of our scoring criteria by measuring the overall agreement between human annotations and automated scores. We use audio and text-based signals to process this multi-modal data and generate "humour ratings". Because humour annotation is subjective, even human-annotated data might not provide an objective measure. We reduce this subjectivity by taking laughter feedback from a larger audience. To the best of our knowledge, no previous work has proposed an automatically humour-rated multi-modal dataset and used it to build ML models that automatically produce a humour score.

Stand-up comedy is an art form in which the delivery of humour has a much larger context, with multiple jokes and multiple related punchlines in the same story. The resulting audience laughter depends on various factors, including the audience's understanding of the context and the comic's delivery and tonality. Stand-up comedy therefore seems an ideal choice for a humour rating dataset, as it inherently contains feedback in the form of audience laughter. We believe a smaller context window restricts computational models, whereas a human audience is not limited in this way. Hence, our approach uses live audience laughter as a measure of the humour quotient of the data created. We also believe that such an approach can generate insights into which aspects of stories and their delivery make them funny.

Our humour rating model is partly inspired by the character "TARS" from the movie "Interstellar", which generates funny responses based on an adjustable humour setting (Nolan, 2014). An essential step in developing a machine that can adjust its "funniness" is to create a model that can recognize and rate the "funniness" of a joke.
With this work, we aim to release a dataset that can help researchers shed light on the humour quotient of a particular text. The key contributions of this paper are: (a) the creation and public release of an automatically rated multi-modal dataset based on English stand-up comedy clips, and (b) a manual evaluation of this dataset, along with a humour-rating quotient defined on a Likert scale (Likert, 1932).

Conclusion and Future Work
We propose a novel scoring mechanism and show that humour rating can be automated using audience laughter, which agrees well with human perception of humour. We create a multi-modal (audio & text) dataset for the task of humour rating. With the help of three human annotators, we manually evaluate our scoring mechanism and show substantial agreement in terms of QWK. Our evaluation shows that although the card-rating mechanism is robust, it is not smooth. We also show that the card-rating mechanism can be automated with the help of pre-existing language models and traditional audio features. Our approach uses live audience laughter both as a measure of the humour quotient of a card and to annotate the training data with a humour quotient score. We use this dataset to train a model that provides a "funniness" score, on a five-point scale, given the audio and its corresponding text. We compare various neural language models for the task of humour rating and achieve a QWK of 0.813. Our "Open Mic" dataset is released for further research along with the code.
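Agreement in this work is reported as quadratic weighted kappa (QWK). For readers unfamiliar with the metric, the sketch below computes the standard Cohen's kappa with quadratic weights for two raters on the 0-4 scale. It is an illustrative implementation, not the authors' evaluation code. The text also does not state how the three annotators' ratings were combined, so comparing raters pairwise is an assumption here.

```python
from collections import Counter
from typing import Sequence


def quadratic_weighted_kappa(
    rater_a: Sequence[int], rater_b: Sequence[int], num_ratings: int = 5
) -> float:
    """Cohen's kappa with quadratic weights for integer ratings in 0..num_ratings-1."""
    if num_ratings < 2:
        raise ValueError("num_ratings must be at least 2")
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("rating lists must be non-empty and of equal length")
    if any(not 0 <= r < num_ratings for r in (*rater_a, *rater_b)):
        raise ValueError("ratings must lie in 0..num_ratings-1")

    n = len(rater_a)
    # Observed matrix: observed[(i, j)] counts items rated i by A and j by B.
    observed = Counter(zip(rater_a, rater_b))
    # Marginal histograms define the expected matrix under independence.
    hist_a, hist_b = Counter(rater_a), Counter(rater_b)

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_ratings):
        for j in range(num_ratings):
            # Quadratic disagreement weight: 0 on the diagonal, 1 at the extremes.
            weight = (i - j) ** 2 / (num_ratings - 1) ** 2
            weighted_observed += weight * observed[(i, j)]
            weighted_expected += weight * hist_a[i] * hist_b[j] / n

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both raters use one identical rating")
    return 1.0 - weighted_observed / weighted_expected
```

Perfect agreement gives 1.0, and fully reversing the ratings of five items rated 0 through 4 gives -1.0. The quadratic weights penalise a 0-versus-4 disagreement sixteen times more than a 0-versus-1 disagreement, which suits ordinal scales like this one.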