
English Research Seminar 29.10.2019

29/10/2019

Lecture: Dr Andrew Kehoe, Birmingham City University, UK: “All our items are pre-owned and may have musty odor”: Using automated linguistic analysis techniques to study e-commerce data

Time: Tuesday 18.06.19, 18:15 – 19:45; Venue: WU, Building D2, Entrance D, 2nd floor, Meeting Room 228

Abstract: Corpus Linguistics is the analysis of a large collection of electronic texts (a corpus) in order to discover patterns and trends in language use. In this paper I describe research I have carried out over the past 20 years on the application of Corpus Linguistic techniques to online data, with a particular focus on web-based marketing and e-commerce.

I begin by describing the key methods in Corpus Linguistics and the software tools developed by my research team, including the WebCorp Linguist’s Search Engine (WebCorpLSE). I explain how we have used WebCorpLSE to crawl the web, downloading and processing texts to build a 10-billion-word, linguistically tagged corpus, including sub-corpora for specific research purposes: the Birmingham Blog Corpus, as well as literary, news, and general web corpora.

In the second part of the paper I present my more recent work with partners outside the field of Linguistics and outside academia. Corpus Linguistics is not a discipline as such; rather it is a collection of techniques for the systematic analysis of data which can be applied to a wide range of problems in a variety of fields.

My first example is my work with colleague Matt Gee on the Puma Dance Dictionary project. Here, we served as linguistic consultants to the Grey London advertising agency and Procter & Gamble, manufacturer of a new range of fragrances licensed under the Puma brand and targeted at consumers aged 14-25. The campaign aimed to raise awareness of the new brand through social media. Our specific task was to determine which words are likely to occur most frequently in social media communication between young people. I describe how we achieved this by building a social media corpus and applying our WebCorpLSE analysis tools.
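The core of such a task, ranking words by frequency in a corpus, can be illustrated with a minimal sketch. This is not the WebCorpLSE implementation: the function name `word_frequencies`, the simple regular-expression tokenizer, and the sample posts are all assumptions invented for illustration.

```python
import re
from collections import Counter


def word_frequencies(texts):
    """Count lowercase word tokens across an iterable of texts.

    Tokenization here is a deliberately crude assumption: runs of
    letters and apostrophes. Real corpus tools use richer tokenizers.
    """
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


# Invented sample posts (hypothetical, not data from the project)
posts = [
    "omg that party was so good",
    "so so tired today lol",
    "lol that was good",
]

freqs = word_frequencies(posts)
top = freqs.most_common(3)  # three most frequent tokens with their counts
```

On a real social media corpus the same ranking would typically be computed after filtering out very common function words, so that the words characteristic of the target group stand out.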

My second example is our research on the language of eBay. Our eBay corpus contains over 400,000 item descriptions totalling 100 million words. I explain the linguistic differences we found between item categories, looking in particular at words describing used items (second-hand, pre-owned, pre-loved, etc.) and words describing ‘fake’ items (non-original, generic, compatible, etc.). I also outline our findings on variation in language use across price bands on eBay, for example the fact that personal pronouns are significantly more frequent in the Antiques category, and that watches described as gents’ sell for more than twice as much as watches described as men’s.
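A claim that a word is "significantly more frequent" in one sub-corpus than in another is commonly checked in corpus linguistics with a log-likelihood test on the two frequencies. The abstract does not say which statistic was used for the eBay data, so the following is only a sketch of that general technique; the function names and all counts are invented for illustration.

```python
import math


def per_million(count, corpus_size):
    """Normalize a raw count to occurrences per million words."""
    return count / corpus_size * 1_000_000


def log_likelihood(a, b, c, d):
    """Log-likelihood for one word across two corpora.

    a, b: observed frequency of the word in corpus 1 and corpus 2
    c, d: total size (in words) of corpus 1 and corpus 2
    """
    total = a + b
    e1 = c * total / (c + d)  # expected frequency in corpus 1
    e2 = d * total / (c + d)  # expected frequency in corpus 2
    ll = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


# Invented counts (hypothetical, not figures from the eBay corpus):
# a pronoun occurring 1,200 times in one category and 800 times in
# another, each category containing one million words.
ll = log_likelihood(1200, 800, 1_000_000, 1_000_000)

# 3.84 is the chi-square critical value for p < 0.05 with 1 degree of freedom
is_significant = ll > 3.84
```

Normalizing with `per_million` first makes categories of different sizes directly comparable, while the log-likelihood score indicates whether an observed difference is likely to be more than chance.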

Throughout the paper I highlight the fact that a deeper understanding of the language of online selling is vital as e-commerce continues to grow worldwide. I give examples of how corpus linguistic techniques can be applied to the study of this increasingly important social phenomenon, and suggest how our techniques could be used to improve the indexing and search functions on sites like eBay.

Dr Andrew Kehoe is Associate Professor in the School of English at Birmingham City University. He is Deputy Head of School and Director of the Research & Development Unit for English Studies (RDUES). The RDUES team carries out research in the field of corpus linguistics, and has in recent years developed both the WebCorp suite of online search tools for linguistic study (http://www.webcorp.org.uk/) and the eMargin collaborative text annotation system (http://emargin.bcu.ac.uk/).
