Educated adults know some 100,000 distinct words, and they encounter and create novel words all the time. Only a fraction of all words are used by the entire speech community. Most are associated with particular topics or social groups. As a result, rare and novel words provide an interesting window into the cognitive and social processes that shape lexical systems.
To investigate the structure and evolution of the lexicon, we use large-scale text mining and psycholinguistic experiments. This talk will present examples of both methods. First, I will present a mathematical analysis of the dynamics of words in the archives of USENET discussion groups, selected because they provide data from large numbers of people (10,000 to 100,000 individuals) over long time spans (10 to 20 years). Second, I will discuss some experiments from the Wordovators project. This project, funded by the John Templeton Foundation, uses online word games to collect data about artificial language learning from a large and diverse pool of people. Results reveal individual variation in cognitive style, as well as social influences in games involving two people. These interact to determine general patterns of word formation.
Janet B. Pierrehumbert is Professor of Language Modelling in the Oxford e-Research Centre. She received her B.A. from Harvard in 1975, and her Ph.D. from MIT in 1980. She was a Member of Technical Staff at AT&T Bell Labs in Linguistics and AI Research until 1989. From then until 2015, she was a member of the Linguistics faculty at Northwestern University. Her current research focuses on how the dynamics of language — in acquisition, processing, or historical change — is related to the structure of linguistic systems. It combines experiments, statistical analyses of large corpora, and computational simulations of linguistic communities. She is a Fellow of the American Academy of Arts and Sciences, the Linguistic Society of America, and the Cognitive Science Society.