Some talks at the CQL Lab

Michael Zock, LIF-CNRS (July 13, 2017). Cognitive aspects of Natural Language Processing: Wheels for the mind of the language producer.
Languages are not only means of expression, but also vehicles of thought, allowing us to discover new ideas (brainstorming) or clarify existing ones by refining, expanding, illustrating more or less well-specified thoughts. Of course, all this must be learned, and to this end we need resources, tools and knowledge on how to use them.
Knowledge can be encoded at various levels of abstraction, considering different units (words, sentences, texts). While semantic maps represent words and their relations at a micro-level, schematic maps (tree banks, pattern libraries) represent them combined, in larger chunks (macro-level).
We all are familiar with microscopes, maps, and navigational tools, and we normally associate them with professions having little to do with NLP. I will argue during my talk that this does not need to be so. Metaphorically speaking, we do use the very same tools to process language, regardless of the task (analysis vs. generation) and the processor (machine vs. human brain).
Dictionaries are resources, but they can also be seen as microscopes as they reveal in more detail the hidden meanings, nutshelled in a word. This kind of information display can be achieved nowadays by a simple mouse-click, even for languages whose script we cannot read (e.g. oriental languages for most Europeans). A corpus query system like Sketch Engine can additionally reveal very valuable information: a word’s grammatical and collocational behaviour in texts.
Unlike inverted spyglasses, which only reduce size, macroscopes are tools that allow us to get the big picture. Even though badly needed, they are not yet available in hardware stores, but they do exist in some scientists’ minds. They are known under the headings of pattern recognition, feature detectors, etc. The resulting abstractions, models or blueprints (frames, scripts, patterns) are useful for a great number of tasks. I will illustrate this point for patterns via two examples related to real-time language production and foreign language learning (acquisition of fluency via a self-extending speakable phrasebook).
Semantic maps (wordnets, thesauri, ontologies, encyclopedias) are excellent tools for organizing words and knowledge in a huge multidimensional meaning space. Nevertheless, to be truly useful, i.e. to guarantee access to the stored and desired information, maps alone are insufficient: we also need some navigational tool(s). To illustrate this point I will present some of my ongoing work devoted to the building of a lexical compass. The assumption is that people have a highly connected conceptual-lexical network in their mind. Finding a word thus amounts to entering the network at any point by giving a related word (source word) and then following the links (associations) until the target word is reached.
To allow for this kind of navigation, I believe that we need to do three things: (a) build an association network, (b) cluster the set of words, i.e. the associated terms we get in response to the input (the word coming to the user's mind while trying to access the target; the tip-of-the-tongue problem), and (c) give meaningful names to the clusters. While the first step consists in building the semantic map within which search takes place, the role of the next two steps is to support navigation. The role of the resulting categorial tree is to organize the set of words triggered by some input. Since any input is likely to yield many outputs (all words being associated with many other words), it is important to organize the resulting set of words, as otherwise we will drown the user.
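The three steps (a) to (c) can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions: the association pairs are invented, the clustering rule (associates end up together when they are linked to one another) and the naming rule (each cluster is labelled by its best-connected member) are hypothetical choices, not the method presented in the talk.

```python
from collections import defaultdict

# (a) Toy association network; every pair below is invented for illustration.
PAIRS = [
    ("coffee", "cup"), ("coffee", "mug"), ("cup", "mug"),
    ("coffee", "tea"), ("coffee", "espresso"), ("tea", "espresso"),
    ("coffee", "morning"),
]

def build_network(pairs):
    """Return an undirected association network as word -> set of associates."""
    network = defaultdict(set)
    for a, b in pairs:
        network[a].add(b)
        network[b].add(a)
    return network

def cluster_associates(network, source):
    """(b) Group the associates of `source` (assumed rule): two associates share
    a cluster when a chain of links running only through associates joins them."""
    unseen = set(network[source])
    clusters = []
    while unseen:
        start = unseen.pop()
        cluster, frontier = {start}, [start]
        while frontier:
            word = frontier.pop()
            # The intersection is a fresh set, so `unseen` can shrink safely.
            for other in network[word] & unseen:
                unseen.discard(other)
                cluster.add(other)
                frontier.append(other)
        clusters.append(cluster)
    return clusters

def name_cluster(network, cluster):
    """(c) Label a cluster by the member with most links inside the cluster
    (assumed rule; ties are broken alphabetically)."""
    return min(cluster, key=lambda w: (-len(network[w] & cluster), w))

def lexical_compass(network, source):
    """Map each cluster label to the sorted words of that cluster."""
    return {name_cluster(network, c): sorted(c)
            for c in cluster_associates(network, source)}

network = build_network(PAIRS)
compass = lexical_compass(network, "coffee")
for label, words in sorted(compass.items()):
    print(label, "->", words)
```

With the toy data, the source word "coffee" yields three labelled groups instead of a flat list of five associates, which is the organizing role the categorial tree is meant to play.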
Stuart Semple, Roehampton University (December 15, 2016). Investigating communication in our primate relatives: from information content to linguistic laws.
Communication underpins the social behaviour of humans, and of our primate relatives. While language is unique to our own species, the other primates have complex repertoires of calls, which they use to convey diverse messages. In this seminar I will describe a range of studies on primate communication that I and my collaborators have conducted. I will talk about work investigating the information content and function of primates’ vocalisations, the role that bystanders can play in shaping the outcome of communicative interactions, and the correlates of primate vocal repertoire size. I will also describe our most recent work, testing whether patterns consistent with linguistic laws - specifically Zipf’s law of abbreviation and Menzerath’s law - are found in the vocal (and gestural) communication of monkeys and apes.
Iván González Torre, Universidad Politécnica de Madrid (December 15, 2016). Exploration of linguistic laws in human voice.
Despite great interest and research activity, the study of linguistic laws is usually restricted to written text where segmentation is explicit. However, this segmentation is not well defined in the case of oral corpora. I will present a mathematical method that we have proposed to transform generic acoustic signals into sequences of symbols describing speech energy fluctuations. With this transformation, we can explore linguistic laws such as Zipf’s law, Heaps’ law and the law of abbreviation without requiring a transcription of the signals into symbols of some sort. The method is simple and general, and it allows one to perform comparative studies between human and animal communication and beyond.
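As a rough illustration of what is measured once a signal has been turned into symbols, the Python sketch below computes the two quantities that Zipf's law (frequency against rank) and Heaps' law (vocabulary size against sequence length) are stated over. It does not reproduce the acoustic transformation from the talk; the symbol sequence and the function names are hypothetical.

```python
from collections import Counter

def rank_frequency(symbols):
    """Return (rank, symbol, count) triples, most frequent first (rank starts at 1)."""
    counts = Counter(symbols)
    # Sort by decreasing count; break ties alphabetically for a stable result.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, sym, n) for rank, (sym, n) in enumerate(ordered, start=1)]

def vocabulary_growth(symbols):
    """Return the number of distinct symbols seen after each position."""
    seen, growth = set(), []
    for sym in symbols:
        seen.add(sym)
        growth.append(len(seen))
    return growth

# Hypothetical symbol sequence standing in for a discretised signal.
sequence = list("aababcabcd")
zipf_table = rank_frequency(sequence)
heaps_curve = vocabulary_growth(sequence)
print(zipf_table)
print(heaps_curve)
```

On real data one would plot both outputs on logarithmic axes and check how close they are to straight lines.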
Emília-Maria Garcia Casademont, Institut de Biologia Evolutiva, CSIC/UPF (November 24, 2016). Cultural emergence of recursive phrase structure.
Naming Game models have been widely used to study the emergence of purely lexical systems, consisting of word-meaning pairs, within a population of communicative agents. Following a similar methodology, I will discuss a model meant to study the emergence of a grammatical system exhibiting recursive phrase structure.
Phrase structures combine words, phrases, and both, into phrases. I will argue that phrase structures and recursive phrase structures could be motivated by the need to avoid combinatorial search in parsing and semantic ambiguity in interpretation, while always keeping communicative accuracy. I will introduce an operational minimal model of communication (a specific language game), together with a language acquisition device (mechanisms), to explain how phrase structure can be achieved and acquired at the individual and collective level.
Vineeta Chand, University of Essex (June 14, 2016). Language dynamics in contemporary India.
In this talk I will introduce sociolinguistic aspects of contemporary India (language politics, changing fluencies, contemporary practices) as a situated case study with broader ramifications for sites of multilingualism and language contact across the globe. Specifically, I will focus on mixed practices (codeswitching) from two angles: (1) language shift dynamics in the Hindi Belt within a predator-prey modeling framework and (2) the quantitative characteristics of two mixed codes, Benglish and Hinglish, with respect to Zipf’s Law. There are theoretical linguistic, sociolinguistic and evolutionary linguistic conclusions to be drawn from such research, which will also be addressed.

Bio: I am a sociolinguist at the University of Essex. I completed my BA at UC Berkeley (Cognitive Science), and my MA & PhD at UC Davis (Linguistics), supported by NSF and Wenner-Gren dissertation grants. My NIH/NIA-funded postdoctoral research was in the Alzheimer’s Disease Center at the UC Davis Department of Neurology. My PhD research explored English dynamics in India. My research currently focuses on Indian sociolinguistics and clinical linguistics. I explore language politics and changing fluencies in India from sociolinguistic and applied physics approaches, while my clinical research explores language changes related to Alzheimer’s dementia. My journal publications include papers in Language in Society, Journal of Sociolinguistics, Current Protocols in Neuroscience, The Journal of Gerontology, and Physica A.
Miquel Fernández (Thursday, April 7, 2016). Constituent order: a diachronic view.
Most work in linguistic typology focuses on the values and frequencies observed in present-day languages, possibly because historical linguistic data are lacking for most of the world. In this thesis, methodologies from cladistic vicariance zoogeography were used to find the most parsimonious scenarios of constituent order change over the last 50,000 years, linking them to the emergence of the social and demographic factors characteristic of the Neolithic.
Bruno Galantucci, Yeshiva University (March 14, 2016). A few things I learned about human communication.
In this talk I present a synopsis of the research on human communication that I conducted over the last few years. In the first part of the talk I focus on the linguistic side of communication, which I investigated through a methodology—Experimental Semiotics—that allows us to study in the laboratory the emergence of novel forms of human communication. In particular, I present a set of related studies aimed at investigating how communication systems acquire a combinatorial design. In the second part of the talk I focus on the psychological and social sides of communication, starting from the observation that humans exhibit important limitations when they are asked to perform tasks that require communicative sophistication. This raises the question of how individuals who have limited communicative skills manage to develop sophisticated forms of communication. I discuss three non-mutually exclusive hypotheses to address the question and present some empirical evidence relevant to one of them.
William Schueller, INRIA & ENSTA ParisTech (July 27, 2015). Active learning and active control of growth complexity in naming games.
Naming Games are models of the dynamic formation of lexical conventions in populations of agents. In this work we introduce new Naming Game strategies, using developmental and active learning mechanisms to control the growth of complexity. An information theoretical measure to compare those strategies is introduced, and used to study their impact on the dynamics of the Naming Game.
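For readers unfamiliar with the model family, here is a minimal Python sketch of a baseline Naming Game for a single object. It is not one of the active-learning strategies introduced in the talk; the random pairing of agents, the update rule (on success both agents keep only the winning word, on failure the hearer adds it) and all names are assumptions taken from the usual minimal version of the game.

```python
import random

def play_naming_game(n_agents, max_rounds, seed=0):
    """Run a minimal Naming Game for one object.

    Returns (inventories, rounds_played, converged), where each inventory is
    the set of words an agent currently associates with the object.
    """
    rng = random.Random(seed)
    inventories = [set() for _ in range(n_agents)]
    next_word = 0
    for round_no in range(1, max_rounds + 1):
        speaker, hearer = rng.sample(range(n_agents), 2)
        if not inventories[speaker]:
            # A speaker with no word for the object invents a new one.
            inventories[speaker].add(f"w{next_word}")
            next_word += 1
        word = rng.choice(sorted(inventories[speaker]))
        if word in inventories[hearer]:
            # Success: both agents discard every competing word.
            inventories[speaker] = {word}
            inventories[hearer] = {word}
        else:
            # Failure: the hearer adds the word to its inventory.
            inventories[hearer].add(word)
        # Consensus: every agent holds the same single word.
        if all(len(inv) == 1 and inv == inventories[speaker] for inv in inventories):
            return inventories, round_no, True
    return inventories, max_rounds, False

inventories, rounds, converged = play_naming_game(n_agents=10, max_rounds=100_000)
print("converged:", converged, "after", rounds, "rounds")
```

The total number of words held across agents over time is one simple way to watch complexity first grow and then collapse, which is the quantity a strategy would try to keep under control.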
Chris Kello, UC Merced (April 23, 2015). Adaptive critical branching networks.
Biological neural networks exhibit ongoing, spatiotemporal patterns of spiking activity. Evidence shows that spike dynamics shift from one transient attractor to another, i.e. they appear to be metastable. Metastability is theorized to be adaptive for neural and cognitive function, but learning must somehow remain stable in the context of highly variable spike dynamics. Stable learning is challenging in part because it appears that functions of homeostatic regulation and learning are both expressed through potentiation and de-potentiation of synapses. In this talk, Prof. Kello will present a spiking neural network model that integrates homeostatic regulation with learning via a local, biologically plausible process of synaptic modulation. Homeostatic regulation towards the critical branching point results in power law spike dynamics, while learning shapes those dynamics to maximize reward and minimize punishment. The model is shown to simulate intrinsic fluctuations in neural and behavioral activity, and the efficacy of learning is demonstrated using time-delayed XOR classification as a simple test function, and real-time phoneme recognition in naturalistic speech as a more challenging test.
Cynthia Siew, University of Kansas (June 5, 2014). Community structure in the phonological network.
Community structure, which refers to the presence of densely connected groups within a larger network, is a common feature of several real-world networks from a variety of domains such as the human brain, social networks of hunter-gatherers and business organizations, and the World Wide Web (Porter et al., 2009). Using a community detection technique known as the Louvain optimization method, 17 communities were extracted from the giant component of the phonological network described in Vitevitch (2008). Additional analyses comparing the lexical and phonological characteristics of words in these communities against words in randomly generated communities revealed several novel discoveries. Larger communities tend to consist of short, frequent words of high degree and low age of acquisition ratings, and smaller communities tend to consist of longer, less frequent words of low degree and high age of acquisition ratings. Real communities also contained fewer different phonological segments compared to random communities, although the number of occurrences of phonological segments found in real communities was much higher than that of the same phonological segments in random communities. Interestingly, the observation that relatively few biphones occur very frequently and a large number of biphones occur rarely within communities mirrors the pattern of the overall frequency of words in a language (Zipf, 1935). The present findings have important implications for understanding the dynamics of activation spread among words in the phonological network that are relevant to lexical processing, as well as understanding the mechanisms that underlie language acquisition and the evolution of language.
Cynthia Siew, University of Kansas (June 11, 2014). Spoken word recognition and serial recall of words from components in the phonological network.
Network science uses mathematical techniques to study complex systems such as the phonological lexicon (Vitevitch, 2008). The phonological network consists of a giant component (the largest connected component of the network) and lexical islands (smaller groups of words that are connected to each other but not to the giant component). To determine if the component that a word resided in influenced lexical processing, three language-related tasks (naming, lexical decision, and serial recall) were used to compare the processing of words from the giant component and from lexical islands. Results showed that words from lexical islands were recognized more quickly and recalled more accurately than words from the giant component. These findings can be accounted for via a spreading activation framework. Implications for network science and for models of spoken word recognition are also discussed.
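The distinction between the giant component and lexical islands can be made concrete with a small Python sketch. The abstract does not say how words are linked, so the edge rule used here (two words are neighbours when they differ by one substitution, addition or deletion) is an assumption, letters stand in for phonemes, and the toy lexicon is invented.

```python
from collections import deque

def one_edit_apart(a, b):
    """True if a and b differ by exactly one substitution, insertion or deletion."""
    if a == b:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    shorter, longer = sorted((a, b), key=len)
    # Deleting one position of the longer word must give the shorter word.
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))

def components(words):
    """Return the connected components of the network, largest first."""
    words = list(words)
    neighbours = {w: [v for v in words if one_edit_apart(w, v)] for w in words}
    seen, result = set(), []
    for start in words:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:  # breadth-first search from `start`
            word = queue.popleft()
            component.append(word)
            for other in neighbours[word]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        result.append(sorted(component))
    return sorted(result, key=lambda c: (-len(c), c))

# Hypothetical toy lexicon; letters stand in for phonemes.
lexicon = ["cat", "bat", "hat", "at", "cot", "dog", "dig", "sun", "sin", "zebra"]
giant, *rest = components(lexicon)
# Islands are smaller connected groups; words with no neighbour are left out.
islands = [c for c in rest if len(c) > 1]
print("giant component:", giant)
print("lexical islands:", islands)
```

Each word can then be tagged with the kind of component it belongs to, which is the grouping variable the three tasks compare.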