Brain and Language

Volume 150, November 2015, Pages 54-68

Corticostriatal response selection in sentence production: Insights from neural network simulation with reservoir computing

https://doi.org/10.1016/j.bandl.2015.08.002

Highlights

  • We modeled the corticostriatal system as a recurrent network with modifiable readouts.

  • The model learns to produce a variety of English and Japanese constructions.

  • It displays syntactic complexity and generalization effects.

  • The model advances recurrent network modeling of sentence-level language processing.

  • Corticostriatal response selection in motor control may generalize to language production.

Abstract

Language production requires selection of the appropriate sentence structure to accommodate the communication goal of the speaker – the transmission of a particular meaning. Here we consider event meanings, in terms of predicates and thematic roles, and we address the fact that a given event can be described from multiple perspectives, which poses a problem of response selection. We present a model of response selection in sentence production that is inspired by the primate corticostriatal system. The model is implemented in the context of reservoir computing, where the reservoir – a recurrent neural network with fixed connections – corresponds to cortex, and the readout corresponds to the striatum. We demonstrate the model's robust learning and generalization properties, and its cross-linguistic capabilities in English and Japanese. The results contribute to the argument that the corticostriatal system plays a role in response selection in language production, and to the stance that reservoir computing is a valid potential model of corticostriatal processing.

Introduction

The goal of the current research is to present a model of sentence production based on the function of the primate corticostriatal system, extending our previous work on corticostriatal function in sentence comprehension. We situate this work in the context of related models, and provide background on the neuropsychology of corticostriatal function in sentence production, both of which are relevant to our proposed model.

The transmission of meaning by language is one of the marvels of human cognition. Sentence production and comprehension are complementary, but asymmetric. In comprehension, it is possible to correctly extract only part of the message – for example, only the thematic role assignment (who did what to whom). In production, the speaker must generate a specific linear string of words which communicates the intended meaning, which in addition to thematic roles should include some notion of focus or importance, and other dimensions including time, mode and aspect (Klein, 2013). These dimensions can be considered in the larger context of phrasal semantics – meaning that can be communicated by the grammatical structure of the sentence (Dominey, 2005, Jackendoff, 2002). Here, we consider a representation of the meaning of an event and its thematic roles in a predicate-argument structure, along with some indication of whether the focus is on the agent, object, recipient, etc. Our meaning representation is in a predicate-argument format, originally developed in the domain of describing object manipulation actions, e.g. “The ball was given to Jean by Marie” (Dominey & Boucher, 2005). There we adopted a representation with the predicate corresponding to the action, and the arguments corresponding to the agent, the manipulated object, and the recipient. This resulted in our use of the PAOR (predicate, agent, object, recipient) representation. Thus, our notion of object in the PAOR notation corresponds to the classic thematic role of patient.
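As a concrete illustration, the PAOR meaning representation described above might be encoded as a simple structure. This is a hypothetical sketch for exposition only; the field names are illustrative, and the model itself uses neural population codes rather than symbolic structures.

```python
# Hypothetical PAOR encoding of "The ball was given to Jean by Marie".
# Field names are illustrative, not the model's actual encoding.
meaning = {
    "predicate": "gave",    # the action
    "agent": "Marie",       # who performed the action
    "object": "ball",       # PAOR "object" = classic thematic role "patient"
    "recipient": "Jean",    # who received the object
    "focus": "object",      # the passive form places focus on the object
}
```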
Both of these components (thematic roles and focus) should be encoded in the phrasal semantics of the sentence. In comprehension, the reception of this sentence should allow the listener to reconstruct the intended meaning – the thematic roles and the focus structure constituting the speaker’s construed meaning.1 Part of the richness of language expressivity is the varying ability across languages to use word order as a mechanism for specifying the communicative focus and other aspects of phrasal semantics within the sentence, in addition to communicating “who did what to whom.”

Our model can be considered in the larger context of models of language production, including those that focus on word-level processes of semantic retrieval, word repetition, and word production (Roelofs, 2014), and those more concerned with accounting for higher-level behavior, such as the alignment between speakers at multiple levels that takes place during dialog (e.g. alignment of grammatical structures, and of situation models of the joint task that dialog participants are working on) (Pickering & Garrod, 2013). We are concerned with the production of sentences – multiple-word utterances that may have some degree of complexity, including the use of embedded relative clauses. Takac, Benuskova, and Knott (2012) modeled sentence production as a form of mapping from sensorimotor sequences to word sequences. They did not address issues of multiple non-canonical orders, relative clauses, etc. Chang (2002) modeled sentence production using a dual-path model that has one pathway for mapping message content to words and a separate pathway that enforces sequencing constraints, i.e. word order, based on Elman’s simple recurrent network (SRN) (Elman, 1990, Elman, 1991, Elman, 1993). The SRN employs recurrent connections that are modified by back-propagation of error. To simplify the difficult problem of assigning error to recurrent connections, only one recurrent pass through the network is taken into account during learning, hence the term “simple”. This model has been quite influential in cognitive science, including studies of language (e.g. Christiansen and Chater, 1999, Elman, 1993) and sequence learning (e.g. Cleeremans and McClelland, 1991, Jiménez et al., 1996, Servan-Schreiber et al., 1991). Chang also set out to account for cross-linguistic differences, and thus demonstrated that the dual-path model could account for word-order effects in English and in Japanese.
Chang (2009) demonstrated that when the prominence of the thematic roles is expressed as part of the meaning, the model can appropriately learn different forms (e.g. active and passive) in English, and accommodate word scrambling in Japanese. The model was able to handle 50 different constructions with analogous structure in English and Japanese: 3 simple constructions, 9 sentential conjunctions, 6 phrasal conjunctions, and 32 structures with relative clauses. In order to address relative clauses in more detail, Fitz, Chang, and Christiansen (2011) exploited the extended dual-path model to accommodate multiple clauses. The meaning representation included three components: thematic roles (AGENT, PATIENT, RECIPIENT, etc.), concepts (lexical semantics), and event features to signal the number and relative prominence of event participants. Dell and Chang (2014) have recently applied their model of prediction and prediction-error processing in sentence production to understanding aspects of aphasic production. Part of the goal of such modeling indeed should be not only to posit mechanisms of linguistic function, but also to establish links between linguistic function and the underlying neurophysiology.

The current research proposes a biologically inspired neural network model, in the reservoir computing framework, that learns to produce sentences. The link between reservoir computing and corticostriatal neurophysiology can provide useful insight into understanding aspects of higher cognitive function in human and non-human primates. Barone and Joseph (1989) observed PFC activity in macaque monkeys trained to perform a visuomotor sequencing task. For the first time, they observed PFC neurons that encoded a mixture of spatial and sequential rank selectivity. We modeled PFC as a network of leaky integrator neurons with fixed recurrent inhibitory and excitatory connections, and corticostriatal connections modifiable based on reward-related dopamine (Dominey, Arbib, & Joseph, 1995). This was the first instantiation of reservoir computing. The key notion is that the intrinsic dynamics of the fixed-connection reservoir provide an inherent capacity to represent arbitrary sequential structure. PFC neurons in the model displayed the same mixture of spatial location and sequence rank as observed by Barone and Joseph. We further demonstrated that in this configuration, PFC encodes task context, and striatum encodes action selection, again as observed in the primate (Dominey & Boussaoud, 1997), thus supporting the analogy between reservoir–readout and cortex–striatum. More recently the claim that cortex corresponds to a reservoir (based on dense local recurrent connections) has been supported by anatomy, physiology, and modeling (Nikolic et al., 2009, Rigotti et al., 2010, Rigotti et al., 2013).

In this context, we attempt to determine if this approach to modeling the corticostriatal system can be applied to sentence production. We are particularly interested in the problem of how different word orders can be used to describe the same event, but with different focus. As will be described in more detail below, given a mental model with two events, and three arguments each, there is a small combinatorial explosion of the different ways that this meaning can be expressed in a sentence in English. The explosion is even greater in Japanese where there are fewer restrictions on word order.

When faced with this number of degrees of freedom, sentence production can take on an aspect of motor planning, in that the sequence of words to be produced is specific to a particular communicative goal, just as a motor sequence trajectory may be specific to a particular action goal. The framework that we use to address this problem is based on the sequence processing capabilities of the corticostriatal system, which plays a central role in the sequential organization of behavior and action sequence selection (Hikosaka, Nakamura, Sakai, & Nakahara, 2002). In order to appreciate the functional significance of the corticostriatal system, one should recall that all of the primary and associative cortices, including the language areas, project to the striatum – the input nucleus of the basal ganglia (Alexander et al., 1986, Yeterian and Pandya, 1998). The integrity of the corticostriatal system is thus likely required both for language comprehension and production (Argyropoulos et al., 2013, Friederici and Kotz, 2003, Friederici et al., 2003, Frisch et al., 2003, Hochstadt, 2009, Hochstadt et al., 2006, Kotz et al., 2003).

We have previously examined how the corticostriatal system could implement aspects of the mechanism that learns to interpret sentences in language (Dominey, 2001, Dominey, 2013, Dominey et al., 2003, Dominey and Inui, 2009, Dominey et al., 2009), where language is considered a structured inventory of grammatical constructions mapping sentence form to meaning (Goldberg, 1995, Goldberg, 2003). The model was based on the hypothesis that thematic role assignment (determining who did what to whom) can be determined by the order and position of closed class elements (grammatical function words and grammatical morphemes) (Dominey, 2001, Dominey and Inui, 2009, Dominey et al., 2003, Dominey et al., 2009, Hinaut and Dominey, 2013). In this family of models, the input to the recurrent network was the sequence of activation of neurons coding the closed class words as they appeared in the sentence. This drove the recurrent network into a specific trajectory for each different sentence type. Learning in connections between the recurrent reservoir nodes and the output neurons allowed the output neurons to correctly decode the thematic roles for the open class words of input sentences. In the current research we invert this process, that is, we provide the input as activation of neurons coding the meaning of the desired sentence. Meaning is coded as the ordered set of open class elements and their corresponding thematic roles, which we together refer to as the focus hierarchy. This drives the recurrent network through a specific trajectory of activation. We train the output connections to activate word-coding units in the appropriate order to generate the corresponding sentence to express the input meaning.
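The inversion just described can be sketched at a toy scale in the reservoir computing style. The following is a minimal illustration, not the paper's actual architecture: the two-sentence corpus, the two-unit meaning code, and the feedback of the previous word as input are all hypothetical simplifications introduced for this sketch. Only the readout weights (the striatum analog) are learned; the recurrent weights stay fixed.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy vocabulary and two constructions expressing the same event with
# different focus (a hypothetical corpus, not the paper's stimuli).
words = ["<s>", "john", "kicked", "the", "ball", "was", "by", "</s>"]
w = {s: i for i, s in enumerate(words)}
n_word = len(words)
corpus = [
    (np.array([1.0, 0.0]), ["john", "kicked", "the", "ball", "</s>"]),
    (np.array([0.0, 1.0]), ["the", "ball", "was", "kicked", "by", "john", "</s>"]),
]

# Fixed random reservoir; input = meaning code + previous word (one-hot).
n_res, n_in = 200, 2 + n_word
W_in = rng.uniform(-1, 1, (n_res, n_in))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))   # set spectral radius to 0.9

def step(x, meaning, prev_word, leak=0.5):
    u = np.concatenate([meaning, np.eye(n_word)[w[prev_word]]])
    return (1 - leak) * x + leak * np.tanh(W_in @ u + W @ x)

# Teacher forcing: collect (reservoir state -> next word) training pairs.
X, Y = [], []
for meaning, sent in corpus:
    x, prev = np.zeros(n_res), "<s>"
    for word in sent:
        x = step(x, meaning, prev)
        X.append(x); Y.append(np.eye(n_word)[w[word]])
        prev = word
X, Y = np.array(X), np.array(Y)

# Only the readout is trained, here by ridge regression.
W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(n_res), X.T @ Y).T

def produce(meaning, max_len=10):
    """Free-running production: feed each produced word back as input."""
    x, prev, out = np.zeros(n_res), "<s>", []
    for _ in range(max_len):
        x = step(x, meaning, prev)
        prev = words[int(np.argmax(W_out @ x))]
        if prev == "</s>":
            break
        out.append(prev)
    return out
```

Because the reservoir is deterministic and the readout fits the tiny training set exactly, free-running production replays the trained word sequence for each meaning code.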

This model of language processing thus places strong requirements on the function of the corticostriatal system. Indeed we have proposed the notion of a corticostriatal language loop (Dominey, 2013) that would take its place in the set of functional loops initially proposed by Alexander et al. (1986). This is based on several modeling studies that link corticostriatal function to language comprehension (Dominey and Inui, 2009, Dominey et al., 2009), including predicting results in EEG, fMRI and aphasia that were subsequently confirmed (Dominey and Hoen, 2006, Dominey et al., 2006, Dominey et al., 2003, Hoen et al., 2006).

Thus, this model would predict that deficits in the corticostriatal system would have repercussions for language production. There is significant evidence for the role of the corticostriatal system in language, derived from studies of healthy subjects with fMRI and EEG (e.g. Friederici & Kotz, 2003), and from studies of pathology including Parkinson’s disease (Friederici et al., 2003), Huntington’s disease (Teichmann et al., 2005) and subcortical aphasia (e.g. Moro et al., 2001). The core aspect of subcortical aphasia is impaired generative language production. While the ability to read a sentence out loud is generally intact, the ability to generate a sentence given a verb is severely impaired (Mega & Alexander, 1994). The role of the fronto-striatal system in language production is highlighted by these authors: “We propose that the severity of the language profile reflects the extent of damage to frontal-striatal systems.” (Mega & Alexander, 1994, p. 1827). Damage to the paraventricular white matter can produce this core deficit, underscoring the importance of an intact corticostriatal system.

Mega and Alexander situate this subcortical aphasia in a historical context. They note that Luria and Tsvetkova (1967) described what they called “dynamic aphasia” as an impairment in “subjective generative grammar” producing a disturbance in the “transition from the initial thought to the ‘linear scheme of the phrase’.” They further consider more recent analysis suggesting that the problem is actually “a language deficit in automatic access to or recruitment of proceduralized syntactic systems necessary for sentence construction” (Mega & Alexander, 1994, p. 1828). Such proceduralized systems could reflect an extension of the procedural system for grammatical rules attributed to the corticostriatal system by Ullman (Ullman, 2001a, Ullman, 2001b, Ullman, 2004, Ullman et al., 1997). The core profile of deficits in the generation of syntactically rich language, not just in speech generation, following lesions in the dorsolateral striatum or its frontal connections provides additional support for this view.

Multiple phenomena may intervene in these pathologies, and some authors have challenged the role of the basal ganglia in subcortical aphasia (Nadeau & Crosson, 1997). Recent brain-imaging data from healthy subjects examined caudate and putamen activation in different conditions including sentence generation and sentence repetition (Argyropoulos et al., 2013). The findings support the idea that it is primarily the caudate that contributes to selection processes in sentence generation vs. repetition. A recent review of language production in Parkinson’s disease notes impaired grammaticality and reduced syntactic complexity among the recurrent deficits (Altmann & Troche, 2011). Modeling can contribute to this discussion of the functional role of the corticostriatal system in sentence production.

Our modeling approach extends our model of corticostriatal function from sentence comprehension to sentence production, and is based on the hypothesis that the corticostriatal system can learn to select appropriate sentence forms for expressing meanings, based on learning from matched meaning–sentence pairs. From a brain and language perspective, this is useful, as it makes a clear hypothesis about the functional role of the corticostriatal system in sentence production, as a form of grammatical construction selection. The challenges we face are the following. The system should be capable of learning meaning-to-sentence mappings, with little language-specific processing; this includes the ability to process relative clauses. The system should demonstrate cross-linguistic capabilities, and we will thus consider the learning of grammatical constructions in English and Japanese. The model should also address language production deficits in the presence of striatal damage, as observed in patients with Parkinson’s disease (Altmann & Troche, 2011) and in the patients observed by Mega and Alexander (1994), with deficits in the generation of syntactically rich language following lesions in the dorsolateral striatum or its frontal connections. An additional objective of this work is to demonstrate that, without modification of the recurrent connection weights, the recurrent network still provides a foundation for representing the transformation between meaning and sentence structure. This is of interest because it contrasts with other recurrent network models of language production, which focus on modification of the recurrent network itself.
Here we will demonstrate that a recurrent network with fixed recurrent connections can generate appropriate dynamics that can be used to learn to produce sentences via modifiable readout connections, in the reservoir computing framework (Dominey, 1995, Dominey et al., 1995, Jaeger and Haas, 2004, Lukosevicius and Jaeger, 2009, Maass et al., 2002). This is important with respect to characterizing the minimal requirements for language universality. It demonstrates that without modification, the recurrent network is inherently capable of representing the grammatical structure of language.

It is now known that one of the fundamental properties of such reservoirs – recurrent networks with fixed connections – is that they project the inputs into a high-dimensional space, in which a multitude of non-linear combinations of the inputs (and their serial and temporal order) are represented (Antonelo and Schrauwen, 2012, Hermans and Schrauwen, 2012, Jaeger and Haas, 2004, Manjunath and Jaeger, 2013). Arbitrary functions of these representations can then be learned and read out from the output neurons that are connected to the recurrent network by modifiable connections. Interestingly, it has been observed that neural representations in the cerebral cortex have these high-dimensional reservoir properties (Nikolic et al., 2009, Rigotti et al., 2010, Rigotti et al., 2013). The present research thus attempts to contribute to the argument that, even without learning within the recurrent connections, the existing high-dimensional dynamics allow a re-coding of language structure into a space where regularities are represented, and can be learned in a linear readout.
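The core principle here, that a fixed non-linear projection into a high-dimensional space makes otherwise inseparable patterns learnable by a linear readout, can be illustrated in a few lines. This sketch deliberately omits the temporal (recurrent) dimension and uses a static random expansion on the classic XOR problem; it illustrates only the projection-plus-linear-readout principle, not the reservoir's sequential dynamics.

```python
import numpy as np

rng = np.random.default_rng(2)

# XOR of two binary inputs is not linearly separable in the raw
# input space; a fixed random non-linear expansion makes it so.
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
targets = np.array([0.0, 1.0, 1.0, 0.0])

n_hidden = 50
W_in = rng.uniform(-1, 1, (n_hidden, 2))
bias = rng.uniform(-1, 1, n_hidden)
states = np.tanh(inputs @ W_in.T + bias)   # fixed, untrained projection

# Only the linear readout is trained, here by ridge regression.
W_out = np.linalg.solve(states.T @ states + 1e-6 * np.eye(n_hidden),
                        states.T @ targets)
pred = (states @ W_out > 0.5).astype(int)
```

The random projection weights are never modified; all learning is confined to the linear readout, mirroring the fixed-reservoir/modifiable-readout division of labor.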

How does this notion of fixed connections in the recurrent network apply to the question of cross-linguistic competence? Our reservoir-based model of sentence production should function for different languages with no changes to the reservoir itself. That is, while the reservoir operates with fixed connectivity, it is in the readouts that language-specific changes are made.

Section snippets

Functional requirements for language production

We thus return to the question of the meaning-to-sentence transformation. In this context we consider that there is a mental model of a situation – an internal representation that can be formed via multiple methods including perception, hearing or reading a description (Johnson-Laird, 1980, Johnson-Laird, 2004, Johnson-Laird, 2010), and that based on diverse attentional and discourse effects, a particular construal of that mental model is generated. The construal takes a specific perspective on

Corticostriatal models for sentence comprehension and production

We have previously developed a model of sentence comprehension that takes sentences as inputs, and generates as output the thematic roles of the open class elements, and the corresponding focus hierarchy. This model is illustrated in Fig. 1.

We demonstrated that the model can reliably learn how patterns of closed class words in a sentence correspond to the thematic role labeling of the open class words in the sentence. Thus, as illustrated in Fig. 1, after training, when the model is given a new

Experiment 1: Validation of model and parameter sensitivity

We first test the model with 500 neurons on a representative corpus of 10 different constructions from the 462 corpus. The model is trained and tested on the same constructions. Two key parameters control the temporal dynamics in the reservoir. The reservoir neuron leak rate specifies the rate at which activity in the individual recurrent neurons decays in time. The spectral radius is a more global property of the reservoir connection matrix that determines the effective time constant of the
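The two dynamical parameters just described, the leak rate and the spectral radius, enter a leaky-integrator reservoir update as follows. This is a generic sketch of the standard reservoir computing formulation; the numerical values shown are placeholders, not the parameter values used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
W = rng.uniform(-0.5, 0.5, (n, n))

# Spectral radius: a global property of the recurrent matrix, set by
# rescaling it so the largest eigenvalue modulus equals a target value.
rho = 0.8                                    # placeholder value
W *= rho / max(abs(np.linalg.eigvals(W)))

# Leak rate: the per-neuron decay rate of activity over time.
leak = 0.3                                   # placeholder value

def update(x, external_drive):
    """One leaky-integrator step: old activity decays at the leak
    rate, while new activity enters through the non-linearity."""
    return (1 - leak) * x + leak * np.tanh(external_drive + W @ x)
```

A smaller leak rate makes individual neurons integrate over longer time windows, while the spectral radius governs how long input-driven perturbations persist in the recurrent dynamics as a whole.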

Experiment 2.1

We first tested a larger corpus of 26 English constructions (Appendix B) that had been used in Hinaut and Dominey (2013). The training and testing set were the same. The experiments were performed with 10 instances that differ only in the seed value used to assign the random weights. With 1500 neurons the model learns to produce these sentences with no errors. In order to assess if there are systematic performance differences for different construction types, we examined partially “degraded”

Experiment 3: Generalization based on corpus structure

If the model is able to learn these grammatical constructions for language production, we must ask about the nature of what is being learned. We can formulate two potential hypotheses about what is being learned.

Experiment 4.1 learning

We further tested the system with progressively larger reservoirs. Because of the high computational requirements of these larger reservoirs, we were required to use high-performance computing resources, including parallel computing on the French National Institute of Nuclear and Particle Physics (IN2P3) computing grid. Likewise, in order to accelerate matrix computations involved in the training, we also benefitted from GPU matrix manipulation with CUDA, using the NVIDIA Corporation Tesla K40

Experiments with Japanese

Another form of generalization involves testing exactly the same model with a different language. In Dominey et al. (2006) we demonstrated that our grammatical construction model based on the closed class hypothesis could operate equally well on English and Japanese. In Japanese, the grammatical markers -ga, -wo, -ni, -yotte, etc. serve as a form of case marker, and thus allow the model to identify a unique pattern of these markers corresponding to a unique assignment of meaning. Here we

Experiment 5.1 learning

We determined that with a reservoir with 500 neurons with a 24 element corpus, the model was able to correctly process the set of sentences. In Fig. 7 we observe reservoir neuron activation on the left, and readout neuron activation on the right, coding the output sentences. It is of interest to compare this figure to Fig. 3, illustrating the same activation, but in processing English sentences. What is interesting is that the essential aspects of the processing do not differ between the two

Discussion

The objective of this research was to determine if a computational neuroscience model of the corticostriatal system could demonstrate language-learning capabilities for sentence production. The reason behind this objective was twofold: First, the implication of the corticostriatal system in language production has a long and rich history. The development of a neural network based on the corticostriatal system that can demonstrate language production capabilities can help to shed light on the

Acknowledgments

This research was supported by the European Union through FP7 Grant WYSIWYD. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU used for this research, and the French National Institute of Nuclear and Particle Physics (IN2P3) computing grid. We also acknowledge Victor Barres for useful discussion related to the different modes of generalization when randomizing sentences vs. meanings. We particularly thank the editor and three anonymous reviewers

References (89)

  • P.F. Dominey et al.

    Neurological basis of language and sequential cognition: Evidence from simulation, aphasia, and ERP studies

    Brain and Language

    (2003)
  • P.F. Dominey et al.

    Cortico-striatal function in sentence comprehension: Insights from neurophysiology and modeling

    Cortex

    (2009)
  • P.F. Dominey et al.

    Neural network processing of natural language: II. Towards a unified model of corticostriatal function in learning sentence comprehension and non-linguistic sequencing

    Brain and Language

    (2009)
  • J. Elman

    Finding structure in time

    Cognitive Science

    (1990)
  • J.L. Elman

    Learning and development in neural networks: The importance of starting small

    Cognition

    (1993)
  • S. Frisch et al.

    Why the P600 is not just a P300: The role of the basal ganglia

    Clinical Neurophysiology

    (2003)
  • A.E. Goldberg

    Constructions: a new theoretical approach to language

    Trends in Cognitive Sciences

    (2003)
  • J.A. Grahn et al.

    The role of the basal ganglia in learning and memory: Neuropsychological studies

    Behavioural Brain Research

    (2009)
  • O. Hikosaka et al.

    Central mechanisms of motor skill learning

    Current Opinion in Neurobiology

    (2002)
  • J. Hochstadt

    Set-shifting and the on-line processing of relative clauses in Parkinson’s disease: Results from a novel eye-tracking method

    Cortex

    (2009)
  • J. Hochstadt et al.

    The roles of sequencing and verbal working memory in sentence comprehension deficits in Parkinson’s disease

    Brain and Language

    (2006)
  • M. Hoen et al.

    When Broca experiences the Janus syndrome: An ER-fMRI study comparing sentence comprehension and cognitive sequence processing

    Cortex

    (2006)
  • P. Johnson-Laird

    Mental models in cognitive science

    Cognitive Science

    (1980)
  • A. Moro et al.

    Syntax and the brain: Disentangling grammar by selective anomalies

    Neuroimage

    (2001)
  • S.E. Nadeau et al.

    Subcortical aphasia

    Brain and Language

    (1997)
  • A. Roelofs

    A dorsal-pathway account of aphasic language production: The WEAVER++/ARC model

    Cortex

    (2014)
  • D. Sussillo

    Neural circuits as computational dynamical systems

    Current Opinion in Neurobiology

    (2014)
  • M. Takac et al.

    Mapping sensorimotor sequences to word sequences: A connectionist model of language acquisition and sentence generation

    Cognition

    (2012)
  • M. Tomasello

    The item-based nature of children’s early syntactic development

    Trends in Cognitive Sciences

    (2000)
  • M.T. Ullman

    Contributions of memory circuits to language: The declarative/procedural model

    Cognition

    (2004)
  • A. Wray et al.

    The functions of formulaic language: An integrated model

    Language & Communication

    (2000)
  • G.E. Alexander et al.

    Parallel organization of functionally segregated circuits linking basal ganglia and cortex

    Annual Review of Neuroscience

    (1986)
  • L.J. Altmann et al.

    High-level language production in Parkinson’s disease: A review

    Parkinson’s Disease

    (2011)
  • P. Barone et al.

    Prefrontal cortex and spatial sequencing in macaque monkey

    Experimental Brain Research

    (1989)
  • E. Bates et al.

    Competition, variation, and language learning

  • D.V. Buonomano et al.

    Cortical plasticity: From synapses to maps

    Annual Review of Neuroscience

    (1998)
  • F. Chang

    Symbolically speaking: A connectionist model of sentence production

    Cognitive Science

    (2002)
  • A. Cleeremans et al.

    Learning the structure of event sequences

    Journal of Experimental Psychology: General

    (1991)
  • G.S. Dell et al.

    The P-chain: Relating sentence production and its disorders to comprehension and acquisition

    Philosophical Transactions of the Royal Society B: Biological Sciences

    (2014)
  • P.F. Dominey

    Complex sensory-motor sequence learning based on recurrent state representation and reinforcement learning

    Biological Cybernetics

    (1995)
  • P.F. Dominey

    A model of learning syntactic comprehension for natural and artificial grammars

    (2002)
  • P.F. Dominey

    A conceptuocentric shift in the characterization of language: Comment on Jackendoff

    Behavioral and Brain Sciences

    (2003)
  • P.F. Dominey

    Aspects of descriptive, referential, and information structure in phrasal semantics: A construction-based model

    Interaction Studies

    (2005)
  • P.F. Dominey

    Recurrent temporal networks and language acquisition-from corticostriatal neurophysiology to reservoir computing

    Frontiers in Psychology

    (2013)