Remembering the “When”: Hebbian Memory Models for the Time of Past Events

Humans and animals can remember how long ago specific events happened. In contrast to interval timing on the order of seconds and minutes, little is known about the neural mechanisms that enable remembering the "when" of autobiographical memories stored in the episodic memory system. Based on a systematic exploration of neural coding, association and retrieval schemes, we develop a family of hypotheses about the reconstruction of the time of past events, consistent with Hebbian plasticity in neural networks. We compare several plausible candidate mechanisms in simulated experiments and propose how combined behavioral and physiological experiments can be used to pin down the actual neural implementation of the memory for the time of past events.


Introduction
Humans and animals track temporal information on multiple timescales, to estimate, for example, the location of a sound source based on millisecond time differences of sound arrival at the two ears, the interval duration between the perception of lightning and thunder, or the days, months and years that have elapsed since an autobiographical event took place (Carr and Konishi, 1990; Gerstner et al., 1996; Grothe et al., 2010; Buhusi and Meck, 2005; Addyman et al., 2016; Paton and Buonomano, 2018; Issa et al., 2020; Tsao et al., 2022). For autobiographical memories, recall of the "when" information is often an explicit and conscious reconstruction-based process (Friedman, 1993), for example: "we went to Turkey the year my sister got married; she is five years older than me, got married at the age of 30, and I am now 37 years old, so this must have been 12 years ago." However, even without explicit reconstruction, healthy human adults usually have a good sense of whether a recalled event happened yesterday, a year ago or decades ago, and there is evidence for automatic processes, in particular in young infants without a fully developed episodic memory system (Friedman, 2013; Pathman et al., 2013; Jelbert and Clayton, 2017). Also corvids, rodents, and other species with an episodic-like "what-where-when" memory can remember the time of past events on timescales of days to months (Clayton and Dickinson, 1998; Jelbert and Clayton, 2017).
Whereas on timescales from milliseconds to minutes multiple mechanisms based on changing neuronal activity patterns are known to support accurate interval timing (Buhusi and Meck, 2005; Addyman et al., 2016; Paton and Buonomano, 2018; Issa et al., 2020; Tsao et al., 2022), little is known about the neuronal mechanisms that support the recall of the time of past events on much longer timescales. Memories on these timescales are likely to rely on long-term synaptic plasticity (Abraham et al., 2019) and possibly on systems consolidation (Moscovitch and Gilboa, 2021).
Multiple research communities have developed models of episodic memory to explain recall of past events (Norman et al., 2008). These models focus on different aspects, like replicating data from laboratory-based memory tasks, such as free or serial recall of lists (reviewed in Sederberg and Darby (2023)), embedding episodic memory in cognitive architectures (reviewed in Laird et al. (2017)), developing attractor neural networks consistent with anatomical and physiological knowledge of the hippocampal formation (reviewed in Rolls and Treves (2024)), or explaining systems consolidation (reviewed in Moscovitch and Gilboa (2021)). Although attempts at categorizing different theories of the time of past events have been made (Friedman, 1993), there exists no systematic exploration of computational models that focus on the time of past events.
Here, we study from a theoretical perspective different neural mechanisms that enable automatic reconstruction of the time of past events on long timescales. Whereas our theoretical considerations about the representation of temporal information are relevant for tracking time on any scale, we focus, in particular, on settings used to investigate episodic-like memory, where a stream of sensory inputs on a timescale of days or months is perceived by an organism that can recall past events and respond with some actions (Figure 1). As an example of the abstract setting described in Figure 1, one may think of experiments where food-caching animals learn to retrieve from caches they made the same day and ignore caches they made a few days ago (Clayton and Dickinson, 1998) or experiments with children who are asked for recency judgments of personally experienced events that took place a few months ago (Pathman et al., 2013). For memory tasks on this timescale, it is unlikely that some kind of persistent neural activity bridges the gap between storage and recall. Instead, it is commonly believed that memories are stored over long timescales in an activity-silent way: synaptic connections are changed through a Hebbian plasticity mechanism (Hebb, 1949; Abraham et al., 2019) at the moment when an event is experienced, and the memory is later retrieved by stimulating neurons upstream of these changed connections, which turns the dormant memories back into neural activity patterns. Our goal is to explore possible Hebbian mechanisms that allow decoding of the age of such dormant memories at the moment of retrieval. As the neural mechanisms underlying the retrieval of the age of dormant memories are largely unknown and may vary across different species, we do not want to limit ourselves, a priori, to specific classes of models; rather, we aim at spanning the space of possible mechanisms. We discuss in detail some representative examples and propose, based on simulations, specific behavioral experiments that allow distinguishing between different model classes. Importantly, and going beyond behavioral predictions, we also show that detailed neural and synaptic recordings are needed to identify the exact mechanisms that allow retrieval of the "when" of past events.

Representing Information: the Space of Possible Codes
The activity in a network of neurons can represent a memory in multiple ways. For simplicity, we assume that the "what", the "where" and the "when" of each memory are elements of discrete sets, like the sets of colors, shapes and time points in Figure 1. The value of such discrete variables can be represented with (i) the firing rate of a neuron (rate), (ii) the identity of an active neuron within a group of neurons (onehot) or (iii) the distributed activity pattern in a group of neurons (distr; see Figure 2A and section "Formal Description of Codes"). For example, in a rate code of color, the sight of red and blue objects evokes different activity levels in the same neuron, whereas in a one-hot code, different neurons are tuned to different colors. A strict rate code with a single neuron or a strict one-hot code, where a given stimulus feature activates a single neuron, are idealizations that are unlikely to be found in any brain. Instead, stimulus features may be represented by a distributed code, where multiple neurons become active when perceiving the color "red", for example. However, certain distributed codes can be reduced to rate or one-hot codes by summing the activity of subsets of neurons. Trivial examples are redundant rate or one-hot codes with groups of identical neurons. Another example is the population rate code (poprate), where the number of active neurons encodes the value of a variable. We use the term "distributed code" only when such a reduction by summation is impossible.

Figure 1 (caption): We consider discrete sensory streams, like perceiving a red triangle at time t_1 and a blue square at time t_2. In all illustrations we use color, shape and time as abstract analogies of the "what", "where" and "when" of specific events, respectively. The time points t_1, t_2, … are not necessarily equally spaced and may be separated by hours or days. At time t_3, the white triangle first triggers activity in the brain that corresponds to the perception of a white triangle and, second, activity that corresponds to remembering the red triangle, including the information of how long ago the red triangle was perceived. An action is performed upon memory retrieval, like saying "red, two". In the next time step, a new stimulus can be given, together with a reinforcement signal (clapping hands). Our goal is to find neural network dynamics and synaptic plasticity rules that change the connections between neurons (red lines) such that the sensory stream can be remembered, and action selection rules that depend on the recalled event can be learned.
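As an illustration, the four encoding schemes can be sketched in a few lines of Python (a toy sketch; the function names, pattern sizes and the particular random-pattern construction for the distributed code are our own assumptions, not the paper's formal definitions from section "Formal Description of Codes"):

```python
import numpy as np

def rate_code(value, n_values):
    # (i) rate code: a single neuron whose activity level encodes the value
    return np.array([(value + 1) / n_values])

def onehot_code(value, n_values):
    # (ii) one-hot code: the identity of the single active neuron encodes the value
    x = np.zeros(n_values)
    x[value] = 1.0
    return x

def poprate_code(value, n_neurons):
    # population rate code: the *number* of active neurons encodes the value
    x = np.zeros(n_neurons)
    x[:value + 1] = 1.0
    return x

def distributed_code(value, n_neurons, seed=0):
    # (iii) distributed code: a fixed, value-specific random activity pattern
    rng = np.random.default_rng(seed + value)
    return (rng.random(n_neurons) < 0.5).astype(float)
```

Note that summing the units of the poprate pattern recovers a rate code, whereas the distributed pattern admits no such reduction in general.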
Representing Time: Timestamp and Age Codes, Internal and External Zeitgeber

Information about the "when" of an event can be represented by any code discussed above, as soon as a reference point for measuring time is defined. We distinguish timestamp and age representations (Figure 2B). In timestamp representations, time is measured relative to a fixed reference point in time. The fixed reference point could be the birth of an individual, and the "when" information of an event could be represented as "5 months since birth". In timestamp representations of time, the neural activity representing the "when" information during recall of a given event is always the same, no matter when recall happens; this neural activity code can thus be seen as representing a timestamp attached to each memory. Importantly, we do not assume that this timestamp representation encodes literally the date and time of an event. In fact, any neural activity pattern can be a timestamp, if it allows inferring the time of a given event and does not change with the age of the memory. In contrast, in age representations, time is measured relative to changing moments in time. The changing reference point could be the current moment in time, and the "when" information of an event could be represented as "8 months ago". In contrast to timestamp representations of time, age coding implies that the neural activity during recall of a given event is not the same at different moments, because the "when" information depends on how much time has elapsed between storage and recall; this neural activity code can thus be seen as representing the age of each memory. An example of an age code is shown in Figure 2B, where the elapsed time between storage and recall is represented by the location of the activity peak.
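The distinction can be sketched with toy one-hot "when" patterns (a minimal illustration, not a claim about the actual neural code; the function names and the 10-unit pattern size are our assumptions):

```python
import numpy as np

def onehot(i, n):
    x = np.zeros(n)
    x[i] = 1.0
    return x

def timestamp_pattern(event_time, now, n=10):
    # timestamp code: the recalled "when" pattern is identical at every
    # recall time; it acts like a fixed tag attached to the memory
    return onehot(event_time, n)

def age_pattern(event_time, now, n=10):
    # age code: the location of the activity peak shifts as the memory
    # grows older (cf. Figure 2B)
    return onehot(now - event_time, n)
```

Recalling the same event at two different times yields identical timestamp patterns but different age patterns.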
We use the term zeitgeber to refer to the process that either generates timestamps or changes the age code. Unlike a clock, which is synchronized with physical time, a zeitgeber may drive the representation of time with variable speed that depends, for example, on the frequency of events that are worth memorizing. The zeitgeber can either be an internal process that runs almost autonomously inside the time-perceiving agent, or it can depend mostly on the agent's interaction with the external world. Internal zeitgebers can be any biological process inside the agent with a fairly stable time constant, such as ramping or decreasing synaptic strengths, neurogenesis, spine turnover, the circadian rhythm (in the absence of exposure to sunlight) or even changes in satiety, thirst or tiredness level. Examples of external zeitgebers are processes like the ticking of a clock, the day-night cycle or the change of seasons; also processes that involve the agent's actions, like changes of context (leaving or entering a house) or changes of the main activity (switching from working to eating lunch), could act as external zeitgebers.
Associating Information: How to Combine What, Where and When

For the ability to remember everything about a given event, the "what", "where" and "when" information needs to be associated in some way. In the following, we assume the "what" and the "where" are given as a content variable in some code and focus exclusively on how the content is associated with the "when" information. The question of associating the "what" (and "where") with the "when" becomes the question of building an "association function" that produces a neural activity pattern in response to the "when" information on one side and the encoded content ("what" and "where") information on the other side (see section "Formal Description of Association Schemes").
The number of possibilities to write down such an association function is huge, even if we restrict ourselves to those functions that do not "lose" any information, in the sense that the content and "when" information can be faithfully reconstructed from the momentary neural activity pattern. In the following we focus on three specific examples of association functions: concatenation, (outer) product and random projection codes (Figure 2C).
In a concatenation code, neurons can be split into two separate groups: one representing the "when" information and the other one the content information (⊕ in Figure 2C). Closely related is a linear mixed code, where such a split is not directly possible, because single neurons contribute to the representation of both content and "when" information, but a linear transformation of the neural population activity would allow representing the content and "when" information in a concatenation code. An example of a non-linear mixed code is the product code (⊗ in Figure 2C), where the neural population activity is given by the outer product of the content and the "when" code. This product code is a special case of tensor product variable binding (Smolensky, 1990). Such a product code requires, in general, more neurons than a concatenation code: if content and "when" information could be represented separately by n and m neurons, respectively, their association with a product code requires n × m neurons, whereas n + m neurons would be sufficient for a concatenation code. Other non-linear mixed codes can be constructed with (circular) convolutions (Kelly et al., 2013) or random projections (RP in Figure 2C), where the activity of each neuron in a group depends non-linearly on a randomly weighted mixture of content and "when" information.
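The three association schemes can be contrasted by their dimensionality in a short sketch (the toy sizes n = 4 content neurons and m = 3 "when" neurons are our assumptions; `tanh` stands in for an arbitrary neuronal nonlinearity):

```python
import numpy as np

rng = np.random.default_rng(0)
content = rng.random(4)   # "what"/"where" pattern with n = 4 neurons
when = rng.random(3)      # "when" pattern with m = 3 neurons

# concatenation (⊕): n + m neurons, content and "when" in separate subgroups
concat = np.concatenate([content, when])

# (outer) product (⊗): n × m neurons, one neuron per content/"when" pair
product = np.outer(content, when).ravel()

# random projection (RP): each neuron responds nonlinearly to a randomly
# weighted mixture of content and "when" information
W = rng.standard_normal((20, content.size + when.size))
rp = np.tanh(W @ concat)
```

The shapes make the neuron-count argument explicit: 4 + 3 = 7 neurons for concatenation versus 4 × 3 = 12 for the product code, while the random projection dimension (here 20) is a free choice.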
The way a neuronal population represents the association of content and "when" information has important implications for the readout of retrieved memories, as we will discuss in the next section and demonstrate in the section "Simulations".

Storage, Retrieval and Readout of Memories
So far, we have considered only the representation of information. However, the description of a memory system is incomplete without a characterization of the storage, retrieval and readout mechanisms.

Retrieval
In the field of neural networks, memory retrieval is typically implemented with hetero- or auto-associative memories (Figure 2D). In both hetero- and auto-associative networks, the output activity of a neural network in response to an input cue represents the retrieved memory. In a hetero-associative memory, retrieval is performed in a single step, whereas a recurrent auto-associative network requires convergence to a fixed point (Amit, 1989). However, a single update step is often sufficient to almost reach the fixed point and retrieve a memory almost perfectly, in particular in kernel memory networks (Iatropoulos et al., 2022).
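A minimal sketch of auto-associative retrieval in the spirit of Hopfield-type networks (the network size, number of patterns and corruption level are illustrative assumptions) shows why a single update step can already recover a stored pattern from a corrupted cue:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
patterns = rng.choice([-1.0, 1.0], size=(3, n))   # three stored memories

# Hebbian storage: outer-product learning rule, no self-connections
W = patterns.T @ patterns / n
np.fill_diagonal(W, 0.0)

# corrupt 20% of the first memory and retrieve with a single update step
cue = patterns[0].copy()
cue[:10] *= -1
retrieved = np.sign(W @ cue)
overlap = np.mean(retrieved == patterns[0])
```

Well below the storage capacity, the recurrent input aligned with the stored pattern dominates the crosstalk from the other memories, so one step suffices for near-perfect retrieval.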

Storage and Retrieval Phases
Most models require separate storage and retrieval phases. Suppose "red triangle" has already been stored at time step t_1. At time t_3, the input "white triangle" triggers recall of the stored memory, i.e. the neural code for the "when" information t_1, together with the content information "triangle" and "red", should be accessible at time t_3 and become active while the remembered event "red triangle" is retrieved from memory (retrieval phase). At the same time step t_3, however, it must also be possible to store the new event "white triangle" as an event that happens at time t_3 (storage phase). Separate storage and retrieval phases could be implemented by a periodic process in which external input drives the memory network during the storage phase and recurrent connectivity in the memory network dominates during the retrieval phase. The theta rhythm in the hippocampus, or some modulatory factors like neurotransmitters, could drive such a periodic process (Hasselmo et al., 2002).
Alternatively, the separation into storage and retrieval phases could be implemented with synaptic delays: if multiple pathways exist between two groups of neurons, for example the "input-content" pathway and the "input-intermediate-content" pathway in Figure 3, and if information travels at different speeds along the different pathways, then stimulus-driven activation through one pathway could be used for storage ("input-content" pathway in Figure 3) and through the other pathway for retrieval ("input-intermediate-content" pathway in Figure 3). In the simulated models in section "Simulations" we use this mechanism to distinguish storage and retrieval phases.
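The delay-based separation can be caricatured as a scheduling problem (the pathway names follow Figure 3; the specific delay values in time steps are our assumption for illustration):

```python
# Two pathways from the same input with different total synaptic delays
# (in time steps): the short pathway arrives first and serves storage,
# the longer pathway arrives later and serves retrieval.
DELAYS = {"input-content": 1, "input-intermediate-content": 2}

def arrival_schedule(t_input, delays):
    # returns (arrival time step, pathway name) pairs sorted by arrival time
    return sorted((t_input + d, name) for name, d in delays.items())

schedule = arrival_schedule(0, DELAYS)
# the storage-related signal reliably precedes the retrieval-related one
```

Because chemical synaptic transmission adds a delay per synapse, the longer pathway always arrives after the shorter one, which yields the fixed storage-then-retrieval ordering within each time step.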

Behavioral Readout
Once a previously stored activity pattern is retrieved, it can trigger some behavioral output. In contrast to computer memory, where recall success can be measured by the number of bits lost between storage and recall, successful retrieval of a memory in humans and animals is usually inferred from some behavioral output, which may have a different representation than the input that led to the formation of the memory (e.g. visual input and vocal output in Figure 1). Therefore, we must include a discussion of the action selection that subjects perform in response to retrieved memories.
Behavioral rules control which action to perform in response to a specific retrieved memory.
Whereas learning is often irrelevant for memory tasks with humans, as the behavioral rule is usually instructed, it is crucial for memory experiments with animals. How easily different behavioral rules can be learned depends on the representation of recalled memories. This can be used to design experiments that discriminate between different kinds of "what-where-when" memory systems, as we will show in section "Simulations". For a strict one-hot code, for example the product of one-hot codes onehot(content) ⊗ onehot(age), any behavioral rule that maps content and age of a recalled event to a given action can be learned with direct readout, i.e. plastic connections from the layer of recalled activity to action neurons (Figure 2E). For distributed or rate coding, direct readout allows learning of some rules, but complex readout is needed to learn any rule. For example, the notorious XOR rule (Hertz et al., 1991), where a certain action is taken if and only if two input neurons are jointly active or jointly inactive, cannot be learned with direct readout, but it can be learned with a multilayer perceptron (Figure 2E). Learning all connections in a multilayer perceptron can be achieved with the backpropagation algorithm or biologically plausible variants thereof (Lillicrap et al., 2016; Illing et al., 2019; Roelfsema and Ooyen, 2005), but it is rather slow if learning happens in an online fashion, where each example is used just once. An alternative is to rely on fixed weights in most layers to transform the input into a useful feature representation and to quickly learn flexible mappings with biologically plausible Hebbian plasticity in the last layer (Figure 2E).

Figure 3 (caption): Signals take more time to travel along long pathways with multiple intervening synapses than along short pathways, because signal transmission across chemical synapses takes time. Therefore, multiple pathways of different lengths between two groups of neurons can be used to separate storage and retrieval phases. During storage, the input drives the activity in the intermediate layer; the content layer receives input through the direct "input-content" pathway (indicated by yellow arrows). Connections between the intermediate and the content layer (red dashed arrows) are selected for growth with a Hebbian plasticity rule. Shortly thereafter, because of the additional synaptic delays along the longer pathway, the content neurons receive input through the "input-intermediate-content" pathway, i.e. the intermediate layer is the main input of the content layer (yellow arrow). Already grown connections (red arrows) enable recall of previous events. The actual growth of the connections selected in the storage phase (dashed red arrows) is not instantaneous and does therefore not interfere with the recall phase. Once the content neurons have received input through the "input-intermediate-content" pathway, the content layer drives action selection through weights that implement some learned rule (orange arrows).

Figure 4 (caption): A The events "red triangle" and "blue square" were observed at times t_1 and t_2, respectively. At time t_3 the event "white triangle" is observed. B During storage, the current context (activity in layer "now") drives the "tag" neurons (yellow arrow) such that content and context can be bound together (dotted red lines). C Subsequently, the previously grown synaptic connections (red lines) allow auto-associative recall of the event "red triangle". During recall, the tag neurons are no longer driven by input from the "now" neurons; instead, through auto-associative recall (yellow arrow), the activity of the tag neurons encodes the context at time t_1. The comparison of the current context (represented by the "now" neurons) with the recalled context (represented by the "tag" neurons) allows the readout network to estimate the age of the retrieved memory. Consequently, behavioral rules that depend on the age of the recalled memory can be learned (orange weights).
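The claim about direct versus complex readout of the XOR rule can be checked numerically (a sketch with assumed sizes: 30 fixed random hidden units, and least-squares fitting of the last layer as a stand-in for fast Hebbian-like readout learning):

```python
import numpy as np

# XOR-like rule: act (label +1) iff the two input neurons are jointly
# active or jointly inactive
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([1.0, -1.0, -1.0, 1.0])

def fit_linear(features, labels):
    # least-squares readout on features augmented with a bias unit
    A = np.c_[features, np.ones(len(features))]
    w = np.linalg.lstsq(A, labels, rcond=None)[0]
    return np.sign(A @ w)

# direct readout: a linear map from the inputs themselves fails on XOR
acc_direct = np.mean(fit_linear(X, y) == y)

# complex readout: fixed random hidden layer, only the last layer is learned
rng = np.random.default_rng(0)
H = np.tanh(X @ rng.standard_normal((2, 30)) + rng.standard_normal(30))
acc_hidden = np.mean(fit_linear(H, y) == y)
```

With the fixed random expansion, the four input patterns become linearly separable, so learning only the last layer suffices; the direct linear readout cannot fit all four conditions.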

Synaptic Plasticity
Long-term synaptic changes are presumably involved in memorization during the storage phase, in learning actions that indicate successful retrieval, and in reflecting the passage of time in age representations of time.
If pre- and postsynaptic neurons are jointly active during the storage phase, Hebbian synaptic plasticity is sufficient to memorize and generate a trace of the event in the memory system. Behavioral rules that depend on the content and the age of recalled memories could be learned with neoHebbian synaptic plasticity (Lisman et al., 2011; Gerstner et al., 2018; Roelfsema and Holtmaat, 2018; Kuśmierz et al., 2017; Magee and Grienberger, 2020), where jointly active neurons generate an eligibility trace that is modulated by a subsequent signal. This modulating signal could communicate the reward received after a successful action.
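A minimal neoHebbian update might look as follows (the decay constant, learning rate and function name are illustrative assumptions of ours; the modulator plays the role of the reward signal):

```python
import numpy as np

def neohebbian_step(w, e, pre, post, modulator, eta=0.1, decay=0.9):
    # joint pre/post activity writes a decaying eligibility trace at each
    # synapse; weights change only when a modulating signal (e.g. reward)
    # arrives while the trace is still nonzero
    e = decay * e + np.outer(post, pre)
    w = w + eta * modulator * e
    return w, e

pre = np.array([1.0, 0.0])
post = np.array([0.0, 1.0])
w = np.zeros((2, 2))
e = np.zeros((2, 2))

w, e = neohebbian_step(w, e, pre, post, modulator=0.0)       # no reward yet
w_after, _ = neohebbian_step(w, e, pre, post, modulator=1.0)  # reward arrives
```

Without the modulator, joint activity only marks the synapse; the actual weight change is gated by the later reward signal, which is what allows learning of reward-dependent behavioral rules.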
For age representations of time, synaptic growth or decay could reflect the passage of time.
Examples of how this could be achieved are discussed in the next section.
For timestamp representations of time, the synaptic changes for memorization and behavioral learning are sufficient. Although this is an appealing advantage of timestamp representations of time, it comes at the cost of an increased complexity to compute the age of recalled memories, because a representation of the current moment in time needs to be compared with the time of storage of the recalled event.

Examples of Episodic-Like Memory Systems
With four different encoding schemes (Figure 2A), two different ways of representing time (Figure 2B) and the association, storage-retrieval and readout mechanisms discussed above (Figure 2C-E), one arrives combinatorially at 288 possible "what-where-when" memory systems. This is a lower bound, because even more association, storage-retrieval and readout mechanisms are conceivable. The number 288 looks daunting. However, in this section we discuss in more detail six specific examples that are representative of the different kinds of models (Table 1). Detailed mathematical descriptions of these models can be found in section "Mathematical Description of the Models".
Timestamp Tagging with Auto-Associative Retrieval and Complex Readout

Time tagging models are characterized by a concatenation code that combines one subgroup of neurons that represents content information ("content" in Figure 4) with another subgroup of neurons that represents "when" information ("tag" in Figure 4).
In the spirit of (temporal and random) context models (Howard and Kahana, 2002; Polyn et al., 2009; Howard, 2022), the "when" information could be given implicitly by activity patterns that encode context ("now" in Figure 4). This context could be a trace of recent observations of states that change on different timescales, like emotional states, the presence of certain conspecifics, ambient temperature or the weather. In the following, we call this the "Context-Tagging model". At the moment of storage, this context information is bound together with the specific event under consideration, e.g. by a Hebbian plasticity rule (Figure 4B). If an event triggers the recall of an earlier event (Figure 4C), the associated context, i.e. the "when" information, is also recalled.

Figure caption (displaced): These synaptic connections are pruned after random, postsynaptic-neuron-specific durations, such that during recall at time t_2 more neurons are activated in the content layer than during recall at time t_3 > t_2. This is an example of a non-linear mixed code of content and time, because the activity of a given neuron in the content layer can mean, for example, "a red triangle was observed at most so-and-so long ago". The neurons in the content layer link directly to action neurons (orange arrows).

Age Tagging Models
Other tagging models can be constructed with one-hot coding or population-rate coding for the age of memories (Figure 2A). In the Onehot-Age-Tagging model, storage leads to the formation of a synaptic connection to the first tag neuron (Figure 5B). It is hypothesised that specific circuits allow changing the representation of the memory by growing new synapses and pruning old ones (Roxin and Fusi, 2013; Remme et al., 2021). Such a mechanism could implement a one-hot time code, where the identity of the activated tag neuron during recall indicates the age of the memory (Figure 5C).
In the Poprate-Age-Tagging model, storage leads to the formation of many synaptic connections to several "tag" neurons (Figure 5D). These connections are pruned at different moments in time.
Therefore, the number of activated tag neurons during recall is indicative of the elapsed duration between storage and recall (Figure 5E).
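A sketch of the Poprate-Age-Tagging idea (the exponentially distributed synapse lifetimes, their mean, and the tag count of 100 are our assumptions, not the paper's model specification):

```python
import numpy as np

rng = np.random.default_rng(2)
n_tags = 100
# each tag connection formed at storage survives for a random duration
lifetimes = rng.exponential(scale=10.0, size=n_tags)

def n_active_tags(age):
    # population-rate age code: the count of still-connected tag neurons
    # that can be reactivated during recall shrinks as the memory ages
    return int(np.sum(lifetimes > age))

counts = [n_active_tags(age) for age in (0, 5, 20)]
```

A downstream readout that only needs the memory's age can thus rely on a simple count of active tag neurons rather than on their identities.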
In contrast to the Context-Tagging model with a timestamp code (Figure 4), the representation of time changes in the Onehot-Age-Tagging model and the Poprate-Age-Tagging model, because the activity pattern of the tag neurons during retrieval depends on the elapsed time since storage.
Age tagging allows simple readout learning of age-dependent behavioral rules, in particular when the order of pruning synaptic connections is fixed, i.e. whenever it is possible to order pairs of tag neurons i and j such that connections to tag neuron i are consistently lost earlier than simultaneously grown connections to tag neuron j. One can even prove (see section "Equivalence of One-Hot Coding and Deterministic Population-Rate Coding") that this population rate code leads to the same action selection policy as a model with one-hot coding of "when" information, if the readout connections follow a special synaptic plasticity rule.
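The intuition behind this equivalence can be illustrated with a linear conversion from a deterministic population-rate count to a one-hot pattern (a toy construction of ours, not the proof referenced above):

```python
import numpy as np

def poprate_to_onehot(count, n_max):
    # with a fixed pruning order, the condition "count == k" is linearly
    # computable: unit k is excited by (count >= k) and inhibited by
    # (count >= k + 1), a simple subtractive, lateral-inhibition-like scheme
    ge = np.array([1.0 if count >= k else 0.0 for k in range(1, n_max + 2)])
    return ge[:-1] - ge[1:]
```

Since the count determines the one-hot pattern through fixed linear operations, any policy learnable on the one-hot code is also implementable on top of the deterministic population-rate code.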

Age Organization with Systems Consolidation
Instead of using concatenation, as in tagging models, the content and "when" information could be associated with the (outer) product operation (⊗ in Figure 2). This leads to a class of models with a chronological organization of memories (Friedman, 1993), also known as a shift register in engineering (Howard et al., 2015).
The chronological organization is most obvious when arbitrarily encoded content information and one-hot encoded age information are associated with the product operation. We call this the Age-Organization model. In this case, the configuration of active neurons during recall of a given event depends on the time of recall (Figure 6), similarly to how suitcases on a conveyor belt change their position relative to a fixed observation point. Because of the product operation, there are multiple groups of neurons that code for content (e.g. the groups Δ0, Δ1, Δ2 in Figure 6A), but the neurons in only one of these groups become active during recall of a specific event. The identity of the active group encodes implicitly the age of the memory: if recall happens some time interval Δ1 after storage, the content neurons in group Δ1 become active, whereas the neurons in other groups become active during recall at other times (Figure 6A).
Such an age code requires rewiring of synaptic connections. Similarly to the Onehot-Age-Tagging model, this could be mediated by a systems consolidation process, where new synapses are grown to groups of neurons that code for older memories. For example, in consolidation phases during sleep, randomly activated input neurons could trigger recall of past events in neurons connected to the input by an indirect pathway, thereby allowing direct-pathway connections to be learned (Figure 6A, cf. parallel pathway theory, Remme et al. (2021)). As a result of synaptic plasticity, the location of the memorized event inside the memory system shifts forward over time (Figure 6A), inducing a "chronological" organization of memories.
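The conveyor-belt picture can be sketched as a shift register (a toy sketch; the group size, the number of groups and the interpretation of one function call as one consolidation phase are our assumptions):

```python
import numpy as np

def consolidate(groups):
    # one consolidation phase shifts every stored pattern one group forward,
    # like suitcases moving along a conveyor belt; group Δ0 is freed up
    # for newly stored memories
    return [np.zeros_like(groups[0])] + groups[:-1]

content = np.array([1.0, 0.0, 1.0])           # pattern for "red triangle"
groups = [content, np.zeros(3), np.zeros(3)]  # freshly stored in group Δ0
for _ in range(2):
    groups = consolidate(groups)
# after two consolidation phases the pattern sits in group Δ2:
# the identity of the active group encodes the memory's age
```

The content pattern itself is unchanged; only its location within the memory system carries the "when" information.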

Age Organization with Synaptic Pruning
Another instantiation of a chronological organization model arises when considering the product between content and population-rate encoded "when" information (Figure 6B). This model is similar to the Poprate-Age-Tagging model. In both models, many synapses are grown during storage (storage at t_1 in Figure 6B) and pruned at different moments in time, such that the age of a recalled memory can be decoded from the identities or numbers of active neurons during recall. The chronological organization across the memory system arises from the fact that synapses onto neurons in the first layer of the memory decay more quickly than those in the last layer (Figure 6B).

Sparse Encoding with Random Synaptic Pruning and Simple Readout
Despite the frequent appearance of the special association schemes "concatenation" and "product" in the literature (⊕ and ⊗ in Figure 2C), it is unclear why brains should favor them over other non-linear mixed association schemes.
Combining sparse random projections for association (RP in Figure 2C) with synaptic delays for storage and recall (Figure 3) and synaptic pruning (Figure 5E and Figure 6B) leads to the Random-Pruning model (Figure 7). During storage, the input triggers distributed activity patterns in the intermediate and content neurons. Because of the random projections, these activity patterns encode the input information implicitly, i.e. the neurons in these groups are not necessarily "tuned" to a single feature, like the redness of an object, but a given neuron may specialize to specific combinations of features and become active, for example, only when a red triangle is shown.
During the storage phase, a Hebbian plasticity rule can initiate the growth of synaptic connections between the intermediate layer and the content neurons (storage in Figure 7).
In the model of Figure 7, the sparse random projection code of the content is combined with a population rate code of the "when" information, similarly to the Poprate-Age-Tagging model (Figure 5E) and the chronological organization model with synaptic pruning (Figure 6B): synapses to the content layer are pruned after random durations that depend on the identity of the postsynaptic neuron, such that more neurons become active when recalling a recent event (recall at t_2 in Figure 7) than when recalling an old event (recall at t_3 in Figure 7).
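A sketch combining the two ingredients of the Random-Pruning model (the sparsity level, activation threshold and lifetime distribution are illustrative assumptions of ours):

```python
import numpy as np

rng = np.random.default_rng(3)
n_in, n_content = 8, 200

# sparse random projection: each content neuron pools a random subset of inputs
W_enc = (rng.random((n_content, n_in)) < 0.2).astype(float)

stimulus = rng.random(n_in)
stored = (W_enc @ stimulus > 0.5).astype(float)   # distributed content pattern

# each content neuron keeps its grown synapses for a random duration
lifetimes = rng.exponential(scale=20.0, size=n_content)

def recall(age):
    # older memories reactivate fewer content neurons: the number of
    # surviving neurons carries the "when" information
    return stored * (lifetimes > age)
```

Note that no single content neuron is "tuned" to one feature; the age of the memory is read out from how many of the originally active neurons still respond.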

Simulations
To highlight advantages and disadvantages of different systems and to explore the limitations of purely behavioral experiments as a tool to learn how brains remember the "when" of past events, we simulated different "what-where-when" memory models. For these simulations we assume that the subjects have to learn the behavioral rule through reinforcement learning, as is typically the case in experiments with animals.
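The reward contingency of the age-only task (Figure 9A) can be written as a small function. This is a plain transcription of the rule stated in the caption; the string action labels are just a convenient encoding.

```python
def reward(delta_retrieval, action):
    """Reward contingency of the simulated task (Figure 9A):
    a1 pays +1 for retrieval intervals <= 2, a2 pays +1 for older
    memories; the wrong action is punished with -1."""
    correct = "a1" if delta_retrieval <= 2 else "a2"
    return 1 if action == correct else -1

assert reward(1, "a1") == 1
assert reward(3, "a1") == -1
assert reward(3, "a2") == 1
```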
For the Context-Tagging model we assume a fixed preprocessing to a one-hot intermediate representation of the age of a memory (Figure 4C), which is identical to the one-hot representation of time in the Onehot-Age-Tagging model. Because of their similarities, we do not simulate these two models separately and refer to them as the Context/Onehot-Tagging model.
Although the discrimination of some models requires recordings of neural or synaptic dynamics, purely behavioral experiments can provide valuable insights. Suppose, for example, that a subject has learned how to respond to recalling a memory with a certain content and age, like performing action a 2 when the event "red triangle" is remembered to have happened Δ train ago (Figure 8A, see also section "Protocols of Simulated Experiments"). In a similar task, jays learned to avoid food caches containing crickets that they had cached 4 days earlier (Clayton et al., 2003). If one tests the subject on untrained content-age combinations, for example "blue square" after Δ test (Figure 8A), different representations and associations of content and time make different predictions.
How a model generalizes depends mostly on the overlap of the recalled memories. For the one-hot coded memories in the Age-Organization model, there is no generalization from training to test settings, because distinct neurons are active during the recall of "red triangle Δ train ago" and "blue square Δ test ago" for any Δ test (light blue curve in Figure 8B). In the Context/Onehot-Tagging model, the tags for "red triangle Δ train ago" and "blue square Δ test = Δ train ago" are identical, despite the contents being different, and therefore there is some generalization to other memories of the same age (yellow curve in Figure 8B). Even more generalization occurs with the Poprate-Age-Tagging and the Random-Pruning models, because there is also some overlap in the recalled activity patterns for Δ test ≠ Δ train.
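The link between code overlap and generalization can be made concrete with two hypothetical six-neuron age codes; the code sizes and the exact poprate convention (more active neurons for more recent memories) are illustrative assumptions.

```python
def onehot(age, n=6):
    """One-hot age code: a single dedicated neuron per age."""
    return [1 if i == age else 0 for i in range(n)]

def poprate(age, n=6):
    """Population rate age code: recent memories (small age)
    activate more neurons than old ones (hypothetical convention)."""
    return [1 if i >= age else 0 for i in range(n)]

def overlap(u, v):
    """Dot-product overlap of two activity patterns."""
    return sum(a * b for a, b in zip(u, v))

# One-hot codes of different ages never overlap: no generalization
# from a trained age to any other age.
assert overlap(onehot(2), onehot(3)) == 0
# Population-rate codes of nearby ages share active neurons, so a
# readout trained at one age partially generalizes to other ages.
assert overlap(poprate(2), poprate(3)) > 0
```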
Experiments that probe the learnability of different tasks can provide further evidence in favor of or against specific models. Consider a task where subjects are repeatedly trained to respond with action a 1 if the age of a remembered event is less than some threshold and with action a 2 otherwise (Figure 9A, see also section "Protocols of Simulated Experiments"). Such a task can be learned with all the models considered here, but the Age-Organization model potentially has an advantage, because the one-hot encoding permits faster learning with higher learning rates than other representations (Figure 9B; learning rates for all models are optimized for best final performance). The age of a memory can roughly be estimated by how much the hippocampus is involved in and needed for memory retrieval.
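Why the XOR-like task of Figure 10 caps linear readouts at an expected reward of (+3 − 1)/4 = 0.5 can be checked numerically: with concatenated one-hot features for color and age, as in the Context/Onehot-Tagging model, the summed scores of the two a2-rewarded conditions always equal the summed scores of the two a1-rewarded conditions, so no linear readout can classify all four conditions correctly. The feature layout below is illustrative.

```python
import random

# Reward contingencies of Figure 10A: ("red", 2) and ("blue", 3) reward a2,
# ("red", 3) and ("blue", 2) reward a1 -- an XOR over color and age.
def features(color, age):
    # concatenated one-hot codes for color and age (illustrative layout)
    return [color == "red", color == "blue", age == 2, age == 3]

def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# For any linear readout (w, b), the summed scores of the two a2-rewarded
# conditions equal the summed scores of the two a1-rewarded conditions,
# so at most three of the four conditions can be classified correctly.
for _ in range(1000):
    w = [random.uniform(-1, 1) for _ in range(4)]
    b = random.uniform(-1, 1)
    s_a2 = score(w, b, features("red", 2)) + score(w, b, features("blue", 3))
    s_a1 = score(w, b, features("red", 3)) + score(w, b, features("blue", 2))
    assert abs(s_a2 - s_a1) < 1e-9
```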
Table 1. Examples of episodic-like memory systems. Models above the horizontal line are discussed in the text. Below the horizontal line are two broad classes of episodic-memory models that allow reconstruction of the "when" information of past events, although they do not specifically focus on this aspect.

Discussion
We showed that different choices of neural encoding, reference point of time, content-time associations, retrieval and readout mechanisms lead to a family of models in which the "what", "where" and "when" of events can be stored and retrieved through automatic processes and Hebbian plasticity.
The concrete implementations are idealized "toy" models that illustrate the central ideas succinctly.
The considered neural codes (rate, one-hot, distributed and poprate, Figure 2A) are spatial codes, in the sense that all relevant information about the "what", "where" and "when" of an event is given by the activity pattern of a group of neurons in a single time step. This activity pattern could be, for example, the average firing rates of neurons in a time window of 100 milliseconds. In addition, one could consider spatio-temporal codes, where some information is encoded in the temporal evolution of activity patterns. For example, a single neuron could implement a temporal one-hot code, where the information is encoded by the duration between some fixed reference point in time and a spike (time-to-spike code). For spatio-temporal codes, more sophisticated readout and learning mechanisms than the ones in Figure 2E would be needed to extract information from the temporal evolution of activity patterns. With spatio-temporal codes, the already large lower bound of 288 models (see section "Simulations") would further increase and include models with spatio-temporal retrieval (e.g. Jensen and Lisman (2005)).
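A time-to-spike code can be sketched in two lines; the time units and the reference point t_ref are hypothetical. The point of the sketch is that the encoded value lives in the spike timing, so a readout must observe the temporal evolution rather than a single activity snapshot.

```python
def time_to_spike(value, t_ref=0.0):
    """Temporal one-hot code: one neuron emits a single spike, and the
    encoded value is the delay between the reference point t_ref and
    the spike (hypothetical units)."""
    return t_ref + value

def decode(spike_time, t_ref=0.0):
    """Recover the value from the spike time relative to t_ref."""
    return spike_time - t_ref

assert decode(time_to_spike(7.0)) == 7.0
assert decode(time_to_spike(2.5, t_ref=10.0), t_ref=10.0) == 2.5
```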
Memory storage and learning of new tasks in the proposed models rely on Hebbian and neo-Hebbian synaptic plasticity, for which there is ample experimental evidence (Lisman et al., 2011; Gerstner et al., 2018; Roelfsema and Holtmaat, 2018; Kuśmierz et al., 2017; Magee and Grienberger, 2020). An implementation of age representations of time with a rate code could rely on synapses that decay at different rates on a timescale of days to weeks (Abraham, 2003; Statman et al., 2014).
Rewiring of networks in one-hot or distributed encoding of the age of memories is consistent with the observed phenomena of rewiring of connections (Bennett et al., 2018) and systems memory consolidation (Squire et al., 2015; Moscovitch and Gilboa, 2021) and could be achieved with the hypothesized mechanisms of parallel synaptic pathways (Remme et al., 2021) or memory transfer (Roxin and Fusi, 2013). Although there is experimental evidence for all the synaptic processes needed to implement the above models, further experiments should be done to determine which processes are actually used by different species to remember the time of past events.
Chronological organization models can be generalized to include models where the different groups of content neurons are not just copies of one another, but where the representations of the memory content differ from one group to another. For example, a lossy, age-organized model, similar to the one described in section "Age Organization with Synaptic Pruning", could store the gist of an event in some groups of neurons together with a detailed representation in other groups of neurons, and forget the detailed representation faster than the gist. This could be a simple model of the trace transformation theory (Moscovitch and Gilboa, 2021), which postulates that recall of details requires a functional hippocampus, whereas the gist can be recalled without the hippocampus.
On a conceptual level, multiple theories of the processing of "when" information have been proposed. For example, Friedman (1993) described eight theories: strength, chronological organization, time tagging, contextual overlap, encoding perturbation, associative chaining, reconstruction and order codes. The last four theories of this list are beyond the scope of this article. Our work, however, provides concrete hypotheses for neural implementations of the first four theories and discusses their implications for the readout of temporal and content information.
The richness of biological phenomena in general, and of memory phenomena in particular, in combination with the "theory-ladenness" of observations (Kuhn, 1996), makes it possible to find support for different theories in experimental data. Although we focused on long-term memory with behavioral readout, we briefly summarize here known results where storage and retrieval are separated by less than a few minutes, including those with human subjects where the behavior is instructed.
In psychology, there is a huge literature on laboratory-based memory tasks on short timescales (Kahana and Wagner, 2023). Among the most popular computational models to explain these memory experiments are different variants of temporal context models (Howard and Kahana, 2002; Polyn et al., 2009; Howard, 2022; Sederberg and Darby, 2023) (see also Table 1), which keep memories of the recent past with decaying activity traces (temporal context) and learn associations between temporal context and individual memory items with fast synaptic changes. These models can thus be seen as implementing both a rate code of age information and a distributed timestamp code (Table 1). Whereas the decaying activity traces are unlikely to extend to timescales of days or years, there is some experimental support for the contextual overlap theory also on long timescales: for example, recency judgments were found to be context-dependent (Taub et al., 2022); furthermore, the contiguity effect of remembering multiple events jointly, if they happened at similar moments in time, seems to generalize to autobiographical memory and can be explained with temporal context models (Kahana et al., 2022). However, there is also evidence for hippocampus-dependent reconstruction-based theories in humans (Bellmund et al., 2022). Time cells, neurons that fire at successive moments in temporally structured experiences on timescales from a few hundred milliseconds to tens of seconds, were found in region CA1 of the hippocampus of rodents (Eichenbaum, 2014, 2017; Tsao et al., 2018; Taxidis et al., 2020; Tsao et al., 2022). Behavioral experiments with California scrub-jays showed convincingly that these birds have a flexible "what-where-when" memory system (Clayton and Dickinson, 1998, 1999; Clayton et al., 2001, 2003, 2005; Brea et al., 2023). Although the existing experimental results cannot discriminate between the different models considered here, the observation that they can learn different behavioral rules that depend on the content and age of memories within a few trials speaks in favor of an Age-Organization model, which allows fast and flexible learning.
Remembering the "when" is a distinctive feature of episodic and episodic-like memory.
Thus, revealing the mechanisms that underlie the ability to estimate the age of memories is a crucial step towards a better understanding of episodic memory systems. The different models discussed here can serve as concrete hypotheses and the simulated experiments as inspirations for future experiments that combine behavioral and physiological recordings to learn more about how humans and animals remember the "when" of past events.

Formal Description of Codes
• distributed code: a one-to-one n-tuple-valued function or random vector distr(x) ≔ (distr 1 (x), … , distr n (x)) with n > 1 and 0 < distr i (x) ≤ distr i′ (x) for at least one pair i ≠ i′ and at least one x (Figure 2A).
In addition to these elementary coding schemes, we consider population rate codes, where the value x is encoded by the number of active neurons. An example of a deterministic population rate code is given by distr i (x) = 1 if i ≤ x and distr i (x) = 0 otherwise (poprate in Figure 2A).
A generalization to continuous variables x (space, time, color, etc.) can easily be found with, e.g., continuous rate coding rate(x) = x, generalized one-hot coding with, e.g., radial basis functions rbf i (x) = exp(−(x − i)²) for a population of neurons with indices i = 1, … , n, or a generalized distributed code with, e.g., mixtures of radial basis functions.
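The elementary coding schemes and their continuous generalizations can be sketched directly from the definitions above; the population sizes in the examples are arbitrary.

```python
import math

def rate(x):
    """Rate code: a single neuron whose activity equals the value."""
    return [x]

def onehot(x, n):
    """One-hot code: neuron x is active, all others are silent."""
    return [1.0 if i == x else 0.0 for i in range(n)]

def poprate(x, n):
    """Population rate code: the value is the number of active neurons
    (distr_i(x) = 1 if i <= x and 0 otherwise)."""
    return [1.0 if i <= x else 0.0 for i in range(n)]

def rbf(x, n):
    """Generalized one-hot code for continuous x: radial basis functions
    rbf_i(x) = exp(-(x - i)^2), i = 1, ..., n."""
    return [math.exp(-(x - i) ** 2) for i in range(1, n + 1)]

assert sum(poprate(3, 6)) == 4.0            # neurons 0..3 are active
assert onehot(2, 5) == [0.0, 0.0, 1.0, 0.0, 0.0]
assert max(rbf(2.0, 5)) == rbf(2.0, 5)[1]   # neuron i = 2 peaks at x = 2
```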

Formal Description of Association Schemes
Any function f ∶ X × Y → Z defines an association f(x, y) ∈ Z between elements x ∈ X and y ∈ Y.
If the function f is one-to-one, the association f(x, y) keeps the full information about the associated elements x and y.
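The distinction between lossless and lossy associations can be illustrated with two toy association functions (both hypothetical): pairing is one-to-one, so both elements can be recovered, whereas summation discards information.

```python
def concat_assoc(x, y):
    """One-to-one association f : X x Y -> Z by pairing;
    both associated elements can be recovered exactly."""
    return (x, y)

def sum_assoc(x, y):
    """Lossy association: the sum keeps only partial information
    about the associated elements."""
    return x + y

assert concat_assoc(3, 5) == (3, 5)        # x and y recoverable
assert sum_assoc(3, 5) == sum_assoc(4, 4)  # not one-to-one: 8 == 8
```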

Mathematical Description of the Models
We model brains that observe sensory states s t , s t+1 , … and take actions (or decisions) a t , a t+1 , … on a slow timescale. These brains have internal neural states x τ and synaptic connection parameters w τ that evolve on a faster timescale, indicated by the time index τ.
For all the models we used the storage and recall mechanism with synaptic delays described in Figure 3. The activation function f (g) of neurons in group g is applied element-wise to the matrix-vector product of the synaptic weight matrix and the presynaptic activity, where i denotes the postsynaptic neuron in group g and j the presynaptic neuron in group g′. In the plasticity rule, η is a learning rate and R ∈ {−1, 1} is the reward obtained after performing action a t−1 . This plasticity rule can be seen as a policy gradient (REINFORCE) rule (Williams, 1992), where the terms z are eligibility traces.

Learning to take age- and content-dependent actions (Figure 10)
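A minimal numerical sketch of a soft-max policy trained with a REINFORCE update of the form Δw = η R z, where z is the gradient of the log policy. The two-action setup, the feature vector and the learning rate are illustrative assumptions, not the paper's actual simulation parameters.

```python
import math
import random

random.seed(1)

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_step(w, x, eta=0.5):
    """One trial: sample an action from the soft-max policy, collect a
    binary reward R in {-1, +1}, and update the weights with REINFORCE."""
    scores = [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
    p = softmax(scores)
    a = 0 if random.random() < p[0] else 1
    R = 1 if a == 0 else -1  # hypothetical task: action 0 is always correct
    for k in range(len(w)):
        g = (1.0 if k == a else 0.0) - p[k]  # d log pi / d score_k
        for j in range(len(x)):
            w[k][j] += eta * R * g * x[j]
    return R

w = [[0.0, 0.0], [0.0, 0.0]]  # two actions, two input features
x = [1.0, 0.0]
rewards = [reinforce_step(w, x) for _ in range(200)]
assert sum(rewards[-50:]) > 30  # the rewarded action comes to dominate
```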

Figure 1.
Figure 1. A simple setting to study episodic-like memory. We consider discrete sensory streams, like perceiving a red triangle at time t 1 and a blue square at time t 2 . In all illustrations we use color, shape and time as abstract analogies of the "what", "where" and "when" of specific events, respectively. The time points t 1 , t 2 , … are not necessarily equally spaced and may be separated by hours or days. At time t 3 , the white triangle first triggers activity in the brain that corresponds to the perception of a white triangle and, second, activity that corresponds to remembering the red triangle, including the information of how long ago the red triangle was perceived. An action is performed upon memory retrieval, like saying "red, two". In the next time step, a new stimulus can be given, together with a reinforcement signal (clapping hands). Our goal is to find neural network dynamics and synaptic plasticity rules that change the connections between neurons (red lines) such that the sensory stream can be remembered, and action selection rules that depend on the recalled event can be learned.

Figure 3.
Figure 3. Example of a Storage and Retrieval Mechanism with Synaptic Delays. Signals take more time to travel in long pathways with multiple intermediate synapses than in short pathways, because signal transmission across chemical synapses takes time. Therefore, multiple pathways of different lengths between two groups of neurons can be used to separate storage and retrieval phases. During storage, the input drives the activity in the intermediate layer; the content layer receives input through the direct "input-content" pathway (indicated by yellow arrows). Connections between the intermediate and the content layer (red dashed arrows) are selected for growth with a Hebbian plasticity rule. Shortly thereafter, because of the additional synaptic delays along the longer pathway, the content neurons receive input through the "input-intermediate-content" pathway, i.e. the intermediate layer is the main input of the content layer (yellow arrow). Already grown connections (red arrows) enable recall of previous events. The actual growth of the connections selected in the storage phase (dashed red arrows) is not instantaneous and therefore does not interfere with the recall phase. Once the content neurons have received input through the "input-intermediate-content" pathway, the content layer drives the action selection through weights that implement some learned rule (orange arrows).

Figure 4.
Figure 4. The Context-Tagging model: Timestamp Tagging with Auto-Associative Retrieval and Complex Readout. A The events "red triangle" and "blue square" were observed at times t 1 and t 2 , respectively. At time t 3 the event "white triangle" is observed. B During storage, the current context (activity in layer "now") drives the "tag" neurons (yellow arrow) such that content and context can be bound together (dotted red lines). C Subsequently, the previously grown synaptic connections (red lines) allow auto-associative recall of the event "red triangle". During recall, the tag neurons are no longer driven by input from the "now" neurons but, through auto-associative recall (yellow arrow), the activity of the tag neurons encodes the context at time t 1 . The comparison of the current context (represented by the "now" neurons) with the recalled context (represented by the "tag" neurons) allows the readout network to estimate the age of the retrieved memory. Consequently, behavioral rules that depend on the age of the recalled memory can be learned (orange weights).

Figure 9
Figure 9. Learning to take actions that depend only on the age of memories. A The experiment consists of multiple trials with random retrieval intervals Δ retrieval . If the retrieval interval satisfies Δ retrieval ≤ 2, action a 1 is rewarded (+1) and action a 2 is punished (reward −1); reward contingencies are reversed if the retrieval interval is larger than 2. B The expected reward per trial is measured over 10³ simulated agents. All models can learn this task, but learning with one-hot codes can be faster than with other codes, because large learning rates can be chosen. The optimal performance (dashed line) was computed by averaging 10⁴ agents that make at most one mistake for each interval and always select the correct action afterwards.
Figure 10. Learning to take actions that depend on the age and the content of memories. A The experiment consists of multiple trials with different retrieval intervals Δ retrieval and objects. If the object is red and the retrieval interval is Δ retrieval = 2, or the object is blue and Δ retrieval = 3, action a 2 is rewarded (+1) and action a 1 is punished (reward −1); reward contingencies are reversed otherwise. We call the sequence of these four trials one session. B The expected reward per session is measured over 10³ simulated agents. Because this is an XOR-like task, models with Context/Onehot-Tagging or Poprate-Age-Tagging encoding cannot reach better performance than the best linear model (correct in three conditions and wrong in one leads to an expected reward of (+3 − 1)/4 = 0.5). The Random-Pruning model eventually learns the task, but it learns more slowly than the Age-Organization model with its one-hot encoding and a sufficiently large learning rate. The optimal performance (dashed line) was computed by averaging 10² agents that make at most one mistake for each interval and content and always select the correct action afterwards.
Time cells were also found in the entorhinal cortex of macaque monkeys (Bright et al., 2020) (but see Ahmed et al. (2020) for an example where no time cells were found in CA1). Time cells could be seen as evidence for a Context-Tagging model operating on short timescales, but a random spatio-temporal feature model, akin to the Random-Pruning model, would probably lead to similar results. XOR-like experiments, like the one suggested in Figure 10, could potentially be used to distinguish these hypotheses. More relevant for the topic of this paper are experiments involving longer timescales. For example, Rubin et al. (2015) and Mau et al. (2018) recorded calcium levels of neurons in the CA1 of mice over multiple days. Both studies found slow representational drift (Rule et al., 2019), which would be consistent with a timestamp mechanism similar to the Context-Tagging model, but could also be unrelated to the storage and recall of events and depend, for example, on behavioral variability (Sadeh and Clopath, 2022).

For all the models we used the implementation with synaptic delays (Figure 3) and hetero-associative recall. For the tagging models, this implementation differs from the descriptions with auto-associative recall in Figure 4 and Figure 5, but it leads to the same predictions for the behavioral experiments. The internal neural states x τ are organized into groups of neurons. Tagging models have sensor, intermediate, content, tag and actuator neurons, and we write x τ (g) for the neural state of group g; each group is driven by the activity of its presynaptic groups in the previous time step τ − 1. The activation function of the actuator group is the soft-max function f actuator (a) i = e^(a i) / Σ j e^(a j). For all other groups of neurons we use the Heaviside function f (g) (a) i = Θ(a i − b) = 1 if a i > b and Θ(a i − b) = 0 otherwise, where the bias b = 0 for all groups of neurons except for the content group in the Random-Pruning model, where b = 1.5. The non-zero bias in the Random-Pruning model assures sparse activity in the content layer. Action a t is sampled with probability f actuator after recall has happened. Because we used the recall mechanism with synaptic delays (Figure 3) and it takes three time steps for sensory activity to propagate along the "input-intermediate-content-actuator" pathway, there is a simple relationship between the slow timescale indexed by t and the fast timescale indexed by τ: if x τ sensor = onehot(color(s t )) ⊕ onehot(shape(s t )), action a t will be sampled with probability f τ+3 actuator . The sensory neurons are inactive during propagation of the activity through the neural network, i.e. x τ+1 sensor = x τ+2 sensor = x τ+3 sensor = 0; the sensory neurons are reactivated once action a t has been taken, i.e. x τ+4 sensor = onehot(color(s t+1 )) ⊕ onehot(shape(s t+1 )). Synaptic weight matrices are static or evolve according to one of the plasticity rules described above, where the eligibility-trace terms z actuator arise as a consequence of taking the derivative of the logarithm of the soft-max policy function in the derivation of the REINFORCE rule.

Generalization of actions (Figure 8)
Learning to take age-dependent actions (Figure 9)
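The network step of the model description (Heaviside activations for most groups, a soft-max actuator, and three-step propagation from input to actuator) can be sketched as follows; the toy weight matrices and the two-neuron group sizes are illustrative assumptions.

```python
import math

def heaviside_layer(W, x, b=0.0):
    """Binary group update: f(a)_i = 1 if a_i > b else 0,
    where a = W x is the summed presynaptic input."""
    return [1.0 if sum(wij * xj for wij, xj in zip(row, x)) > b else 0.0
            for row in W]

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Feedforward chain input -> intermediate -> content -> actuator:
# sensory activity needs three fast time steps (tau) to reach the
# actuator, which links the slow (t) and fast (tau) timescales.
W_ic = [[1.0, 0.0], [0.0, 1.0]]  # input -> intermediate (toy identity)
W_cc = [[1.0, 0.0], [0.0, 1.0]]  # intermediate -> content
W_ca = [[2.0, 0.0], [0.0, 2.0]]  # content -> actuator

x_sensor = [1.0, 0.0]
x_inter = heaviside_layer(W_ic, x_sensor)   # tau + 1
x_content = heaviside_layer(W_cc, x_inter)  # tau + 2
p_action = softmax([sum(w * c for w, c in zip(row, x_content))
                    for row in W_ca])       # tau + 3: action probabilities

assert x_content == [1.0, 0.0]
assert p_action[0] > p_action[1]
```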

Figure 7. Sparse Encoding with Random Synaptic Pruning and Simple Readout. The input is sparsely and randomly connected to an intermediate and a content layer (gray arrows). During storage, Hebbian plasticity connects co-activated neurons (red arrows).

Figure 4 (continued). The comparison of the current context with the recalled context may allow a rough estimate of the age of the recalled memory (cf. contextual overlap theory, Friedman (1993)). Because the estimation of age from the comparison of two context-related activity patterns does, in general, not induce a linearly separable problem, a complex readout network with at least one hidden layer is required to learn arbitrary readout rules.

Figure 5. Age Tagging Models. A We consider the same sequence of events as in Figure 4A. B In the Onehot-Age-Tagging model, a connection to the first tag neuron is formed during storage of event "red triangle" at time t 1 . C Thanks to a process that prunes some synapses and grows new ones, tag neuron 3 is activated during recall at time t 3 , indicating how long ago the red triangle was observed. D In the Poprate-Age-Tagging model, connections to all tag neurons are formed during storage. E These connections are pruned at different moments in time, such that during recall at time t 3 fewer connections are present than at time t 1 . The number of active "tag" neurons during recall encodes the elapsed time since storage: if many "tag" neurons are active during recall, the recalled event happened recently, and if few "tag" neurons are active, it happened long ago.

Figure 6. Examples of Chronological Organization Models. A In this age model, a systems consolidation mechanism shifts the location where a memory is stored. During storage (left), the red connections are strengthened, whereas the feedforward weights (gray dotted arrows) are inactive. During consolidation (middle), input neurons are randomly active and activity is forward propagated (gray arrows; indirect pathway from input to Δ 1 ), such that new, direct-pathway connections (dashed red) can grow between input and Δ 1 neurons. Simultaneously, the original weights between input and Δ 0 neurons decay, such that after consolidation only the newly grown weights remain. During recall (right), all active neurons in layers Δ 0 , Δ 1 , Δ 2 , … give input to the action neurons (orange connections). B Another age model relies on pruning of synapses at different moments in time, similar to the Poprate-Age-Tagging model (Figure 5E). Synapses onto neurons in the first memory layer have a faster decay rate than those onto the last layer. During recall, the number of active neurons across all layers is indicative of the age of a memory: at t 1 more "red" and "triangle" neurons are activated than at t 3 .

Figure 8. Generalization of actions that depend on the age of memories, based on a single rewarded trial. A A subject learns in a binary forced-choice task that action a 2 is rewarded (or action a 1 is punished) when recalling an event that happened Δ train ago. Training occurs with the same stimulus (red triangle). After training, the subject is tested once with a different stimulus (e.g. blue square) and a retention interval Δ test , which may differ from the training interval. B Average probability, across 10⁴ simulated subjects, of taking action a 2 as a function of Δ test . For the sparsest code (Age-Organization) there may not be any generalization to other stimuli, even when the test interval is the same as the training interval (blue dot at Δ test = 3). Conversely, for distributed representations (Poprate-Age-Tagging and Random-Pruning) there is generalization to other stimuli and to test intervals different from the training interval. Quantitatively, the results would be different for other learning rates or other values of the probability of a 2 before learning, but qualitatively the results stay the same.