MotorNet: a Python toolbox for controlling differentiable biomechanical effectors with artificial neural networks

Artificial neural networks (ANNs) are a powerful class of computational models for unravelling neural mechanisms of brain function. However, for neural control of movement, they currently must be integrated with software simulating biomechanical effectors, leading to limiting impracticalities: (1) researchers must rely on two different platforms and (2) biomechanical effectors are not generally differentiable, constraining researchers to reinforcement learning algorithms despite the existence and potential biological relevance of faster training methods. To address these limitations, we developed MotorNet, an open-source Python toolbox for creating arbitrarily complex, differentiable, and biomechanically realistic effectors that can be trained on user-defined motor tasks using ANNs. MotorNet is designed to meet several goals: ease of installation, ease of use, a high-level user-friendly API, and a modular architecture to allow for flexibility in model building. MotorNet requires no dependencies outside Python, making it easy to get started with. For instance, it allows training ANNs on typically used motor control models such as a two joint, six muscle, planar arm within minutes on a typical desktop computer. MotorNet is built on PyTorch and therefore can implement any network architecture that is possible using the PyTorch framework. Consequently, it will immediately benefit from advances in artificial intelligence through PyTorch updates. Finally, it is open source, enabling users to create and share their own improvements, such as new effector and network architectures or custom task designs. MotorNet’s focus on higher order model and task design will alleviate overhead cost to initiate computational projects for new researchers by providing a standalone, ready-to-go framework, and speed up efforts of established computational teams by enabling a focus on concepts and ideas over implementation.

For neural control of movement, production of theoretical work using ANN models may be viewed as a two-step effort: (1) building a realistic simulation environment that mimics the behaviour of bodily effectors -often called the "physical plant" -and (2) implement the ANN controllers themselves to train on the environment. Many open-source platforms achieve each of these steps individually, such as MuJoCo (Todorov et al., 2012) or OpenSim (Delp et al., 2007;Seth et al., 2018) for building effectors, and JAX, PyTorch or TensorFlow for building and training controller ANNs. However, approaches using these platforms lead to two important impracticalities.
First, the user must rely on two different software platforms, one for the effector and one for the controller. Communication between platforms is not built-in, requiring users to produce custom code to link the ANN controller software with the software implementing the simulation of the physical plant. This forces significant overhead cost to initiate computational projects and creates barriers to research teams who lack the technical background to build those custom pipelines. A current remedy to this issue is gym (Brockman et al., 2016), a Python toolbox that provides an interface between controllers and environments.
However, gym constrains the user to reinforcement learning algorithms (Fujimoto et al., 2018;Lillicrap et al., 2019;Mnih et al., 2015) despite the existence and potential biological relevance of faster training methods such as backpropagation (Lillicrap et al., 2020;Whittington & Bogacz, 2017). The inability to use backpropagation to train controllers represents the second impracticality. To date, this has been circumvented by training separate ANNs such as multi-layer perceptrons or recurrent neural networks (RNNs) as "forward models" approximating the behaviour of effectors that are normally implemented in a separate software package (e.g., Lillicrap & Scott, 2013;Willett et al., 2021). This approach does not address the need for custom pipelines, and remains a slow, cumbersome process when iterating over many different controllers and effectors, because new approximator ANNs must be trained each time.
Solving the issues described above requires both the controller and effector to rely on the same software (no-dependency requirement), and for the effector to allow for backpropagation through itself (differentiability requirement) so that typical gradient-based algorithms may be employed. Ideally, the solution would also be open source, modular for flexibility of coding and focus on ideas, and have reasonable training speeds on commercially available computers.
We developed MotorNet with these principles in mind. MotorNet is a freely available opensource Python toolbox (https://github.com/OlivierCodol/MotorNet) that allows for the training of ANNs to control arbitrarily complex and biomechanically realistic effectors to perform user-defined motor tasks. The toolbox requires no dependency besides standard Python toolboxes available on pip or Anaconda libraries. This greatly facilitates its use on remote computing servers as no third-party software needs to be installed. The effectors are fully differentiable, enabling fast and efficient training of ANNs using standard gradientbased methods. It is designed with ease of installation and ease of use in mind, with a highlevel, documented, and user-friendly application programming interface (API). Its programming architecture is modular to allow for flexibility in model building and task design. Finally, MotorNet is built on TensorFlow, which makes innovation in machine learning readily available for use by MotorNet as they are implemented and released by TensorFlow.

2.1.
Training an ANN to perform a centre-out reaching task against a curl field.
A canonical experimental paradigm in the study of neural control of movement is the centre-out reaching task with a "curl field" applied at the arm's end point by a robot arm (Conditt et al., 1997;Shadmehr & Mussa-Ivaldi, 1994). In this paradigm, visual targets are placed around a central starting position in a horizontal plane. Participants must move the handle of a robot arm from the starting position to the target that appears on a given trial. During the reaching movement, the robot applies forces at the handle that scale linearly with the velocity of the hand and push in a lateral direction. This leads the central nervous system to adapt by modifying neural control signals to muscles to apply opposite forces to counter-act and nullify the lateral forces produced by the robot. Finally, removal of the curl field leaves an opposite after-effect (Shadmehr & Mussa-Ivaldi, 1994). This paradigm is well suited to assess the functionality of MotorNet because it is well understood and extensively documented, and highlights physical, biomechanical, and control properties of human behaviour.
We specified a one-layer RNN composed of 50 gated recurrent units (GRUs; Cho et al., 2014) to control a two degrees of freedom, six muscle planar arm model (arm26; Figure 1a; Kistemaker et al., 2006Kistemaker et al., , 2010. The muscles were rigid-tendon, Hill-type muscle models, with "shoulder" mono-articular flexors/extensors, "elbow" mono-articular flexors/extensors, and a bi-articular pair of muscles producing flexion or extension at both joints (see Methods section 4.1.1 and 4.2.1.).
Training the model above took about 13 minutes on a 2022 Mac Studio with an M1 Max central processing unit (Apple Inc., Cupertino CA, USA). Because the arm26 effector and the centre-out reaching task are particularly common in the motor control literature, they are included in the toolbox as pre-built objects. Consequently, one can recreate the effector instance and task instance in one line of code for each. Note however that users can easily declare their own custom-made effector and task objects if desired by subclassing the base Plant and Task class, respectively (see below for more details on base classes and subclassing).
Including the implementation of the controller RNN and training routine, the above example can be reproduced in 10 lines of code, illustrating the ease of use of MotorNet's API. Once the model is trained, it can produce validation results via a forward pass (Figure 1b-c), which can then be saved and analysed afterwards. The results the model produces include joint and cartesian states (positions, velocities), muscle states (lengths, velocities, activations, contributing forces), musculo-tendon states (lengths, velocities), efferent motor commands (time-varying muscle stimulation) and afferent feedback responses (proprioceptive, visual), as well as any activity states from the network if applicable ( Figure 1c). Note that motor commands are different from muscle activations, in that they are input signals to the Ordinary Differential Equation that produces muscle activation (see Methods; Millard et al., 2013;Thelen, 2003).

Figure 1: Controlling an arm-like effector in a centre-out reaching task with a curl field. (a)
Schematic of the effector and controller. (b) Endpoint trajectories of centre-out reaching movements in a null and curl field, for an RNN controller that is untrained (naive) and then trained to reach in that curl field. The effector was as defined in Kistemaker et al. (2010) (c) Different variables over time during a rightward reaching movement.

Structure of MotorNet
Functionally, a MotorNet model can be viewed as an ANN acting as a controller, that sends motor commands to an effector (the plant), which is actuated accordingly and in turn sends feedback information back to the ANN (Figure 2). This closed-loop cycle repeats for each timestep. By default, "visual" feedback consists of a vector indicating endpoint cartesian coordinates, while "proprioceptive" feedback consists of a 2 -elements vector of muscle length and velocity, with the number of muscles of the effector. Noise may be added in various parts of the model, such as on network activity, on descending motor commands, or on feedback signals. Finally, time delays may be added to feedback signals before they reach the controller ANN.

Running flow
At runtime, a more detailed representation of the information flow best describes how a MotorNet model behaves ( Figure 3a). Models are based on five object classes: Skeleton, Muscle, Plant, Network, and Task objects (Table 1). Each object has its own base class, from which the user can create a custom subclass if desired. MotorNet comes with a set of prebuilt subclasses for each, which implement commonly used computational model formalizations (table 1). Network objects are the entry point of the model (Figure 3a). They take arbitrary initial inputs, which may then be recomputed at each timestep to adjust for dynamic information via crosstalk with the Task object. This step is optional, and can be used for changes that occur online, such as a cursor or a target jump, or a new target appearing in a sequential task. The (potentially recomputed) inputs are then passed to the ANN to perform a forward pass, producing motor commands that are sent down to the Plant object.
Plant objects are essentially wrapper objects that hold the Muscle and Skeleton objects and handle coordination of information flow between them (Figure 3a-b), as well as concomitant numerical integration to ensure numerical stability. They pass the motor commands to the Muscle object, which produces forces in return. The Plant will adjust those forces using geometry-dependent moment arms (see section 2.3 for details) and send the resulting generalized forces to the Skeleton object. These generalized forces will actualize the Skeleton's joint state, which the Skeleton will return to the Plant object alongside the equivalent cartesian state. Finally, the Plant object will return proprioceptive and visual feedback signals to the Network.

Object structure
The classes presented above rely on each other to function correctly. Consequently, they must be declared in a sensible order, so that each object instance retains as attribute the object instances on which they rely. This leads to a hierarchical class structure, where each instance lives in the computer memory in a nested fashion with other instances, as laid out in Figure 3b. Note that this does not mean that each class is a subclass of the class that contains it, but that each contained class is saved as an attribute of the container class. The outermost class is a MotorNetModel, which itself is a subclass of TensorFlow's Model class.

Biomechanical properties of the plant
The modular structure detailed above allows MotorNet to flexibly compute detailed biomechanical properties of Plant objects, such as arbitrary muscle paths (Nijhof & Kouwenhoven, 2000), geometry-dependent moment arms (Murray et al., 1995;Sherman et al., 2013), non-linear muscle activations and passive force production from muscle stretch (Cheng & Scott, 2000;Millard et al., 2013;Thelen, 2003). This enables training ANNs on motor tasks whose dynamics are highly non-linear and close to biological reality. In this section we outline some examples of biomechanical properties displayed by MotorNet effectors.

Assessing moment arms with a simple point-mass plant
The geometrical path -fixation body(s) and fixation point(s) on that body -of each Muscle object can be declared by the user, allowing for arbitrary linkage between muscles and bones (see Methods section 4.3, Nijhof & Kouwenhoven, 2000). Using geometric first principles (Sherman et al., 2013), the Plant object can then calculate the moment arm of forces produced, which is defined for each muscle as the change in value of the degrees of freedom (DoF) of the skeleton for a given change in the muscle's length (Murray et al., 1995;Sherman et al., 2013). In lay terms, this is the capacity of a muscle to produce a torque on a joint based on the muscle's pulling angle on the bones forming that joint. The relationship between pull angle and torque can intuitively be understood using a door as an example: it is easier to push a door when pushing with an angle orthogonal to that door than in a nearparallel angle to that door.
Moment arms generally vary depending on the positional configuration of the Plant. To illustrate this, let us consider a simple case of a point-mass skeleton (one fixation body) with four muscles attached to it in a "X" configuration ( Figure 4a). When the point-mass is positioned in the centre of the workspace space (red position in Figure 4a-b), any muscle pulling will change the position of the point-mass equally in the dimension and in the dimension. Note that and are the DoFs of the point-mass skeleton since they do not have hinge joints. In contrast, if the point-mass is positioned below the central position ( = 0, = −0.9; black position in Figure 4a), a pull from e.g., the lower left muscle will produce a greater change in the dimension than in the dimension because of the different muscle alignment (Figure 4b).
The moment arm can then be calculated for all possible positions in the workspace, as represented by the solid black square in Figure 4a. This can be done for each of the four muscles, and each of the two DoFs, resulting in 8 moment arms ( Figure 4c). We can see that each moment arm forms a slightly bent hyperplane. Importantly, for each hyperplane the diagonal with constant moment arm lines up with the path formed by the muscle when the point mass is at the centre of the workspace. For instance, the moment arm of the upper right muscle is identical when the point-mass is in position ( = 1, = 1) and in position (−1, −1). This is true both with respect to the DoF ( Figure 4c, upper row, leftmost axis) and with respect to the DoF ( Figure 4c, lower row, leftmost axis). Note also that muscles whose shortening leads to an increase in the DoF considered -or inversely whose lengthening leads to a decrease in the DoF -express negative moment arms. For instance, a shortening of the lower right muscle would lead to an increase in the DoF and a decrease in the DoF. Or more plainly, a pull from the lower right muscle would bring the point-mass closer to the lower right corner of the workspace. This leads to the negative moment arm of that muscle with respect to (Figure 4c, upper row) and positive moment arm with respect to (lower row).

Moment arms with a two-joint arm
To consider a more complex plant, we assessed the moment arm of two muscles wrapping around a two-joint arm skeleton. We first assessed a mono-articular muscle, that is, a muscle that spans only one joint -here, the elbow. As expected, the moment arm of that muscle with respect to the shoulder joint is always null (black arrows, Figure 4d) regardless of the joint configuration since the muscle does not span that joint. In contrast, the moment arm with respect to the elbow joint varies as the elbow joint angle changes. Finally, as expected from an extensor muscle, the moment arm is positive, indicating that the elbow angle would decrease as the muscle shortens.
In comparison, a bi-articular muscle's moment arm is non-zero with respect to both joints ( Figure 4e). This also leads the moment arms with respect to each joint to show a small interaction as the other joint's angle changes, as indicated by a slight "bend" in the hyperplane (black arrows, Figure 4e). Finally, as expected for a bi-articular flexor muscle, the moment arms are negative with respect to both joints, indicating that muscle shortening would result in an increase in joint angle.

Passive drift with Hill-type muscles
Finally, we assessed the positional drift induced by passive forces of Hill-type muscle models (Millard et al., 2013;Thelen, 2003) in an arm26 plant model. We initialized the model's starting position at fixed intervals across the range of possible joint angles, resulting in a grid of 21-by-21 possible starts. We then simulated the plant with null inputs for 200 ms and plotted the drift in the arm's endpoint position from its original position. Because the model received no input, all forces produced are due to the passive component of the Hill-type muscles, which occurs when the muscle is stretched beyond its slack length (Cheng & Scott, 2000;Millard et al., 2013;Thelen, 2003). We can see that drift is negligible at the centre of the joint space but starts to increase toward the edge (Figure 4f), indicating that the associated joint configurations lead to overstretched muscle lengths and resulting in passive force production. Note that since this phenomenon is dependent on the slack length value of each muscle, which is user-defined, the presence of passive drift is dependent on the user's modelling choices.

Training ANNs to produce naturalistic behaviour
Now that we can implement biomechanically complex plants, we next assessed whether a controller ANN can learn a control policy to move those plants using backpropagation (Jordan & Rumelhart, 1992;Rumelhart et al., 1986). A typical way to ensure the computation learnt by an ANN is functionally meaningful is to test its out-of-distribution generalization. To assess this, we trained a 1-layer RNN with = 110 GRUs controlling an arm26 model to perform reaching movements in 0.8 sec simulations using the following paradigm. Starting positions and targets were randomly drawn from a uniform distribution across the full joint space. Movements were to be delayed until the occurrence of a visual "go" cue randomly drawn from a uniform distribution spanning the full simulation window. The appearance of the go cue reached the RNN as input after a delay corresponding to the visual feedback delay, which was set at = 50 ms (Figure 2; Dimitriou et al., 2013;Pruszynski et al., 2010). In half of trials, no go cue was provided (catch trial), in which case the task effectively reduced to a postural control task. A 100 ms endpoint mechanical perturbation, whose orientation, magnitude, and time were also randomly drawn occurred in half of trials, independently of whether the trial was a catch trial or not. Importantly, the perturbation magnitude was drawn from a uniform distribution ranging between 0 and 4 N. If the perturbation occurred during a catch trial, the distribution ranged between 0 and 8 N. The network was trained using the following loss: With the global loss including a kernel regularization term with penalty coefficient = 10 −6 , and the kernel weight matrix of the RNN's hidden layer. The operators ‖ · ‖ 1 and ‖ · ‖ 2 indicate the L1 and L2 vector norm, respectively. is the instantaneous loss at time , with coefficients = 2, = 5, = 0.1. is the positional penalty at time , with , * the position and desired position (target) vector, respectively, and = 0.01 the target radius. is the muscle activation penalty at time , with , two vectors representing muscle activations at time and maximum isometric force, respectively. Finally, ℎ is the network hidden activity penalty at time , with the -elements vector of GRU hidden activity, ̇ its time derivative, and κ = 0.05. While superficially this loss appears complex, a direct relationship to biology can be drawn for all terms. Essentially, this loss enforces the control policy to be learned using a simple, straightforward rule ("get to the target"), while promoting low metabolic cost from network input connectivity (cost on kernel norm), from the muscles (cost on activation, scaled by muscle strength), and from network activity (cost on hidden activity and its derivative to discourage oscillatory regimes).
Behavioural performance on a training set can be seen in Figure 5a, with trials with a large perturbation (> 3 ) highlighted in blue. This illustrates the rich variability of the training set, encouraging the RNN to learn computationally potent and generalizable solutions to the control problem given the sensorimotor feedback provided (Figure 2). Despite this variability, the loss value decreased smoothly (Figure 5b). We tested the model's behavioural output in 0.8 sec simulations with a centre-out reaching task. Eight targets were positioned in 45 degrees increments and 10 cm away from a starting position corresponding to a shoulder and elbow angle of 45 and 90 degrees, respectively (Figure 5c). The RNN reached to each of these targets following a visual go cue at 100 ms. 70 ms after the "go" cue was "perceived" (i.e., 70 ms plus the visual feedback delay), a mechanical perturbation was applied at the arm's endpoint and orthogonally to the reaching direction. This perturbation could be either within-distribution (±3 N) or out-ofdistribution (±6 N) or null (no perturbation). In all cases, the RNN could correct for the mechanical perturbation, reach to the target, and stabilize ( Figure 5c).
Next, we tested the RNN in a postural control task, where it had to bring the arm's endpoint back to the target following a mechanical perturbation (Pruszynski et al., 2014). No go cue was provided. We applied perturbations in either of the four cardinal directions (0°, 90°, 180°, 270°) at 170 ms plus visual delay after the trial started. Again, the set of perturbations for testing outputs included within-distribution magnitudes (±6 N) and out-of-distribution magnitudes (±12 N). In all cases, the RNN could integrate the sensorimotor information to bring the arm's endpoint back into the target (Figure 5d). Interestingly, in some cases this led to an oscillatory trajectory (e.g., for a rightward +12 N perturbation, Figure 5d), indicating that perturbations beyond a given magnitude remain increasingly challenging to control for.
Finally, we compared muscle activations for an upward reach with no perturbation to that of an identical reach with a -6 N perturbation (Figure 5e). We can see that muscle activations are similar before the occurrence of the perturbation, and remain similar immediately after, indicating a time delay in the response. The fastest responses occurred for the bi-articular muscles and the shoulder extensor muscle. Other muscles, particularly the shoulder flexor, showed very delayed or non-existent changes in muscle activation. This illustrates that the RNN's response to a perturbation is not a mere stimulus-driven reactive response, but an integrated response that can delay or withhold the production of counteracting forces if necessary. Note that for the non-perturbed movement (black line in Figure 5e), we can observe the canonical tri-phasic muscle activation pattern reported in empirical studies (Wierzbicka et al., 1986).
To assess how the existence of sensorimotor feedback impacted the control policy acquired by the controller network, we trained a second, identical network to perform the same task but with no mechanical perturbation during training (perturbation-free). Interestingly, following the same amount of training, the model with a perturbation-free network can handle perturbations during reaching relatively well, even up to ±6 N (Figure 5f). We can compare muscle activations for an upward reach with a -6 N perturbation to that of the same movement in the network trained with perturbations ( Figure 5f, green versus blue lines). Even though kinematics appeared superficially similar (Figure 5c, f), this comparison shows that muscle activations tend to differ in response to a perturbation (Figure 5e), suggesting that the perturbation-free network might learn a slightly different control policy. Testing the perturbation-free network on the postural task shown in Figure 5d emphasizes this difference (Figure 5g). The perturbation-free network is much less capable of stabilizing against the forces than its perturbation-trained counterpart.
Therefore, even though the mere existence of a sensorimotor feedback input can help handle simple perturbations (Figure 5f), exposing the model to perturbations during training does provide the network with additional information to learn a more robust control policy. Overall, these simulations show that MotorNet can train ANNs to reliably find a control policy for the plant. Importantly, the resulting networks learn generalizable control policies that integrate sensorimotor feedback into its computation. This also illustrates the importance of the training procedure to which the network is exposed to produce these control policies (Driscoll et al., 2022).

Plant Geometry Defines Preference Distribution of Firing Rates: A Replication Study
Finally, to assess MotorNet's capacity to replicate established results in the literature, we sought to reproduce key observations from Lillicrap and Scott (2013). In that study, the authors show that training an RNN to perform a simple centre-out reaching task using an arm model similar to the arm26 in Figure 1a results in the RNN neurons displaying a preferential movement direction (PMD) where they are more likely to fire. The distribution of PMDs was asymmetrical, with a greater proportion of neurons firing for reaches around 135 degrees and 325 degrees, matching empirical observations from non-human primate electrophysiological recordings in the primary motor cortex (Scott et al., 2001). Next, they showed that this asymmetrical representation of PMDs during reaching movements did not occur when RNNs were trained to control a plant that lacked the geometrical properties of an arm. Specifically, they compared the PMD distribution of RNN neurons controlling a point-mass (no geometry) against that of an arm26 (geometry present). We sought to reproduce the two results outlined above. First, we trained an RNN composed of 90 GRUs in a single layer to control for an arm26 (Figure 6a see Methods section 4.5.). Because our RNN employs GRUs instead of a multi-layer perceptron, 90 units were sufficient to efficiently train the network to perform the task, as opposed to up to 1000 perceptron nodes in the original study. We also increased the number of targets from 8 to 24 to obtain a finer resolution over movement direction in our analyses (Figure 6b).

Figure 6: The distribution of preferential movement direction tuning is sensitive to the geometry of the plant. (a) Schematic of the two models compared. The RNNs and their architecture were identical, but the plant differed, with one RNN controlling a two-joint arm26 (left) and the other controlling a point-mass (right). (b) Centre-out reaching trajectories to 24 targets for the arm26 (left) and point-mass (right) model. (c) Distribution of preferential movement directions (PMDs) for the arm26 (left) and point-mass (right) model. The PMDs were determined by regression of each GRU's hidden activity averaged over time against reach angle (see Methods for details). (d) Normalized muscle activations across reaching angles and for the 300 ms following the "go" cue for the arm26 model. (e) Normalized β coefficients of the regression models used for (c). The GRUs were ordered
Following training, we first ensured that muscle activation patterns in the arm26 plant were like those reported in the original study (Figure 6d). Regarding network activity, we observed a great variety of activation patterns over movement direction (Figure 6f). Some GRUs showed a preference for timing (e.g., neuron A4, C5), while others showed a strong preference for reaching direction that was sustained over time (neuron C3, A2). Finally, most neurons showed a mixed preference for encoding time and reaching direction (neuron C8, A8). This heterogeneous set of responses matches empirical observations in non-human primate primary motor cortex recordings (Churchland & Shenoy, 2007;Michaels et al., 2016).
We then assessed each GRU's PMD using linear regression (see methods) and sorted them based on their PMD before plotting the tuning curve of each neuron. The resulting colormap (Figure 6e, left panel) yields a "ridge" of maximal activity whose peak varies across reach angle, forming a crooked line, illustrating a representational bias. This crooked ridge line was not observed in an RNN trained to control for a point-mass plant instead using an identical training procedure and analysis (Figure 6e, right panel). We replicated this procedure with 7 more RNNs for each model, resulting in a total of 8 RNNs trained on an arm26 and 8 RNNs trained on a point-mass. We determined each GRU's PMD and averaged the resulting polar histogram across each RNN (Figure 6c). The same bias was reproduced invariably for the RNNs controlling an arm26 plant, while it failed to arise for those controlling a point-mass. Therefore, these results mimic the observations made in the original study (Lillicrap & Scott, 2013), specifically, that RNNs controlling a plant with no arm-like geometrical properties will not result in the biased PMD representation during reaching movements commonly observed in non-human primate electrophysiological studies (Scott & Kalaska, 1997).

Iterating quickly through the model development cycle
In the field of machine learning, an established best practice is to iterate quickly around a cycle of (1) formulating an idea, (2) implementing that idea in functionally efficient code, and (3) testing the idea through running the simulations. The results of the simulations can then be leveraged to adjust the idea, thus closing the loop, and enabling iterative refinement of a model. This [idea → code → test → idea] cycle is reminiscent of the [hypothesis → design task → test → hypothesis] cycle in empirical work, also known as the hypothetico-deductive method. An important practice in ML is to ensure that one iteration of that cycle is quick enough, because producing an efficient model may require many such iterations. Based on this framework, a way to view MotorNet is as a means to improve iteration speed through this cycle. The modular architecture of MotorNet enables users to alter aspects of the plant model while keeping everything else identical, and TensorFlow's ability to do the same at the controller level is preserved. Therefore, user capacity to proceed through the "implementation" step is enhanced.

Expandability
MotorNet naturally allows users to create and tune objects to fit individual requirements. This makes the toolbox easily expandable to add novel models that are not pre-built in the original distribution. This flexibility will likely vary depending on the goal (Figure 7). Some extensions only require adjusting parameter values of existing object classes, such as editing the Arm26 Skeleton class to match the arm of a non-human primate. Other extensions will require subclassing, such as creating a Plant for an eyeball, which might require special geometric properties building on the point-mass Skeleton object (Table 1). Conversely, Plants that stray away from typical vertebrate effectors will likely prove more challenging, such as an octopus arm, because they do not rely on bones. Importantly, while all these extensions vary in the difficulty of their implementation, each has the capacity to fit and work harmoniously within the framework of the MotorNet architecture.

Open source
Typically, when motor control researchers want to create canonical models, they must implement their own version of said model based on methodological descriptions of previously published scientific articles. However, because MotorNet is open source, individual contributions can easily be shared online for the benefit of others. For instance, if a researcher creates a muscle class with a parametrizable pennation angle (Millard et al., 2013;Thelen, 2003), future researchers and team will not have to re-create their own implementation of the same object anymore. This also allows more dynamical peerchecking, avoiding dissemination of errors and improving consistency of model implementations. In other words, MotorNet will be able to benefit from community driven incremental work through open-source practices.

Innovation scalability
For the past several years, ML has been standing out as one of the most dynamic research fields, achieving breakthroughs and successfully scaling innovative work toward solving everyday problems. It would be challenging for MotorNet to keep up with the pace of ML innovation to provide users with implementations of the latest architectures and algorithms. Rather, we rely on Google's TensorFlow to build our own controllers. This ensures that any innovation in model design quickly finds its way to a viable MotorNet implementation, because TensorFlow capabilities allow for fast adaptation aligned with progress in ML. Generally, MotorNet is built with the following logic in mind: anything TensorFlow can build, MotorNet should be able to use as a controller.

Collision physics
Typical biomechanical software distributions implement some form of collision physics in their physics engine (Delp et al., 2007;Seth et al., 2018;Todorov et al., 2012). This is not the case for MotorNet.

Complex biomechanical features
Some biomechanical software distributions such as OpenSim propose a large array of joint types such as hinge joints or rotational joints, and complex muscle paths such as wrap points that trigger only when the muscle collides with them (Delp et al., 2007;Seth et al., 2010Seth et al., , 2018. While these features increase the realism of a biomechanical model, MotorNet does not yet implement these types of features. In practice, this constrains what types of Plants MotorNet can realistically implement and adding some of these features is under consideration.

Future considerations
As an open-source, freely available Python toolbox, MotorNet is subject to change over time. Some of the limitations outlined above are considered as future routes for improvement. Additionally, we hope that individual contributions will help refine and extend the capabilities of the toolbox as well. In this section we outline prospective improvements for implementation and release in the main distribution.

Spinal Compartment
It is becoming increasingly evident that spinal contribution plays a prominent role in motor control beyond the typically considered spinal reflex (Reschechtko & Pruszynski, 2020;Weiler et al., 2019). To an extent, one may consider that supraspinal control integrates the spinal contribution to define a motor control policy (Loeb, 2021). Within MotorNet, this suggests that a controller's dynamics will be significantly impacted by the presence of a spinal compartment acting as an interface with the Plant. Consequently, it may be worthwhile to implement one such spinal compartment to explore the consequences of such biological design (Cisek, 2019).

Modular controllers
A deeply established idea in neuroscience is that distinct regions will perform different computations, and thus that a complex system may not be considered as a uniform, fully connected network (Abbott & Svoboda, 2020;Keeley et al., 2020;Pesaran et al., 2021;Semedo et al., 2020). This is also true for the motor control system, where using a modular network architecture with controlled communication between each module has been shown to have more explanatory power than a non-modular system (Michaels et al., 2020). Therefore, a potential development for MotorNet is to include a model class with a modular architecture to study how cross-regions networks work to enable neural control of the body.

Muscle models
Most published work in motor control relies either on Hill-type muscle models (Bhushan & Shadmehr, 1999;Kistemaker et al., 2006Kistemaker et al., , 2010Nijhof & Kouwenhoven, 2000) or direct torque actuators (Lillicrap & Scott, 2013) similar to the ReLu muscle that MotorNet provides. However, despite its popularity, even the more-detailed Hill-type muscle remains a phenomenological model of real muscle behaviour, which can easily show its limits when trying to understand how the brain controls movement (Blum et al., 2020). Alternative muscle model formalizations exist, such as the Distribution-Moment muscle model (Zahalak, 1981), which may be worth implementing within MotorNet as well.

General modelling design
This section describes modelling elements that were used for several models in this study. For all models, the timestep size was 0.01 sec, and a proprioceptive delay of = 20 ms and visual delay of = 50 ms were used (Figure 2). Plants were actuated using numerical integration with the Euler method.

Arm26 model
The arm26 model used in this study is available online on the open-source toolbox code under the RigidTendonArm26 plant class. It is briefly described below for convenience.
The skeleton was actuated by six rigid-tendon versions of Hill-type muscle actuators: a shoulder flexor, a shoulder extensor, an elbow flexor, an elbow extensor, a bi-articular flexor, and a bi-articular extensor. Their parameter values are defined in table 3.  Table 3: parameters for the Hill-type muscle actuators used in the arm26, taken from Kistemaker et al. (2010).

Maximum isometric force (N) Tendon length (m) Optimal muscle length (m)
The full formalization of the Hill-type muscles can be found in Thelen (2003) equations 1-7, and with the parameter values used in that study. When different parameters were provided for young and old subjects, the values for young subjects were used (Thelen, 2003, table 1) While in custom-made Plant objects the moment arms of each muscle are computed based on geometric first principles (Figure 4d-f; Sherman et al., 2013), in the RigidTendonArm26 class the moment arms are approximated as described in Kistemaker et al. (2010), equations A10-A12, with parameters for this study defined in  Table 4: parameters used to compute moment arms in the arm26 models with moment arm approximation, taken from Kistemaker et al. (2010).

Point-mass model
The point-mass model used in this study is available online on the open-source toolbox code under the ReluPointMass24 plant class. It is briefly described below for convenience.
The point-mass had a mass of = 1 kg. Its actuation followed an ordinary differential equation such that ̈= ⁄ with ̈, the 2-elements cartesian acceleration vector at time and the 2-elements force vector applied at time , respectively.
The forces were produced by four linear muscle actuators, whose formalization is available online on the open-source toolbox code under the ReluMuscle muscle class. Each muscle's force production is a linear piecewise function of its activation , scaled by its maximum isometric force = 500 N: The activation function was the same as for the Hill-type muscles used in the arm26 model, and can be found in Thelen (2003), equations 1-2.
The four muscles were fixed to the point-mass in a "X" configuration (Figures 4a, 6a) with the first fixation point for the upper right, lower right, lower left, and upper left muscle being respectively ( = 2, = 2), (2, −2), (−2, −2), (−2, 2). The second fixation point of each muscle was on the point-mass, therefore moving in general coordinates alongside the point-mass (Figure 4a).

Network architecture
All network controllers used in this study consisted of one layer of GRUs with a sigmoid recurrent activation function and a ℎ activation function. Kernel and recurrent weights were initialized using Glorot initialization (Glorot & Bengio, 2010) and orthogonal initialization (Hu et al., 2020), respectively. Biases were initialized at 0.
The GRU layer was fully connected to an output layer of perceptron nodes with a sigmoid activation function. The output layer contains one node per descending motor command, or equivalently one node per muscle in the plant. The output layer's kernel weights were initialized using a random normal distribution with a standard deviation of 0.003, and its bias was initialized at a constant value of -5. Because the output activation function is a sigmoid, this initial bias forces the output of the controller to be close to 0 at the start of initialization, ensuring a stable initialization state.
For all networks used in this study, a 2-elements vector of ( , ) cartesian coordinates for the start position and target position were provided as input, alongside a go-cue, resulting in a 5-element input vector. The go-cue was a "step" signal whose value changed from 1 to 0 when the movement should be initiated.

General training design
During training, the models reached from a starting position drawn from a random uniform distribution across the full joint space to a target position drawn from a random uniform distribution as well. The occurrence time of the go-cue was drawn from a random uniform distribution across the full simulation duration. In 50% of simulations, no go-cue was provided (i.e., a catch trial) to ensure the network learnt to wait for the go-cue and avoided any anticipatory activity. The desired position * was set to be the start position until the go cue was provided, at which point * was defined as the target position. Note that the go-cue was treated as a visual signal. Therefore, while the desired position * was updated immediately as the go-cue was provided (with no time delay), the network was informed of the go-cue occurrence via a change in the target position input and go-cue input only following the visual feedback delay . Depending on the models, additional training manipulations were also applied, as described in the sections below.

4.2.
Centre-out reaches task against a curl field

Model
The plant used to learn to reach against a curl field was an arm26 model as described in section 4.1.1. The controller was as described in section 4.1.3., with the GRU layer containing = 50 units.

Training
The model was trained according to the procedure in section 4.1.4. with the loss described in eq. 1, using a kernel regularization = 10 −6 , coefficients = 2, = 5, = 0.1, κ = 0.05, and target radius = 0.01 m. The model was trained on 7680 batches with a batch size of 64, on simulations of 1 sec.
The model was trained according to section 4.1.4., except that the go-cue time was fixed at 100 ms from the start of the simulation. Following initial training, the model was then tested as described in the next section to produce the "naïve" behaviour shown in Figure 1b-c. Following testing, training was then resumed, but employing the curl-field, fixed starting position, and set of 8 targets used in testing. 50% simulations were still catch trials, as in the initial training session. This second training session lasted 768 batches with a batch size of 64. Finally, following this second training session, the model was tested again, to produce the "adapted" behaviour of Figure 1b.

Testing
The model was tested in 1 sec simulations against a null field, and against external forces applied at the arm's endpoint that produced a counter-clockwise curl field: With ̇ the 2-elements cartesian velocity vector at time , and = 8 a scalar defining the strength of the curl field. In the null field, we have = 0.
The testing procedure consisted of 8 centre-out reaches from a fixed starting position at a shoulder and elbow angle of 45° and 90°, respectively, to 8 target positions 10 cm away and distributed in increments of 45° around the starting position ( Figure 1b). This set of simulations were repeated against a null field and against the curl field in eq. 2, resulting in a total of 16 reaches. For all testing simulations, the go-cue time was fixed at 100 ms from the start of the simulation and no catch trials were employed.

Biomechanical properties of the plant
The point-mass model used was as described in section 4.1.2. The arm26 model used was as described in section 4.1.1., except that the moment arms were not approximated based on the parameters of table 4, but computed based on the geometry of the muscle paths (Nijhof & Kouwenhoven, 2000;Seth et al., 2010;Sherman et al., 2013). Accordingly, the muscle paths were manually declared by defining how many fixation points each muscle has, and on which bone and where on each bone each point fixes.
MotorNet handles declaration of these paths using a relative reference frame for each fixation point (Seth et al., 2010). Specifically, a fixation point on a bone will have two coordinates. The first coordinate defines how far along the bone the point is, from the bone's origin, e.g., the shoulder for the upper arm (Figure 8). The second coordinate defines how far the point deviates from the bone orthogonally. If the fixation point is an anchor point, that is, it is not fixed on a bone but on the world space, then general coordinates ( , ) are used (color-coded in green in Figure 8). These anchor points are important to ensure that the plant can be actuated with respect to the environment. The full set of coordinates defining the model's muscle paths are indicated in table 5 and are derived from Nijhof & Kouwenhoven (2000).

4.4.
Training ANNs to produce naturalistic behaviour

Model
The two models used to produce Figure 5 were arm26 models as described in section 4.1.1. For both models, the controller was as described in section 4.1.3., with the GRU layer containing = 110 units. In addition, excitation and GRU hidden activity noise were added, with values = 10 −3 , ℎ = 10 −4 , respectively.
In one of the two models, which we refer to as the "perturbation-free" model, the training procedure was as described in section 4.1.4. In the second model, which we refer to as the "perturbation-trained" model, a 100 ms endpoint mechanical perturbation was added to the training procedure. The perturbation occurred in 50% of trials, independently of whether the trial was a catch trial or not, and its orientation and time were randomly drawn as well. The magnitude of the perturbation was drawn from a uniform distribution ranging between 0 and 4 N. If the perturbation occurred during a catch trial, the distribution ranged between 0 and 8 N.

Testing
Both the perturbation-trained and perturbation-free models were tested in 800 ms simulations in two distinct tasks, a centre-out reaching task and a postural task.
In the centre-out reaching task, 8 targets were positioned in 45 degrees increments and 10 cm away from a starting position corresponding to a shoulder and elbow angle of 45° and 90°, respectively (Figure 5c, g). The visual go cue was provided at 100 ms following the simulation start. 70 ms after the go-cue was "perceived" (i.e., 70 ms plus the visual feedback delay ), a mechanical perturbation was applied at the arm's endpoint and orthogonally to the reaching direction. This perturbation could be either within-distribution (±3 N) or out-ofdistribution (±6 N) or null (no perturbation).
In the postural control task, no go cue was provided, and the arm's endpoint was pushed away from the start position by the mechanical perturbation at 170 ms plus visual delay after the simulation started. We applied perturbations in either of the four cardinal directions (0°, 90°, 180°, 270°). Again, the set of perturbations for testing outputs included within-distribution magnitudes (±6 N) and out-of-distribution magnitudes (±12 N).

Models
All arm26 and point-mass plants used to produce Figure 5 were as described in section 4.1.1. and 4.1.2., respectively. For all models, the controller was as described in section 4.1.3., with the GRU layer containing = 90 units.

Training
All models were trained with the loss described in eq. 1, using a kernel regularization = 10 −6 , coefficients = 2, = 5, = 0.1, κ = 0.05, and target radius = 0. The models were trained on 38,400 batches of size 64, on simulations of 800 ms. The training procedure was as described in section 4.1.4.

Testing
The testing procedure consisted of 8 centre-out reaches in 800 ms simulations. Simulations started from a fixed position at a shoulder and elbow angle of 45° and 90° for the arm26 models, and at an ( = 0, = 0) cartesian position for the point-mass models. Reaches were to 24 target positions 10 cm away and distributed in increments of 15° around the starting position (Figure 6b). For all testing simulations, the go-cue time was fixed at 100 ms into the simulation and no catch trials were employed.

Analysis
To obtain the preferential movement direction of each GRU, we averaged each unit's hidden activity in a 150 ms time window starting when the go cue was input to the network (i.e., following visual feedback delay ) for each reaching direction independently, and regressed that average to a diagonal design matrix encoding the reach direction. The absolute value of the resulting regression coefficients was then normalized between 0 and 1, and neurons were sorted according to these normalized coefficients to produce Figure 6e.
As mentioned in the results section, we trained 8 networks to control an arm26 and 8 networks to control a point-mass. For each network, we took the count of GRUs whose normalized regression coefficient is maximal for each target considered and averaged that count across all 8 networks to produce Figure 6c.

Contributions
OC, JAM, and PLG conceptualized the toolbox; OC implemented the toolbox; OC, JAM, MK, JAP, and PLG tested the toolbox and designed the simulation paradigms; OC and PLG performed the simulations; OC, JAM, MK, JAP, and PLG interpreted the results; OC made the figures, wrote the first draft, and wrote the online documentation; OC, JAM, MK, JAP, and PLG edited and approved the final version of the manuscript.