A Deep Learning Framework for Character Motion Synthesis and Editing

Daniel Holden, University of Edinburgh (s0822954@sms.ed.ac.uk)
Jun Saito, Marza Animation Planet (saito@marza.com)
Taku Komura, University of Edinburgh (tkomura@ed.ac.uk)

Figure 1: Our framework allows the animator to synthesize character movements automatically from given trajectories.

Abstract

We present a framework to synthesize character movements based on high level parameters, such that the produced movements respect the manifold of human motion, trained on a large motion capture dataset. The learned motion manifold, which is represented by the hidden units of a convolutional autoencoder, represents motion data in sparse components which can be combined to produce a wide range of complex movements. To map from high level parameters to the motion manifold, we stack a deep feedforward neural network on top of the trained autoencoder. This network is trained to produce realistic motion sequences from parameters such as a curve over the terrain that the character should follow, or a target location for punching and kicking. The feedforward control network and the motion manifold are trained independently, allowing the user to easily switch between feedforward networks according to the desired interface, without re-training the motion manifold. Once motion is generated, it can be edited by performing optimization in the space of the motion manifold. This allows kinematic constraints to be imposed, or the style of the motion to be transformed, while ensuring the edited motion remains natural. As a result, the system can produce smooth, high quality motion sequences without any manual pre-processing of the training data.

Keywords: deep learning, convolutional neural networks, autoencoder, human motion, character animation, manifold learning

Concepts: • Computing methodologies → Motion capture

1 Introduction

Data-driven motion synthesis allows animators to produce convincing character movements from high level parameters. Such approaches greatly help animation production, as animators only need to provide high level instructions rather than low level details through keyframes. Various techniques that make use of large motion capture datasets and machine learning to parameterize motion have been proposed in computer animation.

Most data-driven approaches currently available require a significant amount of manual data preprocessing, including motion segmentation, alignment, and labeling. A mistake at any stage can easily result in a failure of the final animation. Such preprocessing is therefore usually performed carefully, with a significant amount of human intervention, to ensure the output movements appear smooth and natural. This makes full automation difficult, and these systems often require dedicated technical developers to maintain.
In this paper, we propose a model of animation synthesis and editing based on a deep learning framework, which can automatically learn an embedding of motion data in a non-linear manifold from a large set of human motion data, with no manual data preprocessing or human intervention. We train a convolutional autoencoder on a large motion database such that it can reproduce the motion data given as input, as well as synthesize novel motion via interpolation. This unsupervised, non-linear manifold learning process does not require any motion segmentation or alignment, which makes it significantly easier than previous approaches. On top of this autoencoder we stack another feedforward neural network that maps high level parameters to low level human motion, as represented by the hidden units of the autoencoder. With this, users can easily produce realistic human motion sequences from intuitive inputs such as a curve over some terrain that the character should follow, or the trajectory of the end effectors for punching and kicking. As the feedforward control network and the motion manifold are trained independently, users can easily swap and re-train the feedforward network according to the desired interface. Our approach is also inherently parallel, which makes it very fast to compute and a good fit for mainstream animation packages.

We also propose techniques to edit the motion data in the space of the motion manifold. The hidden units of the convolutional autoencoder represent the motion in a sparse and continuous fashion, such that adjusting the data in this space preserves the naturalness and smoothness of the motion, while still allowing complex movements of the body to be reproduced. One demonstrative example of this editing is to combine the style of one motion with the timing of another by minimizing the difference between the Gram matrices of the hidden units of the synthesized motion and those of the reference motion.
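To make the two-stage architecture concrete, the following is a minimal sketch of a temporal convolutional autoencoder with a stacked feedforward control network, assuming PyTorch. It is not the implementation used in this work: the layer widths, kernel sizes, degree-of-freedom count, and high-level parameter dimensionality are illustrative assumptions, chosen only to show how the hidden units of the autoencoder act as the motion manifold that the control network maps into.

```python
import torch
import torch.nn as nn

class MotionAutoencoder(nn.Module):
    """1D convolutional autoencoder over time; hidden units form the motion manifold."""
    def __init__(self, dof=70, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(dof, hidden, kernel_size=25, padding=12),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=2),          # temporal pooling halves the frame count
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode='linear', align_corners=False),
            nn.Conv1d(hidden, dof, kernel_size=25, padding=12),
        )

    def encode(self, motion):   # motion: (batch, dof, frames)
        return self.encoder(motion)

    def decode(self, hidden):   # hidden: (batch, hidden, frames // 2)
        return self.decoder(hidden)

    def forward(self, motion):
        return self.decode(self.encode(motion))

class ControlNetwork(nn.Module):
    """Feedforward network mapping high-level parameters to manifold coordinates."""
    def __init__(self, param_dim=7, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(param_dim, hidden, kernel_size=45, padding=22),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=25, padding=12),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=15, padding=7),
        )

    def forward(self, params):  # params: (batch, param_dim, frames // 2)
        return self.net(params)

# The two stages are trained independently: the autoencoder is fit to reconstruct
# raw motion capture, then the control network is fit to predict the (frozen)
# autoencoder's hidden units from high-level parameters such as a trajectory curve.
autoencoder = MotionAutoencoder()
control = ControlNetwork()

def synthesize(params):
    """High-level parameters -> manifold hidden units -> full-body motion."""
    with torch.no_grad():
        return autoencoder.decode(control(params))
```

Because the control network is a separate module, swapping the interface (trajectory following, punching targets, and so on) only requires re-training this smaller network against the same fixed manifold.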
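The style editing mentioned above can likewise be sketched as an optimization over the hidden units. The snippet below is a hedged illustration, again assuming PyTorch and the hypothetical MotionAutoencoder sketch above; the loss weights, learning rate, and iteration count are placeholder values rather than those used in this work.

```python
import torch

def gram(hidden):
    # hidden: (batch, channels, frames); Gram matrix of the temporal activations
    return torch.einsum('bcf,bdf->bcd', hidden, hidden) / hidden.shape[-1]

def transfer_style(autoencoder, content_motion, style_motion,
                   style_weight=1.0, content_weight=1.0, steps=200, lr=0.1):
    """Edit in manifold space: keep the timing of content_motion, adopt the style
    (Gram matrix statistics) of style_motion, then decode back to full-body motion."""
    with torch.no_grad():
        h_content = autoencoder.encode(content_motion)
        g_style = gram(autoencoder.encode(style_motion))
    h = h_content.clone().requires_grad_(True)
    opt = torch.optim.Adam([h], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = (style_weight * ((gram(h) - g_style) ** 2).mean()
                + content_weight * ((h - h_content) ** 2).mean())
        loss.backward()
        opt.step()
    with torch.no_grad():
        return autoencoder.decode(h)
```

Because the optimization is performed on the hidden units rather than on joint angles directly, the decoded result stays on the learned manifold and so remains smooth and natural-looking.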