2 RELATED WORK
2.1 Program Synthesis
Program synthesis, the formal name for code generation, is the idea that, given a high-level specification of a problem, a space of candidate programs can be searched automatically for a provably correct solution [30]. While program synthesis originated in formal methods and Boolean SAT solvers, it has evolved considerably since the introduction of machine learning and large language models.
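To make the search-based framing concrete, the toy sketch below enumerates compositions drawn from a small, invented set of operations and checks each candidate against input-output examples; the grammar and the examples are illustrative assumptions, not taken from any system cited above.

```python
# Toy enumerative program synthesis: search compositions of a tiny set of
# operations for one consistent with all input-output examples.
from itertools import product

OPS = {
    "x + 1": lambda x: x + 1,
    "x * 2": lambda x: x * 2,
    "x - 3": lambda x: x - 3,
}

def synthesize(examples, max_depth=3):
    """Return a composition of OPS (as a string) matching every (input, output) pair."""
    for depth in range(1, max_depth + 1):
        for combo in product(OPS.items(), repeat=depth):
            def run(x, combo=combo):
                for _, fn in combo:
                    x = fn(x)
                return x
            if all(run(i) == o for i, o in examples):
                return " -> ".join(name for name, _ in combo)
    return None

# Specification given as input-output examples of f(x) = (x + 1) * 2
print(synthesize([(1, 4), (2, 6), (5, 12)]))  # prints "x + 1 -> x * 2"
```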
State-of-the-art models for code generation include GPT-4, AlphaCode, CodeGen, Code Llama, and Gemini [42, 47, 49, 52, 57]. These models generally take in a natural language specification of the problem (e.g., docstrings), test cases, and examples of inputs and outputs. They have shown a remarkable ability to solve complex programming problems at the level of the average human programmer [42]. Prompting for code generation generally differs from traditional prompting interactions, because code has underlying abstract syntactic representations, while natural language prompts can be more declarative and focused on conceptual intent [26]. Converting a user intention into code often involves intermediate representations such as scratchpads [48] and chain-of-thought / chain-of-code operations to derive and implement a technical specification [23, 41].
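As a rough illustration of the plan-then-implement pattern these intermediate representations suggest, the sketch below stages a prompt into a scratchpad step followed by an implementation step; call_llm is a placeholder to be wired to whatever model API is used, and both prompts are invented for illustration.

```python
# Two-stage "plan, then implement" prompting for code generation.
# call_llm is a placeholder stub; connect it to an actual LLM API.

def call_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to a code-capable LLM and return its reply text."""
    raise NotImplementedError("wire this up to a real model API")

def generate_code(task_description: str, examples: list[tuple[str, str]]) -> str:
    # Stage 1: elicit an intermediate plan (a scratchpad / chain-of-thought step).
    plan = call_llm(
        "Write a step-by-step technical plan (no code yet) for this task:\n"
        f"{task_description}\n"
        "Input/output examples:\n"
        + "\n".join(f"{i!r} -> {o!r}" for i, o in examples)
    )
    # Stage 2: ask for code that follows the plan.
    return call_llm(
        "Implement the following plan as a single Python function. Return only code.\n\n"
        "Plan:\n" + plan
    )
```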
While code generation models have primarily been benchmarked on text-based programming problems (e.g., LeetCode problems), they have also been shown to capably handle visual tasks. ViperGPT demonstrated that a code generation model can compose computer vision and logic module functions into code plans that derive answers to visual queries [56]. HCI systems have also shown that code generation models can be integrated within creative workflows and provide interactive assistance [17, 58]. Spellburst demonstrated how LLMs can help end users explore creative coding, a form of generative art, by writing prompts in natural language and merging underlying code representations [17]. BlenderGPT is an open-source plugin that lets users translate a prompt into actions within Blender, including scene creation, shader generation, and rendering [6]. Design2Code recently illustrated that front-end code can also be generated by finetuning code models and applying self-revision prompting [54], although this approach is currently outperformed by state-of-the-art LLMs (GPT-4V). As in these earlier works, code generation models often compose abstractions from libraries written to programmatically create visuals (bpy, CSS, p5.js) [27, 53].
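One way to picture this composition of visual modules (in the spirit of ViperGPT, though with hypothetical module names rather than its actual API) is a code model that is shown a small documented set of perception primitives and asked to emit a program over them, as in the sketch below.

```python
# Sketch of "code as a plan over visual modules": a code model is prompted with
# a documented API of perception primitives and a visual query, and it emits a
# short program composing them. The primitives named here are hypothetical
# stand-ins, not the actual interface of ViperGPT [56].

VISUAL_API_DOC = """
find_objects(image, name: str) -> list   # detect objects by name
exists(image, name: str) -> bool         # check whether an object is present
count(objects: list) -> int              # count detected objects
"""

def answer_visual_query(image, query: str, call_llm) -> str:
    # The model writes a function body that only calls the documented primitives.
    program = call_llm(
        f"Using only this API:\n{VISUAL_API_DOC}\n"
        f"Write a Python function execute_command(image) that answers: {query!r}"
    )
    # Execute the generated plan against the image (sandboxing omitted for brevity).
    scope = {}  # in a real system, the perception primitives would be injected here
    exec(program, scope)
    return scope["execute_command"](image)
```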
A recent direction within the program synthesis space has been program repair through self-refinement. Program repair refers to automatic approaches for bug fixing, and self-refinement is the idea that LLMs can inspect and edit their own code [22]. However, these approaches have generally focused on text-based tasks and programming problem benchmarks [21, 32]. Our work shows how self-refinement can be extended into the visual domain by detecting visual errors at the layer level and providing image “diffs” that describe the bug for visually-grounded program repair.
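A minimal sketch of what such a visually-grounded repair loop could look like is given below; the helpers (render_last_frame, describe_layer_diff, call_llm) are hypothetical stand-ins supplied by the caller, and the loop illustrates the render-compare-repair shape rather than LogoMotion's actual implementation.

```python
# Sketch of visually-grounded self-refinement: render the generated animation,
# compare the result against the target layout layer by layer, and feed a
# textual "diff" of the errors back to the LLM for repair. All helpers are
# hypothetical stand-ins supplied by the caller, not the paper's implementation.

def repair_until_correct(code, target_layers, render_last_frame,
                         describe_layer_diff, call_llm, max_rounds=3):
    for _ in range(max_rounds):
        rendered_layers = render_last_frame(code)        # e.g., per-layer bounding boxes
        diff = describe_layer_diff(rendered_layers, target_layers)
        if not diff:                                     # no layer-level errors remain
            return code
        code = call_llm(
            "The animation code below ends in the wrong visual state.\n"
            f"Per-layer visual diff: {diff}\n"
            "Repair the code so every layer matches its target position and opacity.\n\n"
            + code
        )
    return code
```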
2.2 Creativity Support Tools for Animation
Animation is a highly complex creative task. Tools that support it range from novice-friendly options like Google Slides [7] to tools with steep learning curves such as Adobe After Effects [1] and Autodesk Maya [3]. Animation spans a broad range of creative tasks, from conceptualization (scriptwriting, creating animatics) to asset creation (graphic design and storyboarding) to motion design (particle, primary, and path motion) [35]. Research tools often help users with the end-to-end creation of a target artifact. For example, Katika is an end-to-end tool that helps users create animated explainer videos by converting animation scripts into shot lists and finding relevant graphic assets and motion bundles [34]. Other systems have helped users create animated unit visualizations [20], 3D animations [46], and kinetic illustrations [38] by basing interactions around fundamental animation principles [36]. These principles help maximize the effect of animation by separating out dimensions such as primary and secondary motion, staging, timing, and anticipation.
Many approaches focus on the specific task of converting static assets to animated ones by designing ways to define motion. Motion can be derived from a number of sources: it can be customized from templates [10, 12], isolated from videos [37], orchestrated through particle and path motion [2, 39], or directed through language-based transformations [20, 43]. Templates and page-level animations are popular in commercial tools such as Adobe Express, Canva, CapCut, and Pinterest Shuffles [4, 5, 8–10], because they allow users to explore a diverse range of animation possibilities while reducing manual effort: users do not have to animate each element independently. Templates for video and animation have been found to be helpful for introducing novice designers to expert patterns within a design space, adding structure to their creative process, and boosting the overall quality of their creations [40, 60].
Authors:
(1) Vivian Liu, Columbia University ([email protected]);
(2) Rubaiat Habib Kazi, Adobe Research ([email protected]);
(3) Li-Yi Wei, Adobe Research ([email protected]);
(4) Matthew Fisher, Adobe Research ([email protected]);
(5) Timothy Langlois, Adobe Research ([email protected]);
(6) Seth Walker, Adobe Research ([email protected]);
(7) Lydia Chilton, Columbia University ([email protected]).