Table of Links
2 Related Work
2.2 Creativity Support Tools for Animation
2.3 Generative Tools for Design
4 Logomotion System and 4.1 Input
4.2 Preprocess Visual Information
4.3 Visually-Grounded Code Synthesis
5.1 Evaluation: Program Repair
7 Discussion and 7.1 Breaking Away from Templates
7.2 Generating Code Around Visuals
4 LOGOMOTION SYSTEM
We present LogoMotion, an LLM-based method that automatically animates logos based on their content. The input is a static PDF document consisting of image and text layers. The output is an HTML page with JavaScript code that renders the animation. The pipeline has three steps: 1) preprocessing (for visual awareness), which represents the input in HTML and augments it with information about hierarchy, groupings, and a description of every element; 2) visually grounded code generation, which takes the preprocessed HTML representation and the static logo image and outputs JavaScript animation code; and 3) visually grounded program repair, which compares the last frame of the animation to the target image and performs LLM-based self-refinement if any layer shows a visual error.
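The three-stage loop above can be sketched as follows. All function bodies here are placeholder stubs and all names (`preprocess`, `generate_animation_code`, `repair`, etc.) are illustrative assumptions, not the paper's actual implementation; the real system makes LLM calls and renders the animation to compare frames.

```python
# Hedged sketch of the LogoMotion pipeline structure (stubs only).
# The real system calls an LLM for generation/repair and renders the
# HTML+JavaScript output to check the final frame against the target.

def preprocess(pdf_layers):
    """Stage 1: build a visually aware HTML representation of the logo,
    augmented with hierarchy, groupings, and element descriptions."""
    return "<div id='canvas'><!-- image/text layers --></div>"

def generate_animation_code(html_repr, static_image):
    """Stage 2: visually grounded code generation (an LLM call in the paper)."""
    return "// JavaScript animation code for the logo layers"

def last_frame_matches_target(js_code, target_image):
    """Stage 3 check: does the rendered last frame match the target logo?"""
    return True  # stub: the real system diffs layers and reports visual errors

def repair(js_code, errors):
    """Stage 3 fix: LLM-based self-refinement of the animation code."""
    return js_code

def animate_logo(pdf_layers, static_image, max_repairs=3):
    """Orchestrate preprocessing, generation, and iterative program repair."""
    html_repr = preprocess(pdf_layers)
    js_code = generate_animation_code(html_repr, static_image)
    for _ in range(max_repairs):
        if last_frame_matches_target(js_code, static_image):
            break
        js_code = repair(js_code, errors=None)
    return js_code

code = animate_logo(["title_layer", "icon_layer"], "logo.png")
```

The repair loop is bounded (`max_repairs`) so that self-refinement cannot iterate indefinitely on a layer that never converges.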
4.1 Input
A user begins by importing their PDF document into Illustrator. From there, an ExtendScript script exports the layered document as an HTML page. We use HTML as the fundamental representation because it plays to the strengths of an LLM and gives us a text representation of the canvas. The HTML representation includes the height, width, z-index, and top and bottom positions of every image element. Text elements are represented as image layers: each word is captured as a separate image layer whose text content becomes the alt-text caption, except for arced text (e.g., the logo title in Figure 1), where each letter is a separate image layer. Every element is assigned a random unique ID. This representation lets the LLM understand which layers make up the logo image.
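As an illustration, one layer entry of that HTML representation might be emitted as below. The `make_layer_tag` helper, the attribute layout, and the ID scheme are assumptions for illustration; the paper only specifies that each layer carries its dimensions, positions, z-index, alt-text caption, and a random unique ID.

```python
import uuid

def make_layer_tag(width, height, z_index, top, bottom, caption):
    """Build one image-layer entry of the HTML canvas representation.

    Each layer gets a random unique ID; for text layers, the word's
    content is carried in the alt-text caption, as described above.
    """
    layer_id = f"layer-{uuid.uuid4().hex[:8]}"
    return (
        f'<img id="{layer_id}" alt="{caption}" '
        f'style="width:{width}px; height:{height}px; '
        f'z-index:{z_index}; top:{top}px; bottom:{bottom}px;">'
    )

# e.g., a one-word text layer rendered as an image layer
tag = make_layer_tag(120, 40, 3, 10, 50, caption="Bloom")
```

A short, attribute-only tag like this keeps the canvas description compact, which matters when many layers must fit in a single LLM context window.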
The ExtendScript script also automatically extracts the bounding boxes and exports each layer as two PNG images: 1) a crop around the bounding box of the design element and 2) a magnified 512×512 version of the design element, which is passed to GPT-4V for captioning.
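The two-image export can be sketched in pure Python as follows. This is only a stand-in: the real exporter runs as ExtendScript inside Illustrator and writes actual PNG files, whereas here nested pixel lists and nearest-neighbor upscaling substitute for PNG encoding and Illustrator's rasterization.

```python
def crop_layer(pixels, left, top, right, bottom):
    """Tight crop around a layer's bounding box (export 1)."""
    return [row[left:right] for row in pixels[top:bottom]]

def magnify_layer(pixels, size=512):
    """Nearest-neighbor upscale to size x size, standing in for the
    512x512 magnified export sent to GPT-4V for captioning (export 2)."""
    h, w = len(pixels), len(pixels[0])
    return [[pixels[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]

# stand-in 8x8 canvas; each "pixel" records its (row, col) for clarity
canvas = [[(r, c) for c in range(8)] for r in range(8)]
cropped = crop_layer(canvas, left=2, top=2, right=6, bottom=6)
magnified = magnify_layer(cropped)
```

Magnifying the crop to a fixed 512×512 resolution gives the captioning model a consistently sized view of each element, regardless of how small it is on the original canvas.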
Authors:
(1) Vivian Liu, Columbia University ([email protected]);
(2) Rubaiat Habib Kazi, Adobe Research ([email protected]);
(3) Li-Yi Wei, Adobe Research ([email protected]);
(4) Matthew Fisher, Adobe Research ([email protected]);
(5) Timothy Langlois, Adobe Research ([email protected]);
(6) Seth Walker, Adobe Research ([email protected]);
(7) Lydia Chilton, Columbia University ([email protected]).