
LogoMotion System and Input


Table of Links

Abstract and 1 Introduction

2 Related Work

2.1 Program Synthesis

2.2 Creativity Support Tools for Animation

2.3 Generative Tools for Design

3 Formative Steps

4 LogoMotion System and 4.1 Input

4.2 Preprocess Visual Information

4.3 Visually-Grounded Code Synthesis

5 Evaluations

5.1 Evaluation: Program Repair

5.2 Methodology

5.3 Findings

6 Evaluation with Novices

7 Discussion and 7.1 Breaking Away from Templates

7.2 Generating Code Around Visuals

7.3 Limitations

8 Conclusion and References

4 LOGOMOTION SYSTEM

We present LogoMotion, an LLM-based method that automatically animates logos based on their content. The input is a static PDF document that can consist of image and text layers. The output is an HTML page with JavaScript code that renders the animation. The pipeline has three steps: 1) preprocessing (for visual awareness), which represents the input in HTML and augments it with information about hierarchy, groupings, and a description of every element; 2) visually-grounded code generation, which takes the preprocessed HTML representation and the static image of the logo and outputs JavaScript animation code; and 3) visually-grounded program repair, which compares the last frame of the animation to the target image and performs LLM-based self-refinement if any layer shows a visual error.
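To make the control flow concrete, the sketch below outlines the pipeline in JavaScript. It is a minimal sketch, not the authors' implementation: every function name (preprocess, generateAnimation, renderLastFrame, diffLayers, repairAnimation) and the repair budget are assumptions, and the LLM and rendering calls are stubbed out.

```javascript
// Hypothetical sketch of the three-step pipeline. All names are illustrative;
// the stubs below stand in for LLM calls and headless rendering.
const MAX_REPAIR_ATTEMPTS = 3; // assumed repair budget

async function preprocess(layeredHtml) { return layeredHtml; }        // step 1 stub
async function generateAnimation(html, targetImage) { return ""; }    // step 2 stub
async function renderLastFrame(html, animationCode) { return null; }  // render stub
function diffLayers(lastFrame, targetImage) { return []; }            // per-layer diff stub
async function repairAnimation(animationCode, layerErrors) { return animationCode; }

async function animateLogo(layeredHtml, targetImage) {
  // Step 1: preprocessing for visual awareness -- augment the HTML with
  // hierarchy, groupings, and a description of every element.
  const augmentedHtml = await preprocess(layeredHtml);

  // Step 2: visually-grounded code generation -- the LLM sees the augmented
  // HTML plus the static logo image and emits JavaScript animation code.
  let animationCode = await generateAnimation(augmentedHtml, targetImage);

  // Step 3: visually-grounded program repair -- compare the animation's last
  // frame to the target image; on a per-layer mismatch, ask the LLM for a fix.
  for (let i = 0; i < MAX_REPAIR_ATTEMPTS; i++) {
    const lastFrame = await renderLastFrame(augmentedHtml, animationCode);
    const layerErrors = diffLayers(lastFrame, targetImage);
    if (layerErrors.length === 0) break;
    animationCode = await repairAnimation(animationCode, layerErrors);
  }
  return animationCode;
}
```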

4.1 Input

A user begins by importing their PDF document into Illustrator. Within Illustrator, using ExtendScript, they can export their layered document into an HTML page. We use HTML as the fundamental representation because it plays to the strengths of an LLM and gives us a text representation of the canvas. The HTML representation includes the height, width, z-index, and top and bottom positions of every image element. Text elements are also represented as image layers: each word is captured as a separate image layer whose text content becomes its alt-text caption, except in the case of arced text (e.g., the logo title in Figure 1), where each letter is a separate image layer. Every element is given a random unique ID. This representation allows the LLM to understand which layers make up the logo image.
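For illustration, a layer in this representation could look like the following. This is a hypothetical sketch: the attribute names, ID format, and styling are assumptions, since the paper does not publish its exact schema.

```html
<!-- Hypothetical layer representation; IDs, sizes, and positions are illustrative. -->
<div id="canvas" style="position: relative; width: 1080px; height: 1080px;">
  <!-- An image element: height, width, z-index, and top/bottom positions
       are encoded inline. -->
  <img id="k3f92a" src="layer_k3f92a.png" alt="orange sunburst icon"
       style="position: absolute; width: 320px; height: 320px;
              top: 120px; bottom: 640px; z-index: 1;">
  <!-- A text element: each word is its own image layer, and the word itself
       becomes the alt-text caption so the LLM can read it. -->
  <img id="q81c7d" src="layer_q81c7d.png" alt="SUNRISE"
       style="position: absolute; width: 400px; height: 80px;
              top: 480px; bottom: 520px; z-index: 2;">
</div>
```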


The ExtendScript script automatically extracts the bounding boxes and exports each layer into two PNG images: 1) a crop around the bounding box of the design element, and 2) a magnified 512×512 version of the design element, which is passed to GPT-4V for captioning.
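A minimal ExtendScript sketch of that export step is shown below. It is an assumption-laden illustration, not the authors' script: it presumes one design element per layer, writes bounding boxes to the console rather than into the HTML page, and leaves the bounding-box crop and the 512×512 magnification as comments.

```javascript
// Hypothetical ExtendScript sketch (runs inside Illustrator; paths illustrative).
var doc = app.activeDocument;

// Hide all layers so each can be exported in isolation.
for (var j = 0; j < doc.layers.length; j++) doc.layers[j].visible = false;

for (var i = 0; i < doc.layers.length; i++) {
  var layer = doc.layers[i];
  layer.visible = true;

  var item = layer.pageItems[0]; // assumes one design element per layer
  var b = item.geometricBounds;  // [left, top, right, bottom] in points
  $.writeln(layer.name + " bounds: " + b.join(", "));

  // Export the isolated layer as a PNG. In the full pipeline this would be
  // 1) cropped to the bounding box above and 2) re-exported as a magnified
  // 512x512 image for GPT-4V captioning; both steps are omitted here.
  var options = new ExportOptionsPNG24();
  options.transparency = true;
  doc.exportFile(new File("~/logomotion/" + layer.name + ".png"),
                 ExportType.PNG24, options);

  layer.visible = false;
}
```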


Authors:

(1) Vivian Liu, Columbia University;

(2) Rubaiat Habib Kazi, Adobe Research;

(3) Li-Yi Wei, Adobe Research;

(4) Matthew Fisher, Adobe Research;

(5) Timothy Langlois, Adobe Research;

(6) Seth Walker, Adobe Research;

(7) Lydia Chilton, Columbia University.


This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.

