Project Number: 2023C-436  Date: 2024/02/15
Research Topic: An Illustration Creation System That Promotes Creative Thinking
Researcher: Affiliation (at the time) / Title / Name
(Principal Investigator) Faculty of Science and Engineering, School of Fundamental Science and Engineering, Lecturer, 福里 司
Summary of Research Results
To build a system that creates high-quality illustrations imagined by users, as a first step we develop an AI system that generates photorealistic images (e.g., facial images) from rough illustrations. Synthesizing photorealistic facial images from monochromatic rough illustrations is one of the most fundamental tasks in the field of image-to-image translation. However, it is still challenging to simultaneously consider (1) high-dimensional face features such as geometry and color, and (2) the characteristics of the input sketches. Existing methods often use sketches as indirect inputs to guide AI models, resulting in the loss of sketch features or in alterations to geometry information.

This research proposes an LDM-based network architecture trained on a paired sketch–face dataset, named the "Sketch-Guided Latent Diffusion Model (SGLDM)." We first apply a Multi-Auto-Encoder (AE) to encode the input sketch from the pixel space to a feature map in the latent space by dividing the sketch into several regions, which reduces the dimensionality of the sketch input while preserving the geometry-related information of local face details. Next, we construct a sketch–face paired dataset based on an existing method that extracts an edge map from an image. In addition, we augment this dataset to improve the robustness of the SGLDM to arbitrarily abstract sketch inputs. The evaluation study shows that the SGLDM can synthesize high-quality face images with different expressions, facial accessories, and hairstyles from sketches of various abstraction levels.
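
The region-wise encoding idea can be illustrated with a minimal PyTorch sketch. The grid-based region split, channel counts, and resolutions below are illustrative assumptions, not the values used in the SGLDM; the sketch only shows how per-region auto-encoders can produce a spatial feature map that preserves local geometry.

```python
# Hypothetical sketch of the Multi-Auto-Encoder idea: split the monochrome sketch
# into fixed regions, encode each region with its own small auto-encoder, and tile
# the per-region latents back into one feature map (the diffusion condition).
import torch
import torch.nn as nn

class RegionAE(nn.Module):
    """One small convolutional auto-encoder for a single sketch region."""
    def __init__(self, latent_ch: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, latent_ch, 3, stride=2, padding=1),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_ch, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

class MultiAE(nn.Module):
    """Encodes a sketch region by region and stitches the latents into a feature map."""
    def __init__(self, grid: int = 2, latent_ch: int = 4):
        super().__init__()
        self.grid = grid
        self.aes = nn.ModuleList([RegionAE(latent_ch) for _ in range(grid * grid)])

    def encode(self, sketch):                      # sketch: (B, 1, H, W)
        B, _, H, W = sketch.shape
        h, w = H // self.grid, W // self.grid
        rows = []
        for i in range(self.grid):
            cols = []
            for j in range(self.grid):
                patch = sketch[:, :, i * h:(i + 1) * h, j * w:(j + 1) * w]
                _, z = self.aes[i * self.grid + j](patch)
                cols.append(z)
            rows.append(torch.cat(cols, dim=3))    # stitch columns back together
        return torch.cat(rows, dim=2)              # (B, latent_ch, H/8, W/8)

sketch = torch.rand(1, 1, 256, 256)                # dummy monochrome sketch
cond = MultiAE().encode(sketch)
print(cond.shape)                                  # torch.Size([1, 4, 32, 32])
```

Because each region keeps its own encoder, local details such as eye or mouth strokes are compressed independently rather than being averaged into a single global code, which is the property the summary attributes to the Multi-AE.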
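
The dataset-construction and augmentation steps can likewise be sketched under stated assumptions. The summary does not name the edge extractor or the augmentation scheme; Canny edges and morphological dilation with random stroke dropout below are stand-ins chosen only to illustrate pairing photos with pseudo-sketches at several abstraction levels.

```python
# Illustrative sketch of building sketch–face pairs: derive a pseudo-sketch from
# each face photo with an off-the-shelf edge extractor, then augment it so the
# model also sees coarser, more abstract strokes. All parameters are assumptions.
import cv2
import numpy as np

def photo_to_pseudo_sketch(photo_bgr: np.ndarray) -> np.ndarray:
    """Extract a monochrome edge map from a face photo (stand-in extractor)."""
    gray = cv2.cvtColor(photo_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, threshold1=100, threshold2=200)
    return 255 - edges                     # white background, black strokes

def abstract_augment(sketch: np.ndarray, level: int) -> np.ndarray:
    """Simulate a more abstract sketch by thickening strokes and dropping detail."""
    inv = 255 - sketch                     # strokes as foreground
    kernel = np.ones((1 + 2 * level, 1 + 2 * level), np.uint8)
    thick = cv2.dilate(inv, kernel)        # coarser strokes
    keep = (np.random.rand(*thick.shape) > 0.1 * level).astype(np.uint8)
    return 255 - thick * keep              # randomly drop some stroke pixels

photo = cv2.imread("face.jpg")             # hypothetical path to a face photo
if photo is not None:
    sketch = photo_to_pseudo_sketch(photo)
    pairs = [(photo, abstract_augment(sketch, level)) for level in range(3)]
```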
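
Finally, a compact, hypothetical illustration of how a sketch feature map can condition an LDM-style denoiser during training: the noisy face latent and the Multi-AE output are concatenated channel-wise and fed to a noise-prediction network. `Denoiser` is a toy placeholder (the actual SGLDM backbone is a diffusion U-Net), and the noise schedule is a simplified assumption.

```python
# Simplified DDPM-style training step with a sketch condition; not the SGLDM itself.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Denoiser(nn.Module):
    """Placeholder noise-prediction network (stand-in for a diffusion U-Net)."""
    def __init__(self, face_ch=4, cond_ch=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(face_ch + cond_ch, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, face_ch, 3, padding=1),
        )

    def forward(self, z_t, cond, t):
        # Timestep t is ignored by this toy network; a real U-Net embeds it.
        return self.net(torch.cat([z_t, cond], dim=1))

def training_step(denoiser, z0, cond, num_steps=1000):
    """Add noise to the face latent z0 and train the denoiser to recover that
    noise, given the sketch condition produced by the Multi-AE."""
    B = z0.size(0)
    t = torch.randint(0, num_steps, (B,))
    alpha_bar = torch.cos(0.5 * torch.pi * t.float() / num_steps) ** 2  # toy schedule
    a = alpha_bar.view(B, 1, 1, 1)
    noise = torch.randn_like(z0)
    z_t = a.sqrt() * z0 + (1 - a).sqrt() * noise
    return F.mse_loss(denoiser(z_t, cond, t), noise)

z0 = torch.randn(2, 4, 32, 32)        # face latents from a pretrained VAE (assumed)
cond = torch.randn(2, 4, 32, 32)      # sketch feature maps from the Multi-AE
loss = training_step(Denoiser(), z0, cond)
loss.backward()
```

Feeding the sketch latent directly at every denoising step, rather than using the sketch only as indirect guidance, is what keeps the generated face aligned with the drawn geometry in this kind of design.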