A paper by a team of UC Davis computer scientists, “FineGAN: Unsupervised Hierarchical Disentanglement for Fine-Grained Object Generation and Discovery,” was accepted for an oral presentation at the 2019 IEEE Computer Vision and Pattern Recognition (CVPR) conference. CVPR, the world’s premier conference for computer vision research, gives that honor to only 5% of submissions, which marks the paper as among the top in its field.
The paper, authored by Ph.D. student Krishna Kumar Singh, visiting researcher Utkarsh Ojha and assistant professor Yong Jae Lee, describes an algorithm the team developed that identifies, disentangles and layers different parts of generated images by associating random codes with each part.
Given two images of birds that look similar and have similar backgrounds, humans can recognize the similarities in shape and background while still understanding differences in color or texture, and can categorize the birds based on these fine-grained differences. However, this doesn’t come naturally to a machine.
The computer model was shown raw images containing objects that belong to specific categories, such as birds. It started with a set of random codes and learned to associate different codes with different parts of the generated image by maximizing the mutual information between them. From this, it identified and disentangled four layered components: background, bird shape, texture/color and orientation/position. Manipulating a specific code changed the corresponding part of the generated image.
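For readers curious what "maximizing the mutual information" can look like in practice, here is a minimal Python sketch. It assumes the variational lower bound popularized by InfoGAN-style models, in which a helper network tries to guess which code produced each generated image; the article does not spell out the team's exact objective, so the function names and the toy numbers below are hypothetical illustrations rather than the authors' implementation.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mutual_info_lower_bound(true_codes, predicted_logits, num_codes):
    """Lower bound on I(code; image): H(code) + mean log Q(code | image).

    Assumption: codes are sampled uniformly, so H(code) = log(num_codes).
    """
    entropy = math.log(num_codes)
    log_likelihood = 0.0
    for code, logits in zip(true_codes, predicted_logits):
        log_likelihood += math.log(softmax(logits)[code])
    return entropy + log_likelihood / len(true_codes)

# Hypothetical predictor scores for four generated images and three possible codes.
codes = [0, 1, 2, 1]
confident = [[9.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 9.0], [0.0, 9.0, 0.0]]
uninformed = [[0.0, 0.0, 0.0]] * 4  # the image reveals nothing about its code

bound_confident = mutual_info_lower_bound(codes, confident, num_codes=3)
bound_uninformed = mutual_info_lower_bound(codes, uninformed, num_codes=3)
print(bound_confident, bound_uninformed)
```

When the code can be read back from the image, the bound approaches its ceiling of log(3); when the image carries no trace of the code, the bound falls to roughly zero. Training to raise this quantity pushes each code to leave a recognizable mark on the image.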
“Basically, you have full freedom to generate any image you want by controlling these four random codes,” said Singh.
By changing any of the four random codes, the algorithm is able to manipulate a generated image. Photo courtesy of Krishna Kumar Singh.
With this knowledge, the model could analyze and predict these different components in real images too. “When the model sees duck-shaped birds, it should be able to predict some common code or feature—regardless of background or texture details—that’s going to represent the duck shape,” explained Ojha. The team tested this by having the model cluster images into fine-grained categories using shape and texture-related features, where it significantly outperformed previous clustering methods.
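The clustering test can be pictured with a small Python sketch. The article does not name the clustering method, so plain k-means is used here as a stand-in, and the shape and texture feature values are made-up numbers for four imaginary bird photos rather than real model outputs.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means with a deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Add the point that lies farthest from every center chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Move each center to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical per-image features: two shape values and one texture value each.
shape_features = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
texture_features = [[0.1], [0.2], [0.9], [0.8]]

# Concatenate shape and texture features so both influence the grouping.
combined = [s + t for s, t in zip(shape_features, texture_features)]
labels = kmeans(combined, k=2)
print(labels)
```

In this toy case the first two images land in one group and the last two in another, because background is never part of the features being compared.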
What makes the algorithm special is that it’s able to do all of this in an unsupervised setting, meaning it learns from raw, unlabeled data as opposed to being trained with human-annotated data. Singh, Ojha and Lee’s work suggests that developing this type of generative modeling algorithm is possible in unsupervised settings.
“We still have a long way to go, but this is a very exciting first step towards unsupervised modeling of fine-grained real images,” said Lee.
The team will present their research at the 2019 IEEE CVPR conference on June 19 in Long Beach, CA.