The media industry is not only investing in ML tools but is already running them in production. As consumers demand higher resolutions, the compute available to render that content is not keeping pace. Some real examples:
- At Pixar, it takes 50 CPU hours to render a single 2K frame; at 24 frames/second over a 90-minute feature film (roughly 130,000 frames), this cost becomes onerous.
- Facebook’s virtual reality platform needs to render high-resolution images with close to no lag in order to create a seamless experience that doesn’t make users motion sick.
- And, slightly differently, the Covid pandemic has made it harder to arrange in-person video and photo shoots for corporate content or stock photos. Corporate training videos must be updated as standards change, and recording them in many languages quickly becomes expensive.
With upsampling tools like GANs or other learned super-samplers, Pixar, Facebook, and others can generate high-quality images and videos more quickly and cheaply.
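To put the Pixar figure in perspective, here is a back-of-the-envelope calculation. It is only a sketch: it assumes the standard cinema rate of 24 frames/second and that every frame costs the same 50 CPU hours, and the function name `render_cost` is made up for this illustration.

```python
def render_cost(cpu_hours_per_frame, fps, minutes):
    """Return (frame count, total CPU hours) for a film of the given length."""
    frames = fps * minutes * 60          # frames in the whole film
    return frames, frames * cpu_hours_per_frame


# Assumed inputs: 50 CPU hours per 2K frame, 24 fps, 90-minute feature.
frames, cpu_hours = render_cost(cpu_hours_per_frame=50, fps=24, minutes=90)
print(f"{frames:,} frames, {cpu_hours:,} CPU hours")
# prints: 129,600 frames, 6,480,000 CPU hours
```

Spread over a year of wall-clock time, that is several hundred CPU cores running nonstop for a single film, which is why cheaper upscaling is attractive.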
Pixar’s Super-Resolution Upscaling with GANs
Pixar has implemented a GAN, trained on a corpus of their own movies, that performs super-resolution upscaling on each of the individual frames drawn by artists. While their published paper is light on architectural details, it highlights a few challenges:
- HDR (High Dynamic Range) imagery is stored in arrays of large floating-point values, whereas DNNs perform best with inputs normalized to [-1, 1]. They applied a novel range-compression technique to transform the input data into a more suitable format with better I/O performance.
- To prevent color shifts between the low-resolution input and the high-resolution output, they minimize the L1 loss between a down-sampled version of the output and the input image.
- Some GAN artifacts are still present; they expect these to abate with more training data.
With this approach, they’re able to decrease the upscaling time on a single frame to 15 seconds and reduce their rendering-farm footprint by 75%.
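The range-compression and color-consistency ideas from the list above can be sketched in a few lines of NumPy. This is not Pixar's implementation: the logarithmic compression curve, the box-filter downsampling, and every function name here are assumptions chosen for illustration, since the paper's exact formulas are not reproduced in this post.

```python
import numpy as np


def compress_range(hdr, max_value):
    """Map HDR values in [0, max_value] to [-1, 1] with a log curve (assumed)."""
    hdr = np.clip(hdr, 0.0, max_value)
    unit = np.log1p(hdr) / np.log1p(max_value)   # now in [0, 1]
    return 2.0 * unit - 1.0                      # now in [-1, 1]


def expand_range(compressed, max_value):
    """Inverse of compress_range: map [-1, 1] back to [0, max_value]."""
    unit = (compressed + 1.0) / 2.0
    return np.expm1(unit * np.log1p(max_value))


def box_downsample(img, factor):
    """Average each factor x factor block of an (H, W, C) image."""
    h, w, c = img.shape
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by factor")
    blocks = img.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))


def color_consistency_l1(sr_output, lr_input, factor):
    """L1 loss between the downsampled high-res output and the low-res input."""
    return np.abs(box_downsample(sr_output, factor) - lr_input).mean()


# Toy data: a 4x4 HDR "input" and a 2x nearest-neighbor "upscale" of it.
rng = np.random.default_rng(0)
MAX_VALUE = 100.0                                  # assumed HDR ceiling
lr = compress_range(rng.uniform(0.0, MAX_VALUE, size=(4, 4, 3)), MAX_VALUE)
sr = np.repeat(np.repeat(lr, 2, axis=0), 2, axis=1)

faithful_loss = color_consistency_l1(sr, lr, factor=2)        # close to zero
shifted_loss = color_consistency_l1(sr + 0.1, lr, factor=2)   # penalized
```

An output whose colors match the input after downsampling incurs almost no penalty, while a uniformly shifted output is penalized in proportion to the shift; that is the property the loss term is meant to enforce.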
Facebook’s Real-Time Upscaling for VR
Facebook’s unique challenge, unlike Pixar’s, is that upscaling in a VR context has a spatio-temporal component: the environment and your view of it are constantly changing. At inference time, their neural super-sampler takes features like color, depth maps, and motion vectors alongside the low-resolution images and upscales them to 2K in real time. The output images are of high quality.
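As a rough illustration of how those per-pixel features might be fed to a network, the NumPy sketch below stacks color, depth, and motion vectors into one input array and uses the motion vectors to align a previous frame with the current view. This is a simplified assumption about the data flow, not Facebook's actual pipeline: the function names, the nearest-neighbor warp, and the convention that a motion vector points from a current pixel to its location in the previous frame are all hypothetical.

```python
import numpy as np


def stack_frame_features(color, depth, motion):
    """Concatenate color (H, W, 3), depth (H, W), motion (H, W, 2) -> (H, W, 6)."""
    return np.concatenate([color, depth[..., None], motion], axis=-1)


def warp_previous(prev, motion):
    """Backward-warp a previous frame into the current view (nearest neighbor).

    Assumed convention: motion[y, x] = (dx, dy) is the offset from pixel
    (x, y) in the current frame to the same surface point in `prev`.
    Lookups that fall outside the frame are clamped to the border.
    """
    h, w = prev.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + motion[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + motion[..., 1]).astype(int), 0, h - 1)
    return prev[src_y, src_x]


# Toy 3x4 frame: the scene sits one pixel to the right in the previous frame.
prev_color = np.arange(12, dtype=float).reshape(3, 4, 1).repeat(3, axis=-1)
motion = np.zeros((3, 4, 2))
motion[..., 0] = 1.0
depth = np.ones((3, 4))

aligned_prev = warp_previous(prev_color, motion)
net_input = stack_frame_features(aligned_prev, depth, motion)  # shape (3, 4, 6)
```

Aligning earlier frames before stacking lets a network combine detail from several slightly different views of the same surface, which is one way to read the "spatio-temporal" aspect described above.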
Corporate DeepFakes When You Can’t Film the CEO or Need Them to Speak French
Corporate training videos on important topics like harassment, privacy, and export controls need to be engaging and accessible to everyone. Synthesia meets this need with GAN-augmented avatars of the speaker that can speak the audience’s language. While the result is admittedly not as high quality as an actually filmed presenter, the footage is compelling enough to let companies deliver these important messages, and it is probably better than watching slides with audio only.
Refresh Your GAN Knowledge
In case you’re interested in a refresher on GANs after reading about these applications, check out this technical primer.