Global AI and Data Science

AI is Solving our Problems in Coronavirus Imposed Video-Conferencing

By Michael Mansour posted Wed May 06, 2020 01:48 AM

  

With video conferencing becoming the new norm for communication, the ML community has been quickly releasing features that smooth out the experience or just make it more fun. We cover a few of those advancements that you can use at home today.


Impersonate Elon Musk (Among Others) in Realtime on Your Next Zoom Call

With video conferencing starting to become tiresome, there’s now a way to add some surprise by casting yourself as a celebrity, in real time, in your next video call. With Avatarify, you can pretend to be Elon Musk or the Mona Lisa, among many other options. The app runs locally on your machine and plugs easily into a number of applications. The plug-and-play tool provides real-time controls for calibration and operation. It’s not perfect, and the results depend on the quality of your initial calibration and on your computing power, but it’s at least good enough to get a rise out of your friends.



The implementation is based on the recent paper “First Order Motion Model for Image Animation.” The main contribution of this SOTA approach is a more efficient method for modeling object transformations. The model learns to encode motion as a combination of motion-specific keypoint displacements and local affine transformations, which allows it to make first-order approximations of complex movements. This, plus some other tricks, improves the performance of the application.
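To make the "keypoint displacement plus local affine transformation" idea concrete, here is a minimal sketch of a first-order approximation around a single keypoint. It is a simplified illustration, not code from the paper or from Avatarify: the function name, the argument layout, and the hand-picked 2x2 matrix are all assumptions, whereas in the real model the keypoints and the local matrices are predicted by a neural network.

```python
def local_affine_warp(z, driving_kp, source_kp, jacobian):
    """Map a point z near a keypoint in the driving frame to the source frame.

    First-order approximation: source_kp + J * (z - driving_kp), where J is a
    2x2 matrix describing the local affine motion around the keypoint.
    All names here are hypothetical and chosen for illustration only.
    """
    dx = z[0] - driving_kp[0]
    dy = z[1] - driving_kp[1]
    (a, b), (c, d) = jacobian
    return (source_kp[0] + a * dx + b * dy,
            source_kp[1] + c * dx + d * dy)


# With an identity matrix the motion is a pure keypoint displacement.
identity = ((1.0, 0.0), (0.0, 1.0))
shifted = local_affine_warp((1.5, 2.0), (1.0, 2.0), (3.0, 5.0), identity)

# A scaling matrix also stretches the neighbourhood around the keypoint.
scale2 = ((2.0, 0.0), (0.0, 2.0))
stretched = local_affine_warp((2.0, 2.0), (1.0, 1.0), (0.0, 0.0), scale2)
```

With only a displacement, every pixel near the keypoint moves by the same offset. The extra 2x2 matrix lets the neighbourhood rotate, scale, or shear as well, which is what makes the approximation "first order".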


While the CLI-based app ships with a few pre-trained models of popular likenesses to mimic, it should not be too hard to train your own model to broadcast yourself as anyone. Interestingly, and probably responsibly, the creators of this app don’t natively support this dual-use ability. This is a good choice on their part, since it reduces the chances of an offensive impersonation being used in a Zoom-bombing attack or to cause actual deceit. Dual use of AI research and tools seems not to have been getting much attention lately, so it’s great to see it considered here.



Background Noise Suppression - Challenges and Solutions for Video Conferencing Application Providers


A number of video conferencing features are being fast-tracked right now given the extraordinarily high demand. Most useful of all is real-time background noise suppression. It might be surprising that there is wide variance in the capabilities currently offered, but there are a number of challenges to getting it right.


Previously, there was no canonical dataset representative of video-conferencing environments to experiment with. One has now been open sourced for you to develop with. The dataset contains audio samples with a wide array of background noises overlaid, along with the clean target audio to be extracted.


The computational cost of deploying this is a barrier. Implementing it server side is prohibitively expensive and would add too much latency, since voice and video packets are sent directly from the source to participants (at least in 1-1 settings). Most of the work needs to be pushed out to the client machines. This comes with a host of constraints, like not consuming all the CPU or draining the battery, while also keeping the model memory-friendly enough to work on a wide array of computer and mobile setups.
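As a rough illustration of the client-side constraint, the sketch below checks whether a noise-suppression model fits a per-frame compute budget. The function names, the 10 ms frame length, and the 25% CPU share are all hypothetical values chosen for the example, not figures from any particular video conferencing provider.

```python
def real_time_factor(processing_ms, frame_ms):
    """Fraction of each audio frame's duration spent processing it.

    A value above 1.0 means the model cannot keep up with live audio.
    """
    return processing_ms / frame_ms


def fits_budget(processing_ms, frame_ms, max_cpu_share=0.25):
    """True if processing leaves enough headroom for the rest of the call.

    max_cpu_share is an assumed budget, not a published requirement.
    """
    return real_time_factor(processing_ms, frame_ms) <= max_cpu_share


# Hypothetical measurements: a small model taking 2 ms per 10 ms frame,
# and a larger one taking 5 ms for the same frame.
small_ok = fits_budget(2.0, 10.0)
large_ok = fits_budget(5.0, 10.0)
```

In this toy setup the larger model still runs faster than real time, but it would consume half of one core for noise suppression alone, which is the kind of trade-off that pushes providers toward very lightweight models.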


If you’re interested in how background noise suppression can be done with deep learning, check out this tutorial.
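Deep learning approaches to noise suppression are typically trained on pairs of noisy input and clean target audio. The sketch below shows one common way to build such a pair by overlaying noise on clean speech at a chosen signal-to-noise ratio. It is an assumption about how such data can be constructed, not a description of how the open-sourced dataset was actually built, and the helper names and toy sine-wave signals are hypothetical stand-ins for real recordings.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(clean, noise, snr_db):
    """Overlay noise on clean speech at the requested SNR (in dB).

    Returns (noisy, clean): the model input and its training target.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    noise_rms = rms(noise)
    if noise_rms == 0:
        raise ValueError("noise is silent; SNR is undefined")
    # Scale the noise so that rms(clean) / rms(scaled noise) matches snr_db.
    gain = rms(clean) / (noise_rms * 10 ** (snr_db / 20))
    noisy = [c + gain * n for c, n in zip(clean, noise)]
    return noisy, list(clean)


# Toy signals standing in for real recordings (hypothetical values).
clean = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(1600)]
noise = [math.sin(2 * math.pi * 1000 * t / 16000 + 1.0) for t in range(1600)]
noisy, target = mix_at_snr(clean, noise, snr_db=5.0)
```

Sweeping `snr_db` over a range of values and drawing from many noise types is a simple way to get the variety of conditions a model would meet on real calls.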



Have an Nvidia GPU?  Suppress the Background Noise Yourself Even Better with RTX Voice


Another ML-powered enhancement to our stuck-at-home conference calls is the ability to cancel out your own background noise with state-of-the-art technology and hardware from Nvidia: RTX Voice. If you own one of their RTX series GPUs, you can filter out keyboard noises and yelling children locally on your own machine before the audio is fed into any of the most popular video conferencing apps. One limitation of the models shipped with video conferencing apps is that they have to remain extremely lightweight in operation and size, since they need to run on a wide range of hardware. That’s not an issue here. RTX Voice will also eliminate background noise from output sources, like your co-worker’s barking dog coming through your headphones. Check out the official site to configure RTX Voice locally.


Better yet, the community is claiming that you don’t technically need an RTX card to enable RTX Voice; it will work on other Nvidia cards.


However, if you’re still left without an Nvidia card (Mac users), it might be worthwhile starting your own open source project to mirror these capabilities using the dataset here.
#GlobalAIandDataScience
#GlobalDataScience