“
To construct a flawless imitation, the first step was to gather as much video data as possible with a web crawler. His ideal targets were fashionable Yoruba girls, with their brightly colored V-neck buba and iro wrapped around their waists, hair bundled up in gele. Preferably, their videos were taken in their bedrooms under bright, stable lighting, their expressions vivid and exaggerated, so that the AI could extract as many still-frame images as possible. This target data set was paired with another of Amaka’s own face under different lighting, from multiple angles, and with varied expressions, generated automatically by his smartstream. He then uploaded both data sets to the cloud and got to work with a hyper-generative adversarial network. Hours or days later, the result was a DeepMask model. By applying this “mask,” woven from algorithms, to videos, he could become the girl he had created from bits; to the naked eye, his fake was indistinguishable from the real thing.

If his Internet speed allowed, he could also swap faces in real time to spice up the fun. Of course, more fun meant more work. For real-time deception to work, he had to simultaneously translate English or Igbo into Yoruba, use transVoice to imitate the voice of a Yoruba girl, and run an open-source lip-sync toolkit to generate the corresponding lip movements. If the person on the other end of the chat had paid for a high-quality anti-fake detector, however, the app might automatically flag anomalies in the video, marking them with translucent red square warnings.
”