Earlier this month, I attended Tencent’s Global Digital Ecosystem Summit in Shenzhen, where I got to experience their Digital Human showcase at the Shenzhen World Exhibition & Convention Center.
Along with other international media, including journalists from Malaysia and Indonesia, I was introduced to a digital version of Mr Dowson Tong, CEO of Tencent Cloud. Big crowds gathered to watch the virtual Mr Tong appear on a large screen and deliver a presentation in three languages: Chinese, English, and then Bahasa Indonesia.
Then it was my turn (I was quite eager to try out the technology).
With my phone, I scanned a QR code that directed me to Tencent’s website, where I submitted a recent photo, a 30-second voice recording, and a 100-word paragraph of text for my digital human to say.
I also had the freedom to choose from nine different languages for my digital human: Chinese, English, Korean, Japanese, Arabic, Bahasa Indonesia, Thai, French, or German. There was even an option to switch the voice to a different gender, which I found fascinating.
Unfortunately, there were technical difficulties. Despite help from Tencent staff, I wasn’t able to generate my digital human on site, so the process had to be completed remotely once I returned to Singapore.
I had help from the team at Tencent Cloud and Smart Industries Group. I provided a one-minute voice recording, along with a 30-second video. I was told that my face and mouth had to be clearly visible throughout. And for fun, I decided to recite tongue twisters, making sure my face moved in sync with my words.
Before the final step, I was required to provide consent, so I recited: “I, Melody Chan, am aware that recordings of my voice will be used by Tencent Cloud to create and use a synthetic version of my voice.”
When the final product arrived, it was startlingly lifelike, almost a mirror of the video I had submitted. The voice, while slightly robotic, was impressive nonetheless. Hearing myself speak fluently in languages I don’t know, like Thai, Arabic, and French, was surreal.
However, there were a few small glitches. The sentence breaks felt a bit unnatural, and if you paid close attention, you could spot my hand gestures repeating themselves.
In the end, creating my own hyper-realistic digital doppelganger was undeniably fun, but also a little dystopian. It felt like something straight out of Black Mirror.
Seeing ‘Digital Melody’ come to life made me wonder… how long before AI comes for our jobs?