“The Hardest Part,” a new song from indie pop artist Washed Out, is all about love lost, among the most human of themes.
But ironically, to illustrate the tune’s sense of longing, the musician turned to something far less flesh-and-blood: artificial intelligence.
With Thursday’s release of “The Hardest Part,” Macon, Ga.-based Washed Out, whose real name is Ernest Greene, has delivered the first collaboration between a major music artist and a filmmaker on a music video made with OpenAI’s Sora text-to-video technology, according to the singer-songwriter’s record label, Sub Pop.
The roughly four-minute video, directed by Paul Trillo, speedily zooms the viewer through key elements of a couple’s life. The audience sees the characters — a red-haired woman and a dark-haired man — go from making out and smoking in a 1980s high school to getting married and having a child. “Don’t you cry, it’s all right now,” Greene croons. “The hardest part is that you can’t go back.”
The couple aren’t played by real actors. They’re created entirely digitally through Sora’s AI.
The video could mark the beginning of a potentially groundbreaking trend of using AI in video production.
“I think where we are now — that’s about to explode, and so I look forward to being able to incorporate some of this brand-new technology and seeing how that informs what I can come up with,” Greene said in an interview. “So, if that’s pioneering, I would love to be part of that.”
“The Hardest Part,” the lead single from Greene’s new self-produced album, “Notes From a Quiet Life,” set for release on June 28, is the longest music video made through Sora technology so far. The program creates short clips based on written text prompts. This enabled Trillo to build scenes in a way that would’ve been many times more expensive with actual actors, sets and locations.
“Not having the limitations of budget and having to travel to different locations, I was able to explore all these different, alternate outcomes of this couple’s life,” Trillo said.
Trillo is one of the creatives who have early access to Sora, which is not yet publicly available. OpenAI unveiled Sora in February and has been testing the system with directors and meeting with Hollywood executives and producers. It’s working out kinks and trying to address intellectual property concerns.
The innovations in AI have been hugely controversial in many corners, including in the music industry, which has been plagued by “deepfakes,” or video and audio that falsely use an artist’s image or voice. Musicians and others have pushed for legislation to combat such misleading creations, and talent agencies are working with tech startups to clamp down on unauthorized digital mimicry.
The introduction of Sora — coming from the same company that created the text-based AI model ChatGPT — raised concerns within Hollywood and elsewhere about its potentially devastating impact on jobs and production. Still, it inspired excitement among some creatives for the ways it could help them achieve their vision onscreen without being constrained by special effects budgets and travel limitations.
Both Greene and Trillo said they were able to do more with Sora than they would have with real-life sets on their budget. Sub Pop did not disclose the costs for the video. The music artist did not pay OpenAI to use the tech for the music video.
The two men had explored other ideas, including hiring dancers, and filming in a location that resembled the green hills in the art for Greene’s new album, but that proved difficult because of time and financial constraints. So Trillo suggested experimenting with Sora.
Greene, whose music TV audiences may recognize from the theme song of the satirical sketch comedy show “Portlandia,” was hesitant at first.
“I feel like with my music and most of the videos I’ve made over the years, it always starts from like a real emotional, sincere place,” Greene said, noting that many of the examples of AI video he’d seen existed in the dreaded “uncanny valley,” human-like but eerily artificial.
Nonetheless, Greene was willing to experiment. So Trillo tried out different concepts to see what would work in the video. Using the technology, he could explore all the various outcomes of the couple’s life across multiple locations by creating elaborate text-based prompts. He completed the video in about six weeks, editing together about 55 clips from the roughly 700 that he generated using Sora.
“With this, there was no editing myself,” Trillo said. “I was really able to just try things and so that organically creates a different kind of story because of that, being able to throw so much at the wall and see what sticks.”
To generate usable clips, Trillo needed to write prompts with enough specific details about not just the image itself but also the shot angles and movements of the characters. “We zoom through the bubble it pops and we zoom through the bubblegum and enter an open football field,” Trillo wrote as part of his prompt for one brief snippet of video. “The scene is moving rapidly, showing a front perspective, showing the students getting bigger and faster.”
The final music video for “The Hardest Part” shows several locations, including a high school, a grocery store, rolling hills, a hallway with billowing white sheets and fire burning through the walls.
There were some limitations. Sometimes Trillo would have an idea and Sora would nail it. Other times, it would create something chaotic and unusable. The videos would come out with inconsistencies, which Trillo would sometimes choose to just overlook. The characters look a little different from clip to clip, as does the couple’s child.
Part of the video’s artsy charm is its dreamlike state — recollections of a couple’s life that illustrate the murkiness of human memory.
“You have to know where to pick your battles with it,” Trillo said of Sora. “You kind of have to relinquish a bit of your free will in working with this thing and you kind of have to accept the nature of how chaotic it is.”
“I was certainly blown away with just how far he could take it in piecing a story together,” Greene said.
Both Greene and Trillo said they see AI as potentially opening more opportunities for people to push the music video art form forward. Music videos are a logical medium in which to play around with AI, because they’re usually short and cost much less to make than feature films and television episodes.
However, Trillo said, it’s important to him that this is not used as a new main method for creation but rather another tool in the tool belt.
“A lot of music videos just don’t have the budgets to really dream big,” Trillo said. “I think AI can help the music industry in terms of creating things that even Ernest could dream of that maybe he wouldn’t have dared to dream before.”