What does that have to do with the subject?
Do I want AI voices in my video games? No. We already have evidence that audiences will not accept it: look at how existing "AI characters" repeat the same lines and grunts over and over again.
FFXIV barely has any player-induced voiced sounds other than laughs, and groans when characters fall. It would get seriously annoying if your character sounded one way until it had to say your name, and then you got "Microsoft Sam" every time. When it comes to permissible uses of generative AI, it should be used as little as possible. Generating sentences where the player's name is said, and localizing into languages that the game has not yet been translated or dubbed into, are about the extent of generative AI's usefulness here. Generating dialogue, though? Forget it. LLMs can't write something interesting or funny because they don't understand the language. They're parrots.
Generative AI can "translate" one voice into another language without needing a new voice actor. The original voice actor would still be providing the voice in the non-native language, in cases where the only alternative is no translation at all. The voice actor should still be paid as though they had dubbed those lines. Likewise, for variable speech lines where the player's name is said, they should be paid for that additional use, as long as the audio is generated on the game server and not on the game client (which would open a Pandora's box of players being able to make the AI voice say anything).
Voice actors should oppose any dubbing of a video game with their voice unless they are paid for those lines as though they had actually performed them. That's the only way voice actors will accept AI voices. Players, however, will never accept generative AI voices if they have to hear them all the time.
Shills for generative AI keep pushing the idea that it can replace people creatively, but that isn't true and never will be. The AI has no experience, no wisdom, no third eye, no sense of embarrassment, shame, anger, happiness, love, etc. All an LLM does is mash together words that fit together, and generative AI does the same thing with visual and audio signals.