Vocaloid and similar software take vocal fragments recorded from real people and synthesize them into speech or song. You could try to tune these voice banks to sound like the original people again, or make something that people can’t truly do.
The most famous Vocaloid is Hatsune Miku, and she’s voiced by Fujita Saki. The voice actor is quite fond of the character she lent her voice to, and the music that came out has been very creative. As an aside, many music producers in Japan today came from the Vocaloid space.
The more relevant software would be Voiceroid, which approximates spoken Japanese quite well, and CeVIO Studio AI, where voice banks can be used for English-language speech. But many people who use these programs are using them as a stylistic TTS, adding flourishes no human being will ever use (everyone has more or less agreed that Zundamon should always end their speech with -nanoda).
Now, there are several games that use this software:
The former is a Touhou fangame, from a community that has traditionally used TTS. The latter features a bunch of characters from the Voiceroid family. These have not been controversial as far as I know.
I say all this to say that the technology and the intention behind Vocaloids and friends are slightly different from those of vocal cloning. Vocaloid and Voiceroid users are often tuning their voicebanks to do their let’s plays of Minecraft or make songs.
From my understanding, vocal cloning as it is used in entertainment industries is just copying the voice of a celebrity as-is.
This has different implications: if Chuck Norris were a voicebank established in the same way as Vocaloids were, his voice would be used as the vocals of a song or a sketch comedy. In other words, as part of original content by the author. The cloning stuff as it stands now may be closer to deepfakes, creating things he could say.
There is no reason to believe that the practitioners of vocal cloning technology are going to be malicious forever, of course. I can see a future where the companies might create good contracts with actors for vocal cloning, and the users are still responsible for their actions, the same way users of Vocaloid and similar software are.
As for whether it’s useful for interactive fiction, I think we have so many varying opinions on audiovisual elements in IF in the first place that even if the “AI stigma” fades away, people will still feel like they need to grumble about it.
I’m not sure if adding voice acting will be a bonus for a lot of IF games either (imagine Eat Me voice acted). A game that’s going to have voice acting should consider how voice acting will elevate the script.