Resemble.ai/fill has a feature that replaces a word in an audio clip with different words while matching the emotion, intonation, prosody, and volume of the original.
Essentially, the replaced words sound as if they were part of the original recording.
This would allow editing parts of an audio clip quickly, without re-generating or re-recording the entire clip, while keeping the same emotion, intonation, prosody, and volume.
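
To make the "edit only part of the audio" idea concrete, here is a minimal sketch of the splice step alone. It is not how Resemble.ai implements the feature. It assumes the audio is a mono NumPy float array, that the replacement words have already been synthesized elsewhere, and that the sample positions of the word to replace are known. The function name `splice_segment` and its parameters are hypothetical. It only matches volume (RMS level). Matching emotion, intonation, and prosody would need a speech model and is out of scope here.

```python
import numpy as np

def rms(x):
    """Root-mean-square level of a signal (0.0 for an empty array)."""
    return float(np.sqrt(np.mean(np.square(x)))) if len(x) else 0.0

def splice_segment(audio, start, end, replacement, fade=0):
    """Replace audio[start:end] with `replacement`, scaled to the same RMS level.

    Hypothetical helper: `start` and `end` are sample indices, and `fade` is
    the number of samples to fade in/out at the patch edges to reduce clicks.
    """
    audio = np.asarray(audio, dtype=np.float64)
    patch = np.array(replacement, dtype=np.float64)  # copy, leave input untouched

    # Volume matching: scale the patch to the level of the part it replaces.
    target, current = rms(audio[start:end]), rms(patch)
    if current > 0:
        patch *= target / current

    # Optional short linear fade at both edges of the patch.
    n = min(fade, len(patch) // 2)
    if n > 0:
        ramp = np.linspace(0.0, 1.0, n, endpoint=False)
        patch[:n] *= ramp
        patch[-n:] *= ramp[::-1]

    # Everything outside [start, end) is kept exactly as it was.
    return np.concatenate([audio[:start], patch, audio[end:]])
```

Only the samples between `start` and `end` change, so the rest of the recording is never re-generated. The output length can differ from the input when the replacement words are longer or shorter than the original ones.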