Ability to dub audio based on the words of an uploaded timed transcript. A plain Text to Speech function isn't ideal for this because the cloned voice's output isn't timed, so the resulting audio cannot be synced to the original MP4 video. If we could upload a transcript of the audio with delineated times, this would not only help ensure the accuracy of the words, but also help synchronize the timing with the MP4.
Summary:
1) Upload mp3 and timed transcript text file.
2) Download generated audio synced to the transcript's timings (instead of just untimed audio of the text).
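
To illustrate what the timed-transcript step might involve, here is a minimal sketch, assuming the transcript uses SRT-style timestamps (this request doesn't fix a format). The names `Cue`, `parse_timed_transcript`, and `fit_clip` are hypothetical and not part of any existing API. The sketch only parses the timings and works out how a generated clip would need to be sped up or padded to fit its slot; it does not generate any audio.

```python
import re
from dataclasses import dataclass

# Assumed SRT-style timing line, e.g. "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    """One timed transcript entry (hypothetical structure)."""
    start_ms: int
    end_ms: int
    text: str


def _to_ms(h, m, s, ms):
    # Convert hours/minutes/seconds/milliseconds strings to milliseconds.
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_timed_transcript(raw: str) -> list[Cue]:
    """Parse blank-line-separated blocks that each contain a timing line."""
    cues = []
    for block in re.split(r"\n\s*\n", raw.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                # Everything after the timing line is the spoken text.
                text = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                cues.append(Cue(_to_ms(*g[:4]), _to_ms(*g[4:]), text))
                break
    return cues


def fit_clip(cue: Cue, clip_ms: int) -> tuple[float, int]:
    """Return (speed_factor, trailing_silence_ms) so a clip fills the cue's slot.

    A clip longer than the slot is sped up; a shorter one is padded with silence.
    """
    slot_ms = cue.end_ms - cue.start_ms
    if slot_ms <= 0:
        raise ValueError("cue must end after it starts")
    if clip_ms > slot_ms:
        return clip_ms / slot_ms, 0
    return 1.0, slot_ms - clip_ms
```

Each generated clip would then be placed at its cue's `start_ms`, which is what keeps the dubbed track aligned with the original video.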