Japanese has a feature called "pitch accent": each mora (roughly, each syllable) of a word is pronounced with either a 'high' or 'low' pitch, and words follow one of a small set of pitch accent patterns. This plays roughly the role that stress plays in languages like English, so as you can imagine, it sounds a bit weird when it's wrong. Many words are pronounced nearly identically, with pitch accent and context being the only things that distinguish them. For example, here are two words pronounced 'kaeru' with different pitch accent patterns: 蛙 (frog), which has the 'heiban' pattern (low-high-high), and 帰る (to return), which has the 'atamadaka' pattern (high-low-low).
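To make the two 'kaeru' patterns concrete, here is a minimal Python sketch that derives a high/low string from a word's length in morae (the timing units of Japanese, roughly syllables) and the position where the pitch drops. It assumes the usual dictionary convention for standard (Tokyo) Japanese, where 0 means no drop (heiban) and 1 means a drop right after the first mora (atamadaka). The function name `pitch_pattern` and its parameters are my own, purely for illustration, and have nothing to do with any ElevenLabs API.

```python
def pitch_pattern(mora_count: int, downstep: int) -> str:
    """Return an H/L string for one word (hypothetical helper, Tokyo-style accent).

    downstep == 0: heiban, the pitch rises after the first mora and never drops.
    downstep == n: the pitch drops right after the n-th mora (n == 1 is atamadaka).
    """
    if mora_count < 1 or not 0 <= downstep <= mora_count:
        raise ValueError("downstep must be between 0 and mora_count")

    pattern = []
    for position in range(1, mora_count + 1):
        if downstep == 0:
            high = position > 1              # low start, then high to the end
        elif downstep == 1:
            high = position == 1             # high start, then low to the end
        else:
            high = 1 < position <= downstep  # low start, high up to the drop
        pattern.append("H" if high else "L")
    return "".join(pattern)


print(pitch_pattern(3, 0))  # 蛙 (ka-e-ru, heiban)       -> LHH
print(pitch_pattern(3, 1))  # 帰る (ka-e-ru, atamadaka)  -> HLL
```

Both words have the same three morae, so the only thing that differs between them is the second argument, which is exactly the information that is lost when the word is rewritten in hiragana.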
Since Eleven Multilingual V2 currently handles words written in kanji very poorly, users often need to rewrite the kanji portions of a sentence in hiragana, and doing so makes any pitch accent guesses essentially random. For me this is very burdensome: I've been attempting to use ElevenLabs to generate sentence audio for Japanese flashcards that I make from reading material containing words I don't know, and in a learning context it is especially detrimental to be presented with the wrong pitch accent. Beyond that, I'd imagine that for native speakers it probably just feels "weird" to hear the wrong pitch accent. Because pitch accent is so vital, many text-to-speech programs made in whole or in part by Japanese speakers (e.g. VOICEROID) have offered ways to control it for a decade or more.