EmoPP is an emotion-aware prosody analysis model that more accurately captures the emotional cues in speech and predicts more appropriate pause positions, thereby improving the emotional expressiveness of end-to-end speech synthesis systems. Objective observations on the ESD dataset show a strong correlation between emotion and prosody. Objective and subjective evaluation results show that EmoPP outperforms all baselines and achieves strong emotional expressiveness.
Target audience:
["Emotional Speech Synthesis", "Dialogue System", "Voice Assistant"]
Example usage scenarios:
It can be used in speech synthesis systems that require emotional expression, such as virtual characters, conversational robots, and voice assistants.
It can be used to study the prosodic patterns of speech under different emotions.
It can be used to improve the naturalness and emotional expressiveness of speech synthesis.
Product features:
Emotion-aware prosody analysis
Improves the emotional expressiveness of speech synthesis
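
As a rough illustration of how predicted pause positions might be passed to a speech synthesis front end, the sketch below inserts a pause marker after each word that a predictor has flagged. It does not use EmoPP itself: the `insert_pauses` function, the `<pause>` token, and the per-word boolean input are all hypothetical and chosen only for this example.

```python
from typing import List

# Hypothetical pause marker; a real TTS front end may use a different symbol.
PAUSE_TOKEN = "<pause>"


def insert_pauses(words: List[str], pause_after: List[bool]) -> str:
    """Insert a pause marker after each word flagged by a pause predictor.

    `pause_after[i]` is True when a pause is predicted after `words[i]`.
    The predictions are assumed to come from some external model.
    """
    if len(words) != len(pause_after):
        raise ValueError("words and pause_after must have the same length")
    out: List[str] = []
    for word, pause in zip(words, pause_after):
        out.append(word)
        if pause:
            out.append(PAUSE_TOKEN)
    return " ".join(out)


if __name__ == "__main__":
    # Made-up predictions for illustration only.
    print(insert_pauses(["I", "am", "so", "happy"], [False, True, False, False]))
```

In such a setup, the same sentence could receive different pause flags depending on the target emotion, and the synthesizer would render each marker as a short silence.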