DreamTalk

    Diffusion-based Expressive Talking Head
    Generation Framework.

    When Expressive Talking Head Generation Meets Diffusion Probabilistic Models

    Yifeng Ma1, Shiwei Zhang2, Jiayu Wang2, Xiang Wang3, Yingya Zhang2, Zhidong Deng1

    1Tsinghua University, 2Alibaba Group, 3Huazhong University of Science and Technology

    Diffusion models have shown remarkable success in a variety of downstream generative tasks, yet remain under-explored for the important and challenging task of expressive talking head generation. In this work, we propose DreamTalk, a framework that fills this gap through meticulous design choices that unlock the potential of diffusion models for generating expressive talking heads. Specifically, DreamTalk consists of three crucial components: a denoising network, a style-aware lip expert, and a style predictor. The diffusion-based denoising network consistently synthesizes high-quality audio-driven face motions across diverse expressions. To enhance the expressiveness and accuracy of lip motions, we introduce a style-aware lip expert that guides lip-sync while remaining mindful of the speaking style. To eliminate the need for an expression reference video or text, an additional diffusion-based style predictor infers the target expression directly from the audio. In this way, DreamTalk harnesses powerful diffusion models to generate expressive faces effectively while reducing the reliance on expensive style references. Experimental results demonstrate that DreamTalk generates photo-realistic talking faces with diverse speaking styles and achieves accurate lip motions, surpassing existing state-of-the-art counterparts.
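    To make the abstract's pipeline concrete, here is a minimal sketch of a DDPM-style reverse sampling loop that denoises a face-motion vector conditioned on audio features and a style code. This is an illustration only: the stand-in `denoise_net`, the noise schedule, and all variable names are assumptions, not the network or hyperparameters from the paper.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def denoise_net(x_t, t, audio_feat, style_code):
        # Hypothetical stand-in for DreamTalk's learned denoising network,
        # which would predict the noise component from the noisy motion,
        # the timestep, the audio features, and the style code.
        return 0.1 * (x_t - audio_feat - style_code)

    def ddpm_sample(audio_feat, style_code, steps=50, dim=64):
        # Linear noise schedule (illustrative values, not the paper's).
        betas = np.linspace(1e-4, 0.02, steps)
        alphas = 1.0 - betas
        alpha_bars = np.cumprod(alphas)

        x = rng.standard_normal(dim)  # start from pure Gaussian noise
        for t in reversed(range(steps)):
            eps = denoise_net(x, t, audio_feat, style_code)
            # Standard DDPM posterior mean update.
            x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
            if t > 0:  # add noise at every step except the last
                x = x + np.sqrt(betas[t]) * rng.standard_normal(dim)
        return x  # denoised face-motion vector

    motion = ddpm_sample(audio_feat=np.zeros(64), style_code=np.full(64, 0.5))
    print(motion.shape)
    ```

    At inference time the real framework would run a loop of this shape once per motion frame (or per sequence), with the style code coming either from a reference or from the diffusion-based style predictor.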

    The code and checkpoints are released.

    Overview

    Generalization Capabilities: Songs
    送別 Farewell (Chinese), Love Story (English)
    More Songs
    上海灘 The Bund (Cantonese), Lemon (Japanese), All For Love (English)
    Generalization Capabilities: Out-of-domain Portraits

    Generalization Capabilities: Speech in Multiple Languages
    Speech in Chinese, French, German, Italian, Japanese, Korean, and Spanish
    Generalization Capabilities: Noisy Audio

    Speaking Style Manipulation
    Adjusting the Scale of Classifier-free Guidance; Style Code Interpolation
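    The two manipulation demos above correspond to two standard operations on diffusion models: scaling classifier-free guidance and linearly interpolating style codes. A minimal numpy sketch of both (function and variable names are illustrative, not taken from the DreamTalk code):

    ```python
    import numpy as np

    def classifier_free_guidance(eps_cond, eps_uncond, scale):
        # Blend conditional and unconditional noise predictions; a larger
        # scale pushes the sample harder toward the styled (conditioned) output.
        return eps_uncond + scale * (eps_cond - eps_uncond)

    def interpolate_styles(style_a, style_b, alpha):
        # Linear blend between two style codes; alpha in [0, 1]
        # moves the speaking style smoothly from style_a to style_b.
        return (1.0 - alpha) * style_a + alpha * style_b

    eps_c = np.array([1.0, 2.0])   # toy conditional noise prediction
    eps_u = np.array([0.0, 0.0])   # toy unconditional noise prediction
    guided = classifier_free_guidance(eps_c, eps_u, scale=2.0)
    blended = interpolate_styles(np.zeros(2), np.ones(2), alpha=0.25)
    ```

    With `scale=1.0` guidance reduces to the plain conditional prediction; raising the scale exaggerates the target speaking style, which is what the "adjusting the scale" demo varies.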
    Speaking Style Prediction

    If you are seeking an exhilarating challenge and the chance to work on AIGC and large-scale pretraining, you have come to the right place. We are searching for talented, motivated, and imaginative researchers to join our team. If you are interested, please don't hesitate to send us your resume via email at yingya.zyy@alibaba-inc.com.

    References

    @article{ma2023dreamtalk,
      title={DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models},
      author={Ma, Yifeng and Zhang, Shiwei and Wang, Jiayu and Wang, Xiang and Zhang, Yingya and Deng, Zhidong},
      journal={arXiv preprint arXiv:2312.09767},
      year={2023}
    }
