    [COLING 2025] Idea23D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs

    yisuanwang/Idea23D

    Idea23D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs

    2024.11: Idea-2-3D has been accepted by COLING 2025! See you in Abu Dhabi, UAE, from January 19 to 24, 2025!

    2025.01: The Gradio demo is available at https://3389f4ca9cd69aae21.gradio.live

    Junhao Chen *, Xiang Li *, Xiaojun Ye, Chao Li, Zhaoxin Fan †, Hao Zhao †


    Introduction

    Based on LMMs, we developed Idea23D, a multimodal iterative self-refinement system that enhances any T2I model for automatic 3D model design and generation. It enables various new image creation functionalities together with better visual quality, while understanding high-level multimodal inputs.

    Compatibility:

    Run

    Besides the Gradio demo, you can clone this repo to your local machine and run pipeline.py. The main dependencies we use are: python 3.10, torch==2.2.2+cu118, torchvision==0.17.2+cu118, transformers==4.47.0, tokenizers==0.21.0, numpy==1.26.4, diffusers==0.31.0, rembg==2.0.60, and openai==0.28.0. These versions are compatible with gpt4o, instantMesh, hunyuan3d, sdxl, InternVL2.5-78B, and llava-CoT-11B.

    pip install -r requirements-local.txt
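
After installing, it can help to confirm that the environment matches the versions listed above. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `check_pins` are made up for illustration, and the pins are copied from the dependency list with the `+cu118` build tag dropped before comparison.

```python
from importlib import metadata

# Pins copied from the dependency list above (build tags such as +cu118 omitted).
PINNED = {
    "torch": "2.2.2",
    "torchvision": "0.17.2",
    "transformers": "4.47.0",
    "tokenizers": "0.21.0",
    "numpy": "1.26.4",
    "diffusers": "0.31.0",
    "rembg": "2.0.60",
    "openai": "0.28.0",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Return {package: (expected, found)} for every mismatched or missing package."""
    problems = {}
    for package, expected in pins.items():
        found = lookup(package)
        # Ignore a local build tag such as "+cu118" when comparing.
        base = found.split("+", 1)[0] if found else None
        if base != expected:
            problems[package] = (expected, found)
    return problems


if __name__ == "__main__":
    for package, (expected, found) in check_pins(PINNED).items():
        print(f"{package}: expected {expected}, found {found or 'not installed'}")
```

Running the script prints one line per mismatch and nothing when every pin is satisfied.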
    

    You can add support for new LMM, T2I, and I23D components by modifying the content under tool/api. An example of generating a watermelon fish is provided in idea23d_pipeline.ipynb. Open Idea23D/idea23d_pipeline.ipynb and explore freely in the notebook.

    from tool.api.I23Dapi import *
    from tool.api.LMMapi import *
    from tool.api.T2Iapi import *
    
    
    # Initialize one LMM, one T2I, and one I23D backend.
    # In each group, uncomment the alternative you want to use.

    # LMM (large multimodal model)
    lmm = lmm_gpt4o(api_key='sk-xxx your openai api key')
    # lmm = lmm_InternVL2_5_78B(model_path='OpenGVLab/InternVL2_5-78B', gpuid=[0, 1, 2, 3], load_in_8bit=True)
    # lmm = lmm_InternVL2_5_78B(model_path='OpenGVLab/InternVL2_5-78B', gpuid=[0, 1, 2, 3], load_in_8bit=False)
    # lmm = lmm_InternVL2_8B(model_path='OpenGVLab/InternVL2-8B', gpuid=0)
    # lmm = lmm_llava_CoT_11B(model_path='Xkev/Llama-3.2V-11B-cot', gpuid=1)
    # lmm = lmm_qwen2vl_7b(model_path='Qwen/Qwen2-VL-7B-Instruct', gpuid=1)

    # T2I (text-to-image)
    # t2i = text2img_sdxl_replicate(replicate_key='your api key')
    # t2i = t2i_sdxl(sdxl_base_path='stabilityai/stable-diffusion-xl-base-1.0', sdxl_refiner_path='stabilityai/stable-diffusion-xl-refiner-1.0', gpuid=6)
    t2i = t2i_flux(model_path='black-forest-labs/FLUX.1-dev', gpuid=2)

    # I23D (image-to-3D)
    # i23d = i23d_TripoSR(model_path='stabilityai/TripoSR', gpuid=7)
    i23d = i23d_InstantMesh(gpuid=3)
    # i23d = i23d_Hunyuan3D(mv23d_cfg_path="Hunyuan3D-1/svrm/configs/svrm.yaml",
    #         mv23d_ckt_path="weights/svrm/svrm.safetensors",
    #         text2image_path="weights/hunyuanDiT")
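
The Introduction describes Idea23D as an iterative self-refinement system. As a rough illustration of that idea only, the sketch below loops over prompt writing, 3D drafting, and critique until a score threshold is met. Everything in it is an assumption made for this example: the names `refine`, `write_prompt`, `make_3d`, and `critique`, the numeric score, and the toy stand-ins. It does not call the lmm, t2i, or i23d objects initialized above and is not the repository's actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    prompt: str
    score: float


def refine(idea, write_prompt, make_3d, critique, max_rounds=3, good_enough=0.9):
    """Iterate prompt -> 3D draft -> critique and return the best attempt plus history."""
    history = []
    feedback = ""
    for _ in range(max_rounds):
        prompt = write_prompt(idea, feedback)    # hypothetical LMM step
        model = make_3d(prompt)                  # hypothetical T2I + I23D step
        score, feedback = critique(idea, model)  # hypothetical LMM review step
        history.append(Attempt(prompt, score))
        if score >= good_enough:
            break
    return max(history, key=lambda attempt: attempt.score), history


# Toy stand-ins so the sketch runs without any model; all three are hypothetical.
def toy_write_prompt(idea, feedback):
    return f"{idea}. {feedback}".strip()


def toy_make_3d(prompt):
    return prompt  # a real pipeline would produce an image, then a 3D model


def toy_critique(idea, model):
    if "green stripes" in model:
        return 1.0, ""
    return 0.4, "Add green stripes"


best, history = refine("a watermelon fish", toy_write_prompt, toy_make_3d, toy_critique)
print(len(history), best.prompt)
```

With the toy stand-ins, the first draft is rejected and the feedback is folded into the second prompt, which then passes the threshold.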
    

    If you want to test on the dataset, simply run the pipeline.py script, for example:

    python pipeline.py --lmm gpt4o --t2i flux --i23d instantmesh
    

    Evaluation dataset

    1. Download the required dataset from Hugging Face.
    2. Place the downloaded dataset folder in the path Idea23D/dataset.
    cd Idea23D
    wget "https://huggingface.co/yisuanwang/Idea23D/resolve/main/dataset.zip?download=true" -O dataset.zip
    unzip dataset.zip
    rm dataset.zip
    

    Ensure the directory structure matches the path settings in the code for smooth execution.
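
If wget or unzip is not available, the same steps can be sketched in Python with the standard library. This is a hedged example, not a script shipped with the repository: the function names `unpack_dataset` and `fetch_dataset` are made up here, and the check assumes the archive unpacks into a top-level dataset folder, as the steps above imply.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL copied from the wget command above.
DATASET_URL = "https://huggingface.co/yisuanwang/Idea23D/resolve/main/dataset.zip?download=true"


def unpack_dataset(archive, repo_root):
    """Extract the archive into repo_root and return the dataset directory."""
    repo_root = Path(repo_root)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(repo_root)
    dataset_dir = repo_root / "dataset"
    # Assumption: the archive contains a top-level "dataset" folder.
    if not dataset_dir.is_dir():
        raise FileNotFoundError(f"expected {dataset_dir} after unpacking")
    return dataset_dir


def fetch_dataset(repo_root, url=DATASET_URL):
    """Download dataset.zip into repo_root, unpack it, then delete the archive."""
    repo_root = Path(repo_root)
    archive = repo_root / "dataset.zip"
    urllib.request.urlretrieve(url, str(archive))
    try:
        return unpack_dataset(archive, repo_root)
    finally:
        archive.unlink()
```

Calling `fetch_dataset("Idea23D")` from the directory that contains the clone would mirror the shell commands; the function raises if the expected dataset folder is missing after extraction.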

    To-Do List

    1. Release Code

    2. Support for more models, such as SD3.5, CraftsMan3D, and more.

    Citations

    @article{chen2024idea23d,
      title={Idea-2-3D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs}, 
      author={Junhao Chen and Xiang Li and Xiaojun Ye and Chao Li and Zhaoxin Fan and Hao Zhao},
      year={2024},
      eprint={2404.04363},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
    }
    

    Acknowledgement

    We have borrowed code extensively from the following repositories. Many thanks to the authors for sharing their code.

    llava-v1.6-34b, llava-v1.6-mistral-7b, llava-CoT-11B, InternVL2.5-78B, Qwen-VL2-8B, llama-3.2V-11B, intern-VL2-8B, SD-XL 1.0 base+refiner, DALL·E, Deepfloyd IF, FLUX.1.dev, TripoSR, Zero123, Wonder3D, InstantMesh, LGM, Hunyuan3D, stable-fast-3d.
