
BuboGPT:

Enabling Visual Grounding in Multi-Modal LLMs


Bytedance Inc.   *Equal Contribution   +Project Lead

BuboGPT is an advanced Large Language Model (LLM) that incorporates multi-modal inputs including text, image and audio, with a unique ability to ground its responses to visual objects. It demonstrates remarkable chat abilities for arbitrary image-audio data understanding, whether aligned or unaligned.

Bubo owls are well known for having strong vision and hearing abilities that help them thrive.

Abstract

LLMs have demonstrated remarkable abilities at interacting with humans through language, especially when trained on instruction-following data. Recent multi-modal LLMs, such as MiniGPT-4, LLaVA, and X-LLM, extend these abilities by accepting multi-modal inputs, including image, video, and speech. Although they are effective at generating precise and detailed descriptions of a given modality signal, these models give up the ability to ground their responses in specific parts of the input, and therefore construct only a coarse-grained mapping. An explicit and informative correspondence between text and other modalities would not only improve the user experience but also broaden the application scenarios of multi-modal LLMs.

  1. BuboGPT Architecture. We build BuboGPT, a multi-modal LLM that understands image, audio and text by learning a common semantic space, and we further explore the fine-grained relations between visual objects and the other modalities.
  2. Multimodal Instruct Data. We construct a high-quality multi-modal instruction-tuning dataset that includes fine-grained audio descriptions and cross-modal sound localization, and we introduce both positive and negative image-audio pairs for semantic matching to facilitate cross-modal understanding.
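
The positive and negative image-audio pairs mentioned above can be illustrated with a small sketch. It is not the authors' data pipeline: the sample schema (`image`, `audio`, `label`), the function name `build_pairs`, and the rule of drawing a negative audio clip from a sample with a different label are all assumptions made for illustration.

```python
import random


def build_pairs(samples, seed=0):
    """Build matched and mismatched (image, audio, is_match) triples.

    `samples` is a list of dicts with hypothetical keys
    'image', 'audio' and 'label'.
    """
    rng = random.Random(seed)
    pairs = []
    for sample in samples:
        # Positive pair: the image together with its own audio clip.
        pairs.append((sample["image"], sample["audio"], True))

        # Negative pair (assumed rule): the same image with audio taken
        # from a sample that carries a different label.
        candidates = [s for s in samples if s["label"] != sample["label"]]
        if candidates:
            other = rng.choice(candidates)
            pairs.append((sample["image"], other["audio"], False))
    return pairs


example_samples = [
    {"image": "dog.jpg", "audio": "bark.wav", "label": "dog"},
    {"image": "car.jpg", "audio": "engine.wav", "label": "car"},
    {"image": "bird.jpg", "audio": "chirp.wav", "label": "bird"},
]
example_pairs = build_pairs(example_samples)
```

Each triple could then be turned into an instruction-following example in which the expected answer states whether the sound matches the image.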

BuboGPT Architecture

As the figure shows, BuboGPT performs joint multi-modal understanding and chatting over text, vision and audio by learning a shared representation space that aligns well with the pre-trained Vicuna. We also build an off-the-shelf visual grounding pipeline to explore the fine-grained relations between visual objects and the other modalities.

The framework of BuboGPT.
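
This page does not describe the internals of the grounding pipeline, so the following is only a simplified sketch of the general idea of linking words in a response to image regions. It assumes a detector has already produced labeled bounding boxes, and it uses plain string matching as a stand-in; the function name `link_entities` and the data layout are hypothetical.

```python
import re


def link_entities(response, detections):
    """Link detected object labels to their first mention in the response.

    `detections` is a list of (label, box) tuples, where `box` is an
    (x1, y1, x2, y2) tuple. Both the layout and the matching rule are
    assumptions for illustration.
    """
    links = []
    for label, box in detections:
        # Case-insensitive whole-word match of the label in the text.
        pattern = r"\b" + re.escape(label) + r"\b"
        match = re.search(pattern, response, flags=re.IGNORECASE)
        if match:
            links.append({"label": label, "span": match.span(), "box": box})
    return links


example_links = link_entities(
    "A dog is chasing a ball on the grass.",
    [("dog", (10, 20, 110, 160)), ("ball", (150, 120, 180, 150)), ("cat", (0, 0, 5, 5))],
)
```

Labels that never appear in the response (here, "cat") are left unlinked, so only the objects the model actually talks about are highlighted.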

BuboGPT: Training Procedure

BuboGPT connects the Q-Former of each modality to the pre-trained large language model Vicuna through a simple projection matrix. We consider a two-stage instruction-tuning procedure:

  • Stage 1: Single-modal Pre-training. We train the Q-Former and the linear projection layer of each modality on a large amount of modality-text paired data.
  • Stage 2: Multi-Modal Instruct Tuning. We curate a high-quality multi-modal instruction-following dataset and use it to fine-tune only the linear projection layer:
    • Image-Text: We employ two previously published datasets, from MiniGPT-4 and LLaVA, for visual instruction tuning.
    • Audio-Text: We build a series of expressive and descriptive audio-text data based on the Clotho dataset.
    • Audio-Image-Text: We build <audio, image, text> triples based on the VGGSS dataset to serve as a triple-modality instruction-tuning dataset, and we further introduce a negative set to enhance the model.
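
The projection and the two-stage schedule above can be sketched in plain Python. This is a minimal illustration, not the released training code: the module names, the `TRAINABLE` table and both helper functions are hypothetical, and keeping the LLM frozen in Stage 1 is an assumption (the text only states which modules are trained).

```python
def project(tokens, weight, bias):
    """Apply a linear projection to each Q-Former output token.

    `tokens` is a list of vectors of length d_in, `weight` has d_out rows
    of length d_in, and `bias` has length d_out. The result has one
    d_out-dimensional vector per token, i.e. the LLM embedding size.
    """
    return [
        [sum(t * w for t, w in zip(token, row)) + b for row, b in zip(weight, bias)]
        for token in tokens
    ]


# Which modules receive gradient updates in each stage (hypothetical names).
TRAINABLE = {
    1: {"q_former": True, "projection": True, "llm": False},   # llm frozen: assumption
    2: {"q_former": False, "projection": True, "llm": False},  # only the projection is tuned
}


def trainable_modules(stage):
    """Return the sorted names of the modules updated in the given stage."""
    return sorted(name for name, on in TRAINABLE[stage].items() if on)


# Example: one 2-dimensional token projected into a 3-dimensional space.
example_embedding = project(
    [[1.0, 2.0]],
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [0.0, 0.0, 0.5],
)
```

In a real framework the same schedule would typically be expressed by toggling gradient tracking on the corresponding parameters before each stage.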


Examples on Fine-grained Visual Understanding

We first consider a single image as input for fine-grained visual understanding with grounding. As the examples show, the model can accurately associate textual words or phrases with image regions in scenarios of varying complexity.


Examples on Audio Understanding

When a single audio clip is provided for audio understanding, BuboGPT gives informative descriptions that cover nearly all of the acoustic events in the clip, even when some audio fragments are too short for humans to notice. See the examples for details.


Examples on Aligned Audio-Image Understanding

We show that BuboGPT can perform sound localization when a matched audio-image pair is provided, which serves as a clear example of aligned audio-image understanding. See the examples for details.


Examples on Arbitrary Audio-Image Understanding

BuboGPT can also tell whether an image and an audio clip are relevant to each other, and it generates high-quality responses for arbitrary audio-image understanding. See the examples for details.

BibTeX


  @article{zhao2023bubogpt,
    author      = {Yang Zhao and Zhijie Lin and Daquan Zhou and Zilong Huang and Jiashi Feng and Bingyi Kang},
    title       = {BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs},
    publisher   = {arXiv:2307.08581},
    year        = {2023}
  }
  