Through these operations, all based on in-context learning, our framework enables the creation and evaluation of explainable computational graphs.
As the use of large language models (LLMs) expands rapidly, so does the range of knowledge needed to supplement various LLM queries.
To facilitate the training of OmniGen2, we developed comprehensive data construction pipelines, encompassing image editing and in-context generation data.
Audio-driven human animation methods, such as talking-head and talking-body generation, have made remarkable progress in producing videos with synchronized facial movements and appealing visual quality.
Drawing inspiration from the way diverse specialized agents collaborate to tackle intricate tasks, we propose Causal-Consistency Chain-of-Thought (CaCo-CoT), a framework that harnesses multi-agent collaboration, with a set of reasoners and evaluators, to bolster the faithfulness and causality of foundation models.
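As a rough illustration of the reasoner/evaluator structure, the sketch below wires hypothetical agent calls into a consistency-checking loop. Here `llm` is an assumed generic completion callable, and the prompts, PASS/FAIL protocol, and majority vote are placeholders rather than the paper's actual consensus rules.

```python
# Minimal sketch of a reasoner/evaluator loop in the spirit of CaCo-CoT.
# `llm` is a hypothetical chat-completion callable; the real framework's
# prompts and consensus mechanism are defined in the paper.
from typing import Callable, List

def caco_cot(question: str, llm: Callable[[str], str],
             n_reasoners: int = 3, max_rounds: int = 2) -> str:
    # Several reasoners independently produce causal chains of thought.
    answers: List[str] = [
        llm(f"Reason step by step, stating causal assumptions.\nQ: {question}")
        for _ in range(n_reasoners)
    ]
    for _ in range(max_rounds):
        # An evaluator checks each chain for causal consistency.
        verdicts = [
            llm("Does the conclusion follow from its stated causes? "
                f"Answer PASS or FAIL with a critique.\n{a}")
            for a in answers
        ]
        if all(v.startswith("PASS") for v in verdicts):
            break
        # Failed chains are revised using the evaluator's critique.
        answers = [
            a if v.startswith("PASS")
            else llm(f"Revise the reasoning to address this critique:\n{v}\nQ: {question}")
            for a, v in zip(answers, verdicts)
        ]
    return max(set(answers), key=answers.count)  # simple majority vote
```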
To overcome this limitation, we introduce the Multimodal Embodied Interactive Agent (MEIA), capable of translating high-level tasks expressed in natural language into a sequence of executable actions.
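A minimal sketch of the task-to-action translation step follows, assuming a hypothetical `llm` completion callable and an invented five-verb action API; MEIA's actual action space, grounding, and prompting differ.

```python
# Sketch of mapping a natural-language task to an executable action sequence,
# in the spirit of MEIA. `llm` and the ACTIONS vocabulary are assumptions
# made for illustration, not the paper's interface.
import json
from typing import Callable, List

ACTIONS = {"navigate_to", "pick_up", "place_on", "open", "close"}  # assumed API

def plan(task: str, llm: Callable[[str], str]) -> List[dict]:
    prompt = (
        "Translate the task into a JSON list of steps, each of the form "
        f'{{"action": <one of {sorted(ACTIONS)}>, "object": <name>}}.\n'
        f"Task: {task}"
    )
    steps = json.loads(llm(prompt))
    # Keep only steps whose verb is in the executable action vocabulary.
    return [s for s in steps if s.get("action") in ACTIONS]
```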
We present AutoSchemaKG, a framework for fully autonomous knowledge graph construction that eliminates the need for predefined schemas.
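To illustrate what schema-free extraction means in practice, the sketch below lets an LLM invent relation names on the fly instead of drawing them from a predefined vocabulary. `llm` is an assumed completion callable, and AutoSchemaKG's full pipeline involves further stages beyond this single step.

```python
# Minimal sketch of schema-free triple extraction: the model proposes
# (subject, relation, object) triples directly from text, so no relation
# schema is fixed in advance. `llm` is a hypothetical completion callable.
import json
from typing import Callable, List, Tuple

def extract_triples(passage: str,
                    llm: Callable[[str], str]) -> List[Tuple[str, str, str]]:
    prompt = (
        "Extract knowledge triples from the passage as a JSON list of "
        "[subject, relation, object] arrays. Invent relation names as needed.\n"
        f"Passage: {passage}"
    )
    return [tuple(t) for t in json.loads(llm(prompt))]
```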
A versatile video depth estimation model should (1) be accurate and consistent across frames, (2) produce high-resolution depth maps, and (3) support real-time streaming.
They attempt to learn cross-modal representations using contrastive learning on image-text pairs; however, the inter-modal correlations built this way rely on only a single view of each modality, as in the baseline sketched below.
Ranked #1 on Image Retrieval on AIC-ICC.
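For context, the single-view objective being critiqued is the standard CLIP-style contrastive loss, sketched below in PyTorch: each image and each caption contributes exactly one embedding, so the loss sees exactly one view per modality.

```python
# Minimal sketch of the standard single-view image-text contrastive loss
# (CLIP-style symmetric InfoNCE), shown as the baseline under discussion.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    img = F.normalize(img_emb, dim=-1)      # (N, D): one view per image
    txt = F.normalize(txt_emb, dim=-1)      # (N, D): one view per caption
    logits = img @ txt.t() / temperature    # pairwise cosine similarities
    targets = torch.arange(img.size(0), device=img.device)
    # Symmetric cross-entropy: match image i to text i and vice versa.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```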
To address this, Graph Condensation (GC) methods aim to compress large graphs into smaller, synthetic ones that are more manageable for GNN training.
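As a rough illustration of the idea, the sketch below follows the gradient-matching recipe popularized by GCond-style condensation methods: synthetic node features are optimized so that a GNN's training gradients on the small graph match those on the original. The one-layer GCN surrogate with fixed weights, identity synthetic adjacency, and balanced synthetic labels are simplifying assumptions, not any specific paper's implementation.

```python
# Sketch of gradient-matching graph condensation. X: (N, d) real features,
# A: (N, N) dense normalized adjacency, y: (N,) integer labels.
import torch
import torch.nn.functional as F

def condense(X, A, y, n_syn: int, n_cls: int, steps: int = 200):
    d = X.size(1)
    X_syn = torch.randn(n_syn, d, requires_grad=True)  # learnable features
    y_syn = torch.arange(n_syn) % n_cls                # fixed balanced labels
    A_syn = torch.eye(n_syn)                           # simplest choice: no edges
    W = torch.randn(d, n_cls, requires_grad=True)      # one-layer GCN weights
    opt = torch.optim.Adam([X_syn], lr=0.01)
    # Reference gradient on the real graph (W is kept fixed in this sketch).
    g_real = torch.autograd.grad(
        F.cross_entropy(A @ X @ W, y), W)[0].detach()
    for _ in range(steps):
        # Gradient on the synthetic graph, differentiable w.r.t. X_syn.
        g_syn = torch.autograd.grad(
            F.cross_entropy(A_syn @ X_syn @ W, y_syn), W, create_graph=True)[0]
        loss = F.mse_loss(g_syn, g_real)  # match the two gradients
        opt.zero_grad(); loss.backward(); opt.step()
    return X_syn.detach(), A_syn, y_syn
```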