Visual Reasoning Examples

Causal reasoning meets visual representation learning: A prospective study

With the emergence of huge amounts of heterogeneous multi-modal data, including images, videos, texts/languages, audios, and multi-sensor data, deep learning-based methods have shown promising ...

GitHub

High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning

Inspired by the human visual system's top-down, task-driven search, we propose Multi-turn Grounding-based Policy Optimization (MGPO). MGPO equips LMMs with interpretable, iterative visual grounding: ...

blockchain

DeepSeek Primitives Boost Visual Reasoning

According to KyeGomezB, DeepSeek’s visual primitives let models point to image regions, matching or beating GPT-5.4 and Claude Sonnet 4.6 on VQA benchmarks. In the rapidly evolving landscape of ...

avinteractive.com

PTZOptics launches Visual Reasoning AI video initiative

Developed with Moondream AI, PTZOptics’ Visual Reasoning roadmap interprets live camera feeds and triggers open workflows such as auto‑tracking, smarter search and automated indexing. PTZOptics has ...

Ventureburn

Elorian Raises $55M to Scale Visual Reasoning AI

Elorian has raised $55 million in a seed funding round, reaching a $300 million valuation. The company said the raise strengthens its long-term research roadmap. It also signals strong early investor ...

Nature

Computational modeling of human reasoning processes for interpretable visual knowledge: a case study with radiographers

Visual reasoning is critical in many complex visual tasks in medicine such as radiology or pathology. It is challenging to explicitly explain reasoning processes due to the dynamic nature of real-time ...

unite

Jigsaw Puzzles Boost AI Visual Reasoning

New research indicates that AI models can get smarter at seeing by solving jigsaw puzzles. Rearranging scrambled images, videos, and 3D scenes helps them sharpen their visual skills without the need ...

VGR: Visual Grounded Reasoning

Today's paper introduces Visual Grounded Reasoning (VGR), a new approach for multimodal large language models that enables them to selectively focus on specific image regions during reasoning tasks.

GitHub

visual_thoughts_a_unified_perspective_of_understanding_multi.md

[NeurIPS 2025][LLM Reasoning][Multimodal CoT] This paper proposes "Visual Thoughts" as a unified framework for interpreting the effectiveness of multimodal chain-of-thought reasoning (MCoT ...

IEEE

Reliable Visual Perception and Reasoning via False Positive Detection and Correction

Abstract: In Internet of Things (IoT) scenarios, vision-language models (VLMs) are increasingly employed for visual perception and reasoning. However, their inherent tendency toward hallucinated and ...
