Cognition-driven Affective Computing

Building multimodal AI that can read affect, model minds, and recover what people mean beyond what they say. CogAFFC is motivated by a core gap: multimodal systems can often recognize visible emotion, yet still fail to explain why it emerges, what mental state causes it, and what hidden social meaning is being conveyed. It connects multimodal perception, Theory-of-Mind reasoning, implicit social context, and empathetic intelligence into one coherent agenda.

  • Flagship research works
  • Community workshops and surveys
Emotion reasoning
Theory of Mind
Implicit social context

Flagship Research

Layer 1

Multimodal affect signals

Start from the observable world: facial expression, speech, language, video context, and other cross-modal cues that reveal emotional state.

  • Emotion perception and recognition
  • Cross-modal evidence alignment
  • From cues to grounded observation
Layer 2

Theory-of-Mind reasoning

Move from what is seen to what is mentally simulated: beliefs, intentions, causes, and intermediate cognitive states that explain emotion.

  • Benchmarking cognitive depth
  • Process-level reasoning supervision
  • Faithful emotional explanation
Layer 3

Implicit social meaning

Go beyond explicit emotion to what people actually imply under social constraints, politeness, strategic expression, irony, and hidden stance.

  • Affection, intent, and stance
  • Conflict-driven abductive reasoning
  • Towards robust empathetic systems
[Diagram: THOR three-hop Chain-of-Thought prompting — Hop 1: aspect, Hop 2: opinion, Hop 3: polarity — from latent opinion cues to supervised and zero-shot gains]

THOR-ISA studies implicit sentiment analysis (ISA), where opinion cues are expressed indirectly rather than stated outright. The work introduces a three-hop Chain-of-Thought prompting framework that progressively infers the latent aspect, the latent opinion, and the final sentiment polarity.

  • Targets ISA settings that require commonsense and multi-hop reasoning over latent intent.
  • Introduces THOR, a three-step prompting principle for aspect, opinion, and polarity inference.
  • Improves the state of the art by over 6% F1 in the supervised setup and by over 50% F1 in the zero-shot setting.
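The three-hop principle can be sketched as chained prompt construction. This is a minimal illustration in which each hop's prompt extends the previous one; the templates paraphrase the aspect → opinion → polarity hops and are not the paper's exact wording, and the `<hop1 answer>` / `<hop2 answer>` markers show where a real pipeline would splice in the model's earlier responses.

```python
# Illustrative sketch of THOR-style three-hop prompt chaining.
# Templates are paraphrased assumptions, NOT the paper's exact prompts.

def thor_prompts(sentence: str, target: str) -> list[str]:
    """Build the three chained prompts for one (sentence, target) pair."""
    # Hop 1: which latent aspect of the target does the sentence touch?
    hop1 = (f'Given the sentence "{sentence}", which specific aspect '
            f'of "{target}" is possibly mentioned?')
    # Hop 2: given that aspect, what is the underlying opinion?
    hop2 = (f"{hop1} The mentioned aspect is: <hop1 answer>. "
            f'Based on this aspect, what is the underlying opinion toward "{target}"?')
    # Hop 3: given aspect and opinion, what is the final polarity?
    hop3 = (f"{hop2} The underlying opinion is: <hop2 answer>. "
            f"Based on the aspect and the opinion, what is the sentiment "
            f'polarity toward "{target}"?')
    return [hop1, hop2, hop3]

prompts = thor_prompts("The new phone survived a two-day hike.", "battery")
```

In a live setup, each prompt would be sent to the LLM in turn, with the answer substituted in before the next hop; the nesting makes every hop conditioned on all earlier inferences.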
ICLR 2026 · Benchmark + Reasoning

Unveiling the Cognitive Compass: Theory-of-Mind-Guided Multimodal Emotion Reasoning

HitEmotion reframes multimodal emotion understanding as a cognition-sensitive task. It introduces a Theory-of-Mind-grounded benchmark and a reasoning pipeline that explicitly tracks mental states instead of relying on shallow post-hoc rationales.

  • Restructures 24 datasets into three levels of cognitive depth.
  • Diagnoses where current multimodal models break as reasoning gets deeper.
  • Combines ToM-guided reasoning with TMPO process supervision and reinforcement learning.
[Diagram: benchmark of 24 aligned datasets across three levels — perception and recognition, understanding and analysis, cognition and reasoning — coupled with ToM-guided mental-state reasoning and TMPO process supervision + RL]
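The idea of tracking mental states explicitly, rather than attaching post-hoc rationales, can be pictured as a reasoning trace in which the emotion label is derived only from inferred mental states. This is a sketch of that pipeline shape under stated assumptions: the `MentalState` fields and the two inference callbacks are illustrative, not the HitEmotion implementation or annotation schema.

```python
from dataclasses import dataclass, field

@dataclass
class MentalState:
    # Hypothetical record; the benchmark's actual schema may differ.
    agent: str      # whose mind is being modeled
    belief: str     # what they believe about the situation
    intention: str  # what they are trying to do

@dataclass
class EmotionTrace:
    cues: list[str]                                          # observed multimodal evidence
    mental_states: list[MentalState] = field(default_factory=list)
    emotion: str = ""                                        # derived only from states

def tom_guided_reasoning(cues, infer_state, infer_emotion) -> EmotionTrace:
    """Predict emotion *through* explicit mental states, never directly from cues."""
    trace = EmotionTrace(cues=list(cues))
    for cue in trace.cues:
        # Each intermediate state is an explicit, supervisable reasoning step.
        trace.mental_states.append(infer_state(cue))
    trace.emotion = infer_emotion(trace.mental_states)
    return trace
```

Because the intermediate states are first-class objects rather than free text, a process-level supervision signal (as in TMPO) can in principle score each step, not just the final label.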
In progress · Dataset + Abductive reasoning

MoCA: Implicit Social Context Analysis

[Diagram: CoDAR conflict-driven abductive reasoning — from a subject/target's observed expression to implicit affection, intent, and stance]

MoCA formalizes the problem of recovering what people truly feel, intend, or imply when their expression is indirect. The work extends the thread from explicit emotion understanding to latent social meaning in multimodal social contexts.

  • Defines three core dimensions: implicit affection, implicit intent, and implicit stance.
  • Builds a 3,000-instance multimodal benchmark from memes, debates, discussions, and sitcoms.
  • Introduces CoDAR, a conflict-driven abductive reasoning framework for hidden social inference.
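The three implicit dimensions can be pictured as a per-instance annotation schema. The field names below follow the dimensions named above, but the schema itself is an illustrative assumption, not the released MoCA format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MoCAInstance:
    # Illustrative schema, NOT the released annotation format.
    source: str               # e.g. "meme", "debate", "discussion", "sitcom"
    expression: str           # the observed (possibly indirect) expression
    affection: Optional[str]  # implicit affection: what is actually felt
    intent: Optional[str]     # implicit intent: what the speaker is after
    stance: Optional[str]     # implicit stance: the hidden position

    def is_implicit(self) -> bool:
        """True if any hidden dimension diverges from the surface reading."""
        return any(v is not None for v in (self.affection, self.intent, self.stance))

ex = MoCAInstance(source="sitcom",
                  expression="What a lovely surprise.",
                  affection="annoyance",
                  intent="end the conversation",
                  stance=None)
```

A conflict-driven reasoner in the CoDAR spirit would flag the mismatch between the polite surface form and the annotated hidden affection, then abduce the most plausible hidden meaning from context.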
Coming soon

Community

The 1st CogMAEC Workshop — Cognition-oriented Multimodal Affective and Empathetic Computing. 27-31 October 2025 | Dublin, Ireland | ACM Multimedia 2025
A workshop on moving beyond visible affect recognition towards cognition-aware, context-grounded multimodal understanding, reasoning, and empathetic AI systems.
About CogMAEC

Multimodal systems can detect visible affect, but often fail to explain why emotions arise, what mental states drive them, and how implicit social context shapes expression. CogMAEC connects cognition, reasoning, and empathetic computing into one coherent research agenda.

Schedule Highlights

  Time   Session                                            Presenter
  13:30  Keynote I: Social Intelligence with LLMs           Speaker 1
  14:15  Keynote II: Open Challenges for Multimodal Agents  Speaker 2
  16:00  Oral Session: Emotion, Memory, and Social Context  Accepted Papers
  17:00  Poster Session and Discussion                      All attendees

Invited Speakers: Speaker A, Speaker B, Speaker C, Speaker D
Workshop

The CogMAEC Workshop series

The workshop anchors the community around cognition-oriented multimodal affective and empathetic computing. It frames the area around context, causal reasoning, emotion understanding, and the next generation of affect-aware multimodal models.

  • Co-located with ACM Multimedia 2025 and organized as the first CogMAEC edition.
  • Brings together work on affective computing, MLLM reasoning, and cognitive modeling.
  • Includes invited talks, oral presentations, posters, and 11 accepted papers.