Building multimodal AI that can read affect, model minds, and recover what people mean beyond what they say. CogAFFC is motivated by a core gap: multimodal systems can often recognize visible emotion, yet still fail to explain why it emerges, what mental state causes it, and what hidden social meaning is being conveyed. It connects multimodal perception, Theory-of-Mind reasoning, implicit social context, and empathetic intelligence into one coherent agenda.
Flagship research works
Community workshops, surveys
Emotion reasoning
Theory of Mind
Implicit social context
Flagship Research
Layer 1
Multimodal affect signals
Start from the observable world: facial expression, speech, language, video context,
and other cross-modal cues that reveal emotional state.
Emotion perception and recognition
Cross-modal evidence alignment
From cues to grounded observation
Layer 2
Theory-of-Mind reasoning
Move from what is seen to what is mentally simulated: beliefs, intentions, causes,
and intermediate cognitive states that explain emotion.
Benchmarking cognitive depth
Process-level reasoning supervision
Faithful emotional explanation
Layer 3
Implicit social meaning
Go beyond explicit emotion to what people actually imply under social constraints:
politeness, strategic expression, irony, and hidden stance.
THOR-ISA studies implicit sentiment analysis, where opinion cues appear only in
oblique and indirect forms. The work introduces a three-hop Chain-of-Thought prompting
framework that progressively infers the latent aspect, the underlying opinion, and the final sentiment polarity.
Targets ISA settings that require commonsense and multi-hop reasoning over latent intent.
Introduces THOR, a three-step prompting principle for aspect, opinion, and polarity inference.
Improves the state of the art by over 6% F1 in the supervised setup and over 50% F1 in the zero-shot setting.
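The three-hop scheme can be sketched as a chain of prompts, each hop conditioning on the previous hop's answer. The prompt wording, the `llm` callable, and the rule-based stand-in below are illustrative assumptions, not the paper's exact templates:

```python
# A minimal sketch of three-hop Chain-of-Thought prompting in the spirit of
# THOR. Prompt phrasing and the `llm` interface are assumptions for exposition.

def thor_three_hop(llm, context: str, target: str) -> str:
    """Infer sentiment polarity toward `target` via three chained hops."""
    # Hop 1: surface the latent aspect the sentence is really about.
    hop1 = f'Given the sentence "{context}", which specific aspect of {target} is mentioned?'
    aspect = llm(hop1)

    # Hop 2: infer the implicit opinion about that aspect.
    hop2 = (f'Given the sentence "{context}", the mentioned aspect is {aspect}. '
            f'What is the implicit opinion toward this aspect?')
    opinion = llm(hop2)

    # Hop 3: map the inferred opinion to a final polarity label.
    hop3 = (f'Given the sentence "{context}", the aspect is {aspect} and the '
            f'implied opinion is {opinion}. What is the sentiment polarity '
            f'toward {target}: positive, negative, or neutral?')
    return llm(hop3)

def toy_llm(prompt: str) -> str:
    """Rule-based stand-in for an actual LLM call, for demonstration only."""
    if "which specific aspect" in prompt:
        return "the battery life"
    if "implicit opinion" in prompt:
        return "it drains far too quickly"
    return "negative"

print(thor_three_hop(toy_llm, "My phone dies before lunch.", "the phone"))
# prints "negative"
```

Each hop narrows the inference: aspect first, then opinion, then polarity, so the final label is grounded in intermediate reasoning rather than produced in one shot.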
HitEmotion reframes multimodal emotion understanding as a cognition-sensitive task.
It introduces a Theory-of-Mind-grounded benchmark and a reasoning pipeline that
explicitly tracks mental states instead of relying on shallow post-hoc rationales.
Restructures 24 datasets into three levels of cognitive depth.
Diagnoses where current multimodal models break as reasoning gets deeper.
Combines ToM-guided reasoning with TMPO process supervision and reinforcement learning.
MoCA formalizes the problem of recovering what people truly feel, intend, or imply
when their expression is indirect. The work extends the agenda from explicit emotion
understanding to latent social meaning in multimodal social contexts.
Defines three core dimensions: implicit affection, implicit intent, and implicit stance.
Builds a 3,000-instance multimodal benchmark from memes, debates, discussions, and sitcoms.
Introduces CoDAR, a conflict-driven abductive reasoning framework for hidden social inference.
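The conflict-driven idea can be illustrated with a toy abductive loop: detect a clash between modalities, then keep the hypothesis that best explains it. The data structure, the polarity labels, and the irony rule below are assumptions for exposition, not the actual CoDAR design:

```python
# Illustrative sketch of conflict-driven abductive inference; all details
# here are assumed for exposition and do not reproduce CoDAR itself.

from dataclasses import dataclass

@dataclass
class Observation:
    text_sentiment: str    # polarity read from the words alone
    visual_sentiment: str  # polarity read from the image/video alone

def abduce_hidden_stance(obs: Observation) -> str:
    """Resolve a cross-modal conflict by picking the best explanation."""
    # Step 1: detect a conflict between what is said and what is shown.
    if obs.text_sentiment == obs.visual_sentiment:
        # No conflict: take the surface reading at face value.
        return obs.text_sentiment
    # Step 2: abduction — keep the hypothesis consistent with both
    # modalities (here, irony flips the literal text polarity).
    if obs.text_sentiment == "positive" and obs.visual_sentiment == "negative":
        return "negative (likely ironic praise)"
    return "positive (likely self-deprecating humor)"

print(abduce_hidden_stance(Observation("positive", "negative")))
# prints "negative (likely ironic praise)"
```

The design point is that the conflict itself is the signal: agreement licenses the literal reading, while disagreement triggers a search over hidden-meaning hypotheses.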
Coming soon
Community
Workshop
The CogMAEC Workshop series
The workshop anchors the community around cognition-oriented multimodal affective and
empathetic computing. It frames the area around context, causal reasoning, emotion
understanding, and the next generation of affect-aware multimodal models.
Co-located with ACM Multimedia 2025 and organized as the first edition of the CogMAEC series.
Brings together work on affective computing, MLLM reasoning, and cognitive modeling.
Includes invited talks, oral presentations, posters, and 11 accepted papers.