M. Abbas Ansari

I’m a first-year master’s student in Computational Neuroscience at the University of Tübingen and a Graduate Research Assistant at the Cognitive Neuroscience & Neurotechnology Lab, Max Planck Institute for Biological Cybernetics, under Dr Romy Lorenz.

“The cosmos is within us. We are made of star stuff. We are a way for the universe to know itself.” ― Carl Sagan

I’m interested in understanding the first principles underlying intelligence (such as the free energy principle, FEP), focusing on the unique behaviours and phenomena that arise in human brains. Essentially, I want to understand what makes Homo sapiens stand out from all other life forms in their ability to tame nature and in their depth of creativity. Many life forms possess brains, and we share many of the same neural mechanisms; we may have far more in common than not. Yet I’m interested in that small but potent difference in the neural architecture and mechanisms of the human brain. Those differences might exist at any scale, from molecules to proteins to channels to synapses to neurons, from populations to regions to networks, or in some unknown emergent architecture. I wish to find and understand them.

“What I cannot build, I do not understand.” — Richard Feynman

My interest in AI is only as a way to test the extent of our understanding of human intelligence. Current AI systems are energy-inefficient, and neuromorphic computing might offer a route to better energy efficiency. Thus, I’m also interested in the thermodynamics of computation in human brains. I’m trying to keep up with the NeuroAI space and Active Inference-based methods.

Understanding systems lets one know when and why they fail and how to manipulate and improve them. A principled understanding of the brain and its interaction with the environment can help us fix failings at different scales, from neuronal ones such as Parkinson’s to psychological ones such as depression to social ones such as genocide. The human brain is the actor at the centre of psychology, sociology, economics and politics. These complex emergent phenomena deserve to be studied at their own scales. Still, I want to bridge the gaps across scales, starting from the brain at the bottom. This is why I’m fascinated by complexity science and deeply inspired by the book “The Complex World” by David C. Krakauer.

My interests are broad and ambitious. Thus, I’m trying to zero in on a few questions I want to pursue over the next few years as part of my master’s thesis and a PhD. I don’t want to limit myself to the boundaries of fields and tools, and I’m open to learning and researching anything that satisfies my curiosity. Suggestions on research questions worth pursuing will be highly appreciated!

Background

I completed my B.Tech. in Computer Engineering at Jamia Millia Islamia University, New Delhi, in June 2024. Prof. Tanvir Ahmad supervised my Bachelor’s thesis, “Music Generation from Brain Scans”, which was my first major exploration of a neuroscience problem: reconstructing the music that subjects listened to purely from their fMRI scans. This project made me fall in love with neuroscience.

As part of my Bachelor’s Minor Thesis, I explored GNNs, LLMs, and RAG techniques for Multimodal Emotion-Cause Pair Extraction in Conversations, a SemEval 2024 task run as a research competition.

I’ve also worked as a researcher at the HCTL Lab, TU Munich, led by Prof. Dr. Enkelejda Kasneci, where Yao Rong supervised my work on self-supervised learning for scanpaths. I was a full-time summer research intern at TUM in 2023, funded by the DAAD-WISE Scholarship. Before that, I was a summer research intern at IIIT Allahabad under Prof. Anupam Agarwal, where I worked on ASD diagnosis based on visual attention. Hitesh Hinduja mentored me on my first research paper, on wild scene text localization in images.

I’ve had the honor of leading my team sCUDA_Divers to the Grand Finale of the Smart India Hackathon in two consecutive years, 2022 and 2023. Our hackathon project on the Super-Resolution of Digital Elevation Models culminated in a research paper published at IEEE IGARSS 2023.

Highlights

01/02/2025 Began Research Assistantship at Romy’s Lab!
01/10/2024 Began Master’s in Computational Neuroscience at Tübingen!
23/05/2024 Graduated with Honours in B.Tech. Computer Engineering!
19/03/2024 Paper on LLMs Accepted in SemEval 2024 Workshop!
06/03/2024 2nd Rank in Third Year of Computer Engineering!
31/01/2024 Team JMI Secured 4th Position at SemEval Task-3!
19/12/2023 Grand Finalist of SIH 2023 at KIT, Kolhapur!
01/08/2023 Completed Summer Research Internship at TU Munich!
16/07/2023 Master GAN Paper Published in IGARSS 2023!
10/04/2023 3rd Rank in Second Year of Computer Engineering!
17/02/2023 Text Localization Paper Published in IOSR Journal!
16/01/2023 Achieved DAAD-WISE Scholarship!
25/08/2022 Grand Finalist of SIH 2022 at GTU, Ahmedabad!
15/07/2022 Completed Summer Research Internship at IIIT Allahabad!

Publications

JMI at SemEval 2024 Task 3: Two-step approach for multimodal ECAC using in-context learning with GPT and instruction-tuned Llama models
Arefa, M. A. Ansari*, C. Saxena, T. Ahmad
ACL SemEval Workshop, 2024
[PDF] [Code]

Master GAN: Multiple Attention is all you Need: A Multiple Attention Guided Super Resolution Network for Dems
A. Mohammed, M. Kashif, M. H. Zama, M. A. Ansari* and S. Ali
IEEE IGARSS, 2023
[PDF] [Code]

Revisiting TextFuseNet: Text Context Enhanced Attention Networks For Scene Text Localization
H. Hinduja, M. A. Ansari*
IOSR Journal of Computer Engineering, 2023
[PDF] [Code]

Projects

In order of recency:

Music Generation from Brain Scans
Bachelor’s Major Thesis, 2024
[Thesis] [Slides]
Tackled the reconstruction of music listened to by a subject from their fMRI scans, using Nakai et al.’s dataset of fMRI scans from 5 subjects listening to 540 music pieces. Modified Meta’s MusicGen model for music generation conditioned on fMRI scans using the Map Method. Experimented with EnCodec, Chromagram Tokenizer, and T5 encoders, achieving the best performance with the T5 encoder and total averaging (FAD: 8.41, KL: 2.42, MCD: 4.87). Identified the temporal lobe as crucial for music reconstruction, highlighting the importance of auditory processing, language comprehension, and multimodal integration in the neural representation of music.

Multimodal Emotion-Cause Analysis in Conversations using in-context learning and instruction-tuned LLMs
SemEval 2024 Workshop Task 3 Competition
[Paper] [Code]
Developed an efficient video captioning technique for conversational videos using GPT-4-Vision. Used demonstration learning with retrieved examples for emotion recognition and cause prediction with GPT-3.5 for SemEval Task 3. Also implemented an instruction-tuned Llama-2 model using the QLoRA technique. Our approach ranked 4th in the competition.
(Paper Accepted!)

Multimodal Emotion-Cause Pair Extraction using Graph Neural Networks
Bachelor’s Minor Thesis, 2023
[Thesis] [Slides]
Developed a graph neural network for emotion-cause pair extraction from multimodal conversational data. Used CLIP, BERT, and the HTS-AT audio encoder to extract features for the different modalities. Explored multimodal fusion in transformers. Modeled conversational structure with graph attention networks.

Real-time Indoor Video Dehazing using Knowledge Distillation
Smart India Hackathon Grand Finale, 2023
[Slides] [Solution Proposal]
We proposed modifying MAPNet, a U-Net-based dehazing network for outdoor environments, by replacing some of its blocks with TAM-Net, a 2D convolutional variant for videos. We experimented with distillation by creating a smaller student network for dehazing. During the hackathon, we experimented with the Dark Channel Prior and Boundary Constraint Regularization approaches for benchmarking.
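For readers unfamiliar with the Dark Channel Prior baseline mentioned above, here is a minimal sketch of its first step, the dark channel itself: the per-pixel minimum over colour channels followed by a minimum over a local patch. This is not the code used at the hackathon; the function name, the nested-list image format, and the `patch_radius` parameter are assumptions made for illustration, and a practical version would use an array library.

```python
def dark_channel(image, patch_radius=1):
    """Dark channel of an RGB image.

    `image` is a list of rows, each row a list of (r, g, b) tuples in [0, 1].
    The patch is a square window of side 2 * patch_radius + 1, clipped at the
    image border (an illustrative choice).
    """
    height, width = len(image), len(image[0])
    # Step 1: minimum over the three colour channels at every pixel.
    min_rgb = [[min(pixel) for pixel in row] for row in image]
    # Step 2: minimum over the local patch around every pixel.
    dark = [[0.0] * width for _ in range(height)]
    for y in range(height):
        y0, y1 = max(0, y - patch_radius), min(height, y + patch_radius + 1)
        for x in range(width):
            x0, x1 = max(0, x - patch_radius), min(width, x + patch_radius + 1)
            dark[y][x] = min(
                min_rgb[yy][xx] for yy in range(y0, y1) for xx in range(x0, x1)
            )
    return dark
```

In haze-free regions the dark channel tends to be close to zero, so large values hint at haze; the full method builds its transmission estimate on top of this map.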

Self-Supervised Learning for Free-Viewing Scanpaths
DAAD-WISE Research Project at TU Munich, 2023
[Slides] [Code]
Eye movements can serve as a proxy for a subject’s neurological state. Our goal was to improve classification of a subject’s cognitive characteristics based on their free-viewing scanpaths on images. We experimented with a non-contrastive self-supervised learning technique based on BarlowTwins, for which we implemented novel scanpath distortion techniques to create multiple views of each scanpath. A combined dataset was created from multiple publicly available free-viewing scanpath datasets. Experiments demonstrated improvements in the downstream task of autism detection.
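As a rough illustration of the non-contrastive objective this project builds on, the sketch below computes a Barlow Twins style loss between the embeddings of two distorted views. It is not the project code: the function names, the plain-list batch format, and the default `off_diag_weight` are assumptions chosen for the example, and the scanpath encoder and distortions are left out entirely.

```python
import math


def _standardize(batch):
    """Zero-mean, unit-variance normalization of each embedding dimension."""
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        column = [row[j] for row in batch]
        mean = sum(column) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
        if std == 0.0:
            std = 1.0  # constant dimension: avoid division by zero
        for i in range(n):
            out[i][j] = (batch[i][j] - mean) / std
    return out


def barlow_twins_loss(z_a, z_b, off_diag_weight=5e-3):
    """Push the cross-correlation matrix of two views towards the identity.

    z_a, z_b: lists of N embeddings (each a list of D floats), one per view.
    """
    n, d = len(z_a), len(z_a[0])
    a, b = _standardize(z_a), _standardize(z_b)
    loss = 0.0
    for i in range(d):
        for j in range(d):
            c_ij = sum(a[k][i] * b[k][j] for k in range(n)) / n
            if i == j:
                loss += (1.0 - c_ij) ** 2          # invariance term
            else:
                loss += off_diag_weight * c_ij ** 2  # redundancy-reduction term
    return loss
```

The diagonal term rewards embeddings that stay the same under distortion, while the off-diagonal term discourages different embedding dimensions from encoding the same information.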

Super-Resolution of Digital Elevation Models (DEM)
Smart India Hackathon Grand Finale, 2022 & IEEE IGARSS, 2023
[Slides] [Code] [Paper]
Led a team in developing a U-Net-based convolutional network with attention for DEM super-resolution in ISRO’s Smart India Hackathon. DEMs collected from USGS LiDAR and SRTM, NASA ASTER and ISRO CartoSAT were used to curate the training set. Our team proposed the MASTER GAN architecture, which achieved state-of-the-art results (PSNR 31.024, SSIM 0.908) and was published at the IEEE IGARSS 2023 conference.
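For context on the PSNR figure above, here is a minimal sketch of how peak signal-to-noise ratio is commonly computed between a reference grid and a super-resolved estimate. The function name, the nested-list grid format, and the `data_range` argument are assumptions for illustration; the exact normalization used in the paper may differ.

```python
import math


def psnr(reference, estimate, data_range):
    """Peak signal-to-noise ratio in dB between two equally sized 2D grids.

    `data_range` is the span of valid values (for example max minus min
    elevation); how it is chosen for DEMs is an assumption here.
    """
    if len(reference) != len(estimate):
        raise ValueError("grids must have the same number of rows")
    squared_error, count = 0.0, 0
    for ref_row, est_row in zip(reference, estimate):
        if len(ref_row) != len(est_row):
            raise ValueError("rows must have the same length")
        for r, e in zip(ref_row, est_row):
            squared_error += (r - e) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0.0:
        return math.inf  # identical grids
    return 10.0 * math.log10(data_range ** 2 / mse)
```

Higher values mean the estimate is closer to the reference; because the score depends on `data_range`, PSNR values are only comparable when the same range convention is used.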

Improved Visual Attention Classification for Autism Spectrum Disorder through Time-Dependent Representations
Research Internship Project at IIIT Allahabad, 2022
[Slides] [Code]
Trained a deep learning network combining ResNet-50 and an LSTM on the Saliency4ASD dataset with novel time-dependent representations. Encoded fixation duration into the embeddings via new techniques such as time-masking and joint embedding. Demonstrated that incorporating duration as a feature improves ASD classification.

Text Localization using Efficient Attention
Research Project under Mentorship of Mr Hitesh Hinduja, 2021-23
[Code] [Paper]
Modified the Mask R-CNN architecture from the Detectron2 library with efficient attention to improve text localization accuracy on the SynthText dataset.

Robust Face Recognition Security System
HackJMI Hackathon Project, 2021
[Code]
Developed a robust face recognition security system capable of detecting spoofed faces, using MTCNN, VGGFace, and an Inception-ResNet Siamese network. A website with a Flask backend was built as the MVP, which won the runner-up position at the hackathon.

Novel Bible Verse Generator
First Deep Learning Project, 2021
[Notebook] [Example]
Trained a character-level neural language model on the Bible (KJV) using an LSTM. Built a loop to generate 1000 characters from a seed text, which was the fake Bible verse quoted in Pulp Fiction (1994).