M. Abbas Ansari

CS Senior @ JMI, New Delhi
Undergraduate ML Researcher

GitHub | LinkedIn | X
Google Scholar | Email

Hi! I am a final-year B.Tech. Computer Engineering student at Jamia Millia Islamia University, New Delhi. I’m currently working on my Bachelor’s Major Thesis on Music Generation from Brain Scans under Prof. Tanvir Ahmad, where I’m exploring how to condition Meta’s MusicGen, a conditional music LLM, on brain scans.

I’m also currently a part-time researcher at the HCTL Lab, TU Munich, led by Prof. Dr. Enkelejda Kasneci, where I’m supervised by Yao Rong on the topic of self-supervised learning for scanpaths. I was a full-time summer research intern at TUM in 2023, funded by the DAAD-WISE Scholarship. Before that, I was a summer research intern at IIIT Allahabad under Prof. Anupam Agarwal, where I worked on ASD diagnosis based on visual attention. Hitesh Hinduja mentored me on my first research paper, on wild scene text localization in images.

I’ve had the honor of leading my team sCUDA_Divers to the Grand Finale of the Smart India Hackathon in two consecutive years, 2022 and 2023. Our hackathon project on the Super-Resolution of Digital Elevation Models culminated in a research paper published in IEEE IGARSS 2023.

Recently, as part of my Bachelor’s Minor Thesis, I explored GNNs, LLMs, and RAG techniques for the research competition on Multimodal Emotion-Cause Pair Extraction in Conversations, a SemEval 2024 task. (The paper was accepted at the SemEval 2024 Workshop!)

I’m interested in exploring multimodal LLMs, with a specific focus on the modality of brain scans. I’m currently exploring neural decoding. I would love to work on projects at the intersection of neuroscience and artificial intelligence to further our understanding of the brain and intelligence.

Highlights

19/03/2024   Paper on LLMs Accepted in SemEval 2024 Workshop!
06/03/2024   2nd Rank in Third Year of Computer Engineering!
31/01/2024   Team JMI Secured 4th Position at SemEval Task-3!
19/12/2023   Grand Finalist of SIH 2023 at KIT, Kolhapur!
01/08/2023   Completed Summer Research Internship at TU Munich!
16/07/2023   Master GAN Paper Published in IGARSS 2023!
10/04/2023   3rd Rank in Second Year of Computer Engineering!
17/02/2023   Text Localization Paper Published in IOSR Journal!
16/01/2023   Achieved DAAD-WISE Scholarship!
25/08/2022   Grand Finalist of SIH 2022 at GTU, Ahmedabad!
15/07/2022   Completed Summer Research Internship at IIIT Allahabad!

Publications

JMI at SemEval 2024 Task 3: Two-step approach for multimodal ECAC using in-context learning with GPT and instruction-tuned Llama models
Arefa, M. A. Ansari*, C. Saxena, T. Ahmad
SemEval Workshop, 2024 (ArXiv Preprint)
[PDF] [Code]

Master GAN: Multiple Attention is all you Need: A Multiple Attention Guided Super Resolution Network for Dems
A. Mohammed, M. Kashif, M. H. Zama, M. A. Ansari* and S. Ali
IEEE IGARSS, 2023
[PDF] [Code]

Revisiting TextFuseNet: Text Context Enhanced Attention Networks For Scene Text Localization
H. Hinduja, M. A. Ansari*
IOSR Journal of Computer Engineering, 2023
[PDF] [Code]

Projects

In order of recency:

Music Generation from Brain Scans
Bachelor’s Major Thesis, 2024
[Thesis] [Slides]
Tackled reconstruction of the music a subject listened to from their fMRI scans, using Nakai et al.’s dataset of fMRI scans recorded from 5 subjects while they listened to 540 music pieces. Modified Meta’s MusicGen model for music generation conditioned on fMRI scans using the Map Method. Experimented with EnCodec, Chromagram Tokenizer, and T5 encoders, achieving the best performance using the T5 encoder with total averaging (FAD: 8.41, KL: 2.42, MCD: 4.87). Identified the temporal lobe as crucial for music reconstruction, highlighting the importance of auditory processing, language comprehension, and multimodal integration in the neural representation of music.

Multimodal Emotion-Cause Analysis in Conversations using in-context learning and instruction-tuned LLMs
SemEval 2024 Workshop Task 3 Competition
[Paper] [Code]
Developed an efficient video captioning technique for conversational videos using GPT-4-Vision. Used demonstration learning through retrieved examples for emotion recognition and cause prediction with GPT-3.5 for SemEval Task 3. Also instruction-tuned a Llama-2 model using the QLoRA technique. Our approach ranked 4th in the competition.
(Paper Accepted!)

Multimodal Emotion-Cause Pair Extraction using Graph Neural Networks
Bachelor’s Minor Thesis, 2023
[Thesis] [Slides]
Developed a graph neural network for emotion-cause pair extraction from multimodal conversational data. Used CLIP, BERT, and the HTS-AT audio encoder to extract features for the different modalities. Explored multimodal fusion in transformers. Modeled conversational structure with graph attention networks.

Real-time Indoor Video Dehazing using Knowledge Distillation
Smart India Hackathon Grand Finale, 2023
[Slides] [Solution Proposal]
We proposed modifying MAPNet, a U-Net-based dehazing network for outdoor environments, by replacing some of its blocks with TAM-Net, a 2D convolutional variant for videos. We experimented with distillation by creating a smaller student network for dehazing. During the hackathon, we experimented with the Dark Channel Prior and Boundary Constraint Regularization approaches for benchmarking.
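For readers unfamiliar with the Dark Channel Prior baseline named above, here is a minimal sketch of its first step, computing the dark channel of an image. It is plain Python for illustration only and not the hackathon code: the `dark_channel` name, the nested-list image layout, and the `patch` argument are assumptions. The prior observes that in haze-free outdoor images this value tends to be close to zero, so larger values hint at haze.

```python
def dark_channel(image, patch=3):
    """Dark channel of an H x W image given as rows of (r, g, b) tuples.

    For each pixel, take the minimum intensity over all colour channels
    and over a square patch centred on that pixel (clipped at the borders).
    """
    h, w = len(image), len(image[0])
    r = patch // 2
    # Minimum over the three colour channels at every pixel.
    per_pixel_min = [[min(px) for px in row] for row in image]
    # Minimum of those values over the local patch.
    return [
        [
            min(
                per_pixel_min[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]
```

A full dehazing pipeline would go on to estimate atmospheric light and a transmission map from this output; those steps are omitted here.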

Self-Supervised Learning for Free-Viewing Scanpaths
DAAD-WISE Research Project at TU Munich, 2023
[Slides] [Code]
Eye movements can serve as a proxy for extracting the neurological state of a subject. Our goal was to improve classification of a subject’s cognitive characteristics based on their free-viewing scanpaths on images. We experimented with a non-contrastive self-supervised learning technique based on BarlowTwins, for which we implemented novel scanpath distortion techniques to create multiple views of each scanpath. A combined dataset was created from multiple publicly available free-viewing scanpath datasets. Experiments demonstrated improvements in the downstream task of autism detection.
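As a rough illustration of the Barlow Twins objective this project builds on, the sketch below computes the loss for two batches of embeddings, one per distorted view. It is a plain-Python sketch and not the project's code: the function names are hypothetical, the scanpath encoder and distortions are not shown, and the default `off_diag_weight` is an assumed value.

```python
import math

def standardize(batch):
    """Return one list per embedding dimension, normalized over the batch."""
    n, d = len(batch), len(batch[0])
    columns = []
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        # Fall back to 1.0 for a constant dimension to avoid division by zero.
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0
        columns.append([(v - mean) / std for v in col])
    return columns

def barlow_twins_loss(z_a, z_b, off_diag_weight=5e-3):
    """Loss on the cross-correlation matrix of two views' embeddings.

    Diagonal entries are pushed toward 1 (views agree per dimension),
    off-diagonal entries toward 0 (dimensions are decorrelated).
    """
    a, b = standardize(z_a), standardize(z_b)
    n, d = len(z_a), len(a)
    loss = 0.0
    for i in range(d):
        for j in range(d):
            c_ij = sum(x * y for x, y in zip(a[i], b[j])) / n
            if i == j:
                loss += (1.0 - c_ij) ** 2
            else:
                loss += off_diag_weight * c_ij ** 2
    return loss
```

Because the objective only needs two views of the same sample and no negative pairs, the quality of the view-generating distortions matters a great deal, which is where the scanpath-specific work came in.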

Super-Resolution of Digital Elevation Models (DEM)
Smart India Hackathon Grand Finale, 2022 & IEEE IGARSS, 2023
[Slides] [Code] [Paper]
Led a team in developing a U-Net-based convolutional network with attention for DEM super-resolution in ISRO’s Smart India Hackathon. DEMs collected from USGS LiDAR and SRTM, NASA ASTER and ISRO CartoSAT were used to curate the training set. Our team proposed the MASTER GAN architecture, achieving state-of-the-art results (PSNR 31.024, SSIM 0.908), which was published at the IEEE IGARSS 2023 conference.
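For context on the reported PSNR, the metric compares a super-resolved grid against its reference through the mean squared error. The sketch below uses the standard definition in plain Python; the `psnr` name, the nested-list grid layout, and the caller-supplied `data_range` (the peak value used for normalization) are assumptions for illustration and are not taken from the MASTER GAN code.

```python
import math

def psnr(reference, estimate, data_range):
    """Peak signal-to-noise ratio in dB between two equally sized 2D grids."""
    # Flatten both grids row by row so values can be compared pairwise.
    ref = [v for row in reference for v in row]
    est = [v for row in estimate for v in row]
    if not ref or len(ref) != len(est):
        raise ValueError("grids must be non-empty and the same size")
    mse = sum((r - e) ** 2 for r, e in zip(ref, est)) / len(ref)
    if mse == 0:
        return math.inf  # identical grids
    return 10.0 * math.log10(data_range ** 2 / mse)
```

Higher values mean the estimate is closer to the reference; the result depends on the chosen `data_range`, so scores are only comparable under the same normalization.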

Improved Visual Attention Classification for Autism Spectrum Disorder through Time-Dependent Representations.
Research Internship Project at IIIT Allahabad, 2022
[Slides] [Code]
Trained a deep learning network combining ResNet-50 and an LSTM on the Saliency4ASD dataset using novel time-dependent representations. Encoded embeddings with duration via new techniques such as time-masking and joint embedding. Demonstrated improvements from incorporating duration as a feature for ASD classification.

Text Localization using Efficient Attention
Research Project under Mentorship of Mr Hitesh Hinduja, 2021-23
[Code] [Paper]
Modified the Mask R-CNN architecture from the Detectron2 library with efficient attention for improved text localization accuracy on the SynthText dataset.

Robust Face Recognition Security System
HackJMI Hackathon Project, 2021
[Code]
Developed a robust face recognition security system using MTCNN, VGGFace, and an Inception-ResNet Siamese network, capable of detecting spoofed faces. A website with a Flask backend was built as the MVP, which won the runner-up position at the hackathon.

Novel Bible Verse Generator
First Deep Learning Project, 2021
[Notebook] [Example]
Trained a character-level neural language model on the Bible (KJV) using an LSTM. Built a loop to generate 1000 characters from the seed text, which was the fake Bible verse quoted in Pulp Fiction (1994).
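The generation loop can be sketched as follows. To keep the example self-contained, a simple character n-gram count model stands in for the trained LSTM, so this shows only the shape of the sampling loop and not the original notebook; `fit_char_model`, `generate`, and the `order` parameter are hypothetical names.

```python
import random
from collections import Counter, defaultdict

def fit_char_model(text, order=3):
    """Count which character follows each `order`-character context."""
    counts = defaultdict(Counter)
    for i in range(len(text) - order):
        counts[text[i:i + order]][text[i + order]] += 1
    return counts

def generate(counts, seed, n_chars=1000, order=3, rng=None):
    """Extend `seed` one character at a time, sampling from the counts."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so runs are repeatable
    out = seed
    for _ in range(n_chars):
        options = counts.get(out[-order:])
        if not options:
            break  # unseen context: stop rather than invent a character
        chars, weights = zip(*options.items())
        out += rng.choices(chars, weights=weights)[0]
    return out
```

With an LSTM, the lookup in `counts` would be replaced by a forward pass that returns a probability distribution over the next character, while the surrounding loop stays the same.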