LLMs.md

# Gokul

Source website: https://www.gokul.ch/home

Extracted: 2026-07-04

Purpose: single Markdown version for LLM-readable context.

## Site Pages

- Home: https://www.gokul.ch/home

- Experience: https://www.gokul.ch/experience

- Education: https://www.gokul.ch/education

- Publications: https://www.gokul.ch/publications

- Skills: https://www.gokul.ch/skills

- Honors: https://www.gokul.ch/honors

## Home

Source: https://www.gokul.ch/home

🙏 வணக்கம் 👋 Hello, world 🏔️ Grüezi

I am a machine learning researcher and engineer, currently a Marie Skłodowska-Curie Actions PhD Fellow at IBM Research Zurich × TU Wien, specializing in Large Language Models, Speech Processing, and AI for Code.

Previously, I worked at the Technology Innovation Institute, TCS Research, Microsoft Research, and IIT Madras - AI4Bharat, where I developed applied AI systems across language, audio/speech, vision, and time-series. I completed an MSc from MBZUAI and a BTech from Anna University.

I have led winning teams at the IEEE SLT international hackathon and several national AI challenges across the UAE and India. My research includes a U.S. patent and first-authored papers in venues such as ICASSP, ASRU, and PAKDD.

Location: Zurich, Switzerland 🇨🇭

Previously: Abu Dhabi, UAE 🇦🇪; Tamil Nadu, India 🇮🇳

CV date shown on site: Jul, 2026

### Home Links

- IBM Research Zurich: https://research.ibm.com/labs/zurich

- TU Wien: https://www.tuwien.at/

- Technology Innovation Institute: https://www.tii.ae/

- TCS Research: https://www.tcs.com/what-we-do/research

- Microsoft Research: https://www.microsoft.com/en-us/research/

- IIT Madras: https://www.iitm.ac.in/

- AI4Bharat: https://ai4bharat.iitm.ac.in/

- MBZUAI: https://mbzuai.ac.ae/

- Anna University: https://www.annauniv.edu/

- Curriculum Vitae: https://drive.google.com/file/d/1QK_IjcUspMcB2I_2aEd_GDKV9iByk0t0/view?usp=sharing

### Contact And Profiles

- Email: gokul.ch@outlook.com

- LinkedIn: https://www.linkedin.com/in/gokulkarthik/

- GitHub: https://github.com/gokulkarthik

- Google Scholar: https://scholar.google.com/citations?hl=en&user=seFW1dkAAAAJ

- X: https://x.com/gokul_ae

## Experience

Source: https://www.gokul.ch/experience

### IBM Research

Location: Zurich, Switzerland 🇨🇭

Role: Machine Learning Researcher | Marie Skłodowska-Curie Actions PhD Fellow

Dates: 2026/Jan - Present

Details:

- Working on Code LLMs under the ARMADA project, jointly funded by the Swiss SERI 🇨🇭 and the European Union 🇪🇺.

### Technology Innovation Institute (TII)

Location: Abu Dhabi, UAE 🇦🇪

Role: Machine Learning Research Engineer

Dates: 2023/Oct - 2025/Nov

Duration: 2 years and 2 months

Details:

- Led end-to-end development of audio and vision LLMs, combining research on synthetic fine-tuning datasets, novel architectures, training methodologies, and benchmark design with engineering of scalable multi-node training and inference systems.

- Explored LLM-based audio synthesis using neural audio codecs, and compression of the codec codebooks via knowledge distillation.

- Developed WavLink, a CLIP-style audio-text embedding model that augments the Whisper encoder with a global token, achieving state-of-the-art retrieval and efficient 8x smaller embeddings with minimal performance drop. ICASSP paper: https://arxiv.org/abs/2601.15118

- Developed Falcon3-Audio, audio-understanding LLMs that surpass prior open-weight models by 10+ points on multi-domain benchmarks such as MMAU and AirBench, leveraging a simpler architecture, single-stage training, and under 30K hours of public data. ASRU paper: https://arxiv.org/abs/2509.07526

- Developed VisCon, a novel LLM fine-tuning dataset with leaky visual conversations and contextual information for enhanced visual QA. PAKDD paper: https://arxiv.org/abs/2502.10250

- Integrated image understanding in text-only LLMs via distributed fine-tuning with CLIP's vision encoder.

- Enhanced distributed pretraining of the Falcon3 LLM, integrating Triton and optimizing the backward pass for architectural updates in the Megatron-based framework.

- Accelerated LLM inference through self-distillation, intermediate layer prediction, and recurrent multi-token prediction.

### G42

Location: Abu Dhabi, UAE 🇦🇪

Role: Applied Scientist

Dates: 2023/Jun - 2023/Aug

Duration: 3 months

Details:

- Collaborated on the development of Jais, an English-Arabic LLM, at G42's Inception.

- Managed the LLM Arena framework, benchmarking in-house, open-source, and GPT-4 models with human annotators using Elo ratings.

- Leveraged LLM Arena data to align LLMs with harmlessness and usefulness through RLHF and DPO techniques using the TRL package.
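
For readers unfamiliar with the Elo ratings mentioned above, the following is a minimal sketch of the standard Elo update for one pairwise comparison between two models. The function names, the K-factor of 32, and the example ratings are illustrative assumptions; the source does not describe how the LLM Arena framework configured its ratings.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    k=32 is an assumed K-factor, not a value taken from the source.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Two equally rated models; an annotator prefers model A.
print(elo_update(1000.0, 1000.0, 1.0))
```

With equal starting ratings the expected score is 0.5, so a win moves each rating by half the K-factor in opposite directions and the total rating is conserved.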

### Microsoft Research

Location: Bengaluru, India 🇮🇳

Role: Machine Learning Intern

Dates: 2022/May - 2022/Aug

Duration: 3 months

Details:

- Extensively evaluated the design choices for AI4Bharat's Indic Text-To-Speech systems and open-sourced state-of-the-art models for 13 Indian languages. ICASSP paper: https://arxiv.org/abs/2211.09536

### TCS Research

Location: Chennai, India 🇮🇳

Role: Machine Learning Research Engineer

Dates: 2019/Apr - 2020/Nov

Duration: 1 year and 8 months

Details:

- Improved time-series forecasting and document retrieval systems using deep learning, NLP, and computer vision.

- Enhanced retail sales forecasting with deep neural networks (RNNs, LSTNet) and a novel N-gram method for dynamic pricing. US patent: https://patents.google.com/patent/US11416881B2/en

- Implemented RNN-based spatio-temporal travel-time prediction, benchmarking against temporal-difference-based methods. IJCNN paper: https://ieeexplore.ieee.org/document/9207455

- Built an employee profile retrieval system using information parsing and natural language query processing. Special Initiative Award: https://drive.google.com/file/d/18K4NXnx6A7RzJiQVvi9imvaawO3phTu2/view?usp=sharing

- Developed a winning yield-prediction model for hybrid corn crops, recommending effective species crossing in the internal hackathon. Innovation Pride Award: https://drive.google.com/file/d/1AafL9nWv67sWwDtMLFZVTspv9_EPsTpd/view?usp=sharing

- Designed winning computer vision pipelines for curved text extraction and perspective correction in the internal hackathon with 100+ participants. Innovation Pride Award: https://drive.google.com/file/d/14NB2Q0oGoL71MauA6LmNi-HoIfk5oCfw/view?usp=sharing

### IIT Madras | AI4Bharat | One Fourth Labs

Location: Chennai, India 🇮🇳

Role: Machine Learning Intern

Dates: 2018/Dec - 2019/Mar

Duration: 4 months

Role: Summer Research Fellow

Dates: 2018/Jun - 2018/Aug

Duration: 2 months

Details:

- Built a multilingual scene text translation system and explored attention mechanisms for foreground detection.

- Developed a scene text translation system for Indian languages using synthetic datasets, incorporating an Efficient and Accurate Scene Text Detector (EAST) for detection, and Convolutional Recurrent Neural Networks (CRNN) for classification and recognition.

- Set up programming competitions for the One Fourth Labs Deep Learning course.

- Investigated attention models in deep learning, analyzing foreground region detection through proxy data with distinct statistical properties.

### Experience Page Links

- ARMADA: https://armada-dn.eu/

- WavLink: https://arxiv.org/abs/2601.15118

- ICASSP Paper: https://arxiv.org/abs/2601.15118

- Falcon3-Audio: https://arxiv.org/abs/2509.07526

- ASRU Paper: https://arxiv.org/abs/2509.07526

- VisCon: https://huggingface.co/datasets/tiiuae/viscon-1m

- PAKDD Paper: https://arxiv.org/abs/2502.10250

- Falcon3 LLM: https://falconllm.tii.ae/falcon3/index.html

- Jais: https://inceptionai.ai/jais/index.html

- AI4Bharat's Indic Text-To-Speech: https://ai4bharat.iitm.ac.in/areas/model/TTS/IndicTTS

- ICASSP Paper: https://arxiv.org/abs/2211.09536

- US Patent: https://patents.google.com/patent/US11416881B2/en

- IJCNN Paper: https://ieeexplore.ieee.org/document/9207455

- Special Initiative Award: https://drive.google.com/file/d/18K4NXnx6A7RzJiQVvi9imvaawO3phTu2/view?usp=sharing

- Innovation Pride Award: https://drive.google.com/file/d/1AafL9nWv67sWwDtMLFZVTspv9_EPsTpd/view?usp=sharing

- Innovation Pride Award: https://drive.google.com/file/d/14NB2Q0oGoL71MauA6LmNi-HoIfk5oCfw/view?usp=sharing

- Indian Scene Text Dataset: https://github.com/GokulKarthik/Indian-Scene-Text-Dataset

- Indian Scene Text Detection: https://github.com/GokulKarthik/Indian-Scene-Text-Detection

- Indian Scene Text Classification: https://github.com/GokulKarthik/Indian-Scene-Text-Classification

- Indian Scene Text Recognition: https://github.com/GokulKarthik/Indian-Scene-Text-Recognition

- One Fourth Labs Deep Learning course: https://padhai.onefourthlabs.in/

- IITM SRF Report: https://github.com/gokulkarthik/Attention-Experiments-Neural-Network/blob/master/IITM_SRF_Report.pdf

## Education

Source: https://www.gokul.ch/education

### TU Wien

Location: Vienna, Austria 🇦🇹

Degree: PhD in Engineering Sciences (Artificial Intelligence) | DMKI Lab

Dates: 2026/Jan - Present

Role: Marie Skłodowska-Curie Actions PhD Fellow

Details:

- Working on Code LLMs under the ARMADA project, jointly funded by the Swiss SERI 🇨🇭 and the European Union 🇪🇺.

### Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI)

Location: Abu Dhabi, UAE 🇦🇪

Degree: MSc in Computer Vision | SPriNT-AI Lab

Dates: 2021/Aug - 2023/Jun

CGPA: 3.9 / 4

Featured in Khaleej Times: My MBZUAI Journey

Details:

- Thesis: Towards Learning Efficient Multilingual and Multimodal Representation.

- Relevant coursework: ML701 - Machine Learning; CV701 - Computer Vision; CV702 - Geometry for Computer Vision; CV703 - Visual Object Detection and Recognition; ML708 - Trustworthy AI.

- Related coursework projects: ACL Workshop Paper - Multilingual Question Answering Project; Object Detection with Vision Transformers Project; EMNLP Workshop Paper - Multimodal Hateful Meme Classification Project.

### Anna University | Thiagarajar College of Engineering

Location: Madurai, India 🇮🇳

Degree: BTech in Information Technology

Dates: 2015/Jul - 2019/May

CGPA: 9.74 / 10

Department Rank: 1 / 130

Award: Gold Medal of Excellence as the Best Outgoing Student out of 1000+ graduating students.

Details:

- Relevant coursework: Object Oriented Programming; Data Structures and Algorithms; Web Technologies; Data Base Management System; Software Engineering; Big Data; Data Mining.

### Education Page Links

- DMKI Lab: https://dmki-tuwien.github.io/

- ARMADA: https://armada-dn.eu/

- SPriNT-AI Lab: https://www.sprintai.org/

- My MBZUAI Journey: https://www.khaleejtimes.com/uae/education/abu-dhabi-this-indian-ai-graduate-developed-a-smart-solution-to-detect-hateful-memes-on-social-medi

- MSc thesis, Towards Learning Efficient Multilingual and Multimodal Representation: https://drive.google.com/file/d/1GlDyxrZ0ubkvMaNAHDRQDgL94kNRrHmK/view

- ACL Workshop Paper, Multilingual Question Answering Project: https://aclanthology.org/2022.dravidianlangtech-1.3/

- Object Detection with Vision Transformers Project: https://arxiv.org/abs/2205.05543

- EMNLP Workshop Paper, Multimodal Hateful Meme Classification Project: https://aclanthology.org/2022.nlp4pi-1.20/

- Gold Medal of Excellence: https://www.tce.edu/sites/default/files/PDF/BOS-Awardees-2019.pdf

## Publications

Source: https://www.gokul.ch/publications

Google Scholar: https://scholar.google.com/citations?hl=en&user=seFW1dkAAAAJ

Summary shown on site: 11 publications: 1 patent | 7 conferences | 3 others

### [c7] WavLink: Compact Audio-Text Embeddings with a Global Whisper Token

Authors: Gokul Karthik Kumar, Ludovick Lepauloux, Hakim Hacid

Venue/year: 2026 | ICASSP

Tags: #language #speech

Link: https://arxiv.org/abs/2601.15118

Abstract/summary from expanded site card:

Whisper has become the de-facto encoder for extracting general-purpose audio features in large audio-language models, where a 30-second clip is typically represented by 1500 frame features projected into an LLM. In contrast, audio-text embedding models like CLAP-based models have largely relied on alternative audio encoders, such as HTS-AT and PaSST, and have not leveraged Whisper effectively. WavLink is a compact audio-text embedding model that augments the Whisper encoder with a learnable global token, trained jointly with a text encoder. Through a systematic study of pretrained text encoders, loss functions, training modes, and data mixtures, it identifies configurations that yield state-of-the-art retrieval performance. A two-stage training recipe across three model sizes, combined with Matryoshka-style supervision, improves scalability and enables 8x smaller embeddings with minimal performance drop. WavLink also demonstrates competitive performance on AIR-Bench with MCQs and zero-shot classification.
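
As a rough illustration of what Matryoshka-style embeddings allow at inference time, the sketch below keeps only the leading dimensions of an embedding and re-normalizes it before computing cosine similarity. It assumes the usual Matryoshka property that a prefix of the vector is itself a usable embedding; the function names and the toy 8-dimensional vectors are hypothetical and are not taken from the WavLink code.

```python
import math


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` values of an embedding and rescale to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero prefix")
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two vectors that are already unit-normalized."""
    return sum(x * y for x, y in zip(a, b))


# Toy 8-dimensional "audio" and "text" embeddings (made-up numbers).
audio = [3.0, 4.0, 0.5, -0.2, 0.1, 0.0, 0.3, -0.1]
text = [6.0, 8.0, -0.4, 0.1, 0.0, 0.2, -0.3, 0.1]

# An 8x smaller embedding here means keeping 1 of every 8 dimensions;
# 2 dimensions are kept in this toy case so the similarity is not trivial.
small_audio = truncate_and_normalize(audio, 2)
small_text = truncate_and_normalize(text, 2)
print(cosine(small_audio, small_text))
```

The sketch covers only the retrieval side. The training-time supervision that makes the prefixes useful is described in the paper and is not reproduced here.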

### [c6] Competitive Audio-Language Models with Data-Efficient Single-Stage Training on Public Data

Authors: Gokul Karthik Kumar, Rishabh Saraf, Ludovick Lepauloux, Abdul Muneer, Billel Mokeddem, Hakim Hacid

Venue/year: 2025 | ASRU

Tags: #language #speech

Link: https://arxiv.org/abs/2509.07526

Abstract/summary from expanded site card:

Falcon3-Audio is a family of Audio-Language Models built on instruction-tuned LLMs and Whisper encoders. Using less than 30K hours of public audio data, including 5K unique hours, Falcon3-Audio-7B matches the best reported performance among open-weight models on the MMAU benchmark with a score of 64.14, matching R1-AQA, while emphasizing data efficiency, parameter efficiency, single-stage training, and transparency. The smallest 1B model remains competitive with larger open models ranging from 2B to 13B parameters. Extensive ablations find that common complexities such as curriculum learning, multiple audio encoders, and intricate cross-attention connectors are not required for strong performance, even compared to models trained on over 500K hours of data.

### [c5] VisCon-100K: Leveraging Contextual Web Data for Fine-tuning Vision Language Models

Authors: Gokul Karthik Kumar, Iheb Chaabane, Kebin Wu

Venue/year: 2025 | PAKDD

Tags: #language #vision

Link: https://arxiv.org/abs/2502.10250

Abstract/summary from expanded site card:

VisCon-100K is a dataset derived from interleaved image-text web documents to address the shortage of high-quality visual fine-tuning data for vision-language models. The approach transforms 45K web documents from the OBELICS dataset into 100K image conversation samples, using GPT-4V to generate image-contextual captions and OpenChat 3.5 to convert those captions into diverse free-form and multiple-choice question-answer pairs. Fine-tuning with this data improves VLM performance across multiple benchmarks. The method leverages accompanying web context, and finds that a leaky modality mix, where conversation samples contain questions answerable from both the image and contextual caption, outperforms non-leaky combinations of captions and Q&A pairs. VisCon-100K shows strong performance with ShareGPT4V-7B and IDEFICS2-8B, and the project also releases a contextual captioner and the larger VisCon-1M dataset.

### [o3] Falcon 3 Family of Open Foundation Models

Authors: Falcon LLM Team @ TII UAE

Venue/year: 2024 | HuggingFace Blog

Tags: #language

Link: https://huggingface.co/blog/falcon3

Summary from expanded site card:

Falcon3 is a family of decoder-only large language models under 10 billion parameters, developed by Technology Innovation Institute in Abu Dhabi. The release focuses on performance, training efficiency, open and accessible large foundation models, and expanded science, math, and code capabilities.

### [o2] Towards Learning Efficient Multilingual and Multimodal Representation

Author: Gokul Karthik Kumar

Venue/year: 2023 | MSc Thesis @ MBZUAI

Tags: #language #vision #speech

Link: https://drive.google.com/file/d/1GlDyxrZ0ubkvMaNAHDRQDgL94kNRrHmK/view?usp=sharing

Abstract/summary from expanded site card:

This thesis focuses on efficient representation methods for multilingual and multimodal data in machine learning. The research is divided into three stages: improving multilingual representation for question-answering and text-to-speech, improving multimodal fusion for hateful meme classification, and unifying earlier stages through image retrieval and multilingual/multimodal representation learning.

The thesis proposes approaches using pretrained models and multimodal fusion to improve performance and cultural relevance across machine learning applications. The Hate-CLIPper architecture achieves state-of-the-art performance on meme detection, while training with a natively multilingual and multimodal Wikipedia Image-Text dataset plus English text augmentation enables retrieval of culturally relevant images in ten Indian languages. The work contributes to efficient representation methods for multilingual and multimodal data and motivates further work on pretrained models and multimodal fusion.

### [c4] Towards Building Text-To-Speech Systems for the Next Billion Users

Authors: Gokul Karthik Kumar, Praveen S V, Pratyush Kumar, Mitesh M. Khapra, Karthik Nandakumar

Venue/year: 2023 | ICASSP

Tags: #language #speech

Link: https://arxiv.org/abs/2211.09536

Abstract/summary from expanded site card:

This work evaluates design choices for Indian language text-to-speech systems, including acoustic models, vocoders, supplementary loss functions, training schedules, and speaker and language diversity for Dravidian and Indo-Aryan languages. The study identifies monolingual models with FastPitch and HiFi-GAN V1, trained jointly on male and female speakers, as the best setup. Using this setup, the authors train and evaluate TTS models for 13 languages and find significant improvement over existing models as measured by mean opinion scores. The models are open-sourced on the Bhashini platform.

### [c3] Hate-CLIPper: Multimodal Hateful Meme Classification based on Cross-modal Interaction of CLIP features

Authors: Gokul Karthik Kumar, Karthik Nandakumar

Venue/year: 2022 | EMNLP Workshop

Tags: #language #vision

Paper: https://arxiv.org/abs/2210.05916

Code: https://github.com/gokulkarthik/hateclipper

Abstract/summary from expanded site card:

Hate-CLIPper addresses hateful meme detection by jointly considering image and text, since the two modalities may be related without conveying the same meaning individually. The architecture explicitly models cross-modal interactions between image and text representations from CLIP encoders using a feature interaction matrix. A simple classifier over this representation achieves state-of-the-art performance on the Hateful Memes Challenge dataset with an AUROC of 85.8, surpassing reported human performance of 82.65. Experiments on Propaganda Memes and TamilMemes show generalizability, and analysis suggests that the feature interaction matrix helps learn meaningful cross-modal concepts.
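
One way to picture a feature interaction matrix is as the outer product of the image and text embedding vectors, so that every image feature is paired with every text feature. The sketch below works under that assumption with tiny made-up vectors; the function names are hypothetical, and the projection layers and classifier used in the actual model are omitted.

```python
def feature_interaction_matrix(image_features, text_features):
    """Outer product: entry [i][j] is image_features[i] * text_features[j]."""
    return [[i * t for t in text_features] for i in image_features]


def flatten(matrix):
    """Row-major flattening, giving a single vector a classifier could consume."""
    return [value for row in matrix for value in row]


# Toy embeddings; real CLIP embeddings have hundreds of dimensions.
image_features = [1.0, 2.0]
text_features = [3.0, 4.0, 5.0]

matrix = feature_interaction_matrix(image_features, text_features)
print(matrix)           # 2 x 3 matrix of pairwise products
print(flatten(matrix))  # length-6 vector
```

The diagonal of such a matrix, when both vectors have the same length, corresponds to the element-wise product of the two embeddings, which is a cheaper alternative to using the full matrix.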

### [c2] MuCoT: Multilingual Contrastive Training For Question-Answering In Low-resource Languages

Authors: Gokul Karthik Kumar, Abhishek Singh Gehlot, Sahal Shaji Mullappilly, Karthik Nandakumar

Venue/year: 2022 | ACL Workshop

Tags: #language

Paper: https://arxiv.org/abs/2204.05814

Code: https://github.com/gokulkarthik/mucot

Abstract/summary from expanded site card:

MuCoT studies question answering in low-resource languages where large QA datasets are unavailable. The method augments target-language QA samples using translation and transliteration into other languages, then fine-tunes an mBERT-based QA model pretrained in English. Experiments on the Google ChAII dataset show that translation from the same language family improves question-answering performance, while cross-language-family augmentation can degrade performance. Adding a contrastive loss between translated question-context feature pairs during fine-tuning helps prevent cross-family degradation and yields marginal improvement.

### [o1] An Empirical Study Of Self-supervised Learning Approaches For Object Detection With Transformers

Authors: Gokul Karthik Kumar, Sahal Shaji Mullappilly, Abhishek Singh Gehlot

Venue/year: 2022 | ArXiv

Tags: #vision

Link: https://arxiv.org/abs/2205.05543

Abstract/summary from expanded site card:

This work studies self-supervised learning methods for object detection transformers such as DETR and Deformable DETR. Although masked image modeling has been explored for vision transformers, object detection transformers operate on CNN-extracted feature maps rather than raw image patches. The work uses the spatial structure of CNN feature maps to design self-supervised pretraining and multi-task learning approaches for the encoder of object detection transformers, exploring image reconstruction, masked image modeling, and jigsaw-style objectives. Preliminary experiments on iSAID show faster early convergence for DETR in pretraining and multi-task settings, though similar improvement is not observed for Deformable DETR multi-task learning.

### [p1] Method And System For Forecasting Sales Based On N-Gram Model

Authors: Gokul Karthik, Avinash Achar, Balaraman Ravindran

Venue/year: 2021 | US Patent

Tags: #time-series

Link: https://patents.google.com/patent/US11416881B2/en

Abstract/summary from expanded site card:

This patent describes a method and system for forecasting sales using an N-gram model. The method receives inputs for each product, including sales history and a current price bin. Product sales histories are discretized by clustering each product's sales history into groups based on maximum sales velocity range. A probability table is generated for discretized categorical sales using rounded weighted mean and median computations with an N-gram model. A smoothed probability table is then used for multi-step sales forecasting through joint, bootstrapped, and step-greedy approaches.
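
To make the N-gram idea concrete, here is a heavily simplified sketch: a bigram (N = 2) transition table over discretized sales levels with additive smoothing, followed by a step-greedy multi-step forecast that picks the most likely next level at each step. It is an illustration under stated assumptions, not the patented method: it ignores price bins, the clustering by sales velocity, and the weighted mean and median computations, and all names and the toy history are hypothetical.

```python
from collections import Counter, defaultdict


def bigram_table(history, num_bins, smoothing=1.0):
    """Smoothed P(next bin | previous bin) estimated from a sequence of bins."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(history, history[1:]):
        counts[prev][nxt] += 1
    table = {}
    for prev in range(num_bins):
        # Additive smoothing keeps unseen transitions at non-zero probability.
        total = sum(counts[prev].values()) + smoothing * num_bins
        table[prev] = [(counts[prev][b] + smoothing) / total for b in range(num_bins)]
    return table


def step_greedy_forecast(table, last_bin, steps):
    """Forecast `steps` bins ahead, taking the most likely bin at every step."""
    forecast = []
    current = last_bin
    for _ in range(steps):
        probs = table[current]
        current = max(range(len(probs)), key=probs.__getitem__)
        forecast.append(current)
    return forecast


# Toy history of discretized sales levels: 0 = low, 1 = medium, 2 = high.
history = [0, 1, 2, 0, 1, 2, 0, 1]
table = bigram_table(history, num_bins=3)
print(step_greedy_forecast(table, last_bin=history[-1], steps=3))
```

The greedy strategy is the simplest of the forecasting approaches named in the summary; the joint and bootstrapped variants would instead score or sample whole multi-step sequences.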

### [c1] Dynamic Bus Arrival Time Prediction: A Temporal Difference Learning Approach

Authors: LKP Vignesh, Avinash Achar, Gokul Karthik

Venue/year: 2020 | IJCNN

Tags: #time-series

Link: https://ieeexplore.ieee.org/document/9207455

Abstract/summary from expanded site card:

This paper addresses real-time bus arrival and travel-time prediction under uncertainty from dwell times, signals, seasonal variation, fluctuating demand, lack of lane discipline, diverse modes of transport, and excess vehicles. The method recasts dynamic prediction as a value-function prediction problem under a suitably constructed Markov reward process, then explores temporal-difference learning predictors. The approach trains with travel-time targets between any two bus stops while keeping the number of models approximately linear in the number of stops and controlling variation in travel-time targets. Experiments show comparable or superior performance on mid-length and long-length routes versus the state of the art.
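
The value-function framing can be sketched with tabular TD(0): treat each bus stop as a state, the travel time of the next segment as the reward, and the value of a stop as the expected remaining travel time to the final stop. The code below is a toy illustration under those assumptions, with an undiscounted episodic setup, a hypothetical function name, and made-up segment times; the predictors studied in the paper are more elaborate.

```python
def td0_travel_time(trips, num_stops, alpha=0.1, sweeps=200):
    """Estimate the expected remaining travel time from each stop with TD(0).

    trips: list of trips, each a list of num_stops - 1 observed segment times.
    Returns a list where entry s approximates the time from stop s to the last stop.
    """
    values = [0.0] * num_stops  # the final stop is terminal, so its value stays 0
    for _ in range(sweeps):
        for trip in trips:
            for stop, segment_time in enumerate(trip):
                # TD target: observed segment time plus the current estimate
                # of the remaining time from the next stop.
                target = segment_time + values[stop + 1]
                values[stop] += alpha * (target - values[stop])
    return values


# One route with three stops and deterministic segment times of 5 and 3 minutes.
print(td0_travel_time([[5.0, 3.0]], num_stops=3))
```

With deterministic segment times the estimates approach the exact remaining times (about 8, 3, and 0 minutes here); with noisy observations they would approach the corresponding averages.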

## Skills

Source: https://www.gokul.ch/skills

### Technologies

- Large Language Models

- Generative AI

- Audio/Speech Processing

- Natural Language Processing

- Computer Vision

- Machine Learning

- Data Science

### Software Frameworks

- Git

- Docker

- FastAPI

- React

- Next.js

### Platforms

- AWS: IAM, EC2, S3, and SageMaker

- Google Cloud Platform

- Supabase

- Vercel

- Render

### Programming

- Python

- JavaScript

- PostgreSQL

### Python Packages

- PyTorch

- HuggingFace: Datasets, Transformers

- HuggingFace: TRL, PEFT, and Accelerate

- Lightning

- Triton

- Streamlit

- Pandas / cuDF

- NumPy

- Matplotlib

- NLTK

- OpenCV

- Coqui-TTS

## Honors

Source: https://www.gokul.ch/honors

### Press And Media

#### Khaleej Times

Title: Abu Dhabi: This Indian AI graduate developed a smart solution to detect hateful memes on social media

Description: Gokul Karthik Kumar says his experience at the Mohamed Bin Zayed University of Artificial Intelligence prepared him well for a career in AI research and development.

Link: https://www.khaleejtimes.com/uae/education/abu-dhabi-this-indian-ai-graduate-developed-a-smart-solution-to-detect-hateful-memes-on-social-medi

#### MBZUAI NEWS

Title: Innovating Agritech, serving the nation

Description: Two student teams from MBZUAI have taken top honors in the first edition of the Agritech Hackathon or “Agrithon,” organized by the Abu Dhabi Agriculture and Food Security Authority (ADAFSA) and held as part of the Abu Dhabi Agriculture and Food Security Week.

Link: https://mbzuai.ac.ae/news/innovating-agritech-serving-the-nation/

#### MBZUAI NEWS

Title: MBZUAI teams shine in competition

Description: The team of MBZUAI master’s students Gokul Karthik Kumar and Bokang Jia, with teammate Aakash Sasikumar from the University of Alberta, won the Best Potential Impact Project award for AutoDub at the IEEE SLT international hackathon, held in Qatar in January 2023.

Link: https://mbzuai.ac.ae/news/mbzuai-teams-shine-in-competition/

#### WIRED Middle East

Title: The Abu Dhabi AI researchers making video dubbing sync

Description: How a team of graduate students at the Mohamed Bin Zayed University of Artificial Intelligence are working to overcome the limitations of audio-visual dubbing technologies.

Video link: https://www.youtube.com/watch?v=1DkW3b5TRt4

Article link: https://wired.me/technology/artificial-intelligence/i-see-what-youre-saying-the-abu-dhabi-ai-researchers-making-video-dubbing-sync/

### Awards And Achievements

- 2023 | Best Potential Impact Project in the IEEE SLT 2022 international hackathon for AutoDub, an AI-Human interactive dubbing platform that seamlessly integrates Transcription, Translation, Voice-over, and Background Music Mixing. Link: https://mbzuai.ac.ae/news/mbzuai-teams-shine-in-competition/

- 2021 | Winning Team in the ADAFSA Agrithon for developing visual plant disease diagnosis, optimal animal clinic placement, and disease outbreak zone classification. Link: https://mbzuai.ac.ae/news-events/Innovating-Agritech-serving-the-nation

- 2021 | Finished Second in the GITEX x AI-everything High Flyer challenge for the AI-powered dubbing presentation. Link: https://wired.me/technology/artificial-intelligence/i-see-what-youre-saying-the-abu-dhabi-ai-researchers-making-video-dubbing-sync/

- 2021 | Selected in Top 10 Teams for the deep tech startup incubation program by Khalifa Innovation Center. Link: https://khalifainnovation.ae/

- 2020 | Innovation Pride Award by TCS for winning the Image Enrichment Challenge 2 with over 100 participants. Link: https://drive.google.com/file/d/14NB2Q0oGoL71MauA6LmNi-HoIfk5oCfw/view?usp=sharing

- 2020 | Innovation Pride Award by TCS for winning the DDS ML Hackathon out of 8 teams for developing the corn-hybrid recommendation system. Link: https://drive.google.com/file/d/1AafL9nWv67sWwDtMLFZVTspv9_EPsTpd/view?usp=sharing

- 2019 | Special Initiative Award by TCS for the data-driven solution to DDS profile hunt. Link: https://drive.google.com/file/d/18K4NXnx6A7RzJiQVvi9imvaawO3phTu2/view?usp=sharing

- 2019 | Winning Team Lead of IIT Madras AI Hackathon out of 500+ teams for developing a safe route identification system. Link: https://drive.google.com/file/d/1Y8ZBfH1dErrFqgLbKJpiuOKrR5i3cIwh/view?usp=sharing

- 2019 | Gold Medal of Excellence for the best outgoing student out of 1000+ students of TCE. Link: https://drive.google.com/file/d/1tY7DuX7pD0g7YZTSRQu04mYgBemSihea/view?usp=sharing

- 2018 | Winning Team Lead of Guvi-HCL AI Hackathon out of 30+ teams for developing the qualitative recommendation app. Link: https://drive.google.com/file/d/1whwKlSiVRDIahEvXwv1T-b0rjDIZl1N6/view?usp=sharing

- 2017 | All India Rank 1 in the NPTEL course "Social Networks". Link: https://drive.google.com/file/d/1NDvWJr0998a7mey0mMrXksozyJkppvnP/view?usp=sharing

- 2017 | Top 8 in the MHRD Smart India Hackathon out of Top 50 teams for developing a smart evaluation application. Link: https://drive.google.com/file/d/1186uIe4bO6vdxyB5Pyhx6tq-EuJZygxi/view?usp=sharing
