Gokul - Experience

IBM Research📍Zurich, Switzerland 🇨🇭

👨‍💻 Machine Learning Researcher | Marie Skłodowska-Curie Actions PhD Fellow ⌛ 2026/Jan - Present

Working on Code LLMs under the ARMADA project, jointly funded by the Swiss SERI 🇨🇭 and the European Union 🇪🇺.

Technology Innovation Institute (TII)📍Abu Dhabi, UAE 🇦🇪

👨‍💻 Machine Learning Research Engineer ⌛ 2023/Oct - 2025/Nov (2 years and 2 months)

Led end-to-end development of audio and vision LLMs, combining research on synthetic fine-tuning datasets, novel architectures, training methodologies, and benchmark design with engineering of scalable multi-node training and inference systems.

Explored LLM-based audio synthesis using neural audio codecs, and codebook compression for those codecs via knowledge distillation.
Developed WavLink, a CLIP-style audio–text embedding model that augments the Whisper encoder with a global token, achieving state-of-the-art retrieval and efficient 8× smaller embeddings with minimal performance drop. (ICASSP Paper)
Developed Falcon3-Audio, a family of audio-understanding LLMs that surpass prior open-weight models by 10+ points on multi-domain benchmarks (speech, music, sound) such as MMAU and AirBench, using a simpler architecture, single-stage training, and under 30K hours of public data. (ASRU Paper)
Developed VisCon, a novel LLM fine-tuning dataset with leaky visual conversations and contextual information for enhanced visual QA. (PAKDD Paper)
Integrated image understanding in text-only LLMs via distributed fine-tuning with CLIP’s vision encoder.
Enhanced distributed pretraining of the Falcon3 LLM by integrating Triton and optimizing the backward pass for architectural updates in the Megatron-based framework.
Accelerated LLM inference through self-distillation, intermediate-layer prediction, and recurrent multi-token prediction.

G42📍Abu Dhabi, UAE 🇦🇪

👨‍💻 Applied Scientist ⌛ 2023/Jun - 2023/Aug (~ 3 months)

Collaborated on the development of Jais, an English-Arabic LLM, at G42's Inception.

Managed the LLM Arena framework, benchmarking models (in-house, open-source, and GPT-4) with human annotators using Elo ratings.
Leveraged LLM Arena data to align LLMs with harmlessness and usefulness through RLHF and DPO techniques using the TRL package.

Microsoft Research 📍Bengaluru, India 🇮🇳

👨‍💻 Machine Learning Intern ⌛ 2022/May - 2022/Aug (3 months)

Extensively evaluated the design choices for AI4Bharat's Indic Text-To-Speech systems and open-sourced state-of-the-art models for 13 Indian languages. (ICASSP Paper)

TCS Research 📍Chennai, India 🇮🇳

👨‍💻 Machine Learning Research Engineer ⌛ 2019/Apr - 2020/Nov (1 year and 8 months)

Improved time-series forecasting and document retrieval systems using deep learning, NLP, and computer vision.

Enhanced retail sales forecasting with deep neural networks (RNNs, LSTNet) and a novel N-gram method for dynamic pricing. (US Patent)
Implemented RNN-based spatio-temporal travel time prediction, benchmarking against temporal difference-based methods. (IJCNN Paper)
Built an employee profile retrieval system using information parsing and natural language query processing. (Special Initiative Award)
Developed the winning yield prediction model for hybrid corn crops in an internal hackathon, recommending effective species crossing. (Innovation Pride Award)
Designed the winning computer vision pipelines for curved text extraction and perspective correction in an internal hackathon with 100+ participants. (Innovation Pride Award)

IIT Madras | AI4Bharat | One Fourth Labs 📍Chennai, India 🇮🇳

👨‍💻 Machine Learning Intern ⌛ 2018/Dec - 2019/Mar (4 months)

👨‍💻 Summer Research Fellow ⌛ 2018/Jun - 2018/Aug (2 months)

Built a multilingual scene text translation system and explored attention mechanisms for foreground detection.

Developed a scene text translation system for Indian languages using synthetic datasets, with an Efficient and Accurate Scene Text Detector (EAST) for detection and Convolutional Recurrent Neural Networks (CRNN) for classification and recognition. (Dataset, Detection, Classification, Recognition)
Set up programming competitions for the One Fourth Labs Deep Learning course.
Investigated attention models in deep learning, analyzing foreground region detection using proxy data with distinct statistical properties.