Experience
Total Experience: 4 years and 2 months +
Technology Innovation Institute (TII)📍Abu Dhabi, UAE
👨💻 Machine Learning Engineer ⌛ 2023/Oct - Present (~11 months +)
Developing native audio understanding capabilities in Large Language Models (LLMs) by fine-tuning to align the audio encoder
Accelerating LLMs through continual pretraining with knowledge distillation into intermediate layers and recurrent multi-token prediction
Introduced image understanding in text-only LLMs (7B+ parameters) via distributed fine-tuning on visual QA datasets using AWS SageMaker and Hugging Face's Transformers, TRL, Datasets, and Accelerate packages
Led the development of VisCon, a novel vision LLM fine-tuning dataset with leaky visual conversations, leveraging contextual web data and enhancing performance across multiple visual QA benchmarks
Enhanced FalconLLM by refining the distributed pretraining framework, Gigatron, integrating Triton for new activation functions, and optimizing backward pass implementations for new architectural changes
G42’s Inception📍Abu Dhabi, UAE
👨💻 Applied Scientist ⌛ 2023/Jun - 2023/Aug (~3 months)
Collaborated in developing Jais, an English-Arabic Large Language Model (LLM)
Set up and managed the LLM Arena evaluation using the FastChat package, benchmarking in-house, open-source, and GPT-4 models with human annotators through Elo ratings
Processed and utilized LLM Arena data to align the fine-tuned LLMs on harmlessness and usefulness using Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) with the TRL package
Microsoft Research 📍Bengaluru, India
👨💻 Machine Learning Research Intern ⌛ 2022/May - 2022/Aug (~3 months)
Extensively evaluated the design choices for Text-To-Speech systems and open-sourced state-of-the-art models for 13 Indian languages [ICASSP Paper]
Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) 📍Abu Dhabi, UAE
👨💻 Machine Learning Research Assistant ⌛ 2021/Jan - 2021/Jul (~7 months)
Explored the Double Variational Autoencoder Network (DoENet) for unsupervised adversarial example classification in computer vision, demonstrating that it is agnostic to both the target classifier and the attack, leading to improved performance
TCS Research 📍Chennai, India
👨💻 Machine Learning Developer ⌛ 2019/Jul - 2020/Nov (~1 year and 5 months)
👨💻 Machine Learning Developer Intern ⌛ 2019/Apr - 2019/Jul (~3 months)
Enhanced the sales forecasting models for a dynamic pricing system of a prominent retail client, leveraging deep neural networks such as RNNs and LSTNet, along with a novel N-gram-based method [US Patent]
Implemented RNN-based spatiotemporal travel time prediction models to evaluate against temporal difference-based methods [IJCNN Paper]
Developed an employee profile retrieval system that parses information documents and responds to input text queries, as part of an internal profile hunt challenge [Special Initiative Award]
Developed a yield prediction model for hybrid corn crops to recommend effective crossing of species, winning an internal hackathon out of 8 teams [Innovation Pride Award]
Designed computer vision pipelines to solve curved text extraction, perspective correction, etc., winning an internal image enrichment hackathon with over 100 participants [Innovation Pride Award]
IIT Madras | AI4Bharat | One Fourth Labs 📍Chennai, India
👨💻 Machine Learning Project Intern ⌛ 2018/Dec - 2019/Mar (~4 months)
👨💻 Summer Research Fellow ⌛ 2018/Jun - 2018/Aug (~2 months)
Created a scene text translation system for Indian languages using synthetic datasets, incorporating an Efficient and Accurate Scene Text Detector (EAST) for detection and Convolutional Recurrent Neural Networks (CRNN) for classification and recognition [Dataset] [Detection] [Classification] [Recognition]
Set up programming competitions for the One Fourth Labs Deep Learning course [Course]
Investigated attention models in Deep Learning, analyzing foreground region detection through proxy data with distinct statistical properties [Report]