Multimodal Research Engineer (AI Labs)

Full time at Krutrim in India
Posted on February 9, 2025

Job details

Multimodal and Vision AI Research Engineer / Scientist

Location: Bangalore (India), Singapore, and Palo Alto (CA, US)
Type of Job: Full-time

About Krutrim: Krutrim is building AI computing for the future. Our envisioned AI computing stack encompasses AI computing infrastructure, an AI cloud, multilingual and multimodal foundation models, and AI-powered end applications. We are India’s first AI unicorn and built the country’s first foundation model. Our AI stack empowers consumers, startups, enterprises, and scientists across India and the world to build their own AI applications and models. While we are building foundation models across text, voice, and vision for our focus markets, we are also developing AI training and inference platforms that enable AI research and development across industry domains. The platforms Krutrim is building have the potential to impact millions of lives in India, across income and education strata and across languages. The team at Krutrim represents a convergence of talent across AI research, applied AI, cloud engineering, and semiconductor design. Our teams operate from three locations: Bangalore, Singapore, and San Francisco.

Job Description: We are looking for experienced generative AI engineers to train, optimize, scale, and deploy a variety of generative AI models, such as large language models, voice/speech foundation models, and vision and multimodal foundation models, using cutting-edge techniques and frameworks. In this role, you will conduct advanced research and development to push the boundaries of what is possible with generative AI and language models.

Responsibilities:

  1. Research, architect, and deploy new generative AI methods such as autoregressive models, causal models, and diffusion models
  2. Refine foundation model infrastructure to support the deployment of optimized AI models with a focus on C/C++, CUDA, and kernel-level programming enhancements
  3. Implement state-of-the-art optimization techniques, including quantization, distillation, sparsity, streaming, and caching, for model performance enhancements
  4. Train or fine-tune vision models for representation (e.g., Vision Transformers, Q-Former, CLIP, SigLIP), generation (e.g., Stable Diffusion, Stable Cascade), and video representation (e.g., Video Swin Transformer)
  5. Design and develop novel large language models and corresponding architectures by leveraging transformers, Mixture-of-Experts, attention mechanisms (e.g., FlashAttention with MQA/GQA, Multi-head Latent Attention (MLA)), and other state-of-the-art architectures
  6. Implement large multimodal models following the latest architectures, such as early fusion and deep fusion
  7. Experience with audio models for representation (e.g., w2v-BERT) and generation (e.g., HiFi-GAN, SeamlessM4T) is a plus
  8. Innovate on state-of-the-art architectures for panoptic segmentation, image classification, and image generation
  9. Experience with AI art, image prompting, and conditional image generation is an additional advantage
  10. Drive innovations in NLP techniques enabled by generative models, such as text generation, summarization, translation, and question answering
  11. Integrate and tailor frameworks such as PyTorch, TensorFlow, DeepSpeed, Lightning, Habana, and FSDP to accelerate model training and inference
  12. Advance the deployment infrastructure with MLOps frameworks such as Kubeflow, MosaicML, Anyscale, and Terraform, ensuring robust development and deployment cycles
  13. Publish papers at top-tier AI/ML conferences like NeurIPS, ICML, ICLR on new research contributions
  14. Collaborate with engineering teams to productionize research advancements into scalable services and products
Qualifications:
  1. Ph.D. or MS with 2+ years of research / applied research experience in LLMs, NLP, CV, Reinforcement Learning, Voice, and Generative models
  2. Demonstrated expertise in high-performance computing with proficiency in Python, C/C++, CUDA, and kernel-level programming for AI applications
  3. Extensive experience in the optimization of training and inference for large-scale AI models, including practical knowledge of quantization, distillation, and LLMOps
  4. Prior experience with large-scale distributed training and fine-tuning of foundation models such as GPT-3, LLaMA 2, AlphaFold, and DALL-E
  5. Experience with language modeling evaluation, prompt tuning and engineering, instruction tuning, and/or RLHF
  6. Research contributions in NLP, generative modeling, and LLMs, demonstrated through publications and products
  7. Strong programming skills and proficiency in Python, TensorFlow/PyTorch, and other ML frameworks and tools
  8. Experience in Information Extraction, Question Answering, Conversational Agents (Chatbots), Data Visualization and/or text-to-image models
  9. Excellent communication and collaboration skills to work cross-functionally with various teams

