AI Research Engineer (Model Serving & Inference)

Tether Operations Limited
United Arab Emirates
Full-time
A week ago

Description

Join Tether and Shape the Future of Digital Finance

At Tether, we're not just building products, we're pioneering a global financial revolution. Our solutions enable seamless integration of reserve-backed tokens across blockchains, empowering businesses worldwide. Transparency and security are at the core of our mission to build trust in digital transactions.

Innovate with Tether

Tether Finance: Home of the trusted stablecoin USDT and innovative digital asset tokenization services.
Tether Power: Eco-friendly energy solutions for Bitcoin mining, utilizing sustainable practices across diverse locations.
Tether Data: Advancing AI and peer-to-peer tech with solutions like KEET for secure data sharing.
Tether Education: Providing accessible digital learning to empower individuals in the digital economy.
Tether Evolution: Pushing technological boundaries to merge human potential with innovation.

Why Join Us?

Our remote, global team is passionate about fintech innovation. Join us to work alongside top talent, influence industry standards, and grow with a fast-paced, industry-leading company. If you excel in English communication and are eager to contribute to cutting-edge platforms, Tether is your next career move.

About the job

As part of our AI model team, you will innovate in model serving and inference architectures for advanced AI systems, focusing on optimizing deployment strategies for responsiveness, efficiency, and scalability across various hardware environments, including resource-constrained devices and complex multi-modal systems. Your role involves developing, testing, and deploying robust inference pipelines, establishing performance metrics, and troubleshooting bottlenecks to ensure high-throughput, low-latency AI performance in real-world applications.

Responsibilities

Design and deploy high-performance model serving architectures optimized for diverse environments, including edge devices.
Set and track performance targets such as latency reduction, token response improvements, and memory minimization.
Conduct inference testing in simulated and live environments, monitoring key metrics and documenting results for continuous improvement.
Prepare high-quality datasets and scenarios for real-world deployment challenges, especially on low-resource devices.
Analyze and address bottlenecks in serving pipelines to enhance scalability and reliability.
Collaborate with teams to integrate optimized inference frameworks into production, ensuring continuous monitoring and improvement.

Minimum requirements

Degree in Computer Science or related field; PhD preferred, with a strong record in AI R&D.
Proven experience in kernel and inference optimization on mobile devices, demonstrating measurable performance improvements.
Deep understanding of modern model serving architectures and optimization techniques for resource-constrained environments.
Expertise in CPU and GPU kernel development for mobile platforms and experience deploying inference pipelines.
Ability to translate empirical research into practical optimizations, with skills in evaluation frameworks and iterative improvement.

Posted: 7th July 2025, 1:21 pm

Application Deadline: N/A

Apply Now

Similar Jobs

Explore more opportunities like this

Research Engineer - Advanced Machine Learning in The eBRAIN Lab - Dr. Muhammad Shafique

Westfield State University
United Arab Emirates
full-time
2 weeks ago

