Stanford Public Interest Technology (PIT) Job Board

Software Engineer II

Microsoft

Software Engineering
United States · Redmond, WA, USA
USD 100,600-199,000 / year
Posted on Dec 12, 2025
Overview

Microsoft’s Azure Foundry Model team is accelerating the next wave of AI innovation. We enable customers and partners to harness the full power of frontier AI models by providing a fully managed, high-performance inference platform built for scale, reliability, and responsible AI. Our team is responsible for hosting, optimizing, and scaling the inference stack for Azure AI Foundry models, including the latest and most advanced offerings from DeepSeek, Grok, Meta, Mistral, BlackForestLabs and other leading model providers.



Responsibilities
  • Lead the design and implementation of scalable, high-performance model inference infrastructure for serving frontier AI models in production.
  • Drive end-to-end inference performance improvements for state-of-the-art LLMs, including latency, throughput, and GPU efficiency optimization.
  • Scale the platform to meet rapidly growing inference demand while maintaining high availability and operational excellence.
  • Deliver core capabilities required to serve the latest GenAI models and reduce time-to-market for new model onboardings.
  • Collaborate with internal and external partners across Azure, model providers, and GPU vendors to build world-class inference solutions.
  • Mentor and guide engineers on inference, large-scale systems design, and best practices in high-performance AI serving.


Qualifications

Required Qualifications:

  • Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
    • OR equivalent experience.

Other Requirements:

This role requires the ability to meet Microsoft, customer, and/or government security screening requirements, including: Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Preferred Qualifications:

  • 3+ years of experience building and operating high-scale, reliable online systems.
  • Solid technical foundation in software engineering principles, distributed systems, and service architecture.
  • Hands-on experience building real-time online services with low latency and high throughput requirements.
  • Knowledge of and experience with OSS, Docker, Kubernetes, Python, C#, Rust, or equivalent technologies and programming languages.
  • Experience in LLM areas such as inference, pre-training, or post-training.
  • Experience building or operating cloud-based large-scale inference platforms.



Software Engineering IC3 - The typical base pay range for this role across the U.S. is USD $100,600 - $199,000 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $131,400 - $215,400 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
https://careers.microsoft.com/us/en/us-corporate-pay


This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.




Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.