At Google, we are at the forefront of next-generation technologies that transform how billions of users interact with information. We are seeking a Machine Learning GPU Performance Engineer to join our Core team, responsible for building and optimizing the technical foundation behind Google’s flagship products. This role focuses on enhancing GPU performance for machine learning models, ensuring they are efficient and scalable.
As a Machine Learning GPU Performance Engineer, you will play a crucial role in identifying performance opportunities, guiding GPU performance towards state-of-the-art levels, and supporting the integration of large-scale machine learning models. Your expertise will help drive significant advancements in our GPU-based infrastructure.
Key Responsibilities
- Benchmarking and Performance Analysis. Identify and maintain benchmarks for LLM (Large Language Model) training and serving that reflect Google production, industry standards, and ML community practices. Use these benchmarks to spot performance opportunities and drive improvements in XLA.
- Collaboration. Work closely with Google product teams, including DeepMind, to address ML model performance issues, onboard new LLMs, and ensure efficient training and serving of LLMs at large scale (e.g., thousands of GPUs).
- Architecture Simulations. Conduct architecture-level simulations on GPU designs and perform roofline analysis to guide internal teams.
- Performance Benchmarks. Run and analyze performance benchmarks on GPU hardware using both internal and external tools.
- Optimization. Analyze performance metrics to identify bottlenecks and design solutions to enhance performance and efficiency at Google’s fleet-wide scale.
Minimum Qualifications
- Education. Bachelor’s degree or equivalent practical experience.
Experience
- 3 years of experience in testing, maintaining, or launching software products, with 1 year of experience in software design and architecture.
- 3 years of experience in performance, systems data analysis, visualization tools, or debugging.
- 5 years of experience in software development, including expertise in C++ and Python and in data structures and algorithms.
Preferred Qualifications
- Education. Master’s degree or PhD in Computer Science or a related technical field.
Experience
- 1 year of experience in a technical leadership role.
- Experience with developing accessible technologies.
Skills
- Advanced knowledge of GPU performance and machine learning optimization.
- Familiarity with architecture simulations and performance benchmarking tools.
Why Google?
Google’s Core team is responsible for the underlying design elements, developer platforms, and infrastructure that support our flagship products. By joining us, you will have the opportunity to work on critical projects that drive technological innovation and influence key technical decisions across the company. We are looking for engineers who are versatile, display leadership qualities, and are eager to tackle new challenges.
Equal Opportunity Employment
Google is committed to creating an inclusive environment for all employees. We are an equal opportunity employer and embrace diversity. We welcome applicants regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, or Veteran status. We also consider qualified applicants with criminal histories, consistent with legal requirements. For accommodations related to disability, please complete our Accommodations for Applicants form.