Sr. Site Reliability Engineer

Gurugram, Haryana, India
Aug 02, 2024
Aug 02, 2025
Hybrid
Full-Time
4 Years
Job Description

As a Site Reliability Engineer (SRE) at Gartner, you will be pivotal in ensuring the operational readiness of critical client-facing applications. Your responsibilities will include evaluating application performance, reliability, scalability, and observability, as well as identifying and resolving production issues. You will manage applications and infrastructure as code, execute chaos tests, and oversee alerts and dashboards.

What You’ll Do

  1. Full-Stack Triaging. Analyze alerts and collaborate with engineers to diagnose and resolve application performance and stability issues.
  2. Incident Management. Partner with cross-functional teams during production incidents to provide technical insights and root cause analysis.
  3. Service Level Objectives (SLOs). Work with product owners to define and track SLOs, ensuring systems meet these objectives over time.
  4. Dashboard & Reporting. Design and develop dashboards and reports to communicate key metrics and performance indicators.
  5. Alerting and Monitoring. Improve alerting mechanisms and manage alerts to enhance the system’s observability.
  6. Application Resiliency. Perform single-point-of-failure analysis, create scenarios for resiliency testing, and ensure performance considerations are integrated early in the SDLC.
  7. Performance Testing. Execute performance and chaos tests, and analyze results using APM tools to identify issues.
  8. Infrastructure Management. Assist with monitoring infrastructure capacity, recommending optimizations, and identifying cost-saving opportunities.
  9. Documentation. Document findings, analysis, and results, and present them to stakeholders.
  10. Automation and Analytics. Use automation to reduce problem recurrence and perform analytics on past incidents to understand root causes.
  11. Flexible Support. Provide operational support for major releases and conferences, and participate in an on-call schedule.

What You’ll Need

  1. Education. Bachelor’s or Master’s degree in Engineering or a related field.
  2. Experience. 4-6 years in IT, with a focus on DevOps, SRE, or performance engineering.

Technical Skills

  • Proficiency in triaging production issues using APM tools (Dynatrace, AppDynamics, New Relic) and log aggregation tools (Splunk, ELK).
  • Experience with SRE concepts (SLI/SLOs, error budgets) and cloud services (AWS, Azure).
  • Knowledge of Docker, CI/CD processes (Jenkins, Argo), chaos engineering, and automation scripting (Python, Shell).
  • Familiarity with Infrastructure as Code (Terraform) is a plus.

Soft Skills

  • Strong analytical abilities, excellent communication skills, and a proactive approach to problem-solving.

Who You Are

  1. Motivated Performer. High-potential individual with a proven ability to influence and lead.
  2. Strong Communicator. Excellent interpersonal skills and the ability to solve complex problems.
  3. Teachable. Open to feedback and continuous improvement.
  4. Resilient. Able to handle ambiguity and unexpected changes with a positive outlook.

Why Gartner?

At Gartner, we guide leaders who shape the world. With a global presence, we offer a supportive work environment where your career can flourish. Our teams are diverse, inclusive, and dedicated to delivering results. We provide competitive compensation, world-class benefits, and a hybrid work model to support your professional and personal growth.

How to Apply

Ready to grow your career with Gartner? Apply now and be part of a team that values innovation and collaboration.

Gartner Applicant Privacy Policy

For accommodation requests due to disabilities, please contact Human Resources at +1 (203) 964-0096 or email [email protected].