We’re looking for passionate, cloud-savvy innovators like you who can help take our cloud infrastructure to the next level. If you’re skilled at building cloud infrastructure on AWS, Azure, or GCP using Infrastructure as Code (IaC) and thrive on deploying applications in production environments, this is your chance to make an impact.
Why We Want You
- You’re experienced in building cloud-based infrastructure (AWS/Azure/GCP) using Infrastructure as Code (IaC) tools like Terraform and Ansible.
- You’re passionate about deploying applications in production environments and automating processes to remove manual steps.
- Your expertise in AI/ML tools helps optimize cloud infrastructure, particularly in cost management.
- You have a proactive, ownership-driven attitude and always seek out opportunities to add value.
What You’ll Love About This Role
- Take full ownership of the entire lifecycle of client-facing cloud infrastructure and application deployments across multiple digital marketing products.
- Work as a Site Reliability Engineer (SRE), performing OS patching, mitigating vulnerabilities, and responding to zero-day vulnerabilities.
- Collaborate closely with cross-functional teams (Product, Delivery, Support) using Cloud-DevOps and Kanban practices to ensure faster time to market.
- Identify and implement cloud cost optimization strategies, ensuring efficient AWS usage and cost control.
Key Responsibilities
- Work with Infrastructure as Code (IaC) tools like Terraform and Ansible to build and manage cloud infrastructure on AWS, including Compute, Storage, Serverless, and Networking.
- Automate manual tasks and eliminate daily toil using CI/CD pipelines with tools such as GoCD.
- Manage data infrastructure including Oracle, Redshift, and EMR clusters.
- Apply your knowledge of Unix administration, networking (SSO, SSL, LDAP), and cloud security to maintain and troubleshoot infrastructure.
- Use the K-ELK stack (Kafka, Elasticsearch, Logstash, Kibana) for troubleshooting and infrastructure monitoring.
- Implement AI/ML operations (MLOps) to streamline automation tooling for operational efficiency.
Qualifications
- Bachelor’s degree (or equivalent) in Computer Science or a related field.
- 5+ years of experience in systems/DevOps/network engineering, supporting production web applications.
- At least 2 years of hands-on experience with AWS cloud infrastructure and automation, plus proficiency in scripting languages such as Python, Bash, or PHP.
- An AWS SysOps Administrator, Azure Administrator, RHCE, or RHCSA certification is a plus.
Why Epsilon?
At Epsilon, diversity and inclusion are at the core of our culture. We strive to attract, engage, and retain talented individuals from all backgrounds, promoting equal employment opportunities for women, people of color, the LGBTQ community, and those with disabilities. Join a company that values your unique perspective and helps you grow while making a difference.