AWS Data Engineer

Gurugram, Haryana, India
Oct 08, 2024
Oct 08, 2025
Onsite
Full-Time
6 Years
Job Description

We are seeking a highly skilled and motivated Data Engineer to join our dynamic team at Infogain. You will play a key role in building, optimizing, and maintaining our data platforms. As part of a forward-thinking organization, you will leverage cutting-edge technologies to develop scalable, efficient data solutions that support various business initiatives across technology, healthcare, insurance, telecom, and more. If you have a passion for data engineering, extensive experience in ETL, and proficiency in big data technologies, this is the perfect opportunity for you.

Key Responsibilities

  1. Data Platform Development. Design and develop data platforms and components such as Data Lakes, Cloud Data Warehouses, APIs, and data pipelines (batch and streaming), using tools like Apache Spark, Apache Hive, and Airflow.
  2. ETL & Data Processing. Build and optimize ETL processes to transform, cleanse, and process large datasets. Leverage Python, Spark, SQL, and PL/SQL, using Parquet/ORC storage formats.
  3. Automation & Orchestration. Use Airflow to orchestrate and automate data workflows and to manage batch and stream-processing data solutions.
  4. Scalability & Optimization. Implement scalable data pipelines with containerization (Docker, Kubernetes) and optimize data storage and retrieval using advanced data modelling (relational, dimensional, and E-R).
  5. Deployment & Monitoring. Automate deployment processes with CI/CD tools like Jenkins and Spinnaker. Monitor performance and troubleshoot issues using Datadog and Splunk to ensure system reliability.
  6. Collaboration & Project Management. Collaborate with cross-functional teams, including developers and data scientists, using tools like JIRA and Confluence for seamless project tracking and documentation.
  7. On-Call Support. Provide on-call production support for data pipelines, handling incidents and troubleshooting data issues to keep pipelines reliable.
  8. Security & Best Practices. Adhere to security best practices when managing sensitive data, ensure data integrity, and maintain up-to-date documentation in Confluence.

Required Skills & Expertise

  1. Programming Languages. Python; Bash (Unix/Linux shell scripting)
  2. Big Data Technologies. Apache Spark, Apache Hive
  3. Databases. PostgreSQL; advanced SQL
  4. ETL & Data Modelling. Experience in designing relational, dimensional, and E-R data models
  5. Cloud Platforms. AWS (EC2, ECS, S3, SNS, CloudWatch)
  6. Containerization. Docker, Kubernetes
  7. CI/CD Tools. GitHub, Jenkins, Spinnaker
  8. Additional Skills. Familiarity with Scala and the RCP Framework is a plus

Experience

  • 6-8 years of experience in Data Engineering, with extensive exposure to ETL processes, big data, and cloud technologies.

About Infogain

Infogain is a leading human-centered digital platform and software engineering company based in Silicon Valley. We drive business transformation for Fortune 500 companies across diverse industries including healthcare, technology, insurance, telecom, and retail. Our expertise spans cutting-edge technologies such as cloud, microservices, IoT, and AI, helping companies achieve experience-led digital transformations.