AWS Data Engineer

Gurugram, Haryana, India
Oct 28, 2024
Oct 28, 2025
Onsite
Full-Time
6 Years
Job Description

We are seeking a highly skilled and motivated Data Engineer to join our dynamic team. The ideal candidate will have extensive experience in ETL, Data Modeling, and Data Architecture, with strong proficiency in optimizing ETL processes and designing big data solutions using Python.

Key Responsibilities

  • Develop and maintain a comprehensive data platform, including Data Lakes, cloud Data Warehouses, APIs, and both batch and streaming data pipelines.
  • Design and implement scalable data pipelines and applications that efficiently process large datasets with low latency using Apache Spark and Apache Hive.
  • Utilize orchestration tools like Airflow to automate and manage complex data workflows.
  • Collaborate with project management tools such as JIRA and Confluence to track project progress and enhance team communication.
  • Build data processing workflows leveraging Spark, SQL/PLSQL, and Python to transform and cleanse raw data into usable formats, employing Parquet/ORC for data storage solutions.
  • Implement containerization with Docker and orchestration with Kubernetes for data applications.
  • Optimize data storage and retrieval performance through effective data modeling techniques, including Relational, Dimensional, and E-R modeling.
  • Ensure data integrity and quality by implementing robust validation and error handling mechanisms within ETL processes.
  • Automate deployment processes using CI/CD tools like Jenkins and Spinnaker to ensure reliable and consistent releases.
  • Monitor and troubleshoot data pipelines with tools like DataDog and Splunk to identify performance bottlenecks and maintain system reliability.
  • Participate in Agile methodologies such as Scrum and Kanban, including sprint planning, daily stand-ups, and retrospective meetings.
  • Conduct code reviews to uphold coding standards, best practices, and scalability considerations.
  • Maintain clear and comprehensive documentation using Confluence for data pipelines, schemas, and processes.
  • Provide on-call support for production data pipelines, responding to incidents and resolving issues promptly.
  • Collaborate with cross-functional teams, including developers, data scientists, and operations, to tackle complex data engineering challenges.
  • Stay informed on emerging technologies and industry trends to continuously enhance data engineering processes and tools.
  • Contribute to developing reusable components and frameworks to streamline data engineering tasks across various projects.
  • Utilize version control systems like Git for effective codebase management and team collaboration.
  • Leverage IDEs like IntelliJ IDEA for efficient development and debugging of data engineering code.
  • Adhere to security best practices in handling sensitive data and implementing access controls within the data lake environment.
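
The validation and error-handling duty described above might look like the following minimal sketch. The record schema (`user_id`, `event_time`, `amount`) and the quarantine approach are hypothetical illustrations, not part of the role's actual codebase; pure Python is used for portability, though in practice the same pattern would run inside a Spark or Airflow task.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable

@dataclass
class CleanRecord:
    # Hypothetical target schema for a cleansed event row.
    user_id: int
    event_time: datetime
    amount: float

def clean(raw: dict) -> CleanRecord:
    """Validate and coerce one raw record; raise ValueError on bad data."""
    if not str(raw.get("user_id", "")).isdigit():
        raise ValueError(f"bad user_id: {raw!r}")
    amount = float(raw["amount"])
    if amount < 0:
        raise ValueError(f"negative amount: {raw!r}")
    return CleanRecord(
        user_id=int(raw["user_id"]),
        event_time=datetime.fromisoformat(raw["event_time"]),
        amount=amount,
    )

def run(records: Iterable[dict]):
    """Partition raw records into cleansed rows and a quarantine list,
    so bad input never silently corrupts downstream tables."""
    good, quarantined = [], []
    for raw in records:
        try:
            good.append(clean(raw))
        except (KeyError, ValueError) as exc:
            quarantined.append((raw, str(exc)))
    return good, quarantined
```

Routing invalid rows to a quarantine list (rather than failing the whole batch) is one common way to meet the data-integrity requirement while keeping pipelines resilient.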

Job Requirements

  1. Experience: 6-8 years in data engineering or related fields.
  2. Programming Languages: Proficiency in Python, Bash/Unix/Linux.
  3. Big Data Technologies: Experience with Apache Spark and Apache Hive.
  4. Cloud Services: Familiarity with AWS services including EC2, ECS, S3, SNS, and CloudWatch.
  5. Databases: Proficient in PostgreSQL.
  6. Application Development: Experience with RCP Framework.
  7. Containerization & Orchestration: Hands-on experience with Docker and Kubernetes.
  8. CI/CD Tools: Proficient in GitHub, Jenkins, and Spinnaker.
  9. Additional Skills: Knowledge of Scala and Maven is a plus.

Join Infogain and be part of a forward-thinking team that drives innovation and transforms businesses through cutting-edge technology solutions. If you are passionate about data engineering and eager to make a significant impact, we would love to hear from you!