We are looking for an experienced and motivated AWS Data Engineer to join our growing team at Infogain. As an AWS Data Engineer, you will play a pivotal role in designing, developing, and maintaining robust data pipelines and real-time streaming applications that support the organization's data-driven initiatives. This role offers an exciting opportunity to work in a fast-paced, innovative environment where you can apply your expertise in AWS, Spark, Kafka, and data engineering to deliver cutting-edge data solutions.
Key Responsibilities
- Design & Develop Scalable Data Pipelines. You will be responsible for designing and developing data pipelines that can handle large-scale data processing. This includes building real-time streaming applications using technologies like Apache Spark, Kafka, and similar platforms to ensure efficient data flow across systems.
- Real-Time Streaming Applications. You will implement and maintain real-time streaming applications that enable fast, reliable data processing. Leveraging Spark and Kafka, you will ensure that data is ingested, processed, and transferred seamlessly across systems with minimal latency.
- Cloud Data Engineering. You will leverage your expertise in AWS to build and maintain robust data pipelines within a cloud-native environment. This includes working with AWS services such as Amazon S3, Amazon Redshift, AWS Lambda, and AWS Glue to create a seamless and scalable data architecture.
- Collaborate with the Data Engineering Team. Working within a team of at least 5 engineers, you will actively collaborate on data engineering initiatives. This includes following agile methodologies, contributing to version control, and applying best practices for code development and deployment.
- ETL Development & Workflow Management. You will play a key role in developing and managing ETL workflows using Airflow 2.0, PySpark, and Python. By automating the process of extracting, transforming, and loading data, you will help ensure data quality and efficiency across the organization.
- Code Reviews & CI/CD. You will contribute to maintaining the quality of the codebase by performing thorough code reviews and participating in the continuous integration and deployment (CI/CD) processes. This ensures that the team is aligned with best practices and quality standards for cloud-native data pipelines.
- Data Processing Patterns. A significant portion of your role will involve delivering projects using various data processing patterns, especially batch processing and streaming. You will design and implement solutions that cater to both real-time and batch data requirements.
- System Design & Access Patterns. You will be expected to balance the needs of traditional relational database systems (RDBMS) and distributed systems. You will design modern data access patterns that are both secure and efficient to meet the needs of the business.
- Documentation & Reporting. You will also be responsible for maintaining clear documentation of the systems and solutions you develop. This includes maintaining thorough records of processes, methodologies, and performance metrics.
Required Qualifications
- Experience. Minimum of 4 years of hands-on experience in crafting data pipelines and real-time streaming applications, with a deep understanding of Spark, Kafka, or similar platforms. You should have experience working with large-scale data systems and cloud-based solutions.
- ETL Development. Experience with ETL development and workflow management, specifically using Airflow 2.0, PySpark, and Python programming. Your experience should include working on end-to-end data engineering pipelines that deliver high-quality data products.
- Cloud Engineering Expertise. A proven track record of working in cloud environments, specifically on AWS. Experience with AWS services like EKS, CloudFormation, Redshift, and S3 will be essential to this role.
- Programming Skills. Strong expertise in one or more programming languages relevant to data engineering (e.g., Python or Java) and frameworks such as Spark. Knowledge of additional languages and technologies related to data processing is a plus.
- Data Processing. Demonstrated experience in designing and delivering projects using various data processing patterns. Familiarity with both stream processing and batch processing is crucial.
- Version Control & CI/CD. A solid understanding of version control with Git and platforms such as GitHub, as well as continuous integration and deployment practices. Experience with CI/CD tools such as Jenkins or GitHub Actions is highly preferred.
- RDBMS and Distributed Systems. Familiarity with traditional relational databases and SQL, as well as modern distributed data systems. Understanding of data access patterns, both for traditional and distributed systems, will be critical.
- Collaboration & Teamwork. Strong communication skills and the ability to collaborate within a team. Your proactive approach and self-starter attitude will ensure you contribute effectively to the team while working in an agile environment.
Preferred Skills and Competencies
- Data Access Patterns. Expertise in modern data access patterns and techniques for integrating cloud-based solutions with on-premise systems.
- AWS Services. In-depth knowledge of AWS services such as Amazon S3, Amazon Redshift, AWS Lambda, AWS Glue, and Amazon RDS for building scalable data pipelines.
- Data Engineering Frameworks. Experience with Apache Hive and Spark SQL to process structured and unstructured data in a distributed manner.
- Adaptability. The ability to adapt quickly to evolving technologies, tools, and methodologies to stay ahead in the fast-changing field of data engineering.
About Infogain
Infogain is a human-centered digital platform and software engineering company headquartered in Silicon Valley. As an industry leader, we collaborate with Fortune 500 companies and digital natives across various sectors, including technology, healthcare, insurance, travel, telecom, and retail & CPG. We utilize cutting-edge technologies like cloud, microservices, automation, IoT, and artificial intelligence to accelerate experience-led transformation and deliver powerful digital platforms.
As a Microsoft Gold Partner and Azure Expert Managed Services Provider (MSP), we are committed to delivering world-class solutions across industries. Infogain operates offices across the globe, with a strong presence in the United States, the UK, the UAE, and Singapore, and delivery centers in Seattle, Houston, Austin, Kraków, Noida, Gurgaon, Mumbai, Pune, and Bengaluru.
Join us at Infogain, where your career will flourish in an innovative, collaborative, and growth-driven environment.