PySpark Engineer

Full time at Impetus in India
Posted on February 10, 2025

Job details

Locations: Noida / Indore / Pune / Gurugram / Bengaluru

Job Overview: We are seeking talented and experienced professionals in PySpark, Big Data technologies, and cloud solutions to join our team. The position spans three levels: Engineer, Module Lead, and Lead Engineer. In these roles, you will be responsible for developing, optimizing, and managing ETL pipelines in cloud and on-premises environments using Big Data tools and AWS services. You will collaborate with cross-functional teams to ensure that business requirements are met through efficient and scalable data solutions.

Key Responsibilities:

  • ETL Pipeline Development: Design and develop efficient ETL pipelines to meet business requirements while adhering to development standards and best practices (a minimal sketch follows this list).
  • AWS Integration & Testing: Perform integration testing in AWS environments and ensure seamless data operations across platforms.
  • Estimation & Planning: Provide estimates for development, testing, and deployment across various environments.
  • Peer Reviews & Best Practices: Participate in peer code reviews, ensuring code quality, adherence to best practices, and continuous improvement within the team.
  • Cost-Effective Solutions: Build and maintain cost-effective pipelines using AWS services such as S3, IAM, Glue, EMR, and Redshift.
  • Cloud Migrations: Support and manage migrations from on-premises environments to the cloud, or between cloud environments.
  • Orchestration & Scheduling: Manage job orchestration with tools such as Airflow or other relevant job schedulers.
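
For context, the following is a minimal sketch of the kind of PySpark ETL pipeline these responsibilities describe. The bucket paths, column names, and job name are hypothetical placeholders, and the aggregation is only illustrative:

    # Minimal PySpark ETL sketch; all paths and column names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("customer-spend-etl").getOrCreate()

    # Extract: read raw CSV files from S3
    raw = spark.read.option("header", True).csv("s3://example-bucket/raw/orders/")

    # Transform: drop rows without an amount, cast it, and total spend per customer
    spend = (
        raw.filter(F.col("order_amount").isNotNull())
           .withColumn("order_amount", F.col("order_amount").cast("double"))
           .groupBy("customer_id")
           .agg(F.sum("order_amount").alias("total_spend"))
    )

    # Load: write the curated result back to S3 as Parquet
    spend.write.mode("overwrite").parquet("s3://example-bucket/curated/customer_spend/")

    spark.stop()

The same script can be submitted unchanged to an EMR cluster via spark-submit, provided the cluster's IAM role grants access to the S3 paths involved.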
Required Skills & Qualifications:

Experience:
  • Engineer Level: 2-5 years of experience with PySpark, Hadoop, Hive, and related Big Data technologies.
  • Module Lead Level: 4-6 years of experience with PySpark, Hadoop, Hive, and related Big Data technologies.
  • Lead Engineer Level: 5-7 years of experience with PySpark, Hadoop, Hive, and related Big Data technologies.
Technical Skills:
  • Hands-on experience with PySpark (DataFrame and SparkSQL), Hadoop, and Hive.
  • Proficiency in Python and Bash scripting.
  • Solid understanding of SQL and data warehouse concepts.
  • Experience with AWS Big Data services (IAM, Glue, EMR, Redshift, S3, Kinesis) is a plus.
  • Experience with orchestration tools (e.g., Airflow or other job schedulers) is beneficial (a brief sketch follows this list).
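
As an illustration of the orchestration experience mentioned above, a PySpark job might be scheduled with a small Airflow DAG like the sketch below. The DAG id, schedule, and script path are hypothetical, and the example assumes Airflow 2.x:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # Hypothetical daily DAG that submits a PySpark ETL script
    with DAG(
        dag_id="daily_customer_spend",
        start_date=datetime(2025, 1, 1),
        schedule="@daily",  # Airflow 2.4+; older releases use schedule_interval
        catchup=False,
    ) as dag:
        run_etl = BashOperator(
            task_id="run_spark_etl",
            bash_command="spark-submit /opt/jobs/customer_spend_etl.py",
        )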
Analytical & Problem-Solving Skills:
  • Strong analytical, problem-solving, and data analysis skills.
  • Ability to think creatively and implement innovative solutions beyond readily available tools.
Communication & Interpersonal Skills:
  • Excellent communication, presentation, and interpersonal skills to collaborate effectively with internal and external teams.
Desired Skills & Experience:
  • Experience with migrating workloads between on-premises systems and cloud environments.
  • Familiarity with cloud-native technologies and platforms.
  • Knowledge of performance optimization techniques for distributed data processing.
