Data Scientist

Full time at Protium in India
Posted on February 21, 2025

Job details

We are seeking a highly skilled and motivated Data Scientist (Feature Engineering) to join our growing team. In this role, you will play a crucial part in developing and optimizing our data processing pipeline. You will be responsible for converting existing Python-based data parsing and feature engineering code to be compatible with GPU acceleration and Apache Spark for large-scale data processing.

Responsibilities:
  • Analyze and understand existing Python code for data parsing and feature engineering.
  • Refactor and optimize the code to leverage GPU acceleration using tools such as CUDA or Numba.
  • Convert the code to be compatible with Apache Spark, enabling distributed processing on a cluster.
  • Develop and implement efficient feature engineering techniques for various data types (e.g., text, numerical, categorical).
  • Ensure the scalability, performance, and reliability of the data processing pipeline.
  • Collaborate with other data scientists and engineers to integrate the pipeline into our machine learning workflows.
  • Stay up-to-date with the latest advancements in data science, feature engineering, and distributed computing.
Qualifications:
  • 1 to 4 years of overall work experience.
  • Strong programming skills in Python, with experience in data science libraries (NumPy, Pandas, Scikit-learn).
  • Proven experience in feature engineering, including feature extraction, transformation, and selection.
  • Hands-on experience with GPU programming (CUDA, Numba, or similar).
  • Solid understanding of Spark or Dask and their ecosystems (e.g., Spark DataFrames, Spark SQL).
  • Experience with data parsing and handling different data formats.
  • Passion for problem-solving through code: You should enjoy the challenge of breaking down complex problems into smaller, manageable pieces and finding creative solutions.
  • Ability to write structured and reusable code: You should write code that is clean, well-documented, and easy to maintain.
  • Fluency in database-related operations: You should have a good understanding of SQL and be familiar with relational databases such as MySQL or PostgreSQL, as well as NoSQL databases.
Desirable:
  • Familiarity with cloud computing platforms (AWS, GCP, Azure).
  • Knowledge of DevOps practices (Git, testing, CI/CD).
  • Experience in the lending industry.
