Data Scientist
Job details
We are seeking a highly skilled and motivated Data Scientist (Feature Engineering) to join our growing team. In this role, you will play a crucial part in developing and optimizing our data processing pipeline. You will be responsible for converting existing Python-based data parsing and feature engineering code to be compatible with GPU acceleration and Apache Spark for large-scale data processing.
Responsibilities:
- Analyze and understand existing Python code for data parsing and feature engineering.
- Refactor and optimize the code to leverage GPU acceleration using CUDA or libraries such as Numba.
- Convert the code to be compatible with Apache Spark, enabling distributed processing on a cluster.
- Develop and implement efficient feature engineering techniques for various data types (e.g., text, numerical, categorical).
- Ensure the scalability, performance, and reliability of the data processing pipeline.
- Collaborate with other data scientists and engineers to integrate the pipeline into our machine learning workflows.
- Stay up-to-date with the latest advancements in data science, feature engineering, and distributed computing.
Requirements:
- Overall work experience of 1 to 4 years.
- Strong programming skills in Python, with experience in data science libraries (NumPy, Pandas, Scikit-learn).
- Proven experience in feature engineering, including feature extraction, transformation, and selection.
- Hands-on experience with GPU programming (CUDA, Numba, or similar).
- Solid understanding of Spark or Dask and their ecosystems (e.g., Spark DataFrames, Spark SQL).
- Experience with data parsing and handling different data formats.
- Passion for problem-solving through code: You should enjoy the challenge of breaking down complex problems into smaller, manageable pieces and finding creative solutions.
- Ability to write structured and reusable code: You should write code that is clean, well-documented, and easy to maintain.
- Fluency in database-related operations: You should have a good understanding of SQL and be familiar with relational databases such as MySQL or PostgreSQL, or with NoSQL databases.
- Familiarity with cloud computing platforms (AWS, GCP, Azure).
- Knowledge of DevOps practices (Git, testing, CI/CD).
- Experience in the lending industry.