Data Engineer
Location: Hyderabad, India - Hybrid Remote (3 days a week in office preferred)
Position Overview: As a Data Engineer focused on AI development, you will play a pivotal role in the design, development, and deployment of data infrastructure and pipelines that support artificial intelligence and machine learning initiatives within our organization. You will work closely with data scientists, machine learning engineers, and software developers to ensure that data is readily accessible, high-quality, and efficiently processed to fuel AI algorithms and models.
Responsibilities:
- Data Pipeline Development: Design, implement, and maintain data pipelines that collect, transform, and store data from various sources, ensuring data quality and consistency. These pipelines will serve as the foundation for AI model training and deployment.
- Data Integration: Collaborate with cross-functional teams to integrate data from diverse sources, including structured and unstructured data, into the data ecosystem. Ensure data is accessible and ready for AI development.
- Data Cleansing and Preprocessing: Develop preprocessing routines that clean raw data by handling missing values, outliers, and inconsistencies, making it suitable for machine learning algorithms. Ensure data consistency and integrity during the integration process, performing data validation and cleaning as needed.
- Data Modeling Support: Work closely with data scientists and machine learning engineers to understand data requirements for AI model training. Assist in feature engineering and dataset creation for model development.
- Data Governance: Implement data governance practices, including data cataloging, version control, and access controls to maintain data lineage and ensure data security and compliance.
- Performance Optimization: Continuously monitor and optimize data pipelines and infrastructure for efficiency and scalability, ensuring timely data processing for AI applications. Monitor and tune data systems, identify and resolve performance bottlenecks, and implement caching and indexing strategies to improve query performance.
- Collaboration: Work with cross-functional teams to understand AI project requirements and translate them into data engineering solutions. Partner with data scientists and analysts to optimize models and algorithms for data quality, security, and governance.
- Documentation: Maintain comprehensive documentation of data pipelines, transformations, and processes to facilitate knowledge sharing and troubleshooting.
- Quality Assurance: Implement and maintain data quality checks and validation processes within data pipelines to ensure the accuracy, consistency, and completeness of data, and to identify and address anomalies or issues that may impact AI model performance.
- Learning and Development: Stay up-to-date with industry trends, emerging technologies, and best practices in data engineering and AI development. Propose and evaluate new tools and technologies that can enhance the AI development process.
Qualifications:
- A bachelor’s degree in computer science, data science, software engineering, information systems, or a related quantitative field; master’s degree preferred.
- Minimum 3 years in data management, focusing on data integration, modeling, optimization, data quality, and related data engineering activities.
- Demonstrated experience in developing and managing data warehouses for AI/ML applications.
- Proficient in big data languages and tools such as PySpark, SQL, and/or dbt.
- Skilled in database technologies including SQL, NoSQL, Delta Lake, and MongoDB.
- Experience with cloud platforms like AWS or Azure, and container technologies like Docker or Kubernetes.
- Strong understanding of ETL and ELT processes, preferably with Databricks.
- Knowledgeable in AI/ML principles, models, tools, and deployment strategies.
- Advanced expertise in Apache technologies like Kafka, Airflow, and Spark for creating effective data pipelines.
- Capable of designing and implementing data solutions for AI, ML, and BI applications.
- Proficient in Python, along with data science languages such as Scala or SQL.
- Collaborative skills to work with diverse technical teams, guiding and educating on data products.
- Exceptional problem-solving abilities, including troubleshooting and resolving issues in unfamiliar code or systems.
- Excellent business insight and interpersonal skills, capable of working with senior stakeholders to drive change.
- Ability to articulate business cases, data sources, management concepts, and analytical methods.
- Skilled in translating technical language for executive, business, IT, and quantitative audiences.