Principal Data Engineer
Full time
at Sony Pictures Networks India
in India
Posted on December 20, 2024
Job details
Experience:
- 10-12 years of deep technical experience in data engineering, primarily in high-tech or cloud-based environments.
- Extensive expertise in building, maintaining, and optimizing large-scale data platforms and pipelines, including knowledge of big data architecture patterns and experience delivering end-to-end big data solutions.
- Advanced hands-on experience with:
  - ETL/ELT frameworks (e.g., Apache NiFi, Airflow, dbt); see the sketch after this list
  - Data warehousing (e.g., Snowflake, Redshift, BigQuery)
  - Big data technologies (e.g., Hadoop, Apache Spark, HDFS)
  - Streaming data platforms (e.g., Apache Kafka, Kinesis, Flink)
  - Cloud data solutions (AWS, GCP), with a focus on services such as Lambda, EMR, Glue, and Dataflow
- 8+ years of experience with ClickHouse, Redshift, or another OLAP system.
- Proficient in data processing and programming with Python, Scala, or Java, leveraging these languages to build high-performance data pipelines.
- Strong understanding of distributed computing, microservices architecture, and containerization technologies (e.g., Kubernetes, Docker).
- Deep knowledge of data security, data governance, and compliance frameworks in large-scale, complex data ecosystems.
- Experience in performance optimization and troubleshooting in high-throughput data environments.
- Experience working in the Hadoop ecosystem on an AWS cloud distribution, with at least one big data ingestion tool (Sqoop, Flume, NiFi, etc.) and one or more distributed messaging and ingestion frameworks (Kafka, Pulsar, Pub/Sub, etc.).
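For illustration only (not part of the posting itself), orchestration with one of the ETL/ELT frameworks named above might look like this minimal Airflow DAG sketch. The DAG id, schedule, and task callables are hypothetical placeholders, and the `schedule` argument assumes Airflow 2.4+ (older versions use `schedule_interval`):

```python
# Minimal Airflow DAG sketch: a daily extract -> transform -> load chain.
# Function bodies, DAG id, and schedule are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Pull raw records from a source system (hypothetical step).
    print("extracting raw data")


def transform(**context):
    # Clean and reshape the extracted records (hypothetical step).
    print("transforming records")


def load(**context):
    # Write the transformed records to a warehouse such as Redshift or Snowflake.
    print("loading into warehouse")


with DAG(
    dag_id="daily_events_etl",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",          # Airflow 2.4+; use schedule_interval on older versions
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load
```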
Responsibilities:
- Architect and build large-scale, robust, high-performance data pipelines that process both batch and streaming data (see the sketch after this list).
- Implement, automate, and maintain real-time data processing systems, ensuring high availability and low latency.
- Collaborate with data scientists, machine learning engineers, and analysts to optimize data pipelines and integrate advanced analytics workflows.
- Develop and maintain scalable ETL processes that handle vast amounts of structured and unstructured data from diverse sources.
- Focus on continuous improvement, driving the evolution of architecture to ensure optimal performance, cost-efficiency, and scalability.
- Optimize distributed systems for performance and cost-efficiency, including caching, compression techniques, and optimizing query performance.
- Maintain, monitor, and troubleshoot complex distributed systems, implementing monitoring solutions and ensuring reliability through proactive alerting.
- Lead technical deep dives into performance bottlenecks, debugging, and optimization of data processing workflows.
- Manage and ensure data security, availability, and reliability by following best practices and aligning with industry standards (e.g., encryption, IAM policies).
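As a sketch of the streaming pipeline work described above, here is a minimal PySpark Structured Streaming job. It assumes a Kafka source and the spark-sql-kafka connector on the classpath; the broker, topic, schema, and storage paths are hypothetical placeholders:

```python
# Spark Structured Streaming sketch: read events from Kafka, parse JSON,
# and write micro-batches to Parquet. All names and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("events_stream").getOrCreate()

# Assumed event schema for the JSON payload in the Kafka message value.
schema = StructType([
    StructField("user_id", StringType()),
    StructField("event", StringType()),
    StructField("ts", TimestampType()),
])

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "events")                     # hypothetical topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3a://bucket/events/")             # hypothetical sink
    .option("checkpointLocation", "s3a://bucket/chk/")  # enables fault-tolerant restarts
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()
```

The checkpoint location is what lets the stream recover its Kafka offsets after a failure, which is the usual basis for the high-availability and reliability expectations listed above.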