Data Engineer
Job Summary:
We are seeking an experienced Data Engineer with a strong background in big data technologies to join our dynamic team. The ideal candidate will be responsible for designing, implementing, and optimizing scalable data pipelines and big data solutions, ensuring seamless data integration and high performance. This role requires deep technical expertise, problem-solving skills, and a collaborative mindset to support data-driven decision-making across the organization.
Key Responsibilities:
- Design and develop scalable and efficient big data pipelines to process and analyze large datasets.
- Implement ETL processes using modern data engineering tools and frameworks.
- Build and optimize data lakes, distributed data platforms, and data warehouses to provide robust storage and querying capabilities.
- Leverage big data technologies such as Hadoop, Spark, Hive, Kafka, and Flink for large-scale data processing.
- Develop and maintain streaming data pipelines using tools like Kafka, Apache Beam, or Google Cloud Pub/Sub.
- Optimize query performance and data retrieval processes for both batch and real-time use cases.
- Work with cloud platforms such as AWS, Azure, or Google Cloud Platform (GCP) to deploy and manage data infrastructure.
- Collaborate with cross-functional teams to gather requirements, design solutions, and implement best practices in data engineering.
- Ensure data quality, security, and governance across the data lifecycle.
- Mentor junior engineers and contribute to architectural decisions for the data platform.
Required Skills and Qualifications:
- Strong proficiency in big data technologies such as Hadoop, Spark, Hive, Kafka, or Flink.
- Advanced expertise in SQL, Python, Scala, or Java for data processing and analytics.
- Hands-on experience with cloud-based data platforms (e.g., Google BigQuery, Amazon Redshift, Azure Synapse Analytics).
- Proficiency in building and managing streaming data pipelines using tools such as Kafka, Google Cloud Pub/Sub, or Amazon Kinesis.
- Experience with CI/CD pipelines and version control systems like Git.
- Deep understanding of data modeling, data architecture, and schema design principles.
- Strong knowledge of ETL/ELT processes, distributed systems, and data orchestration tools (e.g., Apache Airflow, Apache NiFi).
- Familiarity with containerization and orchestration tools like Docker and Kubernetes.
- Experience with machine learning pipelines or integrating ML models into data workflows.
- Knowledge of data visualization tools such as Tableau, Power BI, or Looker.
- Understanding of data governance frameworks and compliance requirements.