PT Graha Karya Informasi
Data Engineer
Job Description
Tasks and Responsibilities:
• Data Pipeline Development and Management: Data engineers create and manage the pipelines that extract data from various sources, transform it into usable formats, and load it into databases or data warehouses (ETL/ELT processes). They ensure that data flows reliably from source systems to reporting and analytics platforms (a minimal pipeline sketch appears after this list).
• Data Storage and Architecture: They design and maintain scalable and secure data storage solutions, such as relational databases, NoSQL databases, and cloud-based data warehouses. They establish schemas, indexing strategies, and storage models optimized for performance and analytics.
• Data Quality and Governance: Data engineers enforce data quality standards, implement validation and cleansing processes, and monitor pipelines to detect and resolve inconsistencies (see the validation sketch after this list). They also ensure compliance with data governance policies and standards, including privacy regulations.
• Performance Optimization and Scalability: Optimizing database queries, improving pipeline performance, and planning scalable infrastructure to manage growing datasets are key responsibilities (see the indexing sketch after this list). This includes tuning systems to handle large-scale batch and real-time processing efficiently.
• Collaboration with Stakeholders: Data engineers work closely with data scientists, analysts, and business teams to understand their data requirements and provide reliable datasets for analysis, reporting, and machine learning applications.
• Tooling and Automation: They leverage tools such as Apache Spark, Hadoop, Kafka, Airflow, and cloud services like AWS, Azure, or Google Cloud Platform to automate workflows and manage large-scale data processing (an orchestration sketch follows this list).
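
For illustration only, here is a minimal Python sketch of the extract-transform-load pattern described in the first bullet; the file, table, and column names (orders.csv, orders, customer_id, amount) are hypothetical, not part of the role's actual stack:

```python
import csv
import sqlite3

def extract(path):
    # Extract: read raw records from a source file (hypothetical CSV export).
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Transform: trim whitespace, normalize emails, cast amounts to float.
    return [
        {
            "customer_id": row["customer_id"].strip(),
            "email": row["email"].strip().lower(),
            "amount": float(row["amount"]),
        }
        for row in rows
    ]

def load(rows, conn):
    # Load: insert the cleaned records into a table standing in for a warehouse.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (customer_id TEXT, email TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (:customer_id, :email, :amount)", rows
    )
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect("warehouse.db")  # SQLite as a stand-in warehouse
    load(transform(extract("orders.csv")), conn)
```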
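Similarly, a hedged sketch of the kind of validation and cleansing step the data quality bullet refers to; the rules shown (non-empty IDs, no duplicates, non-negative amounts) are illustrative assumptions:

```python
def validate(rows):
    # Split records into valid and rejected, recording why each rejection happened.
    valid, rejected = [], []
    seen_ids = set()
    for row in rows:
        problems = []
        if not row.get("customer_id"):
            problems.append("missing customer_id")
        elif row["customer_id"] in seen_ids:
            problems.append("duplicate customer_id")
        if row.get("amount") is None or row["amount"] < 0:
            problems.append("invalid amount")
        if problems:
            rejected.append((row, problems))
        else:
            seen_ids.add(row["customer_id"])
            valid.append(row)
    return valid, rejected

valid, rejected = validate([
    {"customer_id": "c1", "amount": 10.0},
    {"customer_id": "c1", "amount": 5.0},   # duplicate id
    {"customer_id": "", "amount": -2.0},    # missing id, negative amount
])
print(f"{len(valid)} valid, {len(rejected)} rejected")
```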
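As one small example of the query tuning mentioned under performance optimization, the snippet below uses SQLite to show how adding an index changes a query plan from a full table scan to an index search; production warehouses differ, but the idea carries over:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(f"c{i}", float(i)) for i in range(100_000)],
)

query = "SELECT SUM(amount) FROM orders WHERE customer_id = ?"

# Without an index, the plan reports a full scan of the table.
print(conn.execute(f"EXPLAIN QUERY PLAN {query}", ("c42",)).fetchall())

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# With the index, the plan switches to a search using idx_orders_customer.
print(conn.execute(f"EXPLAIN QUERY PLAN {query}", ("c42",)).fetchall())
```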
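And a sketch of the workflow automation the tooling bullet mentions, using Airflow 2.x-style APIs; the DAG id, schedule, and task callables are placeholders, and details vary by Airflow version:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder task bodies; in practice these would call the pipeline steps.
def extract(): ...
def transform(): ...
def load(): ...

with DAG(
    dag_id="daily_orders_etl",        # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Declare ordering: extract -> transform -> load.
    t_extract >> t_transform >> t_load
```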
Requirements:
• Programming and Scripting: Proficiency in Python, Java, or Scala for pipeline development.
• Database Management: Expertise in SQL and Google BigQuery (see the query example after this list).
• Data Modeling: Ability to design schemas and data models optimized for analytics (see the schema sketch after this list).
• Big Data Tools: Knowledge of platforms like Hadoop, Spark, Kafka, or Flink for processing large datasets (see the Spark sketch after this list).
• Problem Solving: Analytical skills for troubleshooting pipeline issues, optimizing performance, and ensuring data integrity.
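
As a hedged illustration of the SQL and BigQuery requirement, the sketch below uses the official google-cloud-bigquery Python client; the project, dataset, and table names (my_project.sales.orders) are made up, and credentials are assumed to come from the environment:

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses Application Default Credentials

# Hypothetical analytics query against a BigQuery table.
query = """
    SELECT customer_id, SUM(amount) AS total_spend
    FROM `my_project.sales.orders`
    GROUP BY customer_id
    ORDER BY total_spend DESC
    LIMIT 10
"""

for row in client.query(query).result():
    print(row.customer_id, row.total_spend)
```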
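For the data-modeling requirement, a minimal star-schema sketch (one fact table plus two dimensions), expressed here as SQLite DDL purely for illustration:

```python
import sqlite3

# Hypothetical star schema for order analytics: a fact table keyed to
# customer and date dimensions, a common analytics-oriented model.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,
        email        TEXT,
        region       TEXT
    );
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,  -- e.g. 20240101
        year     INTEGER,
        month    INTEGER,
        day      INTEGER
    );
    CREATE TABLE fact_orders (
        order_id     INTEGER PRIMARY KEY,
        customer_key INTEGER REFERENCES dim_customer (customer_key),
        date_key     INTEGER REFERENCES dim_date (date_key),
        amount       REAL
    );
""")
```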
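And for the big-data tooling requirement, a short PySpark sketch of a distributed aggregation; the input file and column names are assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-agg").getOrCreate()

# Read a (hypothetical) CSV and compute per-customer totals in parallel.
orders = spark.read.csv("orders.csv", header=True, inferSchema=True)

(orders.groupBy("customer_id")
       .agg(F.sum("amount").alias("total_spend"))
       .orderBy(F.desc("total_spend"))
       .show(10))

spark.stop()
```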