We are seeking a skilled and experienced Data Engineer to join our team and take ownership of developing and maintaining our data warehouse and data lake infrastructure. As a Data Engineer, you will design and implement robust, scalable, and cost-effective solutions using open-source or affordable commercial frameworks. You will collaborate closely with cross-functional teams, including data analysts, data scientists, and business stakeholders, to understand data requirements and ensure smooth data integration and retrieval.
If that all sounds up your alley, then read on.
Responsibilities:
- Design, develop, and maintain our data warehouse and data lake infrastructure, utilizing open-source or affordable commercial frameworks.
- Build and optimize data pipelines to extract, transform, and load data from diverse sources into the data warehouse or data lake.
- Perform data modeling and schema design to ensure efficient data organization and retrieval.
- Implement data quality checks, error handling, and data cleansing techniques to ensure the accuracy and reliability of stored data.
- Tune and optimize the performance of the data warehouse and data lake, including query optimization, indexing, and partitioning strategies.
- Collaborate with data analysts and data scientists to understand their data requirements and design solutions that meet their needs.
- Work with cross-functional teams to ensure data governance, security, and compliance requirements are met.
- Stay up-to-date with emerging trends and technologies in the data engineering field, especially related to data warehousing and data lakes.
Requirements:
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent practical experience).
- Proven experience in developing data warehouse and data lake solutions using open-source or affordable commercial frameworks.
- Strong proficiency in SQL and in programming languages (e.g., Python, Java) for data manipulation and automation.
- Experience with data integration and processing tools (e.g., Apache Spark, Apache Kafka, Talend) for extracting, transforming, and loading data.
- Solid understanding of data modeling principles and best practices.
- Familiarity with cloud platforms such as AWS, Azure, or GCP, and their respective data storage and processing services.
- Knowledge of data governance, security, and compliance practices.
- Strong problem-solving skills and the ability to work in a fast-paced, collaborative environment.
- Excellent communication skills and the ability to effectively communicate complex concepts to technical and non-technical stakeholders.
Nice to have:
- Experience with Delta Lake, Metabase, Redash, or similar data lake storage and data visualization tools.
- Familiarity with streaming technologies (e.g., Apache Kafka, Amazon Kinesis) for real-time data ingestion and processing.
- Knowledge of containerization technologies (e.g., Docker) and container orchestration platforms (e.g., Kubernetes).
- Experience with workflow management tools (e.g., Apache Airflow, Luigi) for automating data processing and ETL pipelines.
- Understanding of data cataloging and metadata management tools (e.g., Apache Atlas, Collibra).
Perks and benefits:
- Open work culture
- Medical insurance
- Unlimited leave
- Hybrid work arrangement (partial work from home)
- Free snacks and coffee
- Monthly activities
- Rooftop!
Experience required: minimum 3 years.