Design and develop robust data pipelines, ETL processes, and data integration solutions to collect, transform, and load data from various sources into our data warehouse.
Collaborate with cross-functional teams to identify data requirements and translate them into technical specifications and data models.
Optimize and tune database systems, queries, and ETL processes for performance and scalability, ensuring efficient data retrieval and storage.
Implement data quality and validation mechanisms to maintain data integrity and accuracy.
Develop and maintain documentation for data pipelines, data models, and data flow diagrams to facilitate understanding and collaboration among team members.
Monitor and troubleshoot data pipeline issues, database performance bottlenecks, and data-related problems to ensure smooth data operations.
Stay up to date with emerging technologies and trends in the data engineering space, evaluating and recommending new tools and frameworks to improve data processing efficiency and overall system performance.
Collaborate with data scientists and analysts to support their data needs, providing them with clean, reliable, and well-organized datasets.
Create and maintain reports and dashboards using Power BI or other visualization tools to enable data-driven decision-making across the organization.
Utilize Python programming to develop and maintain data engineering solutions, including data manipulation, data cleansing, and automation of data processes (a minimal illustrative sketch follows this list).
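To make the day-to-day work concrete, here is a minimal, hypothetical sketch of the kind of Python ETL step described above: extract rows from a source database, cleanse and validate them with pandas, and load the result into a warehouse staging table. The connection strings, table names, and validation rules are illustrative placeholders, not details of our actual stack.

```python
# Minimal illustrative ETL sketch (hypothetical names throughout):
# extract from a source database, cleanse/validate with pandas,
# and load into a warehouse staging table.
import pandas as pd
from sqlalchemy import create_engine

SOURCE_URL = "postgresql://user:pass@source-host/sales"      # placeholder
WAREHOUSE_URL = "postgresql://user:pass@warehouse-host/dw"   # placeholder

def extract(engine) -> pd.DataFrame:
    # Pull yesterday's orders from the source system.
    return pd.read_sql(
        "SELECT * FROM orders WHERE order_date = CURRENT_DATE - 1", engine
    )

def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Basic cleansing: drop exact duplicates, normalize types,
    # and reject rows that fail simple validation rules.
    df = df.drop_duplicates()
    df["order_date"] = pd.to_datetime(df["order_date"])
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
    return df[df["amount"].notna() & (df["amount"] >= 0)]

def load(df: pd.DataFrame, engine) -> None:
    # Append into a staging table; merging into fact tables
    # would happen in a downstream step.
    df.to_sql("stg_orders", engine, if_exists="append", index=False)

if __name__ == "__main__":
    source = create_engine(SOURCE_URL)
    warehouse = create_engine(WAREHOUSE_URL)
    load(transform(extract(source)), warehouse)
```

In practice a step like this would run under a scheduler, with logging and data-quality checks around each stage; the sketch only shows the shape of the work.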
Proven experience as a Data Engineer or similar role, with a strong understanding of data management principles and best practices.
Proficiency in SQL and experience working with both SQL and NoSQL databases (e.g., MySQL, PostgreSQL, MongoDB, Cassandra).
Experience with big data technologies, such as Hadoop, Spark, and related frameworks (e.g., Spark Streaming, PySpark).
Familiarity with cloud-based data platforms and services, preferably AWS or Azure (e.g., Amazon Redshift, Amazon S3, Azure SQL Database).
Strong programming skills in Python, with the ability to write efficient and optimized code for data manipulation and automation.
Proficiency with distributed and stream-processing frameworks such as Apache Spark or Flink, and with messaging platforms such as Kafka or Pulsar (a minimal streaming sketch follows this list).
Hands-on experience with AWS Glue or Azure Data Factory for building and orchestrating ETL processes.
Experience with data visualization tools like Power BI or Tableau, including creating interactive dashboards and reports.
Familiarity with data warehousing concepts and experience working with workflow management systems such as Apache Airflow (see the Airflow sketch after this list).
Solid understanding of data security, privacy, and compliance standards.
Strong analytical and problem-solving skills, with the ability to quickly troubleshoot and resolve data-related issues.
Excellent communication and collaboration skills, with the ability to work effectively in cross-functional teams.
Self-motivated and eager to learn, with a proactive approach to problem-solving and staying updated on industry trends and best practices.
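As a flavor of the streaming requirements above, here is a minimal PySpark Structured Streaming sketch that consumes a Kafka topic and persists events as Parquet. The broker address, topic name, and output paths are hypothetical, and the job assumes the spark-sql-kafka connector is on the classpath.

```python
# Minimal PySpark Structured Streaming sketch (hypothetical broker,
# topic, and paths): consume events from Kafka and persist them as
# Parquet with checkpointing. Requires the spark-sql-kafka connector.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (SparkSession.builder
         .appName("kafka-events-sketch")
         .getOrCreate())

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
          .option("subscribe", "events")                     # placeholder topic
          .load()
          .select(col("key").cast("string"),
                  col("value").cast("string"),
                  col("timestamp")))

query = (events.writeStream
         .format("parquet")
         .option("path", "s3a://bucket/events/")             # placeholder path
         .option("checkpointLocation", "s3a://bucket/checkpoints/events/")
         .start())

query.awaitTermination()
```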
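And as a sketch of the workflow-orchestration side, a bare-bones Airflow DAG wiring an extract-transform-load chain: the task bodies are stubs, the DAG id and schedule are placeholders, and the `schedule` argument assumes Airflow 2.4+.

```python
# Minimal Airflow DAG sketch (hypothetical DAG id, schedule, and task
# logic): a daily extract -> transform -> load chain of Python tasks.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull rows from source")   # placeholder for real extract logic

def transform():
    print("cleanse and validate")    # placeholder for real transform logic

def load():
    print("write to warehouse")      # placeholder for real load logic

with DAG(
    dag_id="daily_orders_etl",       # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",               # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="load", python_callable=load)
    t1 >> t2 >> t3
```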