We are seeking a highly skilled Python Data Engineer with deep experience working with CMS datasets (MOR, MMR, MAO) and a strong understanding of healthcare regulations and compliance standards (HIPAA). This role is ideal for a data-driven professional who thrives in cloud-native environments and is passionate about building robust, scalable, and efficient data pipelines that drive healthcare innovation.

Key Responsibilities:

  • Design, develop, and maintain scalable ETL pipelines for CMS datasets using GCP Dataflow (Apache Beam) and Python (a minimal pipeline sketch follows this list)

  • Architect and manage BigQuery data warehouses, ensuring optimal performance and cost-efficiency

  • Implement and manage Airflow DAGs for workflow orchestration and scheduling

  • Ensure end-to-end data quality, lineage, validation, and governance in alignment with HIPAA and CMS standards

  • Optimize large-scale healthcare datasets using partitioning, clustering, sharding, and efficient query patterns in BigQuery

  • Collaborate within Agile teams using tools like Jira and Confluence for sprint planning and documentation

  • Monitor, troubleshoot, and improve pipeline reliability and performance across the full data lifecycle
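
For illustration, here is a minimal sketch of the kind of Dataflow pipeline this role would own: reading newline-delimited JSON extracts from GCS, parsing them, and appending to a month-partitioned, clustered BigQuery table. The project ID, bucket, table, and field names are hypothetical placeholders, and a production CMS pipeline would add validation, dead-letter handling, and HIPAA-appropriate PHI controls.

```python
# Minimal sketch only -- project, bucket, table, and field names are
# hypothetical, and real CMS pipelines need validation and PHI safeguards.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_record(line: str) -> dict:
    """Parse one newline-delimited JSON record (hypothetical MMR-style fields)."""
    record = json.loads(line)
    return {
        "beneficiary_id": record["beneficiary_id"],
        "payment_month": record["payment_month"],  # assumed ISO "YYYY-MM-DD"
        "risk_score": float(record["risk_score"]),
    }


def run() -> None:
    options = PipelineOptions(
        runner="DataflowRunner",       # use "DirectRunner" to test locally
        project="my-healthcare-proj",  # hypothetical project ID
        region="us-central1",
        temp_location="gs://my-bucket/tmp",
    )
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadFromGCS" >> beam.io.ReadFromText("gs://my-bucket/mmr/*.json")
            | "Parse" >> beam.Map(parse_record)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "my-healthcare-proj:cms.mmr_monthly",
                schema="beneficiary_id:STRING,payment_month:DATE,risk_score:FLOAT",
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                additional_bq_parameters={
                    # Month partitioning plus clustering keeps scans (and cost)
                    # proportional to the slice of data a query actually touches.
                    "timePartitioning": {"type": "MONTH", "field": "payment_month"},
                    "clustering": {"fields": ["beneficiary_id"]},
                },
            )
        )


if __name__ == "__main__":
    run()
```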

Qualifications:

  • Bachelor's degree in Computer Science, Information Systems, or a related field

  • 3+ years of experience in cloud-based data engineering, preferably with healthcare datasets

  • Strong proficiency in Python, GCP Dataflow, and Apache Beam

  • Expert-level knowledge in BigQuery, including schema design, performance tuning, and advanced SQL

  • Hands-on experience with Airflow for orchestrating complex data workflows (a minimal DAG sketch follows this list)

  • In-depth understanding of data warehouse design, including star/snowflake schemas, normalization, and denormalization

  • Strong analytical skills for query and data optimization

  • Familiarity with Agile methodologies and collaboration tools (Jira, Confluence)

  • Knowledge of CMS datasets (MOR, MMR, MAO) and healthcare data privacy/compliance standards (HIPAA)
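
As a companion illustration, below is a minimal Airflow DAG of the sort this role would build and operate. The DAG ID, schedule, and task callables are hypothetical placeholders rather than a prescribed design.

```python
# Minimal sketch only -- the DAG ID, schedule, and callables are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_mor_files(**context):
    """Placeholder: pull the latest MOR extract from the CMS source into GCS."""


def validate_and_load(**context):
    """Placeholder: run row-count and schema checks, then load into BigQuery."""


with DAG(
    dag_id="cms_mor_daily",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_mor_files)
    load = PythonOperator(task_id="validate_and_load", python_callable=validate_and_load)

    extract >> load  # load runs only after extraction succeeds
```

In practice, the PythonOperator placeholders would likely give way to the Google provider operators (for example, GCSToBigQueryOperator) so that retries, logging, and credentials are handled uniformly by Airflow.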