We are seeking a highly skilled Python Data Engineer with deep experience in CMS datasets (MOR, MMR, MAO) and a strong understanding of healthcare regulations and compliance standards (HIPAA). This role is ideal for a data-driven professional who thrives in cloud-native environments and is passionate about building robust, scalable, and efficient pipelines that drive healthcare innovation.
Design, develop, and maintain scalable ETL pipelines for CMS datasets using GCP Dataflow (Apache Beam) and Python
Architect and manage BigQuery data warehouses, ensuring optimal performance and cost-efficiency
Implement and manage Airflow DAGs for workflow orchestration and scheduling
Ensure end-to-end data quality, lineage, validation, and governance in alignment with HIPAA and CMS standards
Optimize large-scale healthcare datasets using partitioning, clustering, sharding, and efficient query patterns in BigQuery
Collaborate within Agile teams using tools like Jira and Confluence for sprint planning and documentation
Monitor, troubleshoot, and improve pipeline reliability and performance across the full data lifecycle
Bachelor's degree in Computer Science, Information Systems, or a related field
3+ years of experience in cloud-based data engineering, preferably with healthcare datasets
Strong proficiency in Python, GCP Dataflow, and Apache Beam
Expert-level knowledge in BigQuery, including schema design, performance tuning, and advanced SQL
Hands-on experience with Airflow for orchestrating complex data workflows
In-depth understanding of data warehouse design, including star/snowflake schemas, normalization, and denormalization
Strong analytical skills for query and data optimization
Familiarity with Agile methodologies and collaboration tools (Jira, Confluence)
Knowledge of CMS datasets (MOR, MMR, MAO) and healthcare data privacy/compliance standards (HIPAA)