Key Accountabilities:
Clinical Data Pipeline Leadership
- Serve as a technical subject matter expert for building, deploying, and maintaining data pipelines supporting clinical study start-up, conduct, and close-out.
- Lead planning, execution, and delivery of data pipeline initiatives across multiple clinical trials.
- Own the end-to-end clinical data flow across systems, ensuring reliability, scalability, and traceability.
Data Integration & Architecture
- Design, develop, and optimize robust data pipelines integrating heterogeneous clinical data sources (e.g., EDC, eCOA, central labs, imaging, real-world data).
- Identify, design, and implement scalable data delivery solutions, automating manual processes where possible.
- Design and build cloud-based ETL/ELT infrastructure to support efficient, reliable data ingestion and transformation.
Technical Execution & Operations
- Define, build, validate, and maintain APIs, data streams, and staging layers for extraction and integration across systems.
- Manage, monitor, and troubleshoot data pipelines within enterprise data lakes and data warehouses, ensuring ongoing reliability and performance.
- Implement comprehensive data integrity, validation, and quality control checks throughout the ingestion lifecycle.
Compliance, Inspection & External Engagement
- Prepare functional areas for submission readiness and represent Clinical Data Engineering in formal inspections and audits.
- Support study-level negotiation and agreement of data transfer and integration standards as needed.
Cross-Functional Collaboration
- Partner closely with statistical programmers, SDTM programmers, clinical data programmers, and analytics teams to ensure data products meet downstream requirements.
- Provide technical guidance and leadership while working collaboratively across all levels of the organization.
- Appropriately escalate risks, issues, and dependencies to CDE leadership.
General Requirements:
Technical Competencies
Core Technologies
- Proficiency in Python and SQL, with working experience in NoSQL databases
- Strong understanding of database concepts and data modeling
- Working knowledge of XML, JSON, and API-based integrations
Cloud & Big Data
- Hands-on experience with AWS, Azure, or GCP
- Experience with big data processing frameworks such as Apache Spark
- Proven expertise deploying and orchestrating data pipelines using Apache Airflow
Data Platforms & DevOps
- Experience designing and managing data lakes and data warehouses (e.g., Snowflake, Amazon Redshift)
- Familiarity with GitHub, GitLab, and/or Jenkins for version control and CI/CD
- Strong understanding and application of System Development Life Cycle (SDLC) principles
Behavioral Competencies
- Strong critical thinking and problem-solving skills
- Ability to work collaboratively with minimal guidance across technical and non-technical teams
- Clear communicator with the ability to explain complex technical concepts to diverse audiences
- Demonstrated ownership, accountability, and attention to data quality and compliance