Clinical Data Curator/Engineer (19-00454) – CA – South San Francisco

We are seeking an experienced Clinical Data Curator with advanced skills in PySpark / Python for the development of clinical data pipelines for machine learning and artificial intelligence applications. This individual will work in ECDi’s Information Management Office (IMO) and will play an important role working alongside Data Engineers, Data Scientists and Artificial Intelligence Engineers to support early clinical development strategy and execution through advanced analytics.
The ideal candidate has extensive experience working with clinical datasets (CRF, EHR, labs, biomarkers, claims, etc.) and demonstrable experience applying data standards and transformations (SDTM, ADaM, FHIR) in the development of PySpark data processing pipelines. If you have these skills and are passionate about healthcare, advanced analytics and continuous learning, then this is the role for you.

Responsibilities

· Understand the current landscape of clinical data repositories and standards; profile and understand data
· Acquire datasets; identify the transformations and formats best suited to data science and AI use cases
· Write generic, scalable Python/PySpark modules for processing data from various sources (XML, Parquet, CSV, relational)
· Work with Data Engineers to optimize and deploy data pipelines on AWS and on-premises HPCs
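To give a flavor of the "generic module for various data sources" responsibility above, here is a minimal, illustrative sketch in plain Python that dispatches on file format. It uses only the standard library so it stays self-contained; a production pipeline at this scale would instead use PySpark's DataFrameReader (e.g. `spark.read.parquet(path)`), and the function name `load_records` is our own, not from any library.

```python
import csv
import json
from pathlib import Path

def load_records(path):
    """Load a small dataset into a list of dicts, dispatching on extension.

    Illustrative sketch only: a real clinical pipeline would use PySpark
    (spark.read.csv / spark.read.parquet) to scale beyond one machine.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        with open(path, newline="") as f:
            return list(csv.DictReader(f))
    if suffix == ".json":
        with open(path) as f:
            return json.load(f)
    # Parquet/XML omitted here; they need third-party readers (pyarrow, lxml)
    raise ValueError(f"Unsupported format: {suffix}")
```

The point of the pattern is that downstream curation code sees one uniform record shape regardless of the source format, which is what makes the modules reusable across datasets.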
Qualifications & Skills

Minimum Qualifications
· 2-5 years of experience developing data pipelines in PySpark / Python (advanced proficiency)
· 2-5 years of experience curating clinical data and applying pharma industry data standards and models, such as CDISC SDTM & ADaM, HL7 FHIR, LOINC, CPT, ICD and SNOMED
· Experience with SQL
· Experience with REST APIs, relational databases and cloud data lake environments
· Basic understanding of data science concepts
· Must be self-motivated and able to extrapolate customer needs with minimal direction
· Strong analytical and problem-solving skills
· Excellent oral and written communication skills
· Excellent documentation and code management practices
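As a concrete example of the SDTM curation work named above, the sketch below maps hypothetical raw lab records to the CDISC SDTM LB (Laboratory) domain. The SDTM variable names (STUDYID, DOMAIN, USUBJID, LBSEQ, LBTESTCD, LBTEST, LBORRES, LBORRESU) are standard; the raw field names (`subject_id`, `test_name`, `result`, `unit`) and the small test-code lookup are purely illustrative assumptions.

```python
from collections import defaultdict

# Illustrative subset; a real mapping would use CDISC controlled terminology
TESTCD_LOOKUP = {"Hemoglobin": "HGB", "Glucose": "GLUC"}

def to_sdtm_lb(raw_rows, study_id):
    """Map raw lab records (list of dicts) to SDTM LB-domain records.

    Sketch under assumed raw field names; LBSEQ is assigned per subject,
    as SDTM requires sequence numbers to be unique within USUBJID.
    """
    seq = defaultdict(int)
    records = []
    for row in raw_rows:
        usubjid = f"{study_id}-{row['subject_id']}"
        seq[usubjid] += 1
        records.append({
            "STUDYID": study_id,
            "DOMAIN": "LB",
            "USUBJID": usubjid,
            "LBSEQ": seq[usubjid],
            "LBTESTCD": TESTCD_LOOKUP[row["test_name"]],
            "LBTEST": row["test_name"],
            "LBORRES": row["result"],
            "LBORRESU": row["unit"],
        })
    return records
```

In practice this transformation would run as a PySpark job over millions of rows, but the variable-level mapping logic is the same.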
Preferred Qualifications
· Data engineering experience designing, deploying and optimizing pipelines using tools such as Docker, Kubernetes and Apache Airflow
· AWS infrastructure knowledge and experience (EC2, EMR, Lambda, EKS, Glue, Redshift, SageMaker)
Source: Job Diva – Job Listing
