Data Engineer (19-00526) – CA – South San Francisco

The role requires cross-functional interaction with Data Management Leads, Clinical Study Teams, and the Predictive Analytics, Artificial Intelligence, and Information Technology teams, and involves supporting multiple teams and projects. A great candidate can translate the unique needs of data analysts and data scientists, is eager to solve complex problems with data, and is skilled at wrangling data and developing data pipelines. Candidates must be self-motivated and able to infer customer needs with minimal direction.

Responsibilities

  • Assemble large, complex data sets that meet use-case requirements
  • Perform ETL using AWS technologies to deliver analyzable data for data analysts, data scientists, and analytical tools and dashboards
  • Develop and optimize big data pipelines for data scientists (requires a basic understanding of data science concepts and Client)
  • Write generic Python/PySpark modules for processing data from various data sources (XML, Parquet, CSV, relational)
  • Perform hands-on infrastructure design of ECD's data lake and data warehouse environment (gCORE), including continuously exploring and recommending new technologies and best practices
  • Research and recommend innovative methods and systems for managing data to improve the business
  • Participate in internal governance to drive the data quality business cycle and roadmap

Qualifications & Skills

Minimum Qualifications

  • Bachelor’s or Master’s degree in computer science or software engineering
  • 5+ years of programming experience (including functional programming); advanced proficiency in Python required
  • Experience with relational SQL and NoSQL databases, including Postgres
  • Experience building and optimizing big data pipelines using Spark or other similar technologies
  • Experience with AWS cloud services: S3, EC2, EMR, RDS, Redshift, Glue, Lambda, EKS, SageMaker
  • Solid understanding of how to design robust data workflows, including optimization and user experience
  • Strong analytical and problem-solving skills
  • Excellent oral and written communication skills
  • Able to work in teams and collaborate with others to clarify requirements
  • Strong coordination and project management skills for handling complex projects
  • Experience developing and working with XML, JSON, and external web services

Preferred Qualifications

  • Clinical drug development domain knowledge
  • Scientific domain knowledge and experience working with biomedical data types (omics, imaging, etc.)
  • Experience with clinical data and systems such as Medidata RAVE, Siebel CTMS, and IxRS
  • Experience with data quality software such as Talend, Informatica, Paxata, or similar tools
  • Competency in applying statistics to solve business needs
  • Knowledge of industry data standards used in drug development, particularly in clinical development

Source: Job Diva – Job Listing