Data Engineer (19-00267) – CA – South San Francisco

Seeking an experienced, motivated Data Engineer with a background in data architecture. This individual will be accountable for providing engineering expertise in the delivery and optimization of the organization’s data lake and data warehouse.

The role requires cross-functional interaction with Data Management Leads, Clinical Study Teams, and the Predictive Analytics, Artificial Intelligence, and Information Technology teams to drive data acquisition and data operations projects and to address data platform technology needs. A great candidate is eager to solve complex problems with data, skilled in managing databases and developing data pipelines, and passionate about learning new skills to meet organization-wide data needs.

Responsibilities
Assemble large, complex data sets that meet use case requirements
Perform ETL to deliver analyzable data for data analysts, data scientists and analytical tools / dashboards using AWS technologies
Develop and optimize big data pipelines for data scientists (requires a basic understanding of data science concepts and Client)
Write generic Python/PySpark modules for processing data from various data sources (XML, Parquet, CSV, relational)
Perform hands-on infrastructure design of data lake and data warehouse environment including continuous exploration and recommendation of new technologies and best practices
Research and recommend new innovative methods and systems to manage data for business improvement
Participate in internal governance to drive the data quality business cycle and roadmap

Skills:
Bachelor’s or Master’s degree in computer science or software engineering
5+ years of programming experience (including functional programming); must be advanced in Python
Experience with relational SQL and NoSQL databases, including Postgres
Experience building and optimizing big data pipelines using Spark or other similar technologies
Experience with AWS cloud services: S3, EC2, EMR, RDS, Redshift, Glue, Lambda, EKS, SageMaker
Solid understanding of how to design robust data workflows including optimization and user experience
Strong analytical and problem-solving skills
Excellent oral and written communication skills
Able to work in teams and collaborate with others to clarify requirements
Strong co-ordination and project management skills to handle complex projects
Experience developing and working with XML, JSON, and external web services

Preferred Qualifications

Clinical drug development domain knowledge
Scientific domain knowledge and experience working with biomedical data types (omics, imaging, etc.)
Experience with Clinical data and systems such as Medidata RAVE, Siebel CTMS, IxRS
Experience with data quality software such as Talend, Informatica, Paxata or similar class of tools
Competencies in applied statistics to solve business needs
Knowledge of industry data standards used in drug development, particularly in Clinical development

Source: Job Diva – Job Listing
