Data Wrangler, Oregon Health & Science University, Portland OR
Posted July 30, 2013
The Oregon Health & Science University (OHSU) Library in Portland seeks a skilled Data Wrangler to lead data ingestion, transformation, and quality assurance for a cutting-edge bioinformatics project.
Project Description:
Clinical and translational researchers face a daunting challenge in using the vast amount of biomedical data to inform their understanding of human disease mechanisms and develop new therapies. To address this challenge, the Monarch project is aggregating information about model organisms, in vitro models, genes, pathways, gene expression, protein and genetic interactions, orthology, disease, phenotypes, publications, and authors. The system we are building will provide an ability to navigate multi-scale spatial and temporal phenotypes across in vivo and in vitro model systems in the context of genetic and genomic data, using semantics and statistics.
Workplace Description:
OHSU is the state’s only comprehensive academic health center and is made up of the Schools of Dentistry, Medicine, and Nursing; College of Pharmacy; OHSU Healthcare; and related programs. The OHSU Library, the largest health sciences library in Oregon, serves the faculty, staff, and students of OHSU, as well as health professionals and residents of the State of Oregon. The Data Wrangler will be part of the Ontology Development Group (ODG) and will work under the guidance of Dr. Carlo Torniai and Dr. Melissa Haendel, but will also be expected to contribute to the library more generally on committees, etc., based on the candidate’s experience and interest.
The Data Wrangler serves as a member of the OHSU Library Ontology Development Group. This position works in the context of the Monarch project to develop a research platform in support of investigations of phenotype-genotype correlations across species. The Data Wrangler will work with ontologists and bioinformaticians at OHSU and consortium sites to design and implement tools and strategies for semantically mapping and manipulating data.
The primary duty of the Data Wrangler will be to research and develop automation for the ingestion and quality control of data coming from several biomedical and informatics databases. This will involve the development of custom scripts and ad-hoc SQL queries, semantic mapping, and data normalization strategies. After ingestion, s/he will contribute to the development of optimization strategies to transform these data sets into RDF triples through D2RQ mapping, to be published via a Virtuoso Server instance. S/he will also develop QA pipelines to ensure consistency and accuracy of the ingested data before and after transformation. Moreover, s/he will provide feedback and change requests to the ontologists on the project to ensure a consistent and accurate representation of the data. This position will require the ability to explore possible solutions and make decisions that lead to the identification and implementation of effective end-user displays of the data, novel approaches for data analysis, and efficient testing to support data transformation.
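For illustration only, below is a minimal sketch of the kind of before-and-after QA check this work involves, assuming a Postgres staging database and a Virtuoso SPARQL endpoint. The DSN, table name, graph URI, and class URI are hypothetical placeholders, not Monarch project specifics.

# qa_check.py -- compare ingested row counts against triples published after D2RQ mapping
import sys
import psycopg2
from SPARQLWrapper import SPARQLWrapper, JSON

PG_DSN = "dbname=staging user=etl"            # hypothetical staging database
ENDPOINT = "http://localhost:8890/sparql"     # default Virtuoso SPARQL endpoint
GRAPH = "http://example.org/graph/genes"      # hypothetical named graph
GENE_CLASS = "http://example.org/vocab/Gene"  # hypothetical class URI

def source_row_count() -> int:
    """Count ingested gene records in the relational staging table."""
    with psycopg2.connect(PG_DSN) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT count(*) FROM gene")  # hypothetical table
            return cur.fetchone()[0]

def published_instance_count() -> int:
    """Count gene instances exposed through the SPARQL endpoint."""
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setQuery(f"""
        SELECT (COUNT(?g) AS ?n)
        FROM <{GRAPH}>
        WHERE {{ ?g a <{GENE_CLASS}> }}
    """)
    sparql.setReturnFormat(JSON)
    result = sparql.query().convert()
    return int(result["results"]["bindings"][0]["n"]["value"])

if __name__ == "__main__":
    rows, instances = source_row_count(), published_instance_count()
    print(f"staging rows: {rows}, published instances: {instances}")
    # Fail loudly on a mismatch rather than silently dropping records.
    sys.exit(0 if rows == instances else 1)

In practice, checks like this would run automatically after each load so that mapping or normalization regressions are caught before data reaches end users.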
Position Conditions/Qualifications:
Required:
· Master’s degree with major courses in a relevant field, or Bachelor’s degree with major courses in the field of research plus 4 additional years of related experience.
· 3 years of relevant work experience
· Ability to perform research and make independent decisions about approaches and tools to reach specific goals
· Experience with semantically annotated data
· Experience with software project management and version control tools (e.g., Jira, Confluence, SVN, Git)
· Hands-on experience with one or more scripting languages (e.g. Perl, Python, Ruby, Bash)
· Hands-on experience with SQL (Postgres preferred)
· Strong programming skills with a solid understanding of object oriented languages and principles
· Experience in Java programming
· Strong verbal, written, and interpersonal communication skills (including via teleconference)
Preferred:
· Experience developing and evaluating data curation workflows
· Experience developing ontologies and data models
· Experience in developing Extract Transform Load (ETL) scripts
· Knowledge of SPARQL, RDF, OWL
· Experience in bioinformatics
· Experience in end-user usability for bioinformatics platforms
Duration of this appointment and indicated salary may be changed or eliminated if gift, grant, or contract funds supporting this position become unavailable.
Applications and Nominations: To apply, please visit ohsujobs.com and search for position IRC 40016. Applications should include a resume, a letter of introduction, and contact information for three references. Screening of applications will commence immediately and continue until the position is filled. OHSU is an AA/EO employer.