Data Scientist
San Francisco Veterans Affairs Medical Center
RESEARCH AND DEVELOPMENT SERVICE DATA SCIENTIST
INTRODUCTION
We are seeking a full-time data scientist with expertise in epidemiology, biostatistics, and bioinformatics to join an active research program focused on using national electronic health record data (“big data”) to construct clinical cohorts and answer clinically and biologically relevant questions in public health, computational epidemiology, and computational biology. These studies use modern epidemiological methods (causal inference), including propensity weighting and inverse probability weighting. They also include prediction and machine learning approaches to data analysis. Individuals with a background in epidemiology, statistics, bioinformatics, or a related field are welcome to apply to join this research program based at the San Francisco VA. This job involves working directly with UCSF/SFVA faculty.
Current studies include:
- Mimicking a clinical trial using national VA, COVID-19 vaccine, and health outcomes data to assess booster effectiveness.
- Examining the biological determinants of health outcomes, particularly using national SARS-CoV-2 viral sequencing and human genetics data.
The Department of Veterans Affairs (VA) Research Program strives to promote Veteran-centered care to improve patient experience and outcomes across VA healthcare and community settings, and to advance value-driven care by providing Veterans the highest quality care at the lowest financial burden. The position is vital within the Office of Research and Development (ORD) and the facilities where research programs are conducted. Data Scientists provide the link between Veterans and highly innovative projects. This position will be located in various research offices within the VHA.
BACKGROUND
The incumbent serves as the subject matter expert (SME) in the highly technical and specialized area of data science serving as a Data Scientist with specific technical expertise in health sciences data. They will act as a key advisor to the VA Medical Center Research & Development Service and senior leadership, as well as internal and external stakeholders on all matters related to data science.
The incumbent is responsible for providing high-level technical expertise on the aggregation, collation, and analysis of data from databases, including, but not limited to, the development, implementation, and updating of data extraction queries; data mining; the development of strategies and action plans; the modification and maintenance of data queries; and quality control and validation of collected data.
The incumbent is responsible for the development of methodological approaches, study design, and advanced written, verbal, and visual communications of study/analysis output. The incumbent additionally contributes to the development of the design for advanced data systems, software, and complex programming specifications.
Analyses generally inform decisions about how to design, implement, and evaluate administrative orders in the healthcare setting, including, but not limited to, analyzing electronic health record data to develop predictive analytics and medical image processing, and applying high-impact AI technologies such as deep learning, trustworthy AI, privacy-preserving AI, explainable AI, and multi-scale AI analysis.
Position requires work experience with large longitudinal medical databases and statistical/analysis software platforms (e.g., Statistical Analysis System (SAS)), and familiarity with queries using Structured Query Language (SQL), visual programming tools, SQL development tools, or others. Performs statistical and quantitative analysis utilizing specially developed software models or procedures.
Interpersonal communications and outreach, acquisitions and procurement, information technology, and financial management are the functional skills that will supplement those primary tasks. The incumbent leads or consults with cross-functional teams, staffed by federal civil servants and contractors, composed of professionals skilled in policy analysis, information technology, data engineering, and project management disciplines, in order to develop data-driven solutions that address the Department of Veterans Affairs (henceforth referred to as “Agency”) program and business challenges. The above areas are subject to change as program emphasis and priorities change.
Location: San Francisco, CA
Salary: $93,123.00 - $128,717.00
Pay Scale & Grade: GS-13
Major Duties and Responsibilities
Data Science Lifecycle - 50%
Serves as a Data Scientist and advisor to Agency leadership with the responsibility for the overall development, management, control, coordination, storage, retrieval and execution of data acquisition requirements and the efforts of other Agency professionals and contractors in solving complex and far-reaching data management issues and problems for a wide range of systems, applications, and customers. Issues include data interface and storage, and require application of data management principles, procedures, and tools such as modeling techniques, data backup, data recovery, data dictionaries, data warehousing, data mining, data disposal, and data standardization processes.
The incumbent requires expertise in and management of end-to-end data processes in the data life cycle. Mastery of data science tools is required to execute duties; these duties include programming, statistics, machine learning, causal inference, and data visualization. Constructs data pipelines using complex tools and techniques to handle data at scale. Conceives, plans, and conducts projects to support enterprise analytics, with the ability to advance methodological designs through collaborations with industry, government, and academia. The goal is to advance science through leading-edge methods that may result in establishing new theories and a deeper understanding of phenomena. Examples of data accessed include health data, health images, demographic data, administrative data, claims data, programmatic data, patient behavior, provider activities, economic indicators, and expenditure data.
Demonstrates expert-level understanding of diverse data science techniques, including data pooling, machine learning, natural language processing, deep learning, and advanced visualizations, and has the ability to demonstrate how ensembles of these techniques can be harmonized into user-centric solutions.
Translates complex concepts, findings, and limitations into language for scientific and lay audiences. Closely ties findings and conclusions to the Agency mission, original problem statement, and team objectives. Researches and designs presentations and interpretations of analytical outputs tailored to specific audiences, including the use of interactivity and narrative storytelling where appropriate. Collaborates with teammates, customers, and stakeholders in a reproducible and organized manner.
Consults with stakeholders and customers to identify the appropriate data, methodological approach, design, and validation. Conducts observational analysis using software and/or programming languages such as R, Python, Stata, or SAS to explore/group data, test hypotheses, predict outcomes, and inform decisions. Has the ability to distinguish analyses that provide predictions versus those that inform causal inference with strong counterfactual analyses. Derives meaning from big data (i.e., datasets that may be large, disparate, unstructured, and/or complex), including structured, loosely structured, and unstructured data. Comfortable working with large datasets and numerous confounding variables.
The incumbent uses these tools to transform, combine, and clean data in preparation for analysis. Works with teams to develop data products and definitions. Uses appropriate analytic and statistical software to programmatically prepare data for analysis and to clean imperfect data using techniques such as probabilistic matching and imputation of missing values. Translates the results of analysis into clear, actionable communications that equip Agency decision makers to make informed, data-driven decisions. Articulates findings through data visualization, reports, and operationalized constructs. Solves technical problems by choosing the appropriate tools and explaining data architecture and design to both technical and non-technical audiences, and delivers reports and relevant information to support the needs of leadership and policy/project teams.
Has experience with prediction and causal inference methods, such as difference-in-differences, instrumental variables, and regression discontinuity. Has experience testing assumptions in causal inference models, and can explain the differences between average treatment effects, average treatment effects on the treated, and local average treatment effects.
Presents the results of quantitative analysis to technical and non-technical audiences. Has mastery knowledge about chart typologies, mapping stories to appropriate chart types. Works with customers to create useful presentations and interpretations of analytical outputs. Independently synthesizes findings and turns them into actionable insights. Adheres to applicable style guides. Is responsive to a variety of stakeholders and team members with varying technical skills.
Collaborates with other SMEs or stakeholders to select the relevant sources of information, as needed, to perform job duties and makes strategic recommendations on data collection, integration, and retention requirements, incorporating business requirements and knowledge of best practices. Provides strategic and technical guidance and hands-on support in the transfer of any required data needed for ad-hoc and ongoing analyses. Assures regulatory compliance in conjunction with relevant Institutional Review Boards and ensures completion of required documentation/approvals (e.g., data sharing agreements, memorandums of agreement, etc.). Develops memorandums of agreement (MOAs) and memorandums of understanding (MOUs), including data sharing agreements with external and/or internal agencies, for the acquisition and use of data or the sustainment of data use agreements.
Identifies, adapts, and manages changes to data analysis tools in response to evolving user needs. The incumbent documents data definitions and issues for future reference. Develops usage and access control policies and systems in collaboration with Agency contractors and system security design staff, and partners with stakeholders in the continuous improvement process by developing data set processes and using programming languages and tools, thereby improving reliability, efficiency, and quality for missions, goals, future planning, performance enhancements, and overall user experience. Offers expertise when participating in discussions with non-technical internal customers to understand the problem and identify required data sets, collection mechanisms, and other key stakeholders required to solve the problem. Independently analyzes data, applying and explaining the statistical and mathematical principles used. Is familiar with methods of exploratory data analysis, selecting appropriate models, and identifying when data is insufficient to reach conclusions.
Consults with customers and applies analytical processes to the planning, design, and implementation of new and improved information systems to meet the business requirements of customer organizations. Ensures the safe and reliable integration of different elements of a system, including schedules, configurations, and resources. Develops contingency and long-range plans, and responses to unexpected/unplanned, externally driven requirements.
Data Science Consultant and Advisor - 25%
Provides technical advice, serving as an SME in the development of new methods to automate reports and rapidly analyze pilot initiatives and applications, and makes recommendations to management on the use of such technology as it relates to Agency data. Participates on cross-functional teams in the development, implementation, and oversight of new technologies and statistical methods related to data science and collaborates with government, academic, and industry partners. Provides advice and guidance to team members on project decisions and recommendations in support of IT security plans to ensure compliance with Agency security requirements specific to data.
Provides technical advice to the Center Director or Deputy Director on collecting, optimizing, analyzing, interpreting, and communicating insights from data. Serves as a mentor to staff to manage projects and provide direction and support. Provides technical recommendations for the modernization of existing services. Advises other scientists on appropriate data elements to use for studies.
Directs internal and external process or system reviews, studies, projects, and data validation efforts, which provide a means for evaluating system performance and vulnerabilities. Accesses, merges, cleans, standardizes, and develops derived metrics on multiple structured datasets; has familiarity with appropriate software and automation tools. Also serves as a resource to junior analysts and contractor staff, as directed by the supervisor.
Prepares and delivers written reports and oral presentations, e.g., briefings, training sessions, and consultations. Regularly provides authoritative advisory discussions with command and Agency leadership, supervisors, customers, and co-workers and conducts or leads conferences as an expert representative of the organization and command. Attends meetings, conferences, briefings, and seminars related to data science, advanced data storage technologies, and statistical and analysis support system concepts. Consolidates research findings and conclusions into completed products for review by the investigative teams, and for subsequent publication and/or oral presentation to interagency, internal, and external stakeholders.
Project Leadership Related to Data Analytics and Technical Liaison - 25%
Serves as a resource and leader to Federal and contractor staff on implementing, managing, tracking, and evaluating large-scale and complex data analysis projects and their associated requirements and risks. Has the ability to weigh the tradeoffs between more complicated analyses with more assumptions and simpler analyses that involve obtaining additional data, thereby guiding Agency decisions on which approach is preferable to meet the scientific needs and deadlines.
Applies system architecture principles to develop and manage technical requirements; formulates plans that account for stakeholder needs and constraints; clarifies objectives; and develops resource requirements and the appropriate balance among resources, schedule, and technical requirements. Develops and manages an efficient project organizational structure and applies system architecture principles to projects.
Assumes responsibility for the development of project planning, coordination of overall team efforts, and maintaining appropriate lines of communication. Develops project plans to ensure logistics are handled efficiently, identifying potential bottlenecks and resolving issues within the scope of authority. Develops and implements advanced systems for the Agency data management program. These systems are being designed, or redesigned, to utilize new technology such as national database management systems. Prepares and maintains procedures for the Agency’s data systems. Recommends policy changes and provides data to support recommended changes. Develops system proposals and coordinates the efforts necessary to translate business requirements into effective IT data systems solutions. Develops, codes, tests, and maintains database programs.
Assures effective operations and achievement of contractual requirements and program objectives. May serve as a Business Function Lead (BFL), SME, or Contracting Officer’s Representative (COR) on contracts, task orders, or other projects related to data science issues or initiatives.
May develop procurement support materials, such as acquisition plans, procurement strategies, cost estimates, statements of work, and schedules of deliverables as needed to support data science initiatives. May establish and maintain tracking systems and records for grant and contract activities in the program.
Designs, develops, and operates systems for ingesting, storing, and analyzing data at scale. Uses data parallelization techniques or streaming technologies to process data. Monitors data flows and stored datasets to make improvements to data collection and ingestion mechanisms. Ensures that data sources are fit for their intended purpose through assessment of potential bias in data ingestion and transmission mechanisms, current data quality, monitoring for incoming changes in data quality, and improvement of data quality. Recommends improvements to upstream processes to improve data quality.
Level Descriptions
Supervisory Controls
The incumbent works under the administrative direction of the supervisor, who outlines the resources and objectives for work assignments based on mission or functional goals. The incumbent has responsibility for independently planning, designing, and carrying out programs, projects, studies, or other work. Considerable tact, personal initiative, resourcefulness, independence, and professional judgment are used to resolve most of the conflicts that arise.
Interprets policy and regulatory requirements; manages progress and potentially controversial problems, concerns, and issues; develops changes to plans and/or methodology; and provides recommendations for improvements to meet program objectives. Work is reviewed in terms of its soundness and approach to work assignments, adherence to requirements, and achievement of desired results based on the practicality of recommendations. As such, the supervisor will not generally review the methods used to complete work assignments.
Guidelines
The guidelines include general policy statements, basic legislation, recent scientific findings, Federal and State law, Agency and departmental program and evaluation policies, and professional literature regarding analytic methods. These guidelines are frequently evolving and involve many interrelated programs, with new standards or methods often rendering existing guidelines inapplicable. As such, guidelines for work assignments may be scarce or have limitations that would require considerable adaptation to be used to address issues or problems being confronted.
The incumbent must exercise considerable resourcefulness, self-motivation, and inventiveness, and must show discretion in working on sensitive assignments; when modifying or adapting guidelines; dealing with issues; developing new methods; and/or proposing new policies and practices. The incumbent is a technical authority at the program and group level, and frequently contributes to Agency-wide or even government-wide standards; consults with other experts in the organization, and outside the agency, to define and establish possible approaches to unprecedented problems. However, the incumbent must often deviate from past approaches to develop, initiate, and lead new projects to read in, store, and analyze Agency data more effectively.
Scope and Effect
The work assignments include the formulation, definition, and interpretation of data to be used for presentation, planning, and policy development. The development of new standards will be used to analyze, test, and assess new emerging technology or scientific methods. The origination of these new applications and strategies will be used to complement existing or new scientific data concepts and principles. The incumbent will provide consultation and advice to top management at the VAMC. The work assignments will result in improved efficiency and credibility of applications to meet ORD, VHA, VA, and scientific community standards. The work will allow management and other key decision-makers to adopt and accept new approaches, technology, etc. to be used as safe and effective approaches to system compatibility or other uses.
Position Qualification Requirements
This position requires a PhD degree in epidemiology, biostatistics, data science or a related field.