
Data Scientist in Genomic Sequencing/BioInformatics SME

Salary undisclosed




Position Title: Data Scientist in Genomic Sequencing/BioInformatics SME

Location: Fully remote; any US time zone is fine.

Length of Contract: 3-6 months to start, with a possible extension. The client could be open to contract-to-hire down the road, but a pure contract is fine as well.

Hours: Full time 40 hours per week for at least the first 3 months, then possibly taper off to more of a part-time basis after that.

Start Date: ASAP – can interview as early as tomorrow and start by October 1.

Process: One round – panel interview with CEO, VP of Product, and one other team lead. One 90 minute Teams/Zoom call as a panel should suffice.

Background check required; drug screen requirement not yet confirmed.

Scope: The client has a small team of about 5 people and a large volume of data that needs to be run through an informatics pipeline and converted into base calls. They need a Subject Matter Expert to guide and lead their internal team. The key tool/platform used to convert this data is "Basecaller." This tool takes electronic signals and converts them into a sequence of DNA bases (nucleotides), the A, T, C, G building blocks that are the starting point of a typical genomics pipeline.

The client has a large amount of electronic and optical signal data that they want to run through Basecaller, which turns these signals into DNA sequencing data. Basecaller is essentially a signal processing algorithm/tool that "calls" each signal as a DNA base and outputs the base sequence, along with some extra noise and interference. The candidate will need to be familiar with using Basecaller and know how to do noise filtering on the back end. They will also handle post-processing and analysis of the data, coding the results in Python. The ideal candidate is adept at Basecaller and at using machine learning algorithms for data processing, noise filtering, and noise cancellation.

The last part of the process is the downstream analysis of this data, called "Alignment" and "Assembly." Alignment maps the sequence data to a known reference to check for errors, while Assembly pieces overlapping reads together into longer contiguous sequences. The retrieved sequence (A, C, T, G) can be cross-referenced against existing biological data and known functions to verify its accuracy, check for errors, etc. The candidate also needs to be familiar with FASTA and FASTQ, which are text-based formats for storing biological sequences: FASTA stores the sequence alone, while FASTQ stores both the sequence and its corresponding per-base quality scores.
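Purely as an illustration of the FASTA and FASTQ formats mentioned above (not the client's actual pipeline), here is a minimal Python sketch. It assumes standard unwrapped four-line FASTQ records (`@name`, sequence, `+`, quality string) with Phred+33 quality encoding; the names `FastqRecord`, `parse_fastq`, `error_probability`, and `to_fasta` are hypothetical helpers introduced only for this example.

```python
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO


@dataclass
class FastqRecord:
    """Hypothetical container for one FASTQ read."""
    name: str
    sequence: str
    qualities: List[int]  # decoded Phred quality score per base


def parse_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped, four-line-per-record FASTQ stream.

    Assumes Phred+33 encoding by default: Q = ord(char) - 33.
    """
    while True:
        header = handle.readline()
        if not header:  # end of stream
            return
        header = header.rstrip("\n")
        if not header:  # skip blank lines between records
            continue
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, [ord(c) - offset for c in qual])


def error_probability(q: int) -> float:
    """Convert a Phred score to the estimated probability the base call is wrong."""
    return 10 ** (-q / 10)


def to_fasta(record: FastqRecord) -> str:
    """Drop the quality scores and emit the read as a FASTA entry."""
    return f">{record.name}\n{record.sequence}\n"


if __name__ == "__main__":
    example = "@read1\nACGT\n+\nII5!\n"
    for rec in parse_fastq(io.StringIO(example)):
        print(rec.name, rec.sequence, rec.qualities)
        print(to_fasta(rec), end="")
```

The quality characters `I`, `5`, and `!` decode to Phred scores 40, 20, and 0, so a score of 20 corresponds to an estimated 1% chance the base call is wrong. Converting to FASTA simply discards that quality line, which is the practical difference between the two formats.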

Duties/Responsibilities:

-Drive development of signal processing algorithms.

-Develop a pipeline for genomic data processing using Basecaller.

-Develop deep learning algorithms to process large-scale datasets.

-Scale and optimize algorithms.

-Process the R&D data and conduct customized analyses on a daily basis.

Necessary Skills:

-Basecaller

-Familiar with Alignment and Assembly

-FASTA and FASTQ

-Python

-Experienced in deep learning (CNN, RNN, etc.) and natural language processing algorithms.

-Strong background in probability, statistics, and random processes.

-Experience in biomedical signal processing, including filtering, spectral domain analysis, signal decomposition, noise filtering, etc.

-Exposure to modern collaboration and software development tools (e.g., Git, GitHub, or GitLab).

-Comfortable working with diverse computing environments and platforms (e.g., Windows, Linux, containers, and cloud).

Nice To Haves:

-Any additional experience in Genomic Sequencing or nanopore sequencing.

-Experience with database management tools and languages such as SQL

-Experience in other languages such as R, C++, MATLAB, etc.

-Experience in data visualization and web application development

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.