Code Data Validation Consultant (Machine Learning & Data Processing)

Salary undisclosed

Job Description:

  • Join our team to enable cutting-edge AI/ML innovation by building robust data pipelines and automation tools.
  • You'll work closely with human data operators and generative AI teams to process, analyze, and optimize high-quality datasets for training machine learning models.
  • Your work will directly impact the efficiency and performance of AI systems, from automating data quality checks to designing infrastructure that scales with evolving model requirements.
  • This role is ideal for a problem-solver who thrives in fast-paced environments and enjoys bridging data engineering with machine learning.

Responsibilities:

Data Pipeline Development:

  • Design and implement Python-based automation tools to process, clean, and transform raw data for ML training.
  • Build custom scripts to streamline data ingestion and preprocessing workflows.

Quality Analysis & Reporting:

  • Conduct manual and automated quality assessments to identify high/low-impact data for model training.
  • Generate reports detailing experimental results, data effectiveness, and recommendations for improvement.

ML Model Integration:

  • Train and evaluate open-source ML models (e.g., Gemma) to assess data impact on model performance.
  • Collaborate with AI teams to refine data selection strategies based on model feedback.

Infrastructure Optimization:

  • Develop scalable solutions in Colab/Jupyter Notebooks to automate data validation and filtering.
  • Troubleshoot and debug data formatting issues (e.g., code-comment relevance, dataset consistency).

Required (Mandatory):

  • Preferred: 2-3+ years in data analysis/validation/engineering, ML engineering, or automation-focused roles.
  • Bonus: PhD graduates with hands-on ML/data processing projects.

Required (Desired):

  • Exposure to Generative AI models (e.g., GPT, Llama) or large-scale datasets.
  • Bash/Shell Scripting: Ability to automate repetitive tasks.
  • Familiarity with APIs for data ingestion/processing.
  • Experience contributing to open-source projects or public GitHub repositories.
  • Knowledge of cloud services.

Skills:

Technical Expertise:

  • Python: Medium to Advanced proficiency (scripting, automation, data processing libraries like Pandas/NumPy).
  • Hands-on experience writing, executing, and reviewing code (preferably using Colab/Jupyter Notebooks).

Data & ML Skills:

  • Experience training/fine-tuning ML models and analyzing their performance.
  • Familiarity with public data platforms (Hugging Face, GitHub) and data formats (JSON, CSV).

Analytical Skills:

  • Proven ability to assess data quality and build tools to automate quality checks.

Why Join This Project:

  • Impact AI innovation by shaping the data backbone of advanced ML systems.
  • Collaborate with senior data engineers and generative AI experts.
  • Flexible hybrid work environment with opportunities for growth.

Education:

  • Bachelor's degree in Computer Science, Data Science, Engineering, or related STEM field.

About US Tech Solutions:

US Tech Solutions is a global staff augmentation firm providing a wide range of talent on-demand and total workforce solutions. To know more about US Tech Solutions, please visit .

US Tech Solutions is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.