AI Quality Assurance Engineer (LLM & Co-Pilot Testing)
Looking for an AI Co-Pilot & LAMA Quality Testing Specialist to support the development, evaluation, and optimization of AI models, including OpenAI s Co-Pilot, LAMA (Log Anomaly detection with Multi-Head Attention), and other Large Language Models (LLMs). The role involves designing and testing AI prompts, validating model outputs for accuracy and bias, ensuring alignment with industry standards, and continuously improving AI response quality.
This position is ideal for professionals with a background in AI/ML model testing, cybersecurity, or AI governance, who are passionate about enhancing AI decision-making and response accuracy.
Key Responsibilities
1. AI Co-Pilot and LLM Prompt Testing
Develop and test effective prompts to optimize AI responses for accuracy, clarity, and relevance.
Evaluate AI model outputs from Co-Pilot, LAMA, and OpenAI-based LLMs, identifying inconsistencies or hallucinations.
Design edge-case testing scenarios to assess AI performance under diverse queries.
Optimize prompts for domain-specific applications, including cybersecurity, IT governance, compliance, and risk management.
2. Quality Assurance & Model Evaluation
Conduct manual and automated testing of AI-generated responses to ensure alignment with accuracy benchmarks.
Compare AI-generated responses vs. human expert responses to assess effectiveness and reliability.
Implement LLM validation frameworks to measure performance on predefined tasks.
Work closely with data scientists and engineers to refine AI models based on test results.
Document test cases, log failures, and provide detailed feedback to AI development teams.
3. Cybersecurity & Bias Testing for AI Models
Test AI models against security risks, ensuring that Co-Pilot and LAMA outputs do not expose vulnerabilities or mislead users.
Identify and mitigate AI bias, hallucinations, or inaccurate predictions in model-generated responses.
Perform ethical AI testing to verify compliance with privacy laws (CCPA, NAIC AI Guidelines) and bias-reduction standards.
Collaborate with AI governance teams to enhance explainability and fairness in AI-driven decision-making.
4. AI Model Performance Monitoring & Continuous Improvement
Set up AI response benchmarking metrics and maintain real-time monitoring dashboards.
Develop AI response quality scoring systems, ensuring ongoing improvement in LLM outputs.
Conduct A/B testing of different prompting techniques to optimize AI-generated results.
Partner with product teams and engineers to implement enhancements based on test findings.
5. Collaboration & Documentation
Work closely with AI research teams, LLM developers, and QA engineers to improve AI model performance.
Maintain detailed testing reports, highlighting trends, anomalies, and improvement opportunities.
Provide training and best practices on AI prompt engineering for internal teams.
Stay updated with latest advancements in AI prompt engineering, LLM governance, and cybersecurity AI trends.
Required Skills & Experience
Technical Skills
Experience with AI/ML testing, LLM fine-tuning, or AI prompt engineering.
Familiarity with OpenAI s Co-Pilot, LAMA, ChatGPT, GPT-based models, or similar LLM frameworks.
Strong understanding of cybersecurity concepts, anomaly detection, and AI risk management.
Proficiency in Python, SQL, or scripting languages for automated AI testing (preferred).
Experience with AI benchmarking tools, prompt testing frameworks, or NLP evaluation techniques.
Quality & Compliance Knowledge
Understanding of AI governance frameworks, model explainability, and bias mitigation strategies.
Knowledge of AI-related regulations (NAIC AI Guidelines, CCPA, SEC AI Governance Frameworks).
Strong data validation and accuracy testing experience for AI-driven systems.
Soft Skills
Strong analytical and problem-solving skills to assess AI performance and recommend improvements.
Excellent communication skills to collaborate with engineers, researchers, and governance teams.
Attention to detail and a structured approach to documentation and model evaluation.
Ability to work in a fast-paced environment, adapting to emerging AI innovations.
Key Competencies
Technical Expertise: Deep understanding of AI Tools and capabilities.
QA Testing Skills: Strong skills to build automated testing solutions.
Attention to Detail: Meticulous attention to detail to identify quality issues.
Team Collaboration: Ability to work collaboratively with cross-functional teams to achieve quality testing objectives.
Adaptability: Flexibility to adapt to evolving and changing technology landscapes.
Communication: Excellent communication skills to convey AI testing concepts and recommendations to diverse audiences