II.
SkillArea overview
Reference · live · skill-area:eval-driven-development
Eval-Driven LLM Development overview
Defining evals before features — golden sets, rubric scoring, LLM-as-judge with calibration, and regression gates.
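The four practices named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an implementation taken from this skill area: every name here (`GoldenCase`, `rubric_score`, `judge_agreement`, `regression_gate`, the sample prompts, and the 0.02 tolerance) is hypothetical. The rubric is reduced to substring checks, and judge calibration is reduced to a raw agreement rate against human labels; a real setup would likely use richer rubrics and a chance-corrected agreement statistic.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    """One golden-set entry: a prompt plus the rubric items its answer must cover."""
    prompt: str
    required_terms: tuple[str, ...]  # hypothetical rubric: one point per term present


def rubric_score(output: str, case: GoldenCase) -> float:
    """Return the fraction of rubric items satisfied, in [0, 1]."""
    if not case.required_terms:
        return 1.0
    text = output.lower()
    hits = sum(term.lower() in text for term in case.required_terms)
    return hits / len(case.required_terms)


def mean_score(system: Callable[[str], str], golden_set: list[GoldenCase]) -> float:
    """Run the system over the golden set and average the rubric scores."""
    if not golden_set:
        raise ValueError("golden set must not be empty")
    return sum(rubric_score(system(c.prompt), c) for c in golden_set) / len(golden_set)


def judge_agreement(judge_labels: list[bool], human_labels: list[bool]) -> float:
    """Calibration check: share of cases where the judge matches the human label."""
    if len(judge_labels) != len(human_labels) or not human_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(j == h for j, h in zip(judge_labels, human_labels))
    return matches / len(human_labels)


def regression_gate(candidate: float, baseline: float, tolerance: float = 0.02) -> bool:
    """Pass only if the candidate is no more than `tolerance` below the baseline."""
    return candidate >= baseline - tolerance


# Golden set written before the feature exists (sample data, assumed).
GOLDEN_SET = [
    GoldenCase("What is the refund window?", ("30 days", "receipt")),
    GoldenCase("How do I reset my password?", ("reset link", "email")),
]

# Stub systems standing in for two prompt or model versions.
BASELINE_ANSWERS = {
    "What is the refund window?": "Refunds are accepted within 30 days with a receipt.",
    "How do I reset my password?": "We send a reset link to your email.",
}
CANDIDATE_ANSWERS = {
    "What is the refund window?": "Refunds are accepted within 30 days.",
    "How do I reset my password?": "We send a reset link to your email.",
}

baseline = mean_score(BASELINE_ANSWERS.__getitem__, GOLDEN_SET)
candidate = mean_score(CANDIDATE_ANSWERS.__getitem__, GOLDEN_SET)
gate_passed = regression_gate(candidate, baseline)
agreement = judge_agreement([True, False, True, True], [True, False, False, True])
```

In this sketch the candidate drops the "receipt" rubric item, so its mean score falls below the baseline and the gate blocks the change. The agreement rate is the kind of number one might check before trusting an LLM judge to replace human labels.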
Attributes
displayName
Eval-Driven LLM Development
description
Defining evals before features — golden sets, rubric scoring,
LLM-as-judge with calibration, and regression gates.
domains
expertiseLevels
- intermediate
- expert
Outgoing edges
applies_to (1)
- specialization:ai-agents-conversational · Specialization
Incoming edges
addresses (2)
- skill:babysitter-retrospect · Skill · babysitter:retrospect
- skill:babysitter-accomplish-status · Skill · babysitter:accomplish-status
lib_requires_skill_area (4)
- lib-agent:ai-agents-conversational--agent-evaluator · LibraryAgent · agent-evaluator
- lib-agent:ai-agents-conversational--llm-judge · LibraryAgent · llm-judge
- lib-agent:ai-agents-conversational--prompt-optimizer · LibraryAgent · prompt-optimizer
- lib-skill:ai-agents-conversational--phoenix-arize-setup · LibrarySkill · phoenix-arize-setup
prerequisite_for_learning (2)
- skill-area:llm-evaluation · SkillArea · LLM Evaluation
- skill-area:ai-agent-development · SkillArea · AI Agent Development
requires_expertise (15)
- responsibility:ai-agent-usage-review · Responsibility
- responsibility:ai-tooling-evaluation · Responsibility · AI Tooling Evaluation
- responsibility:model-quality-assurance · Responsibility · Model quality assurance
- responsibility:prompt-quality-assurance · Responsibility · Prompt quality assurance
- role:ai-champion · Role · AI Champion
- role:data-scientist · Role · Data Scientist
- role:ml-engineer · Role · Machine Learning Engineer
- role:planner · Role · Planner
- role:ml-engineer-convergent · Role · ML Engineer
- role:ai-trainer · Role · AI Trainer
- role:ai-researcher · Role · AI Researcher
- role:prompt-engineer · Role · Prompt Engineer
- role:ai-ethics-researcher · Role · AI Ethics Researcher
- role:ai-product-manager · Role · AI Product Manager
- role:product-owner · Role · Product Owner
requires_skill_area (36)
- skill-area:hallucination-mitigation-fact-checking · SkillArea · Hallucination Mitigation and Fact Checking
- skill-area:agent-simulation-testing · SkillArea · Agent Simulation and Testing
- workflow:rag-pipeline-evaluation · Workflow · RAG Pipeline Evaluation
- workflow:ai-content-moderation-review · Workflow · AI Content Moderation Review
- workflow:ai-agent-adoption-rollout · Workflow · AI Agent Adoption Rollout
- workflow:ai-usage-review · Workflow · AI Agent Usage Review
- workflow:ai-knowledge-sharing · Workflow · AI Knowledge Sharing
- workflow:ai-pair-programming-governance · Workflow · AI Pair-Programming Governance
- workflow:ai-model-license-compliance · Workflow · AI Model License Compliance
- workflow:algo-strategy-backtesting · Workflow · Algorithmic Strategy Backtesting
- workflow:adas-validation-cycle · Workflow · ADAS Validation Cycle
- workflow:process-simulation-review · Workflow · Process Simulation Review
- workflow:model-fairness-audit · Workflow · Model Fairness Audit
- workflow:ml-model-versioning-governance · Workflow · ML Model Versioning Governance
- workflow:adaptive-learning-model-review · Workflow · Adaptive Learning Model Review
- workflow:landing-page-optimization-cycle · Workflow · Landing Page Optimization Cycle
- workflow:growth-experiment-review · Workflow · Growth Experiment Review
- workflow:growth-experimentation-platform-setup · Workflow · Growth Experimentation Platform Setup
- workflow:quality-control-audit · Workflow · Quality Control Audit
- workflow:underwriting-model-validation · Workflow · Underwriting Model Validation
- workflow:contract-automation-review · Workflow · Contract Automation Review
- workflow:hypothesis-driven-experiment · Workflow · Hypothesis-Driven Experiment
- workflow:prompt-regression-testing · Workflow · Prompt Regression Testing
- workflow:llm-eval-pipeline · Workflow · LLM Evaluation Pipeline
- workflow:model-card-maintenance · Workflow · Model Card Maintenance
- workflow:impact-measurement-review · Workflow · Impact Measurement Review
- workflow:computational-experiment-validation · Workflow · Computational Experiment Validation
- workflow:competitive-landscape-analysis · Workflow · Competitive Landscape Analysis
- workflow:quant-model-peer-review · Workflow · Quant Model Peer Review
- workflow:quantum-algorithm-benchmarking · Workflow · Quantum Algorithm Benchmarking
- workflow:error-correction-validation · Workflow · Error Correction Validation
- workflow:revenue-forecasting-model-calibration · Workflow · Revenue Forecasting Model Calibration
- workflow:support-chatbot-performance-review · Workflow · Support Chatbot Performance Review
- workflow:ai-agent-adoption-rollout · Workflow · AI Agent Adoption Rollout
- workflow:ai-usage-review · Workflow · AI Agent Usage Review
- workflow:ai-knowledge-sharing · Workflow · AI Knowledge Sharing
tool_used_by (1)
- tool:braintrust-proxy · Tool · Braintrust Proxy