Agentic AI Atlas

II.

SkillArea overview

skill-area:eval-driven-development

Reference · live

Eval-Driven LLM Development overview

Defining evals before features — golden sets, rubric scoring, LLM-as-judge with calibration, and regression gates.
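The pieces named in the summary can be illustrated with a minimal sketch: a golden set, a rubric score, and a regression gate. Everything here is hypothetical and chosen for illustration only: the names `GoldenCase`, `rubric_score`, `mean_score`, `regression_gate`, the keyword-based rubric, the 0.02 tolerance, and the stub model. A keyword check stands in for a real rubric or a calibrated LLM judge, which this sketch does not implement.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class GoldenCase:
    """One golden-set entry: a prompt plus keywords a good answer should contain."""
    prompt: str
    expected_keywords: tuple[str, ...]


def rubric_score(output: str, case: GoldenCase) -> float:
    """Fraction of expected keywords found in the output (0.0 to 1.0)."""
    if not case.expected_keywords:
        return 1.0
    text = output.lower()
    hits = sum(1 for kw in case.expected_keywords if kw.lower() in text)
    return hits / len(case.expected_keywords)


def mean_score(generate: Callable[[str], str], golden_set: Sequence[GoldenCase]) -> float:
    """Average rubric score of a model function over the golden set."""
    scores = [rubric_score(generate(case.prompt), case) for case in golden_set]
    return sum(scores) / len(scores)


def regression_gate(candidate: float, baseline: float, tolerance: float = 0.02) -> bool:
    """Pass only if the candidate score is within `tolerance` of the baseline or better."""
    return candidate >= baseline - tolerance


# Hypothetical golden set, written before the feature exists.
GOLDEN_SET = [
    GoldenCase("How do I reset my password?", ("reset", "email")),
    GoldenCase("What is your refund policy?", ("refund", "30 days")),
]


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call (assumption: replace with your own client)."""
    if "password" in prompt:
        return "Click 'Reset' and check your email."
    return "We offer a refund on request."


score = mean_score(stub_model, GOLDEN_SET)
passed = regression_gate(score, baseline=0.9)
print(f"score={score:.2f} passed={passed}")
```

In a CI setting, a failing gate would typically block the change until the score recovers or the baseline is deliberately revised.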

SkillArea · Outgoing · 1 · Incoming · 60

Attributes

displayName

Eval-Driven LLM Development

description

Defining evals before features — golden sets, rubric scoring, LLM-as-judge with calibration, and regression gates.

domains

specialization:ai-agents-conversational

expertiseLevels

intermediate
expert

Outgoing edges

applies_to · 1

specialization:ai-agents-conversational · Specialization

Incoming edges

addresses · 2

skill:babysitter-retrospect · Skill · babysitter:retrospect
skill:babysitter-accomplish-status · Skill · babysitter:accomplish-status

lib_requires_skill_area · 4

lib-agent:ai-agents-conversational--agent-evaluator · LibraryAgent · agent-evaluator
lib-agent:ai-agents-conversational--llm-judge · LibraryAgent · llm-judge
lib-agent:ai-agents-conversational--prompt-optimizer · LibraryAgent · prompt-optimizer
lib-skill:ai-agents-conversational--phoenix-arize-setup · LibrarySkill · phoenix-arize-setup

prerequisite_for_learning · 2

skill-area:llm-evaluation · SkillArea · LLM Evaluation
skill-area:ai-agent-development · SkillArea · AI Agent Development

requires_expertise · 15

responsibility:ai-agent-usage-review · Responsibility
responsibility:ai-tooling-evaluation · Responsibility · AI Tooling Evaluation
responsibility:model-quality-assurance · Responsibility · Model quality assurance
responsibility:prompt-quality-assurance · Responsibility · Prompt quality assurance
role:ai-champion · Role · AI Champion
role:data-scientist · Role · Data Scientist
role:ml-engineer · Role · Machine Learning Engineer
role:planner · Role · Planner
role:ml-engineer-convergent · Role · ML Engineer
role:ai-trainer · Role · AI Trainer
role:ai-researcher · Role · AI Researcher
role:prompt-engineer · Role · Prompt Engineer
role:ai-ethics-researcher · Role · AI Ethics Researcher
role:ai-product-manager · Role · AI Product Manager
role:product-owner · Role · Product Owner

requires_skill_area · 36

skill-area:hallucination-mitigation-fact-checking · SkillArea · Hallucination Mitigation and Fact Checking
skill-area:agent-simulation-testing · SkillArea · Agent Simulation and Testing
workflow:rag-pipeline-evaluation · Workflow · RAG Pipeline Evaluation
workflow:ai-content-moderation-review · Workflow · AI Content Moderation Review
workflow:ai-agent-adoption-rollout · Workflow · AI Agent Adoption Rollout
workflow:ai-usage-review · Workflow · AI Agent Usage Review
workflow:ai-knowledge-sharing · Workflow · AI Knowledge Sharing
workflow:ai-pair-programming-governance · Workflow · AI Pair-Programming Governance
workflow:ai-model-license-compliance · Workflow · AI Model License Compliance
workflow:algo-strategy-backtesting · Workflow · Algorithmic Strategy Backtesting
workflow:adas-validation-cycle · Workflow · ADAS Validation Cycle
workflow:process-simulation-review · Workflow · Process Simulation Review
workflow:model-fairness-audit · Workflow · Model Fairness Audit
workflow:ml-model-versioning-governance · Workflow · ML Model Versioning Governance
workflow:adaptive-learning-model-review · Workflow · Adaptive Learning Model Review
workflow:landing-page-optimization-cycle · Workflow · Landing Page Optimization Cycle
workflow:growth-experiment-review · Workflow · Growth Experiment Review
workflow:growth-experimentation-platform-setup · Workflow · Growth Experimentation Platform Setup
workflow:quality-control-audit · Workflow · Quality Control Audit
workflow:underwriting-model-validation · Workflow · Underwriting Model Validation
workflow:contract-automation-review · Workflow · Contract Automation Review
workflow:hypothesis-driven-experiment · Workflow · Hypothesis-Driven Experiment
workflow:prompt-regression-testing · Workflow · Prompt Regression Testing
workflow:llm-eval-pipeline · Workflow · LLM Evaluation Pipeline
workflow:model-card-maintenance · Workflow · Model Card Maintenance
workflow:impact-measurement-review · Workflow · Impact Measurement Review
workflow:computational-experiment-validation · Workflow · Computational Experiment Validation
workflow:competitive-landscape-analysis · Workflow · Competitive Landscape Analysis
workflow:quant-model-peer-review · Workflow · Quant Model Peer Review
workflow:quantum-algorithm-benchmarking · Workflow · Quantum Algorithm Benchmarking
workflow:error-correction-validation · Workflow · Error Correction Validation
workflow:revenue-forecasting-model-calibration · Workflow · Revenue Forecasting Model Calibration
workflow:support-chatbot-performance-review · Workflow · Support Chatbot Performance Review
workflow:ai-agent-adoption-rollout · Workflow · AI Agent Adoption Rollout
workflow:ai-usage-review · Workflow · AI Agent Usage Review
workflow:ai-knowledge-sharing · Workflow · AI Knowledge Sharing

tool_used_by · 1

tool:braintrust-proxy · Tool · Braintrust Proxy
