Agentic AI Atlas by a5c.ai
SkillArea overview

skill-area:eval-driven-development


Eval-Driven LLM Development overview

Defining evals before building features: golden sets, rubric scoring, LLM-as-judge scoring with calibration, and regression gates.
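The description names golden sets, rubric scoring, and regression gates without showing how they fit together. The Python sketch below is one minimal way to wire them up. It assumes a golden set is a list of prompt/reference pairs and a rubric is a set of weighted pass/fail checks. `GoldenCase`, `rubric_score`, `evaluate`, `regression_gate`, the stub models, and the 0.02 tolerance are hypothetical illustrations, not part of any tool or library listed in this record.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical golden-set item: a prompt plus a human-approved reference answer.
@dataclass(frozen=True)
class GoldenCase:
    case_id: str
    prompt: str
    reference: str


# Hypothetical rubric: criterion name -> (weight, pass/fail check on output and case).
Rubric = dict[str, tuple[float, Callable[[str, GoldenCase], bool]]]


def rubric_score(output: str, case: GoldenCase, rubric: Rubric) -> float:
    """Weighted fraction of rubric criteria the output satisfies (0.0 to 1.0)."""
    total = sum(weight for weight, _ in rubric.values())
    earned = sum(weight for weight, check in rubric.values() if check(output, case))
    return earned / total


def evaluate(model: Callable[[str], str], golden: list[GoldenCase], rubric: Rubric) -> float:
    """Mean rubric score of a model over every case in the golden set."""
    scores = [rubric_score(model(case.prompt), case, rubric) for case in golden]
    return sum(scores) / len(scores)


def regression_gate(candidate: float, baseline: float, tolerance: float = 0.02) -> bool:
    """Pass only if the candidate scores no more than `tolerance` below the baseline."""
    return candidate >= baseline - tolerance


# Usage with stub "models" standing in for real LLM calls (hypothetical).
golden = [
    GoldenCase("capital-fr", "Capital of France?", "Paris"),
    GoldenCase("capital-jp", "Capital of Japan?", "Tokyo"),
]
rubric: Rubric = {
    "mentions_reference": (2.0, lambda out, case: case.reference.lower() in out.lower()),
    "concise": (1.0, lambda out, case: len(out.split()) <= 10),
}


def baseline_model(prompt: str) -> str:
    answers = {"Capital of France?": "Paris.", "Capital of Japan?": "Tokyo."}
    return answers[prompt]


def candidate_model(prompt: str) -> str:
    return "I am not sure, but it might be a large city in that country."


baseline = evaluate(baseline_model, golden, rubric)
candidate = evaluate(candidate_model, golden, rubric)
print(baseline, candidate, regression_gate(candidate, baseline))  # 1.0 0.0 False
```

Here the gate rejects the candidate because its mean rubric score drops from 1.0 to 0.0. In a real pipeline the stubs would be model calls, and the baseline score would come from the last accepted run rather than being recomputed.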

SkillArea · Outgoing: 1 · Incoming: 60

Attributes

displayName
Eval-Driven LLM Development
description
Defining evals before building features: golden sets, rubric scoring, LLM-as-judge scoring with calibration, and regression gates.
domains
  • specialization:ai-agents-conversational
expertiseLevels
  • intermediate
  • expert
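Calibrating an LLM-as-judge, as the description mentions, generally means checking its verdicts against human labels on the same items before relying on it at scale. The Python sketch below assumes binary pass/fail labels and uses Cohen's kappa as the agreement statistic; the record does not say which statistic or threshold is used. `cohens_kappa`, `judge_is_calibrated`, and the 0.6 minimum kappa are hypothetical choices for illustration.

```python
from collections import Counter


def cohens_kappa(judge: list[str], human: list[str]) -> float:
    """Chance-corrected agreement between two raters who labeled the same items."""
    if len(judge) != len(human) or not judge:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(judge)
    # Observed agreement: fraction of items where both raters chose the same label.
    observed = sum(j == h for j, h in zip(judge, human)) / n
    # Expected agreement if each rater labeled independently at their own base rates.
    judge_freq, human_freq = Counter(judge), Counter(human)
    labels = set(judge) | set(human)
    expected = sum(judge_freq[label] * human_freq[label] for label in labels) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label everywhere; treat as perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def judge_is_calibrated(judge: list[str], human: list[str], min_kappa: float = 0.6) -> bool:
    """Hypothetical gate: trust the judge only if its kappa vs. humans reaches min_kappa."""
    return cohens_kappa(judge, human) >= min_kappa


# Usage: human labels and LLM-judge verdicts on the same eight outputs (made-up data).
human_labels = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]
judge_labels = ["pass", "pass", "fail", "pass", "pass", "fail", "pass", "fail"]
kappa = cohens_kappa(judge_labels, human_labels)
print(round(kappa, 3), judge_is_calibrated(judge_labels, human_labels))  # 0.467 False
```

Raw agreement here is 75%, but kappa is only about 0.47 once chance agreement is removed, so under this hypothetical threshold the judge would need prompt or rubric revisions before its scores feed a regression gate.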

Outgoing edges

applies_to (1)
  • specialization:ai-agents-conversational · Specialization

Incoming edges

addresses (2)
  • skill:babysitter-retrospect · Skill · babysitter:retrospect
  • skill:babysitter-accomplish-status · Skill · babysitter:accomplish-status
lib_requires_skill_area (4)
  • lib-agent:ai-agents-conversational--agent-evaluator · LibraryAgent · agent-evaluator
  • lib-agent:ai-agents-conversational--llm-judge · LibraryAgent · llm-judge
  • lib-agent:ai-agents-conversational--prompt-optimizer · LibraryAgent · prompt-optimizer
  • lib-skill:ai-agents-conversational--phoenix-arize-setup · LibrarySkill · phoenix-arize-setup
prerequisite_for_learning (2)
  • skill-area:llm-evaluation · SkillArea · LLM Evaluation
  • skill-area:ai-agent-development · SkillArea · AI Agent Development
requires_expertise (15)
  • responsibility:ai-agent-usage-review · Responsibility
  • responsibility:ai-tooling-evaluation · Responsibility · AI Tooling Evaluation
  • responsibility:model-quality-assurance · Responsibility · Model quality assurance
  • responsibility:prompt-quality-assurance · Responsibility · Prompt quality assurance
  • role:ai-champion · Role · AI Champion
  • role:data-scientist · Role · Data Scientist
  • role:ml-engineer · Role · Machine Learning Engineer
  • role:planner · Role · Planner
  • role:ml-engineer-convergent · Role · ML Engineer
  • role:ai-trainer · Role · AI Trainer
  • role:ai-researcher · Role · AI Researcher
  • role:prompt-engineer · Role · Prompt Engineer
  • role:ai-ethics-researcher · Role · AI Ethics Researcher
  • role:ai-product-manager · Role · AI Product Manager
  • role:product-owner · Role · Product Owner
requires_skill_area (36)
  • skill-area:hallucination-mitigation-fact-checking · SkillArea · Hallucination Mitigation and Fact Checking
  • skill-area:agent-simulation-testing · SkillArea · Agent Simulation and Testing
  • workflow:rag-pipeline-evaluation · Workflow · RAG Pipeline Evaluation
  • workflow:ai-content-moderation-review · Workflow · AI Content Moderation Review
  • workflow:ai-agent-adoption-rollout · Workflow · AI Agent Adoption Rollout
  • workflow:ai-usage-review · Workflow · AI Agent Usage Review
  • workflow:ai-knowledge-sharing · Workflow · AI Knowledge Sharing
  • workflow:ai-pair-programming-governance · Workflow · AI Pair-Programming Governance
  • workflow:ai-model-license-compliance · Workflow · AI Model License Compliance
  • workflow:algo-strategy-backtesting · Workflow · Algorithmic Strategy Backtesting
  • workflow:adas-validation-cycle · Workflow · ADAS Validation Cycle
  • workflow:process-simulation-review · Workflow · Process Simulation Review
  • workflow:model-fairness-audit · Workflow · Model Fairness Audit
  • workflow:ml-model-versioning-governance · Workflow · ML Model Versioning Governance
  • workflow:adaptive-learning-model-review · Workflow · Adaptive Learning Model Review
  • workflow:landing-page-optimization-cycle · Workflow · Landing Page Optimization Cycle
  • workflow:growth-experiment-review · Workflow · Growth Experiment Review
  • workflow:growth-experimentation-platform-setup · Workflow · Growth Experimentation Platform Setup
  • workflow:quality-control-audit · Workflow · Quality Control Audit
  • workflow:underwriting-model-validation · Workflow · Underwriting Model Validation
  • workflow:contract-automation-review · Workflow · Contract Automation Review
  • workflow:hypothesis-driven-experiment · Workflow · Hypothesis-Driven Experiment
  • workflow:prompt-regression-testing · Workflow · Prompt Regression Testing
  • workflow:llm-eval-pipeline · Workflow · LLM Evaluation Pipeline
  • workflow:model-card-maintenance · Workflow · Model Card Maintenance
  • workflow:impact-measurement-review · Workflow · Impact Measurement Review
  • workflow:computational-experiment-validation · Workflow · Computational Experiment Validation
  • workflow:competitive-landscape-analysis · Workflow · Competitive Landscape Analysis
  • workflow:quant-model-peer-review · Workflow · Quant Model Peer Review
  • workflow:quantum-algorithm-benchmarking · Workflow · Quantum Algorithm Benchmarking
  • workflow:error-correction-validation · Workflow · Error Correction Validation
  • workflow:revenue-forecasting-model-calibration · Workflow · Revenue Forecasting Model Calibration
  • workflow:support-chatbot-performance-review · Workflow · Support Chatbot Performance Review
  • workflow:ai-agent-adoption-rollout · Workflow · AI Agent Adoption Rollout
  • workflow:ai-usage-review · Workflow · AI Agent Usage Review
  • workflow:ai-knowledge-sharing · Workflow · AI Knowledge Sharing
tool_used_by (1)
  • tool:braintrust-proxy · Tool · Braintrust Proxy

Related pages

No related wiki pages for this record.
