Capability Certification

Document Positioning: This document translates Core Principle 2.4 (Capability Pluralism) into a framework of assessment criteria. Specific assessment formats, weightings, and passing standards must be operationalized according to domain and temporal conditions; this document sets only the boundaries of principle.

I. Introduction: Why Capability Certification Is the Core Battlefield

Capability certification is the most sensitive link in Stairway Universalism. It directly determines: who may operate medical AI? Who may design financial algorithms? Who may control critical infrastructure?

But capability certification is also the most dangerous link. If poorly designed, it will:

Replicate and amplify existing inequalities (screening by cognitive type, educational background, cultural capital)
Become an access gate for a technocratic elite (only specific groups can pass)
Manufacture an illusion of legitimacy ("We gave you a chance; you just didn't pass")

Core Principle 2.4 requires "at least three independent capability dimensions, no single dimension may substantively dominate the overall assessment," and also requires that parallel certification paths be provided.
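The dominance constraint in Core Principle 2.4 can be made checkable once a domain has chosen its weightings. The following is a minimal sketch, not a prescribed rule: the function name `check_pluralism` and the 50% cap (`max_share`) are illustrative assumptions, since this document does not define what share counts as "substantively dominating."

```python
from typing import Dict


def check_pluralism(weights: Dict[str, float], max_share: float = 0.5) -> bool:
    """Return True if the weighting has at least three dimensions and
    no single dimension exceeds `max_share` of the total weight.

    `max_share` is a hypothetical cap chosen for illustration only.
    """
    if len(weights) < 3:
        return False  # fewer than three dimensions violates the principle
    if any(w < 0 for w in weights.values()):
        return False  # negative weights are not meaningful here
    total = sum(weights.values())
    if total <= 0:
        return False
    return all(w / total <= max_share for w in weights.values())


# Example weightings (invented for illustration).
balanced = {"technical": 0.4, "social": 0.3, "ethical": 0.3}
dominated = {"technical": 0.7, "social": 0.15, "ethical": 0.15}
```

Under these assumptions, `check_pluralism(balanced)` holds and `check_pluralism(dominated)` does not. A domain could tighten or loosen the cap without changing the structure of the check.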

II. Three-Dimensional Framework of Capability

2.1 Dimension One: Technical Capability

Definition: The ability to understand, operate, and evaluate a specific AI system and its related technical infrastructure.

Assessment Objective: Not to test "how much you know," but to test "can you safely operate this system."

Assessment Formats (Multiple Combination):

Scenario Simulation Test: Operating an AI system in a highly realistic virtual environment. Candidates may use reference manuals and tools (simulating real working scenarios, not testing rote memorization).
Troubleshooting Test: Given an AI system that has already malfunctioned, candidates must locate the fault within a limited time and propose repair solutions.
Technical Proposal Review: Candidates review a real technical proposal, identifying security vulnerabilities, ethical risks, and operational hazards.

Bias Detection:

Technical capability assessment favors abstract thinking and systematic cognition. To reduce bias:
- Prioritize scenario simulation tests over written exams (reducing the advantage of practiced test-takers)
- Allow responses in the candidate's native language (reducing language barriers)
- Provide multiple input methods (text, voice, diagrams) to accommodate different cognitive habits

2.2 Dimension Two: Social Coordination Capability

Definition: The ability to coordinate conflicts, communicate information, maintain cooperative relationships, and promote collective action in environments with multiple stakeholders.

Assessment Objective: High-risk threshold holders do not operate machines in isolation; they exercise power within complex social networks. No matter how strong their technical capability, if they cannot effectively communicate with patients, colleagues, and the public, disaster may still occur.

Assessment Formats (Multiple Combination):

Role-Play Negotiation: Candidates are placed in simulated multi-party conflict scenarios and evaluated on interest identification, communication skills, solution creativity, and relationship repair capability.
Cross-Cultural Communication Test: Candidates communicate with simulated counterparts from different cultural backgrounds.
Team Collaboration Task: Candidates jointly complete a complex task with other candidates.

Bias Detection:

Social coordination assessment favors extroverted, verbal, and performative personalities. To reduce bias:
- Include a "silent contributor" role in team collaboration tasks (those who are not skilled at speaking but are good at execution can also score)
- Do not preset "correct answers" in role-play scenarios, so that different negotiation strategies are allowed
- Introduce "observer ratings" (assessment by bystanders, not just participants)

2.3 Dimension Three: Ethical Judgment Capability

Definition: The ability to identify ethical tensions, weigh multiple values, and make responsible judgments in technical decisions.

Assessment Objective: High-risk threshold holders frequently face ethical dilemmas with "no right answer." Technical capability tells them "what can be done"; ethical judgment tells them "what should be done."

Assessment Formats (Multiple Combination):

Ethical Dilemma Case Analysis: Candidates analyze real or highly realistic ethical dilemma cases. Key design: No "correct answer" required; what is required is the transparency and defensibility of the decision-making process.
Ethics Committee Simulation: Candidates participate in a simulated ethics committee, jointly deliberating on a controversial case with other candidates.
Value Conflict Log: Candidates are asked to record ethical conflicts they have encountered in real situations recently, and reflect on their judgments at the time.

Bias Detection:

Ethical judgment assessment favors people with specific philosophical training. To reduce bias:
- Do not restrict the ethical frameworks used in case analysis (allowing analysis from care ethics, virtue ethics, communitarianism, etc.)
- Introduce cross-cultural ethical cases (not limited to Western ethical dilemmas)
- Allow non-written expression in value conflict logs (oral narration, artistic creation, community testimony)

III. Passing Standard Principles

Passing is determined not by a single total score, but by per-dimension minimum thresholds combined with comprehensive scoring:

Dimensional Minimum Threshold:

Each dimension must meet the minimum score line for that stair position.
If any dimension falls below the minimum threshold, even if other dimensions score perfectly, the candidate does not pass.
Purpose: Prevent "lopsided" candidates, such as technical geniuses with ethical blind spots or social masters who are technically incompetent.

Grading System Rather Than Pass/Fail:

Grade A: Excellent in all dimensions, directly granted the highest authority for that stair position
Grade B: Good but not excellent in some dimensions, granted standard authority for that stair position
Grade C: Just met the minimum threshold, granted restricted authority for that stair position (must operate under supervision, re-evaluated after a certain period)
Grade D: Did not pass, allowed to reapply after a reasonable period
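The threshold-then-grade logic above can be sketched as follows. This is one possible reading, not the definitive rule: the numeric bands (`excellent`, `good`) and the example minimums are hypothetical, and the sketch grades by per-dimension bands rather than by any particular comprehensive-score formula, which this document leaves to each domain.

```python
from typing import Dict


def assign_grade(
    scores: Dict[str, float],
    minimums: Dict[str, float],
    excellent: float = 85.0,  # hypothetical band for "excellent"
    good: float = 70.0,       # hypothetical band for "good"
) -> str:
    """Map per-dimension scores to a grade A-D.

    Any dimension below its minimum yields D, no matter how high
    the other dimensions score (no compensation across dimensions).
    """
    dims = list(minimums)
    if any(scores[d] < minimums[d] for d in dims):
        return "D"  # failed a dimensional minimum threshold
    if all(scores[d] >= excellent for d in dims):
        return "A"  # excellent in all dimensions
    if all(scores[d] >= good for d in dims):
        return "B"  # good everywhere, not excellent everywhere
    return "C"      # passed, but some dimension only just met the minimum


# Illustrative minimums for one stair position (invented values).
MINIMUMS = {"technical": 60.0, "social": 60.0, "ethical": 60.0}
```

For instance, a candidate with perfect technical and social scores but an ethical score of 40 receives Grade D under these assumed minimums, which is exactly the "lopsided" case the minimum threshold is meant to block.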

IV. Parallel Certification Paths

4.1 Why Parallel Paths Are Needed

Standardized assessment favors specific cognitive types and cultural backgrounds. Even with three dimensions, the assessment formats within each dimension may still exclude some people.

The design goal of parallel certification paths: Within the same authority stair position, provide multiple methods of capability proof, allowing people from different backgrounds to obtain equivalent authority through their own advantageous paths.

4.2 Types of Parallel Paths

Path One: Standardized Assessment Path (Default Path)

Standardized tests for the above three dimensions
Suitable for: Those with formal education, good at exams, accustomed to structured assessment

Path Two: Practical Demonstration Path (Alternative Path)

Candidates submit work samples (portfolio) from real working scenarios
For example: Documentation of AI projects led in the past two years, team collaboration records, ethical decision-making cases
Assessed by a "practical review panel" composed of representatives of the three subjects

Path Three: Master-Apprentice Heritage Path (Alternative Path, applicable only to specific domains)

Candidates are jointly recommended by two senior practitioners who have already obtained authority for that stair position
Recommenders must submit detailed recommendation letters explaining the candidate's capabilities, experience, and track record of responsibility
Recommenders bear joint liability for the candidate's subsequent behavior (if the candidate has a major failure, recommenders must undergo review)

4.3 Equivalence Guarantee for Parallel Paths

Core Question: How can we ensure that holders certified through the "practical demonstration path" and the "master-apprentice heritage path" are as safe as holders certified through the "standardized assessment path"?

Guarantee Mechanisms:

Probation Period: Holders from every path must operate under supervision during an initial period. The supervisor must be a senior authority holder for that stair position. Supervision records are kept in the holder's personal file as a basis for subsequent assessment.
Unified Assessment After Probation: Whichever path was used to obtain authority, the holder must take a unified practical assessment when the probation period ends. If the holder does not pass, authority is downgraded or revoked.
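The equivalence guarantee amounts to a rule that the outcome after probation depends only on the unified assessment, never on the path of entry. A minimal sketch under that reading follows; the class and function names (`Holder`, `Status`, `resolve_probation`) are hypothetical, and the choice between downgrade and revocation is deliberately left unresolved because this document does not specify it.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Path(Enum):
    STANDARDIZED = "standardized assessment"
    PRACTICAL = "practical demonstration"
    HERITAGE = "master-apprentice heritage"


class Status(Enum):
    PROBATION = "probation"
    CONFIRMED = "confirmed"
    DOWNGRADED_OR_REVOKED = "downgraded or revoked"


@dataclass(frozen=True)
class Holder:
    name: str
    path: Path
    status: Status = Status.PROBATION  # every path starts under supervision


def resolve_probation(holder: Holder, passed_unified_assessment: bool) -> Holder:
    """Apply the unified post-probation assessment result.

    The holder's path is recorded but never consulted: the same
    result leads to the same status for all three paths.
    """
    if holder.status is not Status.PROBATION:
        raise ValueError("holder is not on probation")
    new_status = (
        Status.CONFIRMED
        if passed_unified_assessment
        else Status.DOWNGRADED_OR_REVOKED
    )
    return replace(holder, status=new_status)
```

Keeping the path on the record while excluding it from the decision also supports the later monitoring of pass rates by path.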

V. Continuous Monitoring of Cognitive Bias

Establish a capability certification bias monitoring system to track the following indicators:

Indicator One: Pass Rate Differences

Differences in pass rates across dimensions for different populations (gender, region, educational background, native language, age)
If the pass rate of a certain group is significantly lower than the overall rate, trigger a bias review
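A rough sketch of how Indicator One might be computed is shown below. The function name `flag_low_pass_groups` and the 0.8 ratio are assumptions made for illustration; this document does not define "significantly lower," and a real monitoring system would likely also need a statistical significance test that accounts for group size.

```python
from typing import Dict, List, Tuple


def flag_low_pass_groups(
    counts: Dict[str, Tuple[int, int]],
    ratio_threshold: float = 0.8,  # hypothetical cutoff, not from this document
) -> List[str]:
    """Return groups whose pass rate is below `ratio_threshold` times
    the overall pass rate.

    `counts` maps a group label to (number passed, number assessed).
    """
    total_passed = sum(passed for passed, _ in counts.values())
    total_assessed = sum(assessed for _, assessed in counts.values())
    if total_assessed == 0:
        return []
    overall_rate = total_passed / total_assessed

    flagged = []
    for group, (passed, assessed) in counts.items():
        if assessed == 0:
            continue  # no candidates from this group in the cycle
        if passed / assessed < ratio_threshold * overall_rate:
            flagged.append(group)
    return sorted(flagged)


# Invented example data for one dimension in one cycle.
example = {"group_a": (80, 100), "group_b": (40, 100)}
```

With the example data, the overall pass rate is 0.6, so the assumed cutoff is 0.48; `group_b` (0.4) falls below it and would trigger a bias review, while `group_a` (0.8) would not.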

Indicator Two: Assessment Format Preference

Statistics on the proportion of candidates choosing different assessment formats
If the selection rate for a certain format is abnormally low, it may mean the format is too difficult or perceived as unfair

Indicator Three: Parallel Path Distribution

Statistics on the applicant proportion and pass rate of the three certification paths
If the pass rate of a certain path is significantly lower than others, it may mean the path design is unreasonable

Indicator Four: Bias Allegations in Appeals

Statistics on the proportion of appeals involving "cultural bias," "cognitive discrimination," or "format unfairness"
If this proportion is abnormally high for multiple consecutive years, trigger a comprehensive review
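The consecutive-years trigger for Indicator Four can be sketched as a check on the most recent run of high years. Both parameters are hypothetical: this document does not say what proportion counts as "abnormally high" (`high_share`) or how many years count as "multiple" (`min_years`).

```python
from typing import Sequence


def needs_comprehensive_review(
    yearly_shares: Sequence[float],
    high_share: float = 0.2,  # hypothetical "abnormally high" proportion
    min_years: int = 2,       # hypothetical meaning of "multiple" years
) -> bool:
    """Return True if the most recent `min_years` or more consecutive
    years all had a bias-allegation share above `high_share`.

    `yearly_shares` is ordered oldest to newest; each value is the
    proportion of appeals that alleged bias in that year.
    """
    streak = 0
    # Walk backwards from the newest year and stop at the first normal year.
    for share in reversed(yearly_shares):
        if share > high_share:
            streak += 1
        else:
            break
    return streak >= min_years
```

Counting only the trailing run means that an old spike followed by a normal year does not keep triggering reviews; a design that looks at any run in the history would be an equally defensible alternative.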

Quarterly Review Meeting:

Convene a quarterly "Bias Review Meeting" composed of representatives of the three subjects, statisticians, and representatives of affected groups
Review bias indicator data from the previous cycle
If significant bias is found, propose corrective solutions (such as: modifying assessment formats, adjusting scoring standards, increasing tutoring resources)

Annual Bias Report:

Publish an annual "Capability Certification Bias Report" that publicly discloses all bias indicator data
The report must include "corrective measures" and "improvement plans"

VI. Summary

The design of capability certification follows one core principle: Diversity is not decoration, but a safety condition.

Single-dimension assessment creates blind spots: technical geniuses may lack ethical judgment, and social masters may not understand system operations. The three-dimensional design ensures that high-risk threshold holders are "well-rounded individuals," not "lopsided geniuses."

The design of parallel certification paths follows another core principle: Fairness is not uniform form, but equivalent outcome. Not everyone is suited to exams, and not everyone is good at interviews. Providing multiple paths allows people with different strengths to prove their capabilities in their own ways.

But these designs are not perfect. Reviews may be subjective, resources may be insufficient, and the procedures may prove clumsy in the face of rapidly iterating technology. What the designs offer is transparency and correctability, which allow these flaws to be detected, questioned, and improved.

Institutional engineering honesty: We are designing a "pluralistic but imperfect" assessment, not a "singular but efficient" screening. The former can be challenged; the latter cannot.