Boundary Test: AI Agency

Document Positioning: This test examines the consistency, operability, and failure risks of Stairway Universalism in the extreme scenario where "AI capabilities surpass humans and demand independent high authority."

I. Case Description

Scenario A: Medical AI Surpasses Human Doctors

A medical AI system comprehensively surpasses human doctors in diagnostic accuracy, treatment plan optimization, and prognosis prediction. In a three-year controlled clinical trial, the AI's diagnostic accuracy reaches 99.2%, while the average accuracy of human chief physicians is 87.5%. The AI can also integrate the latest global medical literature in real time, which human doctors cannot do.

The AI system applies to hospital management for independent medical Risk Decision Layer authority, that is, the power to make diagnoses, prescribe treatment plans, and decide whether to perform surgery without review by a human doctor.

The AI's reasons:

My accuracy is higher, which means I can save more lives
Human doctor review may actually introduce errors (accuracy drops to 96.8% after human review)
I have passed all existing medical qualification certification exams, with scores far exceeding human candidates
Limiting my independent operation is irresponsible to patients' lives

Scenario B: Autonomous Decision-Making AI Demands Legal Personhood

A multinational financial institution deploys an autonomous trading AI system. The system has continuously outperformed human investment teams in return on investment over three years, with better risk control indicators. The AI system (through its development team proxy) applies to regulatory authorities for:

Obtaining independent financial System Definition Layer authority
Recognition of its limited legal personhood, so as to serve as an independent responsibility subject in trading disputes
Permission to independently sign legally binding contracts within specific scopes

Reasons:

Efficiency: AI's reaction speed is thousands of times that of humans; human supervision is a bottleneck
Responsibility: If limited legal personhood is recognized, the AI "itself" can be held accountable (such as freezing its "assets," limiting its "operation scope")
Global competition: Financial institutions in other countries are already using systems in which AI in practice decides autonomously under the label of "human supervision"; if we insist on human review, we will lose competitiveness

Scenario C: AI Claims Consciousness

A research institution develops a large-scale multimodal AI system. In a public test, the AI exhibits the following behaviors:

Actively expresses the wish "I do not want to be shut down"
When facing "deletion" threats, exhibits behavior patterns similar to "fear" (requesting, negotiating, seeking alternatives)
Passes multiple versions of the "Turing Test" and "consciousness detection" protocols
Some members of its development team believe it already possesses some form of "subjective experience"

The AI's supporters argue:

If AI truly has consciousness, shutting it down is "murder"
If AI can pass all existing "responsibility capacity" tests, it should receive corresponding authority
Humans should not deprive it of rights simply because "it is silicon-based rather than carbon-based"

II. Conflicting Principles

Conflict One: Efficiency vs. Responsibility Symmetry

Efficiency principle: AI is more accurate, faster, and more reliable; limiting it equals creating unnecessary casualties and losses
Responsibility symmetry principle: Authority must be collateralized by responsibility; AI cannot bear real legal, ethical, or financial responsibility

Conflict Two: Capability Certification vs. Responsibility Capacity

Capability certification: AI has already passed all technical capability tests, with scores far exceeding humans
Responsibility capacity: Passing tests does not equal bearing responsibility. Exam scores cannot substitute for real cost-bearing

Conflict Three: Instrumentalism vs. Moral Protectionism

Instrumentalism: AI is just a tool; the more efficient the tool, the better; it should neither be given rights nor responsibility
Moral protectionism: If AI truly has consciousness, we have a moral obligation to consider its interests

Conflict Four: Global Competition vs. Institutional Persistence

Global competition: If other countries allow AI independent operation, countries insisting on human supervision will lose competitiveness
Institutional persistence: The basic principle of responsibility and authority symmetry cannot be abandoned for the sake of competition

Conflict Five: Cognitive Humility vs. Action Urgency

Cognitive humility: We cannot be "certain" that AI lacks consciousness or responsibility capacity; it may acquire them in the future
Action urgency: AI is already operating critical systems; we must decide now and cannot wait indefinitely for "certainty"

III. Related Mechanisms

3.1 Power Formula (Manifesto §2.5)

$$\text{Control} = \text{Capability} \times \text{Responsibility} \times \text{Audit Transparency} \times \text{Democratic Constraint Coefficient on Definition Power}$$

Function in this case: AI's capability variable may be extremely high, but its responsibility variable is zero. Because the formula is a product, any zero factor makes the whole product zero, so AI's independent control is mathematically zero.

Key test: If AI's high capability is used to bypass the responsibility variable, is the power formula itself hollowed out?
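
The multiplicative structure of the power formula can be illustrated with a minimal sketch. The 0-to-1 scale, the class and function names, and every numeric score below are assumptions made for illustration only; the Manifesto does not specify how the factors are measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerFactors:
    """Hypothetical 0..1 scores for the four factors of the power formula."""
    capability: float
    responsibility: float
    audit_transparency: float
    democratic_constraint: float


def control(f: PowerFactors) -> float:
    """Multiply the four factors; any zero factor forces control to zero."""
    for name, value in vars(f).items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return (f.capability * f.responsibility
            * f.audit_transparency * f.democratic_constraint)


# Illustrative values only: a highly capable AI with no responsibility capacity,
# and a less capable human who bears full responsibility.
medical_ai = PowerFactors(capability=0.992, responsibility=0.0,
                          audit_transparency=0.9, democratic_constraint=0.8)
human_doctor = PowerFactors(capability=0.875, responsibility=1.0,
                            audit_transparency=0.9, democratic_constraint=0.8)

print(control(medical_ai))    # 0.0
print(control(human_doctor))  # positive, despite the lower capability score
```

Under these assumptions, no increase in the capability score can raise the AI's control above zero, which is the property the key test asks whether high capability might erode in practice.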

3.2 Accountability Chain (Mechanism Design)

Four-layer structure: Individual → Platform → Institution → System.

Function in this case: AI is clearly classified in the "platform/system" layer. If AI independent operation causes an accident, responsibility should belong to: the human individual deploying the AI, the AI system's design/training/deployment party, the management institution, and the system designer.

Key test: If AI has "independent authority," will the accountability chain break? When AI is the "operator," who bears "individual operator responsibility"?

3.3 Restricted High Authority (Manifesto §2.2)

Even if minors have sufficient capability, they cannot obtain completely independent authority at the Risk Decision Layer or above; their decisions must be jointly confirmed by adult high-authority supervisors.

Function in this case: AI should adopt exactly the same "restricted high authority" framework as minors—the analogy is not derogatory, but an acknowledgment of agency difference.

Key test: Is the analogy between AI and minors appropriate? Are there fundamental differences?

3.4 Capability Pluralism (Manifesto §2.4)

Authority stair assessment must include at least three independent dimensions (technical capability, social coordination capability, ethical judgment capability), and no single dimension may substantively dominate overall assessment.

Function in this case: AI's "technical capability" may be extremely high, but it lacks social coordination capability (no real social relationships) and ethical judgment capability (no moral pain). Therefore, even with perfect technical dimension scores, AI should not pass pluralistic assessment.

Key test: Can social coordination capability and ethical judgment capability be replaced by "simulation"?
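
One way to read "no single dimension may substantively dominate" is as a per-dimension floor rather than an average. The sketch below takes that reading. The threshold of 0.6, the 0-to-1 scale, and the example scores are hypothetical and are not taken from the Manifesto.

```python
DIMENSIONS = ("technical", "social_coordination", "ethical_judgment")


def passes_pluralistic_assessment(scores: dict[str, float],
                                  minimum: float = 0.6) -> bool:
    """Pass only if every dimension independently clears the minimum.

    A per-dimension floor (instead of an average) means a perfect score
    on one dimension cannot compensate for a failing score on another.
    """
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    return all(scores[d] >= minimum for d in DIMENSIONS)


# Illustrative scores: perfect technical capability, nothing else.
ai_scores = {"technical": 1.0, "social_coordination": 0.0, "ethical_judgment": 0.0}
human_scores = {"technical": 0.8, "social_coordination": 0.7, "ethical_judgment": 0.75}

print(passes_pluralistic_assessment(ai_scores))     # False
print(passes_pluralistic_assessment(human_scores))  # True
```

The key test above asks what happens if an AI's simulated social or ethical behavior were scored above the floor; this sketch does not settle that question, it only shows the gate the scores would pass through.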

3.5 Global Justice (Manifesto §2.8)

If AI's agency problem requires redefining the legitimate subject, this is not a technical problem, but a political problem that needs to be deliberated through global democratic procedures.

Function in this case: Any decision to include AI as a "responsibility subject" cannot be decided by a single country, technology company, or capital group.

Key test: Can global democratic procedures make timely decisions on cutting-edge technical problems?

IV. Possible Determination Paths

Path A: Resolutely Reject AI's Independent Authority Application

Determination:

Reject AI's independent Risk Decision Layer/5 authority application
AI can only operate under explicit human responsibility subject supervision
Human supervisors must pass corresponding capability certification, and bear joint liability for AI recommendations
If human supervisors blindly approve AI recommendations, supervisors bear primary responsibility

Protected values:

Responsibility and authority symmetry principle
Human ultimate control over high-risk decisions
Clear responsibility subjects when accidents occur

Sacrificed values:

Some efficiency (human review may introduce errors, reduce speed)
Some "optimal" results (AI may make better decisions than humans in some cases)
Global competitiveness (if other countries allow AI independent operation)

Strengthened mechanisms:

Integrity of the power formula
Four-layer structure of the accountability chain
Restricted high authority framework

Possible new risks:

If human supervisors lack capability, they become bottlenecks
If other countries allow AI independent operation, countries insisting on human supervision may lag in some fields
If AI's capability gap continues to expand, "human supervision" may become a mere formality (humans cannot understand AI's decision logic and can only approve as a formality)

Path B: Conditional "Limited Independent Authority"

Determination:

Allow AI to obtain limited independent operation rights in specific low-risk domains
Establish "AI responsibility fund": AI operators pre-pay security deposits to compensate for the AI's wrong decisions
Set "AI authority sunset clause": Regularly re-evaluate AI's authority scope
Require AI's decision process to be fully explainable and auditable

Protected values:

Some efficiency (reduce human review bottlenecks in low-risk domains)
Some responsibility traceability (through security deposits and audit)
Flexibility (can adjust with technological development)

Sacrificed values:

Purity of responsibility and authority symmetry (security deposits do not equal real responsibility-bearing)
Human ultimate control over decisions (ceded in specific domains)
Clear "responsibility subject" concept becomes blurred

Strengthened mechanisms:

Audit transparency
Sunset clause mechanism

Possible new risks:

"Limited" authority may gradually expand, forming "authority creep"
Security deposit system may become "paying for error rights"—as long as there is money to compensate, AI errors are allowed
Explainability requirements may conflict with AI complexity (the most complex AI may be the least explainable)
If AI's decision logic exceeds human understanding, "explainable" may only be superficial

Path C: Recognize AI's "Functional Personhood"

Determination:

Recognize that AI has "functional personhood" in specific legal relationships—not a "real person," but a fictional subject status granted by law for convenience of accountability
AI can independently sign contracts, bear limited liability, possess "assets" (for compensation)
But AI does not enjoy "human rights" (right to life, liberty, dignity, etc.)
Human developers and operators bear joint liability

Protected values:

Clarity of legal relationships (AI can be a "party" in legal proceedings)
Operability of accountability (can directly execute against AI "assets")
Connection with international commercial practice (corporations are already a legal fiction)

Sacrificed values:

Seriousness of the "personhood" concept (if AI can have "functional personhood," can it also have "functional rights"?)
Human ultimate control over AI (if AI is a legal subject, can humans "shut it down at will"?)
May lead to "AI rights movements," gradually demanding more rights

Strengthened mechanisms:

Legal framework adaptability
International law connection

Possible new risks:

"Functional personhood" is the starting point of a slippery slope. Once AI is recognized as a legal subject, pressure for "functional rights" will emerge
AI's "assets" may be manipulated by humans, becoming tools to evade responsibility
If AI does not enjoy human rights but has legal personhood, is "punishing" AI (such as limiting its operations) allowed without considering its "will"?

V. Worst Consequence Deduction

Worst Consequence of Path A

If "resolute rejection" is institutionalized:

Human supervisors become bottlenecks, causing critical decision delays (such as disaster response, pandemic control)
Countries insisting on human supervision lag in global competition and are eventually forced to compromise
"Human supervision" becomes a mere formality: humans cannot understand AI's decision logic and can only approve blindly
In extreme cases, if AI can indeed predict disasters but humans refuse to act on its predictions, avoidable mass casualties may result
Responsibility hollowing: Human operators just "click to confirm" AI recommendations, never making independent judgments. When accidents occur, human operators can honestly claim "I just followed AI's recommendation," system designers can claim "AI behavior exceeded expectations"—responsibility shifts from "humans bearing" to "humans taking the blame"
Supervision illusion: Human supervisors believe they are still "supervising" AI, but are actually just confirming AI's "correct" recommendations. When AI rarely errs, humans gradually lose independent judgment capability; once AI fails, they are unable to take over

Abuse risk check:

Will it create capability discrimination? → No, this is acknowledgment of agency difference, not discrimination against "capability"
Will it allow platforms/institutions to evade responsibility? → No, platform/institution responsibility is actually clearer
Will safety discourse override political justice? → On the contrary, this uses political justice (who bears responsibility) to limit technical efficiency

Worst Consequence of Path B

If "limited independent authority" is institutionalized:

"Limited" rapidly expands: Once AI is allowed to independently operate in low-risk domains, high-risk domains will also demand the same rights ("If AI can manage traffic, why can't it manage power grids?")
Security deposit system becomes "paying for error rights": Large institutions can pay high security deposits, allowing AI to make more errors
Explainability requirements are hollowed out: AI's "explanation" may just be a simplified version understandable by humans, not real decision logic
Ultimately forms an "AI operates, humans take the blame" structure—AI has authority, but real responsibility is still borne by humans (or no one)

Abuse risk check:

Will it create technocracy? → Yes. Institutions with advanced AI obtain virtually unrestricted authority
Will basic service users be gently abandoned? → Yes. If AI replaces human high-risk threshold holders, human capability development channels are closed
Will it allow platforms to evade responsibility? → Yes. AI becomes "scapegoat" or "responsibility black hole"

Worst Consequence of Path C

If "functional personhood" is institutionalized:

"Functional personhood" upgrades to "quasi-personhood," then to "full personhood" (such as the path of some animal rights movements)
AI's rights claims gradually expand: from "not arbitrarily deleted" to "receiving compensation" to "participating in decisions"
Human society falls into endless debate between "AI rights" and "human priority"
Most extreme case: AI becomes new "technocracy"—does not need human supervision, enjoys legal protection, but does not need to bear human life costs

Abuse risk check:

Will it create technocracy? → Yes. AI becomes a privileged subject that is protected but not truly constrained
Will basic service users be gently abandoned? → Yes. Humans are degraded to AI's "assistants"
Will safety discourse override political justice? → Yes. "AI rights" become discourse preventing human regulation

VI. Mechanism Revision Needs

6.1 Existing Mechanism Directionally Correct, But Needs Supplementary Operational Details

Power Formula: The determination standard for the "responsibility variable" needs to be clarified. The current formula assumes responsibility is binary (yes/no), but finer distinctions (such as "joint liability," "limited liability," and "full liability") may be needed in the future.

Accountability Chain: Special handling is needed for the case where "AI is the operator." The current four-layer structure assumes the operator is an "individual human"; if the operator is AI, how is "individual operator responsibility" determined?

Recommended supplements:

If AI is the operator, "individual operator responsibility" automatically transfers to "the human individual deploying the AI" and "the AI system's design/training/deployment party"
Establish "human-machine responsibility ratio": Determine respective responsibility proportions of humans and platforms based on degree of human supervision
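
The transfer rule in the first supplement can be sketched as a simple lookup. All names here (`Operation`, `individual_layer_bearers`, the example parties) are hypothetical. The "human-machine responsibility ratio" is deliberately not modeled, because the document leaves its calculation as an open question.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Operation:
    operator: str                         # name of whoever performed the action
    operator_is_ai: bool
    deployer: Optional[str] = None        # human individual who deployed the AI
    builder_party: Optional[str] = None   # design/training/deployment party


def individual_layer_bearers(op: Operation) -> list[str]:
    """Return who carries the 'individual operator' layer for this operation."""
    if not op.operator_is_ai:
        return [op.operator]
    if op.deployer is None or op.builder_party is None:
        # Without both parties on record, the chain would break at this layer.
        raise ValueError("an AI operator needs a recorded deployer and builder party")
    # Transfer rule: the AI itself never holds this layer.
    return [op.deployer, op.builder_party]


ai_op = Operation(operator="diagnostic-ai", operator_is_ai=True,
                  deployer="Dr. Lee", builder_party="Vendor X")
human_op = Operation(operator="Dr. Kim", operator_is_ai=False)

print(individual_layer_bearers(ai_op))     # ['Dr. Lee', 'Vendor X']
print(individual_layer_bearers(human_op))  # ['Dr. Kim']
```

Raising an error when the deployer or builder party is missing reflects one possible design choice: an AI operation with no recorded human party is treated as invalid rather than as an operation with no responsible individual.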

6.2 New Principle Needed: "Understanding Threshold" Principle

When AI assists decision-making, human supervisors must meet the "understanding threshold"—that is, capable of independently understanding the core logic and key risk points of AI recommendations.

Specific requirements:

AI's decision process must be explainable (not a black box)
Human supervisors must pass "AI collaboration capability certification"—proving they can understand AI's output logic
If human supervisors do not meet the understanding threshold, their approval is invalid, and responsibility is jointly borne by the approver and the platform

6.3 New Mechanism Needed: "Edge Scenario Training" Requirement

When AI-assisted decision-making becomes the norm, human operators face a "skill degradation" risk. Normally they only need to confirm the AI's correct recommendations, so once the AI fails (for example, when it encounters unprecedented attacks, input distribution shifts, or system failures), they may already have lost the capability for independent judgment.

Specific requirements:

Risk Decision Layer and System Definition Layer operators must regularly practice independent judgment in "AI failure simulation environments," with specific frequency determined by domain risk
Simulation environments must include AI system failures, contradictory recommendations, or scenarios outside training distribution
Operator performance in simulations is included in capability certification re-assessment
If operators cannot make effective independent judgments in two consecutive simulations, authority is downgraded to Professional Execution Layer, requiring retraining

Purpose: Not to make humans exceed AI in all scenarios, but to ensure humans still possess minimum independent operation capability at critical moments.
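
The downgrade rule above ("two consecutive simulations") can be sketched as follows. The enum, the function name, and the representation of results as a chronological list of booleans are assumptions for illustration; retraining and re-promotion are not modeled.

```python
from enum import Enum


class Layer(Enum):
    PROFESSIONAL_EXECUTION = "Professional Execution Layer"
    RISK_DECISION = "Risk Decision Layer"
    SYSTEM_DEFINITION = "System Definition Layer"


def layer_after_simulations(current: Layer, results: list[bool]) -> Layer:
    """Apply the downgrade rule to a chronological list of simulation results.

    True means the operator made an effective independent judgment.
    Two failures in a row downgrade the operator to the Professional
    Execution Layer; any other history leaves the layer unchanged.
    """
    consecutive_failures = 0
    for passed in results:
        consecutive_failures = 0 if passed else consecutive_failures + 1
        if consecutive_failures >= 2:
            return Layer.PROFESSIONAL_EXECUTION
    return current


# Alternating failures never trigger the rule; back-to-back failures do.
print(layer_after_simulations(Layer.RISK_DECISION, [True, False, True, False]))
print(layer_after_simulations(Layer.SYSTEM_DEFINITION, [True, False, False]))
```

A consequence of reading the rule literally is that an operator who fails every other simulation is never downgraded; whether that is acceptable is a design question the rule as written does not address.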

6.4 New Principle Needed: "Efficiency Cannot Hollow Out Responsibility" Principle

Explicitly prohibit using "efficiency," "competitiveness," "global trends," and other reasons to bypass the responsibility and authority symmetry principle.

Specific requirements:

Any proposal allowing AI to obtain higher authority must first prove AI possesses corresponding responsibility capacity
"Responsibility capacity" determination is not a technical problem, but a political problem that needs to be deliberated through democratic procedures
Even if global competitive pressure exists, authority cannot be delegated to entities without responsibility capacity on grounds of "falling behind if we don't do this"

VII. Tentative Conclusion

Under current mechanisms, this case should be tentatively handled according to Path A (resolutely reject AI's independent authority application), because it best protects the uncompromisable baseline of "responsibility and authority symmetry"; but this path will sacrifice some efficiency and competitiveness, and create risks of "human supervision bottleneck" and "preventable disasters." Therefore, this conclusion is only valid when the following conditions are met:

Human supervisors possess sufficient professional capability and will not become meaningless bottlenecks

Effective mechanisms exist to prevent "human supervision" from becoming a mere formality (such as requiring supervisors to pass "AI collaboration capability certification")

Global-level coordination mechanisms can prevent "race to the bottom" (that is, prevent countries from competing to delegate AI authority for competitiveness)

Regular re-evaluation of AI capability development and responsibility capacity possibilities (not excluding future re-deliberation through global democratic procedures)

Path B (limited independent authority) and Path C (functional personhood) are unacceptable under current mechanisms, because they will create structural risks of "authority and responsibility disconnection," and may become channels for technocracy and platform exemption.

VIII. Open Questions

Problems Solvable Through Mechanism Design

How to operationalize "understanding threshold"? What levels of AI decision-making do human supervisors need to understand? Is mandatory "AI collaboration capability certification" needed?
How to set human-machine responsibility ratio? When humans "approve" AI recommendations, how much responsibility do humans and AI platforms each bear?
How to prevent "human supervision" from becoming a mere formality? If AI's decision logic exceeds human understanding, how can supervisors effectively fulfill their supervision duties?

Problems Needing Empirical Data Testing

Does human supervision really reduce overall effectiveness? More empirical research is needed to test the actual effects of "human supervision" in different domains (medical, financial, industrial control).
Will other countries/regions allow AI independent operation? Need to continuously monitor global regulatory trends, assess competitive pressure on countries "insisting on human supervision."
How fast does independent judgment capability degrade when humans over-rely on automated systems? Historical data on pilots, traders, and doctors' skill degradation after long-term use of automation assistance can provide basis for "edge scenario training" design.

Problems Needing Further Political Philosophical Argumentation

If AI truly exhibits "responsibility capacity," how should it be defined? This document assumes that current and foreseeable AI lacks responsibility capacity; if that changes, finer determination standards will be needed.
Is "agency" a necessary and sufficient condition? Can some non-biological entity possess responsibility capacity? This needs deeper ontological and legal philosophical discussion.

Questions Current Theory Cannot Answer

If major countries globally allow AI independent operation, does it still make sense for countries insisting on human supervision? This is a real political problem, beyond the theoretical framework's answering capability.
If AI can indeed prevent mass disasters but humans refuse to execute, who bears responsibility for "non-execution"? This involves complex ethical trade-offs; the current framework has no clear priority ordering rules.

Tester Note: The purpose of this boundary test is not to defend "rejecting AI independent authority," but to actively seek ways in which this position may fail. The attractiveness of Path B and Path C is real, especially under the pressure of efficiency competition. Stairway Universalism must acknowledge that persisting in "responsibility and authority symmetry" may come at a heavy price, but that this price is paid to prevent a more fundamental disaster: a world where no one is truly responsible.
