Open call · Fall 2026 cohort · Book development

The Calibration Program.
Research program and forthcoming book.

The work that funds the book.

Most claims about AI and leadership are anecdote. The Calibration Program is two focused, AI-assisted research studies of how generative AI is changing executive judgment at Fortune 500 scale, plus the frameworks, case studies, and governance tools that turn the research into a complete book. Manuscript target: Q4 2027. Publication: Q2-Q4 2028.

Why we're running this

In the last twenty-four months, every F500 organization has put generative AI into the hands of its leaders. Vendors claim it makes decisions faster, sharper, more data-informed. Critics claim it erodes judgment, inflates confidence, and floods portfolios with low-quality work. Both camps are arguing from anecdote.

The Calibration Program is a focused attempt to look at the actual effect, at two levels of analysis, with public methodology, hand-validated findings, and honest scope. We are not running it because we know the answer. We are running it because nobody does, and because the answer changes how every operator we advise should think about AI rollouts.

Both studies are open. We publish the methodology before the fieldwork, the coding scheme before the data, and the data before the opinions. We publish whatever we find, the flattering patterns and the unflattering ones. This is the work that funds the opinions.

This research becomes a book. The Calibration Program is the foundation for a forthcoming book, The Calibration Debt: How AI Changes Executive Judgment in Enterprise (manuscript target Q4 2027, publication Q2-Q4 2028). The book combines the two core studies with governance frameworks, case studies from participating companies, and practical tools any F500 leader can use to audit their own calibration debt.

Study 01 · The micro question · Book Part I

The Confidence Inflation Hypothesis.

Does AI assistance make Fortune 500 executives more confident in their prioritization decisions without making those decisions better, and how large is the gap?

Working hypothesis
The Calibration Debt. AI assistance lifts how confident F500 executives feel about prioritization decisions faster than it lifts how good those decisions actually turn out to be. The gap between felt confidence and decision quality is the calibration debt. Working hypothesis, not a pre-registered prediction. The point is to look honestly.
Method
  • 20–30 director and VP-level participants across 10–15 Fortune 500 companies, recruited through the NS Studio network and partner companies.
  • Single 60-minute semi-structured interview per participant, conducted by the lead researcher.
  • Interview covers: how AI is currently used in their prioritization work, recent decisions where AI assisted (or didn't), where the assistance felt most and least trustworthy, and what they wish their AI tools did differently.
  • Transcripts coded thematically using a published coding scheme. Frontier LLMs (Claude / GPT-4-class) used for first-pass coding under hand validation by the researcher.
  • Inter-rater reliability between human and LLM coding reported on a held-out validation subset.
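
One way the human-versus-LLM agreement check could be computed is sketched below. The program does not say which agreement statistic it will report; Cohen's kappa is assumed here because it is a common choice for two raters assigning categorical codes. The function name, theme labels, and sample data are all hypothetical.

```python
from collections import Counter


def cohens_kappa(human, llm):
    """Cohen's kappa for two equal-length lists of categorical codes."""
    if len(human) != len(llm) or not human:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(human)
    # Share of segments where both raters assigned the same code.
    observed = sum(h == m for h, m in zip(human, llm)) / n
    # Agreement expected by chance, from each rater's own code frequencies.
    human_freq = Counter(human)
    llm_freq = Counter(llm)
    expected = sum(human_freq[c] * llm_freq[c] for c in human_freq) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical theme codes for eight transcript segments (not real data).
human_codes = ["trust", "trust", "speed", "doubt", "speed", "trust", "doubt", "speed"]
llm_codes = ["trust", "speed", "speed", "doubt", "speed", "trust", "trust", "speed"]

kappa = cohens_kappa(human_codes, llm_codes)
print(f"raw agreement: 0.75, kappa: {kappa:.2f}")
```

Kappa discounts the agreement two raters would reach by chance, so it comes out lower than the raw match rate. What threshold counts as good enough is left open here, as it is in the program's own list of open questions.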
Scope parameters
Sample size: 20 minimum, 30 target
Output type: Qualitative findings report. No statistical inference claims.
Methodology: Public methodology document on nsstudiollc.com before any interviews are conducted.
Honest reporting: If interviews don't surface a meaningful calibration gap, that null finding is the report.
Conflicts: No vendor funding accepted. Partner companies receive findings; they do not influence design or interpretation.
Book chapters: Chapters 1-3 of forthcoming book based on this study.
What we'll publish
  • Public methodology and coding scheme before fieldwork begins.
  • Anonymized findings report: themes, illustrative quotes (with consent), and pattern observations across the cohort.
  • All LLM coding prompts, validation samples, and inter-rater agreement statistics published alongside the report.
  • Plain-English brief. A self-assessment any F500 leader can use to estimate their own calibration debt.
  • Book chapters 1-3: The Individual Effect – The Confidence Inflation Hypothesis (manuscript target Q4 2027)
Study 02 · The macro question · Book Part II

Priority Drift Under AI.

Does adopting generative AI tools accelerate the decay rate of stated F500 strategic priorities? That is, does AI make companies more scattered over time, not less?

Working hypothesis
Priority Drift. Companies that have moved hardest into generative AI may show shorter-lived stated strategic priorities: priorities that appear, get repeated for a quarter or two, and then quietly disappear. AI lowers the cost of starting things. That can create portfolio sprawl, not focus. Working hypothesis, not a pre-registered prediction. We're looking to see whether the pattern is real.
Method
  • Focused corpus of 200–300 quarterly earnings call transcripts: ~30 F500 companies across a mix of industries (tech, retail, financial services, healthcare), 2 calls per company per year, 5-year window.
  • Priorities extracted from each call using a published coding scheme. Frontier LLMs (Claude / GPT-4-class) used for first-pass extraction; researcher hand-validates a subset of every batch.
  • Priority survival tracked across quarters: how often does a stated priority reappear in the next call, the call after, and so on. Patterns are descriptive, not statistical.
  • AI adoption signals taken from public sources: 10-K language, vendor partnership announcements, and named AI investments in the earnings calls themselves.
  • Cross-company comparison: do high-AI-adoption companies show different priority patterns than peers in the same industry?
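
The survival tracking described above can be sketched as a simple descriptive count. This is a minimal illustration, not the program's coding scheme: it assumes each call has already been reduced to a set of normalized priority labels, and the function name and sample labels are hypothetical.

```python
def survival_by_lag(calls, max_lag=3):
    """Share of stated priorities that reappear `lag` calls later.

    calls: chronologically ordered list of sets of priority labels
    for one company. Returns {lag: rate}, or None for a lag with no data.
    """
    rates = {}
    for lag in range(1, max_lag + 1):
        stated = 0
        reappeared = 0
        # Only calls that have a later call `lag` steps ahead can be scored.
        for i in range(len(calls) - lag):
            stated += len(calls[i])
            reappeared += len(calls[i] & calls[i + lag])
        rates[lag] = reappeared / stated if stated else None
    return rates


# Hypothetical priorities from four consecutive calls (not real data).
calls = [
    {"cloud migration", "cost discipline"},
    {"cloud migration", "ai copilots"},
    {"ai copilots", "cost discipline"},
    {"ai copilots"},
]

rates = survival_by_lag(calls, max_lag=2)
print(rates)
```

Each rate is a plain proportion of stated priorities seen again at that lag, which matches the descriptive, non-statistical framing of the study. Comparing these rates between high-adoption companies and their industry peers would be a separate step.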
Scope parameters
Corpus size: 200–300 calls across ~30 companies
Coding: LLM-assisted extraction with hand validation. Inter-rater agreement reported on a 30-call validation sample.
Methodology: Public coding scheme on nsstudiollc.com before corpus coding begins.
Honest reporting: If high-AI-adoption companies don't show different priority patterns, the null finding is the report.
Output: Pattern report. Descriptive analysis. No claims of causal inference.
Book chapters: Chapters 4-6 of forthcoming book based on this study.
What we'll publish
  • Public coding scheme and LLM extraction prompts before corpus coding begins.
  • Anonymized priority dataset across the corpus, where partner agreements permit.
  • Pattern report: company-level and industry-level findings, illustrative cases (with sources cited).
  • Plain-English brief and a self-check any operator can run against their own company's last eight quarterly calls.
  • Book chapters 4-6: The Organizational Effect – Priority Drift Under AI (manuscript target Q4 2027)
Book development

From research to frameworks to published book.

The Calibration Program research feeds a complete book manuscript. Two core studies + two additional parts with frameworks, case studies, and governance tools = a complete guide to AI-era executive judgment.

Book title (working)

The Calibration Debt: How AI Changes Executive Judgment in Enterprise

Proposed structure

Part I · The Individual Effect
Chapters 1-3: The Confidence Inflation Hypothesis
Based on Study 01: 20-30 F500 executive interviews
Part II · The Organizational Effect
Chapters 4-6: Priority Drift Under AI
Based on Study 02: analysis of 200-300 earnings calls
Part III · What to Do About It
Chapters 7-9: The Calibration Audit, Governance Frameworks, Case Studies
Derived from participating companies + additional fieldwork
Part IV · The Future
Chapters 10-11: AI-Native Decision-Making, Next-Gen Enterprise Oversight
Author's synthesis and forward-looking frameworks

Book timeline

Q3 2026 – Q3 2027
Complete Study 01 and Study 02 fieldwork. Document case studies from participating companies.
Q4 2027
Manuscript completion. Parts I-II from research findings. Parts III-IV from frameworks and synthesis.
Q1 2028
Beta readers (20 participants from research program). Revision based on feedback.
Q2-Q4 2028
Publication. Speaking circuit. Partner company readouts with book frameworks applied to their data.

How this work actually gets done.

The Calibration Program is led by a single principal researcher, the founder of NS Studio, with frontier LLMs (Claude, GPT-4-class) used as a structured research assistant under direct human validation. We are open about this because it is the methodology, not a workaround.

What the human does. Designs the questions and the coding scheme. Conducts every executive interview personally. Hand-codes the validation subsets that anchor every batch of LLM-assisted analysis. Decides what the patterns mean. Writes every word of every published finding and every book chapter.

What the LLM does. Transcribes interview audio. Reads transcripts and extracts coded statements at scale, prompted with the published coding scheme. Drafts data tables for human review. Helps with the volume work that would otherwise require a research team.

What we publish so you can check the work. Every prompt used. Every coding scheme. Inter-rater agreement statistics between the human and the LLM on the validation subsets. The validation samples themselves. If the LLM and the human disagreed too much on a batch, that batch gets re-done. You can replicate the work, extend it, or disconfirm it.

This is what AI-native research looks like in 2026. The alternative, secretly using LLMs while pretending the work was done by hand, is what most consulting firms are doing right now. We'd rather just say what we're doing.

Questions we're still wrestling with.

We are publishing these openly because we don't have clean answers yet. If any of them are your problem, write to us. The program is better with more critics.

How do we get useful answers about decision quality from a single 60-minute interview, without the whole conversation collapsing into recency bias?

How do we handle the selection effect where the leaders most willing to participate may already be the most calibration-aware?

Is "stated priority" on an earnings call a clean enough construct, given that earnings calls are themselves a performance for analysts?

What's the right inter-rater agreement threshold between human and LLM coding before we trust an extraction batch, and how do we report it honestly?

How much of Study 02 should be one researcher hand-coding everything end-to-end, versus LLM-assisted with hand validation? Where does each method earn its keep?

What's the right way to share early findings without anchoring participants who haven't been interviewed yet?

How do we structure Parts III-IV of the book (frameworks, case studies, governance tools) to be useful without being prescriptive vendor theater?

Open call · Fall 2026 cohort

We are looking for two kinds of contributor.

The Calibration Program is funded by NS Studio and runs independently of vendor or platform sponsorship. Partner companies receive findings privately three months before any public release. All research participants receive early access to book chapters as they're drafted.

Study 01 participants

We're looking for 20–30 directors and VPs at Fortune 500 companies for a single 60-minute interview about how AI is changing your prioritization work.

  • One conversation, scheduled at your convenience
  • Anonymized in all published findings
  • You get the full findings report before public release
  • Early access to book chapters as they're drafted
  • No vendor, no sales pitch, no follow-up sequence

Most participants find the interview itself useful: it's a structured hour to think out loud about how AI is actually changing how you prioritize.

Volunteer for an interview →

Partner companies

We're looking for 5–8 Fortune 500 organizations willing to go deeper than a single interview, under standard mutual NDA:

  • 2–3 director/VP-level participants for Study 01 from your team
  • Optional: your company in the Study 02 corpus with permission to discuss findings
  • Permission to anonymize and aggregate

In return: a private benchmark report against the cohort, early access to all findings, advance copy of the book, and one full facilitated readout for your leadership team.

Become a partner →
Follow the program

Working notes, methodological updates, book progress, and early findings: posted as they happen.

Two channels. The personal one is where the research voice lives: methodological wrestling, early signals, public corrections, book development updates. The studio one is where firm-level news and engagement openings get posted.

Program timeline

Honest about when, not just whether.

Q3 2026 · now
Foundation phase: methodology drafts, coding scheme development, methods learning in public.
Open call for participants and partners runs through October 2026. Pilot interviews begin late Q3.
Q4 2026 · methodology published
Public methodology and coding schemes for both studies posted on nsstudiollc.com before any fieldwork begins.
The order matters. Methodology before data, coding scheme before extraction, scope before fieldwork.
Q1-Q3 2027 · fieldwork
Study 01 interviews. Study 02 corpus coding with hand-validated LLM extraction.
Quarterly methodology and progress notes published openly. No findings released during fieldwork.
Q4 2027 · partner readouts + manuscript
Private benchmark reports to partner companies. Manuscript completion (Parts I-IV). Three-month head start before public release.
Partners read findings against their own cohort, anonymized. Book manuscript enters beta reader phase.
Q1 2028 · beta readers
20 beta readers from research program participants. Manuscript revision based on feedback.
Research participants get first look at complete book. Frameworks tested against their real-world experience.
Q2-Q4 2028 · publication
Book publication + findings reports, plain-English briefs, all coding schemes, prompts, and validation samples.
Partner companies receive advance copies 3 months before public release. Full research appendix published open-access. Speaking circuit begins.

Want to be in the research program?

If you're a director or VP at a Fortune 500 company and you make AI-assisted prioritization decisions every week, we want to hear from you. One 60-minute interview. Anonymized. Early access to book chapters as they're drafted.

Participate in research + book
Or Ask about the book

DISCLOSURES · The Calibration Program is funded by NS Studio LLC and led by a single principal researcher with frontier LLMs (Claude / GPT-4-class) used as a structured research assistant under direct human validation. We accept no vendor or platform funding for either study. Partner companies participate under standard mutual NDAs and have no influence over study design, methodology, analysis, or publication decisions. Null findings will be published under the same timeline as positive findings. No participating executive's identity will be revealed in any public artifact without explicit written consent. All interview audio is transcribed via privacy-respecting tools, stored encrypted, and deleted at the close of the program. Research findings feed a forthcoming book, The Calibration Debt: How AI Changes Executive Judgment in Enterprise (manuscript target Q4 2027, publication Q2-Q4 2028). Research participants receive early access to book chapters. Partner companies receive advance copies 3 months before public release. Status: Open call · Fall 2026 cohort · methodology drafts in development.