Designing for calibrated trust and verification in agentic AI video conferencing tools

2026 (Work in Progress)

Project Type

Corporate Sponsorship

Duration

January – June 2026

Tools

Figma, Elicit, Claude

Deliverables

Interactive prototypes, adoption framework

Team

Fourward Team: 2 UX Researchers and 2 Product Designers
Microsoft XSD Team: 3 Researchers

Role

Design Lead

/ Overview

What happens when nobody double-checks?

Working with Microsoft's Customer Experience Design (XSD) team as part of a 6-month HCDE capstone, I researched how users decide when and how to verify AI-generated meeting outputs — and designed for calibrated trust, not maximum trust.

Workplace video conferencing is one of the first contexts where companies are deploying AI agents at scale. Microsoft Copilot automatically generates meeting summaries, action items, and transcripts — but when people don't verify these outputs, consequences compound: inaccurate records become organizational memory, tasks get assigned that were never agreed upon, and leaders make strategic decisions from incomplete summaries.

Project Status

This case study is updated as the project progresses. Research phases (lit review, expert interviews, user interviews, co-design workshop) are complete. Concept validation is currently underway. Final presentation to Microsoft in June 2026.

AI generates → User skims/skips → User shares.
Our goal: design the verification step back into the workflow.

12 expert interviews on trust and AI in the workplace

7 user interviews on real verification behavior

30+ academic papers synthesized on trust, automation, and human-AI interaction

2 design prototypes addressing verification gaps

/ Prototype — Ambient Meeting Awareness Panel

Mid-fidelity prototype, work in progress. Currently in a usability study with 4 participants.

During meeting

Ambient Awareness Panel concept: a non-disruptive, live-tracking AI during meetings.

End of meeting

End-of-meeting co-review: host-triggered alignment session before the meeting closes.

/ The Problem

AI meeting tools are built for trust. Nobody designed for verification.

Microsoft Copilot and similar tools present all meeting outputs — summaries, action items, decisions — with equal confidence, whether they're accurate or fabricated. Our research identified two compounding gaps in how users interact with these outputs.

Verification Gap 1

Navigation friction

Moving from a claim in the summary to its source in the transcript is slow and disorienting. UTC timestamps, fragmented tool ecosystems, and a lack of inline links mean most verification stops at the memory check, the least reliable layer.

Design opportunity

Make the path from claim to source fast enough that verification becomes a natural part of reading, not a separate task.

Verification Gap 2

Fabrication detection

AI confidently asserts content that was never said. Because all outputs look identical regardless of confidence, users have no signal to know what needs checking. Subtle fabrications pass undetected.

Design opportunity

Surface AI confidence levels so users know what to verify—without making every output feel uncertain.

Microsoft Teams Copilot generates AI summaries after users' meetings. These artifacts often go unverified or are only lightly skimmed.

The verification cascade

Users verify through four escalating layers: memory check (effortless) → transcript search (moderate) → recording rewatch (high effort) → human escalation (highest + socially costly). Most verification stops at Layer 1 or 2. Subtle fabrications — which require Layer 3 or 4 to catch — pass undetected.
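The cascade above can be written down as a small model. This is only an illustrative sketch: the `Layer` class and the `is_caught` and `layers_skipped` helpers are hypothetical names made up for this example, not part of Copilot or of our prototypes. It simply encodes the observation that an error is caught only when the user climbs at least as far as the layer needed to detect it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    number: int
    name: str
    effort: str


# The four escalating layers, as described in the findings.
CASCADE = [
    Layer(1, "memory check", "effortless"),
    Layer(2, "transcript search", "moderate"),
    Layer(3, "recording rewatch", "high effort"),
    Layer(4, "human escalation", "highest + socially costly"),
]


def is_caught(required_layer: int, stopped_at: int) -> bool:
    """Hypothetical rule: an error is caught only if the user reaches
    at least the layer required to detect it."""
    return stopped_at >= required_layer


def layers_skipped(stopped_at: int) -> list[str]:
    """Names of the layers the user never reached."""
    return [layer.name for layer in CASCADE if layer.number > stopped_at]


# A subtle fabrication needing Layer 3, with a user who stops at Layer 2.
print(is_caught(required_layer=3, stopped_at=2))
print(layers_skipped(stopped_at=2))
```

Framed this way, the design goal is to lower the effort of the upper layers rather than to ask users to climb higher on their own.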

/ Research

A five-phase research and design process

I designed our methodology to move from understanding how verification works today, to generating ideas with users, to prototyping and validating solutions — before final delivery to Microsoft in June 2026.

✅ Phase 01
Understand

Literature review + Expert & User Interviews

I synthesized 30+ papers on trust, automation, and AI, then conducted 12 expert interviews and 7 user interviews to understand how verification works (and fails) in real workplace contexts.

✅ Phase 02
Generate

Co-Design Workshop

4-participant co-design session (2 hrs) using a 9-panel storyboard scenario. Generated 20+ ideas across before, during, and after meeting phases. Synthesized into two confirmed design directions.

✅ Phase 03
Prototype

Lo-fi interactive prototypes, human-created and AI-enhanced

Sketched an Ambient Meeting Awareness Panel (HMW 1) from co-design ideas and prototyped in Figma Make. Confidence Transparency prototype (HMW 2) in progress.

🔄 Phase 04
Validate

Concept Validation

8–10 participants, 45–60 min sessions. Testing whether our design directions solve the right problems in the right way. Currently underway — findings will inform prototype iterations.

🔜 Phase 05
Test

Usability Testing + Final Delivery

Semi-structured think-aloud sessions with returning participants. Final prototypes, design principles, and recommendations delivered to Microsoft XSD. Capstone showcase: June 2026.

/ Key Findings

What our research sessions taught me about verification

Finding 01

Institutional trust creates the verification gap.

What we heard

"No worries! It's Microsoft Copilot! It usually gets these things right."

Users skip verification not because they trust the AI's accuracy, but because they trust Microsoft as an institution.

The implication

Institutional trust is inherited, not earned. It leads to overtrust: verification skipped when it should happen. Building AI that signals uncertainty doesn't fight institutional trust; it calibrates it.

Design response

Surface uncertainty where it exists. Users should be able to trust the parts that are reliable, and quickly identify the parts that need checking.

The design question

If users trust the brand, not the output — how do we design for appropriate skepticism without destroying the confidence that makes AI adoption possible?

Finding 02

Verification happens only when four conditions align.

Prior knowledge

Users verify when they have something to check against — they were in the meeting and remember what was said.

Speed

Verification must be fast — under 30 seconds. Anything slower gets abandoned. Navigation friction is the primary cause of verification failure.

High stakes

External sharing, client meetings, and consequential decisions prompt verification. Internal or low-stakes outputs get skimmed or skipped entirely.

Personal responsibility

Users verify when they feel personally accountable for the output being sent. When accountability is diffuse or shared, verification erodes.
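One way to read this finding is as a checklist where every condition must hold. The sketch below is a simplification for illustration only: `VerificationContext`, `missing_conditions`, and `likely_to_verify` are hypothetical names, and real behavior is less binary than a set of booleans. The 30-second limit is the only number taken from the research.

```python
from dataclasses import dataclass

MAX_SECONDS = 30  # from the finding: verification must take under 30 seconds


@dataclass
class VerificationContext:
    attended_meeting: bool        # prior knowledge: something to check against
    seconds_to_verify: float      # speed: time to get from claim to source
    high_stakes: bool             # e.g. external sharing, client meetings
    personally_accountable: bool  # the user feels responsible for the output


def missing_conditions(ctx: VerificationContext) -> list[str]:
    """Return the names of the conditions that are not met."""
    missing = []
    if not ctx.attended_meeting:
        missing.append("prior knowledge")
    if ctx.seconds_to_verify >= MAX_SECONDS:
        missing.append("speed")
    if not ctx.high_stakes:
        missing.append("high stakes")
    if not ctx.personally_accountable:
        missing.append("personal responsibility")
    return missing


def likely_to_verify(ctx: VerificationContext) -> bool:
    """Assumed rule: verification happens only when all four conditions align."""
    return not missing_conditions(ctx)


# A high-stakes summary where finding the source takes 45 seconds.
slow = VerificationContext(True, 45, True, True)
print(likely_to_verify(slow), missing_conditions(slow))
```

Of the four conditions, speed is the one an interface can change most directly, which is why navigation friction became a design target.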

Finding 03

Imposed AI creates affective resistance before the tool is ever used.

What we observed

When AI tools are introduced institutionally rather than adopted voluntarily, workers develop affective responses (disgust, guilt, ambivalence) that no interface improvement can fix.

"The guy driving the fancy BMW wants us to use AI, but he doesn't even know what for."

"You cannot turn it off right now. It's so annoying."

The implication

Affective trust explains resistance and the hard limits people draw around what AI is allowed to do. When people can't opt out, resentment builds upstream of the interface.

Expert interviews confirmed that accountability is ~90% individual today — but expected to shift as AI tools become institutionally mandated, creating an accountability gap that compounds the resentment.

The response

To align with Microsoft's business goal of AI adoption, I will deliver a worker accountability and protection framework for AI-generated workflows. Workers feel safer using AI when they know their rights and work are protected.

Microsoft can share this proposed framework with its customers (organizations) and investigate whether these policies lead to higher trust and, in turn, higher organizational AI adoption.

The real design question

If adoption is mandated but trust must be earned — how do we design the first interaction so it doesn't feel like surveillance?

/ Design Direction

Two prototypes, two verification gaps

From co-design synthesis, we confirmed two focus areas — one addressing the during-meeting phase (preventing verification failure upstream) and one addressing the after-meeting phase (detecting fabrication when it occurs). Together they span both verification gaps.

HMW 1 — During meeting
Ambient Meeting Awareness Panel
How might we

Help workers align during meetings so the verification process is easier afterward, preventing fabrication from entering the record in the first place.

The concept

A collapsible sidebar in Teams that passively captures decisions, action items, open disagreements, and agenda progress in real time. Never interrupts. User controls visibility. Transitions into post-meeting verification view at meeting end.

Key principles

Passive over active. Edges over middle (alignment at start/end, not mid-conversation). Host-controlled escalation for the end-of-meeting co-review.

Ambient Meeting Awareness Panel concept: a non-disruptive, live-tracking AI during meetings.

End-of-meeting review flow: aligning meeting participants for easier post-meeting verification.

The design challenge

How do we design a tool that captures everything without becoming a distraction itself?

HMW 2 — After meeting
Confidence Transparency
How might we

Communicate AI confidence levels to workers who need to verify outputs — so they know what needs checking without having to read everything at maximum skepticism.

The concept

Inline confidence tiering in the post-meeting summary — distinguishing direct quotes from inferred content, with source-linked timestamp chips and a "no source found" flag for fabrication signals.
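A minimal sketch of how this concept might be modeled as data. Everything here is an assumption made for illustration: the `Tier` and `Claim` types, the tiering rule, and the chip format are hypothetical and do not describe how Copilot works. The sketch also assumes timestamps are shown relative to the meeting start rather than in UTC, since UTC timestamps were a source of navigation friction in our research.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Tier(Enum):
    DIRECT_QUOTE = "direct quote"
    INFERRED = "inferred"
    NO_SOURCE_FOUND = "no source found"


@dataclass
class Claim:
    text: str
    source_offset_s: Optional[int] = None  # seconds from meeting start; None if no source
    verbatim: bool = False                 # True if the claim quotes the transcript directly


def tier_for(claim: Claim) -> Tier:
    """Hypothetical tiering rule: no source means a fabrication signal;
    otherwise distinguish direct quotes from inferred content."""
    if claim.source_offset_s is None:
        return Tier.NO_SOURCE_FOUND
    return Tier.DIRECT_QUOTE if claim.verbatim else Tier.INFERRED


def timestamp_chip(offset_s: int) -> str:
    """Format a meeting-relative offset as mm:ss for a source-linked chip."""
    minutes, seconds = divmod(offset_s, 60)
    return f"{minutes:02d}:{seconds:02d}"


def render(claim: Claim) -> str:
    """Render a summary line with its tier and, where available, a timestamp chip."""
    tier = tier_for(claim)
    if tier is Tier.NO_SOURCE_FOUND:
        return f"{claim.text} [no source found]"
    return f"{claim.text} [{tier.value} @ {timestamp_chip(claim.source_offset_s)}]"


print(render(Claim("Ship the beta on Friday", source_offset_s=754, verbatim=True)))
print(render(Claim("Budget was approved")))
```

In this sketch a claim without a source is never hidden. It stays in the summary with a visible flag, so the reader knows exactly which line to check.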

Status

Ideation complete. Prototype in progress, targeting completion by the end of May 2026, in parallel with concept validation sessions for HMW 1.

[ Confidence Transparency — prototype coming soon 👀 ]

Confidence Transparency prototype: post-meeting summary with inline confidence tiering and source-linked timestamps.

The design challenge

How do we signal uncertainty without making everything feel uncertain? If the AI signals "low confidence" too often, users will assume the tool is not capable, which would hinder adoption.

/ What I'm Learning

Some things I've learned during this project

Research depth changes what you design

The difference between designing for a brief and designing from evidence is visible in every decision: why the panel is passive, why it doesn't alert, why the host triggers the review.

Designing for AI requires designing for trust, not just usability

Standard usability heuristics — efficiency, learnability, error recovery — don't fully account for the trust dynamics in AI systems. A feature can be perfectly usable and still erode appropriate reliance. We're designing for calibration, not just interaction.

Co-design is a great design method, in addition to being a research method

The co-design workshop generated 20+ ideas in 2 hours that we wouldn't have reached through desk research alone. More importantly, the workshop revealed which problems participants cared about — the before/during/after expansion came directly from what participants wanted to design for.

Speculative work requires more rigor

Because we're designing for a future AI capability (confidence tiering in Copilot), every design decision needs stronger justification than product work against existing systems. We can't point to a shipped feature — we have to point to research.

/ thank you for stopping by!

Be in touch! I promise I won't bite!

© 2026 by yours truly
