Englander Institute for Precision Medicine

AI-Powered Laryngoscopy: Exploring the Future With Google Gemini.

Title: AI-Powered Laryngoscopy: Exploring the Future With Google Gemini.
Publication Type: Journal Article
Year of Publication: 2025
Authors: Setzen SA, Andreadis K, Elemento O, Rameau A
Journal: Laryngoscope
Date Published: 2025 Feb 20
ISSN: 1531-4995
Abstract

Foundation models (FMs) are general-purpose artificial intelligence (AI) neural networks trained on massive datasets, including code, text, audio, images, and video, to handle myriad tasks ranging from generating text to analyzing images or composing music. We evaluated Google Gemini 1.5 Pro, currently the multimodal FM with the largest token context window and the best-performing commercial model for video analysis, on interpreting laryngoscopy frames and videos sourced from Google Images and YouTube. Gemini recognized the procedure as laryngoscopy in 87/88 frames (98.9%) and 15/15 video-laryngoscopies (100%), accurately diagnosed a pathology in 55/88 frames (62.5%) and 3/15 videos (20.0%), identified lesion sides in 58/88 frames (65.9%) and 6/15 videos (40.0%), and narrated two operative video-laryngoscopies without fine-tuning. These findings suggest that Gemini 1.5 Pro has significant potential for analyzing laryngoscopy and that FMs may serve as clinical decision support tools for complex expert tasks in otolaryngology. LEVEL OF EVIDENCE: 3.
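As a quick arithmetic check, each percentage in the abstract is simply correct/total × 100, rounded to one decimal place. The short Python sketch below recomputes them from the reported counts; the `RESULTS` dictionary and the `percent` helper are illustrative names introduced here, not part of the study's materials or analysis code.

```python
# Hypothetical helper for re-deriving the abstract's percentages.
# Counts are copied from the abstract: (task, input type) -> (correct, total).
RESULTS = {
    ("procedure recognized", "frames"): (87, 88),
    ("procedure recognized", "videos"): (15, 15),
    ("pathology diagnosed", "frames"): (55, 88),
    ("pathology diagnosed", "videos"): (3, 15),
    ("lesion side identified", "frames"): (58, 88),
    ("lesion side identified", "videos"): (6, 15),
}


def percent(correct: int, total: int) -> float:
    """Return correct/total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * correct / total, 1)


if __name__ == "__main__":
    # Print one line per task/input type with the recomputed percentage.
    for (task, source), (correct, total) in RESULTS.items():
        print(f"{task:<24} {source:<7} {correct:>3}/{total:<3} = {percent(correct, total):5.1f}%")
```

Running the script prints one line per task; for example, 6/15 videos gives 40.0%, in the same one-decimal format as the other figures.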

DOI: 10.1002/lary.32089
Alternate Journal: Laryngoscope
PubMed ID: 39976345
Grant List:
K76 AG079040 / AG / NIA NIH HHS / United States
OT2 OD032720 / CD / ODCDC CDC HHS / United States

Weill Cornell Medicine
Englander Institute for Precision Medicine
413 E 69th Street
Belfer Research Building
New York, NY 10021