AI-Powered Laryngoscopy: Exploring the Future With Google Gemini. | Englander Institute for Precision Medicine

Title	AI-Powered Laryngoscopy: Exploring the Future With Google Gemini.
Publication Type	Journal Article
Year of Publication	2025
Authors	Setzen SA, Andreadis K, Elemento O, Rameau A
Journal	Laryngoscope
Date Published	2025 Feb 20
ISSN	1531-4995
Abstract	Foundation models (FMs) are general-purpose artificial intelligence (AI) neural networks trained on massive datasets, including code, text, audio, images, and video, to handle myriad tasks from generating texts to analyzing images or composing music. We evaluated Google Gemini 1.5 Pro, currently the largest token context window multimodal FM and best-performing commercial model for video analysis, for interpreting laryngoscopy frames and videos from Google Images and YouTube. Gemini recognized the procedure as laryngoscopy in 87/88 frames (98.9%) and in 15/15 video-laryngoscopies (100%), accurately diagnosed a pathology in 55/88 frames (62.5%) and 3/15 videos (20.0%), identified lesion sides in 58/88 frames (65.9%) and 6/15 videos (40%) and narrated two operative video-laryngoscopies without fine-tuning. Findings suggest that Gemini 1.5 Pro shows significant potential for analyzing laryngoscopy, demonstrating the potential for FMs as clinical decision support tools in complex expert tasks in otolaryngology. LEVEL OF EVIDENCE: 3.
DOI	10.1002/lary.32089
Alternate Journal	Laryngoscope
PubMed ID	39976345
Grant List	K76 AG079040 / AG / NIA NIH HHS / United States; OT2 OD032720 / CD / ODCDC CDC HHS / United States