Episode Summary
In the first episode of Bytesight, Dr. Uwe Schalles, former diagnostics and data integration expert at Roche and Ventana, joins the conversation to explore the foundational role of data curation in the development of AI for digital pathology. Drawing from decades of experience in the diagnostics industry, he discusses the challenges and realities of sourcing high-quality data and bridging infrastructure gaps in the transition to AI-powered pathology.
The episode moves beyond surface-level hype and sheds light on the behind-the-scenes complexities of building usable datasets, highlighting issues that many AI developers overlook.
Key Insights
- Curation Over Collection: Building AI-ready datasets is not just about volume. Dr. Schalles explains the labor-intensive process of curating, harmonizing, and structuring data to meet algorithm development needs, something he experienced firsthand while working on AI models trained on H&E-stained slides.
- Infrastructure Disparity: Unlike radiology, which benefits from standard digital capture systems, digital pathology still relies heavily on manual workflows and uneven scanner adoption. This makes acquiring consistent, large-scale digital slide data a technical and logistical challenge.
- Diverse Data Sources: Dr. Schalles outlines how data must be sourced from academic hospitals, CROs, internal clinical trials, and even data brokers. Each source comes with different legal, ethical, and quality considerations, further complicating integration.
- Collaborative Pilots Matter: He highlights the importance of pilot projects and early collaborations, including with PAICON, to prototype new ways of accessing and sharing pathology data, ultimately enabling scalable AI development.
Why It Matters
This episode sets the tone for the Bytesight series by unpacking the real, technical prerequisites of applying AI in healthcare. Dr. Schalles’ insights offer a practical roadmap for startups, developers, and clinical institutions navigating the bottlenecks between data ambition and data reality.