Ryan Louie
rylouie@cs.stanford.edu
Computer Science Department
Stanford University
I am a postdoc in Stanford's Computer Science department, affiliated with the Stanford NLP and HCI groups and advised by Diyi Yang and Emma Brunskill. I received my PhD from Northwestern's Technology and Social Behavior program, where I worked closely with Profs. Haoqi Zhang and Darren Gergle on research in Human-AI Interaction and Social Computing. Prior to that, I received my B.S. in Robotics from Olin College of Engineering.
My research advances approaches for collaborating with, adapting, and evaluating AI systems that upskill people in societally important domains such as mental health care and creativity. First, I develop human-AI collaboration interfaces that help people express their personal intentions and knowledge and iteratively shape generative AI and language model behavior through intelligible forms of feedback. Second, I create domain-adaptation techniques that learn efficiently from limited supervision while ensuring outputs remain faithful to quality standards. Third, I advance evaluation in human-centered AI by creating frameworks and conducting experiments to assess how AI systems foster human outcomes. Addressing these challenges together enables a rigorous science of AI upskilling: understanding how factors in the human, the AI model, and the interface between them impact human-centered outcomes.
I am on the job market in Fall 2025! Please contact me about full-time opportunities in academia and industry in the areas of Human-Centered AI and LLMs for Mental Health.
Research Areas
Human-AI Collaboration Interfaces
I develop human-AI collaboration tools so that AI can upskill people in personally meaningful and specialized domains. These domains require AI to be sensitive to people's personal ideas and domain knowledge. To empower a novice composer to make a song feel "sadder", or a clinical expert to create a simulated patient that displays "stubborn behaviors", we need to bridge the gap between human intentions and AI model behavior. This research develops interactive tools that let users communicate feedback in forms an AI system can act on to adjust its behavior, fostering human agency when collaborating with AI. My initial work on Steering Interfaces for Novice Co-Creation decomposed generation into manageable subtasks and guided outputs along semantic dimensions, significantly increasing users' trust, control, and sense of authorship. Building on this, I developed Roleplay-doh, a human-AI collaboration pipeline that elicits expert knowledge as natural-language principles, enabling senior mental health counselors to co-design customized AI patients as simulated practice partners for novice counselors.
Published at EMNLP 2024 (Main).
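To make the principle-based steering idea concrete, here is a minimal sketch of how expert-elicited natural-language principles could condition an LLM-simulated patient; the class and method names below are illustrative placeholders, not the actual Roleplay-doh implementation.

```python
# Illustrative sketch only: expert-authored, natural-language principles are
# composed into the system prompt that conditions an LLM-simulated patient.
# SimulatedPatient and its methods are hypothetical, not Roleplay-doh's code.
from dataclasses import dataclass, field


@dataclass
class SimulatedPatient:
    persona: str                                         # short patient description
    principles: list[str] = field(default_factory=list)  # expert-elicited rules

    def add_principle(self, principle: str) -> None:
        """Counselors add principles iteratively while reviewing role-play transcripts."""
        self.principles.append(principle)

    def system_prompt(self) -> str:
        """Compose persona and principles into the system message for the LLM."""
        rules = "\n".join(f"{i + 1}. {p}" for i, p in enumerate(self.principles))
        return (
            "You are role-playing a therapy patient.\n"
            f"Persona: {self.persona}\n"
            "Always follow these principles written by a senior counselor:\n"
            f"{rules}"
        )


patient = SimulatedPatient(persona="College student reluctant to discuss drinking habits.")
patient.add_principle("Give short, guarded answers until the counselor builds rapport.")
patient.add_principle("Act stubborn when directly confronted, rather than agreeing quickly.")
print(patient.system_prompt())  # sent as the system message to whichever chat LLM is used
```

Keeping the principles as plain natural language is what lets domain experts, rather than engineers, steer the simulated patient's behavior.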
Domain Adaptation for NLP when Expert Supervision is Scarce
While collaboration interfaces enable experts to steer a pre-trained model's behavior, interfaces alone are insufficient when the task itself requires learning from expert judgment, such as generating feedback that mirrors expert supervision; the model must internalize expert reasoning patterns. Standard fine-tuning approaches assume access to large labeled datasets, but in specialized domains, expert annotation is expensive and expert reasoning is nuanced. The technical challenge is developing adaptation techniques that extract maximum learning signal from limited expert supervision while ensuring outputs remain faithful to expert standards. My work on Faithful Feedback Generation co-designed the feedback task with senior psychotherapists and introduced a domain-grounded fine-tuning technique that bootstraps from expert annotations to self-improve a supervised fine-tuned language model, significantly reducing how often low-quality feedback is generated.
Published at ACL 2024 (Main).
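As a rough illustration of the bootstrapping idea, the sketch below grows a small expert-annotated training set by keeping only self-generated feedback that passes an expert-derived quality check; every name in it (finetune, generate_feedback, passes_expert_rubric) is a hypothetical stand-in, not the technique's actual implementation.

```python
# Minimal sketch of a bootstrapped self-improvement loop over scarce expert
# supervision. The callables passed in are hypothetical placeholders for the
# real training, generation, and quality-checking machinery.
from typing import Any, Callable


def bootstrap_finetune(
    finetune: Callable[[list], Any],               # trains a model on (session, feedback) pairs
    generate_feedback: Callable[[Any, Any], str],  # (model, session) -> candidate feedback
    passes_expert_rubric: Callable[[str], bool],   # does the feedback meet expert standards?
    expert_examples: list,                         # small expert-annotated seed set
    unlabeled_sessions: list,                      # counseling sessions without expert feedback
    rounds: int = 3,
) -> Any:
    train_set = list(expert_examples)
    model = finetune(train_set)                    # initial supervised fine-tuning
    for _ in range(rounds):
        candidates = [(s, generate_feedback(model, s)) for s in unlabeled_sessions]
        accepted = [(s, f) for s, f in candidates if passes_expert_rubric(f)]  # keep faithful outputs
        train_set.extend(accepted)                 # grow the training set
        model = finetune(train_set)                # self-improve on the expanded set
    return model
```

The filtering step is what keeps the loop anchored to expert standards: unlabeled sessions supply scale, while the rubric derived from expert annotations supplies faithfulness.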
Evaluation of Human-centered NLP and AI Systems
Developing effective interfaces and adapting models to specialized domains requires knowing how system design decisions causally impact human outcomes. In NLP and AI research, intrinsic evaluations of model outputs via automated or annotator judgments usually suffice. Human-centered NLP and AI research, however, also needs extrinsic evaluations, which ask how the system helps human users achieve their goals, using methods developed in HCI and the experimental social sciences. To address the challenge of evaluating diverse human-AI systems in the LLM era, I developed SPHERE, an evaluation card that scaffolds five key dimensions to guide comprehensive evaluation design and documentation. I also developed the Expressive Communication framework to jointly evaluate how ML model and interface design choices impact creative expression outcomes in AI-assisted music composition. These contributions to AI evaluation inform my work on training counselors via LLMs, including a randomized controlled study with 90+ novice counselors that found that practice with LLM-simulated patients combined with structured feedback produced meaningful gains in counseling skills, while practice alone degraded empathy.
Under review at CHI 2026.
Published at IUI 2022.
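To show what an extrinsic evaluation looks like in code, here is a minimal sketch of a between-condition comparison of skill gains, in the spirit of the randomized study above; the function and variable names are illustrative, and no study data are included.

```python
# Minimal sketch of an extrinsic evaluation: instead of scoring model outputs in
# isolation, compare a human outcome measure (e.g., rated counseling-skill gain)
# between randomized conditions. Names are illustrative placeholders.
from statistics import mean

from scipy import stats


def compare_conditions(gains_practice_plus_feedback: list[float],
                       gains_practice_only: list[float]) -> dict:
    """Welch's t-test on per-participant skill gains across the two conditions."""
    t, p = stats.ttest_ind(gains_practice_plus_feedback, gains_practice_only,
                           equal_var=False)
    return {
        "mean_gain_treatment": mean(gains_practice_plus_feedback),
        "mean_gain_control": mean(gains_practice_only),
        "t_statistic": t,
        "p_value": p,
    }
```

The point of the sketch is the unit of analysis: the outcome is a change in the human's skill, not a score on the model's text.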