Expanding the How: Methods from HCI for HCLLMs

Finally, in addition to providing new perspectives on how we ought to design HCLLMs, HCI also offers a set of methods that researchers and practitioners can employ in pursuit of these goals. Given HCI’s interdisciplinary roots, researchers in the field draw on many “ways of knowing,” or methodological orientations. In this section, we discuss three classes of research methods (experimental research, participatory approaches, and qualitative inquiry) that are particularly applicable to HCLLMs and are less commonly used within NLP. For a comprehensive overview of other methodological practices in HCI, we refer readers to Olson & Kellogg (2014).

Experimental Research

An important step in designing and evaluating HCLLMs is measuring their impact on human outcomes, rather than only characterizing model behavior. While traditional methods in NLP, such as benchmarking, provide a fast and scalable way to evaluate models, the resulting numbers are often divorced from the contexts in which LLMs are applied and do not capture their impact in practice (McIntosh et al., 2024; Raji et al., 2021). Unlike observational studies, which can tell us whether variables are related, experimental research helps reveal many types of relationships (e.g., association, causality) between variables of interest (Gergle & Tan, 2014). Experiments vary in how much control researchers have, ranging from laboratory experiments to field experiments, which are conducted in real-world settings but often involve many factors that researchers cannot control (Oulasvirta, 2008).

For HCLLMs, experimental methods enable researchers to move beyond measuring what models can do to understanding what they actually do and why. First, experimental methods allow researchers to better understand the mechanisms that explain observed patterns of behavior. For example, to better understand why users become dependent or overreliant on LLMs, recent work (Cheng et al., 2025) focused on the construct of social sycophancy, finding through a series of laboratory experiments that users are more likely to trust and use sycophantic models. Isolating specific mechanisms can then inform design interventions that are first validated intrinsically (i.e., with benchmarks) and then extrinsically with additional experiments. Second, experimental methods can quantify the utility of LLMs when deployed in practice, allowing researchers to evaluate outcomes rather than capabilities. A number of field experiments have begun to examine LLMs’ impacts across domains including education (Cuna & Shen, 2025; Wang et al., 2025), software engineering (Becker et al., 2025), and healthcare (Korom et al., 2025). For example, Becker et al. (2025) challenge the assumption that LLMs increase developer productivity, demonstrating through a field experiment that using these tools increased the time it took developers to complete their tasks. These findings complicate our understanding of models and point to the exogenous factors (e.g., institutional endorsement, workflow integration, user onboarding) that affect outcomes. Overall, moving from what a model can do within a clearly scoped benchmark setting to how it impacts users or society writ large, and why, is a critical contribution from HCI’s methodological toolkit that can complement traditional NLP evaluations.
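
To make the logic of a randomized comparison concrete, the following minimal sketch simulates a hypothetical two-condition experiment and estimates the effect of LLM assistance on task completion time with a permutation test. The data are synthetic, and the names (`permutation_test`, `control_times`, `treatment_times`), sample sizes, and distributions are illustrative assumptions; none of it reproduces the design or results of the studies cited above.

```python
import random
import statistics


def permutation_test(control, treatment, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed_difference, p_value), where the difference is
    mean(treatment) - mean(control).
    """
    rng = random.Random(seed)
    observed = statistics.mean(treatment) - statistics.mean(control)
    pooled = list(control) + list(treatment)
    n_control = len(control)
    extreme = 0
    for _ in range(n_permutations):
        # Re-label participants at random, as if condition had no effect.
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[n_control:]) - statistics.mean(pooled[:n_control])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Synthetic, illustrative data only: task completion times in minutes.
rng = random.Random(42)
participants = list(range(40))
rng.shuffle(participants)  # random assignment to conditions
control_ids, treatment_ids = participants[:20], participants[20:]
control_times = [rng.gauss(60, 10) for _ in control_ids]      # hypothetical: no LLM
treatment_times = [rng.gauss(70, 10) for _ in treatment_ids]  # hypothetical: with LLM

effect, p = permutation_test(control_times, treatment_times)
print(f"Estimated effect: {effect:+.1f} min, p = {p:.4f}")
```

Because participants are randomly assigned to conditions, a difference between the groups can be attributed to the manipulation rather than to pre-existing differences between participants, which is what separates this analysis from an observational comparison.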

Participatory Approaches

Building HCLLMs requires us to engage with the communities that use and are impacted by these technologies. Participatory research methodologies offer a useful way to work toward this engagement. Rather than prescribing a fixed methodology, “participatory research” delineates an orientation towards involving relevant stakeholders in the knowledge creation process (Bergold & Thomas, 2012). Examples of methods that fall under this broader umbrella include participatory action research (Hayes, 2011) and community-based participatory research (Unertl et al., 2016; Wallerstein et al., 2017). For a more extensive list of participatory research frameworks, we refer the reader to Vaughn & Jacquez (2020). Despite the many variants of participatory research, a unifying emphasis is on fostering a democratic and inclusive process that treats stakeholders or community members as equal research collaborators rather than subjects (Bergold & Thomas, 2012; Vaughn & Jacquez, 2020).

Participatory approaches have been adopted in creating machine learning solutions, such as developing context-specific models for detecting feminicide (Suresh et al., 2022) or providing frameworks for communities to articulate algorithmic policies (Lee et al., 2019). Tseng et al. (2025) provide an example of how participatory research approaches can be applied to designing HCLLMs. Through interviews with different stakeholders in the journalism ecosystem (e.g., reporters, editors, executives), Tseng et al. (2025) articulate how LLMs must be designed to address journalists’ needs, such as using an open-source model that can be fine-tuned for their tasks rather than relying on commercial offerings.

While we have listed exemplars of work that have adopted participatory approaches, it is important to state that participation is neither a panacea for the many challenges of creating HCLLMs nor easy to institute in practice. These methods require building trust with communities, which can be a long-term process requiring significant time investment (Le Dantec & Fox, 2015). Ensuring that participatory approaches are equitable in practice further requires centering communities, especially those who do not hold positions of power, and ensuring that research tangibly benefits these individuals rather than acting as an extractive force (Harrington et al., 2019).

Qualitative Inquiry

Qualitative methods encompass a broad range of approaches, including ethnographic studies, interviews, participatory design workshops, thematic analysis, and other interpretive techniques (Blandford et al., 2016). For HCLLMs, existing work has also analyzed users’ chat histories with models to extract insights such as common interaction patterns or the high-level values reflected in model responses (Huang et al., 2025; Tamkin et al., 2024). While quantitative measures reveal how well models perform or what users do, qualitative approaches provide a deeper understanding of why and how individuals think about, interact with, and make sense of these systems. As Geertz (2008) illustrates through the notion of “thick description,” qualitative methods enable researchers to capture the meanings, contexts, and social dynamics that underlie behavior, rather than merely documenting surface-level patterns.

Qualitative methods offer several particularly valuable contributions to HCLLMs. First, they can uncover needs and harms in underrepresented populations, revealing how specific groups engage with LLMs and the nuanced benefits and harms they experience, insights that aggregate metrics often obscure (e.g., @ma2024evaluating’s interviews with LGBTQ+ individuals exploring LLMs in mental health support; @qadri2025case’s workshops with South Asian participants examining cultural misrepresentations in AI). Second, qualitative methods help generate new design insights and hypotheses by surfacing unexpected use patterns, workarounds, and unmet needs that can inform future system design. Third, by providing rich descriptions of how users interact with these models, qualitative work enables a contextual understanding of how LLM interactions are embedded in broader social and work practices, revealing dependencies and consequences that might otherwise be missed. For example, Tamkin et al.’s (2024) analysis of user interactions with Claude revealed areas of unsafe behavior that the existing guardrails were not designed to catch, allowing the authors to refine their system design through empirical insights. Finally, qualitative approaches complement and enrich quantitative findings by helping triangulate results and providing interpretive depth that explains why certain patterns emerge [@zhao2025sphere].

Becker, J., Rush, N., Barnes, E., & Rein, D. (2025). Measuring the impact of early-2025 AI on experienced open-source developer productivity. arXiv Preprint arXiv:2507.09089.

Bergold, J., & Thomas, S. (2012). Participatory research methods: A methodological approach in motion. Historical Social Research/Historische Sozialforschung, 191–222.

Blandford, A., Furniss, D., & Makri, S. (2016). Qualitative HCI research: Going behind the scenes. Morgan & Claypool Publishers.

Cheng, M., Lee, C., Khadpe, P., Yu, S., Han, D., & Jurafsky, D. (2025). Sycophantic AI Decreases Prosocial Intentions and Promotes Dependence. arXiv Preprint arXiv:2510.01395.

Cuna, M. G., & Shen, E. (2025). The Hidden Curriculum. https://mgcuna.github.io/website/JMP_latest.pdf

Geertz, C. (2008). Thick description: Toward an interpretive theory of culture. In The cultural geography reader (pp. 41–51). Routledge.

Gergle, D., & Tan, D. S. (2014). Experimental research in HCI. In Ways of Knowing in HCI (pp. 191–227). Springer.

Harrington, C., Erete, S., & Piper, A. M. (2019). Deconstructing community-based collaborative design: Towards more equitable participatory design engagements. Proceedings of the ACM on Human-Computer Interaction, 3(CSCW), 1–25.

Hayes, G. R. (2011). The relationship of action research to human-computer interaction. ACM Transactions on Computer-Human Interaction (TOCHI), 18(3), 1–20.

Huang, S., Durmus, E., McCain, M., Handa, K., Tamkin, A., Hong, J., Stern, M., Somani, A., Zhang, X., & Ganguli, D. (2025). Values in the wild: Discovering and analyzing values in real-world language model interactions. arXiv Preprint arXiv:2504.15236.

Korom, R., Kiptinness, S., Adan, N., Said, K., Ithuli, C., Rotich, O., Kimani, B., King’ori, I., Kamau, S., Atemba, E., & others. (2025). AI-based clinical decision support for primary care: A real-world study. arXiv Preprint arXiv:2507.16947.

Le Dantec, C. A., & Fox, S. (2015). Strangers at the gate: Gaining access, building rapport, and co-constructing community-based research. Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing, 1348–1358.

Lee, M. K., Kusbit, D., Kahng, A., Kim, J. T., Yuan, X., Chan, A., See, D., Noothigattu, R., Lee, S., Psomas, A., & others. (2019). WeBuildAI: Participatory framework for algorithmic governance. Proceedings of the ACM on Human-Computer Interaction, 3(CSCW), 1–35.

McIntosh, T. R., Susnjak, T., Arachchilage, N., Liu, T., Watters, P., & Halgamuge, M. N. (2024). Inadequacies of large language model benchmarks in the era of generative artificial intelligence. arXiv Preprint arXiv:2402.09880.

Olson, J. S., & Kellogg, W. A. (2014). Ways of Knowing in HCI (Vol. 2). Springer.

Oulasvirta, A. (2008). Field experiments in HCI: promises and challenges. In Future interaction design II (pp. 87–116). Springer.

Raji, I. D., Bender, E. M., Paullada, A., Denton, E., & Hanna, A. (2021). AI and the everything in the whole wide world benchmark. arXiv Preprint arXiv:2111.15366.

Suresh, H., Movva, R., Dogan, A. L., Bhargava, R., Cruxen, I., Cuba, A. M., Taurino, G., So, W., & D’Ignazio, C. (2022). Towards intersectional feminist and participatory ML: A case study in supporting feminicide counterdata collection. Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, 667–678.

Tamkin, A., McCain, M., Handa, K., Durmus, E., Lovitt, L., Rathi, A., Huang, S., Mountfield, A., Hong, J., Ritchie, S., & others. (2024). Clio: Privacy-preserving insights into real-world AI use. arXiv Preprint arXiv:2412.13678.

Tseng, E., Young, M., Le Quéré, M. A., Rinehart, A., & Suresh, H. (2025). “Ownership, Not Just Happy Talk”: Co-Designing a Participatory Large Language Model for Journalism. Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency, 3119–3130.

Unertl, K. M., Schaefbauer, C. L., Campbell, T. R., Senteio, C., Siek, K. A., Bakken, S., & Veinot, T. C. (2016). Integrating community-based participatory research and informatics approaches to improve the engagement and health of underserved populations. Journal of the American Medical Informatics Association, 23(1), 60–73.

Vaughn, L. M., & Jacquez, F. (2020). Participatory research methods–choice points in the research process. Journal of Participatory Research Methods, 1(1).

Wallerstein, N., Duran, B., & others. (2017). The theoretical, historical and practice roots of CBPR. Community-Based Participatory Research for Health: Advancing Social and Health Equity, 2, 25–46.

Wang, R. E., Ribeiro, A. T., Robinson, C. D., Loeb, S., & Demszky, D. (2025). Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise. https://arxiv.org/abs/2410.03017
