Human-centered design (HCD) is an approach that centers the needs and experiences of users in all steps of the design process. There are four guiding principles within HCD (Interaction Design Foundation, 2025):

  1. Centering real people, their needs, and experiences in the design process

  2. Solving the core problems

  3. Thinking of everything as existing in an interconnected system

  4. Engaging in iterative prototyping to find solutions

Applying these HCD principles to LLMs reframes development away from solely optimizing technical performance and toward understanding how these systems affect and fit into people’s real lives. Rather than treating metrics such as accuracy or perplexity as the ultimate goal, HCD foregrounds the needs, values, and contexts of the humans who develop, use, and are affected by these models. To start, then, we must understand who the real people affected by LLMs are and what problems they face. We consider people involved across the entire lifecycle of LLM development: from the data used to train these models and the objectives and principles that model developers prioritize, to the ways end users interact with models. In this subsection, we provide an overview of four important stakeholder groups and their roles in the development of HCLLMs: data workers, model developers, end users, and indirect stakeholders.

Humans as Data Workers.

Data workers are a group of stakeholders that is essential to the HCLLM ecosystem yet often overlooked. They perform the largely invisible human labor that undergirds the datasets and methods that make LLM development possible. This category spans several distinct roles. Data annotators, for example, directly label datasets or rank model outputs. Their judgments instantiate the “human preferences” that guide alignment, embedding the annotators’ positionality into what the model learns to value (Kirk et al., 2024; Ouyang et al., 2022).
While annotators directly label the data, decisions about which values or ideologies to prioritize often come from other parties who hold power or authority over data workers (e.g., project managers or clients requesting annotated data) (Miceli et al., 2020; Wang et al., 2022). In addition to data annotators, there are safety and moderation workers, who probe models to expose and address potential harms before deployment. As with content moderation in other domains, such as social media, this work often entails sustained exposure to toxic or disturbing material, raising serious questions about the ethics and mental health implications of this invisible labor (Pendse et al., 2025). Finally, there are data subjects: the individuals whose internet data forms the bulk of the web corpora used to pre-train LLMs. Many data subjects contribute unwittingly, as their data is included without their consent or even their awareness (Birhane & Prabhu, 2021; Paullada et al., 2021). Together, these data workers form the backbone of HCLLM development, yet their contributions are systematically undervalued and underprotected.
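To make concrete how an annotator’s ranking becomes alignment training signal, the sketch below expands one ranking into pairwise comparisons. In Ouyang et al. (2022), labelers ranked several model outputs per prompt, and each ranking of K outputs yielded K(K−1)/2 comparisons for reward-model training. The function name `ranking_to_pairs` and the `(prompt, preferred, dispreferred)` tuple layout are hypothetical illustrations, not the authors’ code.

```python
from itertools import combinations
from typing import List, Tuple

# Hypothetical record layout: (prompt, preferred_response, dispreferred_response).
Comparison = Tuple[str, str, str]


def ranking_to_pairs(prompt: str, ranked_responses: List[str]) -> List[Comparison]:
    """Expand one annotator's ranking (best first) into pairwise comparisons.

    A ranking of K responses yields K * (K - 1) / 2 comparisons.
    """
    # itertools.combinations preserves input order, so in each pair the
    # first element is the response the annotator ranked higher.
    return [(prompt, better, worse) for better, worse in combinations(ranked_responses, 2)]


if __name__ == "__main__":
    ranking = ["response A", "response C", "response B", "response D"]  # annotator's order
    pairs = ranking_to_pairs("Explain photosynthesis.", ranking)
    print(len(pairs))  # 6 comparisons from a single ranking of 4 responses
    for _, chosen, rejected in pairs:
        print(f"{chosen!r} preferred over {rejected!r}")
```

Because every comparison inherits the annotator’s ordering, a single subjective judgment is multiplied across many training examples, which is one route by which annotator positionality propagates into what a reward model learns.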

Humans as Model Developers.

Next, model developers are the stakeholders most directly responsible for shaping the model architecture, training pipeline, and resulting behaviors of HCLLMs. They make crucial decisions about what data to collect, how to preprocess it, and which training objectives and evaluation protocols to employ. Prior work has shown how model developers’ values can become implicitly embedded in the resulting technical artifacts, such as in how quality filters for data are defined (Gururangan et al., 2022) and which research problems are prioritized (Birhane et al., 2022). In designing HCLLMs, model developers must balance trade-offs among performance, fairness, inclusivity, and safety. For example, they must ensure the fairness, privacy, and safety of LLM training data (Data for HCLLMs), design comprehensive evaluation frameworks that capture real-world usage scenarios (Evaluation), and devise governance mechanisms and transparency guidelines that help ensure responsible deployment (Responsible Human-Centered LLMs).

Humans as End Users.

End users, the individuals and groups who directly interact with LLM systems, represent one of the largest and most diverse stakeholder groups. Because LLMs are general-purpose, end users span a vast range: students seeking homework help, professionals drafting reports, people with mental health concerns looking for support and companionship, creative writers brainstorming ideas, non-native speakers translating text, and countless others (Chatterji et al., 2025; Handa, Bent, et al., 2025; Handa, Tamkin, et al., 2025). Each user brings distinct goals, levels of expertise, cultural backgrounds, and needs to their interactions with models. As LLMs shift from nascent technologies to tools increasingly ingrained in users’ lives, understanding their real-world impact becomes critical. Yet despite the outsized impact LLMs may have on their lives, most end users have relatively little control over or influence on the design and creation of these technologies.
Some methods allow end users to exert a degree of agency over models; fine-tuning, for instance, offers a relatively heavyweight intervention (Tan et al., 2024; L. Wang et al., 2025). Research prototypes and product features also let users tailor their interactions with LLMs, for example by creating personalized memory stores or drawing on community knowledge banks (OpenAI, 2024; Ryan et al., 2025; Zhao et al., 2025; Zhong et al., 2024).
While these personalization mechanisms offer some user agency, they also raise fundamental questions about the broader relationship between end users and LLMs.

Key questions related to end users include: How do these systems affect productivity, creativity, learning, and decision-making skills? What are the risks of over-reliance, deskilling, or perpetuating existing biases? And how can we design HCLLMs that empower users rather than constrain or harm them? Answering these questions requires looking beyond system capabilities, bridging existing methods in NLP with those from HCI, to examine how LLMs may reshape human capacities, agency, and well-being in practice.

Humans as Indirect Stakeholders.

Finally, even people who are not direct end users of LLMs, known as indirect stakeholders, can still be meaningfully affected by them (Friedman, 1996). Consider an LLM that assists physicians with taking clinical notes: patients are affected by this system even though they do not interact with it directly (Haberle et al., 2024; Korom et al., 2025). Beyond patients, this example involves many other potential indirect stakeholders, such as families or caregivers, insurance providers, and hospital administrators. Enumerating all indirect stakeholders can seem intractable, which in part underscores that human-centered LLMs are not just technical artifacts but sociotechnical systems with many externalities to consider. There is no prescriptive formula for deciding which indirect stakeholders to prioritize; however, designers can be guided by questions such as who is underrepresented in design decisions, who is most likely to be harmed by such systems (both immediately and over longer time horizons), and, conversely, who may benefit in ways that were not explicitly intended.
More broadly, rather than treating indirect stakeholders as an overwhelming checklist, HCLLM designers should use this complexity as motivation to identify the most consequential stakeholders early and to involve them throughout the design process, not merely as an afterthought (Participatory Approaches).

Birhane, A., Kalluri, P., Card, D., Agnew, W., Dotan, R., & Bao, M. (2022). The values encoded in machine learning research. Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, 173–184.
Birhane, A., & Prabhu, V. U. (2021). Large image datasets: A pyrrhic win for computer vision? 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), 1536–1546.
Chatterji, A., Cunningham, T., Deming, D. J., Hitzig, Z., Ong, C., Shan, C. Y., & Wadman, K. (2025). How people use ChatGPT [Technical report]. National Bureau of Economic Research.
Friedman, B. (1996). Value-sensitive design. Interactions, 3(6), 16–23.
Gururangan, S., Card, D., Dreier, S., Gade, E., Wang, L., Wang, Z., Zettlemoyer, L., & Smith, N. A. (2022). Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection. In Y. Goldberg, Z. Kozareva, & Y. Zhang (Eds.), Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (pp. 2562–2580). Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.emnlp-main.165
Haberle, T., Cleveland, C., Snow, G. L., Barber, C., Stookey, N., Thornock, C., Younger, L., Mullahkhel, B., & Ize-Ludlow, D. (2024). The impact of Nuance DAX ambient listening AI documentation: A cohort study. Journal of the American Medical Informatics Association, 31(4), 975–979.
Handa, K., Bent, D., Tamkin, A., McCain, M., Durmus, E., Stern, M., Schiraldi, M., Huang, S., Ritchie, S., Syverud, S., Jagadish, K., Vo, M., Bell, M., & Ganguli, D. (2025, April 8). Anthropic Education Report: How University Students Use Claude. https://www.anthropic.com/news/anthropic-education-report-how-university-students-use-claude
Handa, K., Tamkin, A., McCain, M., Huang, S., Durmus, E., Heck, S., Mueller, J., Hong, J., Ritchie, S., Belonax, T., & others. (2025). Which economic tasks are performed with AI? Evidence from millions of Claude conversations. arXiv Preprint arXiv:2503.04761.
Interaction Design Foundation. (2025). What Is Human-Centered Design (HCD)? https://www.interaction-design.org/literature/topics/human-centered-design
Kirk, H. R., Whitefield, A., Röttger, P., Bean, A. M., Margatina, K., Mosquera, R., Ciro, J. M., Bartolo, M., Williams, A., He, H., Vidgen, B., & Hale, S. A. (2024). The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models. The Thirty-Eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track. https://openreview.net/forum?id=DFr5hteojx
Korom, R., Kiptinness, S., Adan, N., Said, K., Ithuli, C., Rotich, O., Kimani, B., King’ori, I., Kamau, S., Atemba, E., & others. (2025). AI-based clinical decision support for primary care: A real-world study. arXiv Preprint arXiv:2507.16947.
Miceli, M., Schuessler, M., & Yang, T. (2020). Between subjectivity and imposition: Power dynamics in data annotation for computer vision. Proceedings of the ACM on Human-Computer Interaction, 4(CSCW2), 1–25.
OpenAI. (2024). Memory and new controls for ChatGPT. https://openai.com/index/memory-and-new-controls-for-chatgpt/
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P. F., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, & A. Oh (Eds.), Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022. http://papers.nips.cc/paper_files/paper/2022/hash/b1efde53be364a73914f58805a001731-Abstract-Conference.html
Paullada, A., Raji, I. D., Bender, E. M., Denton, E., & Hanna, A. (2021). Data and its (dis)contents: A survey of dataset development and use in machine learning research. Patterns, 2(11).
Pendse, S. R., Gergle, D., Kornfield, R., Meyerhoff, J., Mohr, D., Suh, J., Wescott, A., Williams, C., & Schleider, J. (2025). When Testing AI Tests Us: Safeguarding Mental Health on the Digital Frontlines. Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency, 1793–1804.
Ryan, M. J., Shaikh, O., Bhagirath, A., Frees, D., Held, W. B., & Yang, D. (2025). SynthesizeMe! Inducing persona-guided prompts for personalized reward models in LLMs. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 8045–8078.
Tan, Z., Zeng, Q., Tian, Y., Liu, Z., Yin, B., & Jiang, M. (2024). Democratizing Large Language Models via Personalized Parameter-Efficient Fine-tuning. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 6476–6491.
Wang, D., Prabhat, S., & Sambasivan, N. (2022). Whose AI Dream? In search of the aspiration in data annotation. Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, 1–16.
Wang, L., Yurechko, K., Dani, P., Chen, Q. Z., & Zhang, A. X. (2025). End User Authoring of Personalized Content Classifiers: Comparing Example Labeling, Rule Writing, and LLM Prompting. Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, 1–21.
Zhao, D., Yang, D., & Bernstein, M. S. (2025). Knoll: Creating a knowledge ecosystem for large language models. Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology, 1–23.
Zhong, W., Guo, L., Gao, Q., Ye, H., & Wang, Y. (2024). MemoryBank: Enhancing large language models with long-term memory. Proceedings of the AAAI Conference on Artificial Intelligence, 38(17), 19724–19731.