Introduction

Figure. This survey has three core sections, focused on (1) defining, (2) developing, and (3) deploying human-centered LLMs (HCLLMs). In the first section, we conceptualize human-centeredness through the lens of HCI: who, what, and how (HCI for HCLLMs). In the second, we illustrate how these principles appear across the LLM development pipeline (Data for HCLLMs, NLP for HCLLMs, Evaluation). In the third, we discuss considerations for deploying HCLLMs responsibly (Responsible Human-Centered LLMs). Finally, we synthesize takeaways from the three core sections into a case study on the deployment of HCLLMs for the future of work (Case Study: HCLLMs and the Future of Work).

Large Language Models (LLMs) have transitioned from research artifacts to production infrastructure. They now power developer tools, enterprise copilots, search and recommendation systems, content moderation pipelines, and domain-specific assistants across healthcare (Thirunavukarasu et al., 2023), finance (Nie et al., 2024; Xie et al., 2024), education (Adiguzel et al., 2023; Gan et al., 2023; Wang et al., 2024), science (Si et al., 2024; Zhang et al., 2024), and law (Guha et al., 2023; Katz et al., 2024; Li et al., 2025). As LLMs are integrated into individual and collective processes, they can no longer be understood as isolated tools bounded by static performance metrics or leaderboard positions. LLMs are sociotechnical systems with global influence, and they should be developed and evaluated in markedly more human terms. Are these models helpful, steerable, safe under adversarial pressure, aligned across global markets, robust to distribution shift, and adaptable to evolving user goals and expectations? Do they comply with data governance regimes and privacy regulations, and do they address ethical concerns around intellectual property? How can we build models that not only avoid harm but also actively contribute to human flourishing? Can LLMs do more than passively assist humans, and instead actively collaborate with us as equal partners?

This survey advances the framework of Human-Centered Large Language Modeling (HCLLMs) as a unifying lens for understanding and answering these questions. Rather than treating human-centered objectives as simple patches or alignment problems downstream of capability scaling, we argue that human-centered methods must be embedded across the entire LLM development pipeline: data sourcing and filtering, post-training and alignment, evaluation, deployment, and long-term maintenance (see the figure).

Importantly, we will demonstrate how human-centered objectives tend to resist universal solutions. The optimal path will depend both on whom you ask and on how you operationalize concepts like harm and benefit. Broad themes like transparency, privacy, safety, and justice frequently emerge (Jobin et al., 2019), but perspectives vary significantly on how these ideals should be implemented (Awad et al., 2018; Jobin et al., 2019). Governments and non-profit organizations may codify the most dominant perspectives into laws and policies (Jobin et al., 2019), but high-level guidelines may fail to account for the nuances of real-world use (Hagendorff, 2020) and may lag behind the rapid evolution of language models themselves (Auernhammer, 2020). In the face of these challenges, stakeholders often remain passive, which tacitly endorses the status quo (Birhane et al., 2022; Crawford, 2021; Kalluri, 2020).

This survey elaborates on and endorses the alternative, Human-Centered Design (HCD; Capel & Brereton, 2023), in which users and other stakeholders are centrally involved in ideating, building, evaluating, and deploying Large Language Models (Shneiderman, 2020, 2022). Their centrality at every stage of the design process is what distinguishes HCD from other instantiations of human factors design (Xu, 2019) that account for general user needs (e.g., transparency) in only a small slice of the design or deployment process (Auernhammer, 2020). LLM development rarely centers humans, but a growing body of research from Natural Language Processing (NLP) and Human-Computer Interaction (HCI) points towards these ideals. We will cover this foundation for HCLLMs in an in-depth survey of relevant human factors approaches in HCI (HCI for HCLLMs) and NLP (NLP for HCLLMs), including more details on the Data Pipeline (Data for HCLLMs) and the Evaluation (Evaluation) of LLMs. With this foundation established, we will return to the principal axes of Responsible and Ethical Deployment (Responsible Human-Centered LLMs), such as transparency, privacy, and safety. Synthesizing our discussion across these prior sections, we will conclude with a concrete Case Study (Case Study: HCLLMs and the Future of Work) on the considerations of HCLLMs for the future of work.

Adiguzel, T., Kaya, M. H., & Cansu, F. K. (2023). Revolutionizing education with AI: Exploring the transformative potential of ChatGPT. Contemporary Educational Technology, 15(3).

Auernhammer, J. (2020). Human-centered AI: The role of Human-centered Design Research in the development of AI.

Awad, E., Dsouza, S., Kim, R., Schulz, J., Henrich, J., Shariff, A., Bonnefon, J.-F., & Rahwan, I. (2018). The moral machine experiment. Nature, 563(7729), 59–64.

Birhane, A., Kalluri, P., Card, D., Agnew, W., Dotan, R., & Bao, M. (2022). The values encoded in machine learning research. Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, 173–184.

Capel, T., & Brereton, M. (2023). What is Human-Centered about Human-Centered AI? A Map of the Research Landscape. In A. Schmidt, K. Väänänen, T. Goyal, P. O. Kristensson, A. Peters, S. Mueller, J. R. Williamson, & M. L. Wilson (Eds.), Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, CHI 2023, Hamburg, Germany, April 23-28, 2023 (p. 359:1-359:23). ACM. https://doi.org/10.1145/3544548.3580959

Crawford, K. (2021). The atlas of AI: Power, politics, and the planetary costs of artificial intelligence. Yale University Press.

Gan, W., Qi, Z., Wu, J., & Lin, J. C.-W. (2023). Large language models in education: Vision and opportunities. 2023 IEEE International Conference on Big Data (BigData), 4776–4785.

Guha, N., Nyarko, J., Ho, D. E., Ré, C., Chilton, A., K, A., Chohlas-Wood, A., Peters, A., Waldon, B., Rockmore, D. N., Zambrano, D., Talisman, D., Hoque, E., Surani, F., Fagan, F., Sarfaty, G., Dickinson, G. M., Porat, H., Hegland, J., … Li, Z. (2023). LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, & S. Levine (Eds.), Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023. http://papers.nips.cc/paper_files/paper/2023/hash/89e44582fd28ddfea1ea4dcb0ebbf4b0-Abstract-Datasets_and_Benchmarks.html

Hagendorff, T. (2020). The ethics of AI ethics: An evaluation of guidelines. Minds and Machines, 30(1), 99–120.

Jobin, A., Ienca, M., & Vayena, E. (2019). The global landscape of AI ethics guidelines. Nature Machine Intelligence, 1(9), 389–399.

Kalluri, P. (2020). Don’t ask if artificial intelligence is good or fair, ask how it shifts power. Nature, 583(7815), 169–169.

Katz, D. M., Bommarito, M. J., Gao, S., & Arredondo, P. (2024). GPT-4 passes the bar exam. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 382(2270).

Li, H., Chen, J., Yang, J., Ai, Q., Jia, W., Liu, Y., Lin, K., Wu, Y., Yuan, G., Hu, Y., & others. (2025). LegalAgentBench: Evaluating LLM agents in legal domain. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2322–2344.

Nie, Y., Kong, Y., Dong, X., Mulvey, J. M., Poor, H. V., Wen, Q., & Zohren, S. (2024). A survey of large language models for financial applications: Progress, prospects and challenges. arXiv Preprint arXiv:2406.11903.

Shneiderman, B. (2020). Human-Centered Artificial Intelligence: Three Fresh Ideas. AIS Transactions on Human-Computer Interaction, 12(3), 109–124. https://doi.org/10.17705/1thci.00131

Shneiderman, B. (2022). Human-Centered AI. Oxford University Press.

Si, C., Yang, D., & Hashimoto, T. (2024). Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers. https://arxiv.org/abs/2409.04109

Thirunavukarasu, A. J., Ting, D. S. J., Elangovan, K., Gutierrez, L., Tan, T. F., & Ting, D. S. W. (2023). Large language models in medicine. Nature Medicine, 29(8), 1930–1940.

Wang, S., Xu, T., Li, H., Zhang, C., Liang, J., Tang, J., Yu, P. S., & Wen, Q. (2024). Large Language Models for Education: A Survey and Outlook. https://arxiv.org/abs/2403.18105

Xie, Q., Han, W., Chen, Z., Xiang, R., Zhang, X., He, Y., Xiao, M., Li, D., Dai, Y., Feng, D., & others. (2024). FinBen: A holistic financial benchmark for large language models. Advances in Neural Information Processing Systems, 37, 95716–95743.

Xu, W. (2019). Toward human-centered AI: a perspective from human-computer interaction. Interactions, 26(4), 42–46.

Zhang, Y., Chen, X., Jin, B., Wang, S., Ji, S., Wang, W., & Han, J. (2024). A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 8783–8817.
