Responsibly Deploying HCLLMs in the Workforce

Perhaps the most top-of-mind question about the future of work is what long-term societal impacts these technologies will have. How will employment be affected by the increasing popularity of LLMs? Which skills will be in demand, and which will become less important? Deploying HCLLMs responsibly means taking these long-term externalities into account now and mitigating potential harms before they occur.

The Paradox of Productivity and Work Intensification.

A common perception is that integrating LLMs into workflows will inherently reduce the volume of human effort. Initial stand-alone studies of whether LLMs improve how quickly workers complete tasks show positive evidence: people complete tasks more quickly with AI assistance (Cui et al., 2025; Karny et al., 2024; Peng et al., 2023; Shen & Tamkin, 2026). However, in the workforce, people do not complete single tasks in a vacuum. They must juggle tasks with many demands, coordinate with coworkers, navigate company politics, and so on. Thus, when productivity is examined in a more holistic setting, the impact of LLMs is less clear. Counter to the earlier studies, recent work suggests that rather than freeing up workers, LLMs may actually amplify the intensity of labor (Harvey et al., 2025; Ranganathan & Ye, 2026). For example, a recent study published in Harvard Business Review observed 200 employees at a U.S.-based technology company and found that LLM usage promotes “work slippage,” in which AI leads to task expansion and increased cognitive load rather than a net reduction in hours (Ranganathan & Ye, 2026). This phenomenon is not confined to the technology industry, where LLM usage is more prevalent (Acemoglu & Restrepo, 2020; Harvey et al., 2025; Johnson et al., 2025). For example, despite the purported benefits of LLMs for educators, using these models often leads teachers to spend additional time validating generated outputs (Harvey et al., 2025). Thus, instead of saving time, these tools may create new tasks for workers or introduce friction into existing ones.

To address these risks, we propose two approaches. First, we argue that the way “productivity” is measured in the workplace must be updated in light of advances in LLMs. Traditionally, productivity has centered on output per unit time, capturing how efficiently people work (McKinsey & Company, 2025). However, now that LLMs can produce outputs quickly, efficiency-based metrics alone may no longer be informative. Instead, echoing practices in existing work (Shao et al., 2024; Wang et al., 2025), we call for broader evaluations that consider not only whether tasks are completed but also the quality of the outcomes. More broadly, updating productivity metrics to reflect how work is now done provides a more precise picture of how LLMs are affecting it. Our second approach is to adopt methods that center workers’ voices. Survey efforts such as Shao et al. (2025) provide insight into what workers want. Complementing this approach, we argue for additional studies that capture “thick” descriptions of what workers are experiencing, such as through interviews or ethnographic approaches (Anugraha et al., 2026; Geertz, 2008). Such methods can shed light on what numbers alone do not capture, such as hidden forms of labor, additional cognitive load, or coordination costs. Together, these directions point toward a more holistic framework for evaluating AI in the workplace, one that accounts for both outcomes and worker experience.
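To make the contrast between an efficiency-only metric and a broader one concrete, the following is a minimal sketch, not a proposed standard. It assumes a hypothetical `TaskRecord` with a rubric-style quality score in [0, 1] and a separate count of minutes spent validating model output; the field names, the two metric functions, and all numbers are illustrative assumptions rather than measurements from any cited study.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TaskRecord:
    """One completed task; every value used below is hypothetical."""
    drafting_minutes: float    # time spent producing the output
    validation_minutes: float  # time spent checking or fixing model output
    quality: float             # rated quality in [0, 1], e.g. from a rubric


def throughput(tasks: list[TaskRecord]) -> float:
    """Efficiency-only measure: tasks completed per hour of drafting time."""
    hours = sum(t.drafting_minutes for t in tasks) / 60
    return len(tasks) / hours


def quality_adjusted_throughput(tasks: list[TaskRecord]) -> float:
    """Quality-weighted tasks per hour, counting validation time as work."""
    hours = sum(t.drafting_minutes + t.validation_minutes for t in tasks) / 60
    return sum(t.quality for t in tasks) / hours


# Hypothetical workloads: two unassisted tasks versus three assisted tasks.
unassisted = [TaskRecord(60, 0, 0.9), TaskRecord(60, 0, 0.9)]
assisted = [TaskRecord(20, 25, 0.8) for _ in range(3)]

for name, tasks in [("unassisted", unassisted), ("assisted", assisted)]:
    print(name, throughput(tasks), round(quality_adjusted_throughput(tasks), 2))
```

With these made-up numbers, raw throughput rises from 1.0 to 3.0 tasks per hour, while the quality-adjusted figure moves only from 0.9 to about 1.07. How quality is rated and which hidden time is counted are design choices that the qualitative methods discussed above would need to inform.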

The Risk of Cognitive Deskilling.

A second concern is the potential for deskilling, where reliance on LLMs erodes the foundational expertise of human workers. When models handle the “first draft” (or sometimes the end-to-end execution) of complex tasks, workers may lose the opportunity to engage in the learning processes necessary to develop deep domain mastery. In an isolated setting, workers may complete a task faster, but they are also less likely to learn transferable skills or knowledge that can help them in future tasks (Shen & Tamkin, 2026). Importantly, these effects are heterogeneous across workers. How someone engages with these technologies matters: automating tasks wholesale is likely to accelerate deskilling, whereas more collaborative, iterative interactions, in which the worker critically evaluates, corrects, and builds on model outputs, may partially preserve or even scaffold skill development (Shen & Tamkin, 2026). Another moderator is expertise level. For instance, novices may rely on these tools more heavily and more uncritically than experts, which is especially concerning given that AI assistance can impart an illusion of comprehension without providing sufficient expertise in an area (Macnamara et al., 2024; Messeri & Crockett, 2024; Shen & Tamkin, 2026).

These concerns highlight the importance of carefully designing how humans collaborate with LLMs, as discussed in Developing HCLLMs for the Future of Work. For example, the risk of deskilling can be included as a factor when deciding how to delegate tasks. In this vein, rather than framing LLMs as tools that automate tasks end-to-end, systems should be designed to empower users by supporting human judgment, reflection, and skill development throughout the workflow. Achieving this goal requires intentional design interventions that encourage workers to remain actively engaged with the task, such as mechanisms that promote critical evaluation of model outputs or structures that preserve opportunities for learning and expertise development (Ma et al., 2025; Reicherts et al., 2025).

Macroeconomic Divides and Digital Accessibility.

Finally, the integration of LLMs also threatens to exacerbate existing economic disparities. Traditional automation has historically widened the wage gap between high-skilled and low-skilled workers (Acemoglu & Restrepo, 2021), but LLMs introduce a specific “productivity divide” rooted in accessibility. Research into usage patterns suggests that ChatGPT adoption is positively correlated with higher education, high socioeconomic status, and residency in urbanized zip codes (Daepp & Counts, 2024). If access to state-of-the-art models remains concentrated within privileged demographics, the resulting disparity in AI-augmented productivity could further deepen systemic inequality.

Addressing this divide requires interventions at multiple levels. For one, the growing capability of open-source models can help reduce barriers to entry and broaden who is able to benefit from AI assistance. However, access alone is insufficient. As discussed in Bias and Fairness Evaluation, model performance and usability can vary across social groups, meaning that some populations may benefit less from interacting with LLMs even when the models are available to them (Durmus et al., 2023; Hassan et al., 2025; Hofmann et al., 2024). Continued efforts to identify and mitigate such disparities in model behavior remain critical. Finally, improving AI literacy will be essential as a new generation of workers grows up with these technologies at their fingertips. Educational initiatives that teach users how to critically evaluate, effectively collaborate with, and responsibly leverage AI systems can help ensure that the productivity gains from LLMs are distributed more equitably (Cardon et al., 2023; Morales-Navarro et al., 2024, 2025; Okolo, 2024; Solyst et al., 2025).

McKinsey & Company. (2025). What is productivity? https://www.mckinsey.com/featured-insights/mckinsey-explainers/what-is-productivity

Acemoglu, D., & Restrepo, P. (2020). The wrong kind of AI? Artificial intelligence and the future of labour demand. Cambridge Journal of Regions, Economy and Society, 13(1), 25–35.

Acemoglu, D., & Restrepo, P. (2021). Tasks, Automation, and the Rise in US Wage Inequality. Econometrica, 89(5), 1973–2019.

Anugraha, D., Padmakumar, V., & Yang, D. (2026). SparkMe: Adaptive Semi-Structured Interviewing for Qualitative Insight Discovery. arXiv Preprint arXiv:2602.21136.

Cardon, P., Fleischmann, C., Aritz, J., Logemann, M., & Heidewald, J. (2023). The challenges and opportunities of AI-assisted writing: Developing AI literacy for the AI age. Business and Professional Communication Quarterly, 86(3), 257–295.

Cui, Z. K., Demirer, M., Jaffe, S., Musolff, L., Peng, S., & Salz, T. (2025). The effects of generative AI on high-skilled work: Evidence from three field experiments with software developers. Available at SSRN 4945566.

Daepp, M. I. G., & Counts, S. (2024). The Emerging AI Divide in the United States. Working Paper.

Durmus, E., Nguyen, K., Liao, T. I., Schiefer, N., Askell, A., Bakhtin, A., Chen, C., Hatfield-Dodds, Z., Hernandez, D., Joseph, N., & others. (2023). Towards measuring the representation of subjective global opinions in language models. arXiv Preprint arXiv:2306.16388. https://arxiv.org/abs/2306.16388

Geertz, C. (2008). Thick description: Toward an interpretive theory of culture. In The cultural geography reader (pp. 41–51). Routledge.

Harvey, E., Koenecke, A., & Kizilcec, R. F. (2025). “Don’t Forget the Teachers”: Towards an Educator-Centered Understanding of Harms from Large Language Models in Education. Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, 1–19.

Hassan, M. F., Khattak, F. K., & Seyyed-Kalantari, L. (2025). Dialectic Preference Bias in Large Language Models. Proceedings of the AAAI Symposium Series, 5(1), 365–369.

Hofmann, V., Kalluri, P. R., Jurafsky, D., & King, S. (2024). Dialect prejudice predicts AI decisions about people’s character, employability, and criminality. arXiv Preprint arXiv:2403.00742. https://arxiv.org/abs/2403.00742

Johnson, C., Yenamandra, T., Ciskey, J., Ambrogi, T., & Olmedo Gunio, L. (2025). The AI Productivity Paradox in Software Development: A Meta-Analysis of Implementation Outcomes and ROI Patterns (2024-2025). Available at SSRN 5902163.

Karny, S., Mayer, L. W., Ayoub, J., Song, M., Su, H., Tian, D., Moradi-Pari, E., & Steyvers, M. (2024). Learning with AI Assistance: A Path to Better Task Performance or Dependence? Proceedings of the ACM Collective Intelligence Conference, 10–17.

Ma, S., Chen, Q., Wang, X., Zheng, C., Peng, Z., Yin, M., & Ma, X. (2025). Towards human-ai deliberation: Design and evaluation of llm-empowered deliberative ai for ai-assisted decision-making. Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, 1–23.

Macnamara, B. N., Berber, I., Çavuşoğlu, M. C., Krupinski, E. A., Nallapareddy, N., Nelson, N. E., Smith, P. J., Wilson-Delfosse, A. L., & Ray, S. (2024). Does using artificial intelligence assistance accelerate skill decay and hinder skill development without performers’ awareness? Cognitive Research: Principles and Implications, 9(1), 46.

Messeri, L., & Crockett, M. J. (2024). Artificial intelligence and illusions of understanding in scientific research. Nature, 627(8002), 49–58.

Morales-Navarro, L., Kafai, Y. B., Vogelstein, L., Yu, E., & Metaxa, D. (2025). Learning about algorithm auditing in five steps: Scaffolding how high school youth can systematically and critically evaluate machine learning applications. Proceedings of the AAAI Conference on Artificial Intelligence, 39(28), 29186–29194.

Morales-Navarro, L., Kafai, Y., Konda, V., & Metaxa, D. (2024). Youth as peer auditors: Engaging teenagers with algorithm auditing of machine learning applications. Proceedings of the 23rd Annual ACM Interaction Design and Children Conference, 560–573.

Okolo, C. T. (2024). Beyond AI hype: A hands-on workshop series for enhancing AI literacy in middle and high school students. Proceedings of the 2024 on RESPECT Annual Conference, 86–93.

Peng, S., Kalliamvakou, E., Cihon, P., & Demirer, M. (2023). The impact of ai on developer productivity: Evidence from github copilot. arXiv Preprint arXiv:2302.06590.

Ranganathan, A., & Ye, X. M. (2026). AI Doesn’t Reduce Work—It Intensifies It. Harvard Business Review. https://hbr.org/2026/02/ai-doesnt-reduce-work-it-intensifies-it

Reicherts, L., Zhang, Z. T., Von Oswald, E., Liu, Y., Rogers, Y., & Hassib, M. (2025). AI, help me think—but for myself: Assisting people in complex decision-making by providing different kinds of cognitive support. Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, 1–19.

Shao, Y., Samuel, V., Jiang, Y., Yang, J., & Yang, D. (2024). Collaborative gym: A framework for enabling and evaluating human-agent collaboration. arXiv Preprint arXiv:2412.15701.

Shao, Y., Zope, H., Jiang, Y., Pei, J., Nguyen, D., Brynjolfsson, E., & Yang, D. (2025). Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the US Workforce. arXiv Preprint arXiv:2506.06576.

Shen, J. H., & Tamkin, A. (2026). How AI impacts skill formation. arXiv Preprint arXiv:2601.20245.

Solyst, J., Peng, C., Deng, W. H., Pratapa, P., Ogan, A., Hammer, J., Hong, J., & Eslami, M. (2025). Investigating Youth AI Auditing. Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency, 2098–2111.

Wang, Z. Z., Shao, Y., Shaikh, O., Fried, D., Neubig, G., & Yang, D. (2025). How Do AI Agents Do Human Work? Comparing AI and Human Workflows Across Diverse Occupations. arXiv Preprint arXiv:2510.22780.
