Current Approaches to Steerability
Steerability is the second dimension we highlight for HCLLM deployment. Steerability is the degree to which a model can be aligned along a particular dimension (Miehling et al., 2025), such as preferences, norms, or user constraints (Chen et al., 2025). In contrast to static notions of alignment that aim to produce a single globally acceptable behavior, steerability emphasizes conditional control: the ability to modulate LLM outputs to suit particular users. This property is especially important for HCLLMs, which are designed to interact with diverse users embedded in heterogeneous social, cultural, and institutional contexts.
Steerability spans multiple dimensions. First, personalization aims for an LLM’s outputs to adaptively reflect the preferences of individual users or groups of users, along with their prior knowledge, goals, and needs (Tseng et al., 2024). Personalized models should be able to provide more relevant recommendations (Hu et al., 2024), and they should be calibrated to the user’s preferred writing styles (Zhang et al., 2024), learning styles (Park et al., 2024), and norms around privacy (Asthana et al., 2024; Shao et al., 2024) and social behavior (Li et al., 2024). A user’s preferences can be derived from explicit feedback, as in pairwise preference datasets, or from implicit feedback like historical interaction data. For more in-depth discussion of personalization methods, see Personalization.
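To make the idea of deriving preferences from pairwise feedback concrete, the following is a minimal sketch that fits a Bradley-Terry model, one common way to turn pairwise comparisons into per-item scores. It is an illustration rather than a method taken from the works cited above; the function name `fit_bradley_terry`, the style labels, and the comparison data are all hypothetical.

```python
import math

def fit_bradley_terry(pairs, items, lr=0.1, epochs=500):
    """Fit one latent score per item from (winner, loser) pairs by gradient ascent."""
    scores = {item: 0.0 for item in items}
    for _ in range(epochs):
        grads = {item: 0.0 for item in items}
        for winner, loser in pairs:
            # Bradley-Terry probability that the winner is preferred over the loser.
            p = 1.0 / (1.0 + math.exp(scores[loser] - scores[winner]))
            # Gradient of the log-likelihood with respect to each score.
            grads[winner] += 1.0 - p
            grads[loser] -= 1.0 - p
        for item in items:
            scores[item] += lr * grads[item] / len(pairs)
    return scores

# Hypothetical explicit feedback: one user's choices between response styles.
pairs = [
    ("concise", "verbose"),
    ("concise", "formal"),
    ("formal", "verbose"),
    ("concise", "verbose"),
]
scores = fit_bradley_terry(pairs, ["concise", "formal", "verbose"])
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

The fitted scores could then serve as a simple user-specific reward signal. Implicit feedback, such as interaction histories, would first need to be converted into comparisons of this form, which is itself a modeling assumption.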
Related to personalization is persona alignment or role play. Here, the goal is for HCLLMs to adopt consistent identities or roles, like software developers, expert tutors, empathetic counselors, or skeptical reviewers, which remain stable across interactions (J. Li et al., 2024; Samuel et al., 2024; Shanahan et al., 2023). Persona alignment is especially important in multi-agent settings where multiple LLM personas interact and collaborate (Guo et al., 2024; J. S. Park et al., 2023).
In addition to better personalization and role playing, steerable HCLLMs should be able to adaptively understand low-resource languages, dialects, or sociolinguistic varieties (Ziems et al., 2023). We refer to this target as linguistic alignment. This form of steerability is critical for equitable access, as models trained predominantly on high-resource, standardized corpora often underperform for marginalized linguistic communities (see Data Representation, Bias and Ethics). Finally, cultural alignment means that models can be steered to reflect the norms, values, and narratives of particular communities or demographic groups (Santurkar et al., 2023). Unlike personalization, which targets individuals, cultural alignment operates at the level of shared practices and collective meaning-making.
A range of technical mechanisms support steerability. At inference time, models may be steered through in-context learning, prompt engineering, or output filtering (Wies et al., 2023). Post-hoc control methods can condition generation on specific attributes or enforce constraints via decoding strategies. More structurally, personalized reward models and fine-tuning procedures can encode user- or group-specific objectives into model parameters (D. Chen et al., 2024).
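As a rough illustration of the inference-time mechanisms mentioned above, the sketch below composes a steering prompt from a persona instruction and in-context examples, then filters candidate outputs against simple user constraints. No model is called: the candidate strings stand in for sampled generations, and the helper names `build_steering_prompt` and `filter_outputs`, the prompt layout, and the constraints are assumptions made for this example.

```python
def build_steering_prompt(persona, examples, query):
    """Compose a prompt from a persona instruction and in-context examples."""
    lines = [f"You are {persona}.", ""]
    for user_msg, reply in examples:
        lines += [f"User: {user_msg}", f"Assistant: {reply}", ""]
    lines += [f"User: {query}", "Assistant:"]
    return "\n".join(lines)

def filter_outputs(candidates, max_words, banned_terms):
    """Keep only candidates that satisfy a length limit and avoid banned terms."""
    kept = []
    for text in candidates:
        lowered = text.lower()
        within_length = len(text.split()) <= max_words
        has_banned = any(term in lowered for term in banned_terms)
        if within_length and not has_banned:
            kept.append(text)
    return kept

# Hypothetical persona and one in-context demonstration.
prompt = build_steering_prompt(
    "an expert tutor who explains ideas step by step",
    [("What is a prime number?",
      "A prime is a whole number greater than 1 whose only divisors are 1 and itself.")],
    "What is a composite number?",
)

# Stand-ins for sampled model outputs (no model is called in this sketch).
candidates = [
    "A composite number has more than two divisors.",
    "Obviously, it is any number that is not prime.",
    "A composite number is a whole number greater than 1 that can be "
    "written as a product of two smaller whole numbers, such as 4 or 6.",
]
kept = filter_outputs(candidates, max_words=12, banned_terms=["obviously"])
print(kept)
```

Prompt construction steers the distribution the model samples from, whereas filtering only selects among what was sampled. A filter therefore cannot recover behaviors the model never produces.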
Steerability is still fundamentally constrained by the representational biases embedded in pre-training and post-training data (Mihalcea et al., 2025). If certain identities, linguistic forms, or cultural narratives are underrepresented or stereotyped in the training corpus, then prompt-based steering may have limited expressive range. In this sense, steerability is not just a matter of control at inference time, but is rooted much earlier in data provenance (Data Provenance) and evaluation (Evaluation). Steerability starts with measuring where biases arise, localizing their sources in the data pipeline, and redesigning collection and annotation practices accordingly.
Looking Forward
Looking forward, we envision several research directions that extend current notions of steerability. First, we consider continual learning for preference, culture, and pluralistic alignment. Continual preference learning would allow models to adapt dynamically to evolving user needs by tracking implicit cues in interaction patterns or contextual data (Shaikh et al., 2025). Such systems often rely on persistent memory stores like chat histories or retrieval-augmented generation. A central challenge is achieving this adaptivity while preserving user autonomy and privacy (Zhang et al., 2025). In cultural and pluralistic alignment, rather than optimizing toward a static representation of culture or user values, HCLLMs should use continual learning to accommodate evolving norms and intra-group disagreement. One way to elicit these evolving norms is to facilitate community-centered discussion and debate, following methods like STELA (Bergman et al., 2024). Rather than treating communities as homogeneous preference aggregates, pluralistic alignment frameworks should model disagreement as a first-class signal.
Pluralistic alignment comes with an array of technical and political challenges that will need to be addressed. On the technical side, post-training can induce mode collapse, in which heterogeneous group preferences and opinions are compressed in a lossy manner, suppressing minority viewpoints and preserving only the majority preferences for a given group (Bisbee et al., 2024; Durmus et al., 2024; Röttger et al., 2024). To elicit diverse generations from LLMs, some inference-time methods use iterative prompting (Feng et al., 2024; Hayati et al., 2024), but these methods are not calibrated to real-world distributions. Other methods involve modifying the prompt with explicit identity terms for diverse user groups (Giulianelli et al., 2023), but identity-coded names induce models to draw on distributions of stereotypical representations or out-group perceptions rather than in-group perspectives (Wang et al., 2025). To address mode collapse in a manner that preserves in-group perspectives, it may be necessary to update LLM priors implicitly through distributionally aligned in-context examples, following methods like spectrum tuning (Sorensen et al., 2025).
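One way to make calibration to real-world distributions measurable is to compare a model's distribution over answer options with a human reference distribution. The sketch below uses the Jensen-Shannon divergence as one possible choice of distance; it is not presented as the metric used in the works cited above. All three distributions are invented toy values for illustration only and do not come from any survey or model.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions over the same options."""
    def kl(a, b):
        # Kullback-Leibler divergence; terms with zero mass in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    mixture = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)

# Toy distributions over three answer options (illustrative values only).
human_reference = [0.40, 0.35, 0.25]   # hypothetical in-group survey responses
collapsed_model = [0.95, 0.04, 0.01]   # nearly all mass on the majority answer
diverse_model = [0.45, 0.35, 0.20]     # mass spread close to the reference

print(js_divergence(human_reference, collapsed_model))
print(js_divergence(human_reference, diverse_model))
```

In this toy setting, the collapsed distribution sits much farther from the reference than the diverse one, which is the pattern mode collapse would produce. A low divergence only indicates agreement at the level of answer frequencies. It does not show that the model represents in-group perspectives rather than out-group stereotypes.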
Achieving localized, domain-specific alignment processes that enable communities and stakeholders to meaningfully shape model behavior is not only a methodological challenge but also a political one (Delgado et al., 2023). Methodologically, we have the participatory HCI approaches covered in Participatory Approaches. Politically, however, most model developers lack incentives to share control with communities (Gabriel, 2020). Current alignment pipelines are centralized within the small set of companies with the resources to develop LLMs. Similarly, academic institutions and governments can serve to centralize decision-making across the LLM development pipeline (Suresh et al., 2024). There are a number of less centralized alternatives. For example, Masakhane (Orife et al., 2020) is a network of NLP researchers working on NLP for local African languages, with community involvement at every stage, from dataset creation and annotation to model training. EleutherAI is a distributed, volunteer-driven research collective that has created open pre-training corpora (Gao et al., 2021), models (Black et al., 2022), and scaling checkpoints (Biderman et al., 2023). BigScience was a research effort in which over 1,000 researchers from academia and industry coordinated through HuggingFace and built BLOOM (Workshop et al., 2022), a 176B-parameter multilingual autoregressive language model, with decision-making steps clearly and publicly documented.
These initiatives illustrate that steerability does not need to be confined to top-down fine-tuning interfaces or proprietary alignment pipelines. Distributed model development allows communities to steer models from the ground up, establishing linguistic resources, value functions, and development practices that ultimately shape downstream behavior. At the same time, decentralization introduces its own tensions, including coordination costs, uneven resource distribution, and challenges of accountability. Open and community-led efforts may broaden participation, but they must still grapple with questions of safety, quality control, and transparent maintenance of HCLLMs. In the following sections, we will discuss safety (Safe HCLLMs) and interpretability (Interpretable and Explainable HCLLMs), as well as the tensions between these objectives.