We established that the goal of the post-training stages is to align LLMs to human preferences. Nonetheless, the term “human preferences” is a blanket statement that obscures the diverse and potentially conflicting desires that different users have. Rather than trying to create LLMs that can satisfy everyone, the goal of personalization is to align a model’s outputs with the preferences of a single individual, both in terms of content and style (Tseng et al., 2024; Zhang et al., 2024).

Current Approaches

We cover three families of techniques for personalizing LLMs: prompting-based approaches, retrieval-based approaches, and personalized alignment (e.g., learning from human preferences). There is no single “best” technique for personalization; the right choice depends on factors such as what data is available, compute efficiency, and the degree of personalization required.

Prompting-based Approaches.

Given the capabilities of LLMs, prompt-based approaches provide an efficient and popular way to personalize model outputs. One prompt-based technique is to provide a user persona directly in the text instructions given to the model. Here, the critical research decision is what content to provide for personalization. For example, prior work has explored providing demographic information (e.g., race, gender) to the model for personalization, although these personas run the risk of stereotyping or caricature (Cheng, Durmus, et al., 2023; Cheng, Piccardi, et al., 2023; Gupta et al., 2023; Huang et al., 2023). In addition, this approach has been adopted to imbue the model with character traits, such as prompting it to produce responses that reflect certain personality qualities (e.g., extroversion, warmth) or different tones. Since these text-based methods can suffer from inefficiency and information loss (Liu et al., 2024), other works have also explored encoding user information into learned (soft) prompt embeddings (Hebert et al., 2024; Q. Huang et al., 2024; Li et al., 2024).
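To make the persona-in-the-instructions idea concrete, the following is a minimal sketch of how a textual persona could be prepended to a user request. The function name `build_persona_prompt`, the attribute names in the profile, and the wording of the instruction are all assumptions made for illustration; they are not taken from any of the cited works.

```python
def build_persona_prompt(profile, query):
    """Prepend a plain-text user persona to a query.

    `profile` is a hypothetical dict of free-text attributes;
    attributes with empty values are skipped.
    """
    lines = ["Adapt the content and style of your answer to this user."]
    for attribute, value in profile.items():
        if value:  # leave out attributes we know nothing about
            lines.append(f"- {attribute}: {value}")
    lines.append("")
    lines.append(f"User request: {query}")
    return "\n".join(lines)


# Illustrative profile; the keys are made up for this example.
example_profile = {
    "expertise": "beginner in statistics",
    "tone": "warm and encouraging",
    "interests": "",
}
print(build_persona_prompt(example_profile, "Explain p-values."))
```

Which attributes go into the profile is exactly the research decision discussed above: the sketch uses expertise and tone rather than demographic attributes, but the same template would accept either.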

Retrieval-based Approaches.

What information about the user is relevant depends on the context. Retrieval-augmented personalization methods retrieve information from an external knowledge base, which is then incorporated at inference time to personalize the model’s output. As an initial foray into this area, Salemi et al. (2024) benchmark different retrievers, including BM25 and a dense retrieval model, on a series of personalization tasks. Other work has continued to refine retrieval methods, exploring how to improve capabilities while reducing the amount of retrieved data via summarization (Richardson et al., 2023) or adopting techniques from other areas such as collaborative filtering (Shi et al., 2025). Although retrieval-based methods allow for on-the-fly personalization of models at inference time, their performance is constrained by retriever quality. For example, lexical retrievers, such as BM25, may match on shallow keyword similarity rather than a deep semantic understanding of user needs. Furthermore, this approach is limited in cold-start situations or domains with sparse user data, where the knowledge base itself may be insufficient to support meaningful personalization.
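The retrieval step and its two failure modes can be illustrated with a small, self-contained sketch of BM25 ranking over a user's history. This is a simplified stand-in written for this section, not the setup benchmarked by Salemi et al. (2024): the whitespace tokenizer, the history entries, the parameter values `k1=1.5` and `b=0.75`, and the smoothed IDF variant are all assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def bm25_rank(query, docs, k1=1.5, b=0.75, top_k=2):
    """Return up to `top_k` entries of `docs` with a positive BM25 score."""
    if not docs:
        return []  # cold start: nothing to retrieve for a new user
    tokenized = [tokenize(d) for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    doc_freq = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))  # count each term once per document
    n_docs = len(docs)
    query_terms = set(tokenize(query))
    scored = []
    for doc, toks in zip(docs, tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for common terms.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


# Hypothetical user history used as the external knowledge base.
history = [
    "I prefer concise answers with bullet points",
    "Asked about vegetarian pasta recipes last week",
    "Works as a backend engineer using Go",
]
print(bm25_rank("suggest vegetarian dinner recipe", history))
print(bm25_rank("meat-free dinner ideas", history))  # no shared keywords
print(bm25_rank("suggest vegetarian dinner recipe", []))  # cold start
```

The second query shares no tokens with the relevant history entry, so a purely lexical retriever returns nothing for it, and the third call shows that an empty history leaves nothing to personalize with. The retrieved entries would then be inserted into the prompt at inference time.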

Personalized Alignment.

Finally, rather than adapting model behavior at inference time, training-based approaches can embed individual preferences into alignment objectives. For example, several works present methods for creating personalized reward models, such as approaches that train multiple reward models that are later merged (Jang et al., 2023). Critically, these approaches require that the dimensions for personalization are defined a priori, limiting the settings to which they can be applied. Alternative approaches have sought to loosen these constraints, proposing methods that learn preferences from historical user interactions, which can then be used to train personalized reward models or directly align LLMs (Balepur et al., 2025; Poddar et al., 2024; Ryan et al., 2025). However, training-based personalization faces practical limitations: these methods require substantial amounts of user data in domains where data is often scarce, risk overfitting to individual user patterns, and incur significantly higher computational costs than retrieval-based or prompting approaches (Ryan et al., 2025; Shaikh et al., 2024; Zhang et al., 2024). These tradeoffs make training-based personalization most suitable for scenarios where sufficient data is available and the degree of customization justifies the additional resource investment.
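The idea of merging separately trained models can be sketched as a weighted average of their parameters, with one model per predefined preference dimension. This is only an illustration of post-hoc parameter merging in general, not the procedure of Jang et al. (2023): the function name `merge_parameters`, the toy parameter dictionaries, and the dimension names are assumptions.

```python
def merge_parameters(models, weights):
    """Weighted average of parameter dicts that share the same keys and shapes.

    `models` is a list of {parameter_name: list_of_floats}; `weights` holds one
    non-negative number per model and is normalized to sum to 1.
    """
    if len(models) != len(weights) or not models:
        raise ValueError("need one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    names = set(models[0])
    if any(set(m) != names for m in models):
        raise ValueError("all models must have the same parameter names")
    merged = {}
    for name in models[0]:
        size = len(models[0][name])
        merged[name] = [
            sum((w / total) * m[name][i] for m, w in zip(models, weights))
            for i in range(size)
        ]
    return merged


# Toy "models", one per predefined preference dimension (hypothetical values).
concise_model = {"w": [1.0, 0.0]}
friendly_model = {"w": [0.0, 1.0]}

# A user who cares three times as much about concision as about friendliness.
print(merge_parameters([concise_model, friendly_model], weights=[3, 1]))
```

The sketch also makes the stated limitation visible: the user can only be placed somewhere between dimensions that were fixed before training, which is the constraint the history-based methods above try to relax.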

Future of Personalization for HCLLMs

A central challenge in personalization is deciding what the model should adapt to. For example, economists distinguish between stated preferences, meaning what people say they want, and revealed preferences, which are inferred from behavior (Samuelson, 1948). This tension also manifests in language model personalization. Behavioral signals such as query reformulations, response ratings, or conversation length may reflect immediate satisfaction but diverge from users’ stated goals or long-term values. Consider a user learning a new subject who frequently requests direct answers. A model personalized on these behavioral signals might simply comply, whereas a model targeting learning outcomes might instead offer scaffolded hints. This raises fundamental questions about system objectives and user agency: which preferences should dominate when conflicts arise, and how should systems handle patterns that users exhibit but might not endorse upon reflection?

Temporal dynamics present an additional challenge. User preferences and needs evolve over multiple timescales, ranging from within-session learning to long-term skill development and shifting life contexts. Yet, current approaches treat personalization as a static task. While recent work has extended the length of conversations in personalization datasets, we lack real-world benchmarks that capture these longer-term dynamics or that evaluate personalization over time (Kirk et al., 2024; Zhang et al., 2024; Zhao et al., 2025). There is some early work that explores updating user profiles over time, but critical questions remain open (Wang et al., 2024). For instance, how can systems distinguish transient preferences (a user exploring a new hobby) from enduring ones (a domain expert’s consistent working style)? What mechanisms allow for efficient model updates under these circumstances? These temporal considerations compound the preference alignment challenges; even if we can perfectly identify and incorporate a user’s current preferences into LLM outputs, those preferences themselves may be moving targets.
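As a rough illustration of the transient-versus-enduring question, one naive baseline is to weight each observed interaction by an exponential decay in its age, so that a short burst of activity fades while a recurring pattern keeps a high score. This heuristic is an assumption introduced here for illustration, not a method from the cited work, and the event format, topic names, and 30-day half-life are hypothetical.

```python
def decayed_weight(age_days, half_life_days):
    # An interaction loses half of its weight every `half_life_days`.
    return 0.5 ** (age_days / half_life_days)


def preference_scores(events, now_day, half_life_days=30.0):
    """Sum time-decayed weights per topic.

    `events` is a list of (day, topic) pairs with day <= now_day.
    """
    scores = {}
    for day, topic in events:
        age = now_day - day
        scores[topic] = scores.get(topic, 0.0) + decayed_weight(age, half_life_days)
    return scores


# Hypothetical interaction log: a recurring topic and a short-lived burst.
events = [
    (0, "python"), (30, "python"), (60, "python"), (90, "python"),
    (10, "pottery"), (11, "pottery"), (12, "pottery"),
]
scores = preference_scores(events, now_day=90)
print(sorted(scores.items(), key=lambda item: item[1], reverse=True))
```

A fixed half-life is itself a strong assumption: it cannot tell a fading hobby from an enduring interest that the user simply has not raised recently, which is part of why the questions above remain open.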

Balepur, N., Padmakumar, V., Yang, F., Feng, S., Rudinger, R., & Boyd-Graber, J. L. (2025). Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas. In W. Che, J. Nabende, E. Shutova, & M. T. Pilehvar (Eds.), Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics.
Cheng, M., Durmus, E., & Jurafsky, D. (2023). Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models. In A. Rogers, J. Boyd-Graber, & N. Okazaki (Eds.), Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 1504–1532). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.acl-long.84
Cheng, M., Piccardi, T., & Yang, D. (2023). CoMPosT: Characterizing and Evaluating Caricature in LLM Simulations. In H. Bouamor, J. Pino, & K. Bali (Eds.), Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (pp. 10853–10875). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.emnlp-main.669
Gupta, V., Venkit, P. N., Laurençon, H., Wilson, S., & Passonneau, R. J. (2023). CALM: A Multi-task Benchmark for Comprehensive Assessment of Language Model Bias. ArXiv Preprint, abs/2308.12539. https://arxiv.org/abs/2308.12539
Hebert, L., Sayana, K., Jash, A., Karatzoglou, A., Sodhi, S., Doddapaneni, S., Cai, Y., & Kuzmin, D. (2024). PERSOMA: PERsonalized SOft ProMpt Adapter Architecture for Personalized Language Prompting. In ArXiv preprint: Vol. abs/2408.00960. https://arxiv.org/abs/2408.00960
Huang, J., Wang, W., Li, E. J., Lam, M. H., Ren, S., Yuan, Y., Jiao, W., Tu, Z., & Lyu, M. (2023). On the humanity of conversational ai: Evaluating the psychological portrayal of llms. The Twelfth International Conference on Learning Representations.
Huang, Q., Liu, X., Ko, T., Wu, B., Wang, W., Zhang, Y., & Tang, L. (2024). Selective Prompting Tuning for Personalized Conversations with LLMs. In ArXiv preprint: Vol. abs/2406.18187. https://arxiv.org/abs/2406.18187
Jang, J., Kim, S., Lin, B. Y., Wang, Y., Hessel, J., Zettlemoyer, L., Hajishirzi, H., Choi, Y., & Ammanabrolu, P. (2023). Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging. ArXiv Preprint, abs/2310.11564. https://arxiv.org/abs/2310.11564
Kirk, H. R., Whitefield, A., Röttger, P., Bean, A. M., Margatina, K., Mosquera, R., Ciro, J. M., Bartolo, M., Williams, A., He, H., Vidgen, B., & Hale, S. A. (2024). The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models. The Thirty-Eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track. https://openreview.net/forum?id=DFr5hteojx
Li, X., Lipton, Z. C., & Leqi, L. (2024). Personalized Language Modeling from Personalized Human Feedback. In ArXiv preprint: Vol. abs/2402.05133. https://arxiv.org/abs/2402.05133
Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., & Liang, P. (2024). Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics, 12, 157–173.
Poddar, S., Wan, Y., Ivison, H., Gupta, A., & Jaques, N. (2024). Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning. The Thirty-Eighth Annual Conference on Neural Information Processing Systems. https://openreview.net/forum?id=gRG6SzbW9p
Richardson, C., Zhang, Y., Gillespie, K., Kar, S., Singh, A., Raeesy, Z., Khan, O. Z., & Sethy, A. (2023). Integrating Summarization and Retrieval for Enhanced Personalization via Large Language Models. In ArXiv preprint: Vol. abs/2310.20081. https://arxiv.org/abs/2310.20081
Ryan, M. J., Shaikh, O., Bhagirath, A., Frees, D., Held, W. B., & Yang, D. (2025). Synthesizeme! inducing persona-guided prompts for personalized reward models in llms. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 8045–8078.
Salemi, A., Mysore, S., Bendersky, M., & Zamani, H. (2024). LaMP: When Large Language Models Meet Personalization. In L.-W. Ku, A. Martins, & V. Srikumar (Eds.), Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 7370–7392). Association for Computational Linguistics. https://doi.org/10.18653/v1/2024.acl-long.399
Samuelson, P. A. (1948). Consumption theory in terms of revealed preference. Economica, 15(60), 243–253.
Shaikh, O., Lam, M., Hejna, J., Shao, Y., Bernstein, M., & Yang, D. (2024). Show, Don’t Tell: Aligning Language Models with Demonstrated Feedback. ArXiv Preprint, abs/2406.00888. https://arxiv.org/abs/2406.00888
Shi, T., Xu, J., Zhang, X., Zang, X., Zheng, K., Song, Y., & Li, H. (2025). Retrieval augmented generation with collaborative filtering for personalized text generation. Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1294–1304.
Tseng, Y.-M., Huang, Y.-C., Hsiao, T.-Y., Chen, W.-L., Huang, C.-W., Meng, Y., & Chen, Y.-N. (2024). Two tales of persona in llms: A survey of role-playing and personalization. Findings of the Association for Computational Linguistics: EMNLP 2024, 16612–16631.
Wang, T., Tao, M., Fang, R., Wang, H., Wang, S., Jiang, Y. E., & Zhou, W. (2024). AI persona: Towards life-long personalization of llms. arXiv Preprint arXiv:2412.13103.
Zhang, Z., Rossi, R. A., Kveton, B., Shao, Y., Yang, D., Zamani, H., Dernoncourt, F., Barrow, J., Yu, T., Kim, S., & others. (2024). Personalization of large language models: A survey. Transactions on Machine Learning Research.
Zhao, J., Huang, J., Wu, Z., Bau, D., & Shi, W. (2025). Llms encode harmfulness and refusal separately. arXiv Preprint arXiv:2507.11878.