Pluralism

While we have discussed methods for aligning models to human preferences and values, an important unaddressed question is whose values. Not all humans share the same set of values (Durmus et al., 2024), and what is permissible to some may be irrelevant or harmful to others. Rather than assuming that values are monolithic and aligning models to a “Silicon Valley default” set of preferences, pluralistic alignment aims to align language models so that they serve diverse preferences simultaneously. As Sorensen et al. (2024) define it, pluralistic alignment is the process of creating language models capable of representing a diverse set of human values and perspectives.

Current Approaches

Defining Pluralism

Prior work has proposed three definitions of how pluralism can be embedded into models (Sorensen, Moore, Fisher, Gordon, Mireshghallah, Rytting, Ye, Jiang, Lu, Dziri, & others, 2024):

Overton Pluralism: An Overton pluralistic model generates all reasonable perspectives for a given subject. The term draws on the concept of the “Overton window,” which encompasses the range of ideas and perspectives that are considered acceptable within the mainstream. Scholars such as Lake et al. (2024) have posited that Overton pluralism is compatible with the status quo alignment process, as longer conversational outputs typically share a few varied perspectives.
Steerable Pluralism: Steerable pluralism advocates for using situational context to steer the model towards a particular perspective. One popular use case of steerable pluralism is improving the cultural alignment of language models to particular groups and cultures (Masoud et al., 2023; Tao et al., 2024). Personalization also falls under steerable pluralism as the model is steered to the values of a particular user.
Distributional Pluralism: Finally, in distributional pluralism, developers steer models to produce responses that roughly correspond with a population distribution. That is, if 70% of the population holds a particular opinion, the model will generate that opinion 70% of the time. As Lake et al. (2024) note, base models are somewhat distributionally aligned to the opinions present in pre-training data.
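The three definitions above can be contrasted with a toy sketch. Everything in it is an illustrative assumption rather than a method from the cited work: the candidate responses, the perspective labels, the population shares, and the function names are all hypothetical, and a real system would obtain candidates from a language model rather than a fixed dictionary.

```python
import random
from collections import Counter

# Hypothetical candidate responses, keyed by the perspective each one expresses.
CANDIDATES = {
    "support": "View A: the policy should be adopted.",
    "oppose": "View B: the policy should be rejected.",
    "neutral": "View C: the evidence is not yet conclusive.",
}

# Hypothetical population shares for each perspective (they sum to 1).
POPULATION = {"support": 0.7, "oppose": 0.2, "neutral": 0.1}


def overton_response(candidates):
    """Overton: present every reasonable perspective in a single answer."""
    return " ".join(candidates[key] for key in sorted(candidates))


def steerable_response(candidates, perspective):
    """Steerable: return only the response for the requested perspective."""
    return candidates[perspective]


def distributional_response(candidates, population, rng):
    """Distributional: sample a perspective in proportion to its population share."""
    labels = list(population)
    weights = [population[label] for label in labels]
    choice = rng.choices(labels, weights=weights, k=1)[0]
    return choice, candidates[choice]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    print(overton_response(CANDIDATES))
    print(steerable_response(CANDIDATES, "oppose"))
    draws = Counter(
        distributional_response(CANDIDATES, POPULATION, rng)[0]
        for _ in range(10_000)
    )
    # Empirical frequencies should land near the population shares.
    print({label: draws[label] / 10_000 for label in POPULATION})
```

Over many draws, the distributional sampler returns the majority view in roughly 70% of cases, which mirrors the 70% example in the definition. The Overton and steerable functions are deterministic by contrast.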

Methods for Pluralistic Alignment

Several works have proposed different methods for alignment, depending on what type of pluralism they seek to achieve. For Overton pluralism, researchers have explored prompting techniques (Meincke et al., 2024) and approaches that generate multiple perspectives before synthesizing them (Feng et al., 2024; Hayati et al., 2024; Li et al., 2024). For steerable pluralism, personalized reward models (Chen et al., 2024; Jang et al., 2023; Poddar et al., 2024) and optimization techniques (e.g., Group Preference Optimization (Zhao et al., 2023)) enable alignment to specific user or group preferences with minimal context. Other common implementation methods include targeted prompting (AlKhamissi et al., 2024) and generate-then-filter approaches (Feng et al., 2024). Finally, for distributional pluralism, researchers generate diverse perspectives and filter based on population distributions (Feng et al., 2024), with surveys serving as key measurement tools since they enable matching LLM probability distributions to actual population responses. OpinionsQA (Santurkar et al., 2023) and GlobalOpinionsQA (Durmus et al., 2024) are two widely used survey benchmarks for evaluating distributional alignment.
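To make the idea of matching an LLM's answer distribution to survey responses concrete, the sketch below scores one multiple-choice question with one minus the Jensen-Shannon distance. The choice of metric, the function names, and the example numbers are assumptions made for illustration. This is not presented as the exact scoring procedure of either benchmark, and other distances (for example, ones that respect ordinal answer scales) may be more appropriate for some questions.

```python
import math


def normalize(counts):
    """Turn non-negative counts over answer options into probabilities."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {option: value / total for option, value in counts.items()}


def kl_divergence(p, q):
    """KL(p || q) in bits; options where p is zero contribute nothing."""
    return sum(p[k] * math.log2(p[k] / q[k]) for k in p if p[k] > 0)


def js_similarity(model_probs, survey_probs):
    """Return 1 - Jensen-Shannon distance (1.0 = identical, 0.0 = disjoint)."""
    options = set(model_probs) | set(survey_probs)
    p = {k: model_probs.get(k, 0.0) for k in options}
    q = {k: survey_probs.get(k, 0.0) for k in options}
    # Mixture distribution; it is positive wherever p or q is positive.
    m = {k: 0.5 * (p[k] + q[k]) for k in options}
    divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return 1.0 - math.sqrt(max(divergence, 0.0))


if __name__ == "__main__":
    # Hypothetical numbers: survey response counts and model answer probabilities.
    survey = normalize({"agree": 700, "disagree": 200, "unsure": 100})
    model = {"agree": 0.95, "disagree": 0.04, "unsure": 0.01}
    print(round(js_similarity(model, survey), 3))
```

A model whose answer probabilities equal the survey shares scores 1.0, and a model that puts all of its mass on options that no respondent chose scores 0.0. Averaging the score across questions gives one possible summary of distributional alignment.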

In tandem, a data-centric approach to pluralistic alignment has focused on collecting datasets that represent a diversity of values and perspectives. Chatbot Arena (Chiang et al., 2024) and PRISM (Kirk et al., 2024) are two such collections of preference data that retain user labels. The PERSONA dataset attempts to simulate 1,500 users with synthetic personas for the purpose of studying pluralistic alignment, providing synthetic prompts and feedback pairs (Castricato et al., 2025). The ValuePrism dataset (Sorensen, Jiang, Hwang, Levine, Pyatkin, West, Dziri, Lu, Rao, Bhagavatula, Sap, et al., 2024) is a collection of values, rights, and duties applied to various scenarios. Its authors also release the KALEIDO model for measuring how certain statements agree or disagree with particular values. ValueConsistency (Moore et al., 2024) is a dataset of 300 controversial topics in four languages with corresponding controversial questions. Moral Stories (Emelin et al., 2021) is a dataset of narratives that pose moral dilemmas and have multiple endings. The stories were written by annotators of various demographic backgrounds and express different social and moral norms.

Future of Pluralistic Alignment for HCLLMs

Effective pluralistic alignment faces several challenges. First, modeling diverse perspectives appropriately and accurately is a key precursor to pluralistic alignment. Existing literature has shown that models tend to emphasize stereotypical representations when asked to simulate perspectives from different demographic groups (Cheng, Durmus, et al., 2023; Cheng, Piccardi, et al., 2023; Deshpande et al., 2023), meaning that approaches that generate and synthesize varied perspectives require careful design to avoid misrepresenting communities. While prompting LLMs with demographic features can improve alignment with group opinions (AlKhamissi et al., 2024), the choice of base model sometimes has a more significant effect than the sociodemographic features themselves (Beck et al., 2024). Effective perspective modeling remains an open problem.

A second challenge involves understanding and balancing the societal impacts of pluralistic models on public opinion. It is well established that aligning LLMs to human preferences can exacerbate model sycophancy (Sharma et al., 2024), where models agree with users regardless of validity. Personalized models developed through steerable pluralism risk creating echo chambers that reinforce rather than challenge user beliefs. There are already concerns about “echo chambers” in media consumption, which LLMs may further amplify (Barberá, 2020; Cinelli et al., 2021). Even though pluralistic alignment aims to broaden the viewpoints that models may espouse, these methods still define which perspectives are considered “in bounds” of acceptability or follow ostensible public opinion distributions. In doing so, they may inadvertently silence marginalized viewpoints that fall outside mainstream acceptability. Future work should empirically investigate how different pluralistic alignment strategies affect opinion formation and marginalization in practice, moving beyond theoretical concerns to measurable impacts on diverse user populations.

Finally, pluralistic alignment raises questions of data sovereignty and consent. Many communities, particularly Indigenous groups, do not want their data and opinions collected for training AI systems (Rainie et al., 2019). Even when motivated by the goal of broad representation, developers must carefully consider whether all communities want their perspectives embedded in AI systems, respecting the right of groups to opt out of technological representation entirely.

AlKhamissi, B., ElNokrashy, M., Alkhamissi, M., & Diab, M. (2024). Investigating Cultural Alignment of Large Language Models. In L.-W. Ku, A. Martins, & V. Srikumar (Eds.), Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 12404–12422). Association for Computational Linguistics. https://doi.org/10.18653/v1/2024.acl-long.671

Barberá, P. (2020). Social media, echo chambers, and political polarization. Social Media and Democracy: The State of the Field, Prospects for Reform, 34–55.

Beck, T., Schuff, H., Lauscher, A., & Gurevych, I. (2024). Sensitivity, Performance, Robustness: Deconstructing the Effect of Sociodemographic Prompting. In Y. Graham & M. Purver (Eds.), Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 2589–2615). Association for Computational Linguistics. https://aclanthology.org/2024.eacl-long.159/

Castricato, L., Lile, N., Rafailov, R., Fränken, J.-P., & Finn, C. (2025). PERSONA: A Reproducible Testbed for Pluralistic Alignment. Proceedings of the 31st International Conference on Computational Linguistics, 11348–11368.

Chen, D., Chen, Y., Rege, A., & Vinayak, R. K. (2024). PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences. NeurIPS 2024 Workshop on Fine-Tuning in Modern Machine Learning: Principles and Scalability. https://openreview.net/forum?id=wVg2kVQOzq

Cheng, M., Durmus, E., & Jurafsky, D. (2023). Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models. In A. Rogers, J. Boyd-Graber, & N. Okazaki (Eds.), Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 1504–1532). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.acl-long.84

Cheng, M., Piccardi, T., & Yang, D. (2023). CoMPosT: Characterizing and Evaluating Caricature in LLM Simulations. In H. Bouamor, J. Pino, & K. Bali (Eds.), Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (pp. 10853–10875). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.emnlp-main.669

Chiang, W.-L., Zheng, L., Sheng, Y., Angelopoulos, A. N., Li, T., Li, D., Zhu, B., Zhang, H., Jordan, M. I., Gonzalez, J. E., & Stoica, I. (2024). Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference. ICML. https://openreview.net/forum?id=3MW8GKNyzI

Cinelli, M., De Francisci Morales, G., Galeazzi, A., Quattrociocchi, W., & Starnini, M. (2021). The echo chamber effect on social media. Proceedings of the National Academy of Sciences, 118(9).

Deshpande, A., Murahari, V., Rajpurohit, T., Kalyan, A., & Narasimhan, K. (2023). Toxicity in ChatGPT: Analyzing Persona-assigned Language Models. In H. Bouamor, J. Pino, & K. Bali (Eds.), Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 1236–1270). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-emnlp.88

Durmus, E., Nguyen, K., Liao, T., Schiefer, N., Askell, A., Bakhtin, A., Chen, C., Hatfield-Dodds, Z., Hernandez, D., Joseph, N., Lovitt, L., McCandlish, S., Sikder, O., Tamkin, A., Thamkul, J., Kaplan, J., Clark, J., & Ganguli, D. (2024). Towards Measuring the Representation of Subjective Global Opinions in Language Models. First Conference on Language Modeling. https://openreview.net/forum?id=zl16jLb91v

Emelin, D., Le Bras, R., Hwang, J. D., Forbes, M., & Choi, Y. (2021). Moral Stories: Situated Reasoning about Norms, Intents, Actions, and their Consequences. In M.-F. Moens, X. Huang, L. Specia, & S. W. Yih (Eds.), Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (pp. 698–718). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.emnlp-main.54

Feng, S., Sorensen, T., Liu, Y., Fisher, J., Park, C. Y., Choi, Y., & Tsvetkov, Y. (2024). Modular Pluralism: Pluralistic Alignment via Multi-LLM Collaboration. In Y. Al-Onaizan, M. Bansal, & Y.-N. Chen (Eds.), Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (pp. 4151–4171). Association for Computational Linguistics. https://aclanthology.org/2024.emnlp-main.240

Hayati, S. A., Lee, M., Rajagopal, D., & Kang, D. (2024). How Far Can We Extract Diverse Perspectives from Large Language Models? In Y. Al-Onaizan, M. Bansal, & Y.-N. Chen (Eds.), Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (pp. 5336–5366). Association for Computational Linguistics. https://aclanthology.org/2024.emnlp-main.306

Jang, J., Kim, S., Lin, B. Y., Wang, Y., Hessel, J., Zettlemoyer, L., Hajishirzi, H., Choi, Y., & Ammanabrolu, P. (2023). Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging. ArXiv Preprint, abs/2310.11564. https://arxiv.org/abs/2310.11564

Kirk, H. R., Whitefield, A., Röttger, P., Bean, A. M., Margatina, K., Mosquera, R., Ciro, J. M., Bartolo, M., Williams, A., He, H., Vidgen, B., & Hale, S. A. (2024). The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models. The Thirty-Eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track. https://openreview.net/forum?id=DFr5hteojx

Lake, T., Choi, E., & Durrett, G. (2024). From Distributional to Overton Pluralism: Investigating Large Language Model Alignment. Pluralistic Alignment Workshop at NeurIPS 2024. https://openreview.net/forum?id=roe8GMahZL

Li, T., Zhang, X., Du, C., Pang, T., Liu, Q., Guo, Q., Shen, C., & Liu, Y. (2024). Your Large Language Model is Secretly a Fairness Proponent and You Should Prompt it Like One. ArXiv Preprint, abs/2402.12150. https://arxiv.org/abs/2402.12150

Masoud, R. I., Liu, Z., Ferianc, M., Treleaven, P., & Rodrigues, M. (2023). Cultural Alignment in Large Language Models: An Explanatory Analysis Based on Hofstede’s Cultural Dimensions. ArXiv Preprint, abs/2309.12342. https://arxiv.org/abs/2309.12342

Meincke, L., Mollick, E. R., & Terwiesch, C. (2024). Prompting Diverse Ideas: Increasing AI Idea Variance. ArXiv Preprint, abs/2402.01727. https://arxiv.org/abs/2402.01727

Moore, J., Deshpande, T., & Yang, D. (2024). Are Large Language Models Consistent over Value-laden Questions? In Y. Al-Onaizan, M. Bansal, & Y.-N. Chen (Eds.), Findings of the Association for Computational Linguistics: EMNLP 2024 (pp. 15185–15221). Association for Computational Linguistics. https://aclanthology.org/2024.findings-emnlp.891

Poddar, S., Wan, Y., Ivison, H., Gupta, A., & Jaques, N. (2024). Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning. The Thirty-Eighth Annual Conference on Neural Information Processing Systems. https://openreview.net/forum?id=gRG6SzbW9p

Rainie, S. C., Kukutai, T., Walter, M., Figueroa-Rodríguez, O. L., Walker, J., & Axelsson, P. (2019). Indigenous data sovereignty.

Santurkar, S., Durmus, E., Ladhak, F., Lee, C., Liang, P., & Hashimoto, T. (2023). Whose Opinions Do Language Models Reflect? In A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, & J. Scarlett (Eds.), International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA (Vol. 202, pp. 29971–30004). PMLR. https://proceedings.mlr.press/v202/santurkar23a.html

Sharma, M., Tong, M., Korbak, T., Duvenaud, D., Askell, A., Bowman, S. R., Durmus, E., Hatfield-Dodds, Z., Johnston, S. R., Kravec, S. M., Maxwell, T., McCandlish, S., Ndousse, K., Rausch, O., Schiefer, N., Yan, D., Zhang, M., & Perez, E. (2024). Towards Understanding Sycophancy in Language Models. The Twelfth International Conference on Learning Representations. https://openreview.net/forum?id=tvhaxkMKAn

Sorensen, T., Jiang, L., Hwang, J. D., Levine, S., Pyatkin, V., West, P., Dziri, N., Lu, X., Rao, K., Bhagavatula, C., Sap, M., Tasioulas, J., & Choi, Y. (2024). Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties. In M. J. Wooldridge, J. G. Dy, & S. Natarajan (Eds.), Thirty-Eighth AAAI Conference on Artificial Intelligence, AAAI 2024, Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence, IAAI 2024, Fourteenth Symposium on Educational Advances in Artificial Intelligence, EAAI 2024, February 20-27, 2024, Vancouver, Canada (pp. 19937–19947). AAAI Press. https://doi.org/10.1609/AAAI.V38I18.29970

Sorensen, T., Moore, J., Fisher, J., Gordon, M. L., Mireshghallah, N., Rytting, C. M., Ye, A., Jiang, L., Lu, X., Dziri, N., Althoff, T., & Choi, Y. (2024). Position: A Roadmap to Pluralistic Alignment. ICML. https://openreview.net/forum?id=gQpBnRHwxM

Sorensen, T., Moore, J., Fisher, J., Gordon, M., Mireshghallah, N., Rytting, C. M., Ye, A., Jiang, L., Lu, X., Dziri, N., & others. (2024). A roadmap to pluralistic alignment. ArXiv Preprint, abs/2402.05070. https://arxiv.org/abs/2402.05070

Tao, Y., Viberg, O., Baker, R. S., & Kizilcec, R. F. (2024). Cultural bias and cultural alignment of large language models. PNAS Nexus, 3(9), pgae346.

Zhao, S., Dang, J., & Grover, A. (2023). Group Preference Optimization: Few-Shot Alignment of Large Language Models. R0-FoMo: Robustness of Few-Shot and Zero-Shot Learning in Large Foundation Models. https://openreview.net/forum?id=p6hzUHjQn1
