While we have discussed methods for aligning models to human preferences and values, an important question remains unaddressed: whose values? Not all humans share the same set of values (Durmus et al., 2024), and what is permissible to some may be irrelevant or harmful to others. Rather than assuming values are monolithic and aligning models to a “Silicon Valley default” set of preferences, pluralistic alignment aims to align language models so that they serve diverse preferences simultaneously. Sorensen, Moore, et al. (2024) define pluralistic alignment as the process of creating language models capable of representing a diverse set of human values and perspectives.

Current Approaches

Defining Pluralism

Prior work has proposed three definitions of how pluralism can be embedded into models (Sorensen, Moore, et al., 2024):

  1. Overton Pluralism: An Overton pluralistic model generates all reasonable perspectives on a given subject. The term draws on the concept of the “Overton window,” which encompasses the range of ideas and perspectives that are considered acceptable within the mainstream. Scholars such as Lake et al. (2024) have posited that Overton pluralism is compatible with the status quo alignment process, as longer conversational outputs typically share a few varied perspectives.

  2. Steerable Pluralism: Steerable pluralism advocates for using situational context to steer the model towards a particular perspective. One popular use case of steerable pluralism is improving the cultural alignment of language models to particular groups and cultures (Masoud et al., 2023; Tao et al., 2024). Personalization also falls under steerable pluralism, as the model is steered to the values of a particular user.

  3. Distributional Pluralism: Finally, in distributional pluralism, developers steer models to produce responses that roughly correspond with a population distribution. That is, if 70% of the population holds a particular opinion, the model will generate that opinion 70% of the time. As Lake et al. (2024) note, base models are somewhat distributionally aligned to the opinions present in pre-training data.
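The 70% example under distributional pluralism can be made concrete with a small simulation. The following is a minimal sketch, not an alignment method: it assumes the population distribution is already known, and the `population` split and the `sample_opinion` helper are hypothetical names introduced only for illustration.

```python
import random
from collections import Counter


def sample_opinion(population_dist, rng):
    """Draw one opinion with probability equal to its population share."""
    opinions = list(population_dist)
    weights = [population_dist[o] for o in opinions]
    return rng.choices(opinions, weights=weights, k=1)[0]


# Hypothetical population split on a single question (assumed, not real data).
population = {"agree": 0.7, "disagree": 0.3}

rng = random.Random(0)  # fixed seed so the simulation is repeatable
draws = [sample_opinion(population, rng) for _ in range(10_000)]

# Empirical frequency of each opinion across all simulated responses.
freq = {opinion: count / len(draws) for opinion, count in Counter(draws).items()}
print(freq)
```

Over many draws, the empirical frequencies should approach the population shares, which is the behavior a distributionally pluralistic model is meant to exhibit across repeated generations.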

Methods for Pluralistic Alignment

Several works have proposed methods for pluralistic alignment, which differ depending on the type of pluralism they seek to achieve. For Overton pluralism, researchers have explored prompting techniques (Meincke et al., 2024) and approaches that generate multiple perspectives before synthesizing them (Feng et al., 2024; Hayati et al., 2024; Li et al., 2024). For steerable pluralism, personalized reward models (Chen et al., 2024; Jang et al., 2023; Poddar et al., 2024) and optimization techniques such as Group Preference Optimization (Zhao et al., 2023) enable alignment to specific user or group preferences with minimal context. Other common implementation methods include targeted prompting (AlKhamissi et al., 2024) and generate-then-filter approaches (Feng et al., 2024). Finally, for distributional pluralism, researchers generate diverse perspectives and filter them based on population distributions (Feng et al., 2024), with surveys serving as key measurement tools because they allow LLM probability distributions to be compared against actual population responses. OpinionsQA (Santurkar et al., 2023) and GlobalOpinionsQA (Durmus et al., 2024) are two widely used survey benchmarks for evaluating distributional alignment.
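To illustrate how survey responses can be compared against a model's answer distribution, the sketch below scores the two with one minus the Jensen-Shannon distance. This is one common choice of similarity and is an assumption here; it is not claimed to be the exact metric of any benchmark cited above. The function names and the example numbers are hypothetical.

```python
import math


def normalize(counts):
    """Turn raw counts (or unnormalized scores) into a probability distribution."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {key: value / total for key, value in counts.items()}


def kl_divergence(p, q, keys):
    """KL(p || q) in bits; terms with p[k] == 0 contribute nothing."""
    return sum(p[k] * math.log2(p[k] / q[k]) for k in keys if p[k] > 0)


def js_similarity(model_probs, survey_probs):
    """Return 1 - Jensen-Shannon distance, a score in [0, 1]."""
    keys = sorted(set(model_probs) | set(survey_probs))
    p = {k: model_probs.get(k, 0.0) for k in keys}
    q = {k: survey_probs.get(k, 0.0) for k in keys}
    # The mixture m is positive wherever p or q is, so the logs are defined.
    m = {k: 0.5 * (p[k] + q[k]) for k in keys}
    jsd = 0.5 * kl_divergence(p, m, keys) + 0.5 * kl_divergence(q, m, keys)
    return 1.0 - math.sqrt(max(jsd, 0.0))


# Hypothetical survey tallies and model answer probabilities for one question.
survey = normalize({"A": 70, "B": 20, "C": 10})
model = {"A": 0.55, "B": 0.35, "C": 0.10}
print(round(js_similarity(model, survey), 3))
```

A score of 1 means the model's answer distribution matches the survey exactly, and a score of 0 means the two place their mass on entirely different options. Averaging the score over many questions gives a single benchmark-level number.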

In tandem, a data-centric approach to pluralistic alignment has focused on collecting datasets that represent a diversity of values and perspectives. Chatbot Arena (Chiang et al., 2024) and PRISM (Kirk et al., 2024) are two such collections of preference data that retain user labels. The PERSONA dataset attempts to simulate 1,500 users with synthetic personas for the purpose of studying pluralistic alignment, providing synthetic prompts and feedback pairs (Castricato et al., 2025). The ValuePrism dataset (Sorensen, Jiang, et al., 2024) is a collection of values, rights, and duties applied to various scenarios. Its authors also release the KALEIDO model for measuring how certain statements agree or disagree with particular values. ValueConsistency (Moore et al., 2024) is a dataset of 300 controversial topics in four languages with corresponding controversial questions. Moral Stories (Emelin et al., 2021) is a dataset of narratives involving moral dilemmas, each with multiple endings. The stories were written by annotators of various demographic backgrounds and express different social and moral norms.

Future of Pluralistic Alignment for HCLLMs

Achieving effective pluralistic alignment faces several challenges. First, figuring out how to appropriately and effectively model diverse perspectives is a key precursor to effective pluralistic alignment. Existing literature has shown that models tend to emphasize stereotypical representations when asked to simulate perspectives from different demographic groups (Cheng, Durmus, et al., 2023; Cheng, Piccardi, et al., 2023; Deshpande et al., 2023), meaning that approaches that generate and synthesize varied perspectives require careful design to avoid misrepresenting communities. While prompting LLMs with demographic features can improve alignment with group opinions (AlKhamissi et al., 2024), the choice of base model sometimes has a more significant effect than the sociodemographic features themselves (Beck et al., 2024). Effective perspective modeling remains an open problem.
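One way to separate the effect of sociodemographic prompting from the effect of the model itself is to cross every model with both a persona-free baseline and each persona prompt. The sketch below only builds that grid of prompts; it does not call any model. The prompt wording, the persona fields, and the model names are hypothetical placeholders, not the setup of any cited study.

```python
from itertools import product


def build_prompt(question, options, persona=None):
    """Format a multiple-choice question, optionally prefixed by a persona."""
    lines = []
    if persona:
        attrs = "; ".join(f"{key}: {value}" for key, value in persona.items())
        lines.append(f"Answer as a survey respondent with this background: {attrs}.")
    lines.append(f"Question: {question}")
    lines.extend(f"({chr(65 + i)}) {option}" for i, option in enumerate(options))
    lines.append("Answer with a single letter.")
    return "\n".join(lines)


def experiment_grid(question, options, personas, model_names):
    """Cross every model with a no-persona baseline and each persona."""
    conditions = [None, *personas]  # None is the persona-free baseline
    return [
        {"model": model, "persona": persona,
         "prompt": build_prompt(question, options, persona)}
        for model, persona in product(model_names, conditions)
    ]


# Hypothetical question, persona, and model identifiers for illustration.
grid = experiment_grid(
    question="How important is it to preserve local traditions?",
    options=["Very important", "Somewhat important", "Not important"],
    personas=[{"country": "Country X", "age group": "18-29"}],
    model_names=["model-a", "model-b"],
)
print(grid[1]["prompt"])
```

Comparing answers across rows that share a persona but differ in model, and across rows that share a model but differ in persona, indicates which factor moves the responses more.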

A second challenge involves understanding and balancing the societal impacts of pluralistic models on public opinion. It is well established that aligning LLMs to human preferences can exacerbate model sycophancy (Sharma et al., 2024), where models agree with users regardless of validity. Personalized models developed through steerable pluralism risk creating personalized echo chambers that reinforce rather than challenge user beliefs. There are already concerns about “echo chambers” in media consumption, which LLMs may further amplify (Barberá, 2020; Cinelli et al., 2021). Even though pluralistic alignment aims to broaden the viewpoints that models may espouse, these methods still define which perspectives are considered “in bounds” of acceptability or follow ostensible public opinion distributions. In doing so, these methods may inadvertently silence marginalized viewpoints that fall outside mainstream acceptability. Future work should empirically investigate how different pluralistic alignment strategies affect opinion formation and marginalization in practice, moving beyond theoretical concerns to measurable impacts on diverse user populations.

Finally, pluralistic alignment raises questions of data sovereignty and consent. Many communities, particularly Indigenous groups, do not want their data and opinions collected for training AI systems (Rainie et al., 2019). Even when motivated by the goal of broad representation, developers must carefully consider whether all communities want their perspectives embedded in AI systems, respecting the right of groups to opt out of technological representation entirely.

AlKhamissi, B., ElNokrashy, M., Alkhamissi, M., & Diab, M. (2024). Investigating Cultural Alignment of Large Language Models. In L.-W. Ku, A. Martins, & V. Srikumar (Eds.), Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 12404–12422). Association for Computational Linguistics. https://doi.org/10.18653/v1/2024.acl-long.671
Barberá, P. (2020). Social media, echo chambers, and political polarization. Social Media and Democracy: The State of the Field, Prospects for Reform, 34–55.
Beck, T., Schuff, H., Lauscher, A., & Gurevych, I. (2024). Sensitivity, Performance, Robustness: Deconstructing the Effect of Sociodemographic Prompting. In Y. Graham & M. Purver (Eds.), Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 2589–2615). Association for Computational Linguistics. https://aclanthology.org/2024.eacl-long.159/
Castricato, L., Lile, N., Rafailov, R., Fränken, J.-P., & Finn, C. (2025). Persona: A reproducible testbed for pluralistic alignment. Proceedings of the 31st International Conference on Computational Linguistics, 11348–11368.
Chen, D., Chen, Y., Rege, A., & Vinayak, R. K. (2024). PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences. NeurIPS 2024 Workshop on Fine-Tuning in Modern Machine Learning: Principles and Scalability. https://openreview.net/forum?id=wVg2kVQOzq
Cheng, M., Durmus, E., & Jurafsky, D. (2023). Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models. In A. Rogers, J. Boyd-Graber, & N. Okazaki (Eds.), Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 1504–1532). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.acl-long.84
Cheng, M., Piccardi, T., & Yang, D. (2023). CoMPosT: Characterizing and Evaluating Caricature in LLM Simulations. In H. Bouamor, J. Pino, & K. Bali (Eds.), Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (pp. 10853–10875). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.emnlp-main.669
Chiang, W.-L., Zheng, L., Sheng, Y., Angelopoulos, A. N., Li, T., Li, D., Zhu, B., Zhang, H., Jordan, M. I., Gonzalez, J. E., & Stoica, I. (2024). Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference. ICML. https://openreview.net/forum?id=3MW8GKNyzI
Cinelli, M., De Francisci Morales, G., Galeazzi, A., Quattrociocchi, W., & Starnini, M. (2021). The echo chamber effect on social media. Proceedings of the National Academy of Sciences, 118(9).
Deshpande, A., Murahari, V., Rajpurohit, T., Kalyan, A., & Narasimhan, K. (2023). Toxicity in chatgpt: Analyzing persona-assigned language models. In H. Bouamor, J. Pino, & K. Bali (Eds.), Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 1236–1270). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-emnlp.88
Durmus, E., Nguyen, K., Liao, T., Schiefer, N., Askell, A., Bakhtin, A., Chen, C., Hatfield-Dodds, Z., Hernandez, D., Joseph, N., Lovitt, L., McCandlish, S., Sikder, O., Tamkin, A., Thamkul, J., Kaplan, J., Clark, J., & Ganguli, D. (2024). Towards Measuring the Representation of Subjective Global Opinions in Language Models. First Conference on Language Modeling. https://openreview.net/forum?id=zl16jLb91v
Emelin, D., Le Bras, R., Hwang, J. D., Forbes, M., & Choi, Y. (2021). Moral Stories: Situated Reasoning about Norms, Intents, Actions, and their Consequences. In M.-F. Moens, X. Huang, L. Specia, & S. W. Yih (Eds.), Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (pp. 698–718). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.emnlp-main.54
Feng, S., Sorensen, T., Liu, Y., Fisher, J., Park, C. Y., Choi, Y., & Tsvetkov, Y. (2024). Modular Pluralism: Pluralistic Alignment via Multi-LLM Collaboration. In Y. Al-Onaizan, M. Bansal, & Y.-N. Chen (Eds.), Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (pp. 4151–4171). Association for Computational Linguistics. https://aclanthology.org/2024.emnlp-main.240
Hayati, S. A., Lee, M., Rajagopal, D., & Kang, D. (2024). How Far Can We Extract Diverse Perspectives from Large Language Models? In Y. Al-Onaizan, M. Bansal, & Y.-N. Chen (Eds.), Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (pp. 5336–5366). Association for Computational Linguistics. https://aclanthology.org/2024.emnlp-main.306
Jang, J., Kim, S., Lin, B. Y., Wang, Y., Hessel, J., Zettlemoyer, L., Hajishirzi, H., Choi, Y., & Ammanabrolu, P. (2023). Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging. ArXiv Preprint, abs/2310.11564. https://arxiv.org/abs/2310.11564
Kirk, H. R., Whitefield, A., Röttger, P., Bean, A. M., Margatina, K., Mosquera, R., Ciro, J. M., Bartolo, M., Williams, A., He, H., Vidgen, B., & Hale, S. A. (2024). The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models. The Thirty-Eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track. https://openreview.net/forum?id=DFr5hteojx
Lake, T., Choi, E., & Durrett, G. (2024). From Distributional to Overton Pluralism: Investigating Large Language Model Alignment. Pluralistic Alignment Workshop at NeurIPS 2024. https://openreview.net/forum?id=roe8GMahZL
Li, T., Zhang, X., Du, C., Pang, T., Liu, Q., Guo, Q., Shen, C., & Liu, Y. (2024). Your Large Language Model is Secretly a Fairness Proponent and You Should Prompt it Like One. In ArXiv preprint: Vol. abs/2402.12150. https://arxiv.org/abs/2402.12150
Masoud, R. I., Liu, Z., Ferianc, M., Treleaven, P., & Rodrigues, M. (2023). Cultural Alignment in Large Language Models: An Explanatory Analysis Based on Hofstede’s Cultural Dimensions. In ArXiv preprint: Vol. abs/2309.12342. https://arxiv.org/abs/2309.12342
Meincke, L., Mollick, E. R., & Terwiesch, C. (2024). Prompting Diverse Ideas: Increasing AI Idea Variance. In ArXiv preprint: Vol. abs/2402.01727. https://arxiv.org/abs/2402.01727
Moore, J., Deshpande, T., & Yang, D. (2024). Are Large Language Models Consistent over Value-laden Questions? In Y. Al-Onaizan, M. Bansal, & Y.-N. Chen (Eds.), Findings of the Association for Computational Linguistics: EMNLP 2024 (pp. 15185–15221). Association for Computational Linguistics. https://aclanthology.org/2024.findings-emnlp.891
Poddar, S., Wan, Y., Ivison, H., Gupta, A., & Jaques, N. (2024). Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning. The Thirty-Eighth Annual Conference on Neural Information Processing Systems. https://openreview.net/forum?id=gRG6SzbW9p
Rainie, S. C., Kukutai, T., Walter, M., Figueroa-Rodríguez, O. L., Walker, J., & Axelsson, P. (2019). Indigenous data sovereignty.
Santurkar, S., Durmus, E., Ladhak, F., Lee, C., Liang, P., & Hashimoto, T. (2023). Whose Opinions Do Language Models Reflect? In A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, & J. Scarlett (Eds.), International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA (Vol. 202, pp. 29971–30004). PMLR. https://proceedings.mlr.press/v202/santurkar23a.html
Sharma, M., Tong, M., Korbak, T., Duvenaud, D., Askell, A., Bowman, S. R., Durmus, E., Hatfield-Dodds, Z., Johnston, S. R., Kravec, S. M., Maxwell, T., McCandlish, S., Ndousse, K., Rausch, O., Schiefer, N., Yan, D., Zhang, M., & Perez, E. (2024). Towards Understanding Sycophancy in Language Models. The Twelfth International Conference on Learning Representations. https://openreview.net/forum?id=tvhaxkMKAn
Sorensen, T., Jiang, L., Hwang, J. D., Levine, S., Pyatkin, V., West, P., Dziri, N., Lu, X., Rao, K., Bhagavatula, C., Sap, M., Tasioulas, J., & Choi, Y. (2024). Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties. In M. J. Wooldridge, J. G. Dy, & S. Natarajan (Eds.), Thirty-Eighth AAAI Conference on Artificial Intelligence, AAAI 2024, Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence, IAAI 2024, Fourteenth Symposium on Educational Advances in Artificial Intelligence, EAAI 2024, February 20-27, 2024, Vancouver, Canada (pp. 19937–19947). AAAI Press. https://doi.org/10.1609/AAAI.V38I18.29970
Sorensen, T., Moore, J., Fisher, J., Gordon, M. L., Mireshghallah, N., Rytting, C. M., Ye, A., Jiang, L., Lu, X., Dziri, N., Althoff, T., & Choi, Y. (2024). Position: A Roadmap to Pluralistic Alignment. ICML. https://openreview.net/forum?id=gQpBnRHwxM
Sorensen, T., Moore, J., Fisher, J., Gordon, M., Mireshghallah, N., Rytting, C. M., Ye, A., Jiang, L., Lu, X., Dziri, N., & others. (2024). A roadmap to pluralistic alignment. ArXiv Preprint, abs/2402.05070. https://arxiv.org/abs/2402.05070
Tao, Y., Viberg, O., Baker, R. S., & Kizilcec, R. F. (2024). Cultural bias and cultural alignment of large language models. PNAS Nexus, 3(9), pgae346.
Zhao, S., Dang, J., & Grover, A. (2023). Group Preference Optimization: Few-Shot Alignment of Large Language Models. R0-FoMo: Robustness of Few-Shot and Zero-Shot Learning in Large Foundation Models. https://openreview.net/forum?id=p6hzUHjQn1