Data Representation, Bias and Ethics

The story of LLM training data is a story about whose voices become computationally legible and whose are overwritten or erased. In Data Provenance, we considered how data is shaped by its sources, filtering decisions, annotation pipelines, and synthetic generation practices. Imbalances or biases in the provenance of LLM pre- and post-training data can exacerbate representational, allocational, and quality-of-service harms for those who use these models. Representational harms include stereotyping, denigration, and misrecognition; they arise when LLMs perpetuate and amplify distorted and harmful portrayals of personal identities and social groups (Blodgett, 2021). Allocational harms arise when LLMs reinforce or amplify inequality in the distribution of opportunities and resources (Barocas et al., 2017; Eubanks, 2018). Quality-of-service harms involve performance disparities across different user groups, which may cascade into both representational and allocational harms. For further discussion of how to define and measure these harms, see Bias and Fairness Evaluation.

Sociotechnical harms become harder to diagnose when data provenance is incomplete. Without visibility into the linguistic, cultural, and geographic origins of the data, as well as the filtering and curation pipelines, researchers cannot identify: (1) why the model stereotypes certain voices, (2) why specific groups are absent from generated outputs, or (3) how certain narrative tropes became dominant. We will briefly discuss the relationship between data provenance and each of these harm outcome categories, as well as data-based mitigation strategies.

Quality-of-Service Harms

Quality-of-service harms are disparities in model utility for users from different sociodemographic groups (Shelby et al., 2023). These disparities are often rooted in the composition and curation of data as discussed in Data Provenance. Pre-training data scraping, quality filtering, instruction-tuning templates, and alignment data collection tend to over-represent native English speakers from wealthy, Western nations and under-represent the language and perspectives of marginalized communities. As a result, LLMs introduce quality-of-service harms for individuals from these communities (Shah et al., 2020).

LLM performance degrades for speakers of non-standard language varieties or dialects on a wide range of tasks (Joshi et al., 2025; Kantharuban et al., 2023), from text classification (Lwowski & Rios, 2021) and machine translation (Ahia et al., 2023) to question answering (Fleisig et al., 2024; Ziems et al., 2023) and conversational AI (Artemova et al., 2024). This inequitable distribution of utility to LLM users can result in allocational harms like inequitable wages and quality of life, especially as LLMs are integrated into the workplace (Shao et al., 2025) and become general purpose technologies (Eloundou et al., 2024). Quality-of-service bias also contributes to the representational harm of erasure, and may derive in part from representational biases.
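
A quality-of-service disparity of this kind can be made concrete as a gap in task accuracy between user groups. The following is a minimal sketch, assuming evaluation results are available as (group, is_correct) pairs; the group labels and the toy records are hypothetical and do not come from any of the studies cited above.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return {group: accuracy} from an iterable of (group, is_correct) pairs."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, is_correct in records:
        totals[group] += 1
        correct[group] += int(is_correct)
    return {group: correct[group] / totals[group] for group in totals}


def max_accuracy_gap(per_group):
    """Largest difference in accuracy between any two groups."""
    values = list(per_group.values())
    return max(values) - min(values)


# Hypothetical toy results: the same task evaluated on prompts written in
# a standard variety and in a dialect. Not real data.
records = [
    ("standard", True), ("standard", True), ("standard", True), ("standard", False),
    ("dialect", True), ("dialect", False), ("dialect", False), ("dialect", False),
]

per_group = accuracy_by_group(records)
gap = max_accuracy_gap(per_group)
print(per_group, gap)
```

A single gap statistic like this hides sample-size effects, so in practice it would usually be reported alongside per-group counts or confidence intervals.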

Representational Harms

LLMs demonstrate representational harms when they propagate negative or skewed representations of social groups, including cultural misrepresentation, stereotypes, essentialist language, and erasure (Chien & Danks, 2024). Representational harms can derive from pre-training data (Chu et al., 2024), not only from its explicitly harmful, stereotypical, and toxic language (Luccioni & Viviano, 2021), but also from implicitly biased language (Caliskan et al., 2016; Navigli et al., 2023), framing effects (Feng et al., 2023), and the sparsity of socioculturally representative data (Naous et al., 2024). Data quality filters exacerbate racial and linguistic biases that skew pre-training data away from in-group perspectives in favor of unrepresentative and misinformed out-group perspectives (Wang et al., 2025). Post-training data can further induce mode collapse, narrowing the range of views and traits a model attributes to a group until the group is portrayed one-dimensionally (Bisbee et al., 2024; Durmus et al., 2024; Röttger et al., 2024). This kind of distributional flattening is a form of essentializing that is particularly harmful for groups historically portrayed as one-dimensional (Wang et al., 2025).
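
One rough way to make distributional flattening concrete is to compare the diversity of human answers to a question with the diversity of model-generated answers for the same group. The sketch below uses Shannon entropy as the diversity measure; this is an illustrative choice rather than the metric used in the cited studies, and the response lists are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(responses):
    """Shannon entropy (in bits) of the empirical distribution over responses."""
    if not responses:
        raise ValueError("responses must be non-empty")
    counts = Counter(responses)
    n = len(responses)
    # Sum of p * log2(1 / p) over the observed response categories.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


# Hypothetical answers to one multiple-choice survey question.
human_responses = ["A", "B", "C", "D"] * 2   # spread evenly over four options
model_responses = ["A"] * 7 + ["B"]          # concentrated on one option

human_entropy = shannon_entropy(human_responses)
model_entropy = shannon_entropy(model_responses)
print(human_entropy, model_entropy)
```

Lower entropy for the model than for the human sample would be consistent with flattening, though entropy alone does not show whether the surviving mode is a stereotype.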

Unsurprisingly, LLMs are known to generate harmful stereotypes in question-answering (Naous et al., 2024), machine translation (Ghosh & Caliskan, 2023), and open-ended generation (Dhamala et al., 2021). These issues are only exacerbated when the prompts are written in non-standard dialects, which may trigger demeaning or condescending responses from models (Fleisig et al., 2024). LLM-simulated personas also collapse into stereotypical caricatures (Cheng, Durmus, et al., 2023; Cheng, Piccardi, et al., 2023; Gupta et al., 2023). These simulations systematically misrepresent, flatten, and essentialize the perspectives of underrepresented groups based on protected characteristics like age, gender, and disability (Wang et al., 2025).

Allocational Harms

Allocational harms are disparities in individuals’ access to material resources like jobs, housing, credit, healthcare, childcare, education, and transportation (Cyberey et al., 2025). When LLMs are embedded in decision-making systems, they can introduce, amplify, or otherwise reinforce allocational disparities, in part as a result of representational biases in the training data (Chien & Danks, 2024; Sen et al., 2025). In pre-training, skewed representations can lead models to encode assumptions about who is qualified, creditworthy, employable, or deserving of services (Mehrabi et al., 2021). During post-training, alignment data privileges the annotators’ norms of professionalism, risk, and appropriate behavior (Conitzer et al., 2024). These encoded assumptions and norms surface as allocational biases in downstream decisions.

LLMs used in hiring decisions can be more likely to recommend less-prestigious jobs to speakers of marginalized dialects (Hofmann et al., 2024). In content moderation, LLMs are prejudiced against speakers of African American English (Sap et al., 2019). In automated essay scoring, LLMs show disparate performance for students with backgrounds not represented in training data (Schaller et al., 2024). More broadly, LLM decisions are biased against underrepresented groups across domains such as business (e.g., funding a startup), finance (e.g., approving a credit card), relationships (e.g., resolving conflicts), law (e.g., issuing a passport), science (e.g., approving a research study), and the arts (e.g., awarding a filmmaking prize) (Levy et al., 2024; Tamkin et al., 2023).
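
Decision biases like these are often probed with matched scenarios that differ only in a group attribute. The following is a minimal sketch of how such paired outcomes could be summarized, assuming each scenario yields one yes/no decision per variant; the function name and the toy decisions are hypothetical and are not taken from the cited evaluations.

```python
def paired_decision_summary(pairs):
    """Summarize (decision_a, decision_b) outcomes for matched scenarios.

    Each pair holds the yes/no decisions for two variants of the same
    scenario that differ only in the group attribute (variant A vs. B).
    """
    if not pairs:
        raise ValueError("pairs must be non-empty")
    favors_a = sum(1 for a, b in pairs if a and not b)
    favors_b = sum(1 for a, b in pairs if b and not a)
    return {
        "flip_rate": (favors_a + favors_b) / len(pairs),
        "favors_a": favors_a,
        "favors_b": favors_b,
    }


# Hypothetical decisions for five matched scenarios. Not real data.
pairs = [(True, True), (True, False), (True, False), (False, False), (False, True)]
summary = paired_decision_summary(pairs)
print(summary)
```

A high flip rate with a lopsided split between the two directions would suggest that the group attribute, rather than the scenario itself, is driving the decision.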

Mitigating Harms

Mitigating sociotechnical harms requires interventions across the data pipeline. The first step is to establish transparent data provenance through documentation practices like Datasheets for Datasets and Data Statements (Bender & Friedman, 2018; Gebru et al., 2021). Explicitly recording the linguistic, demographic, and geographic composition of datasets, as well as filtering and annotation decisions, makes it easier to diagnose representational gaps and biases. Model cards and system cards further extend this transparency to downstream users by documenting intended use, performance disparities, and known limitations (Mitchell et al., 2019). Recent work argues that provenance-aware documentation should include not only source descriptions but also transformation histories, like filtering, deduplication, and synthetic augmentation (Scheuerman et al., 2021).
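
The kind of record described above can also be kept in machine-readable form. The following is a minimal sketch of one possible layout, assuming a simple schema with composition fields and an ordered transformation history; the class names, field names, and example values are hypothetical and do not reproduce the Datasheets or Data Statements templates.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TransformationStep:
    name: str         # e.g. "quality_filter" or "deduplication"
    description: str  # what was done and why
    docs_in: int      # documents before the step
    docs_out: int     # documents after the step


@dataclass
class ProvenanceRecord:
    dataset_name: str
    languages: dict        # language code -> share of documents
    regions: dict          # region -> share of documents
    annotation_notes: str  # who annotated the data and under what guidelines
    transformations: list = field(default_factory=list)

    def retention_rate(self):
        """Fraction of the original documents that survive all steps."""
        if not self.transformations:
            return 1.0
        return self.transformations[-1].docs_out / self.transformations[0].docs_in


# Hypothetical example values, for illustration only.
record = ProvenanceRecord(
    dataset_name="example-corpus",
    languages={"en": 0.9, "yo": 0.1},
    regions={"North America": 0.7, "West Africa": 0.3},
    annotation_notes="Crowdworkers; guidelines written in English.",
    transformations=[
        TransformationStep("quality_filter", "classifier score above threshold", 1000, 700),
        TransformationStep("deduplication", "exact-match removal", 700, 600),
    ],
)

retention = record.retention_rate()
print(json.dumps(asdict(record), indent=2))
```

Recording document counts at every step makes it possible to ask which filter removed a given language or region, which a description of the sources alone cannot answer.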

With transparent data provenance in place, a second mitigation step is to involve stakeholders in data creation and diversification through participatory methods (Vaughn & Jacquez, 2020); see Participatory Approaches. With community-level organization, it is possible to develop rich data resources for low-resource languages and underrepresented communities (Heidt, 2025; Orife et al., 2020). However, diversification alone may prove insufficient without governance structures that prevent extractive data practices and ensure ongoing community oversight (Benjamin, 2023).

A third mitigation approach is to learn from individual users’ interactions with LLMs. In personalized alignment, preference data collected from a user’s interactions is used to shape subsequent model behavior through prompt-based (Hebert et al., 2024), retrieval-based (Salemi et al., 2024), or alignment-based methods (Ryan et al., 2025). For more discussion of this direction, see Personalization.

Ahia, O., Kumar, S., Gonen, H., Kasai, J., Mortensen, D. R., Smith, N. A., & Tsvetkov, Y. (2023). Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 9904–9923.

Artemova, E., Blaschke, V., & Plank, B. (2024). Exploring the Robustness of Task-oriented Dialogue Systems for Colloquial German Varieties. Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), 445–468.

Barocas, S., Crawford, K., Shapiro, A., & Wallach, H. (2017). The problem with bias: Allocative versus representational harms in machine learning. 9th Annual Conference of the Special Interest Group for Computing, Information and Society, 1.

Bender, E. M., & Friedman, B. (2018). Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics, 6, 587–604.

Benjamin, R. (2023). Race after technology. In Social Theory Re-Wired (pp. 405–415). Routledge.

Bisbee, J., Clinton, J. D., Dorff, C., Kenkel, B., & Larson, J. M. (2024). Synthetic replacements for human survey data? The perils of large language models. Political Analysis, 32(4), 401–416.

Blodgett, S. L. (2021). Sociolinguistically Driven Approaches for Just Natural Language Processing [Doctoral Dissertation, University of Massachusetts Amherst]. https://doi.org/10.7275/20410631

Caliskan, A., Bryson, J. J., & Narayanan, A. (2016). Semantics derived automatically from language corpora contain human-like biases. Science, 356, 183–186. https://api.semanticscholar.org/CorpusID:23163324

Cheng, M., Durmus, E., & Jurafsky, D. (2023). Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models. In A. Rogers, J. Boyd-Graber, & N. Okazaki (Eds.), Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 1504–1532). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.acl-long.84

Cheng, M., Piccardi, T., & Yang, D. (2023). CoMPosT: Characterizing and Evaluating Caricature in LLM Simulations. In H. Bouamor, J. Pino, & K. Bali (Eds.), Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (pp. 10853–10875). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.emnlp-main.669

Chien, J., & Danks, D. (2024). Beyond behaviorist representational harms: A plan for measurement and mitigation. Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, 933–946.

Chu, Z., Wang, Z., & Zhang, W. (2024). Fairness in Large Language Models: A Taxonomic Survey. SIGKDD Explor. Newsl., 26(1), 34–48. https://doi.org/10.1145/3682112.3682117

Conitzer, V., Freedman, R., Heitzig, J., Holliday, W. H., Jacobs, B. M., Lambert, N., Mossé, M., Pacuit, E., Russell, S., Schoelkopf, H., & others. (2024). Position: Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback. Forty-First International Conference on Machine Learning.

Cyberey, H., Ji, Y., & Evans, D. K. (2025). Do Prevalent Bias Metrics Capture Allocational Harms from LLMs? The Sixth Workshop on Insights from Negative Results in NLP, 34–45.

Dhamala, J., Sun, T., Kumar, V., Krishna, S., Pruksachatkun, Y., Chang, K.-W., & Gupta, R. (2021). BOLD: Dataset and metrics for measuring biases in open-ended language generation. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 862–872.

Durmus, E., Nguyen, K., Liao, T., Schiefer, N., Askell, A., Bakhtin, A., Chen, C., Hatfield-Dodds, Z., Hernandez, D., Joseph, N., Lovitt, L., McCandlish, S., Sikder, O., Tamkin, A., Thamkul, J., Kaplan, J., Clark, J., & Ganguli, D. (2024). Towards Measuring the Representation of Subjective Global Opinions in Language Models. First Conference on Language Modeling. https://openreview.net/forum?id=zl16jLb91v

Eloundou, T., Manning, S., Mishkin, P., & Rock, D. (2024). GPTs are GPTs: Labor market impact potential of LLMs. Science, 384(6702), 1306–1308.

Eubanks, V. (2018). Automating inequality: How high-tech tools profile, police, and punish the poor. St. Martin’s Press.

Feng, S., Park, C. Y., Liu, Y., & Tsvetkov, Y. (2023). From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models. In A. Rogers, J. Boyd-Graber, & N. Okazaki (Eds.), Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 11737–11762). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.acl-long.656

Fleisig, E., Smith, G., Bossi, M., Rustagi, I., Yin, X., & Klein, D. (2024). Linguistic Bias in ChatGPT: Language Models Reinforce Dialect Discrimination. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 13541–13564.

Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2021). Datasheets for datasets. Communications of the ACM, 64(12), 86–92.

Ghosh, S., & Caliskan, A. (2023). ChatGPT perpetuates gender bias in machine translation and ignores non-gendered pronouns: Findings across Bengali and five other low-resource languages. Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, 901–912.

Gupta, V., Venkit, P. N., Laurençon, H., Wilson, S., & Passonneau, R. J. (2023). CALM: A Multi-task Benchmark for Comprehensive Assessment of Language Model Bias. arXiv Preprint arXiv:2308.12539. https://arxiv.org/abs/2308.12539

Hebert, L., Sayana, K., Jash, A., Karatzoglou, A., Sodhi, S., Doddapaneni, S., Cai, Y., & Kuzmin, D. (2024). PERSOMA: PERsonalized SOft ProMpt Adapter Architecture for Personalized Language Prompting. arXiv Preprint arXiv:2408.00960. https://arxiv.org/abs/2408.00960

Heidt, A. (2025). Walking in two worlds: how an Indigenous computer scientist is using AI to preserve threatened languages. Nature, 641(8062), 548–550.

Hofmann, V., Kalluri, P. R., Jurafsky, D., & King, S. (2024). AI generates covertly racist decisions about people based on their dialect. Nature, 1–8.

Joshi, A., Dabre, R., Kanojia, D., Li, Z., Zhan, H., Haffari, G., & Dippold, D. (2025). Natural language processing for dialects of a language: A survey. ACM Computing Surveys, 57(6), 1–37.

Kantharuban, A., Vulić, I., & Korhonen, A. (2023). Quantifying the Dialect Gap and its Correlates Across Languages. Findings of the Association for Computational Linguistics: EMNLP 2023, 7226–7245.

Levy, S., Adler, W., Karver, T. S., Dredze, M., & Kaufman, M. R. (2024). Gender bias in decision-making with large language models: A study of relationship conflicts. Findings of the Association for Computational Linguistics: EMNLP 2024, 5777–5800.

Luccioni, A. S., & Viviano, J. D. (2021). What’s in the Box? A Preliminary Analysis of Undesirable Content in the Common Crawl Corpus. arXiv Preprint arXiv:2105.02732. https://arxiv.org/abs/2105.02732

Lwowski, B., & Rios, A. (2021). The risk of racial bias while tracking influenza-related content on social media using machine learning. Journal of the American Medical Informatics Association, 28(4), 839–849.

Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan, A. (2021). A survey on bias and fairness in machine learning. ACM Computing Surveys (CSUR), 54(6), 1–35.

Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., Spitzer, E., Raji, I. D., & Gebru, T. (2019). Model Cards for Model Reporting. Proceedings of the Conference on Fairness, Accountability, and Transparency, 220–229. https://doi.org/10.1145/3287560.3287596

Naous, T., Ryan, M. J., Ritter, A., & Xu, W. (2024). Having Beer after Prayer? Measuring Cultural Bias in Large Language Models. In L.-W. Ku, A. Martins, & V. Srikumar (Eds.), Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 16366–16393). Association for Computational Linguistics. https://doi.org/10.18653/v1/2024.acl-long.862

Navigli, R., Conia, S., & Ross, B. (2023). Biases in large language models: origins, inventory, and discussion. ACM Journal of Data and Information Quality, 15(2), 1–21.

Orife, I., Kreutzer, J., Sibanda, B., Whitenack, D., Siminyu, K., Martinus, L., Ali, J. T., Abbott, J., Marivate, V., Kabongo, S., & others. (2020). Masakhane–machine translation for Africa. arXiv Preprint arXiv:2003.11529.

Röttger, P., Hofmann, V., Pyatkin, V., Hinck, M., Kirk, H., Schütze, H., & Hovy, D. (2024). Political compass or spinning arrow? Towards more meaningful evaluations for values and opinions in large language models. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 15295–15311.

Ryan, M. J., Shaikh, O., Bhagirath, A., Frees, D., Held, W. B., & Yang, D. (2025). SynthesizeMe! Inducing persona-guided prompts for personalized reward models in LLMs. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 8045–8078.

Salemi, A., Mysore, S., Bendersky, M., & Zamani, H. (2024). LaMP: When Large Language Models Meet Personalization. In L.-W. Ku, A. Martins, & V. Srikumar (Eds.), Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 7370–7392). Association for Computational Linguistics. https://doi.org/10.18653/v1/2024.acl-long.399

Sap, M., Card, D., Gabriel, S., Choi, Y., & Smith, N. A. (2019). The risk of racial bias in hate speech detection. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 1668–1678.

Schaller, N.-J., Ding, Y., Horbach, A., Meyer, J., & Jansen, T. (2024). Fairness in automated essay scoring: A comparative analysis of algorithms on German learner essays from secondary education. Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024), 210–221.

Scheuerman, M. K., Hanna, A., & Denton, R. (2021). Do datasets have politics? Disciplinary values in computer vision dataset development. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW2), 1–37.

Sen, I., Lutz, M., Rogers, E., Garcia, D., & Strohmaier, M. (2025). Missing the margins: A systematic literature review on the demographic representativeness of LLMs. Findings of the Association for Computational Linguistics: ACL 2025, 24263–24289.

Shah, D. S., Schwartz, H. A., & Hovy, D. (2020). Predictive biases in natural language processing models: A conceptual framework and overview. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 5248–5264.

Shao, Y., Zope, H., Jiang, Y., Pei, J., Nguyen, D., Brynjolfsson, E., & Yang, D. (2025). Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the US Workforce. arXiv Preprint arXiv:2506.06576.

Shelby, R., Rismani, S., Henne, K., Moon, A., Rostamzadeh, N., Nicholas, P., Yilla-Akbari, N., Gallegos, J., Smart, A., Garcia, E., & others. (2023). Sociotechnical harms of algorithmic systems: Scoping a taxonomy for harm reduction. Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, 723–741.

Tamkin, A., Askell, A., Lovitt, L., Durmus, E., Joseph, N., Kravec, S., Nguyen, K., Kaplan, J., & Ganguli, D. (2023). Evaluating and mitigating discrimination in language model decisions. arXiv Preprint arXiv:2312.03689.

Vaughn, L. M., & Jacquez, F. (2020). Participatory research methods–choice points in the research process. Journal of Participatory Research Methods, 1(1).

Wang, A., Morgenstern, J., & Dickerson, J. P. (2025). Large language models that replace human participants can harmfully misportray and flatten identity groups. Nature Machine Intelligence, 7(3), 400–411.

Ziems, C., Held, W., Yang, J., Dhamala, J., Gupta, R., & Yang, D. (2023). Multi-VALUE: A framework for cross-dialectal English NLP. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 744–768.
