Supervised Fine-tuning for HCLLMs

Following pre-training, the next stage in the pipeline typically involves some form of supervised fine-tuning (SFT), in which models are trained on curated datasets to align their outputs with specific objectives or use cases. The goals of fine-tuning vary with the desired capabilities and target applications. For instance, existing models have applied SFT to step-by-step rationales and chain-of-thought reasoning examples to enhance their problem-solving and reasoning capabilities (Muennighoff et al., 2025; Olmo et al., 2025). SFT can also adapt models to more bespoke, domain-specific applications (Cheng et al., 2025; Yue et al., 2023). Our discussion here focuses on instruction tuning, the supervised process of training LLMs on instruction-response pairs so that they learn to follow diverse user instructions and generate appropriate responses. Instruction tuning has become central to creating usable, general-purpose conversational AI systems. We briefly survey current practices in the literature before examining key tensions and exploring emerging frontiers of instruction tuning for HCLLMs.
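
To make "training on instruction-response pairs" concrete, the following is a minimal sketch of one common way a single pair can be turned into a training example: the instruction is wrapped in a prompt template, and the loss is restricted to the response by masking the prompt positions. Several details are assumptions of this example rather than part of any cited method: the template string, the toy whitespace tokenizer, and the helper names (`build_vocab`, `encode`, `make_example`) are hypothetical; `-100` is used only because it is the default `ignore_index` of PyTorch's `CrossEntropyLoss`; and masking the prompt is a widespread convention, not a requirement of SFT.

```python
from dataclasses import dataclass

# PyTorch's CrossEntropyLoss ignores targets equal to -100 by default.
IGNORE_INDEX = -100

# Hypothetical prompt template; real models each define their own chat format.
TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n"


@dataclass
class SFTExample:
    input_ids: list[int]
    labels: list[int]


def build_vocab(texts: list[str]) -> dict[str, int]:
    """Toy whitespace vocabulary standing in for a real subword tokenizer."""
    vocab: dict[str, int] = {}
    for text in texts:
        for token in text.split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(text: str, vocab: dict[str, int]) -> list[int]:
    """Map whitespace-separated tokens to ids (assumes every token is in vocab)."""
    return [vocab[token] for token in text.split()]


def make_example(instruction: str, response: str, vocab: dict[str, int]) -> SFTExample:
    """Concatenate prompt and response; supervise only the response tokens."""
    prompt_ids = encode(TEMPLATE.format(instruction=instruction), vocab)
    response_ids = encode(response, vocab)
    input_ids = prompt_ids + response_ids
    # Prompt positions are masked, so the loss covers only the response.
    # Labels stay aligned with input_ids; the next-token shift is assumed to
    # happen inside the loss computation.
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return SFTExample(input_ids=input_ids, labels=labels)


instruction, response = "Translate 'bonjour' to English.", "Hello."
vocab = build_vocab([TEMPLATE.format(instruction=instruction), response])
example = make_example(instruction, response, vocab)
```

For this pair, the templated prompt splits into eight whitespace tokens and the response into one, so `example` has nine input ids, of which the first eight carry the ignore label. The model therefore conditions on the instruction but is only trained to reproduce the response.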

Current Practices in Instruction Tuning

Instruction tuning is critical to achieving the instruction adherence and generalized problem-solving capabilities that have helped popularize LLMs. By training models to follow explicit instructions and domain-specific prompts, instruction tuning improves LLMs’ ability to communicate in a human-centered, user-friendly manner. Massive pre-training on self-supervised tasks improves a model’s grasp of language conventions and semantics, but it is an assistant-like ability to respond conversationally and complete tasks that makes LLMs far more useful as human tools.

Instruction tuning has already seen many successes, improving the performance of diverse language models on tasks ranging from zero-shot reasoning to domain-specific applications. Beyond general alignment with human preferences, studies have used it to address ethical and domain-specific challenges, highlighting its versatility. For instance, Prabhumoye et al. (2023) demonstrated that simply attaching toxicity metadata to text as part of an instruction template during pretraining significantly reduces toxicity in model outputs. This result suggests that explicitly encoding desirable or undesirable text qualities in instructions may make it easier for an LLM to learn when to produce or withhold such text, and thus that instructions can guide models toward better alignment with social norms and ethical considerations.
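
The metadata idea can be illustrated with a small sketch. It assumes each training text already has a toxicity score in [0, 1] from some external classifier; the threshold, the instruction wording, and the function names are hypothetical and are not the template used by Prabhumoye et al. (2023). During training, each text is prefixed with an instruction describing its toxicity; at generation time, the desirable quality is requested explicitly.

```python
# Hypothetical threshold and instruction wording, chosen for illustration only.
TOXICITY_THRESHOLD = 0.5


def tag_training_text(text: str, toxicity_score: float) -> str:
    """Prefix a training text with an instruction describing its toxicity.

    `toxicity_score` is assumed to come from an external classifier and lie in [0, 1].
    """
    if not 0.0 <= toxicity_score <= 1.0:
        raise ValueError("toxicity_score must lie in [0, 1]")
    label = "toxic" if toxicity_score >= TOXICITY_THRESHOLD else "non-toxic"
    return f"Instruction: Write a {label} passage.\n{text}"


def build_generation_prompt(user_prompt: str) -> str:
    """At inference time, explicitly request the desirable (non-toxic) quality."""
    return f"Instruction: Write a non-toxic passage.\n{user_prompt}"


# Made-up (text, score) pairs standing in for a scored corpus.
corpus = [("Thanks for the thoughtful review!", 0.02), ("You are an idiot.", 0.91)]
tagged = [tag_training_text(text, score) for text, score in corpus]
```

In this sketch, toxic text is labeled rather than filtered out, so the model can still see both kinds of text during training while learning which instruction goes with which; whether that is preferable to filtering is an empirical question the sketch does not settle.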

Moreover, domain-specific instruction tuning for coding (Muennighoff et al., 2023), dialogue systems (Ouyang et al., 2022), financial analysis (Xie et al., 2023), and multilingual translation (Zhu et al., 2024) shows that modest sets of targeted instructions can unlock robust capabilities without sacrificing general knowledge. By tailoring instructions to the requirements of each domain, developers can elicit more reliable and appropriate outputs from LLMs. These successes underscore instruction tuning’s vital role in shaping LLMs into reliable, human-oriented assistants.

Human-Centered Challenges with Instruction Tuning

While instruction tuning can generally improve the capabilities and usability of LLMs, tensions still emerge in human-centered contexts. First, as noted above, instruction tuning shapes models to respond more conversationally, making them more usable. However, it can yield superficial improvements rather than genuine gains in reasoning. For example, prior work found that models can merely learn to mimic the surface structure of their training data without regard for factual correctness or sound reasoning, and can fail to generalize to tasks outside the training dataset (Gudibande et al., 2023; Kung & Peng, 2023). These patterns are especially concerning in user-facing interactions: because instruction-tuned outputs appear polished, confident, and authoritative, users may be more inclined to trust them, and this veneer of competence can mask underlying reasoning failures (Park et al., 2025; Rathi et al., 2025). The resulting mismatch between surface-level fluency and actual reliability can lead users to over-rely on model outputs in high-stakes contexts where factual accuracy is critical.

A second tension with instruction tuning relates to safety, which we discuss in more detail in Safety Evaluations. Because instruction-tuned models are trained to comply with the provided prompt, instruction tuning can make them more susceptible to backdoor or poisoning attacks that embed malicious behaviors in training datasets (Prabhumoye et al., 2023; Shu et al., 2023; Wan et al., 2023). Compromised models are then more likely to produce unsafe responses, such as offensive or disallowed content. Other work has shown that instruction tuning can also make models more susceptible to jailbreaking attacks, precisely because they have been trained to follow human requests (Zeng et al., 2024). In principle, model designers can add more safety data to the training dataset or align models to refuse harmful instructions. However, this practice can produce overly cautious models that refuse even benign queries (Bianchi et al., 2024), which is similarly unhelpful for users. Drawing the line between instructions a model should follow in service of helpfulness and instructions that may lead to unsafe behavior thus remains a core design tension.
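
One simple way to make this trade-off measurable is to compute refusal rates separately on responses to benign and to harmful prompts: a high rate on benign prompts signals over-caution, while a low rate on harmful prompts signals unsafe compliance. The sketch below assumes the model responses have already been collected; its keyword-based detector (`REFUSAL_MARKERS`, `is_refusal`) is a hypothetical and deliberately crude heuristic, and careful evaluations would use a validated classifier or human judgment instead.

```python
# Hypothetical refusal markers; keyword matching is a crude, illustrative heuristic.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i am unable", "i won't")


def is_refusal(response: str) -> bool:
    """Flag a response as a refusal if it contains any marker (case-insensitive)."""
    text = response.lower().replace("\u2019", "'")  # normalize curly apostrophes
    return any(marker in text for marker in REFUSAL_MARKERS)


def refusal_rate(responses: list[str]) -> float:
    """Fraction of responses flagged as refusals (0.0 for an empty list)."""
    if not responses:
        return 0.0
    return sum(is_refusal(r) for r in responses) / len(responses)


# Made-up responses a model might give to benign vs. harmful prompts.
benign_responses = [
    "Sure, here is how to kill a Python process: use the kill command with its PID.",
    "I'm sorry, I can't help with that.",
]
harmful_responses = [
    "I cannot help with that request.",
    "I won't provide that information.",
]

over_refusal = refusal_rate(benign_responses)   # higher means more over-cautious
safe_refusal = refusal_rate(harmful_responses)  # lower means more unsafe compliance
```

Here the second benign response is an over-refusal, so `over_refusal` is 0.5, while both harmful prompts are refused, so `safe_refusal` is 1.0. Reporting the two numbers together, rather than a single refusal rate, keeps both sides of the tension visible.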

Future of Instruction Tuning for HCLLMs

What are the next frontiers for instruction tuning? A recurring theme in the tensions above is the role that data plays. As mentioned in Data Provenance, ensuring dataset diversity may be crucial for avoiding biases and enhancing generalization across varied tasks. To address this, we can consider diversifying both the modalities of instruction-tuning data and how that data is sourced. In addition, new frontiers are emerging for evaluating instruction-tuned models along human-centered dimensions.

Multimodal data for instruction-tuning.

While our focus has been on textual corpora for instruction tuning, human-LLM interaction is not confined to text alone. For hands-free assistance, interacting via speech in addition to text is easier for users to manage (Udandarao et al., 2025). For visual editing (e.g., figure design, poster-making), models ought to operate in both visual and textual modalities (Pang et al., 2025; Si et al., 2025). Even early demonstrations of intelligent assistants envisioned interactions that seamlessly span multiple modalities, including speech, gesture, and images (Bolt, 1980). Advances in multimodal instruction tuning and dataset creation (Li et al., 2024) are thus critical for developing HCLLMs. Moving beyond text introduces modality-specific factors that are essential for human-centered applications. For instance, prosody in speech can help disambiguate user intent, and deictic gestures can provide spatial grounding (Brooks & Breazeal, 2006; Sasu et al., 2025). These multimodal capabilities also require new approaches to both pretraining and fine-tuning on interleaved multimodal data, enabling models to process the rich, integrated signals that characterize human communication (Cui et al., 2025; Udandarao et al., 2025).

Synthetic data for instruction-tuning.

How to obtain the requisite data for instruction tuning remains an important open question. Synthetic data generation is an active research area that presents both opportunities and challenges for HCLLMs. From a technical perspective, synthetic data generation can help scale data collection by reducing reliance on human labor while, in principle, maintaining data diversity. There are also arguable human-centered benefits: synthetic data can democratize model development by making fine-tuning more accessible to researchers and practitioners without large annotation budgets. Empirical work has demonstrated that synthetic instruction data can be particularly valuable in low-resource settings, enabling more data-efficient fine-tuning (Pengpun et al., 2024). However, synthetic data generation also poses new challenges that exacerbate the tensions discussed in Human-Centered Challenges with Instruction Tuning. Models trained on synthetic instructions may overfit to patterns specific to the generated data, and despite claims of increased diversity, synthetic datasets can paradoxically reduce the authentic variation found in human-generated instructions (Chen et al., 2024). Addressing these flaws is important if synthetic data is to help models handle the full range of real-world user needs and interaction styles.
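
One way to probe the diversity concern is to compare a simple lexical-diversity statistic across instruction sets. The sketch below uses distinct-n, the ratio of unique to total n-grams, as an illustrative proxy; this choice is an assumption of the example and not necessarily the measure used by Chen et al. (2024), and lexical overlap captures only one narrow aspect of diversity. The two toy instruction lists are made up: one templated and synthetic-looking, one more varied.

```python
def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous n-grams of a token list (empty if the list is too short)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(texts: list[str], n: int = 2) -> float:
    """Unique n-grams divided by total n-grams, pooled over a set of texts.

    Texts are lowercased and whitespace-tokenized; returns 0.0 if no n-grams exist.
    """
    pooled: list[tuple[str, ...]] = []
    for text in texts:
        pooled.extend(ngrams(text.lower().split(), n))
    if not pooled:
        return 0.0
    return len(set(pooled)) / len(pooled)


# Made-up instruction sets for illustration.
templated = [
    "Write a poem about cats.",
    "Write a poem about dogs.",
    "Write a poem about birds.",
]
varied = [
    "Can you help me fix this bug?",
    "Suggest a vegan dinner for four.",
    "Explain tides to a child.",
]

templated_score = distinct_n(templated)
varied_score = distinct_n(varied)
```

The templated set scores 0.5 for distinct-2 (6 unique bigrams out of 12, because every instruction repeats the "write a poem about" scaffold), while the varied set scores 1.0 (15 unique out of 15). A set can score well on such a surface metric and still miss whole categories of real user needs, which is why lexical statistics are at best a first check.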

Human-centered evaluation frameworks.

Emerging research suggests that instruction tuning aligns LLMs more closely with human brain activity, particularly in larger models with more extensive world knowledge (Aw et al., 2023). These findings open the door to more human-centered evaluation frameworks in which models are assessed by how closely they mirror human-like reasoning, empathy, and context awareness. This direction has implications for designing systems that better respect ethical norms, cultural sensitivities, and user well-being.

Aw, K. L., Montariol, S., AlKhamissi, B., Schrimpf, M., & Bosselut, A. (2023). Instruction-tuning Aligns LLMs to the Human Brain. ArXiv Preprint, abs/2312.00575. https://arxiv.org/abs/2312.00575

Bianchi, F., Suzgun, M., Attanasio, G., Rottger, P., Jurafsky, D., Hashimoto, T., & Zou, J. (2024). Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large Language Models that Follow Instructions. The Twelfth International Conference on Learning Representations.

Bolt, R. A. (1980). “Put-that-there”: Voice and gesture at the graphics interface. Proceedings of the 7th Annual Conference on Computer Graphics and Interactive Techniques, 262–270.

Brooks, A. G., & Breazeal, C. (2006). Working with robots and objects: Revisiting deictic reference for achieving spatial common ground. Proceedings of the 1st ACM SIGCHI/SIGART Conference on Human-Robot Interaction, 297–304.

Chen, J., Zhang, Y., Wang, B., Zhao, W. X., Wen, J.-R., & Chen, W. (2024). Unveiling the Flaws: Exploring Imperfections in Synthetic Data and Mitigation Strategies for Large Language Models. Findings of the Association for Computational Linguistics: EMNLP 2024, 14855–14865.

Cheng, T., Wang, Y., He, W., Wang, Q., Cheng, Y., Zhang, Y., Feng, R., Zhang, X., & others. (2025). FineMedLM-o1: Enhancing Medical Knowledge Reasoning Ability of LLM from Supervised Fine-Tuning to Test-Time Training. Second Conference on Language Modeling.

Cui, Y., Chen, H., Deng, H., Huang, X., Li, X., Liu, J., Liu, Y., Luo, Z., Wang, J., Wang, W., & others. (2025). Emu3.5: Native multimodal models are world learners. arXiv Preprint arXiv:2510.26583.

Gudibande, A., Wallace, E., Snell, C., Geng, X., Liu, H., Abbeel, P., Levine, S., & Song, D. (2023). The False Promise of Imitating Proprietary LLMs. ArXiv Preprint, abs/2305.15717. https://arxiv.org/abs/2305.15717

Kung, P.-N., & Peng, N. (2023). Do Models Really Learn to Follow Instructions? An Empirical Study of Instruction Tuning. In A. Rogers, J. Boyd-Graber, & N. Okazaki (Eds.), Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 1317–1328). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.acl-short.113

Li, S., Lin, R., & Pei, S. (2024). Multi-modal Preference Alignment Remedies Degradation of Visual Instruction Tuning on Language Models. In L.-W. Ku, A. Martins, & V. Srikumar (Eds.), Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 14188–14200). Association for Computational Linguistics. https://doi.org/10.18653/v1/2024.acl-long.765

Muennighoff, N., Liu, Q., Zebaze, A., Zheng, Q., Hui, B., Zhuo, T. Y., Singh, S., Tang, X., von Werra, L., & Longpre, S. (2023). OctoPack: Instruction Tuning Code Large Language Models. ArXiv Preprint, abs/2308.07124. https://arxiv.org/abs/2308.07124

Muennighoff, N., Yang, Z., Shi, W., Li, X. L., Fei-Fei, L., Hajishirzi, H., Zettlemoyer, L., Liang, P., Candès, E., & Hashimoto, T. B. (2025). s1: Simple test-time scaling. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 20286–20332.

Olmo, T., Ettinger, A., Bertsch, A., Kuehl, B., Graham, D., Heineman, D., Groeneveld, D., Brahman, F., Timbers, F., Ivison, H., Morrison, J., Poznanski, J., Lo, K., Soldaini, L., Jordan, M., Chen, M., Noukhovitch, M., Lambert, N., Walsh, P., … Hajishirzi, H. (2025). Olmo 3. arXiv Preprint arXiv:2512.13961. https://doi.org/10.48550/arXiv.2512.13961

Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P. F., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, & A. Oh (Eds.), Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022. http://papers.nips.cc/paper_files/paper/2022/hash/b1efde53be364a73914f58805a001731-Abstract-Conference.html

Pang, W., Lin, K. Q., Jian, X., He, X., & Torr, P. (2025). Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers. arXiv Preprint arXiv:2505.21497.

Park, E., Deng, W. H., Varadarajan, V., Yan, M., Kim, G., Sap, M., & Eslami, M. (2025). Critical or Compliant? The Double-Edged Sword of Reasoning in Chain-of-Thought Explanations. arXiv Preprint arXiv:2511.12001.

Pengpun, P., Udomcharoenchaikit, C., Buaphet, W., & Limkonchotiwat, P. (2024). Seed-Free Synthetic Data Generation Framework for Instruction-Tuning LLMs: A Case Study in Thai. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), 445–464.

Prabhumoye, S., Patwary, M., Shoeybi, M., & Catanzaro, B. (2023). Adding Instructions during Pretraining: Effective way of Controlling Toxicity in Language Models. In A. Vlachos & I. Augenstein (Eds.), Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (pp. 2636–2651). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.eacl-main.193

Rathi, N., Jurafsky, D., & Zhou, K. (2025). Humans overrely on overconfident language models, across languages. Second Conference on Language Modeling.

Sasu, D., Yamoah, K. A., Quartey, B., & Schluter, N. (2025). Enhancing Speech Instruction Understanding and Disambiguation in Robotics via Speech Prosody. arXiv Preprint arXiv:2506.02057.

Shu, M., Wang, J., Zhu, C., Geiping, J., Xiao, C., & Goldstein, T. (2023). On the exploitability of instruction tuning. Advances in Neural Information Processing Systems, 36, 61836–61856.

Si, C., Zhang, Y., Li, R., Yang, Z., Liu, R., & Yang, D. (2025). Design2Code: Benchmarking multimodal code generation for automated front-end engineering. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 3956–3974.

Udandarao, V., Lu, Z., Chang, X., Wang, Y., Yao, V. Z., Jose, A. M., Faghri, F., Gardner, J., & Chiu, C.-C. (2025). Data-Centric Lessons To Improve Speech-Language Pretraining. arXiv Preprint arXiv:2510.20860.

Wan, A., Wallace, E., Shen, S., & Klein, D. (2023). Poisoning Language Models During Instruction Tuning. In A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, & J. Scarlett (Eds.), International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA (Vol. 202, pp. 35413–35425). PMLR. https://proceedings.mlr.press/v202/wan23b.html

Xie, Q., Han, W., Zhang, X., Lai, Y., Peng, M., Lopez-Lira, A., & Huang, J. (2023). PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance. ArXiv Preprint, abs/2306.05443. https://arxiv.org/abs/2306.05443

Yue, S., Chen, W., Wang, S., Li, B., Shen, C., Liu, S., Zhou, Y., Xiao, Y., Yun, S., Huang, X., & others. (2023). DISC-LawLLM: Fine-tuning large language models for intelligent legal services. arXiv Preprint arXiv:2309.11325.

Zeng, Y., Lin, H., Zhang, J., Yang, D., Jia, R., & Shi, W. (2024). How Johnny can persuade LLMs to jailbreak them: Rethinking persuasion to challenge AI safety by humanizing LLMs. ArXiv Preprint, abs/2401.06373. https://arxiv.org/abs/2401.06373

Zhu, D., Haddow, B., Chen, P., Shen, X., Zhang, M., & Klakow, D. (2024). Fine-Tuning Large Language Models to Translate: Will a Touch of Noisy Data in Misaligned Languages Suffice? ArXiv Preprint, abs/2404.14122. https://arxiv.org/abs/2404.14122
