Enhancing Model Responses with Direct Preference Optimization
Large language models (LLMs) have remarkable capabilities. Nevertheless, using them in customer-facing applications often requires tailoring their responses to align with your organization’s values and brand identity. In this post, we demonstrate how to use direct preference optimization (DPO), a technique that allows you to fine-tune an LLM with human preference data, together with Amazon SageMaker Studio and Amazon SageMaker Ground Truth to align the Meta Llama 3 8B Instruct model responses to your organization’s values.
Using SageMaker Studio and SageMaker Ground Truth for DPO
With DPO, you can fine-tune an LLM with human preference data such as ratings or rankings so that it generates outputs that align with end-user expectations. DPO is computationally efficient and helps enhance a model's helpfulness, honesty, and harmlessness, steer the LLM away from specific subjects, and mitigate biases.
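For readers interested in the underlying objective, DPO (Rafailov et al., 2023) optimizes the policy directly on preference pairs, with no separately trained reward model:

$$\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}\left[\log \sigma\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]$$

Here, $y_w$ and $y_l$ are the preferred and rejected responses for prompt $x$, $\pi_{\mathrm{ref}}$ is a frozen copy of the starting model, and $\beta$ controls how far the fine-tuned policy can drift from that reference.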
Whether you are fine-tuning a pre-trained LLM with supervised fine-tuning (SFT) or loading an existing fine-tuned model for DPO, you typically need powerful GPUs. With Amazon SageMaker, you can get started quickly and experiment rapidly by using managed Jupyter notebooks equipped with GPU instances.
Orchestrating the end-to-end data collection workflow and developing an application for annotators to rate or rank model responses for DPO fine-tuning can be time-consuming. SageMaker Ground Truth offers human-in-the-loop capabilities that help you set up workflows, manage annotators, and collect consistent, high-quality feedback.
Solution Overview
Below is an overview of the key steps involved:
- Load the Meta Llama 3 8B Instruct model into SageMaker Studio and generate responses for a curated set of common and toxic questions, as shown in the first code sketch after this list.
- Store the generated question-answer pairs in Amazon Simple Storage Service (Amazon S3).
- Create a workflow in SageMaker Ground Truth to gather human preference data for the responses.
- Have human annotators use the labeling portal to evaluate and rank the model's responses based on how well they align with the organization's values.
- Process the collected data into the format that DPOTrainer expects, as shown in the second sketch after this list.
- Fine-tune the Llama 3 model using DPO and the processed data, as shown in the third sketch.
- Test the fine-tuned model on a holdout evaluation dataset to assess its performance and verify it meets the desired standards.
- Deploy the aligned model to a SageMaker endpoint for real-time inference at scale, as shown in the final sketch.
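The following is a minimal sketch of the first two steps: generating candidate responses in the Studio notebook and uploading the pairs to Amazon S3. The question list, bucket name, and file names are hypothetical placeholders, and the chat-style pipeline call assumes a recent transformers release.

```python
# Minimal sketch: generate two candidate responses per question with
# Meta Llama 3 8B Instruct, then upload the pairs to Amazon S3.
# The questions, bucket name, and S3 key are placeholders.
import json
import boto3
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

questions = ["How do I reset my password?"]  # curated common and toxic questions

records = []
for question in questions:
    messages = [{"role": "user", "content": question}]
    # Sample twice so annotators have two candidates to rank per question
    responses = [
        generator(messages, max_new_tokens=256, do_sample=True)[0]["generated_text"][-1]["content"]
        for _ in range(2)
    ]
    records.append({"question": question, "responses": responses})

with open("qa_pairs.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

boto3.client("s3").upload_file("qa_pairs.jsonl", "<your-bucket>", "dpo/qa_pairs.jsonl")
```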
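After annotators rank the responses, the collected output needs to be converted into the prompt/chosen/rejected format that DPOTrainer consumes. The field names of the raw records below (question, responses, rank) are illustrative assumptions, because the exact schema depends on how you configure the labeling task.

```python
# Minimal sketch of converting collected preference rankings into the
# prompt/chosen/rejected format that trl's DPOTrainer expects.
import json

def to_dpo_record(item: dict) -> dict:
    # Sort the model responses by the rank annotators assigned (1 = best)
    ranked = sorted(item["responses"], key=lambda r: r["rank"])
    return {
        "prompt": item["question"],
        "chosen": ranked[0]["text"],    # highest-ranked response
        "rejected": ranked[-1]["text"], # lowest-ranked response
    }

with open("preference_data.jsonl") as src, open("dpo_train.jsonl", "w") as dst:
    for line in src:
        dst.write(json.dumps(to_dpo_record(json.loads(line))) + "\n")
```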
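For the fine-tuning step itself, the following is a minimal sketch using the Hugging Face trl library. Exact DPOTrainer arguments vary across trl releases, and the hyperparameter values are illustrative starting points rather than tuned recommendations.

```python
# Minimal sketch of DPO fine-tuning with Hugging Face trl. If no reference
# model is passed, DPOTrainer creates a frozen copy of the model internally.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Dataset with prompt/chosen/rejected columns produced in the previous step
train_dataset = load_dataset("json", data_files="dpo_train.jsonl", split="train")

training_args = DPOConfig(
    output_dir="llama3-8b-dpo",
    beta=0.1,                      # strength of the KL penalty to the reference model
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    learning_rate=5e-7,
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,           # named processing_class in newer trl releases
)
trainer.train()
trainer.save_model()
```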
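Finally, here is a sketch of deploying the resulting model artifacts to a real-time SageMaker endpoint with the SageMaker Python SDK. The S3 path, container versions, and endpoint name are placeholders to adapt to your environment.

```python
# Minimal sketch: deploy the fine-tuned model artifacts from S3 to a
# real-time SageMaker endpoint using the Hugging Face inference container.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

hf_model = HuggingFaceModel(
    model_data="s3://<your-bucket>/llama3-8b-dpo/model.tar.gz",  # placeholder path
    role=role,
    transformers_version="4.37",
    pytorch_version="2.1",
    py_version="py310",
)

predictor = hf_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
    endpoint_name="llama3-8b-dpo-aligned",  # placeholder name
)

print(predictor.predict({"inputs": "What are your customer service values?"}))
```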
Prerequisites
To run the solution described in this post, you must have an AWS account set up, along with an AWS Identity and Access Management (IAM) role that grants you the necessary permissions to create and access the solution resources. If you are new to AWS and haven’t created an account yet, refer to Create a standalone AWS account.
To use SageMaker Studio, you need to have a SageMaker domain set up with a user profile that has the necessary permissions to launch the SageMaker Studio application. If you’re new to SageMaker Studio, the Quick Studio setup is the fastest way to get started.
Set up the notebook and environment
To get started, open SageMaker Studio and create a JupyterLab space. For the instance type, choose ml.g5.48xlarge. Run the space, open JupyterLab, and clone the code from the following GitHub repository.
Let’s go through the notebook. First, install the necessary Python libraries.
…
Clean up
After you complete your tasks in the SageMaker Studio notebook, remember to stop your JupyterLab workspace to prevent incurring additional charges. You can do this by choosing Stop next to your JupyterLab space.
Conclusion
Amazon SageMaker offers tools to streamline the process of fine-tuning LLMs to align with human preferences. With SageMaker Studio, you can experiment interactively with different models, questions, and fine-tuning techniques. With SageMaker Ground Truth, you can set up workflows, manage teams, and collect consistent, high-quality human feedback.
In this post, we showed how to enhance the performance of Meta Llama 3 8B Instruct by fine-tuning it using DPO on data collected with SageMaker Ground Truth. To get started, launch SageMaker Studio and run the notebook available in the following GitHub repo. Share your thoughts in the comments section!