Our goal is to create a custom model by fine-tuning LLaMA 3 using Unsloth, a free and faster approach, especially on Colab. Once fine-tuned, we’ll run the model locally using Ollama, right from your computer.
Fine-tuning means tweaking the parameters of an LLM on a specific dataset to make it perform better on a certain task or domain. It’s like giving the model a targeted upgrade.
Unsloth makes fine-tuning easier and faster. You’ll want to check it out here.
Ollama is your go-to tool for running LLMs locally. No cloud, no fuss — just quick and simple execution on your own machine.
Like in any ML problem, data collection comes first. You can grab a ready-made dataset from Hugging Face for ease, but I’m going the less easy route: we’ll start with a JSON file and format it into the Alpaca format.
Steps:
1. Clean the JSON file: Only keep the necessary features. Here’s an example of my cleaned data:
{'input': 'What is the number of significant figures in $0.0310 \\times 10^3$ ?',
 'output': [{'question': 'The number of significant figures in $11.118 \\times 10^{-6} \\mathrm{~V}$ is',
             'options': ['a) 3', 'b) 4', 'c) 5', 'd) 6'],
             'correct_answer': 'c) 5'},
            ...]}
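The cleaning itself depends on what your raw JSON looks like, so here is only a minimal sketch; the file name raw_questions.json and its field names are assumptions, but it produces the final_output list used in the next step.
import json

# Hypothetical raw file; swap in the path and field names of your own JSON data.
with open("raw_questions.json", "r", encoding="utf-8") as f:
    raw_records = json.load(f)

# Keep only the features we need: the seed question ('input') and
# the list of generated questions with options and answers ('output').
final_output = [
    {"input": rec["input"], "output": rec["output"]}
    for rec in raw_records
]
print(len(final_output), "cleaned records")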
2. Convert the list of dictionaries to a Dataset:
from datasets import Dataset
dataset = Dataset.from_list(final_output)
print(dataset)
3. Save to disk:
dataset.save_to_disk("my_dataset_directory")
4. Zip the dataset and upload to Colab:
import os
print(os.listdir()) # Outputs: ['.config', 'my_dataset.zip', 'sample_data']
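If you’d rather create the zip in code instead of zipping by hand, shutil.make_archive (run locally, before uploading) is enough; this is just a convenience sketch, and keeping my_dataset_directory as the archive’s top-level folder reproduces the nested layout handled in the extraction and rename steps below.
import shutil

# Run locally: creates my_dataset.zip with my_dataset_directory/ as its top-level folder.
shutil.make_archive("my_dataset", "zip", root_dir=".", base_dir="my_dataset_directory")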
5. Extract the zip file:
import zipfile
with zipfile.ZipFile("my_dataset.zip", 'r') as zip_ref:
zip_ref.extractall("my_dataset_directory")
print(os.listdir("my_dataset_directory"))
6. Move the files to the correct paths (the zip extracts into a nested folder):
!mv "/content/my_dataset_directory/my_dataset_directory/data-00000-of-00001.arrow" "/content/my_dataset_directory/data-00000-of-00001.arrow"
!mv "/content/my_dataset_directory/my_dataset_directory/dataset_info.json" "/content/my_dataset_directory/dataset_info.json"
!mv "/content/my_dataset_directory/my_dataset_directory/state.json" "/content/my_dataset_directory/state.json"
7. Load the dataset:
from datasets import load_from_disk
dataset_path = "/content/my_dataset_directory"
dataset = load_from_disk(dataset_path)
print(dataset)
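Before formatting, it’s worth a quick sanity check that the columns the formatting function expects are actually there:
# The formatting step below expects 'input' and 'output' columns.
print(dataset.column_names)
print(dataset[0]["input"])  # peek at one example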
8. Format the data for Alpaca:
alpaca_format_prompt = """
### Instruction:
Generate 7 similar questions with their respective options and correct answers based on the given input question for an entrance exam.
### Input:
input:{}
### Response:
output:{}
"""
EOS_TOKEN = tokenizer.eos_token  # appended so the model learns where each example ends

def formatting_prompts_func(examples):
    texts = []
    inputs = examples['input']
    outputs = examples['output']
    for input, output in zip(inputs, outputs):
        # Fill the Alpaca template with the seed question and its generated questions.
        text = alpaca_format_prompt.format(input, output) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}

dataset = dataset.map(formatting_prompts_func, batched=True)
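It’s worth printing one formatted example to confirm the template, the input/output text, and the EOS token all landed where you expect:
# Inspect the first formatted training example.
print(dataset[0]["text"])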
Note: Full code will be provided later.
1. Select GPU:
Go to Runtime > Change runtime type and choose the T4 GPU; it’s free and works perfectly for this.
2. Install Unsloth:
Run the following commands to install and update Unsloth:
%%capture
!pip install unsloth
3. Get the latest nightly version of Unsloth:
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
This sets up Unsloth for faster and more efficient fine-tuning.
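Before loading the model, a quick sanity check that Colab actually attached the T4 (and that PyTorch can see it) can save a confusing error later; this is just an optional check.
import torch

# Should print True and the GPU name (a Tesla T4 on the free Colab tier).
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))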
For model selection, we’ll use Unsloth’s FastLanguageModel. It supports 4-bit quantization, reducing memory usage and ensuring faster performance without running into out-of-memory (OOM) issues. Here’s how to set it up:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Set your desired sequence length
dtype = None # Auto-detect or use Float16 for T4 GPUs, Bfloat16 for Ampere+
load_in_4bit = True # Use 4-bit quantization

fourbit_models = [
    "unsloth/llama-3-8b-bnb-4bit",
    # Other supported models...
]

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)
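With the model and tokenizer loaded, you can also confirm the EOS token that the earlier EOS_TOKEN = tokenizer.eos_token line (and the formatting function) relies on:
# The Alpaca formatting step appends this token to every training example.
print(tokenizer.eos_token)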