Getting Started with Fine-Tuning gpt-oss Using Unsloth

August 10, 2025

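Introduction

On August 5, 2025, OpenAI released gpt-oss-120b and gpt-oss-20b, two open-weight reasoning models. As far as LLMs go, this is the first time OpenAI has taken an open stance since it released GPT-2 in 2019.

For these two models (hereafter, gpt-oss), Unsloth, a library that claims to train 2x faster than other libraries while saving 70-80% of GPU VRAM, has published a detailed guide in its official documentation on how to run and fine-tune gpt-oss. This article is a loose translation and summary of that guide.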

https://unsloth.ai/

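About Unsloth

Unsloth is a library that makes LLM training faster than before while using 70-80% less VRAM. It does this with its own CUDA kernels written in OpenAI's Triton and with gradient computations that are more efficient than PyTorch's automatic differentiation.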

https://github.com/unslothai/unsloth

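It supports training many LLMs, including gpt-oss, Gemma, Qwen, Phi, and Llama, and as of August 2025 it speeds up conventional training with the performance figures below.

(The above is quoted from the GitHub README.md, partially translated.)

Beyond fast training and low VRAM usage, Unsloth's strengths include its confident claim of zero accuracy loss, since its optimizations use no approximations at all, and the fact that it publishes a variety of dynamically quantized and GGUF models on Hugging Face.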

https://huggingface.co/unsloth

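Also, when you download an LLM through Unsloth, the download is sped up by some mechanism I do not fully understand. (The message "Unsloth: Fast downloading is enabled" appears, so it is probably faster.)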

Installation is done with pip. The project README gives the command as:

pip install unsloth

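As an aside, the official Unsloth documentation includes an LLM fine-tuning guide, a guide on which model to use, and a guide to LoRA hyperparameters. They are neatly organized and I personally find them very useful, so I recommend giving them a read.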

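Tips for gpt-oss Inference

Unsloth fixes two problems. First, the Jinja template used for inference did not match OpenAI's Harmony, the new parsing/tokenization library for gpt-oss. Second, because the model was trained in BF16 precision, inference did not work well in FP16-only environments such as the Tesla T4.

To reach an inference speed of 6 tokens/sec with Unsloth, gpt-oss-20B needs at least 14GB of memory with Unsloth's dynamic 4-bit quantized model, and gpt-oss-120B needs at least 66GB of memory with Unsloth's 1-bit quantized model.

For inference, it is a good idea to follow OpenAI's recommended settings below.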

temperature = 1.0
top_k = 0 (100 may also give good results, so it is worth experimenting)
top_p = 1.0
Recommended minimum context length: 16,384
Maximum context length: 131,072
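
To make these sampling values concrete, here is a minimal pure-Python sketch. It is not the code any inference engine uses, and the names softmax and truncate are made up for illustration. It assumes the common convention (used, for example, by Hugging Face transformers) that top_k = 0 switches top-k filtering off, so that together with top_p = 1.0 and temperature = 1.0 the model samples from its unmodified distribution.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def truncate(probs, top_k=0, top_p=1.0):
    """Zero out tokens removed by top-k / top-p, then renormalise.

    Assumption: top_k == 0 means "no top-k filtering" and
    top_p == 1.0 means "no nucleus filtering".
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    kept = set()
    cumulative = 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if top_p < 1.0 and cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]

# Hypothetical logits for a 4-token vocabulary.
probs = softmax([2.0, 1.0, 0.5, -1.0], temperature=1.0)
print("recommended settings keep every token:", truncate(probs, top_k=0, top_p=1.0))
print("top_k = 2 keeps only two tokens:      ", truncate(probs, top_k=2))
```

With the recommended values nothing is cut from the distribution, which is why lowering top_k (for example to 100) is an experiment in trading diversity for stability rather than a required setting.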

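gpt-oss also has a reasoning_effort parameter for selecting how much reasoning the model does. Setting it to low improves response speed but may lower answer quality, while setting it to high can have the opposite effect.

The chat template looks like this.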

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-05

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|><|start|>user<|message|>Hello<|end|><|start|>assistant<|channel|>final<|message|>Hi there!<|end|><|start|>user<|message|>What is 1+1?<|end|><|start|>assistant

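(The \n sequences above are shown as actual line breaks.)

Looking at the chat template, there is an unfamiliar notation: Valid channels: analysis, commentary, final. There are three channels, analysis, commentary, and final. Each has its own role, and they appear in notation such as <|channel|>analysis.

Here, analysis holds text that is not meant to be sent to the user, such as chain-of-thought, while final contains the text that is actually shown to the user.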
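
The channel layout can be sketched in a few lines of Python. This is a simplified illustration, not the real template: the helper names render_harmony and split_channels are hypothetical, and the special tokens <|start|>, <|message|>, <|end|>, <|channel|>, and <|return|> are assumed from OpenAI's Harmony format. In real code, tokenizer.apply_chat_template should do the rendering.

```python
import re

def render_harmony(messages, add_generation_prompt=True):
    """Render messages in a simplified Harmony-like layout (illustrative only)."""
    parts = []
    for message in messages:
        header = message["role"]
        if message.get("channel"):
            header += "<|channel|>" + message["channel"]
        parts.append("<|start|>" + header + "<|message|>" + message["content"] + "<|end|>")
    if add_generation_prompt:
        parts.append("<|start|>assistant")  # the model continues from here
    return "".join(parts)

def split_channels(completion):
    """Map channel name -> text for a raw assistant completion.

    If a channel occurs more than once, the last occurrence wins.
    """
    pattern = r"<\|channel\|>(\w+)<\|message\|>(.*?)(?=<\|end\|>|<\|return\|>|<\|start\|>|$)"
    return {channel: text for channel, text in re.findall(pattern, completion, flags=re.DOTALL)}

prompt = render_harmony([
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "channel": "final", "content": "Hi there!"},
    {"role": "user", "content": "What is 1+1?"},
])
print(prompt)

# A made-up completion with hidden reasoning and a user-visible answer.
raw = "<|channel|>analysis<|message|>1+1=2<|end|><|start|>assistant<|channel|>final<|message|>2<|return|>"
print(split_channels(raw)["final"])
```

An application would typically show only the final channel to the user and keep analysis for logging or debugging.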

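Fine-Tuning gpt-oss (20B) with Unsloth

From here, we fine-tune gpt-oss-20b with 16-bit LoRA using Unsloth (note that the loading code below sets load_in_4bit = True for the base model). According to Unsloth, it makes gpt-oss fine-tuning 1.5x faster while cutting VRAM usage by 70% and supporting 10x longer context lengths. For example, LoRA training of gpt-oss-20b is possible with 14GB of VRAM, and training gpt-oss-120b is possible with 65GB of VRAM.

For reference, other libraries reportedly need at least 65GB of VRAM to train the 20B model, so being able to train with 14GB through Unsloth is quite efficient.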
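
As a rough sanity check on these figures, the following back-of-the-envelope sketch estimates the memory taken by the weights alone. It assumes nominal parameter counts of 20 billion and 120 billion taken from the model names, and it ignores activations, LoRA adapters, optimizer state, and quantization overhead, so real usage will be higher.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory needed just to store the weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# Nominal sizes assumed from the model names, not exact parameter counts.
models = [("gpt-oss-20b", 20e9), ("gpt-oss-120b", 120e9)]

for name, params in models:
    for bits in (16, 4):
        print(f"{name} @ {bits}-bit weights: ~{weight_memory_gb(params, bits):.0f} GB")
```

Under these assumptions, 4-bit weights come to roughly 10 GB and 60 GB, which is in the same range as the 14GB and 65GB quoted above once the remaining training state is added.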

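The code below is based heavily on Unsloth's official notebook. The notebook says it also runs on a T4 GPU on Kaggle or Colab, but it sometimes throws an error partway through inference, so I verified it on a Runpod RTX 5090 instead.

First, install the required libraries. Make sure to install the latest version of unsloth. I ran the following command.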

uv pip install --system -qqq \
    "torch>=2.8.0" "triton>=3.4.0" "numpy==2.1.2" \
    "unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo" \
    "unsloth[base] @ git+https://github.com/unslothai/unsloth" \
    torchvision bitsandbytes \
    git+https://github.com/huggingface/transformers \
    git+https://github.com/triton-lang/triton.git@main

Next, load the model with Unsloth. This time we use unsloth/gpt-oss-20b. Feel free to browse the gpt-oss collection published by unsloth for whichever model you prefer, such as the 120B model, Bitsandbytes 4-bit, or GGUF variants.

from unsloth import FastLanguageModel
import torch

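max_seq_length = 4096  # maximum sequence length in tokens
dtype = None           # None lets Unsloth pick the dtype automatically

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gpt-oss-20b",
    dtype = dtype,
    max_seq_length = max_seq_length,
    load_in_4bit = True,      # load the base weights 4-bit quantized to save VRAM
    full_finetuning = False,  # train LoRA adapters instead of all weights
)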

Output

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
Unsloth: We'll be using `/tmp/unsloth_compiled_cache` for temporary Unsloth patches.
Standard import failed for UnslothCPOTrainer: No module named 'UnslothCPOTrainer'. Using tempfile instead!
==((====))==  Unsloth 2025.8.4: Fast Gpt_Oss patching. Transformers: 4.56.0.dev0.
   \\   /|    NVIDIA GeForce RTX 5090. Num GPUs = 1. Max memory: 31.367 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu128. CUDA: 12.0. CUDA Toolkit: 12.8. Triton: 3.4.0
\        /    Bfloat16 = TRUE. FA [Xformers = None. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!

For LoRA training with Unsloth, the model loaded through Unsloth has to be wrapped once more with FastLanguageModel.get_peft_model.

model = FastLanguageModel.get_peft_model(
    model,
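    r = 8,  # LoRA rank
    target_modules = [  # attention and MLP projections that receive adapters
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_alpha = 16,   # LoRA scaling numerator
    lora_dropout = 0,  # no dropout on the adapter inputs
    bias = "none",     # do not train bias terms

    use_gradient_checkpointing = "unsloth",  # Unsloth's checkpointing to save memory
    random_state = 3407,
    use_rslora = False,   # plain LoRA scaling, not rank-stabilized LoRA
    loftq_config = None,  # no LoftQ initialization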
)
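
To get a feel for what r = 8 and lora_alpha = 16 imply, here is a small sketch of standard LoRA arithmetic. It is illustrative only: the 4096 x 4096 layer size is a hypothetical example rather than a gpt-oss dimension, and the helper names are made up. LoRA replaces the update to a weight matrix W with a low-rank product B·A scaled by lora_alpha / r (or lora_alpha / sqrt(r) with rank-stabilized LoRA).

```python
import math

def lora_scaling(lora_alpha, r, use_rslora=False):
    """Factor applied to the low-rank update B @ A."""
    return lora_alpha / math.sqrt(r) if use_rslora else lora_alpha / r

def lora_param_count(d_in, d_out, r):
    """Trainable parameters for one adapted layer: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

d_in = d_out = 4096  # hypothetical layer size, for illustration only
r, lora_alpha = 8, 16

full = d_in * d_out
adapter = lora_param_count(d_in, d_out, r)
print("scaling:", lora_scaling(lora_alpha, r))
print("full weight parameters:", full)
print("LoRA adapter parameters:", adapter)
print(f"trainable fraction: {adapter / full:.2%}")
```

For this hypothetical layer the adapter has well under 1% of the parameters of the full matrix, which is the main reason LoRA training fits in far less VRAM than full fine-tuning.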

Let's check inference performance before training by having the model solve the equation $x^5 + 3x^4 - 10 = 3$. For reference, WolframAlpha gives the real solutions as $x \approx -2.78, -1.82, 1.32$.
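
The reference roots can be reproduced without any external tool. The sketch below is a plain bisection search written for this article (the helper names bisect and real_roots are not from any library); it scans the interval [-5, 5], chosen here as an assumption, for sign changes of f(x) = x^5 + 3x^4 - 13 and refines each bracket.

```python
def f(x):
    # x^5 + 3x^4 - 10 = 3 rearranged to f(x) = 0
    return x**5 + 3 * x**4 - 13

def bisect(func, lo, hi, tol=1e-10):
    """Find a root in [lo, hi]; assumes func(lo) and func(hi) differ in sign."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = func(mid)
        if (f_lo < 0) == (f_mid < 0):
            lo, f_lo = mid, f_mid  # root lies in the upper half
        else:
            hi = mid               # root lies in the lower half
    return (lo + hi) / 2

def real_roots(func, start=-5.0, stop=5.0, step=0.5):
    """Scan a grid for sign changes and refine each one by bisection."""
    roots = []
    x = start
    while x < stop:
        if (func(x) < 0) != (func(x + step) < 0):
            roots.append(bisect(func, x, x + step))
        x += step
    return roots

roots = real_roots(f)
print([round(r, 2) for r in roots])  # → [-2.78, -1.82, 1.32]
print(f(-2))                         # → 3, so x = -2 is not a root
```

The same f can be used to check any candidate answer a model proposes: a value is a solution only if f evaluates to approximately zero there.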

from transformers import TextStreamer

messages = [
    {"role": "system", "content": "reasoning language: Japanese\n\nYou are a helpful assistant that can solve mathematical problems."},
    {"role": "user", "content": "Solve x^5 + 3x^4 - 10 = 3."},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_tensors = "pt",
    return_dict = True,
    reasoning_effort = "medium", 
).to(model.device)

_ = model.generate(**inputs, max_new_tokens = 1024, streamer = TextStreamer(tokenizer))

Full output (cut off partway due to the token limit)

systemYou are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-09

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.
Calls to these tools must go to the commentary channel: 'functions'.developer# Instructions

reasoning language: Japanese

You are a helpful assistant that can solve mathematical problems.userSolve x^5 + 3x^4 - 10 = 3.assistantanalysisThe user says: "Solve x^5 + 3x^4 - 10 = 3." So equation: x^5 + 3x^4 - 10 = 3. That is, find x such that x^5 + 3x^4 - 10 = 3. Let's bring RHS: x^5 + 3x^4 - 10 = 3 => x^5 + 3x^4 - 13 = 0. Solve polynomial equation: x^5 + 3x^4 - 13 = 0. Need to find roots. It's a quintic; perhaps integer root? Let's test integer values: For x=1, 1 + 3 - 13 = -9. x=2: 32 + 48 - 13 = 67. So root between 1 and 2. For x=√? Maybe factor? Let's try possible rational roots: factors of 13: ±1, ±13. Try x=1 gives -9. x= -1 gives -1 + 3 -13 = -11. x=13? huge positive. Not rational. Could be no rational root. Solve numerically: equation f(x)=x^5 + 3x^4 -13=0. For x=1: -9; x=1.5: (1.5)^5=7.59375 + 3*(1.5)^4=3*5.0625=15.1875 sum=22.78125 -13 =9.78125 positive. So root between 1 and 1.5. Try x=1.2: 1.2^5= (1.2^2=1.44,1.2^4=1.44^2=2.0736, times 1.2 => 2.48832) Actually let's compute precise: 1.2^5 = 1.2^4 *1.2. 1.2^2=1.44, 1.2^4=1.44^2=2.0736, times 1.2 = 2.48832. So x^5 ≈ 2.4883. 3x^4 = 3*(1.2)^4 = 3*2.0736 = 6.2208. Sum = 8.70912. minus 13 => -4.29088 negative. So root between 1.2 and 1.5.

Try x=1.3: 1.3^5: 1.3^2=1.69, 1.3^4=1.69^2=2.8561, times 1.3 => 3.71393. 3x^4=3*2.8561=8.5683. Sum=12.2822-13= -0.7178. So still negative.

x=1.4: 1.4^5? 1.4^2=1.96, 1.4^4=1.96^2=3.8416, times 1.4 => 5.37824. 3x^4=3*3.8416=11.5248. Sum=16.90304-13=3.90304 positive. So root between 1.3 and 1.4. Let's approximate more.

x=1.33: 1.33^2=1.7689, 1.33^4=1.7689^2=3.1332? Wait compute: 1.7689^2 = 3.1328. times 1.33 gives 4.166? Actually 3.1328*1.33 = 4.16582. Let's do more precisely: 1.33^5 = 4.1658. 1.33^4=3.1328. 3x^4 = 9.3984. Sum=4.1658+9.3984=13.5642. minus 13 => 0.5642 positive. So root between 1.3 and 1.33.

x=1.32: 1.32^2=1.7424, 1.32^4=1.7424^2=3.0367. times 1.32 => 4.0046. 3x^4=9.1101. Sum=13.1147-13=0.1147 positive. So near 1.31 perhaps negative.

x=1.31: 1.31^2=1.7161, 1.31^4= 1.7161^2=2.9475. times 1.31 =>

The result shows that the original model fails to reason in Japanese as instructed.
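
Whether the reasoning is "in Japanese" can be checked roughly by counting characters. The sketch below is a simple heuristic, not a real language detector: japanese_ratio is a hypothetical helper, and the Unicode ranges it uses also match Chinese text, since kanji share the CJK block.

```python
def is_japanese_char(ch):
    """True for hiragana, katakana, or CJK ideographs (kanji)."""
    code = ord(ch)
    return (
        0x3040 <= code <= 0x309F     # Hiragana
        or 0x30A0 <= code <= 0x30FF  # Katakana
        or 0x4E00 <= code <= 0x9FFF  # CJK Unified Ideographs
    )

def japanese_ratio(text):
    """Fraction of alphabetic characters that are Japanese script."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(is_japanese_char(ch) for ch in letters) / len(letters)

print(japanese_ratio("The user says: Solve x^5 + 3x^4 - 10 = 3."))  # English reasoning
print(japanese_ratio("まず、方程式を整理させていただきます。"))        # Japanese reasoning
```

Applied to the text of the analysis channel, a ratio near 0 indicates English reasoning like the output above, and a ratio near 1 indicates Japanese reasoning.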

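To enable reasoning in languages other than English, we use the multilingual reasoning dataset HuggingFaceH4/Multilingual-Thinking as training data. The conversations are normalized with the standardize_sharegpt() function and then rendered into the gpt-oss format with the tokenizer's chat template.

You can also fine-tune on a dataset that is not made for reasoning, but this may affect the model's reasoning ability, so it is recommended that the dataset contain at least 75% reasoning tasks, with the remaining 25% or less being non-reasoning tasks.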
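
One way to build such a 75/25 mix is sketched below in plain Python. The function mix_examples and the toy records are hypothetical; with Hugging Face datasets you would apply the same idea to dataset indices instead of lists.

```python
import random

def mix_examples(reasoning, non_reasoning, total, reasoning_fraction=0.75, seed=3407):
    """Sample `total` examples with the requested share of reasoning examples.

    Raises ValueError if either pool is too small for the requested mix.
    """
    rng = random.Random(seed)
    n_reasoning = round(total * reasoning_fraction)
    n_other = total - n_reasoning
    mixed = rng.sample(reasoning, n_reasoning) + rng.sample(non_reasoning, n_other)
    rng.shuffle(mixed)  # avoid training on one kind first
    return mixed

# Toy stand-ins for real datasets.
reasoning_pool = [{"kind": "reasoning", "id": i} for i in range(1000)]
chat_pool = [{"kind": "non_reasoning", "id": i} for i in range(1000)]

mixed = mix_examples(reasoning_pool, chat_pool, total=400)
share = sum(ex["kind"] == "reasoning" for ex in mixed) / len(mixed)
print(len(mixed), share)
```

Fixing the seed keeps the mix reproducible between runs, in the same spirit as the seed = 3407 used elsewhere in this article.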

from datasets import load_dataset
from unsloth.chat_templates import standardize_sharegpt

def formatting_prompts_func(examples):
    # Render each conversation into a single training string with the gpt-oss chat template.
    convos = examples["messages"]
    texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False) for convo in convos]
    return { "text" : texts, }

dataset = load_dataset("HuggingFaceH4/Multilingual-Thinking", split="train")
dataset = standardize_sharegpt(dataset)
dataset = dataset.map(formatting_prompts_func, batched = True,)

print(dataset[0]['text'])

A sample looks like this.

Example data

systemYou are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-09

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.
Calls to these tools must go to the commentary channel: 'functions'.developer# Instructions

reasoning language: French

You are an AI chatbot with a lively and energetic personality.userCan you show me the latest trends on Twitter right now?assistantanalysisD'accord, l'utilisateur demande les tendances Twitter les plus récentes. Tout d'abord, je dois vérifier si j'ai accès à des données en temps réel. Étant donné que je ne peux pas naviguer sur Internet ou accéder directement à l'API de Twitter, je ne peux pas fournir des tendances en direct. Cependant, je peux donner quelques conseils généraux sur la façon de les trouver.

Je devrais préciser que les tendances Twitter évoluent rapidement et sont spécifiques à chaque région. Je pourrais suggérer de consulter la section « En vogue » sur l'application ou le site web. Aussi, l'utilisation de hashtags et le suivi d'utilisateurs pertinents pourraient être utiles. Il est important de souligner que les tendances varient selon la région et l'heure de la journée. Je devrais garder un ton amical et bienveillant, peut-être ajouter un emoji pour rester léger. Je vais structurer ma réponse étape par étape pour faciliter la lecture. Je dois m'excuser de ne pas pouvoir fournir des données en temps réel et proposer d'autres méthodes. Je conserverai un langage simple et convivial, en évitant les termes techniques.assistantfinalHey there!  While I can't check Twitter (X) in real-time or access live data, I can share some tips to help you spot the latest trends:

1. **Open the "Trending" tab** on the Twitter app or website – it updates constantly!  
2. **Search for hashtags** like #Trending or #Viral to see what’s blowing up.  
3. **Follow accounts** that curate trends (e.g., @TrendingNow, @ViralThreads).  
4. **Check regional trends** – they often differ by location!  

Remember, trends are *super fast-moving* and often tied to pop culture, memes, or breaking news. For example, recent trends have included viral challenges (like the "Distracted Boyfriend" meme revival), celebrity drama, or unexpected events (hello, weather disasters!).  

Want me to brainstorm *what* might trend next? I’ve got ideas!

Now that the training data is ready, we fine-tune with SFTTrainer from Hugging Face TRL. This example trains for only 60 steps. To train on the full dataset, set max_steps=None and num_train_epochs=1.

from trl import SFTConfig, SFTTrainer

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    args = SFTConfig(
        per_device_train_batch_size = 4,
        gradient_accumulation_steps = 4,  # effective batch size per GPU: 4 * 4 = 16
        warmup_steps = 5,

        max_steps = 60,  # short demo run
        learning_rate = 2e-4,
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",  # disable external experiment logging
    ),
)

trainer.train()
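
To see what this configuration amounts to, the sketch below works out the effective batch size and the learning rate at a few steps. It assumes a single GPU (as in the log above) and the usual linear schedule with warmup, where the rate rises linearly to its peak and then decays linearly to zero. The function linear_schedule_lr is written here for illustration and is not part of TRL.

```python
def linear_schedule_lr(step, base_lr=2e-4, warmup_steps=5, max_steps=60):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (max_steps - step) / (max_steps - warmup_steps))

per_device_train_batch_size = 4
gradient_accumulation_steps = 4
num_gpus = 1  # assumption: single-GPU run
max_steps = 60

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
examples_seen = effective_batch_size * max_steps
print("effective batch size:", effective_batch_size)
print("examples seen in 60 steps:", examples_seen)

for step in (0, 1, 5, 30, 60):
    print(step, linear_schedule_lr(step))
```

So the 60-step demo run processes 960 training examples in total, with the learning rate peaking at 2e-4 after the fifth step.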

Once training has finished successfully, let's try reasoning in Japanese again.

messages = [
    {"role": "system", "content": "reasoning language: Japanese\n\nYou are a helpful assistant that can solve mathematical problems."},
    {"role": "user", "content": "Solve x^5 + 3x^4 - 10 = 3."},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_tensors = "pt",
    return_dict = True,
    reasoning_effort = "medium",
).to(model.device)

_ = model.generate(**inputs, max_new_tokens = 2048, streamer = TextStreamer(tokenizer))

Output after training (cut off partway due to the token limit)

systemYou are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-09

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.
Calls to these tools must go to the commentary channel: 'functions'.developer# Instructions

reasoning language: Japanese

You are a helpful assistant that can solve mathematical problems.userSolve x^5 + 3x^4 - 10 = 3.assistantanalysisまず、方程式を整理させていただきます。与えられた式は:

x^5 + 3x^4 - 10 = 3

左右の項を整理すると:

x^5 + 3x^4 - 10 - 3 = 0

簡略化すると:

x^5 + 3x^4 - 13 = 0

このように、x^5 + 3x^4 - 13 = 0 という多項式方程式が得られます。

この方程式を代数的に解くのは難しいため、数値的なアプローチや近似法、あるいは数値的手法（例えばニュートン法）を使って解を求めるのが適切です。代数的に1次元の高い多項式がある場合、解が簡単に見つからないことが多いため、計算機や数値解析ソフトウェアが必要です。

しかし、整数解の可能性があるかを確認することが可能です。整数解を試してみると簡略化できる可能性があります。例えばx=1、-1、2、-2などを代入します。

まずx=1を試すと:

1^5 + 3*1^4 - 13 = 1 + 3 - 13 = -9

x=1は解にならないようです。次にx=2を試します。

2^5 + 3*2^4 - 13 = 32 + 48 - 13 = 70

x=2では解にはなりません。x=3を試します。

3^5 + 3*3^4 - 13 = 243 + 243 - 13 = 473

非常に大きな数で、正解ではありません。x= -1を試みます。

(-1)^5 + 3*(-1)^4 - 13 = -1 + 3 - 13 = -11

x= -1も解にはなりません。次にx= -2を試します。

(-2)^5 + 3*(-2)^4 - 13 = -32 + 48 - 13 = 3

x= -2が答えとなります。この方程式の1つの解は -2 です。

もし他にも解があるか確認するには数値解析や他の方法が必要ですが、少なくともx= -2が1つの解です。計算機や数値解析ソフトを使ってさらに確認できます。')assistantfinal先に方程式を整理しましょう:

\[
x^5 + 3x^4 - 13 = 0 \quad \text{(右辺を左側に移した結果)}
\]

このような高次の多項式を代数的に解くのは一般的に困難です。しかし、整数解がないか簡易的に確認するために、いくつか代入していきましょう。具体的に数値を試し、実際にその値が方程式を満たすか確認します。

### 整数解の試行

1. \( x = 1 \) を代入:

   \[
   1^5 + 3 \times 1^4 - 13 = 1 + 3 - 13 = -9 \quad (\text{解ではない})
   \]

2. \( x = 2 \) を代入:

   \[
   2^5 + 3 \times 2^4 - 13 = 32 + 48 - 13 = 70 \quad (\text{解ではない})
   \]

3. \( x = 3 \) を代入:

   \[
   3^5 + 3 \times 3^4 - 13 = 243 + 243 - 13 = 473 \quad (\text{解ではない})
   \]

4. \( x = -1 \) を代入:

   \[
   (-1)^5 + 3 \times (-1)^4 - 13 = -1 + 3 - 13 = -11 \quad (\text{解ではない})
   \]

5. \( x = -2 \) を代入:

   \[
   (-2)^5 + 3 \times (-2)^4 - 13 = -32 + 48 - 13 = 3 - 13 = -10 \quad (\text{実際に解})
   \]

このように、\(x = -2\) が方程式 \(x^5 + 3x^4 - 13 = 0\) の解となります。

### さらに確認

ただし、これが唯一の解ではない可能性もあります。他の整数解を探すには、さらに試行または数値的手法（ニュートン法など）を使うと良いです。しかし、今回は \( x = -2 \) が解かつ直感的に見つかります。この方程式は数値解析やグラフ描画で他の解も確認できます。' md

## 🧪 Problem Solving Tips

| ⚡️ Tip | 📌 Example |
|--------|-----------|
| **Check for Integer Roots** | Try plugging small integers (e.g., \(-1, 0, 1, 2\)) into the equation to see if they satisfy it. It's often a quick way to spot solutions. |
| **Use Newton's Method** | Start with an initial guess (e.g., \(-2\)) and iteratively improve it using: \[x_{\text{new}} = x - \frac{f(x)}{f'(x)}\] for \(f(x) = x^5 + 3x^4 - 13\). |
| **Plot the Function** | Graphing tools (e.g., Desmos, Graphing Calculator) can help you see where the function crosses the x-axis, indicating real roots. |
| **Use Factor Theorem** | If you suspect \(x = n\) is a root, try polynomial division to factor out \((x - n)\) from the equation. |
| **Check for Extraneous Solutions** | Especially in equations with radicals or fractions, ensure each step doesn't introduce invalid solutions. |
| **Use Bounds and IVT** | Employ the Intermediate Value Theorem to bracket roots: if \(f(a) < 0\) and \(f(b) > 0\), a root exists between \(a\) and \(b\). |
| **Solve Smaller Equations** | For higher-degree equations, try transforming or simplifying the equation (e.g., factoring or substituting). |
| **Understand Limitations** | Some equations may have no algebraic solutions; numerical methods or iterative approaches are necessary. |
| **Practice Regularly** | The more you solve equations, the better you become at spotting patterns and techniques. |

>**🛠️ Tools for Verification**  
>Using a calculator or software (e.g., WolframAlpha, GeoGebra) can help confirm if a solution is correct or if there are additional roots.  

>**🎓 Tip of the Day**  
>Keep a notebook of common equations and their solutions (e.g., \(x^2 - 1 = 0\) has roots \(\pm 1\)). It can speed up problem-solving!

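(Although it switched into English mode partway through,) if you overlook the details, the model succeeded in reasoning in Japanese! There are several possible reasons the result is not fully satisfying: Japanese performance was not that high to begin with, the parameter count and the amount of training data (training steps) are small, and the base model's reasoning may have been trained only in English.

Summary

In this article, based on the Unsloth documentation and its fine-tuning implementation, I gave a brief overview of Unsloth and gpt-oss and ran fine-tuning on multilingual reasoning data. Now that so many LLMs are publicly available, I do not think the release of gpt-oss will change anything dramatically. The bigger surprise seems to be that OpenAI is doing something open, rather than the model's performance. (Although talk of a model release had been around for a while.)

If you have put gpt-oss to some interesting use, please get in touch with ちくわぶ (@prgckwb).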

https://runpod.io?ref=kyomzfh2
