ローカルで各種AIモデルを実行できる無料ソフト「llama.cpp」がマルチモーダル入力をサポートし画像の説明などが可能に – GIGAZINE

2025年5月12日

0

ローカルで各種AIモデルを実行できる無料ソフト「llama.cpp」がマルチモーダル入力をサポートし画像の説明などが可能に - GIGAZINE

AIモデルをローカルで実行できるオープンソースソフトウェア「llama.cpp」が画像の入力に対応しました。画像とテキストを同時に入力して「この画像には何が写っている？」といった質問に回答させられます。

server : vision support via libmtmd by ngxson · Pull Request #12898 · ggml-org/llama.cpp · GitHub
https://github.com/ggml-org/llama.cpp/pull/12898

llama.cpp/docs/multimodal.md at master · ggml-org/llama.cpp · GitHub
https://github.com/ggml-org/llama.cpp/blob/master/docs/multimodal.md

Trying out llama.cpp’s new vision support
https://simonwillison.net/2025/May/10/llama-cpp-vision/

llama.cppはGGUF形式に変換されたAIモデルを実行できるソフトウェアで、Llama以外にGemmaシリーズやDeepSeek-R1など多用なAIモデルをローカル環境で動かすことができます。また、Windows・macOS・Linuxに対応しているほか、MtalやCUDA、OpenCLなど数多くのバックエンドに対応しており、多様な環境でAIモデルを実行できます。

そんなllama.cppに画像入力機能を追加する変更が2025年5月10日にマージされました。llama.cppのWeb UIでポスター画像とともに「これは何？(what is this?)」と入力した例が以下。どんなポスターなのか説明する文章が生成されています。

Radeon RX 7900 XTXを搭載したPCでMistral Small 2504の4ビット量子化版を実行して画像とテキストの同時入力に成功したことも報告されています。以下の例では「イチゴがたくさん盛り付けられた皿」の写真を入力しつつ「『R』という文字を形成しているイチゴは何個？(How many strawberries are there in the letter “R”?)」という難しめの質問を投げかけています。報告者によると、正解することもあれば、間違えることもあったそうです。

ペリカンを使ったAIの性能測定に定評のあるエンジニアのサイモン・ウィリソン氏は、ノートPCでllama.cppを使ってGemma 3の4ビット量子化版を実行し、以下のペリカンの画像とともに「説明して(describe it)」というテキストを入力しました。

回答は以下のとおり。写真の特徴を詳しく説明できています。

It’s a striking photograph of a brown pelican standing in shallow water on a sandy beach. Here’s a breakdown of what I see:

Subject: The main focus is a large brown pelican. It’s standing with its wings fully extended, creating a dramatic pose.

Pose: The pelican is in a slightly awkward, almost comical stance, with its legs planted firmly in the wet sand and its wings spread wide. It looks like it might be preparing to take off or is simply enjoying the water.

Color & Texture: The pelican is a rich, dark brown with lighter markings on its wings and head. The sand is a muted beige color, and the water reflects the sunlight, creating a shimmering effect. You can see the texture of the sand and the ripples in the water.

Lighting: The lighting appears to be bright and sunny, casting shadows and highlighting the details of the pelican’s feathers.

Background: The background is blurred, suggesting a shallow depth of field, which keeps the focus on the pelican. It appears to be a beach scene.

Overall Impression: The image has a dynamic feel due to the pelican’s pose and the movement implied by the water. It’s a well-composed shot that captures a moment of wildlife in its natural habitat.

Do you want me to zoom in on a particular detail, or perhaps tell you more about pelicans?

llama.cppの詳細情報は以下のリンク先で公開されており、ソースコードやインストール手順などを確認できます。

GitHub – ggml-org/llama.cpp: LLM inference in C/C++
https://github.com/ggml-org/llama.cpp

この記事のタイトルとURLをコピーする

Source link

返事を書く返事をキャンセル

あなたのコメントを入力してください。

ここにあなたの名前を入力してください

間違ったメールアドレスを入力しました。

ここにあなたのEメールアドレスを入力してください

ローカルで各種AIモデルを実行できる無料ソフト「llama.cpp」がマルチモーダル入力をサポートし画像の説明などが可能に – GIGAZINE

いいね:

関連

音楽クリエイター育成アカデミーCK Music Academyに、ULTRASONEとqdcが協賛 – AV Watch

霞町音楽堂にて、フルート奏者・有田正広のコンサート＆オーディオイベントが5/31に開催

[Oracle Cloud] Autonomous Database ) SELECT AI（自然言語によるクエリ実行）を活用するコツ 2025年5月分 (2024/05/12) #oci

返事を書く返事をキャンセル

インモビ転職

【2025年版】事務・人事・経理など「裏方業務」に特化！オンライン転職支援【Backup Career】

【2025年版】赤ペンESとは？ES添削が“ずっと無料”の就活支援サービス！

【2025年版】在宅・通勤もOK！高収入を目指せるチャットレディ求人【CHATSTYLE】

【2025年版】“自分らしく働く”を支える在宅×ITの就労支援【manaby】

POPULAR POSTS

『テレネットシューティングコレクション Ii』が8月28日に発売。『キアイダン00』『ファイナルゾーンIi』などPcエンジンの4タイトルが収録

「学マス」のアイテム付きムック「学園アイドルマスター初星学園ステーショナリーセットBOOK」本日発売 – GAME Watch

【本日みつけたお買い得品】リード1,050MB/sのポータブルSSD 1TBが2千円引き – PC Watch

POPULAR CATEGORY

ABOUT US

FOLLOW US

ローカルで各種AIモデルを実行できる無料ソフト「llama.cpp」がマルチモーダル入力をサポートし画像の説明などが可能に – GIGAZINE

共有:

いいね:

関連

返事を書く 返事をキャンセル

インモビ転職

POPULAR POSTS

POPULAR CATEGORY

ABOUT US

FOLLOW US

返事を書く返事をキャンセル