

Published on April 10, 2024




Recently, OpenAI released GPT-4 to everyone in Copilot. Now, there is good news for coders: the LLM developer announced that it has launched GPT-4 Turbo with Vision, also called GPT-4V, in the API.

OpenAI's announcement on X reads: "GPT-4 Turbo with Vision is now generally available in the API. Vision requests can now also use JSON mode and function calling. Below are some great ways developers are building with vision. Drop yours in a reply" (https://t.co/cbvJjij3uL).

According to the documentation, GPT-4V comes with JSON mode and function calling, which will help coders with visual data processing. It also features a 128,000-token context window, just like GPT-4 Turbo.
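To illustrate, here is a minimal sketch of what enabling JSON mode on a vision request could look like. The field names follow OpenAI's chat completions API; the model name and image URL are placeholders, not values from the article:

```python
# Sketch of a chat-completions payload that combines a vision input
# with JSON mode, so the model's reply is constrained to valid JSON.
payload = {
    "model": "gpt-4-turbo",
    "response_format": {"type": "json_object"},  # enables JSON mode
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "List the objects in this image as a JSON array."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
}
print(payload["response_format"]["type"])  # json_object
```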

Basically, GPT-4V can process images either from a link or by passing a base64-encoded image directly in the request. OpenAI also provides a Python code example of usage in the documentation.
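As a rough sketch of that pattern, a single helper can build the mixed text-and-image message regardless of whether the image arrives as a web link or as base64 data (the prompt and URL below are placeholders):

```python
def build_vision_message(prompt: str, image: str) -> dict:
    """Build a user message mixing text and an image.

    `image` may be an https:// link or a data: URL carrying a
    base64-encoded image -- both forms go in the same field.
    """
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image}},
        ],
    }

message = build_vision_message(
    "What is in this image?", "https://example.com/desk.jpg"
)
print(message["content"][1]["image_url"]["url"])
```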

OpenAI points out that GPT-4V has some important limitations. In its example, the model is asked about the contents of an image. The LLM understands the relationships between objects in the image, but it can't reliably describe their exact locations. You can ask, for instance, what color a pencil on the table is, but if you ask it to pinpoint where the chair or the table is, you'll probably get no answer or a wrong one.

The GPT-4V guide also explains how to upload base64-encoded images or multiple images at once. Another interesting mention is that the model is not suited for interpreting specialized medical images such as CT scans. If there is a warning about that particular usage, chances are somebody already tried using it for diagnosis.

You will also find a guide on how to calculate token costs. For instance, a 1024 x 1024 image processed in high-detail mode costs 765 tokens.
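That 765-token figure follows from the tiling formula described in OpenAI's vision guide: the image is scaled down to fit within 2048 x 2048, then scaled so its shortest side is at most 768 px, and each resulting 512 px tile costs 170 tokens plus a flat 85. A sketch of the calculation:

```python
import math


def high_detail_token_cost(width: int, height: int) -> int:
    """Estimate token cost of a high-detail image per the vision guide."""
    # First, scale down to fit within a 2048 x 2048 square (never upscale).
    scale = min(1.0, 2048 / max(width, height))
    width, height = width * scale, height * scale
    # Then scale so the shortest side is at most 768 px.
    scale = min(1.0, 768 / min(width, height))
    width, height = width * scale, height * scale
    # Count 512 px tiles: 170 tokens each, plus a flat 85 tokens.
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return tiles * 170 + 85


print(high_detail_token_cost(1024, 1024))  # 765
```

For the 1024 x 1024 case, the image scales to 768 x 768, which covers four 512 px tiles: 4 x 170 + 85 = 765 tokens, matching the figure above.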

What do you think about the new GPT-4 Turbo with Vision? If you’ve tried it, tell us about your experience in the comments section below.

More about the topics: ChatGPT, OpenAI

Claudiu Andone

Windows Troubleshooting Expert

Oldtimer in the tech and science press, Claudiu is focused on whatever comes new from Microsoft.

His interest in computers started abruptly when he saw his first home computer as a kid. However, his passion for Windows and everything related to it became obvious when he became a sysadmin at a computer science high school.

With 14 years of experience in writing about everything there is to know about science and technology, Claudiu also likes rock music, chilling in the garden, and Star Wars. May the force be with you, always!

