GPT-4 Turbo with Vision is available, allowing JSON mode and function calling
Coders will be happy to analyze images inside GPT-4 Turbo
Published on April 10, 2024
Recently, OpenAI released GPT-4 to everyone in Copilot. Now, we have great news for coders: the LLM developer announced that it has launched GPT-4 Turbo with Vision, or GPT-4V, in the API.
GPT-4 Turbo with Vision is now generally available in the API. Vision requests can now also use JSON mode and function calling. https://t.co/cbvJjij3uL Below are some great ways developers are building with vision. Drop yours in a reply.
According to the documentation, GPT-4V comes with JSON mode and function calling, which will help coders with visual data processing. It also features a 128,000-token context window, just like GPT-4 Turbo.
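Function calling, for example, now works on image inputs too. Below is a minimal sketch using the openai Python SDK (v1.x); the tool name log_detected_objects and the image URL are illustrative placeholders, not taken from OpenAI's docs:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A hypothetical tool the model can choose to call after inspecting the image.
tools = [{
    "type": "function",
    "function": {
        "name": "log_detected_objects",
        "description": "Record the objects detected in an image",
        "parameters": {
            "type": "object",
            "properties": {
                "objects": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["objects"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4-turbo",  # the vision-capable GPT-4 Turbo model
    tools=tools,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Which objects do you see in this image?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/desk.jpg"}},
        ],
    }],
)
print(response.choices[0].message.tool_calls)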
Basically, GPT-4V can process images from a link or by passing the base64-encoded image directly in the request. OpenAI also provided a Python code example of usage:
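The original snippet is not reproduced here, but a request along the lines of OpenAI's vision guide looks like this (a minimal sketch; the image URL is illustrative):

from openai import OpenAI

client = OpenAI()

# Pass the image by URL; "detail" can be "low", "high", or "auto".
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/photo.jpg",
                           "detail": "high"}},
        ],
    }],
    max_tokens=300,
)
print(response.choices[0].message.content)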
OpenAI points out that there are some important limitations to GPT-4V. In this example, the model is asked about the contents of a certain image. The LLM understands the relationships between objects in the image but can't provide further information about their location. For instance, you can ask it what color a pencil on the table is, but you'll probably get no answer, or a wrong one, if you ask it to find the chair or the table.
The GPT-4V guide also explains how to upload base64-encoded images or multiple images. Another interesting mention is that the model is not suited to processing medical images such as CT scans. If there is a warning about that particular usage, someone probably tried using it for diagnosis.
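For local files, the guide's approach is to base64-encode the image and send it as a data URL. A minimal sketch, assuming a JPEG file photo.jpg on disk:

import base64
from openai import OpenAI

client = OpenAI()

# Read a local image and base64-encode it for the request.
with open("photo.jpg", "rb") as f:
    b64_image = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64_image}"}},
        ],
    }],
    max_tokens=300,
)
print(response.choices[0].message.content)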
You will also find a guide on how to calculate token costs. For instance, a 1024 x 1024 square image in high-detail mode costs 765 tokens.
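The arithmetic behind that figure: in high-detail mode the image is first scaled to fit within 2048 x 2048, then its shortest side is scaled to 768, and the result is billed at 170 tokens per 512-pixel tile plus a flat 85 tokens. A sketch of that calculation (my reading of the pricing guide, not official code):

import math

def vision_token_cost(width: int, height: int, detail: str = "high") -> int:
    # Low-detail images cost a flat 85 tokens regardless of size.
    if detail == "low":
        return 85
    # High detail: fit within 2048 x 2048, then scale the shortest side to 768.
    scale = min(1.0, 2048 / max(width, height))
    width, height = width * scale, height * scale
    scale = 768 / min(width, height)
    width, height = width * scale, height * scale
    # 170 tokens per 512-pixel tile, plus a flat 85 tokens.
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 170 * tiles + 85

print(vision_token_cost(1024, 1024))  # 765: 768 x 768 after scaling = 4 tiles

For the 1024 x 1024 example, the image scales down to 768 x 768, which spans 4 tiles: 4 x 170 + 85 = 765 tokens.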
What do you think about the new GPT-4 Turbo with Vision? If you’ve tried it, tell us about your experience in the comments section below.
More about the topics: ChatGPT, OpenAI
Claudiu Andone
Windows Troubleshooting Expert
An old-timer in the tech and science press, Claudiu is focused on whatever comes new from Microsoft.
His interest in computers started abruptly when he saw his first home computer as a kid. However, his passion for Windows and everything related to it became obvious when he became a sysadmin at a computer science high school.
With 14 years of experience writing about everything there is to know about science and technology, Claudiu also likes rock music, chilling in the garden, and Star Wars. May the Force be with you, always!