GPT-4 Turbo: OpenAI Integrates Vision in ChatGPT

OpenAI, the renowned AI research and deployment company, has announced the launch of GPT-4 Turbo with Vision in its popular AI chatbot ChatGPT. This feature is also available through the Application Programming Interface (API), which means users with a paid account can use it to get responses for images. Let’s look at what GPT-4 Turbo with Vision can do.

What’s New With GPT-4 Turbo With Vision?

The new model can handle up to 128,000 tokens, which is significantly larger than the 8x increase over GPT-3.5 Turbo. Tokens are chunks of data that are fed into a model for processing. For context, OpenAI writes: “This allows us to include more information per prompt and improves performance on downstream tasks.” The company has trained this model using data up until December 2021.

GPT-4 Turbo with Vision can “infer answers about images,” according to OpenAI’s blog post. It also supports media input via URLs. Although this is their most advanced model yet, it still has limitations.

Limitations And Scope

The system may have trouble understanding or analyzing images containing non-English text; small text; scientific notation or formulas; unusual fonts or stylings; low-resolution text; text in other alphabets or writing systems besides Latin; handwritten notes; screenshots of code from programming languages not covered in the fine-tuning corpus (e.g., Cobol or Fortran); photographs of pages rather than scans/file uploads (e.g., taken by a cellphone); poorly cropped images; medical images like CT scans and X-rays; x-ray film photos without backlighting showing through bone and tissue approximately correctly enough to pass a radiologist’s visual inspection for conventional diagnosis purposes in humans or animals (“approximately” means here that training on similar input would likely improve things due to fine-tuning); graphs without some form of row-and-column structure; or tables without row-and-column markings—basically, anything that would be difficult to comprehend from a layman’s point of view.

They also note that the system is not trained to handle CAPTCHAs and cannot generate them. Valid image file types include PNG (.png), JPEG (.jpeg and .jpg), WEBP (.webp), and non-animated GIF (.gif) files. The size limit for each image is 20MB.

“After processing an uploaded image, we will automatically delete it,” OpenAI says. In addition to the ability to process images, GPT-4 Turbo can also help users with website creation from scratch; coding in Python, JavaScript, TypeScript, Ruby, PHP, Go, Rust, Shell; generating LaTeX math expressions; translating between languages; and simulating characters for video games.

Post Views: 682

What’s New With GPT-4 Turbo With Vision?

Limitations And Scope

Leave a Comment Cancel reply