POST /v1/images/{image_id}/visualize
curl --request POST \
  --url https://api.img-processing.com/v1/images/{image_id}/visualize \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '{
  "prompt": "What is in this image?",
  "model": "uform-gen"
}'
{
  "response": "A beautiful sunset over the mountains with a clear sky and vibrant colors."
}

Description

This endpoint returns a response based on the content of an image and a base prompt.

The prompt can be a question, statement, or any text that you want to ask about the image. The API will analyze the content of the image and generate a response based on the prompt using a pre-trained model.

Three models are currently available for this endpoint:

  • Uform-Gen: UForm-Gen is a small generative vision-language model primarily designed for Image Captioning and Visual Question Answering.
  • Llava: LLaVA is a large multimodal model that can generate text based on images and text prompts.
  • Gemini: Gemini is a multimodal model with advanced capabilities for understanding and generating text based on images.
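As a minimal sketch, the request from the curl example can be assembled in Python with only the standard library. The helper name build_visualize_request and the image ID "image_123" are hypothetical. "uform-gen" is the only model identifier confirmed by the example above, so the identifiers for the other models are not guessed here. The request is built but not sent.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.img-processing.com/v1"


def build_visualize_request(image_id, prompt, api_key, model="uform-gen"):
    """Build (but do not send) a POST request for the visualize endpoint.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # The JSON body mirrors the fields shown in the curl example.
    body = json.dumps({"prompt": prompt, "model": model}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/images/{quote(image_id, safe='')}/visualize",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,
        },
    )


# "image_123" is a placeholder ID, not a real image.
request = build_visualize_request("image_123", "What is in this image?", "<api-key>")
print(request.get_method(), request.full_url)
```

To send the request, pass it to urllib.request.urlopen(request) with a real API key and image ID.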

Test images may return inaccurate results due to the test watermarks applied to them. For better results, use live images. If you only want to test this feature, contact support to temporarily upgrade your account.


Authorizations

x-api-key
string
header
required

API Key for authentication

Path Parameters

image_id
string
required

The unique identifier of the image. This identifier is used to reference the image in subsequent requests.

Body

application/json

Response

201
application/json

The API returns a JSON object whose "response" field contains the generated text.

Response object for the visualize endpoint.
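A short sketch of reading the result, assuming the response body has the shape shown in the example at the top of this page (a JSON object with a single "response" string). The helper name extract_response_text is hypothetical, and the sample body is copied from that example rather than fetched from the API.

```python
import json


def extract_response_text(raw_body):
    """Return the generated text from a visualize response body.

    Hypothetical helper: assumes a JSON object with a string "response" field.
    """
    payload = json.loads(raw_body)
    text = payload.get("response") if isinstance(payload, dict) else None
    if not isinstance(text, str):
        raise ValueError("expected a JSON object with a string 'response' field")
    return text


# Sample body taken from the example response in this page.
sample = '{"response": "A beautiful sunset over the mountains with a clear sky and vibrant colors."}'
print(extract_response_text(sample))
```

Checking the field type before returning the text keeps an unexpected body, such as an error payload, from being silently treated as a model answer.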