POST /v1/images/{image_id}/visualize
Visualize Image
curl --request POST \
  --url https://api.img-processing.com/v1/images/{image_id}/visualize \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '{
  "prompt": "What is in this image?",
  "model": "uform-gen"
}'
{
  "response": "A beautiful sunset over the mountains with a clear sky and vibrant colors."
}
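For scripted use, the request above can be wrapped in a small Bash sketch. It builds the JSON body with basic escaping, so prompts that contain quotes or newlines do not break the payload. It also leaves out model when none is given, so the default model applies. The helper names (json_escape, build_visualize_body, visualize_image) and the IMG_API_KEY environment variable are hypothetical choices for illustration, not part of the API. The escaping only covers backslash, double quote, newline, carriage return and tab. For arbitrary input, a dedicated JSON tool such as jq is more robust.

```shell
#!/usr/bin/env bash
# Minimal sketch of a client for POST /v1/images/{image_id}/visualize.
# Assumptions: the API key comes from a hypothetical IMG_API_KEY environment
# variable; all function names here are illustrative, not part of the API.

# Escape a string for use inside a JSON string literal.
# Handles backslash, double quote, newline, carriage return and tab only.
json_escape() {
  local s="$1"
  s=${s//'\'/'\\'}
  s=${s//'"'/'\"'}
  s=${s//$'\n'/'\n'}
  s=${s//$'\r'/'\r'}
  s=${s//$'\t'/'\t'}
  printf '%s' "$s"
}

# Build the request body. "model" is optional, so it is omitted when empty.
build_visualize_body() {
  local prompt="$1" model="${2:-}"
  if [[ -n "$model" ]]; then
    printf '{"prompt":"%s","model":"%s"}' "$(json_escape "$prompt")" "$(json_escape "$model")"
  else
    printf '{"prompt":"%s"}' "$(json_escape "$prompt")"
  fi
}

# Send the request and print the JSON reply to stdout.
# --fail makes HTTP errors (4xx/5xx) return a non-zero exit status.
visualize_image() {
  local image_id="$1" prompt="$2" model="${3:-}"
  if [[ -z "${IMG_API_KEY:-}" ]]; then
    echo "IMG_API_KEY is not set" >&2
    return 1
  fi
  curl --silent --show-error --fail \
    --request POST \
    --url "https://api.img-processing.com/v1/images/${image_id}/visualize" \
    --header 'Content-Type: application/json' \
    --header "x-api-key: ${IMG_API_KEY}" \
    --data "$(build_visualize_body "$prompt" "$model")"
}
```

Usage, assuming IMG_API_KEY is exported: visualize_image "<image_id>" "What is in this image?" llava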

Description

This endpoint generates a text response from the content of an image and a prompt. The prompt can be a question, a statement, or any other text about the image. The API analyzes the image and uses a pre-trained model to respond to the prompt. Three models are currently available for this endpoint:
  • UForm-Gen: a small generative vision-language model designed primarily for image captioning and visual question answering.
  • LLaVA: a large multimodal model that generates text from images and text prompts.
  • Gemini: a multimodal model with advanced capabilities for understanding images and generating text about them.
Test images may return inaccurate results because of the test watermarks applied to them. For better results, use live images. If you only want to try out this feature, contact support to temporarily upgrade your account.

Authorizations

x-api-key
string
header
required

API Key for authentication

Path Parameters

image_id
string
required

The unique identifier of the image. This identifier is used to reference the image in subsequent requests.

Body

application/json
prompt
string
required

The prompt to answer based on the content of the image. This is a natural language question or instruction that the model will respond to.

Required string length: 1 - 1000
model
enum<string>

The model to use for the visualization. Supported models are uform-gen, llava, and gemini. If not provided, the default model will be used.

Available options:
uform-gen,
llava,
gemini
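The prompt length limit and the model options can be checked on the client before a request is sent, which avoids a round trip for obviously invalid input. The following is a minimal Bash sketch; validate_visualize_params is a hypothetical helper name. It assumes the 1 - 1000 limit is counted in characters, which this reference does not state explicitly.

```shell
#!/usr/bin/env bash
# Client-side check of the documented body constraints (hypothetical helper).
# Note: ${#prompt} counts characters in the current locale (bytes in the C
# locale); whether the API counts characters or bytes is an assumption.
validate_visualize_params() {
  local prompt="$1" model="${2:-}"
  local len=${#prompt}
  if (( len < 1 || len > 1000 )); then
    echo "prompt must be 1-1000 characters long (got $len)" >&2
    return 1
  fi
  # An empty model is allowed because the field is optional.
  case "$model" in
    ""|uform-gen|llava|gemini) return 0 ;;
    *)
      echo "unsupported model '$model' (expected uform-gen, llava or gemini)" >&2
      return 1
      ;;
  esac
}
```

Calling it without a model passes, which mirrors the optional model field falling back to the default model.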

Response

The API returns the response object described below in the response body.

Response object for the visualize endpoint.

response
string
required

The response from the AI model: its answer to the prompt, based on the content of the image.
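Because the reply is a JSON object with a single response string, a script usually only needs that field. The following is a minimal sketch, assuming jq is installed; the function name is hypothetical. With -e, jq exits non-zero when the field is missing or null, so a malformed reply is not mistaken for an answer.

```shell
#!/usr/bin/env bash
# Print the "response" field of a visualize reply read from stdin.
# Assumes jq is installed: -r prints the raw string without JSON quotes,
# -e makes jq exit non-zero if the field is missing or null.
extract_visualize_response() {
  jq -e -r '.response'
}
```

For example, piping the sample reply shown at the top of this page through extract_visualize_response prints only the sentence about the sunset.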