Visualize Image
Answer a prompt based on the content of an image.
Description
This endpoint returns a response based on the content of an image and a base prompt.
The prompt can be a question, a statement, or any other text about the image. The API analyzes the content of the image and generates a response to the prompt using a pre-trained model.
Three models are currently available for this endpoint:
- UForm-Gen: UForm-Gen is a small generative vision-language model primarily designed for image captioning and visual question answering.
- LLaVA: LLaVA is a large multimodal model that can generate text based on images and text prompts.
- Gemini: Gemini is a multimodal model with advanced capabilities for understanding and generating text based on images.
Test images may return inaccurate results due to the test watermarks applied to them. For better results, please use live images. If you just want to test this feature, contact support to temporarily upgrade your account.
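As a rough illustration of the flow described above, the following Python sketch builds a request that sends a prompt and a model choice for a given image. This page does not state the base URL, the route, the body field names, or the authentication header format, so `BASE_URL`, the `/images/{image_id}/visualize` path, the `prompt` and `model` fields, and the `Bearer` scheme are all assumptions made for the example; check them against the actual endpoint definition before use.

```python
import json
import urllib.request
from urllib.parse import quote

# Hypothetical base URL: the real host is not given on this page.
BASE_URL = "https://api.example.com"


def build_visualize_request(image_id, api_key, prompt, model):
    """Build (but do not send) a POST request for the visualize endpoint.

    The route, the JSON field names, and the Authorization header format
    are assumptions for illustration, not the documented contract.
    """
    # The image identifier is a path parameter, so it is percent-encoded.
    url = f"{BASE_URL}/images/{quote(image_id, safe='')}/visualize"

    # Assumed body shape: the base prompt plus the name of the model to use.
    body = json.dumps({"prompt": prompt, "model": model}).encode("utf-8")

    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumed scheme: the page only says an API key is required.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen(request)` and the response body decoded with `json.load`. Building the request separately from sending it keeps the assumed parts (URL, fields, header) in one place where they are easy to correct.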
Authorizations
API Key for authentication
Path Parameters
The unique identifier of the image. This identifier is used to reference the image in subsequent requests.
Body
Response
The API will return the Image object in the response body.
Response object for the visualize endpoint.