Analysis Methods
Analyze images using different methods
Classify an image using a pre-trained model. Currently the only supported model is ResNet50, a 50-layer residual convolutional neural network widely used for image classification.
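A classifier such as ResNet50 typically produces one score per label, and a client usually wants only the most likely few. The sketch below assumes a hypothetical response shape (a list of objects with `label` and `score` keys). The actual field names returned by this endpoint are not specified here, so treat them as placeholders.

```python
from typing import Dict, List


def top_labels(predictions: List[Dict], k: int = 3, min_score: float = 0.0) -> List[str]:
    """Return up to k labels, highest score first.

    `predictions` is an ASSUMED response shape, for example
    [{"label": "tabby cat", "score": 0.81}, ...]; adjust the keys to
    match the real payload.
    """
    # Drop low-confidence predictions, then rank the rest by score.
    kept = [p for p in predictions if p["score"] >= min_score]
    ranked = sorted(kept, key=lambda p: p["score"], reverse=True)
    return [p["label"] for p in ranked[:k]]


# Hypothetical sample data, not real API output.
sample = [
    {"label": "tiger cat", "score": 0.12},
    {"label": "tabby cat", "score": 0.81},
    {"label": "lynx", "score": 0.04},
]
print(top_labels(sample, k=2))
```

Filtering with `min_score` before ranking keeps very unlikely labels out of the result even when `k` is larger than the number of confident predictions.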
Extract text from an image using OCR (Optical Character Recognition).
This method is not available in test mode, because the watermarks applied in test mode prevent accurate text extraction. To try this endpoint, switch to live mode or contact support to temporarily upgrade your account.
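Because OCR requests cannot succeed in test mode, a client may want to fail fast before sending one. The following is a minimal sketch of such a guard. The mode names `"test"` and `"live"` and the exception class are assumptions made for illustration, not part of the documented API.

```python
class OcrUnavailableError(RuntimeError):
    """Raised when OCR is requested while the account is in test mode."""


def ensure_ocr_allowed(mode: str) -> None:
    """Raise before an OCR request that is known to be unavailable.

    The mode strings "test" and "live" are ASSUMED names for the two
    account modes described in the text.
    """
    normalized = mode.strip().lower()
    if normalized == "test":
        raise OcrUnavailableError(
            "OCR is not available in test mode; switch to live mode "
            "or contact support."
        )
    if normalized != "live":
        raise ValueError(f"Unknown mode: {mode!r}")


# Usage: call the guard before building the OCR request.
try:
    ensure_ocr_allowed("test")
except OcrUnavailableError as exc:
    print(f"Skipping OCR: {exc}")
```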
Returns a response based on the content of an image and a base prompt.
The prompt can be a question, a statement, or any other text about the image. The API analyzes the content of the image and generates a response to the prompt using a pre-trained model.
Right now there are two models available for this endpoint:
- UForm-Gen: a small generative vision-language model designed primarily for image captioning and visual question answering.
- LLaVA: a large multimodal model that generates text from images and text prompts.
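Since this endpoint takes an image, a prompt, and a choice between two models, it can help to validate those inputs before sending anything. The sketch below only builds a request body and does not call the API. The field names (`image_url`, `prompt`, `model`) and the lowercase model identifiers are assumptions for illustration; check the endpoint reference for the real ones.

```python
import json
from typing import Dict

# ASSUMED identifiers for the two models named above.
SUPPORTED_MODELS = ("uform-gen", "llava")


def build_prompt_request(image_url: str, prompt: str, model: str) -> Dict[str, str]:
    """Build (but do not send) a request body for the prompt endpoint.

    All keys in the returned dict are hypothetical placeholders.
    """
    model_id = model.strip().lower()
    if model_id not in SUPPORTED_MODELS:
        raise ValueError(
            f"Unsupported model {model!r}; expected one of {SUPPORTED_MODELS}"
        )
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {"image_url": image_url, "prompt": prompt.strip(), "model": model_id}


body = build_prompt_request(
    "https://example.com/cat.jpg",
    "What color is the cat?",
    model="LLaVA",
)
print(json.dumps(body, indent=2))
```

Normalizing the model name makes the helper tolerant of casing differences such as `LLaVA` versus `llava`, while still rejecting models the endpoint does not offer.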