Hi everyone.
Currently, Mistral's vision models are incompatible with the ImageContentBlock format supported by LangChain.
Mistral vision models expect image inputs in one of the following formats:
Image URL:
{
"type": "image_url",
"image_url": "https://docs.mistral.ai/img/eiffel-tower-paris.jpg"
}
or Base64 Image:
{
"type": "image_url",
"image_url": f"data:image/jpeg;base64,{base64_image}"
}
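For reference, building these Mistral-shaped parts by hand looks roughly like the following. This is a minimal sketch that only uses the standard-library base64 module; the helper names are hypothetical and not part of LangChain or the Mistral SDK.

```python
import base64


def mistral_image_url_part(url: str) -> dict:
    """Hypothetical helper: a Mistral image part pointing at a remote image."""
    return {"type": "image_url", "image_url": url}


def mistral_base64_image_part(image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Hypothetical helper: a Mistral image part built from raw bytes.

    The bytes are base64-encoded and wrapped in a data URL, matching the
    f"data:image/jpeg;base64,{base64_image}" shape shown above.
    """
    base64_image = base64.b64encode(image_bytes).decode("utf-8")
    return {"type": "image_url", "image_url": f"data:{mime_type};base64,{base64_image}"}
```

Usage would be something like `with open("eiffel.jpg", "rb") as f: part = mistral_base64_image_part(f.read())` (the file name is just an example).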
However, ImageContentBlock and create_image_block() in LangChain produce image blocks in the following formats:
Image URL:
{
"type": "image",
"id": "lc_5d1ebab5-63cd-4437-8e87-024202ffe31e",
"url": "https://docs.mistral.ai/img/eiffel-tower-paris.jpg"
}
Or Base64 Image:
{
"type": "image",
"id": "lc_16e07a2d-3b41-40fe-a677-0f3e3d6a6ba4",
"base64": "base64_image_data",
"mime_type": "image/png"
}
As a result, the Mistral models reject these payloads.
I know we can still build the correct payloads manually, but since content blocks are meant to be model-agnostic, I was wondering whether we should add a conversion layer to the ChatMistralAI model that converts image content blocks into the format accepted by Mistral's vision models.
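As a rough illustration of what such a conversion layer might do, here is a sketch that operates on plain dicts shaped like the blocks above. The function names are hypothetical, and where this would hook into ChatMistralAI's message conversion is an assumption, not a description of the current langchain-mistralai code. It drops LangChain's `id` field, passes non-image blocks through unchanged, and raises on image sources it cannot express as an `image_url`.

```python
from typing import Any


def to_mistral_image_block(block: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical converter: LangChain-style image block -> Mistral image_url part."""
    if block.get("type") != "image":
        # Text and other block types are left untouched.
        return block
    if "url" in block:
        # Remote image: Mistral takes the URL string directly; the LangChain "id" is dropped.
        return {"type": "image_url", "image_url": block["url"]}
    if "base64" in block:
        mime_type = block.get("mime_type")
        if not mime_type:
            raise ValueError("base64 image block is missing 'mime_type'")
        # Inline image: wrap the base64 payload in a data URL.
        return {"type": "image_url", "image_url": f"data:{mime_type};base64,{block['base64']}"}
    raise ValueError("Unsupported image block: expected a 'url' or 'base64' key")


def convert_content(content: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical helper: convert every block in a message's content list."""
    return [to_mistral_image_block(block) for block in content]
```

The main design choice here is failing loudly on unsupported sources (for example a provider file ID) rather than silently sending a payload the API would reject anyway.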
If you folks think this is worth pursuing, I'd be happy to open an issue on the repo and raise a PR for it.