Feature Request: Image Input support for ChatMistralAI

Hi everyone.

Currently, Mistral's vision models are incompatible with the ImageContentBlock format that LangChain supports.

Mistral vision models require image inputs in one of the following formats:

Image URL:

{
    "type": "image_url",
    "image_url": "https://docs.mistral.ai/img/eiffel-tower-paris.jpg"
}

or Base64 Image:

{
    "type": "image_url",
    "image_url": f"data:image/jpeg;base64,{base64_image}"
}
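For anyone unsure where `base64_image` in the snippet above comes from, here is a minimal sketch of building that data URL from raw image bytes using only the standard library. The helper name `image_bytes_to_data_url` and the default MIME type are my own choices for illustration, not part of any Mistral or LangChain API.

```python
import base64


def image_bytes_to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URL (hypothetical helper)."""
    # b64encode returns bytes; decode to str so it can be embedded in the URL.
    base64_image = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{base64_image}"


def mistral_image_chunk(image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Build the Base64 variant of the image_url chunk shown above."""
    return {
        "type": "image_url",
        "image_url": image_bytes_to_data_url(image_bytes, mime_type),
    }
```

In practice the bytes would come from something like `open(path, "rb").read()`, with `mime_type` set to match the actual file type.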

However, ImageContentBlock and create_image_block() in LangChain produce image blocks in the following formats:

With Image URL:


{
    "type": "image", 
    "id": "lc_5d1ebab5-63cd-4437-8e87-024202ffe31e", 
    "url": "https://docs.mistral.ai/img/eiffel-tower-paris.jpg"
}

Or Base64 Image:

{
    "type": "image",
    "id": "lc_16e07a2d-3b41-40fe-a677-0f3e3d6a6ba4",
    "base64": "base64_image_data",
    "mime_type": "image/png"
}

As a result, the Mistral models reject these payloads.

I know we can still create the correct payloads manually, but since ContentBlocks are supposed to be model agnostic, I was wondering whether we should add a conversion layer in the ChatMistralAI model that converts image content blocks to the format accepted by the Mistral vision models.
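To make the idea concrete, here is a rough sketch of what such a conversion could look like, operating on plain dicts shaped like the blocks shown above. This is not existing ChatMistralAI code: the function names `to_mistral_image_chunk` and `convert_content` are hypothetical, and I am assuming that a base64 block always carries a `mime_type`.

```python
from typing import Any


def to_mistral_image_chunk(block: dict[str, Any]) -> dict[str, Any]:
    """Map one LangChain-style image block to Mistral's image_url chunk.

    Hypothetical helper; non-image blocks are returned unchanged.
    """
    if block.get("type") != "image":
        return block
    if "url" in block:
        # URL images map directly; the LangChain "id" field is dropped.
        return {"type": "image_url", "image_url": block["url"]}
    if "base64" in block:
        # Assumption: mime_type is present whenever base64 data is given.
        if "mime_type" not in block:
            raise ValueError("base64 image block is missing 'mime_type'")
        data_url = f"data:{block['mime_type']};base64,{block['base64']}"
        return {"type": "image_url", "image_url": data_url}
    raise ValueError("image block has neither 'url' nor 'base64'")


def convert_content(content: Any) -> Any:
    """Apply the conversion to a message's content if it is a list of blocks."""
    if isinstance(content, list):
        return [
            to_mistral_image_chunk(b) if isinstance(b, dict) else b
            for b in content
        ]
    return content  # plain string content needs no conversion
```

A real implementation would presumably run this at the point where ChatMistralAI serializes messages for the API, and it might also need to cover other block types that this sketch ignores.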

If you folks think this is worth pursuing, I’d be happy to open an issue on the repo and raise a PR for it.

@choudhary-akash Thank you for identifying this. HumanMessage and ChatMessage content is passed to the Mistral API completely unchanged; there is no image block handling there at all. I have been using a custom wrapper, so having this feature would be great.