Add a rate_limiter API to embeddings

Hey langchain! I noticed that embeddings don’t have a rate_limiter API like chat models do, even though many cloud providers impose rate limits on embedding endpoints in the same way. I see two options: 1) add a rate limiter only to init_embeddings, or 2) add a rate limiter directly to the base Embeddings class. I’m not knowledgeable enough to be certain, but the latter option seems like it could break more things.

Rate limiter support would help anyone using embedding models, just as it helps anyone using chat models. Perhaps even more, since many applications generate embeddings in bulk as background jobs.

When I was developing a knowledge base bot, I had to attach my own rate limiter to my embedding model.
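For context, here is roughly what that workaround looks like: a wrapper that acquires a limiter before delegating each call. This is a hedged sketch, not LangChain's actual API; the `TokenBucketLimiter` below is a hypothetical stand-in for something like `langchain_core.rate_limiters.InMemoryRateLimiter`, and `RateLimitedEmbeddings` is an illustrative wrapper I wrote, not a library class.

```python
import time


class TokenBucketLimiter:
    """Minimal token-bucket limiter (hypothetical stand-in for a
    langchain_core-style rate limiter)."""

    def __init__(self, requests_per_second: float, max_bucket_size: float = 1.0):
        self.rate = requests_per_second
        self.max_bucket_size = max_bucket_size
        self.tokens = max_bucket_size
        self.last = time.monotonic()

    def acquire(self) -> None:
        # Block until a token is available, then consume it.
        while True:
            now = time.monotonic()
            self.tokens = min(
                self.max_bucket_size,
                self.tokens + (now - self.last) * self.rate,
            )
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            time.sleep((1 - self.tokens) / self.rate)


class RateLimitedEmbeddings:
    """Illustrative wrapper: acquires the limiter before each embedding call."""

    def __init__(self, inner, limiter: TokenBucketLimiter):
        self.inner = inner
        self.limiter = limiter

    def embed_documents(self, texts):
        self.limiter.acquire()
        return self.inner.embed_documents(texts)

    def embed_query(self, text):
        self.limiter.acquire()
        return self.inner.embed_query(text)
```

A built-in `rate_limiter` parameter on the base Embeddings class (or on init_embeddings) would make this kind of hand-rolled wrapper unnecessary, the same way it already does for chat models.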

I would be very interested in working on this feature; what do you think?