Using langchain RateLimiters (Python only)

If you're using langchain Python ChatModels in your application or evaluators, you can add rate limiters to your model(s). These provide client-side control of the frequency with which requests are sent to the model provider API, helping you avoid rate limit errors.
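For example, a minimal sketch using langchain's InMemoryRateLimiter (the model name and rate values are illustrative):

```python
from langchain.chat_models import init_chat_model
from langchain_core.rate_limiters import InMemoryRateLimiter

rate_limiter = InMemoryRateLimiter(
    requests_per_second=0.1,    # allow roughly one request every 10 seconds
    check_every_n_seconds=0.1,  # wake every 100 ms to check whether a request is allowed
    max_bucket_size=10,         # maximum burst size
)

# The rate limiter is passed through to the chat model's constructor.
llm = init_chat_model("gpt-4o-mini", rate_limiter=rate_limiter)
```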
Retrying with exponential backoff

Another common way to handle rate limit errors is to retry failed requests with an exponentially increasing wait between attempts, until either the request succeeds or a maximum number of attempts is reached.

With langchain

If you're using langchain components, you can add retries to all model calls with the .with_retry(...) (Python) / .withRetry() (JS) method:
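For example, a minimal Python sketch (the model name and attempt count are illustrative):

```python
from langchain.chat_models import init_chat_model

# Retry failed model calls up to 6 times, with exponential backoff between attempts.
llm_with_retry = init_chat_model("gpt-4o-mini").with_retry(stop_after_attempt=6)
```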
See the langchain Python and JS API references for more.
Without langchain

If you're not using langchain, you can use libraries like tenacity (Python) or backoff (Python) to implement retries with exponential backoff, or you can implement them from scratch. See some examples of how to do this in the OpenAI docs.
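As one sketch, using tenacity to wrap a raw OpenAI call (the wait and stop parameters are illustrative):

```python
from openai import OpenAI
from tenacity import retry, stop_after_attempt, wait_random_exponential

client = OpenAI()

# Retry with randomized exponential backoff: wait between 1 and 60 seconds
# between attempts, giving up after 6 attempts.
@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
def completion_with_backoff(**kwargs):
    return client.chat.completions.create(**kwargs)

response = completion_with_backoff(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)
```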
Limiting max_concurrency

Limiting the number of concurrent calls made to your application and evaluators is another way to reduce the frequency of model calls and avoid rate limit errors. max_concurrency can be set directly on the evaluate() / aevaluate() functions. This parallelizes evaluation by effectively splitting the dataset across threads, so no more than max_concurrency examples run at once.
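For example, a sketch with the langsmith SDK (the target function, evaluator, and dataset name are hypothetical):

```python
from langsmith import evaluate

# Hypothetical target and evaluator, for illustration only.
def my_app(inputs: dict) -> dict:
    return {"answer": "..."}  # call your (rate-limited) model here

def correct(outputs: dict, reference_outputs: dict) -> dict:
    return {"key": "correct", "score": outputs["answer"] == reference_outputs.get("answer")}

results = evaluate(
    my_app,
    data="my-dataset",     # hypothetical dataset name
    evaluators=[correct],
    max_concurrency=4,     # run at most 4 examples at a time
)
```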