Proxy Config.yaml

Set model list, api_base, api_key, temperature & proxy server settings (master-key) on the config.yaml.

| Param Name | Description |
|------------|-------------|
| model_list | List of supported models on the server, with model-specific configs |
| router_settings | litellm Router settings, example routing_strategy="least-busy" see all |
| litellm_settings | litellm Module settings, example litellm.drop_params=True, litellm.set_verbose=True, litellm.api_base, litellm.cache see all |
| general_settings | Server settings, example setting master_key: sk-my_special_key |
| environment_variables | Environment Variables example, REDIS_HOST, REDIS_PORT |
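
For orientation, a minimal config.yaml touching each of these sections might look like the sketch below (the model alias, keys, and Redis values are placeholders, not recommendations):

model_list:                          # user-facing model names mapped to provider deployments
  - model_name: my-gpt-3.5           # placeholder alias
    litellm_params:
      model: gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:                    # module-level litellm settings
  drop_params: True

general_settings:                    # proxy server settings
  master_key: sk-my_special_key      # example key from the table above

router_settings:                     # litellm Router settings
  routing_strategy: least-busy

environment_variables:               # env vars made available to the proxy
  REDIS_HOST: localhost
  REDIS_PORT: "6379"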

Complete List: Check the Swagger UI docs on <your-proxy-url>/#/config.yaml (e.g. http://0.0.0.0:4000/#/config.yaml) for everything you can pass in the config.yaml.

Quick Start

Set a model alias for your deployments.

In the config.yaml the model_name parameter is the user-facing name to use for your deployment.

In the config below:

  • model_name: the name to pass TO litellm from the external client
  • litellm_params.model: the model string passed to the litellm.completion() function

E.g.:

  • model=vllm-models will route to openai/facebook/opt-125m.
  • model=gpt-3.5-turbo will load balance between azure/gpt-turbo-small-eu and azure/gpt-turbo-small-ca
model_list:
  - model_name: gpt-3.5-turbo ### RECEIVED MODEL NAME ###
    litellm_params: # all params accepted by litellm.completion() - https://docs.litellm.ai/docs/completion/input
      model: azure/gpt-turbo-small-eu ### MODEL NAME sent to `litellm.completion()` ###
      api_base: https://my-endpoint-europe-berri-992.openai.azure.com/
      api_key: "os.environ/AZURE_API_KEY_EU" # does os.getenv("AZURE_API_KEY_EU")
      rpm: 6 # [OPTIONAL] Rate limit for this deployment: in requests per minute (rpm)
  - model_name: bedrock-claude-v1
    litellm_params:
      model: bedrock/anthropic.claude-instant-v1
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-turbo-small-ca
      api_base: https://my-endpoint-canada-berri992.openai.azure.com/
      api_key: "os.environ/AZURE_API_KEY_CA"
      rpm: 6
  - model_name: anthropic-claude
    litellm_params:
      model: bedrock/anthropic.claude-instant-v1
      ### [OPTIONAL] SET AWS REGION ###
      aws_region_name: us-east-1
  - model_name: vllm-models
    litellm_params:
      model: openai/facebook/opt-125m # the `openai/` prefix tells litellm it's openai compatible
      api_base: http://0.0.0.0:4000/v1
      api_key: none
      rpm: 1440
    model_info:
      version: 2

  # Use this if you want to make requests to `claude-3-haiku-20240307`,`claude-3-opus-20240229`,`claude-2.1` without defining them on the config.yaml
  # Default models
  # Works for ALL Providers and needs the default provider credentials in .env
  - model_name: "*"
    litellm_params:
      model: "*"

litellm_settings: # module level litellm settings - https://github.com/BerriAI/litellm/blob/main/litellm/__init__.py
  drop_params: True
  success_callback: ["langfuse"] # OPTIONAL - if you want to start sending LLM Logs to Langfuse. Make sure to set `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` in your env

general_settings:
  master_key: sk-1234 # [OPTIONAL] Only use this if you want to require all calls to contain this key (Authorization: Bearer sk-1234)
  alerting: ["slack"] # [OPTIONAL] If you want Slack Alerts for Hanging LLM requests, Slow llm responses, Budget Alerts. Make sure to set `SLACK_WEBHOOK_URL` in your env
info

For more provider-specific info, go here
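
Before starting the proxy, export the credentials referenced in the config above (and by the optional Langfuse / Slack settings). A sketch with placeholder values:

export AZURE_API_KEY_EU="my-azure-eu-key"                               # read via os.environ/AZURE_API_KEY_EU
export AZURE_API_KEY_CA="my-azure-ca-key"                               # read via os.environ/AZURE_API_KEY_CA
export LANGFUSE_PUBLIC_KEY="pk-lf-placeholder"                          # required for the langfuse success_callback
export LANGFUSE_SECRET_KEY="sk-lf-placeholder"
export SLACK_WEBHOOK_URL="https://hooks.slack.com/services/placeholder" # required for slack alerting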

Step 2: Start Proxy with config

$ litellm --config /path/to/config.yaml
tip

Run with --detailed_debug if you need detailed debug logs

$ litellm --config /path/to/config.yaml --detailed_debug

Step 3: Test it

This sends the request to the deployment where model_name=gpt-3.5-turbo in the config.yaml.

If multiple deployments share model_name=gpt-3.5-turbo, the proxy load balances between them.

Langchain, OpenAI SDK Usage Examples

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "gpt-3.5-turbo",
    "messages": [
        {
            "role": "user",
            "content": "what llm are you"
        }
    ]
}'
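
Because the proxy exposes an OpenAI-compatible /chat/completions endpoint, you can also call it from the OpenAI Python SDK by pointing base_url at the proxy. A minimal sketch, assuming the proxy is running locally on port 4000 and sk-1234 is your master/virtual key:

import openai

# Point the OpenAI SDK at the LiteLLM proxy instead of api.openai.com
client = openai.OpenAI(
    api_key="sk-1234",               # proxy master key / virtual key (any string if auth is disabled)
    base_url="http://0.0.0.0:4000",
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",           # the model_name set in config.yaml
    messages=[{"role": "user", "content": "what llm are you"}],
)
print(response.choices[0].message.content)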

LLM configs model_list

Model-specific params (API Base, Keys, Temperature, Max Tokens, Organization, Headers etc.)

You can use the config to save model-specific information like api_base, api_key, temperature, max_tokens, etc.

All input params

Step 1: Create a config.yaml file

model_list:
  - model_name: gpt-4-team1
    litellm_params: # params for litellm.completion() - https://docs.litellm.ai/docs/completion/input#input---request-body
      model: azure/chatgpt-v-2
      api_base: https://openai-gpt-4-test-v-1.openai.azure.com/
      api_version: "2023-05-15"
      azure_ad_token: eyJ0eXAiOiJ
      seed: 12
      max_tokens: 20
  - model_name: gpt-4-team2
    litellm_params:
      model: azure/gpt-4
      api_key: sk-123
      api_base: https://openai-gpt-4-test-v-2.openai.azure.com/
      temperature: 0.2
  - model_name: openai-gpt-3.5
    litellm_params:
      model: openai/gpt-3.5-turbo
      extra_headers: {"AI-Resource Group": "ishaan-resource"}
      api_key: sk-123
      organization: org-ikDc4ex8NB
      temperature: 0.2
  - model_name: mistral-7b
    litellm_params:
      model: ollama/mistral
      api_base: your_ollama_api_base

Step 2: Start server with config

$ litellm --config /path/to/config.yaml

Expected Logs:

Look for this line in your console logs to confirm the config.yaml was loaded correctly.

LiteLLM: Proxy initialized with Config, Set models:

Embedding Models - Use Sagemaker, Bedrock, Azure, OpenAI, XInference

See supported Embedding Providers & Models here

model_list:
  - model_name: bedrock-cohere
    litellm_params:
      model: "bedrock/cohere.command-text-v14"
      aws_region_name: "us-west-2"
  - model_name: bedrock-cohere
    litellm_params:
      model: "bedrock/cohere.command-text-v14"
      aws_region_name: "us-east-2"
  - model_name: bedrock-cohere
    litellm_params:
      model: "bedrock/cohere.command-text-v14"
      aws_region_name: "us-east-1"

Start Proxy

litellm --config config.yaml

Make Request

Sends Request to bedrock-cohere

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "bedrock-cohere",
    "messages": [
        {
            "role": "user",
            "content": "gm"
        }
    ]
}'
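
The same pattern applies to embedding deployments, which are served on the OpenAI-compatible /embeddings route. A hedged sketch - the azure-embedding-model alias and its Azure deployment/env vars below are illustrative and not part of the config above:

model_list:
  - model_name: azure-embedding-model          # hypothetical alias
    litellm_params:
      model: azure/azure-embedding-model       # hypothetical Azure embedding deployment
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY

curl --location 'http://0.0.0.0:4000/embeddings' \
--header 'Content-Type: application/json' \
--data '{
    "model": "azure-embedding-model",
    "input": ["good morning from litellm"]
}'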

Multiple OpenAI Organizations

Add all openai models across all OpenAI organizations with just 1 model definition

  - model_name: "*"
    litellm_params:
      model: openai/*
      api_key: os.environ/OPENAI_API_KEY
      organization:
        - org-1
        - org-2
        - org-3

LiteLLM will automatically create separate deployments for each org.

Confirm this via

curl --location 'http://0.0.0.0:4000/v1/model/info' \
--header 'Authorization: Bearer ${LITELLM_KEY}' \
--data ''

Provider specific wildcard routing

Proxy all models from a provider

Use this if you want to proxy all models from a specific provider without defining them on the config.yaml

Step 1 - define provider specific routing on config.yaml

model_list:
  # provider specific wildcard routing
  - model_name: "anthropic/*"
    litellm_params:
      model: "anthropic/*"
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: "groq/*"
    litellm_params:
      model: "groq/*"
      api_key: os.environ/GROQ_API_KEY
  - model_name: "fo::*:static::*" # all requests matching this pattern will be routed to this deployment, example: model="fo::hi::static::hi" will be routed to deployment: "openai/fo::*:static::*"
    litellm_params:
      model: "openai/fo::*:static::*"
      api_key: os.environ/OPENAI_API_KEY

Step 2 - Run litellm proxy

$ litellm --config /path/to/config.yaml

Step 3 - Test it

Test with anthropic/ - all models with anthropic/ prefix will get routed to anthropic/*

curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "anthropic/claude-3-sonnet-20240229",
    "messages": [
      {"role": "user", "content": "Hello, Claude!"}
    ]
  }'

Test with groq/ - all models with groq/ prefix will get routed to groq/*

curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "groq/llama3-8b-8192",
    "messages": [
      {"role": "user", "content": "Hello, Claude!"}
    ]
  }'

Test with fo::*::static::* - all requests matching this pattern will be routed to openai/fo::*:static::*

curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "fo::hi::static::hi",
    "messages": [
      {"role": "user", "content": "Hello, Claude!"}
    ]
  }'

Load Balancing

info

For more on this, go to this page

Use this to call multiple instances of the same model and configure things like routing strategy.

For optimal performance:

  • Set tpm/rpm per model deployment. Weighted picks are then based on the established tpm/rpm.
  • Select your optimal routing strategy in router_settings:routing_strategy.

LiteLLM supports ["simple-shuffle", "least-busy", "usage-based-routing", "latency-based-routing"]; the default is "simple-shuffle".

When tpm/rpm is set and routing_strategy==simple-shuffle, litellm uses a weighted pick based on the set tpm/rpm. In our load tests, setting tpm/rpm for all deployments + routing_strategy==simple-shuffle maximized throughput.

  • When using multiple LiteLLM servers / Kubernetes, set the redis settings (router_settings:redis_host, etc.)

model_list:
  - model_name: zephyr-beta
    litellm_params:
      model: huggingface/HuggingFaceH4/zephyr-7b-beta
      api_base: http://0.0.0.0:8001
      rpm: 60   # Optional[int]: When rpm/tpm set - litellm uses weighted pick for load balancing. rpm = Rate limit for this deployment: in requests per minute (rpm).
      tpm: 1000 # Optional[int]: tpm = Tokens Per Minute
  - model_name: zephyr-beta
    litellm_params:
      model: huggingface/HuggingFaceH4/zephyr-7b-beta
      api_base: http://0.0.0.0:8002
      rpm: 600
  - model_name: zephyr-beta
    litellm_params:
      model: huggingface/HuggingFaceH4/zephyr-7b-beta
      api_base: http://0.0.0.0:8003
      rpm: 60000
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
      api_key: <my-openai-key>
      rpm: 200
  - model_name: gpt-3.5-turbo-16k
    litellm_params:
      model: gpt-3.5-turbo-16k
      api_key: <my-openai-key>
      rpm: 100

litellm_settings:
  num_retries: 3 # retry call 3 times on each model_name (e.g. zephyr-beta)
  request_timeout: 10 # raise Timeout error if call takes longer than 10s. Sets litellm.request_timeout
  fallbacks: [{"zephyr-beta": ["gpt-3.5-turbo"]}] # fallback to gpt-3.5-turbo if call fails num_retries
  context_window_fallbacks: [{"zephyr-beta": ["gpt-3.5-turbo-16k"]}, {"gpt-3.5-turbo": ["gpt-3.5-turbo-16k"]}] # fallback to gpt-3.5-turbo-16k if context window error
  allowed_fails: 3 # cooldown deployment if it fails more than 3 calls in a minute.

router_settings: # router_settings are optional
  routing_strategy: simple-shuffle # Literal["simple-shuffle", "least-busy", "usage-based-routing","latency-based-routing"], default="simple-shuffle"
  model_group_alias: {"gpt-4": "gpt-3.5-turbo"} # all requests with `gpt-4` will be routed to models with `gpt-3.5-turbo`
  num_retries: 2
  timeout: 30                           # 30 seconds
  redis_host: <your redis host>         # set this when using multiple litellm proxy deployments, load balancing state stored in redis
  redis_password: <your redis password>
  redis_port: 1992
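
To sanity-check the load balancing, send a few identical requests to the shared model_name and watch them spread across the zephyr-beta deployments. A sketch, assuming the proxy is running locally on port 4000 (start it with --detailed_debug to see which api_base each request was routed to):

for i in {1..5}; do
  curl -s http://0.0.0.0:4000/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "zephyr-beta", "messages": [{"role": "user", "content": "hi"}]}'
done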

You can view your cost once you set up Virtual keys or custom_callbacks

Load API Keys / config values from Environment

If you have secrets saved in your environment, and don't want to expose them in the config.yaml, here's how to load model-specific keys from the environment. This works for ANY value on the config.yaml

os.environ/<YOUR-ENV-VAR> # runs os.getenv("YOUR-ENV-VAR")
model_list:
  - model_name: gpt-4-team1
    litellm_params: # params for litellm.completion() - https://docs.litellm.ai/docs/completion/input#input---request-body
      model: azure/chatgpt-v-2
      api_base: https://openai-gpt-4-test-v-1.openai.azure.com/
      api_version: "2023-05-15"
      api_key: os.environ/AZURE_NORTH_AMERICA_API_KEY # 👈 KEY CHANGE
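
For the example above, export the referenced variable before starting the proxy (the key value is a placeholder):

export AZURE_NORTH_AMERICA_API_KEY="my-azure-api-key"
litellm --config /path/to/config.yaml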

See Code

s/o to @David Manouchehri for helping with this.

Load API Keys from Secret Managers (Azure Vault, etc)

Using Secret Managers with LiteLLM Proxy

Set Supported Environments for a model - production, staging, development

Use this if you want to control which model is exposed on a specific litellm environment

Supported Environments:

  • production
  • staging
  • development
  1. Set LITELLM_ENVIRONMENT="<environment>" in your environment. Can be one of production, staging or development.
  2. For each model, set the list of supported environments in model_info.supported_environments.
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: openai/gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY
    model_info:
      supported_environments: ["development", "production", "staging"]
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_API_KEY
    model_info:
      supported_environments: ["production", "staging"]
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
    model_info:
      supported_environments: ["production"]
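
For example, to expose only the models tagged for development (just gpt-3.5-turbo in the config above), set the environment before starting the proxy:

export LITELLM_ENVIRONMENT="development"
litellm --config /path/to/config.yaml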

Set Custom Prompt Templates

LiteLLM by default checks if a model has a prompt template and applies it (e.g. if a huggingface model has a saved chat template in its tokenizer_config.json). However, you can also set a custom prompt template on your proxy in the config.yaml:

Step 1: Save your prompt template in a config.yaml

# Model-specific parameters
model_list:
  - model_name: mistral-7b # model alias
    litellm_params: # actual params for litellm.completion()
      model: "huggingface/mistralai/Mistral-7B-Instruct-v0.1"
      api_base: "<your-api-base>"
      api_key: "<your-api-key>" # [OPTIONAL] for hf inference endpoints
      initial_prompt_value: "\n"
      roles: {"system":{"pre_message":"<|im_start|>system\n", "post_message":"<|im_end|>"}, "assistant":{"pre_message":"<|im_start|>assistant\n","post_message":"<|im_end|>"}, "user":{"pre_message":"<|im_start|>user\n","post_message":"<|im_end|>"}}
      final_prompt_value: "\n"
      bos_token: " "
      eos_token: " "
      max_tokens: 4096
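
With the roles above, a single user message ("what llm are you") would be rendered roughly as initial_prompt_value + pre_message + content + post_message + final_prompt_value, i.e. something like the following (surrounded by the configured newlines):

<|im_start|>user
what llm are you<|im_end|>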

Step 2: Start server with config

$ litellm --config /path/to/config.yaml

General Settings general_settings (DB Connection, etc)

Configure DB Pool Limits + Connection Timeouts

general_settings:
  database_connection_pool_limit: 100 # sets connection pool for prisma client to postgres db at 100
  database_connection_timeout: 60 # sets a 60s timeout for any connection call to the db

All settings

environment_variables: {}

model_list:
  - model_name: string
    litellm_params: {}
    model_info:
      id: string
      mode: embedding
      input_cost_per_token: 0
      output_cost_per_token: 0
      max_tokens: 2048
      base_model: gpt-4-1106-preview
      additionalProp1: {}

litellm_settings:
  # Logging/Callback settings
  success_callback: ["langfuse"] # list of success callbacks
  failure_callback: ["sentry"] # list of failure callbacks
  callbacks: ["otel"] # list of callbacks - runs on success and failure
  service_callbacks: ["datadog", "prometheus"] # logs redis, postgres failures on datadog, prometheus
  turn_off_message_logging: boolean # prevent the messages and responses from being logged to your callbacks; request metadata will still be logged.
  redact_user_api_key_info: boolean # Redact information about the user api key (hashed token, user_id, team id, etc.) from logs. Currently supported for Langfuse, OpenTelemetry, Logfire, ArizeAI logging.
  langfuse_default_tags: ["cache_hit", "cache_key", "proxy_base_url", "user_api_key_alias", "user_api_key_user_id", "user_api_key_user_email", "user_api_key_team_alias", "semantic-similarity", "proxy_base_url"] # default tags for Langfuse Logging

  request_timeout: 10 # (int) llm request timeout in seconds. Raise Timeout error if call takes longer than 10s. Sets litellm.request_timeout

  set_verbose: boolean # sets litellm.set_verbose=True to view verbose debug logs. DO NOT LEAVE THIS ON IN PRODUCTION
  json_logs: boolean # if true, logs will be in json format

  # Fallbacks, reliability
  default_fallbacks: ["claude-opus"] # set default_fallbacks, in case a specific model group is misconfigured / bad.
  content_policy_fallbacks: [{"gpt-3.5-turbo-small": ["claude-opus"]}] # fallbacks for ContentPolicyErrors
  context_window_fallbacks: [{"gpt-3.5-turbo-small": ["gpt-3.5-turbo-large", "claude-opus"]}] # fallbacks for ContextWindowExceededErrors

  # Caching settings
  cache: true
  cache_params: # set cache params for redis
    type: redis # type of cache to initialize

    # Optional - Redis Settings
    host: "localhost" # The host address for the Redis cache. Required if type is "redis".
    port: 6379 # The port number for the Redis cache. Required if type is "redis".
    password: "your_password" # The password for the Redis cache. Required if type is "redis".
    namespace: "litellm.caching.caching" # namespace for redis cache

    # Optional - Redis Cluster Settings
    redis_startup_nodes: [{"host": "127.0.0.1", "port": "7001"}]

    # Optional - Redis Sentinel Settings
    service_name: "mymaster"
    sentinel_nodes: [["localhost", 26379]]

    # Optional - Qdrant Semantic Cache Settings
    qdrant_semantic_cache_embedding_model: openai-embedding # the model should be defined on the model_list
    qdrant_collection_name: test_collection
    qdrant_quantization_config: binary
    similarity_threshold: 0.8 # similarity threshold for semantic cache

    # Optional - S3 Cache Settings
    s3_bucket_name: cache-bucket-litellm # AWS Bucket Name for S3
    s3_region_name: us-west-2 # AWS Region Name for S3
    s3_aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID # use os.environ/<variable name> to pass environment variables. This is the AWS Access Key ID for S3
    s3_aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY # AWS Secret Access Key for S3
    s3_endpoint_url: https://s3.amazonaws.com # [OPTIONAL] S3 endpoint URL, if you want to use Backblaze/cloudflare s3 bucket

    # Common Cache settings
    # Optional - Supported call types for caching
    supported_call_types: ["acompletion", "atext_completion", "aembedding", "atranscription"]
    # /chat/completions, /completions, /embeddings, /audio/transcriptions
    mode: default_off # if default_off, you need to opt in to caching on a per call basis
    ttl: 600 # ttl for caching

callback_settings:
  otel:
    message_logging: boolean # OTEL logging callback specific settings

general_settings:
  completion_model: string
  disable_spend_logs: boolean # turn off writing each transaction to the db
  disable_master_key_return: boolean # turn off returning master key on UI (checked on '/user/info' endpoint)
  disable_retry_on_max_parallel_request_limit_error: boolean # turn off retries when max parallel request limit is reached
  disable_reset_budget: boolean # turn off reset budget scheduled task
  disable_adding_master_key_hash_to_db: boolean # turn off storing master key hash in db, for spend tracking
  enable_jwt_auth: boolean # allow proxy admin to auth in via jwt tokens with 'litellm_proxy_admin' in claims
  enforce_user_param: boolean # requires all openai endpoint requests to have a 'user' param
  allowed_routes: ["route1", "route2"] # list of allowed proxy API routes a user can access (currently JWT-Auth only)
  key_management_system: google_kms # either google_kms or azure_kms
  master_key: string
  database_url: string
  database_connection_pool_limit: 0 # default 100
  database_connection_timeout: 0 # default 60s
  custom_auth: string
  max_parallel_requests: 0 # the max parallel requests allowed per deployment
  global_max_parallel_requests: 0 # the max parallel requests allowed on the proxy all up
  infer_model_from_keys: true
  background_health_checks: true
  health_check_interval: 300
  alerting: ["slack", "email"]
  alerting_threshold: 0
  use_client_credentials_pass_through_routes: boolean # use client credentials for all pass through routes like "/vertex-ai", /bedrock/. When this is True, Virtual Key auth will not be applied on these endpoints

litellm_settings - Reference

| Name | Type | Description |
|------|------|-------------|
| success_callback | array of strings | List of success callbacks. Doc Proxy logging callbacks, Doc Metrics |
| failure_callback | array of strings | List of failure callbacks Doc Proxy logging callbacks, Doc Metrics |
| callbacks | array of strings | List of callbacks - runs on success and failure Doc Proxy logging callbacks, Doc Metrics |
| service_callbacks | array of strings | System health monitoring - Logs redis, postgres failures on specified services (e.g. datadog, prometheus) Doc Metrics |
| turn_off_message_logging | boolean | If true, prevents messages and responses from being logged to callbacks, but request metadata will still be logged Proxy Logging |
| modify_params | boolean | If true, allows modifying the parameters of the request before it is sent to the LLM provider |
| enable_preview_features | boolean | If true, enables preview features - e.g. Azure O1 Models with streaming support. |
| redact_user_api_key_info | boolean | If true, redacts information about the user api key from logs Proxy Logging |
| langfuse_default_tags | array of strings | Default tags for Langfuse Logging. Use this if you want to control which LiteLLM-specific fields are logged as tags by the LiteLLM proxy. By default LiteLLM Proxy logs no LiteLLM-specific fields as tags. Further docs |
| set_verbose | boolean | If true, sets litellm.set_verbose=True to view verbose debug logs. DO NOT LEAVE THIS ON IN PRODUCTION |
| json_logs | boolean | If true, logs will be in json format. If you need to store the logs as JSON, just set litellm.json_logs = True. We currently just log the raw POST request from litellm as a JSON. Further docs |
| default_fallbacks | array of strings | List of fallback models to use if a specific model group is misconfigured / bad. Further docs |
| request_timeout | integer | The timeout for requests in seconds. If not set, the default value is 6000 seconds. For reference, the OpenAI Python SDK defaults to 600 seconds. |
| content_policy_fallbacks | array of objects | Fallbacks to use when a ContentPolicyViolationError is encountered. Further docs |
| context_window_fallbacks | array of objects | Fallbacks to use when a ContextWindowExceededError is encountered. Further docs |
| cache | boolean | If true, enables caching. Further docs |
| cache_params | object | Parameters for the cache. Further docs |
| cache_params.type | string | The type of cache to initialize. Can be one of ["local", "redis", "redis-semantic", "s3", "disk", "qdrant-semantic"]. Defaults to "redis". Further docs |
| cache_params.host | string | The host address for the Redis cache. Required if type is "redis". |
| cache_params.port | integer | The port number for the Redis cache. Required if type is "redis". |
| cache_params.password | string | The password for the Redis cache. Required if type is "redis". |
| cache_params.namespace | string | The namespace for the Redis cache. |
| cache_params.redis_startup_nodes | array of objects | Redis Cluster Settings. Further docs |
| cache_params.service_name | string | Redis Sentinel Settings. Further docs |
| cache_params.sentinel_nodes | array of arrays | Redis Sentinel Settings. Further docs |
| cache_params.ttl | integer | The time (in seconds) to store entries in cache. |
| cache_params.qdrant_semantic_cache_embedding_model | string | The embedding model to use for qdrant semantic cache. |
| cache_params.qdrant_collection_name | string | The name of the collection to use for qdrant semantic cache. |
| cache_params.qdrant_quantization_config | string | The quantization configuration for the qdrant semantic cache. |
| cache_params.similarity_threshold | float | The similarity threshold for the semantic cache. |
| cache_params.s3_bucket_name | string | The name of the S3 bucket to use for the semantic cache. |
| cache_params.s3_region_name | string | The region name for the S3 bucket. |
| cache_params.s3_aws_access_key_id | string | The AWS access key ID for the S3 bucket. |
| cache_params.s3_aws_secret_access_key | string | The AWS secret access key for the S3 bucket. |
| cache_params.s3_endpoint_url | string | Optional - The endpoint URL for the S3 bucket. |
| cache_params.supported_call_types | array of strings | The types of calls to cache. Further docs |
| cache_params.mode | string | The mode of the cache. Further docs |

general_settings - Reference

| Name | Type | Description |
|------|------|-------------|
| completion_model | string | The default model to use for completions when model is not specified in the request |
| disable_spend_logs | boolean | If true, turns off writing each transaction to the database |
| disable_master_key_return | boolean | If true, turns off returning master key on UI (checked on '/user/info' endpoint) |
| disable_retry_on_max_parallel_request_limit_error | boolean | If true, turns off retries when max parallel request limit is reached |
| disable_reset_budget | boolean | If true, turns off reset budget scheduled task |
| disable_adding_master_key_hash_to_db | boolean | If true, turns off storing master key hash in db |
| enable_jwt_auth | boolean | allow proxy admin to auth in via jwt tokens with 'litellm_proxy_admin' in claims. Doc on JWT Tokens |
| enforce_user_param | boolean | If true, requires all OpenAI endpoint requests to have a 'user' param. Doc on call hooks |
| allowed_routes | array of strings | List of allowed proxy API routes a user can access Doc on controlling allowed routes |
| key_management_system | string | Specifies the key management system. Doc Secret Managers |
| master_key | string | The master key for the proxy Set up Virtual Keys |
| database_url | string | The URL for the database connection Set up Virtual Keys |
| database_connection_pool_limit | integer | The limit for database connection pool Setting DB Connection Pool limit |
| database_connection_timeout | integer | The timeout for database connections in seconds Setting DB Connection Pool limit, timeout |
| custom_auth | string | Write your own custom authentication logic Doc Custom Auth |
| max_parallel_requests | integer | The max parallel requests allowed per deployment |
| global_max_parallel_requests | integer | The max parallel requests allowed on the proxy overall |
| infer_model_from_keys | boolean | If true, infers the model from the provided keys |
| background_health_checks | boolean | If true, enables background health checks. Doc on health checks |
| health_check_interval | integer | The interval for health checks in seconds Doc on health checks |
| alerting | array of strings | List of alerting methods Doc on Slack Alerting |
| alerting_threshold | integer | The threshold for triggering alerts Doc on Slack Alerting |
| use_client_credentials_pass_through_routes | boolean | If true, uses client credentials for all pass-through routes. Doc on pass through routes |
| health_check_details | boolean | If false, hides health check details (e.g. remaining rate limit). Doc on health checks |
| public_routes | List[str] | (Enterprise Feature) Control list of public routes |
| alert_types | List[str] | Control list of alert types to send to slack [Doc on alert types](./alerting.md) |
| enforced_params | List[str] | (Enterprise Feature) List of params that must be included in all requests to the proxy |
| enable_oauth2_auth | boolean | (Enterprise Feature) If true, enables oauth2.0 authentication |
| use_x_forwarded_for | str | If true, uses the X-Forwarded-For header to get the client IP address |
| service_account_settings | List[Dict[str, Any]] | Set service_account_settings if you want to create settings that only apply to service account keys [Doc on service accounts](./service_accounts.md) |
| image_generation_model | str | The default model to use for image generation - ignores model set in request |
| store_model_in_db | boolean | If true, allows /model/new endpoint to store model information in db. Endpoint disabled by default. Doc on /model/new endpoint |
| max_request_size_mb | int | The maximum size for requests in MB. Requests above this size will be rejected. |
| max_response_size_mb | int | The maximum size for responses in MB. LLM Responses above this size will not be sent. |
| proxy_budget_rescheduler_min_time | int | The minimum time (in seconds) to wait before checking db for budget resets. |
| proxy_budget_rescheduler_max_time | int | The maximum time (in seconds) to wait before checking db for budget resets. |
| proxy_batch_write_at | int | Time (in seconds) to wait before batch writing spend logs to the db. |
| alerting_args | dict | Args for Slack Alerting Doc on Slack Alerting |
| custom_key_generate | str | Custom function for key generation Doc on custom key generation |
| allowed_ips | List[str] | List of IPs allowed to access the proxy. If not set, all IPs are allowed. |
| embedding_model | str | The default model to use for embeddings - ignores model set in request |
| default_team_disabled | boolean | If true, users cannot create 'personal' keys (keys with no team_id). |
| alert_to_webhook_url | Dict[str] | Specify a webhook url for each alert type. |
| key_management_settings | List[Dict[str, Any]] | Settings for key management system (e.g. AWS KMS, Azure Key Vault) Doc on key management |
| allow_user_auth | boolean | (Deprecated) old approach for user authentication. |
| user_api_key_cache_ttl | int | The time (in seconds) to cache user api keys in memory. |
| disable_prisma_schema_update | boolean | If true, turns off automatic schema updates to DB |
| litellm_key_header_name | str | If set, allows passing LiteLLM keys as a custom header. Doc on custom headers |
| moderation_model | str | The default model to use for moderation. |
| custom_sso | str | Path to a python file that implements custom SSO logic. Doc on custom SSO |
| allow_client_side_credentials | boolean | If true, allows passing client side credentials to the proxy. (Useful when testing finetuning models) Doc on client side credentials |
| admin_only_routes | List[str] | (Enterprise Feature) List of routes that are only accessible to admin users. Doc on admin only routes |
| use_azure_key_vault | boolean | If true, load keys from azure key vault |
| use_google_kms | boolean | If true, load keys from google kms |
| spend_report_frequency | str | Specify how often you want a Spend Report to be sent (e.g. "1d", "2d", "30d") More on this |
| ui_access_mode | Literal["admin_only"] | If set, restricts access to the UI to admin users only. Docs |
| litellm_jwtauth | Dict[str, Any] | Settings for JWT authentication. Docs |
| litellm_license | str | The license key for the proxy. Docs |
| oauth2_config_mappings | Dict[str, str] | Define the OAuth2 config mappings |
| pass_through_endpoints | List[Dict[str, Any]] | Define the pass through endpoints. Docs |
| enable_oauth2_proxy_auth | boolean | (Enterprise Feature) If true, enables oauth2.0 authentication |
| forward_openai_org_id | boolean | If true, forwards the OpenAI Organization ID to the backend LLM call (if it's OpenAI). |
| forward_client_headers_to_llm_api | boolean | If true, forwards the client headers (any x- headers) to the backend LLM call |

router_settings - Reference

router_settings:
  routing_strategy: usage-based-routing-v2 # Literal["simple-shuffle", "least-busy", "usage-based-routing","latency-based-routing"], default="simple-shuffle"
  redis_host: <your-redis-host>           # string
  redis_password: <your-redis-password>   # string
  redis_port: <your-redis-port>           # string
  enable_pre_call_check: true             # bool - Before call is made check if a call is within model context window
  allowed_fails: 3                        # cooldown deployment if it fails more than 3 calls in a minute.
  cooldown_time: 30                       # (in seconds) how long to cooldown model if fails/min > allowed_fails
  disable_cooldowns: True                 # bool - Disable cooldowns for all models
  enable_tag_filtering: True              # bool - Use tag based routing for requests
  retry_policy: {                         # Dict[str, int]: retry policy for different types of exceptions
    "AuthenticationErrorRetries": 3,
    "TimeoutErrorRetries": 3,
    "RateLimitErrorRetries": 3,
    "ContentPolicyViolationErrorRetries": 4,
    "InternalServerErrorRetries": 4
  }
  allowed_fails_policy: {
    "BadRequestErrorAllowedFails": 1000, # Allow 1000 BadRequestErrors before cooling down a deployment
    "AuthenticationErrorAllowedFails": 10, # int
    "TimeoutErrorAllowedFails": 12, # int
    "RateLimitErrorAllowedFails": 10000, # int
    "ContentPolicyViolationErrorAllowedFails": 15, # int
    "InternalServerErrorAllowedFails": 20, # int
  }
  content_policy_fallbacks: [{"claude-2": ["my-fallback-model"]}] # List[Dict[str, List[str]]]: Fallback model for content policy violations
  fallbacks: [{"claude-2": ["my-fallback-model"]}] # List[Dict[str, List[str]]]: Fallback model for all errors

| Name | Type | Description |
|------|------|-------------|
| routing_strategy | string | The strategy used for routing requests. Options: "simple-shuffle", "least-busy", "usage-based-routing", "latency-based-routing". Default is "simple-shuffle". More information here |
| redis_host | string | The host address for the Redis server. Only set this if you have multiple instances of LiteLLM Proxy and want current tpm/rpm tracking to be shared across them |
| redis_password | string | The password for the Redis server. Only set this if you have multiple instances of LiteLLM Proxy and want current tpm/rpm tracking to be shared across them |
| redis_port | string | The port number for the Redis server. Only set this if you have multiple instances of LiteLLM Proxy and want current tpm/rpm tracking to be shared across them |
| enable_pre_call_check | boolean | If true, checks if a call is within the model's context window before making the call. More information here |
| content_policy_fallbacks | array of objects | Specifies fallback models for content policy violations. More information here |
| fallbacks | array of objects | Specifies fallback models for all types of errors. More information here |
| enable_tag_filtering | boolean | If true, uses tag based routing for requests Tag Based Routing |
| cooldown_time | integer | The duration (in seconds) to cooldown a model if it exceeds the allowed failures. |
| disable_cooldowns | boolean | If true, disables cooldowns for all models. More information here |
| retry_policy | object | Specifies the number of retries for different types of exceptions. More information here |
| allowed_fails | integer | The number of failures allowed before cooling down a model. More information here |
| allowed_fails_policy | object | Specifies the number of allowed failures for different error types before cooling down a deployment. More information here |

environment variables - Reference

| Name | Description |
|------|-------------|
| ACTIONS_ID_TOKEN_REQUEST_TOKEN | Token for requesting ID in GitHub Actions |
| ACTIONS_ID_TOKEN_REQUEST_URL | URL for requesting ID token in GitHub Actions |
| AISPEND_ACCOUNT_ID | Account ID for AI Spend |
| AISPEND_API_KEY | API Key for AI Spend |
| ALLOWED_EMAIL_DOMAINS | List of email domains allowed for access |
| ARIZE_API_KEY | API key for Arize platform integration |
| ARIZE_SPACE_KEY | Space key for Arize platform |
| ARGILLA_BATCH_SIZE | Batch size for Argilla logging |
| ARGILLA_API_KEY | API key for Argilla platform |
| ARGILLA_SAMPLING_RATE | Sampling rate for Argilla logging |
| ARGILLA_DATASET_NAME | Dataset name for Argilla logging |
| ARGILLA_BASE_URL | Base URL for Argilla service |
| ATHINA_API_KEY | API key for Athina service |
| AUTH_STRATEGY | Strategy used for authentication (e.g., OAuth, API key) |
| AWS_ACCESS_KEY_ID | Access Key ID for AWS services |
| AWS_PROFILE_NAME | AWS CLI profile name to be used |
| AWS_REGION_NAME | Default AWS region for service interactions |
| AWS_ROLE_NAME | Role name for AWS IAM usage |
| AWS_SECRET_ACCESS_KEY | Secret Access Key for AWS services |
| AWS_SESSION_NAME | Name for AWS session |
| AWS_WEB_IDENTITY_TOKEN | Web identity token for AWS |
| AZURE_API_VERSION | Version of the Azure API being used |
| AZURE_AUTHORITY_HOST | Azure authority host URL |
| AZURE_CLIENT_ID | Client ID for Azure services |
| AZURE_CLIENT_SECRET | Client secret for Azure services |
| AZURE_FEDERATED_TOKEN_FILE | File path to Azure federated token |
| AZURE_KEY_VAULT_URI | URI for Azure Key Vault |
| AZURE_TENANT_ID | Tenant ID for Azure Active Directory |
| BERRISPEND_ACCOUNT_ID | Account ID for BerriSpend service |
| BRAINTRUST_API_KEY | API key for Braintrust integration |
| CIRCLE_OIDC_TOKEN | OpenID Connect token for CircleCI |
| CIRCLE_OIDC_TOKEN_V2 | Version 2 of the OpenID Connect token for CircleCI |
| CONFIG_FILE_PATH | File path for configuration file |
| CUSTOM_TIKTOKEN_CACHE_DIR | Custom directory for Tiktoken cache |
| DATABASE_HOST | Hostname for the database server |
| DATABASE_NAME | Name of the database |
| DATABASE_PASSWORD | Password for the database user |
| DATABASE_PORT | Port number for database connection |
| DATABASE_SCHEMA | Schema name used in the database |
| DATABASE_URL | Connection URL for the database |
| DATABASE_USER | Username for database connection |
| DATABASE_USERNAME | Alias for database user |
| DATABRICKS_API_BASE | Base URL for Databricks API |
| DD_BASE_URL | Base URL for Datadog integration |
| DATADOG_BASE_URL | (Alternative to DD_BASE_URL) Base URL for Datadog integration |
| _DATADOG_BASE_URL | (Alternative to DD_BASE_URL) Base URL for Datadog integration |
| DD_API_KEY | API key for Datadog integration |
| DD_SITE | Site URL for Datadog (e.g., datadoghq.com) |
| DD_SOURCE | Source identifier for Datadog logs |
| DEBUG_OTEL | Enable debug mode for OpenTelemetry |
| DIRECT_URL | Direct URL for service endpoint |
| DISABLE_ADMIN_UI | Toggle to disable the admin UI |
| DISABLE_SCHEMA_UPDATE | Toggle to disable schema updates |
| DOCS_DESCRIPTION | Description text for documentation pages |
| DOCS_FILTERED | Flag indicating filtered documentation |
| DOCS_TITLE | Title of the documentation pages |
| EMAIL_SUPPORT_CONTACT | Support contact email address |
| GCS_BUCKET_NAME | Name of the Google Cloud Storage bucket |
| GCS_PATH_SERVICE_ACCOUNT | Path to the Google Cloud service account JSON file |
| GENERIC_AUTHORIZATION_ENDPOINT | Authorization endpoint for generic OAuth providers |
| GENERIC_CLIENT_ID | Client ID for generic OAuth providers |
| GENERIC_CLIENT_SECRET | Client secret for generic OAuth providers |
| GENERIC_CLIENT_STATE | State parameter for generic client authentication |
| GENERIC_INCLUDE_CLIENT_ID | Include client ID in requests for OAuth |
| GENERIC_SCOPE | Scope settings for generic OAuth providers |
| GENERIC_TOKEN_ENDPOINT | Token endpoint for generic OAuth providers |
| GENERIC_USER_DISPLAY_NAME_ATTRIBUTE | Attribute for user's display name in generic auth |
| GENERIC_USER_EMAIL_ATTRIBUTE | Attribute for user's email in generic auth |
| GENERIC_USER_FIRST_NAME_ATTRIBUTE | Attribute for user's first name in generic auth |
| GENERIC_USER_ID_ATTRIBUTE | Attribute for user ID in generic auth |
| GENERIC_USER_LAST_NAME_ATTRIBUTE | Attribute for user's last name in generic auth |
| GENERIC_USER_PROVIDER_ATTRIBUTE | Attribute specifying the user's provider |
| GENERIC_USER_ROLE_ATTRIBUTE | Attribute specifying the user's role |
| GENERIC_USERINFO_ENDPOINT | Endpoint to fetch user information in generic OAuth |
| GALILEO_BASE_URL | Base URL for Galileo platform |
| GALILEO_PASSWORD | Password for Galileo authentication |
| GALILEO_PROJECT_ID | Project ID for Galileo usage |
| GALILEO_USERNAME | Username for Galileo authentication |
| GREENSCALE_API_KEY | API key for Greenscale service |
| GREENSCALE_ENDPOINT | Endpoint URL for Greenscale service |
| GOOGLE_APPLICATION_CREDENTIALS | Path to Google Cloud credentials JSON file |
| GOOGLE_CLIENT_ID | Client ID for Google OAuth |
| GOOGLE_CLIENT_SECRET | Client secret for Google OAuth |
| GOOGLE_KMS_RESOURCE_NAME | Name of the resource in Google KMS |
| HF_API_BASE | Base URL for Hugging Face API |
| HELICONE_API_KEY | API key for Helicone service |
| HUGGINGFACE_API_BASE | Base URL for Hugging Face API |
| IAM_TOKEN_DB_AUTH | IAM token for database authentication |
| JSON_LOGS | Enable JSON formatted logging |
| JWT_AUDIENCE | Expected audience for JWT tokens |
| JWT_PUBLIC_KEY_URL | URL to fetch public key for JWT verification |
| LAGO_API_BASE | Base URL for Lago API |
| LAGO_API_CHARGE_BY | Parameter to determine charge basis in Lago |
| LAGO_API_EVENT_CODE | Event code for Lago API events |
| LAGO_API_KEY | API key for accessing Lago services |
| LANGFUSE_DEBUG | Toggle debug mode for Langfuse |
| LANGFUSE_FLUSH_INTERVAL | Interval for flushing Langfuse logs |
| LANGFUSE_HOST | Host URL for Langfuse service |
| LANGFUSE_PUBLIC_KEY | Public key for Langfuse authentication |
| LANGFUSE_RELEASE | Release version of Langfuse integration |
| LANGFUSE_SECRET_KEY | Secret key for Langfuse authentication |
| LANGSMITH_API_KEY | API key for Langsmith platform |
| LANGSMITH_BASE_URL | Base URL for Langsmith service |
| LANGSMITH_BATCH_SIZE | Batch size for operations in Langsmith |
| LANGSMITH_DEFAULT_RUN_NAME | Default name for Langsmith run |
| LANGSMITH_PROJECT | Project name for Langsmith integration |
| LANGSMITH_SAMPLING_RATE | Sampling rate for Langsmith logging |
| LANGTRACE_API_KEY | API key for Langtrace service |
| LITERAL_API_KEY | API key for Literal integration |
| LITERAL_API_URL | API URL for Literal service |
| LITERAL_BATCH_SIZE | Batch size for Literal operations |
| LITELLM_DONT_SHOW_FEEDBACK_BOX | Flag to hide feedback box in LiteLLM UI |
| LITELLM_DROP_PARAMS | Parameters to drop in LiteLLM requests |
| LITELLM_EMAIL | Email associated with LiteLLM account |
| LITELLM_GLOBAL_MAX_PARALLEL_REQUEST_RETRIES | Maximum retries for parallel requests in LiteLLM |
| LITELLM_GLOBAL_MAX_PARALLEL_REQUEST_RETRY_TIMEOUT | Timeout for retries of parallel requests in LiteLLM |
| LITELLM_HOSTED_UI | URL of the hosted UI for LiteLLM |
| LITELLM_LICENSE | License key for LiteLLM usage |
| LITELLM_LOCAL_MODEL_COST_MAP | Local configuration for model cost mapping in LiteLLM |
| LITELLM_LOG | Enable detailed logging for LiteLLM |
| LITELLM_MODE | Operating mode for LiteLLM (e.g., production, development) |
| LITELLM_SALT_KEY | Salt key for encryption in LiteLLM |
| LITELLM_SECRET_AWS_KMS_LITELLM_LICENSE | AWS KMS encrypted license for LiteLLM |
| LITELLM_TOKEN | Access token for LiteLLM integration |
| LOGFIRE_TOKEN | Token for Logfire logging service |
| MICROSOFT_CLIENT_ID | Client ID for Microsoft services |
| MICROSOFT_CLIENT_SECRET | Client secret for Microsoft services |
| MICROSOFT_TENANT | Tenant ID for Microsoft Azure |
| NO_DOCS | Flag to disable documentation generation |
| NO_PROXY | List of addresses to bypass proxy |
| OAUTH_TOKEN_INFO_ENDPOINT | Endpoint for OAuth token info retrieval |
| OPENAI_API_BASE | Base URL for OpenAI API |
| OPENAI_API_KEY | API key for OpenAI services |
| OPENAI_ORGANIZATION | Organization identifier for OpenAI |
| OPENID_BASE_URL | Base URL for OpenID Connect services |
| OPENID_CLIENT_ID | Client ID for OpenID Connect authentication |
| OPENID_CLIENT_SECRET | Client secret for OpenID Connect authentication |
| OPENMETER_API_ENDPOINT | API endpoint for OpenMeter integration |
| OPENMETER_API_KEY | API key for OpenMeter services |
| OPENMETER_EVENT_TYPE | Type of events sent to OpenMeter |
| OTEL_ENDPOINT | OpenTelemetry endpoint for traces |
| OTEL_ENVIRONMENT_NAME | Environment name for OpenTelemetry |
| OTEL_EXPORTER | Exporter type for OpenTelemetry |
| OTEL_HEADERS | Headers for OpenTelemetry requests |
| OTEL_SERVICE_NAME | Service name identifier for OpenTelemetry |
| OTEL_TRACER_NAME | Tracer name for OpenTelemetry tracing |
| PREDIBASE_API_BASE | Base URL for Predibase API |
| PRESIDIO_ANALYZER_API_BASE | Base URL for Presidio Analyzer service |
| PRESIDIO_ANONYMIZER_API_BASE | Base URL for Presidio Anonymizer service |
| PROMETHEUS_URL | URL for Prometheus service |
| PROMPTLAYER_API_KEY | API key for PromptLayer integration |
| PROXY_ADMIN_ID | Admin identifier for proxy server |
| PROXY_BASE_URL | Base URL for proxy service |
| PROXY_LOGOUT_URL | URL for logging out of the proxy service |
| PROXY_MASTER_KEY | Master key for proxy authentication |
| QDRANT_API_BASE | Base URL for Qdrant API |
| QDRANT_API_KEY | API key for Qdrant service |
| QDRANT_URL | Connection URL for Qdrant database |
| REDIS_HOST | Hostname for Redis server |
| REDIS_PASSWORD | Password for Redis service |
| REDIS_PORT | Port number for Redis server |
| SERVER_ROOT_PATH | Root path for the server application |
| SET_VERBOSE | Flag to enable verbose logging |
| SLACK_DAILY_REPORT_FREQUENCY | Frequency of daily Slack reports (e.g., daily, weekly) |
| SLACK_WEBHOOK_URL | Webhook URL for Slack integration |
| SMTP_HOST | Hostname for the SMTP server |
| SMTP_PASSWORD | Password for SMTP authentication |
| SMTP_PORT | Port number for SMTP server |
| SMTP_SENDER_EMAIL | Email address used as the sender in SMTP transactions |
| SMTP_SENDER_LOGO | Logo used in emails sent via SMTP |
| SMTP_TLS | Flag to enable or disable TLS for SMTP connections |
| SMTP_USERNAME | Username for SMTP authentication |
| SPEND_LOGS_URL | URL for retrieving spend logs |
| SSL_CERTIFICATE | Path to the SSL certificate file |
| SSL_VERIFY | Flag to enable or disable SSL certificate verification |
| SUPABASE_KEY | API key for Supabase service |
| SUPABASE_URL | Base URL for Supabase instance |
| TEST_EMAIL_ADDRESS | Email address used for testing purposes |
| UI_LOGO_PATH | Path to the logo image used in the UI |
| UI_PASSWORD | Password for accessing the UI |
| UI_USERNAME | Username for accessing the UI |
| UPSTREAM_LANGFUSE_DEBUG | Flag to enable debugging for upstream Langfuse |
| UPSTREAM_LANGFUSE_HOST | Host URL for upstream Langfuse service |
| UPSTREAM_LANGFUSE_PUBLIC_KEY | Public key for upstream Langfuse authentication |
| UPSTREAM_LANGFUSE_RELEASE | Release version identifier for upstream Langfuse |
| UPSTREAM_LANGFUSE_SECRET_KEY | Secret key for upstream Langfuse authentication |
| USE_AWS_KMS | Flag to enable AWS Key Management Service for encryption |
| WEBHOOK_URL | URL for receiving webhooks from external services |

Extras

Disable Swagger UI

To disable the Swagger docs from the base url, set

NO_DOCS="True"

in your environment, and restart the proxy.

Use CONFIG_FILE_PATH for proxy (Easier Azure container deployment)

  1. Setup config.yaml

model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

  2. Store filepath as env var

CONFIG_FILE_PATH="/path/to/config.yaml"

  3. Start Proxy

$ litellm

# RUNNING on http://0.0.0.0:4000
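
If you deploy with Docker, the same pattern works by mounting the file into the container and pointing CONFIG_FILE_PATH at it - a sketch, where the mount path and image tag are illustrative:

docker run --name litellm-proxy \
  -v $(pwd)/config.yaml:/app/config.yaml \
  -e CONFIG_FILE_PATH="/app/config.yaml" \
  -e OPENAI_API_KEY=<your-openai-key> \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-latest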

Providing LiteLLM config.yaml file as an s3 or GCS Bucket Object/URL

Use this if you cannot mount a config file on your deployment service (example - AWS Fargate, Railway etc)

LiteLLM Proxy will read your config.yaml from an s3 Bucket or GCS Bucket

Set the following .env vars

LITELLM_CONFIG_BUCKET_TYPE = "gcs"                              # set this to "gcs"         
LITELLM_CONFIG_BUCKET_NAME = "litellm-proxy" # your bucket name on GCS
LITELLM_CONFIG_BUCKET_OBJECT_KEY = "proxy_config.yaml" # object key on GCS

Start litellm proxy with these env vars - litellm will read your config from GCS

docker run --name litellm-proxy \
  -e DATABASE_URL=<database_url> \
  -e LITELLM_CONFIG_BUCKET_NAME=<bucket_name> \
  -e LITELLM_CONFIG_BUCKET_OBJECT_KEY="<object_key>" \
  -e LITELLM_CONFIG_BUCKET_TYPE="gcs" \
  -p 4000:4000 \
  ghcr.io/berriai/litellm-database:main-latest --detailed_debug