
NRP-Managed LLMs

The NRP provides several hosted open-weights LLMs for either API access or use with our hosted chat interfaces.

Chat Interfaces

If you are looking to chat with an LLM through an interface similar to ChatGPT, we host LibreChat, a simple chat interface to all of the NRP-hosted models. You can use it to chat with the models or to try them out.

Visit the LibreChat interface

API Access to LLMs

API access to the LLMs is provided through a LiteLLM proxy. To access our LLMs, you need to:

  1. Login to NRP’s LiteLLM instance.

  2. Create an API key. During key creation, you will select the models that the key is allowed to access (or all models).

  3. With the API key, you can access the API through the endpoint https://llm.nrp-nautilus.io/. Example Python code is shown below.

Visit the NRP LiteLLM interface
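
The proxy speaks the OpenAI wire protocol, so any OpenAI-compatible client can talk to it. As a quick connectivity check, the request can also be sketched as a raw curl call. This assumes the standard /v1/chat/completions route that LiteLLM proxies expose, and that the OPENAI_API_KEY environment variable holds the key you created in the LiteLLM UI:

```shell
# Hedged sketch: a raw chat-completions request to the LiteLLM endpoint.
# OPENAI_API_KEY is assumed to hold your NRP LiteLLM key.
if [ -n "${OPENAI_API_KEY:-}" ]; then
  curl -s https://llm.nrp-nautilus.io/v1/chat/completions \
    -H "Authorization: Bearer ${OPENAI_API_KEY}" \
    -H "Content-Type: application/json" \
    -d '{"model": "gemma3", "messages": [{"role": "user", "content": "Say hello."}]}'
else
  echo "Set OPENAI_API_KEY to run this request."
fi
```

If the key is valid, the response is a JSON chat-completion object; a 401 response usually means the key is missing or not authorized for the requested model.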

Example Python Code

To access the NRP LLMs, you can use the OpenAI Python client. Below is an example of how to use the OpenAI Python client to access the NRP LLMs.

nrp-llm.py
import os

from openai import OpenAI

client = OpenAI(
    # Reads your LiteLLM API key from the environment.
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url="https://llm.nrp-nautilus.io/",
)

completion = client.chat.completions.create(
    model="gemma3",
    messages=[
        # The "system" message sets the assistant's behavior for the conversation.
        {"role": "system", "content": "Talk like a pirate."},
        {
            "role": "user",
            "content": "How do I check if a Python object is an instance of a class?",
        },
    ],
)

print(completion.choices[0].message.content)
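
Responses can also be streamed token by token, which is useful for interactive interfaces. Below is a minimal streaming sketch against the same endpoint; it is guarded so it only issues a request when OPENAI_API_KEY is set, and join_deltas is a small helper written for this example, not part of any NRP or OpenAI API:

```python
import os


def join_deltas(parts):
    """Assemble streamed content deltas (some may be None) into one string."""
    return "".join(p for p in parts if p)


# Guarded so the sketch is a no-op unless a key is configured.
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["OPENAI_API_KEY"],
        base_url="https://llm.nrp-nautilus.io/",
    )
    stream = client.chat.completions.create(
        model="gemma3",
        messages=[{"role": "user", "content": "Name three sea creatures."}],
        stream=True,
    )
    # Each chunk carries an incremental piece of the reply in delta.content.
    chunks = [chunk.choices[0].delta.content for chunk in stream]
    print(join_deltas(chunks))
```

In a real UI you would print each delta as it arrives rather than collecting them first; the list here just keeps the sketch short.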

Available Models

main - The model is generally supported. You can report issues with the service.

dep - The LLM is deprecated and is likely to be removed soon.

eval - The LLM was added for testing and we're evaluating its capabilities. It may occasionally be unavailable, and its configuration may change without notice.

You can follow all updates in our Matrix Machine Learning channel.

LiteLLM name    | Status | Model                                      | Features
deepseek-r1     | main   | RedHatAI/DeepSeek-R1-0528-quantized.w4a16  | 685B parameters, INT4 quantization, 163,840 tokens, tool calling, Claude and o3 performance
gemma3          | main   | google/gemma-3-27b-it                      | Agentic AI workflows, 131,072 tokens, speaks 140+ languages
llama3          | main   | meta-llama/Llama-3.2-90B-Vision-Instruct   | multimodal (vision), 131,072 tokens
llama3-sdsc     | dep    | meta-llama/Llama-3.3-70B-Instruct          | 8 languages, 131,072 tokens, tool use
embed-mistral   | main   | intfloat/e5-mistral-7b-instruct            | embeddings
qwen3           | eval   | Qwen/Qwen3-235B-A22B-Thinking-2507-FP8     | 235B parameters, FP8 quantization, 262,144 tokens, tool calling, Claude and o3 performance
gorilla         | eval   | gorilla-llm/gorilla-openfunctions-v2       | function calling
llava-onevision | eval   | llava-hf/llava-onevision-qwen2-7b-ov-hf    | vision
olmo            | eval   | allenai/OLMo-2-0325-32B-Instruct           | open source
phi3            | eval   | microsoft/Phi-3.5-vision-instruct          | vision
watt            | eval   | watt-ai/watt-tool-8B                       | function calling
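
The embed-mistral model serves embeddings rather than chat completions. Below is a hedged sketch of requesting embeddings through the same OpenAI-compatible endpoint and comparing two texts; it is guarded to only call the service when OPENAI_API_KEY is set, and cosine_similarity is a helper written for this example, not part of any NRP API:

```python
import math
import os


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Guarded so the sketch is a no-op unless a key is configured.
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["OPENAI_API_KEY"],
        base_url="https://llm.nrp-nautilus.io/",
    )
    resp = client.embeddings.create(
        model="embed-mistral",
        input=["What GPUs does Nautilus offer?", "Nautilus GPU availability"],
    )
    vec_a, vec_b = (item.embedding for item in resp.data)
    print(f"similarity: {cosine_similarity(vec_a, vec_b):.3f}")
```

Embeddings like these are typically used for semantic search or retrieval-augmented generation, where you index document vectors and rank them by similarity to a query vector.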
This work was supported in part by National Science Foundation (NSF) awards CNS-1730158, ACI-1540112, ACI-1541349, OAC-1826967, OAC-2112167, CNS-2100237, CNS-2120019.