NRP-Managed LLMs
The NRP hosts several open-weight LLMs, available both through an API and through our hosted chat interfaces.
Chat Interfaces
If you are looking for a ChatGPT-style chat experience, we host an instance of LibreChat, a simple chat interface to all of the NRP-hosted models. You can use it to chat with the models or to try them out.
Visit the LibreChat interface
API Access to LLMs
API access to the LLMs is provided through a LiteLLM proxy. To access them, you need to:
Login to NRP’s LiteLLM instance.
Create an API key. During key creation, you will select the models that the key is allowed to access (or all models).
With the API key, you can access the API at the endpoint https://llm.nrp-nautilus.io/. An example of calling the API from Python is shown below.
Visit the NRP LiteLLM interface
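Because the proxy exposes an OpenAI-compatible REST API, you can also query it without any client library. Below is a minimal sketch using only the Python standard library; it assumes the proxy serves the standard `/v1/models` listing endpoint, and reads your key from an `LLM_API_KEY` environment variable (the variable name is our choice, not required by the service):

```python
import json
import os
import urllib.request

BASE_URL = "https://llm.nrp-nautilus.io"


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Build an authenticated GET request against the LiteLLM proxy."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        headers={"Authorization": f"Bearer {api_key}"},
    )


# Only hit the network when a key is actually configured.
api_key = os.environ.get("LLM_API_KEY")
if api_key:
    with urllib.request.urlopen(build_request("/v1/models", api_key)) as resp:
        for model in json.load(resp)["data"]:
            print(model["id"])
```

This lists the model names (e.g. `gemma3`, `llama3`) that your key is allowed to use, which are the same names you pass as `model` in the chat API.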
Example Python Code
To access the NRP LLMs, you can use the OpenAI Python client, as shown in the example below.
```python
import os

from openai import OpenAI

client = OpenAI(
    # This is the default and can be omitted
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url="https://llm.nrp-nautilus.io/",
)

completion = client.chat.completions.create(
    model="gemma3",
    messages=[
        {"role": "developer", "content": "Talk like a pirate."},
        {
            "role": "user",
            "content": "How do I check if a Python object is an instance of a class?",
        },
    ],
)

print(completion.choices[0].message.content)
```
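The chat endpoint also supports streaming responses; with the OpenAI client you would pass `stream=True` to the same call. If you prefer to avoid the client library, the proxy streams OpenAI-style server-sent events over plain HTTP. The sketch below assumes that wire format; the `parse_sse_line` helper is our own illustration, not part of any API, and the network call only fires when an `LLM_API_KEY` environment variable is set:

```python
import json
import os
import urllib.request


def parse_sse_line(line: bytes):
    """Extract the JSON payload from one server-sent-event line.

    Returns None for blank lines and for the final "[DONE]" sentinel.
    """
    text = line.decode("utf-8").strip()
    if not text.startswith("data: "):
        return None
    payload = text[len("data: "):]
    if payload == "[DONE]":
        return None
    return json.loads(payload)


api_key = os.environ.get("LLM_API_KEY")
if api_key:  # only call the API when a key is configured
    request = urllib.request.Request(
        "https://llm.nrp-nautilus.io/v1/chat/completions",
        data=json.dumps({
            "model": "gemma3",
            "messages": [{"role": "user", "content": "Say hello."}],
            "stream": True,
        }).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as resp:
        for line in resp:
            event = parse_sse_line(line)
            if event:
                # Each streamed chunk carries an incremental content delta.
                print(event["choices"][0]["delta"].get("content", ""), end="")
```

Streaming is useful for interactive applications, since tokens are printed as the model generates them instead of after the full completion finishes.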
Available Models
Each model is tagged with a status:
main - The model is generally supported. You can report issues with the service.
dep - The LLM is deprecated and is likely to be removed soon.
eval - The LLM has been added for testing and we're evaluating its capabilities. It may occasionally be unavailable, and its configuration may change without notice.
You can follow all updates in our Matrix Machine Learning channel.
LiteLLM name | Model | Features |
---|---|---|
deepseek-r1 main | RedHatAI/DeepSeek-R1-0528-quantized.w4a16 | 685B parameters, INT4 quantization, 163,840 tokens, tool calling, Claude and o3 performance |
gemma3 main | google/gemma-3-27b-it | Agentic AI workflows, 131,072 tokens, speaks 140+ languages |
llama3 main | meta-llama/Llama-3.2-90B-Vision-Instruct | multimodal (vision), 131,072 tokens |
llama3-sdsc dep | meta-llama/Llama-3.3-70B-Instruct | 8 languages, 131,072 tokens, tool use |
embed-mistral main | intfloat/e5-mistral-7b-instruct | embeddings |
qwen3 eval | Qwen/Qwen3-235B-A22B-Thinking-2507-FP8 | 235B parameters, FP8 quantization, 262,144 tokens, tool calling, Claude and o3 performance |
gorilla eval | gorilla-llm/gorilla-openfunctions-v2 | function calling |
llava-onevision eval | llava-hf/llava-onevision-qwen2-7b-ov-hf | vision |
olmo eval | allenai/OLMo-2-0325-32B-Instruct | open source |
phi3 eval | microsoft/Phi-3.5-vision-instruct | vision |
watt eval | watt-ai/watt-tool-8B | function calling |
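The `embed-mistral` model serves embeddings rather than chat completions, via the OpenAI-compatible `/v1/embeddings` endpoint. Below is a hedged stdlib sketch: the `cosine_similarity` helper is our own illustration (a common way to compare embedding vectors), the input sentences are arbitrary, and the request only fires when an `LLM_API_KEY` environment variable is set:

```python
import json
import math
import os
import urllib.request


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


api_key = os.environ.get("LLM_API_KEY")
if api_key:  # only call the API when a key is configured
    request = urllib.request.Request(
        "https://llm.nrp-nautilus.io/v1/embeddings",
        data=json.dumps({
            "model": "embed-mistral",
            "input": ["What is Nautilus?", "Tell me about the NRP cluster."],
        }).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as resp:
        vectors = [item["embedding"] for item in json.load(resp)["data"]]
    # Similar sentences should score close to 1.0, unrelated ones near 0.
    print(cosine_similarity(vectors[0], vectors[1]))
```

The same call works through the OpenAI Python client via `client.embeddings.create(model="embed-mistral", input=[...])`.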
