The Full Toolkit

You’re comfortable with AI. Here’s everything VT gives you to work with.

If you landed here without going through the earlier pages, that’s fine — the previous tiers cover the everyday tools (Gemini, ARC web chat, Copilot) and the next-step tools (Copilot in Office, NotebookLM, document upload, the AI Pro upgrade). This page focuses on the things you only need if you want to run AI locally on your own machine, build your own tools against an API, or wire AI into your development environment.

Most people in AAD never need anything on this page. If you’re curious anyway, read on — nothing here is mandatory and nothing here will break your computer.

Already Covered

Quick reference for tools from the previous tiers — follow the links for full details.

Google Gemini: chat, summarization, and Gemini Live voice mode (Flash model on standard VT accounts; higher-end models via the AI Pro add-on). See Getting Started.
VT ARC LLM (llm.arc.vt.edu): browser-based chat plus image generation and analysis; VT-hosted and ITSO-approved. See Getting Started.
Microsoft Copilot (free): standalone web chat, no Office integration. See Getting Started.
Microsoft Copilot (paid): Copilot in Word, Excel, PowerPoint, Outlook, and Teams, plus Copilot Studio. $216/year. See Doing More.
ARC Document Upload: upload a PDF or doc to ARC and ask questions across it. Private, on VT servers. See Doing More.
Google NotebookLM: multiple sources, AI summaries, audio overviews. See Doing More.
Groq Cloud: fast open-source model testing. Third-party service, so no sensitive data. See Doing More.
HuggingFace Chat: try dozens of open-source models. Third-party service, so no sensitive data. See Doing More.
Gemini AI Pro for Education: 1M context, 20 Deep Research reports/day, Docs/Drive integration, full Gemini Live on higher-end models. ~$15–$24/user/month. See Doing More.

Ollama — Run Models Locally

Download from ollama.com, install it, pull a model, and run it. That’s it.

ollama pull llama3.2
ollama run llama3.2

After the initial download, everything runs 100% offline and nothing leaves your machine. 16 GB of RAM is the practical baseline, and most AAD-managed machines qualify.

ARC already gives you open source models in a browser. Ollama is for when you want offline access, a wider model selection, or full control over which AI you use and how it runs.

Your first Ollama session, step by step:

Download from ollama.com and run the installer. On Mac, the installer adds an icon to your menu bar; when you see it, Ollama is running in the background.
Open Terminal (Mac) or PowerShell (Windows).
Type ollama pull llama3.2 and hit enter. This downloads a model — around 2 GB. Grab a coffee.
Type ollama run llama3.2 and hit enter. You’re now chatting with a model running entirely on your machine — no internet required after this point.
Type /bye when you’re done.

If 2 GB feels like a lot, try llama3.2:1b instead — it’s smaller and faster, and still good enough for most chat tasks. If you have a powerful machine and want something bigger, browse the model library on ollama.com.
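Once the model is pulled, you can also talk to it from a script instead of the terminal chat. The sketch below assumes Ollama is running on its default local address (http://localhost:11434) and uses its /api/chat endpoint; the function names build_chat_request and ask_ollama are made up for this example.

```python
import json
import urllib.request

# Ollama's default local address; change it if you moved the port.
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_chat_request(prompt: str, model: str = "llama3.2") -> dict:
    """Build the JSON body for a single, non-streaming chat turn."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one complete JSON reply instead of chunks
    }


def ask_ollama(prompt: str, model: str = "llama3.2") -> str:
    """Send the prompt to the local Ollama server and return the reply text."""
    body = json.dumps(build_chat_request(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        reply = json.load(response)
    return reply["message"]["content"]
```

With Ollama running, print(ask_ollama("Give me three tips for writing a clear email.")) should print the model's answer. The request never leaves your machine.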

ARC API Access

ARC runs an OpenAI-compatible endpoint. It’s free to use with your VT credentials — no separate ARC account required for API access.

Get your API key. Sign in at llm.arc.vt.edu, then go to your profile → Settings → Account → API Keys and generate one. It will start with sk-. Treat it like a password — sharing it is a policy violation and gets reported to ITSO. For any headless or server-side use, store it in an environment variable rather than hardcoding it in a script:

export VT_ARC_API_KEY="sk-your-key-here"
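On the Python side, a script can read that variable and stop early with a readable message if it is missing. This is a minimal sketch; load_arc_key is a hypothetical helper name, and the sk- check simply mirrors the key prefix described above.

```python
import os
from typing import Mapping


def load_arc_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the ARC API key from the environment, or fail with a clear message."""
    key = env.get("VT_ARC_API_KEY", "").strip()
    if not key:
        raise RuntimeError("VT_ARC_API_KEY is not set; export it before running this script.")
    if not key.startswith("sk-"):
        raise RuntimeError("VT_ARC_API_KEY does not look like an ARC key (expected an sk- prefix).")
    return key
```

Failing at startup is friendlier than a confusing authentication error halfway through a batch job.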

Base URL: https://llm-api.arc.vt.edu/api/v1/

Rate limits (shared between web UI and API): 60 requests/minute · 1,000/hour · 3,000 per 3-hour sliding window. If that’s not enough, see Open OnDemand below.
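If you are scripting against the API, it helps to pace your own requests rather than wait for the server to reject them. The following is a rough client-side sketch of a sliding-window limiter using the three limits above. The class name SlidingWindowLimiter is hypothetical, and because the limits are shared with the web UI, the script only sees its own traffic and can still be throttled.

```python
import time
from collections import deque

# (max requests, window in seconds), taken from the limits listed above.
ARC_LIMITS = [(60, 60), (1000, 3600), (3000, 3 * 3600)]


class SlidingWindowLimiter:
    """Track request times and report how long to wait before the next one."""

    def __init__(self, limits=ARC_LIMITS, clock=time.monotonic):
        self.limits = limits
        self.clock = clock
        self.sent = deque()  # timestamps of past requests, oldest first

    def wait_time(self) -> float:
        """Seconds to wait so the next request stays inside every window."""
        now = self.clock()
        longest = max(window for _, window in self.limits)
        # Forget requests that are older than the longest window.
        while self.sent and now - self.sent[0] >= longest:
            self.sent.popleft()
        wait = 0.0
        for max_requests, window in self.limits:
            recent = [t for t in self.sent if now - t < window]
            if len(recent) >= max_requests:
                # A slot opens when the oldest request that still counts
                # against this limit slides out of the window.
                oldest_blocking = recent[len(recent) - max_requests]
                wait = max(wait, oldest_blocking + window - now)
        return wait

    def record(self) -> None:
        """Note that a request is being sent now."""
        self.sent.append(self.clock())

    def acquire(self) -> None:
        """Sleep if needed, then record the request."""
        delay = self.wait_time()
        if delay > 0:
            time.sleep(delay)
        self.record()
```

Call limiter.acquire() right before each API request. For anything that regularly hits the 3-hour ceiling, pacing will not help much and a dedicated session is the better fit.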

Current models. ARC rotates its model catalog, but as of April 2026 the following three are available via the API:

Model ID	Notes
`gpt-oss-120b`	OpenAI’s open-weight reasoning model. Fast, good for general tasks, supports the `reasoning_effort` control described below.
`Kimi-K2.5`	Moonshot AI’s multimodal model. Handles text + images. Good for complex or vision-heavy tasks.
`MiniMax-M2.5`	General-purpose text model that leans toward coding, reasoning, search-heavy workflows, and office-task automation.

Always treat the portal at llm.arc.vt.edu as authoritative — models rotate in and out. Use the exact model ID in the model field of your API request. To list models programmatically:

curl https://llm-api.arc.vt.edu/api/v1/models \
  -H "Authorization: Bearer $VT_ARC_API_KEY"
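The same lookup from Python, as a sketch using only the standard library. It assumes the endpoint returns the usual OpenAI-style listing, a JSON object with a "data" array of entries that each carry an "id"; check the actual response if that assumption does not hold. The helper names are hypothetical.

```python
import json
import os
import urllib.request

ARC_BASE_URL = "https://llm-api.arc.vt.edu/api/v1"


def extract_model_ids(payload: dict) -> list[str]:
    """Pull model IDs out of an OpenAI-style {"data": [{"id": ...}]} listing."""
    return sorted(item["id"] for item in payload.get("data", []) if "id" in item)


def list_arc_models() -> list[str]:
    """Fetch the current model catalog from ARC and return the model IDs."""
    request = urllib.request.Request(
        f"{ARC_BASE_URL}/models",
        headers={"Authorization": f"Bearer {os.environ['VT_ARC_API_KEY']}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_model_ids(json.load(response))
```

Running list_arc_models() at the top of a long-lived script is a cheap way to notice that a model you depend on has rotated out.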

Python — the standard starting point

Any library that speaks OpenAI’s /v1/ protocol will work — point it at ARC’s base URL and pass your key.

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ["VT_ARC_API_KEY"],
    base_url="https://llm-api.arc.vt.edu/api/v1"
)

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[{"role": "user", "content": "Summarize this for me."}]
)
print(response.choices[0].message.content)

Everything more interesting — reasoning effort control, web search, document upload, vision — is just additional fields on the same request. ARC’s docs publish the current field names and example payloads; model support can change, so treat the ARC docs and the ARC portal as authoritative.

What the API lets you do beyond basic chat

Document Q&A (RAG). Upload a PDF, policy doc, or contract, then pass the returned file ID into a normal chat completion:

file_id=$(curl -s -X POST \
  -H "Authorization: Bearer $VT_ARC_API_KEY" \
  -H "Accept: application/json" \
  -F "file=@document.pdf" \
  https://llm-api.arc.vt.edu/api/v1/files/ | jq -r '.id')

curl -X POST https://llm-api.arc.vt.edu/api/v1/chat/completions \
  -H "Authorization: Bearer $VT_ARC_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"gpt-oss-120b\",
    \"messages\": [{\"role\": \"user\", \"content\": \"Create a summary of the document.\"}],
    \"files\": [{\"type\": \"file\", \"id\": \"${file_id}\"}]
  }"
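If you would rather stay in Python for the second step, the chat request with an attached file can be sketched as follows. It reuses the files field from the curl example, assumes you already have a file ID from the upload call, and assumes the reply follows the OpenAI-style choices layout. The function names are hypothetical.

```python
import json
import os
import urllib.request

ARC_CHAT_URL = "https://llm-api.arc.vt.edu/api/v1/chat/completions"


def build_file_question(file_id: str, question: str, model: str = "gpt-oss-120b") -> dict:
    """Chat payload that attaches an already-uploaded file by its ID."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "files": [{"type": "file", "id": file_id}],
    }


def ask_about_file(file_id: str, question: str) -> str:
    """POST the payload to ARC and return the model's answer text."""
    body = json.dumps(build_file_question(file_id, question)).encode("utf-8")
    request = urllib.request.Request(
        ARC_CHAT_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ['VT_ARC_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```

Building the payload with json.dumps also sidesteps the quote-escaping that the shell version needs.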

Web search. ARC exposes web search as a tool. Add tool_ids when the question needs current information:

curl -X POST https://llm-api.arc.vt.edu/api/v1/chat/completions \
  -H "Authorization: Bearer $VT_ARC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-120b",
    "messages": [{"role": "user", "content": "Who is the US president right now?"}],
    "tool_ids": ["server:websearch"]
  }'

Reasoning effort. gpt-oss-120b currently supports low, medium, and high:

curl -X POST https://llm-api.arc.vt.edu/api/v1/chat/completions \
  -H "Authorization: Bearer $VT_ARC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-120b",
    "messages": [{"role": "user", "content": "Compare three ways to organize this project plan."}],
    "reasoning_effort": "high"
  }'
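With the openai Python package, fields such as tool_ids are not named arguments of the client, but the package's extra_body argument merges additional keys into the request JSON. The sketch below builds the keyword arguments for the web search and reasoning effort examples above. arc_chat_kwargs is a hypothetical helper, and sending reasoning_effort through extra_body is a choice made here so that the sketch does not depend on a particular SDK version.

```python
def arc_chat_kwargs(prompt, model="gpt-oss-120b", reasoning_effort=None, web_search=False):
    """Keyword arguments for client.chat.completions.create(**kwargs)."""
    if reasoning_effort not in (None, "low", "medium", "high"):
        raise ValueError("reasoning_effort must be low, medium, or high")
    extra = {}
    if reasoning_effort:
        extra["reasoning_effort"] = reasoning_effort
    if web_search:
        # Same tool ID as in the curl example above.
        extra["tool_ids"] = ["server:websearch"]
    kwargs = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    if extra:
        # extra_body keys are merged into the JSON body of the request.
        kwargs["extra_body"] = extra
    return kwargs
```

Using the client from the starter script, a call would look like client.chat.completions.create(**arc_chat_kwargs("Who is the US president right now?", web_search=True)).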

Vision / image understanding. Kimi-K2.5 accepts text plus an inline base64 image in the same message:

import base64
import os
from pathlib import Path
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["VT_ARC_API_KEY"],
    base_url="https://llm-api.arc.vt.edu/api/v1",
)

image_b64 = base64.b64encode(Path("screenshot.jpg").read_bytes()).decode("utf-8")

response = client.chat.completions.create(
    model="Kimi-K2.5",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe the chart in this image."},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)

print(response.choices[0].message.content)

Those are the concrete mechanics. The higher-level use cases they support fall into four groups: batch summarization, department-specific chatbots, classification and tagging workflows, and structured extraction from messy source material.

What does this actually let you build? A few practical examples:

A script that takes a folder of meeting notes and produces a one-page summary of each
A custom chatbot for your department’s policies, hosted internally
A nightly process that reads through new tickets and tags them by topic
A research tool that pulls structured data out of free-form survey responses
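To make the first example concrete, here is a rough sketch of the meeting-notes script. It is deliberately model-agnostic: ask_model stands for any function that takes a prompt string and returns text, such as a thin wrapper around the ARC chat call from the starter script. The function name, the prompt wording, and the .summary.txt naming are all assumptions for illustration.

```python
from pathlib import Path
from typing import Callable

PROMPT = "Summarize these meeting notes in one page, listing decisions and action items:\n\n"


def summarize_folder(notes_dir: Path, out_dir: Path, ask_model: Callable[[str], str]) -> list[Path]:
    """Write <name>.summary.txt for every .txt or .md file in notes_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for note in sorted(notes_dir.iterdir()):
        # Skip folders and anything that is not a plain-text note.
        if not note.is_file() or note.suffix.lower() not in {".txt", ".md"}:
            continue
        summary = ask_model(PROMPT + note.read_text(encoding="utf-8"))
        target = out_dir / f"{note.stem}.summary.txt"
        target.write_text(summary, encoding="utf-8")
        written.append(target)
    return written
```

Each note costs one API request, so a folder of a few dozen files fits comfortably inside the shared rate limits.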

If you’ve never written code against an API before, the ARC docs have copy-paste starters in Python and JavaScript. You don’t need to be a developer — if you can edit a script and run it from a terminal, you can use this. If you’d rather have someone help you scope a project, contact AAD IT.

IDE Integration

ARC’s API plugs into your IDE as an OpenAI-compatible backend.

VS Code: use the GitHub Copilot Chat extension (Insiders v1.104+) and point it at the ARC API endpoint instead of OpenAI’s. Full setup is at docs.arc.vt.edu/ai/040_ides.html.

IntelliJ IDEA: use the native AI plugin (v2025.3.2+). Same deal: configure the endpoint URL and your API key.

Config notes:

Endpoint: https://llm-api.arc.vt.edu/api/v1/
Model names rotate — check the ARC portal for current available models
Token limits vary by model (example: 65,536 input / 32,768 output for some current models)

The ARC docs have current model names and full config walkthroughs.

OpenClaw — point a local agent at ARC

OpenClaw is an open-source autonomous agent framework that runs headlessly and gives agents actual “hands”: shell execution, file I/O, and web browsing. Because ARC exposes an OpenAI-compatible endpoint, you can add it to OpenClaw as a custom provider and run agents on your own machine that are backed by VT-hosted models.

Put your ARC key in OpenClaw’s environment as VT_ARC_API_KEY, then add a provider entry in your OpenClaw config. The current custom-provider shape looks like this:

{
  "models": {
    "providers": {
      "arc": {
        "baseUrl": "https://llm-api.arc.vt.edu/api/v1",
        "apiKey": "${VT_ARC_API_KEY}",
        "api": "openai-completions",
        "models": [
          {
            "id": "gpt-oss-120b",
            "name": "ARC GPT-OSS 120B",
            "reasoning": true,
            "input": ["text"],
            "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
            "contextWindow": 65536,
            "maxTokens": 32768
          },
          {
            "id": "Kimi-K2.5",
            "name": "ARC Kimi K2.5",
            "reasoning": false,
            "input": ["text", "image"],
            "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
            "contextWindow": 65536,
            "maxTokens": 32768
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": { "primary": "arc/gpt-oss-120b" }
    }
  }
}

Then run the onboarding flow with openclaw onboard --install-daemon, check the gateway with openclaw gateway status, and open the UI with openclaw dashboard. The VT-specific pieces are the base URL, your VT_ARC_API_KEY, and the current ARC model IDs. If ARC rotates models or token limits, update the models list to match what the portal shows.
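When you edit the models list by hand, it is easy to leave the default agent pointing at a model ID that is no longer there. The check below is a small sketch that works on the config once it is loaded as a Python dict. It assumes, based on the example above, that the primary model is written as provider/model-id; check_primary_model is a hypothetical helper, not part of OpenClaw.

```python
def check_primary_model(config: dict) -> str:
    """Return the primary model ref if it names a model listed under its provider."""
    primary = config["agents"]["defaults"]["model"]["primary"]
    # Assumed format: "<provider name>/<model id>", e.g. "arc/gpt-oss-120b".
    provider_name, _, model_id = primary.partition("/")
    provider = config["models"]["providers"].get(provider_name)
    if provider is None:
        raise ValueError(f"primary model uses unknown provider {provider_name!r}")
    known_ids = {model["id"] for model in provider["models"]}
    if model_id not in known_ids:
        raise ValueError(f"{model_id!r} is not in the {provider_name!r} models list")
    return primary
```

If your config file is plain JSON, json.load gives you the dict to pass in. Run the check after each update to the models list.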

Open OnDemand — when rate limits aren’t enough

The shared API is rate-limited. For batch jobs, large-scale automation, or anything that would burn through 3,000 requests in 3 hours, use Open OnDemand instead — dedicated sessions with no rate limits, running directly on ARC hardware.

Requires an ARC HPC account: arc.vt.edu
Access at ood.arc.vt.edu; connect via VPN if you’re off-campus
Each launched session generates its own scoped API key
Sessions are exclusive, time-limited, and auto-expire after a period of inactivity
Model selection is broader than the shared endpoint

Researcher track: deploy your own vLLM on Falcon

For research workloads that need a specific model, custom parameters, or full control over the serving stack, ARC lets you deploy your own vLLM instance as a Slurm job on the Falcon cluster, then SSH-tunnel to it locally. The result is an OpenAI-compatible endpoint you own for the duration of your job.

This is a power-user path with real prerequisites: an HPC account, Slurm basics, and familiarity with the model weights under /common/data/models/. See the ARC docs for current job templates.

Local, cloud, or API — which should I use?

Three ways to interact with these models, each with tradeoffs:

ARC web interface (llm.arc.vt.edu). The easiest by far. Browser, VT login, done. Use this for most day-to-day work.
Ollama (local). Runs on your machine and, once a model is downloaded, works completely offline. The model library is broad, but what you can actually run depends on your hardware. Use this when you want to experiment without internet, or when you’re somewhere without reliable connectivity.
ARC API. For when you want to build something — a script, a tool, an integration. Same models as the web interface, accessed programmatically. Use this when you’ve outgrown copy-pasting into a chat window and you want to automate something.

Most people never need anything beyond the ARC web interface. The other two are there for when the web interface stops being enough.

Open Source Model Landscape

There are hundreds of open source models. They vary wildly in capability, size, and what they’re good at. A few things worth knowing:

Hugging Face is the central hub. Model cards, leaderboards, and the HuggingFace Chat interface (which lets you test models in a browser without installing anything) all live there.

For a solid 17-minute orientation to the different ways to run open source models, Tina Huang’s video is worth your time: Every Way To Run Open Source AI Models.

The landscape changes fast. Models rotate on ARC, new ones drop monthly, benchmarks shift. Don’t get attached to specific model names — treat ARC and Ollama as services, not as “the model I use.”

Security & compliance — the short version

ARC runs fully on-premises at VT. Nothing you send through it leaves the university, which is why it’s the right place for work-related data that you couldn’t put into a third-party service. Anyone with VT credentials can use the shared API — no separate ARC account required.

If you work with export-controlled or similarly regulated data, check with AAD IT before processing it through ARC — there are data categories that need additional review even on VT-owned infrastructure. And API keys: don’t share them, don’t commit them to a repo, treat them exactly like passwords.

Questions? Open a ticket or email aadithelp@vt.edu.

Last updated: April 2026