OpenAI's Cutting-Edge AI Models Now Run Locally on NVIDIA RTX GPUs: A New Era for Personal AI
The world of artificial intelligence is rapidly evolving, and NVIDIA is at the forefront, accelerating innovation from the cloud to the personal computer. In a significant development, NVIDIA and OpenAI are collaborating to bring OpenAI's groundbreaking new open-weight gpt-oss models, gpt-oss-20b and gpt-oss-120b, to NVIDIA GeForce RTX and RTX PRO GPUs. Developers and AI enthusiasts can now run state-of-the-art AI reasoning directly on their own hardware, unlocking a new era of local AI applications and democratizing access to powerful AI.

These aren't your average language models. The gpt-oss models offer chain-of-thought reasoning with adjustable reasoning-effort levels, built on a popular mixture-of-experts (MoE) architecture, and they support advanced capabilities like instruction following and tool use. Trained on NVIDIA H100 GPUs, they handle impressively long context lengths of up to 131,072 tokens, making them well suited to complex tasks like web search, coding assistance, document comprehension, and in-depth research. The release also marks the debut of MXFP4 models on NVIDIA RTX, a precision format that delivers high model quality with fast, efficient performance while minimizing resource consumption. It's a significant step toward making powerful AI more accessible and practical for a wider range of users.
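To make that long context window concrete, here is a minimal sketch that estimates whether a document fits before sending it to the model. It uses tiktoken's o200k_base encoding as a rough proxy for the model's actual tokenizer, and the file name and output-token reserve are illustrative assumptions, not part of any official workflow.

```python
# Rough check that a document fits in gpt-oss's 131,072-token context window.
# o200k_base is used as an approximate stand-in for the model's tokenizer,
# so treat the count as an estimate rather than an exact figure.
import tiktoken

CONTEXT_WINDOW = 131_072  # tokens, per the gpt-oss model specs


def fits_in_context(text: str, reserve_for_output: int = 4_096) -> bool:
    """Return True if `text` plus a reserve for the reply fits in the window."""
    enc = tiktoken.get_encoding("o200k_base")
    n_tokens = len(enc.encode(text))
    return n_tokens + reserve_for_output <= CONTEXT_WINDOW


with open("report.txt") as f:  # hypothetical input document
    print(fits_in_context(f.read()))
```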
Getting Started: Easy Integration with Ollama and Beyond
So, how can you experience this firsthand? The easiest way to jump in is through the popular Ollama app, which supports the gpt-oss models out of the box on RTX GPUs with at least 24GB of VRAM. Ollama provides a user-friendly interface for quick and easy chatting with the models: no complex configuration is needed; simply select the desired model from the dropdown menu and start interacting. Ollama's new UI also includes PDF and text file support within chats, multimodal capabilities for image prompts, and customizable context lengths, further enhancing the user experience. For developers, Ollama offers a command-line interface and an SDK for integrating these models into applications and workflows, as sketched below.
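For a taste of the SDK route, here is a minimal Python sketch using the Ollama SDK's chat call. The gpt-oss:20b model tag and the "Reasoning: high" system directive for dialing up reasoning effort are assumptions based on OpenAI's published conventions for these models; check the Ollama model library for the current tags before relying on them.

```python
# Minimal chat with a local gpt-oss model via the Ollama Python SDK
# (pip install ollama). Assumes the model has already been pulled locally.
import ollama

response = ollama.chat(
    model="gpt-oss:20b",  # assumed model tag; verify in the Ollama library
    messages=[
        # Reasoning effort is reportedly adjustable via the system prompt
        # (low / medium / high), per OpenAI's gpt-oss conventions.
        {"role": "system", "content": "Reasoning: high"},
        {"role": "user", "content": "Summarize the trade-offs of mixture-of-experts models."},
    ],
)
print(response["message"]["content"])
```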
Beyond Ollama, NVIDIA continues to collaborate with the open-source community, actively optimizing performance on RTX GPUs through tools like llama.cpp and the GGML tensor library (see the sketch below). Windows developers can also leverage Microsoft AI Foundry Local, currently in public preview, which integrates into workflows via the command line, an SDK, or APIs. Foundry Local runs on ONNX Runtime with CUDA optimizations, and support for NVIDIA TensorRT is on the way. This broad ecosystem of tools ensures that developers have multiple avenues to harness the power of OpenAI's models on their RTX PCs.
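For developers working closer to the metal, a hedged sketch of the llama.cpp path via the community llama-cpp-python bindings might look like the following. The GGUF file path is a placeholder, and a CUDA-enabled build of the bindings is assumed for GPU offload.

```python
# Sketch of running a GGUF build of gpt-oss through llama-cpp-python,
# which wraps the same llama.cpp/GGML stack NVIDIA is optimizing for RTX.
from llama_cpp import Llama

llm = Llama(
    model_path="models/gpt-oss-20b.gguf",  # hypothetical local path
    n_ctx=8192,        # context to allocate; the model supports up to 131,072 tokens
    n_gpu_layers=-1,   # offload all layers to the GPU (needs a CUDA build)
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain MXFP4 quantization in one paragraph."}]
)
print(out["choices"][0]["message"]["content"])
```

Choosing a smaller n_ctx than the model's maximum keeps VRAM usage down; raise it only when a workload actually needs the longer window.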
The Future of AI: Local Inference and Accelerated Innovation
This collaboration between NVIDIA and OpenAI signifies a pivotal shift towards local AI inference. By bringing these powerful models to the RTX platform, NVIDIA is empowering developers and enthusiasts to build innovative applications without relying solely on cloud-based services. This unlocks new possibilities for privacy-focused AI, offline functionality, and personalized experiences. As Jensen Huang, founder and CEO of NVIDIA, stated, this move "strengthens U.S. technology leadership in AI." With ongoing community contributions and continuous optimization, the future of AI looks brighter than ever, fueled by the combined power of OpenAI’s models and NVIDIA’s RTX GPUs.
To learn more and explore the possibilities, check out the NVIDIA Technical Blog and join the NVIDIA community on Discord. Stay tuned for more updates and innovations in the rapidly evolving world of AI!
Related Web URL: https://blogs.nvidia.com/blog/rtx-ai-garage-openai...