{"id":1511,"date":"2025-05-10T12:00:43","date_gmt":"2025-05-10T12:00:43","guid":{"rendered":"https:\/\/violethoward.com\/new\/openai-introduces-reinforcement-fine-tuning-for-o4-model\/"},"modified":"2025-05-10T12:00:43","modified_gmt":"2025-05-10T12:00:43","slug":"openai-introduces-reinforcement-fine-tuning-for-o4-model","status":"publish","type":"post","link":"https:\/\/violethoward.com\/new\/openai-introduces-reinforcement-fine-tuning-for-o4-model\/","title":{"rendered":"OpenAI introduces reinforcement fine-tuning for o4 model"},"content":{"rendered":" \r\n<br><div>\n\t\t\t\t<div id=\"boilerplate_2682874\" class=\"post-boilerplate boilerplate-before\">\n<p><em>Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More<\/em><\/p>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity is-style-wide\"\/>\n<\/div><p>OpenAI today announced on its developer-focused account on the social network X that third-party software developers can now access reinforcement fine-tuning (RFT) for its new o4-mini language reasoning model. 
This enables them to customize a new, private version of it based on their enterprise\u2019s unique products, internal terminology, goals, employees, processes and more.<\/p>\n\n\n\n<p>Essentially, this capability lets developers take the model available to the general public and tweak it to better fit their needs using OpenAI\u2019s platform dashboard.<\/p>\n\n\n\n<p>Then, they can deploy it through OpenAI\u2019s application programming interface (API), another part of its developer platform, and connect it to their internal employee computers, databases, and applications.<\/p>\n\n\n\n<p>Once deployed, employees and leaders at the company can use their RFT version of the model through a custom internal chatbot or custom OpenAI GPT to pull up private, proprietary company knowledge, answer specific questions about company products and policies, or generate new communications and collateral in the company\u2019s voice.<\/p>\n\n\n\n<p>However, one cautionary note: research has shown that fine-tuned models may be more prone to jailbreaks and hallucinations, so proceed cautiously!<\/p>\n\n\n\n<p>This launch expands the company\u2019s model optimization tools beyond supervised fine-tuning (SFT) and introduces more flexible control for complex, domain-specific tasks. 
<\/p>\n\n\n\n<p>Additionally, OpenAI announced that supervised fine-tuning is now supported for its GPT-4.1 nano model, the company\u2019s most affordable and fastest offering to date.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-does-reinforcement-fine-tuning-rft-help-organizations-and-enterprises\">How does Reinforcement Fine-Tuning (RFT) help organizations and enterprises?<\/h2>\n\n\n\n<p>RFT creates a new version of OpenAI\u2019s o4-mini reasoning model that is automatically adapted to the user\u2019s or their enterprise\/organization\u2019s goals.<\/p>\n\n\n\n<p>It does so by applying a feedback loop during training, which developers at large enterprises (or even independent developers) can now initiate relatively simply and affordably through OpenAI\u2019s online developer platform.<\/p>\n\n\n\n<p>Instead of training on a set of questions with fixed correct answers \u2014 which is what traditional supervised learning does \u2014 RFT uses a grader model to score multiple candidate responses per prompt.<\/p>\n\n\n\n<p>The training algorithm then adjusts model weights to make high-scoring outputs more likely.<\/p>\n\n\n\n<p>This structure allows customers to align models with nuanced objectives such as an enterprise\u2019s \u201chouse style\u201d of communication and terminology, safety rules, factual accuracy, or internal policy compliance.<\/p>\n\n\n\n<p>To perform RFT, users need to:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Define a grading function or use OpenAI model-based graders.<\/li>\n\n\n\n<li>Upload a dataset with prompts and validation splits.<\/li>\n\n\n\n<li>Configure a training job via API or the fine-tuning dashboard.<\/li>\n\n\n\n<li>Monitor progress, review checkpoints and iterate on data or grading logic.<\/li>\n<\/ol>\n\n\n\n<p>RFT currently supports only o-series reasoning models and is available for the o4-mini model.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" 
id=\"h-early-enterprise-use-cases\">Early enterprise use cases<\/h2>\n\n\n\n<p>On its platform, OpenAI highlighted several early customers who have adopted RFT across diverse industries:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Accordance AI<\/strong> used RFT to fine-tune a model for complex tax analysis tasks, achieving a 39% improvement in accuracy and outperforming all leading models on tax reasoning benchmarks.<\/li>\n\n\n\n<li><strong>Ambience Healthcare<\/strong> applied RFT to ICD-10 medical code assignment, raising model performance by 12 points over physician baselines on a gold-panel dataset.<\/li>\n\n\n\n<li><strong>Harvey<\/strong> used RFT for legal document analysis, improving citation extraction F1 scores by 20% and matching GPT-4o in accuracy while achieving faster inference.<\/li>\n\n\n\n<li><strong>Runloop<\/strong> fine-tuned models for generating Stripe API code snippets, using syntax-aware graders and AST validation logic, achieving a 12% improvement.<\/li>\n\n\n\n<li><strong>Milo<\/strong> applied RFT to scheduling tasks, boosting correctness in high-complexity situations by 25 points.<\/li>\n\n\n\n<li><strong>SafetyKit<\/strong> used RFT to enforce nuanced content moderation policies and increased model F1 from 86% to 90% in production.<\/li>\n\n\n\n<li><strong>ChipStack<\/strong>, <strong>Thomson Reuters<\/strong>, and other partners also demonstrated performance gains in structured data generation, legal comparison tasks and verification workflows.<\/li>\n<\/ul>\n\n\n\n<p>These cases often shared characteristics: clear task definitions, structured output formats and reliable evaluation criteria\u2014all essential for effective reinforcement fine-tuning.<\/p>\n\n\n\n<p>RFT is available now to verified organizations. To help improve future models, OpenAI offers a 50% discount to teams that share their training datasets with it. 
Interested developers can get started using OpenAI\u2019s RFT documentation and dashboard.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-pricing-and-billing-structure\">Pricing and billing structure<\/h2>\n\n\n\n<p>Unlike supervised or preference fine-tuning, which is billed per token, RFT is billed based on time spent actively training. Specifically:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>$100 per hour of core training time (wall-clock time during model rollouts, grading, updates and validation).<\/li>\n\n\n\n<li>Time is prorated by the second, rounded to two decimal places (so 1.8 hours of training would cost the customer $180).<\/li>\n\n\n\n<li>Charges apply only to work that modifies the model. Queues, safety checks, and idle setup phases are not billed.<\/li>\n\n\n\n<li>If the user employs OpenAI models as graders (e.g., GPT-4.1), the inference tokens consumed during grading are billed separately at OpenAI\u2019s standard API rates. Otherwise, the company can use outside models, including open source ones, as graders.<\/li>\n<\/ul>\n\n\n\n<p>Here is an example cost breakdown:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th><strong>Scenario<\/strong><\/th><th><strong>Billable Time<\/strong><\/th><th><strong>Cost<\/strong><\/th><\/tr><\/thead><tbody><tr><td>4 hours training<\/td><td>4 hours<\/td><td>$400<\/td><\/tr><tr><td>1.75 hours (prorated)<\/td><td>1.75 hours<\/td><td>$175<\/td><\/tr><tr><td>2 hours training + 1 hour lost (due to failure)<\/td><td>2 hours<\/td><td>$200<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This pricing model provides transparency and rewards efficient job design. 
To control costs, OpenAI encourages teams to:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use lightweight or efficient graders where possible.<\/li>\n\n\n\n<li>Avoid overly frequent validation unless necessary.<\/li>\n\n\n\n<li>Start with smaller datasets or shorter runs to calibrate expectations.<\/li>\n\n\n\n<li>Monitor training with API or dashboard tools and pause as needed.<\/li>\n<\/ul>\n\n\n\n<p>OpenAI uses a billing method called \u201ccaptured forward progress,\u201d meaning users are only billed for model training steps that were successfully completed and retained.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-so-should-your-organization-invest-in-rfting-a-custom-version-of-openai-s-o4-mini-or-not\">So should your organization invest in RFTing a custom version of OpenAI\u2019s o4-mini or not?<\/h2>\n\n\n\n<p>Reinforcement fine-tuning introduces a more expressive and controllable method for adapting language models to real-world use cases. <\/p>\n\n\n\n<p>With support for structured outputs, code-based and model-based graders, and full API control, RFT enables a new level of customization in model deployment. OpenAI\u2019s rollout emphasizes thoughtful task design and robust evaluation as keys to success.<\/p>\n\n\n\n<p>Developers interested in exploring this method can access documentation and examples via OpenAI\u2019s fine-tuning dashboard. <\/p>\n\n\n\n<p>For organizations with clearly defined problems and verifiable answers, RFT offers a compelling way to align models with operational or compliance goals \u2014 without building RL infrastructure from scratch.<\/p>\n\t\t\t<\/div>\r\n<br>\r\n<br><a href=\"https:\/\/venturebeat.com\/ai\/you-can-now-fine-tune-your-enterprises-own-version-of-openais-o4-mini-reasoning-model-with-reinforcement-learning\/\">Source link <\/a>","protected":false},"excerpt":{"rendered":"<p>Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More OpenAI today announced on its developer-focused account on the social network X that third-party software developers outside the company can now access reinforcement fine-tuning (RFT) for its new o4-mini language reasoning model. 
This enables them to [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":812,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[33],"tags":[],"class_list":["post-1511","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-automation"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/violethoward.com\/new\/wp-content\/uploads\/2025\/03\/vb-daily-phone.png","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/1511","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/comments?post=1511"}],"version-history":[{"count":0,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/1511\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media\/812"}],"wp:attachment":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media?parent=1511"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/categories?post=1511"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/tags?post=1511"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}