
Commit e574d9e

Add AWS re:Invent 2025 roundup blog post
Covers the integrated AI stack AWS announced: Nova Forge (training pipeline), Trainium (custom silicon), AgentCore (agent runtime), and Nova Act (browser automation agent).

Includes:
- Four custom visuals (training pipeline, flywheel, layer cake, Lego blocks)
- Three "Try it with Neo" CTAs linking to Pulumi Neo
- Engaging section headings

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 2a0cc3b commit e574d9e

---
title: "AWS built an integrated AI Agent training pipeline and they want you to rent it"
allow_long_title: true
date: 2025-12-10
draft: false
meta_desc: "A roundup of the most exciting announcements from AWS re:Invent 2025 and how to use them with Pulumi."
meta_image: meta.png
authors:
- adam-gordon-bell
tags:
- aws
- reinvent
---

re:Invent 2025 brought the usual AWS firehose. Covering each announcement separately (Trainium specs, Nova benchmarks, AgentCore features) is easy, but you'll miss what's happening underneath. **The AI announcements fit together into a single bet. Understand it and you'll see what AWS thinks the future of enterprise AI looks like, and why not everyone's convinced enterprises want it.**

Here's the first hint: AWS dropped four new foundation models (Lite, Pro, Sonic, Omni) spanning text, multimodal, and speech. They won't publish standard benchmarks. The previous Nova Pro hit 85.9% on MMLU[^2], GPT-4o territory, and Nova 2 is better. How much better? AWS won't say, and they're pushing back on the idea that general LLM benchmarks matter at all.

That's a strange thing for a foundation model provider to say, unless the models aren't really the product. The real announcement is Nova Forge.

**Try it with Neo:** [Set up Nova 2 with Pulumi](https://app.pulumi.com/neo?prefer_signup=true&prompt=Show%20me%20how%20to%20use%20Pulumi%20in%20Python%20to%20set%20up%20AWS%20Bedrock%20permissions%20and%20call%20the%20Nova%202%20Pro%20model)
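
Calling the new models is the standard Bedrock runtime flow. Here's a minimal sketch using boto3's Converse API, assuming your account has Bedrock model access enabled; the model ID below is an assumption based on the existing Nova naming scheme, so verify the real Nova 2 Pro identifier in the Bedrock console.

```python
import boto3

# Nova models are invoked through Bedrock's unified Converse API.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# NOTE: hypothetical model ID -- check the Bedrock console for the real one.
MODEL_ID = "amazon.nova-2-pro-v1:0"

response = bedrock.converse(
    modelId=MODEL_ID,
    messages=[{"role": "user", "content": [{"text": "Summarize the AI announcements from re:Invent 2025."}]}],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```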

## Rent the lab: Nova Forge

![The LLM training pipeline: pre-training, SFT, RLHF, fine-tuning, narrowing down to prompt/context](training-stages.jpeg)

[Nova Forge](https://aws.amazon.com/nova/forge/) is a managed way to run continued pretraining, fine-tuning, and reward-based alignment on Amazon's [Nova](https://aws.amazon.com/nova/) models using *your* data and your reinforcement loops. Instead of a finished, frozen model plus a thin fine-tuning API, you feed your corpora into earlier training stages while AWS handles the ugly parts: large-scale training runs, cluster management, and hosting. Access is $100,000 per year[^3], plus compute costs.

Concretely, you bring big proprietary datasets (code, tickets, logs, documents) and AWS splices them into the same pipeline they use to train Nova itself. They keep doing next-token pretraining on a mix of their data and yours, then instruction tuning (SFT), then "RL-style" preference optimization, but with your domain and reward signals in the loop. You get a private Nova-based model variant for your organization (what they call a "Novella"[^4]), deployed as a managed endpoint on [Amazon Bedrock](https://aws.amazon.com/bedrock/). Your domain knowledge and reward functions are baked into the weights, not just bolted on via RAG or a LoRA adapter.

Why does this exist? Costs. Training a GPT-4-class frontier model from scratch runs tens of millions to $100M+ in compute alone[^1]. That's Big Tech or nation-state territory. Even starting from a strong open-weights model, continued pretraining plus fine-tuning plus reward alignment gets expensive[^5], and requires ML engineering and RL expertise that's scarce and hard to retain.
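
To make those numbers concrete, here's a back-of-envelope estimate using the common ~6 FLOPs per parameter per training token rule of thumb. Every input (model size, token count, utilization, rental price) is an illustrative assumption, not a disclosed figure:

```python
# Rough training-cost estimate: total FLOPs ~= 6 * parameters * tokens.
params = 1.0e12   # assume a 1T-parameter frontier model
tokens = 10.0e12  # assume 10T training tokens
total_flops = 6 * params * tokens  # 6e25 FLOPs

# Assume H100-class accelerators: ~1e15 FLOP/s peak at ~40% utilization,
# rented at roughly $2 per GPU-hour (all assumptions, not quotes).
flops_per_gpu_hour = 1.0e15 * 0.40 * 3600
gpu_hours = total_flops / flops_per_gpu_hour
cost = gpu_hours * 2.0

print(f"{gpu_hours:,.0f} GPU-hours, ~${cost / 1e6:,.0f}M in compute")
# -> ~42 million GPU-hours, ~$83M: squarely in the "tens of millions to $100M+" range.
```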

[Nova Forge](https://aws.amazon.com/nova/forge/) sidesteps both problems. You pay AWS a subscription to rent their training pipeline, the same infrastructure they use for [Nova](https://aws.amazon.com/nova/), inject your data, and get back a private model variant running on [Bedrock](https://aws.amazon.com/bedrock/). You don't own the weights and you're locked into their stack, but you get frontier-level capabilities with your data baked in, without building a mini frontier lab or staffing an ML team.

Think of it as frontier-lab-as-a-service. No one else offers anything this close to a public, end-to-end training pipeline. If AWS built it, someone's asking for it. The reason they can offer it is the next announcement.

## The margin weapon: Trainium

<figure style="width: 40%; float: right; margin-left: 20px; margin-bottom: 10px;">
<img src="trainium-flywheel.jpeg" alt="The Trainium flywheel: cheaper training leads to more custom models, more inference revenue, funding the next chip">
<figcaption><i>The idealized Trainium flywheel: each generation should decrease training costs.</i></figcaption>
</figure>

AWS built their own AI accelerator so they don't have to live entirely on Nvidia. Trainium is that chip. You don't buy it; you rent it as a cloud box. This year brought their third-gen chip (Trainium3) and new rack-scale Trn3 UltraServers, with roughly 4x the performance and big energy/cost gains over the previous generation, positioned as a serious alternative to high-end GPUs for training and serving big models.

Everyone's first take on Trainium3 is obvious: AWS wants to stop handing Nvidia half its AI revenue. Fair enough. The new 3nm chip delivers 4.4x the compute and 4x better energy efficiency than Trainium2. UltraServers pack up to 144 chips; clusters scale to a million. Trainium4 is already in the works, and it'll play nice with Nvidia hardware.

But the real story is bigger than cost-cutting. Trainium is the quiet machinery that makes AWS's model-factory ambitions economically viable. You can only rent a frontier training pipeline if you can afford it, and Trainium makes it cheaper (if that word applies to six-figure entry costs).

Trainium's cheaper tokens make multiple training cycles feasible. That's what makes Forge usable. Without it, co-training would cost millions per run, limiting you to one shot. With Trainium, iterative experimentation is possible. You can tune, test, and retrain until you converge on something useful.

Trainium3 is the foundation for owning the entire stack, from transistor to inference endpoint, and selling that stack to the world.

**Try it with Neo:** [Provision Trainium instances with Pulumi](https://app.pulumi.com/neo?prefer_signup=true&prompt=Show%20me%20how%20to%20use%20Pulumi%20in%20Python%20to%20provision%20AWS%20Trn1%20EC2%20instances%20for%20ML%20training)
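
As a sketch of what that CTA produces, here's a minimal Pulumi program that stands up a single trn1 box. The AMI name filter is an assumption (check the current Deep Learning AMI naming in your region), and a real training cluster would use many nodes with EFA networking rather than one instance:

```python
import pulumi
import pulumi_aws as aws

# Find an AWS Deep Learning AMI with the Neuron SDK (Trainium drivers) preinstalled.
# The name filter is an assumption -- verify the current DLAMI naming.
dlami = aws.ec2.get_ami(
    most_recent=True,
    owners=["amazon"],
    filters=[{"name": "name", "values": ["Deep Learning AMI Neuron*"]}],
)

# A single Trainium training node; trn1.32xlarge carries 16 Trainium chips.
trainer = aws.ec2.Instance(
    "trainium-node",
    ami=dlami.id,
    instance_type="trn1.32xlarge",
    tags={"project": "nova-experiments"},
)

pulumi.export("instance_id", trainer.id)
```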

## The data moat play

For most companies, this whole stack is overkill. If your AI roadmap is “add a chatbot and maybe summarize some tickets,” you don’t need Nova Forge, and you definitely don’t need Trainium. Hosted models plus RAG will get you 90% of the way there.

The interesting case: a data moat turns into a behavior moat. If LLMs behave like the distributions they're trained on, then getting your proprietary mess (logs, incident reports, claims histories, deal flows, call transcripts) into the core training loop means the model doesn't just know your docs; it behaves like someone who's lived inside your systems for years. That’s qualitatively different from “we stuffed a PDF into the context window.”

Latency and cost at scale matter too. For high-volume workflows like support triage, routing, code review, and fraud checks, "generic frontier model + giant prompt + RAG + tools" is slow and expensive. A model that has your world baked into the weights can run with smaller contexts, simpler prompts, and fewer tool calls. That's where Trainium's cheaper tokens matter: they make it plausible to iterate through multiple training cycles instead of burning your whole budget on one terrifying run.
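
The arithmetic behind that claim is easy to sanity-check. The prices and context sizes below are illustrative assumptions, not published rates, but the shape of the result holds across a wide range of inputs:

```python
# Per-request cost: generic model with a giant RAG prompt vs. a specialized
# model that carries the domain in its weights. All numbers are assumed.
generic_input_tokens = 12_000  # system prompt + RAG chunks + tool schemas
custom_input_tokens = 1_500    # short prompt; knowledge lives in the weights
output_tokens = 500

price_per_1k_input = 0.003   # illustrative $ per 1K input tokens
price_per_1k_output = 0.012  # illustrative $ per 1K output tokens

def request_cost(input_tokens: int, out_tokens: int) -> float:
    return (input_tokens / 1000) * price_per_1k_input + (out_tokens / 1000) * price_per_1k_output

requests_per_day = 1_000_000  # a high-volume triage/routing workload
print(f"generic: ${request_cost(generic_input_tokens, output_tokens) * requests_per_day:,.0f}/day")
print(f"custom:  ${request_cost(custom_input_tokens, output_tokens) * requests_per_day:,.0f}/day")
# -> $42,000/day vs. $10,500/day at these assumed rates.
```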

Product differentiation is the third case. If your real moat is “20 years of weird, high-signal domain data,” the only way that moat survives in an LLM world is if it shows up in the weights and in the reward signal. Otherwise you’re just another UI on top of the same public APIs everyone else is calling.

Even if you get that far, a custom Nova model sitting in Bedrock is only half the story. You still need somewhere for it to act: a runtime, tools, policies, and an audit trail. That’s the gap AgentCore is meant to fill.

## Where the models work: AgentCore

<figure style="width: 40%; float: right; margin-left: 20px; margin-bottom: 10px;">
<img src="agentcore-blocks.jpeg" alt="AgentCore components as Lego blocks: Runtime, Memory, Policy, Evals">
<figcaption><i>AgentCore: building blocks so you don't have to wire agents from scratch.</i></figcaption>
</figure>

If Nova is the brain and Trainium is the muscle to build the brain, AgentCore is the nervous system.

[AgentCore](https://aws.amazon.com/bedrock/agentcore/) is a managed runtime for AI agents: instead of you wiring LLMs, tools, memory, auth, and logging together on Lambda or Fargate, AWS gives you a sticky per-session microVM, a standard way to call tools (Gateway), built-in long- and short-term memory, identity/permissions, and observability/evals. You package your agent (LangGraph/Strands/custom Python, often calling Nova or Claude), deploy it as an AgentCore runtime, and AWS handles the ugly parts: session isolation, scaling, policy guardrails, and tracing. You pay Fargate-ish per-vCPU/GB-hour pricing for the runtime plus normal Bedrock token and tool-call costs.
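
For a feel of the programming model, here's a minimal sketch using the `bedrock-agentcore` Python SDK as it shipped in preview. Treat the exact import path and decorator as approximate, and the Nova model ID as a placeholder:

```python
import boto3
from bedrock_agentcore.runtime import BedrockAgentCoreApp

app = BedrockAgentCoreApp()
bedrock = boto3.client("bedrock-runtime")

@app.entrypoint
def invoke(payload):
    """Handle one invocation inside the agent's per-session microVM."""
    prompt = payload.get("prompt", "")
    # Delegate the reasoning to a Nova model; the ID is a placeholder.
    resp = bedrock.converse(
        modelId="amazon.nova-2-lite-v1:0",
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return {"result": resp["output"]["message"]["content"][0]["text"]}

if __name__ == "__main__":
    app.run()  # local dev server; AgentCore hosts the same entrypoint in production
```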

At re:Invent 2025, [AgentCore](https://aws.amazon.com/bedrock/agentcore/) picked up the missing "production" pieces: **Policy**, **Evaluations**, and **episodic Memory**. Policy (preview) hooks into AgentCore Gateway to intercept every tool call and enforce Cedar-backed, fine-grained allow/deny rules outside the model, so you can say “this agent can only refund up to $200” and know it’s enforced deterministically. Evaluations (preview) adds a built-in LLM-as-judge pipeline with stock evaluators plus custom metrics, so you can in theory skip building your own eval stack. And AgentCore Memory’s new episodic mode claims to store full “experience” records and lets agents retrieve and learn from them later.

How does this come together? AWS shipped a use case.

## The proof of concept: Nova Act

<figure style="width: 40%; float: left; margin-right: 20px; margin-bottom: 10px;">
<img src="aws-ai-stack.jpeg" alt="The AWS AI stack as a layer cake: Trainium at the bottom, Nova Forge, Bedrock, AgentCore on top">
<figcaption><i>The AWS AI stack: vertically integrated from silicon to agent runtime. Nova Act uses the full stack.</i></figcaption>
</figure>

Nova Act is the concrete example. It handles browser-based UI automation: form filling, search-and-extract, QA testing. Amazon claims ~90% reliability. It deploys directly to AgentCore Runtime.
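
The developer surface is a small Python SDK. Here's a sketch based on the preview `nova-act` package; the URL and instructions are placeholders, and the API may shift while it's in preview:

```python
from nova_act import NovaAct

# Each session drives a real browser; the model plans and executes UI actions
# (click, type, scroll) until the natural-language instruction is satisfied.
with NovaAct(starting_page="https://example.com/orders") as nova:
    nova.act("filter the orders table to the last 30 days")
    result = nova.act("read back the total of the most recent order")
    print(result.response)
```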

It's not "an LLM plus Playwright." Nova Act uses a specialized Nova 2 Lite variant trained on synthetic "web gym" environments: browser simulations that mirror enterprise UIs and provide an automatic reward signal when tasks complete correctly. Instead of judging output quality, the RL loop asks: did the workflow succeed?

That specialization is wrapped in AgentCore. The workflows ship as containers with access to the AgentCore Browser Tool, IAM, observability, and (increasingly) Policy and Evaluations. The platform handles isolation, scaling, logging, and guardrails, so Nova Act behaves like a production automation system rather than a brittle demo.

Seen this way, Nova Act is Amazon’s reference implementation for a certain class of enterprise agents: start with a strong general model, specialize it through domain-specific RL in a controlled environment, and run it on AgentCore with tools and policies around it. It’s the pattern AWS expects customers to adopt.

## One stack to rule them all

Forge, Trainium, AgentCore, and Nova Act connect. Trainium lowers the cost of big training runs. Nova Forge lets enterprises plug their own data and rewards into those runs. AgentCore is where the resulting models *act*, with tools, memory, and policy guardrails. Nova Act shows the pattern in action: a domain-specialized Nova model, trained in a controlled loop, running as a production agent.

The bet behind all this is that enterprise AI won't hinge on generic chatbots, but on **agents shaped by proprietary data and domain feedback**. Most companies won't build the infrastructure to train and operate those agents, but AWS is offering to rent them the whole pipeline.

**Try it with Neo:** [Deploy a Bedrock-powered API with Pulumi](https://app.pulumi.com/neo?prefer_signup=true&prompt=Create%20a%20Python%20Pulumi%20program%20that%20deploys%20an%20AWS%20Lambda%20function%20calling%20Bedrock%20Nova%20Pro%20and%20exposes%20it%20via%20API%20Gateway)
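
The heart of that deployment is a short handler. A hedged sketch of the Lambda side, where the Nova model ID is an assumption and the surrounding Pulumi program would grant the function `bedrock:InvokeModel`:

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def handler(event, context):
    """API Gateway -> Lambda -> Bedrock proxy."""
    body = json.loads(event.get("body") or "{}")
    resp = bedrock.converse(
        modelId="amazon.nova-2-pro-v1:0",  # assumed ID; verify in the Bedrock console
        messages=[{"role": "user", "content": [{"text": body.get("prompt", "")}]}],
    )
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"answer": resp["output"]["message"]["content"][0]["text"]}),
    }
```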

[^1]: Sam Altman stated GPT-4 cost "more than $100 million" to train. [Source](https://news.ycombinator.com/item?id=35971363)
[^2]: Amazon Nova Technical Report. [Source](https://assets.amazon.science/96/7d/0d3e59514abf8fdcfafcdc574300/nova-tech-report-20250317-0810.pdf)
[^3]: CNBC reporting on Nova Forge pricing. [Source](https://www.cnbc.com/2025/12/02/amazon-nova-forge-lets-clients-customize-ai-models-for-100000-a-year.html)
[^4]: SiliconANGLE on AWS "Novella" terminology. [Source](https://siliconangle.com/2025/12/02/aws-introduces-nova-forge-training-bespoke-novella-frontier-models/)
[^5]: Full-weight SFT on a 70B model generally costs tens of thousands of dollars; RLHF data + compute for large models typically lands in the $100K–$1M+ range. [Source 1](https://www.cudocompute.com/blog/what-is-the-cost-of-training-large-language-models), [Source 2](https://arxiv.org/abs/2403.14101)
