It was an honor to hang out with Jensen Huang, CEO of @nvidia, and do a long-form podcast with him. Really fun & fascinating technical deep-dive conversation on & off the mic. One of the most brilliant & thoughtful human beings I've ever met. NVIDIA is the most valuable company
Anthropic releases Claude 3.5 Sonnet, its most capable model yet. It outperforms GPT-4o and Gemini 1.5 Pro on multiple benchmarks while being faster and more cost-effective than Claude 3 Opus.
GPT-4o brings native multimodal capabilities to ChatGPT, enabling real-time voice conversations, image understanding, and code interpretation in a single model.
Y Combinator CEO Garry Tan said builders should not sleep on Groq paired with Llama 4 Maverick, describing the combination as very useful for low-latency tasks. The post is notable because real-time responsiveness remains one of the hardest constraints in production AI systems, especially for assistants and agent workflows where delays directly affect usability. Tan’s endorsement suggests the conversation is shifting from pure benchmark leadership toward which model-and-inference stacks actually feel fast enough to use continuously.
LangChain announced `langgraph deploy`, a CLI flow that deploys an agent to LangSmith Deployment with a single command. The release is notable because it targets a familiar pain point in the agent stack: moving from experiments into something teams can run and monitor in production without stitching together custom deployment steps. In effect, LangChain is packaging deployment as a first-class part of the LangGraph workflow rather than an afterthought for platform engineers.
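As a sketch of how that single-command flow typically fits together (the config layout below follows the standard LangGraph CLI conventions, and the graph path is a hypothetical example, not from the announcement):

```json
{
  "dependencies": ["."],
  "graphs": { "agent": "./agent.py:graph" },
  "env": ".env"
}
```

With a `langgraph.json` like this at the project root, running `langgraph deploy` would push the `agent` graph to LangSmith Deployment in one step, with no custom deployment scripting.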
Google AI Developers said Gemini Embedding 2 is now available in preview through the Gemini API and Vertex AI, describing it as the company’s most capable and first fully multimodal embedding model built on the Gemini architecture. Jeff Dean separately amplified the launch, saying the model brings text, images, video, audio, and documents into the same embedding space. The update matters because embeddings sit underneath search, retrieval, and recommendation systems, and a stronger multimodal option gives developers a more practical foundation for building cross-format AI products.
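The practical payoff of one shared embedding space is that items from any format can be ranked against a query with a single similarity metric. A minimal sketch, with made-up vectors standing in for real embeddings (nothing here is the Gemini API):

```python
import math

# Illustrative sketch: in a shared multimodal embedding space, a document,
# an image, and an audio clip can all be compared with the same metric.
# The vectors below are invented stand-ins for real model embeddings.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

corpus = {
    "report.pdf":  [0.9, 0.1, 0.0],   # document embedding (hypothetical)
    "chart.png":   [0.6, 0.4, 0.2],   # image embedding (hypothetical)
    "podcast.mp3": [0.1, 0.9, 0.3],   # audio embedding (hypothetical)
}

query = [0.85, 0.15, 0.05]  # an embedded text query (hypothetical)
best = max(corpus, key=lambda name: cosine(query, corpus[name]))
print(best)
```

Cross-format retrieval and recommendation reduce to exactly this kind of nearest-neighbor lookup once everything lives in one space.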
A Hugging Face post shared by Georgi Gerganov introduced Storage Buckets, a new S3-like object storage option on the Hugging Face Hub and the first new repository type the platform has added in four years. Unlike the Hub’s standard versioned repos, Storage Buckets are mutable and non-versioned, with pricing positioned below Amazon S3. The release is significant because it shows Hugging Face expanding from model distribution into underlying storage infrastructure for AI teams building production systems.
Dimillian said he is joining OpenAI at the end of the month and will work on Codex as part of the developer experience team. He said he plans to bring what he learned from building Codex Monitor into the role, signaling that OpenAI is continuing to invest not just in coding models themselves, but in the tooling and workflows around how developers use them. The post matters because it points to deeper product focus on making Codex more usable inside real software teams, where monitoring, feedback loops, and developer experience often determine adoption.
Notion introduced Number Charts, a new dashboard element for displaying a single metric with customizable threshold-based colors. In the company’s post, the feature is positioned as a fast way to let one number tell the story, with yellow, green, and red states for quick status scanning. The launch broadens Notion’s reporting and dashboard toolkit, giving teams a simpler way to surface KPIs inside shared workspaces.
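The threshold logic is simple to picture. A minimal sketch with hypothetical cutoffs (Notion's actual thresholds are user-configurable, and this function is invented for illustration):

```python
# Illustrative traffic-light mapping like Notion's Number Charts.
# The warn/good cutoffs are hypothetical defaults, not Notion's.

def status_color(value, warn=70, good=90):
    """Map a single KPI value to a threshold-based status color."""
    if value >= good:
        return "green"
    if value >= warn:
        return "yellow"
    return "red"

print(status_color(95))  # green
print(status_color(75))  # yellow
print(status_color(40))  # red
```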
Andreessen Horowitz said the latest edition of its Top 100 Gen AI Consumer Apps ranking shows how quickly the consumer AI market is evolving beyond a narrow set of chat products. The firm argued that the newest leaders are increasingly global, multimodal, and embedded in everyday workflows, while Erik Torenberg separately amplified the release as evidence that the category deserves a refreshed lens on usage. The post matters because a16z’s ranking is one of the most widely watched snapshots of consumer AI adoption, and this edition points to a market where sustained engagement and product diversity are starting to matter as much as raw novelty.
Vinod Khosla said the real bar for robotics is autonomous performance in production environments, not polished lab demos, while highlighting Rhoda AI as a startup that impressed him with strong results from remarkably little robot training data. He emphasized the company’s use of internet-scale video pretraining to build a physical prior before deployment, suggesting a path to more general robotic capability without relying on massive amounts of expensive robot-specific data. The post matters because it captures a shift in how leading investors are judging physical AI: not by whether a robot can complete a staged demo, but by whether it can generalize reliably in the messy settings where commercial value is actually created.
Noam Brown said the core recipe behind frontier reasoning models looks surprisingly similar to AlphaGo. In his framing, the pattern is: imitate large volumes of human data, scale inference-time reasoning, and then apply reinforcement learning to move beyond imitation. The post stands out because it offers a concise mental model for how modern reasoning systems are evolving, linking today’s chain-of-thought and test-time compute strategies back to a landmark earlier system.
Hume said it is open-sourcing TADA, a text-audio dual alignment model designed to generate text and speech in one synchronized stream. The company said the architecture is meant to reduce token-level hallucinations while improving response speed, two of the main constraints that have limited real-time voice agents. The post matters because open-source voice models have often lagged behind closed systems on reliability and interaction quality, and Hume is positioning TADA as a practical step toward production-grade spoken AI.
Posts amplified by Databricks accounts point to OfficeQA Pro as a benchmark designed to test grounded reasoning on realistic enterprise workflows, including finding documents, extracting values, and performing analyses. The key claim is that frontier agents still score under 50 percent end-to-end. If that result holds up, it suggests the gap between flashy reasoning benchmarks and dependable workplace automation is still much wider than the AI hype cycle implies.
Google DeepMind said that a decade after AlphaGo, the techniques pioneered in that system are still compounding across the company’s research stack. In a new retrospective, the lab said those methods have already been used to prove mathematical statements and to assist scientists in making new discoveries. The broader significance is that DeepMind is presenting AlphaGo not as a historical trophy but as an early foundation for agentic systems that can reason through hard scientific problems.
Dan Shipper shared a custom Codex skill that connects to PostHog and a production database, then scans product data to identify bottlenecks and actionable growth insights. He described it as a “growth investigator” that works surprisingly well, pointing to a broader pattern in this batch where agentic tools are creeping from code generation into product analysis and marketing operations. The idea matters because it hints at a next phase for coding agents: not just helping teams build software, but helping them diagnose why the software is or is not growing.
Stanford researcher Percy Liang argued that simulation is becoming the next frontier for AI because the field’s most impressive breakthroughs happen when models can take actions inside a clear environment and learn from well-defined consequences. He pointed to examples like AlphaGo, IMO-level problem solving, and systems that can write complete apps from scratch inside a Docker container, where reinforcement learning can safely explore and improve. The post matters because it frames the next wave of progress less as a race for isolated reasoning benchmarks and more as a race to build realistic environments where models can act, be evaluated, and iterated end to end. In the same batch, Databricks promoted OfficeQA Pro as an enterprise benchmark for grounded reasoning, reinforcing the idea that AI evaluation is moving toward task environments rather than standalone tests.
Runway says users can now access Characters directly inside the web app, where they can try preset personalities or create their own real-time assistants. The announcement turns the earlier Characters launch into a more concrete product surface and hints at a broader strategy: AI media tools are evolving from generation interfaces into persistent, interactive agent environments. Examples shared by users already show the feature being adapted for gaming guides and niche knowledge assistants.
AIFrontliner highlighted the release of LTX-2.3, describing it as a major overhaul of the open-weights video model with public weights, training code, benchmarks, and LoRAs. The thread called out sharper output from a rebuilt VAE, better image-to-video motion, native portrait generation up to 1080p, cleaner audio, and direct API access for builders. The release matters because it strengthens the open-source side of the fast-moving video model race at a time when many of the best-known systems are still gated behind closed interfaces.
NVIDIA said it is partnering with Thinking Machines to deploy at least one gigawatt of Vera Rubin systems for frontier AI model training. The announcement matters because it pushes frontier infrastructure talk beyond chip counts and into utility-scale capacity, signaling that the next tier of model builders will be judged partly by how much power and compute they can stand up, not just by benchmark results. For the broader market, it is another sign that frontier AI is becoming an industrial systems race spanning hardware, power, and platform control.
Y Combinator CEO Garry Tan spotlighted Legora's announcement that it has raised $550 million in a Series D led by Accel at a $5.55 billion valuation. The post is a reminder that the AI funding market is still rewarding companies with a sharp vertical wedge and credible enterprise adoption. For legal AI specifically, it suggests the category is moving from experimentation into major-scale capital formation.
Google launch-week updates highlighted a broader Gemini push into productivity and retrieval. Logan Kilpatrick said the company is rolling out a new Gemini-powered Docs, Sheets, Slides, and Drive experience with AI Overviews, fully editable AI-generated slides, and new grounding sources that make document writing more context aware. Hours later, he also introduced Gemini Embedding 2 as a new multimodal embedding model spanning text, images, video, audio, and documents. Together the updates matter because they show Google tightening the loop between where users create work and the multimodal context systems that help AI understand it.
A small but clear cluster from automation platforms suggests the category is moving beyond basic app-to-app recipes and toward AI-native workflow infrastructure. Make announced new If-else and Merge modules for cleaner branching logic, n8n promoted builder sessions focused on webhooks, MCPs, subworkflows, and error handling, and Zapier framed itself as a hands-on partner for getting AI projects into production. Taken together, the posts matter because they show automation vendors converging on the same promise: helping teams operationalize agents and more complex AI workflows rather than just stitch together SaaS tools.
A post amplified by Paul Graham pointed to an Unsloth repository with more than 250 notebooks for LLM training and inference, including workflows for RL, vision, audio, embeddings, and TTS. The notable part is the accessibility claim: developers can follow the stack locally on roughly 3GB of VRAM or run it for free on Colab. That framing makes the release a useful signal that open-source training tooling is continuing to move downmarket toward solo builders and smaller teams.
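A back-of-envelope check shows why low-bit quantization makes a ~3GB VRAM budget plausible (the model sizes below are illustrative assumptions, not Unsloth's documented figures):

```python
# Rough weight-memory estimate for quantized models.
# All parameter counts are illustrative, not tied to any specific model.

def weight_gb(params_billion, bits_per_weight):
    """Approximate weight memory in decimal GB (ignores activations/KV cache)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

fp16 = weight_gb(3, 16)  # a 3B-parameter model at 16-bit precision
q4 = weight_gb(3, 4)     # the same model quantized to 4-bit
print(fp16, q4)
```

At 4-bit, a 3B-parameter model's weights drop from about 6 GB to about 1.5 GB, leaving headroom for activations inside a small VRAM budget.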
In the race to adopt and show value from AI, enterprises are moving faster than ever to deploy agentic AI as copilots, assistants, and autonomous task-runners. In late 2025, nearly two-thirds of companies were experimenting with AI agents, while 88% were using AI in at least one business function, up from 78% in 2024, according…
Pokémon Go was the world’s first augmented-reality megahit. Released in 2016 by the Google spinout Niantic, the AR twist on the juggernaut Pokémon franchise fast became a global phenomenon. From Chicago to Oslo to Enoshima, players hit the streets in the urgent hope of catching a Jigglypuff or a Squirtle or (with a huge amount…
Simon Willison shared poll results from 539 recent software job interviewees showing that 43% said experience with AI programming tools was required, 25% said it was optional, and only 32% said it did not come up at all. The finding was quickly echoed by Hugging Face cofounder Thom Wolf, who joked that applying for developer jobs without AI tool experience now looks like applying to be a telephone operator in 2026. The discussion matters because it suggests coding agents are shifting from a personal productivity edge to a concrete expectation in software hiring and review workflows.
Mira Murati said Thinking Machines is working with Nvidia to deploy at least 1 gigawatt of Vera Rubin systems, describing the effort as part of a push to bring adaptable collaborative AI to everyone. The post adds a direct founder-level confirmation to the company’s earlier infrastructure narrative and underscores how aggressively new AI labs are now signaling compute scale as a strategic moat.
Mira Murati / Thinking Machines, Mar 10, via @MiraMurati
Today we announced new beta features for Gemini in Sheets to help you create, organize and edit entire sheets, from basic tasks to complex data analysis — just describe …
Akshay Pachaar shared a workflow for running Claude Code against local models by pointing the tool at a llama.cpp server with the ANTHROPIC_BASE_URL environment variable, which removes API costs and keeps data on the user’s own machine. The idea stands out because it treats Claude Code less like a closed product and more like a reusable interface that can be swapped onto different backends. In the same batch, Nicolas Camara pitched browser infrastructure for agents through CDP and sandbox access, while Simon Kirane released an open-source “Make it Heavy” framework that recreates Grok Heavy-style behavior in the terminal. Together, the posts show agent tooling becoming more modular, hackable, and self-hosted.
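The mechanism behind the local-backend trick is just an environment-variable override. A minimal sketch, assuming a llama.cpp server is already listening locally (the port and launch command are assumptions, not from the post):

```python
import os
import subprocess

# Sketch of the redirection described above: Claude Code reads
# ANTHROPIC_BASE_URL, so pointing it at a local llama.cpp server
# keeps requests on the user's own machine. Port 8080 is an assumption.

env = dict(os.environ, ANTHROPIC_BASE_URL="http://localhost:8080")

# Actually launching the tool requires the claude CLI and a running
# llama.cpp server, so the call is left commented out in this sketch:
# subprocess.run(["claude"], env=env)

print(env["ANTHROPIC_BASE_URL"])
```

Because the override lives in the environment rather than the tool, the same pattern works per-shell or per-project without touching the CLI itself.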
Sara Hooker said Adaption AI is launching a research grant program that gives academic researchers around the world access to the company’s platform. The move stands out because compute and model access remain bottlenecks for many researchers, especially outside major AI centers, so grant programs can meaningfully shape who gets to experiment and publish. For Adaption, the announcement is also a distribution play: broader academic usage can turn into both technical feedback and long-term ecosystem influence.
Lambda Labs said it will appear at Nvidia GTC 2026 with booth demos built on Nvidia Blackwell architecture and an expert session covering Vera Rubin NVL72 and Nvidia GB300 NVL72. The preview is notable because it ties Lambda’s positioning directly to the next wave of high-end AI infrastructure that enterprises are watching for training and inference deployments. In practice, the post works as an early signal that GTC will again be a venue where infrastructure providers compete on access to Nvidia’s newest systems.
Security researcher Lukasz Olejnik said Amazon is holding a mandatory meeting about AI breaking internal systems after a briefing note described a trend of incidents with “high blast radius” caused by “Gen-AI assisted changes,” alongside incomplete best practices and safeguards. Gary Marcus quickly amplified the warning as evidence that reliability concerns around AI-assisted engineering are no longer theoretical. The post matters because it points to a new phase in the AI tooling story: companies are no longer just measuring how much agentic coding can accelerate delivery, but how much operational risk it can introduce when used across large production environments.
Marktechpost reported that ByteDance has released DeerFlow 2.0, an open-source “SuperAgent” framework designed to orchestrate sub-agents, memory systems, and sandboxes for more complex workflows. The release is notable because it reflects a broader shift in open source AI tooling away from single-agent chat interfaces and toward execution stacks built for multi-step autonomous work. For builders, it points to a more modular way to compose research, coding, and task automation systems.
Aravind Srinivas showcased two community-built examples of Perplexity Computer being used for practical consumer automation: one moved a Spotify playlist to YouTube Music from a single pasted URL, and another created a peer-to-peer file transfer app with direct encrypted transfer and no account requirement. The significance is not just the individual demos but the pattern they suggest. Computer-use agents are starting to look like a new layer for coordinating work across existing consumer apps and services, turning awkward manual flows into one-step tasks. That gives Perplexity Computer a clearer product identity as an automation surface rather than just a flashy demo environment.
Andrej Karpathy said an autoresearch setup he left tuning nanochat for about two days discovered around 20 changes that all improved validation loss, and that the gains transferred to larger models as well. The post matters because it frames agents not just as coding assistants but as iterative research workers that can propose, test, and compound model improvements with limited human supervision. In the same batch, Andrew Ng launched Context Hub to feed coding agents up-to-date documentation, while Guillermo Rauch pushed the idea that strong agents must also ship by bundling Vercel CLI into OB-1 sessions. The broader signal is that agentic software development is maturing into a loop of research, context retrieval, and deployment rather than a one-shot code generation task.
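The loop Karpathy describes can be caricatured as propose, evaluate, keep. A toy sketch with a synthetic objective standing in for validation loss (nothing here is nanochat; the hyperparameters and objective are invented):

```python
import random

# Toy autoresearch loop: propose a config tweak, evaluate it, and keep it
# only if "validation loss" improves. The objective is a synthetic quadratic
# minimized at lr=0.3, decay=0.1 -- a stand-in for a real training run.

def val_loss(cfg):
    return (cfg["lr"] - 0.3) ** 2 + (cfg["decay"] - 0.1) ** 2

def propose(cfg, rng):
    """Perturb one hyperparameter at random."""
    key = rng.choice(sorted(cfg))
    new = dict(cfg)
    new[key] += rng.uniform(-0.05, 0.05)
    return new

rng = random.Random(0)
cfg = {"lr": 0.5, "decay": 0.5}
best = val_loss(cfg)
accepted = 0
for _ in range(500):
    candidate = propose(cfg, rng)
    loss = val_loss(candidate)
    if loss < best:  # keep only strict improvements
        cfg, best, accepted = candidate, loss, accepted + 1

print(best, accepted)
```

Each accepted change compounds on the last, which is the property that makes unattended tuning runs accumulate dozens of small wins over a couple of days.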
Marktechpost said Andrew Ng’s team has released Context Hub, an open-source tool designed to give coding agents access to current API documentation instead of stale references. The project aims to reduce the “agent drift” problem that appears when assistants hallucinate parameters or rely on outdated docs during implementation. If it works as advertised, it could become a practical infrastructure layer for keeping AI coding tools accurate inside real developer workflows.
Yohei Nakajima argued that the idea blue-collar workers won't use AI is already outdated, pointing to multiple Facebook groups of blue-collar business owners where AI is now a frequent topic. He paired the observation with a joke about mistaking Plaud for "Claude for plumbers," but the underlying point was serious: AI tooling is leaking into practical, non-software workflows faster than many skeptics expect. The post adds another signal that real-world AI adoption is broadening well beyond developers and knowledge workers.
Boris Cherny introduced a new Claude Code feature called Code Review, saying Anthropic built it first for internal use after code output per engineer rose 200% this year and review became the bottleneck. The tool sends a team of agents to do a deep review on every pull request, and Cherny said it is already catching real bugs he would have missed. The launch matters because it shows Anthropic pushing Claude Code from a solo coding assistant toward a more complete software-engineering workflow where generation, verification, and team review all live inside the same system.
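The fan-out pattern behind "a team of agents" can be sketched with stub reviewers that each scan a diff for one class of problem, then merge their findings (the agents, diff, and findings below are invented for illustration and are not Claude Code's implementation):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch: fan a pull-request diff out to several specialized
# review "agents" in parallel and merge their findings into one report.
# The agents here are stub functions standing in for model calls.

DIFF = """\
+def transfer(amount, balance):
+    balance = balance - amount
+    return balance
"""

def security_agent(diff):
    return ["no input validation on amount"] if "amount" in diff else []

def style_agent(diff):
    return ["missing docstring"] if '"""' not in diff else []

def logic_agent(diff):
    return ["no check for negative balance"] if "balance" in diff else []

agents = [security_agent, style_agent, logic_agent]

with ThreadPoolExecutor() as pool:
    results = pool.map(lambda agent: agent(DIFF), agents)

findings = [finding for result in results for finding in result]
print(findings)
```

Running reviewers concurrently and unioning their outputs is what lets a per-PR review stay deep without becoming the new bottleneck.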
Vercel announced that Ship 26 is coming soon and said the event will run live in San Francisco, New York, London, Berlin, and Sydney. A follow-up post pushed early details and discounted ticket pricing, reinforcing that the company is preparing a coordinated global launch moment rather than a standard single-city conference. For developers and startup teams, the teaser suggests Vercel is gearing up to unveil a notable new wave of platform updates.
Amazon Science highlighted a discussion from Amazon Scholars and University of Pennsylvania professors Michael Kearns and Aaron Roth arguing that agentic AI tools will trigger a sea change in how research gets done. The post says the impact will span methodology, researcher training, and even peer review. That is notable because it frames AI not just as a lab assistant, but as a force that could alter the norms and institutional mechanics of science itself.
LeRobot announced that version 0.5.0 is officially live, describing it as the project’s biggest release so far with more than 200 merged pull requests and over 50 new contributors. The update matters because it frames open-source robotics as a fast-improving software stack rather than a collection of isolated research repos, giving builders a stronger shared base for work in both simulation and real-world deployment. In practice, the release is a signal that community-maintained robotics tooling is starting to scale like mainstream developer infrastructure.
Nvidia used its main X account to promote Jensen Huang’s March 16 GTC 2026 keynote, framing the event as the unveiling of the next chapter of AI. The teaser is notable because GTC has become one of the industry’s most important stages for new chips, systems, and AI infrastructure strategy. If Nvidia follows its usual playbook, the keynote will likely shape expectations well beyond its own product line, influencing cloud providers, model labs, and enterprise buyers planning their next wave of AI spending.
Abacus.AI CEO Bindu Reddy posted that GPT-5.4 Extra High now tops LiveBench by a healthy margin and said her team is rushing to incorporate it. The message is less about one benchmark score than what follows from it: model providers are still able to reset the competitive baseline overnight, and downstream product teams are adapting in real time. For anyone tracking the model race, the post is a clean snapshot of how fast rankings turn into shipping pressure.
OpenAI announced it is acquiring Promptfoo to strengthen agentic security testing and evaluation inside OpenAI Frontier, while Jason Liu amplified a separate OpenAI developer push around using Codex skills for open-source maintenance. On the same day, Liu also highlighted open-source maintainer credits and token leaderboard usage around the Agents SDK ecosystem. Taken together, the posts suggest OpenAI is building a fuller developer stack around coding agents: evals, security, and repeatable OSS workflows.
Figure posted a new Helix 02 demo showing its humanoid robot tidying a living room fully autonomously, with the main clip drawing roughly 8.6K likes, 1.6K reposts, 613 replies, and 1.7M views during the scrape window. A second post linked to a technical explainer covering the whole-body end-to-end cleanup workflow. Together, the posts frame home reset and cleanup as one of the clearest near-term product demos for household humanoid robots.
Scale AI announced Scale AI Labs, describing it as a new home for the company’s research across data, evaluation, safety, and post-training. The post matters because it positions Scale more explicitly as a research-facing player at a moment when frontier model progress increasingly depends on strong data pipelines, rigorous evals, and post-training techniques rather than raw model size alone. By packaging that work under a dedicated labs banner, Scale is signaling that it wants a larger public footprint in the technical debate around how advanced AI systems are improved and measured.
Runway unveiled Runway Characters, a new product that lets developers deploy real-time intelligent avatars with custom styles, knowledge banks, and conversational behavior through the Runway API. The launch was quickly reinforced by community demos, including examples of characters that could read a game screen, guide players to objectives, and identify real-world objects in context. The post matters because it shows creative AI companies moving beyond text and video generation into interactive agents that can participate in live experiences, opening a path toward AI-native interfaces for entertainment, education, and customer-facing software.