Chapter 2: Installing SONiC NOS

ONIE-Based SONiC Installation

Many switch vendors have added SONiC NOS support to at least part of their switch portfolio. Depending on the vendor and switch model, customers may be able to order a switch with a vendor-customized SONiC version that is supported at the same level as the vendor's own network operating system. Some vendors also allow customers to run the community-based SONiC distribution.

The support model for Community SONiC depends on the vendor. Some hardware vendors provide full support, while others provide no support at all. Compared with vendor-specific SONiC distributions, Community SONiC provides greater flexibility because it can be customized, rebuilt, and adapted to customer requirements. However, running a Community SONiC deployment without vendor support or in-house expertise is generally not a recommended operating model.

Community SONiC is typically installed by using ONIE (Open Network Install Environment) [1], a small open-source installation environment that provides a standardized method for installing network operating systems on supported switches. Figure 2-1 illustrates a conceptual ONIE-based Community SONiC installation process.

If the switch is delivered with a vendor-specific SONiC distribution already installed, it may boot directly into that operating system without requiring a separate ONIE installation workflow. For Community SONiC deployments, Continue reading

How we found a bug in the hyper HTTP library

The Images service, built in Rust on Workers, runs on every machine in Cloudflare’s edge network. To handle client connections, we use hyper, an open-source HTTP library for Rust.

Last year, we introduced the Images binding to enable custom, programmatic workflows for processing remote images in Workers. At the end of 2025, we rearchitected the binding to provide a more direct, local connection between the Workers runtime and the Images service.

Shortly after rollout, we received reports that transformation requests from the binding were failing — but only intermittently and only for larger images. Even stranger, the responses for these requests returned a 200 status without any errors logged. The image data was simply cut short: A response that should have been two megabytes might arrive with a few hundred kilobytes instead.

We spent six weeks chasing a nearly invisible bug — a race condition that occurred only under specific conditions — in the hyper library that impacted how the Images binding returned processed image data back to the client. In the end, it took four lines of code to fix it.
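The symptom described above, a 200 response whose body is shorter than expected, can be caught on the receiving side with a simple length check. The following is a minimal sketch, not the approach the authors used: it assumes the response carries a Content-Length header (the post does not say whether it did), and the helper names `declared_length` and `is_truncated` are hypothetical.

```python
from typing import Mapping, Optional


def declared_length(headers: Mapping[str, str]) -> Optional[int]:
    """Return the Content-Length value as an int, or None if absent or invalid."""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lowercase.
        if name.lower() == "content-length":
            try:
                return int(value.strip())
            except ValueError:
                return None
    return None


def is_truncated(headers: Mapping[str, str], body: bytes) -> bool:
    """Report whether fewer bytes arrived than the response declared.

    Assumption: the sender set Content-Length. Without it (for example
    with chunked transfer encoding) this check cannot tell, so it
    returns False rather than guessing.
    """
    expected = declared_length(headers)
    if expected is None:
        return False
    return len(body) < expected


if __name__ == "__main__":
    headers = {"Content-Type": "image/jpeg", "Content-Length": "2000000"}
    partial = b"\x00" * 300_000    # a few hundred kilobytes arrived
    full = b"\x00" * 2_000_000     # the full two megabytes arrived
    print(is_truncated(headers, partial))  # True
    print(is_truncated(headers, full))     # False
```

A check like this only flags the symptom. It says nothing about where in the chain of hops the bytes were lost, which is why a status code of 200 with no logged error can hide the problem.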

Hops, handoffs, and hyper

When developers build on Cloudflare, they compose full-stack applications from a set Continue reading

NB580: Project Glasswing on Hold – or Not; Why You Should Hold In-Person Background Checks

Take a Network Break! Our Red Alert covers critical vulnerabilities found in OpenClaw, the open-source AI assistant. On the news front, we discuss the status of Anthropic’s Project Glasswing and examine the Korean Electronics and Telecommunications Research Institute’s (ETRI) development of an intelligent, service-programmable mobile core network, a key enabling technology for the 6G era.... Read more »

Rebuilding My Proxmox Cluster for the Version 9 Upgrade

I've been running Proxmox for maybe two years now, and I'd still consider myself somewhat of a beginner. I set it up once using Proxmox version 8.x alongside Proxmox Backup Server (PBS) and pretty much forgot about it. I can spin up new VMs and CTs, or remove existing ones, and that's about it. Fast forward to mid-2026: Proxmox had released version 9, and I'd been meaning to upgrade. I went through a few guides and forum posts, and people tended to recommend backing up the VMs and CTs using PBS, then reinstalling Proxmox with the new version and restoring from backup.

TL;DR

I upgraded my Proxmox setup from version 8.x to 9 without doing an in-place upgrade. Instead, I used a spare node already on version 9 as a temporary home and rebuilt the other two nodes one at a time. The idea was to back up everything with PBS, restore onto the spare, then wipe and fresh install version 9 on each node before joining them into a new three-node cluster. Once the cluster was sorted, I also rebuilt pbs-01 to version 4 and cleaned up the old backups. The whole thing was seamless, I didn't lose anything, and Continue reading

Automating Palo Alto HA Firewall Upgrades with Ansible

In this blog post, we will cover upgrading Palo Alto firewalls in HA using Ansible. This only covers upgrading minor versions, so it won't work if you are going from 10.x to 11.x, for example. This also only supports an HA pair.

This playbook is based on the repo from Palo Alto itself. There are many playbooks there covering scenarios like upgrading the major version, upgrading the content, and so on, but we will only focus on one specific playbook for HA, which I tweaked a little bit to suit my own setup.

Manual Upgrade Process

Of course, this post assumes you already know how to upgrade the firewalls manually. In case you don't, here are the steps. Palo Alto also recommends upgrading the active unit first and then the passive. You download the image to the active unit and tick the box to sync it to the peer, then suspend the active unit to trigger a failover so the passive takes over. Install the image on the suspended unit, reboot it, and wait for it to come back online so the HA pair syncs again. Once it is back, suspend the current active unit (the original passive) Continue reading

TNO065: The Operational Reality of Modern Wireless Networks

Scott sits down with Wi-Fi engineer Eva Santos to explore the realities of modern wireless operations. Eva shares insights on navigating site surveys, the differences between Wi-Fi bands, and the challenges of troubleshooting inconsistent client performance. The conversation also explores the evolving standards of Wi-Fi 6, 7, and 8, the role of security protocols like... Read more »

Technology Short Take 197

Welcome to Technology Short Take 197! I’ve been traveling for business for the last week, so this Technology Short Take has a tad fewer links than I typically include. Even so, I still have links on radical new network designs, the impacts of AI on code security, things beginners get wrong about AWS IAM, and more! Let’s get into the content.

Networking

Security

Temporary Cloudflare Accounts for AI agents

Everyone's writing code with AI agents today. But the moment an agent needs to deploy something — and needs to sign up and create an account — it slams face-first into a wall built for humans: a browser-based OAuth flow, a dashboard to click through, an API token to copy-paste, a multi-factor authentication prompt to satisfy. For an interactive copilot sitting next to a developer, that's annoying. For a background agent, it's a hard stop.

Today we're rolling out Temporary Cloudflare Accounts for Agents.

Agents can now deploy websites, APIs, and agents right away, without first needing to sign up for an account.

Any agent can now run wrangler deploy --temporary and deploy a Worker to Cloudflare. This temporary deployment stays live for 60 minutes, during which time you can claim the temporary account, making it permanently your own. If you don't, it expires on its own.

Our goal? Let your agent code and ship.

Why frictionless deployments matter for AI agents

Frictionless temporary accounts matter more than they might first seem:

  • Background AI sessions have no human in the loop, and are becoming the norm. Any auth step that needs a browser, a copy-paste, or "click here Continue reading

Hedge 309: DNS Persist

As DNS is more widely used to distribute certificate information, proving ownership of a resource becomes more critical. The constant challenges required to prove resource ownership, however, increase delay in connecting to or using a resource. DNS Persist, as the name implies, creates a persistent connection between a resource and a certificate authority. Henry Birge-Lee, Michael Slaughter, and Shiloh Heurich join Russ and Tom to explain how this new record type works and its importance to DNS.

Build your own vulnerability harness

A few weeks ago, we published our initial findings from Project Glasswing, looking at what happens when you point frontier security models at an enterprise codebase. We also explored how our defensive structures adapt to protect our infrastructure and customers from threats posed by frontier AI. Since then, the AI ecosystem has continued to shift rapidly — developers who've built tightly around a single model have already experienced what happens when that model is no longer available or gets superseded by a more capable one. These market shifts only reinforce our core thesis: no matter which underlying model is leading the pack on any given day, the future of agentic workflows will not be found in standalone models, prompts, or single-agent sessions.

Moving from a localized security "skill" to a continuous, fleet-wide scanning pipeline requires an architecture where models are treated as interchangeable components. Relying on a single model inherently limits defensive coverage, as the same system will tend to look at code paths through the exact same lens. To counter this, models should be frequently interchanged and cross-tested. By varying the models across the pipeline — such as using one model for initial discovery and an entirely different Continue reading

How Lynx Works: A Technical Walkthrough

We launched Lynx this week. Instead of restating the pitch, I want to explain how it’s built and why we made the architectural choices we did. If you run Kubernetes and you’re starting to put AI agents on it, this is roughly the system you’d end up designing yourself.

Lynx is a control and data plane for all agentic AI traffic, providing a registry, gateway, audit, authentication with token exchange, policy enforcement, agent sandboxing, shadow agent discovery, and advanced AI capabilities such as a red team agent and a guardian supervising agent to keep your agents on track. Lynx is a single control point in the path of every agent call – agent-to-agent, agent-to-MCP, agent-to-LLM. Every call is authenticated, authorized against policy, and recorded, with no changes to agent code.

The constraints we started from

Four principles shaped the design:

  1. No agent code changes. Governance has to be applied by the platform, not adopted as a library. If it requires a code change, it won’t land uniformly – and uniformity is the entire point.
  2. No new database in the control plane. The source of truth is the Kubernetes API server and the data model is custom resources – there’s no separate datastore Continue reading

Celebrating 12 years of Project Galileo

Twelve years ago this month, Cloudflare launched an ambitious project built on a simple idea: people shouldn’t be knocked offline just because someone more powerful disagrees with them. Today, Project Galileo provides free access to cybersecurity services to more than 3,400 websites belonging to journalists, human rights defenders, and other nonprofit organizations in 120 countries. We continue to believe that a better Internet is one where anyone with an idea can reach a global audience. 

Each year on the anniversary of Project Galileo, we announce new products, programs, and strategic partnerships. To celebrate our 12th anniversary this year, we’re publishing our first comprehensive report on cyberattacks targeting civil society, releasing case studies that explore the security needs of 16 Project Galileo participants, and announcing new project partners.

Introducing a new annual report on cyberattacks against global civil society

Because Project Galileo now includes 3,400 domains belonging to organizations in over 120 countries, Cloudflare has access to unique data regarding the cyber threats, attacks, and trends targeting civil society — a critical pillar of global democracy. In addition, because the Cloudflare network spans more than 335 cities in 125 countries and more than 20% of the web sits behind it, Continue reading
