Cisco AUTOCOR Passed

Yesterday I took and passed the Cisco AUTOCOR (previously DEVCOR) exam, which is the core exam for CCNP Automation. That means I now need to pass a specialist exam to become CCNP Automation certified. It also means I’m qualified to sit the CCIE Automation lab.

What did I think of the exam?

As with any exam, there is good and bad. I’ll start with the good.

The exam aligned well with the blueprint. I didn’t feel there were any real surprises or questions on items that weren’t part of the blueprint.

There wasn’t a lot of trivia. No memorization of specific API endpoints or anything like that.

The exam experience was fine. I took it in a testing facility, which I prefer, and I had no issues. I was provided with earplugs, which helped me stay focused, although this is a small facility and there was only one other candidate.

I liked the different types of questions. You have your standard multiple choice, single answer and multiple choice, multiple answer, but also fill in the blanks and lablets. It’s nice that there is quite a bit of code in the exam; it is an automation exam after all. I also think it’s Continue reading

Our billing pipeline was suddenly slow. The culprit was a hidden bottleneck in ClickHouse

At Cloudflare, we are heavy users of ClickHouse, an open-source analytical database management system. We redesigned one of our largest ClickHouse tables to add a column to the partitioning key. The change enabled per-tenant retention on a table that serves hundreds of internal teams. The design went through several rounds of revision and review with engineers across multiple teams before we landed on the final approach. But a few weeks after rollout, the jobs that produce most of Cloudflare's bills were running up against their hard daily deadline.

All the usual suspects looked clean: I/O, memory, rows scanned, parts read. Everything we would normally check when a ClickHouse query is slow appeared to be normal. The problem turned out to be lock contention in query planning, something we'd never had reason to look for before.

This is the story of how this migration exposed a hidden bottleneck in ClickHouse's internals, and the patches we wrote to fix it.

The setup: a petabyte-scale analytics platform

We use ClickHouse to store over a hundred petabytes of data across a few dozen clusters. To simplify onboarding for our many internal teams, we built a system called "Ready-Analytics" in early 2022.

The premise is Continue reading

ARP with EVPN Asymmetric IRB

In a previous blog post, I described the ARP issues you’ll encounter when using centralized routing (on a spine switch) between two EVPN MAC-VRF instances (a fancy name for a VLAN encapsulated in VXLAN or MPLS).

That blog post established a baseline that will help us unravel the ARP behavior in a more realistic scenario: asymmetric Integrated Routing and Bridging (IRB). That’s a mouthful, but it’s really quite a simple concept; the following diagram explains the asymmetric forwarding behavior:

Packet forwarding in an EVPN asymmetric IRB design

OpenClaw Ruined AI and It Makes Me Happy

The biggest AI story of 2026 isn’t the growing need for electrical power or the ridiculous way the market sold out for RAM based on a letter of intent to acquire. No, the biggest AI story of the year so far is how a scrappy little project completely upset the AI apple cart. OpenClaw (nee ClaudeBot, nee OpenMolt) set the world on fire. And it destroyed how people were trying to direct AI. I’m sitting over here giggling about it.

Round The Clock

The basics of OpenClaw are simple enough. You have a system of agents that do things. It can read your texts or email and triage the flow of information. It can send you a text summary of the news or the weather every morning. But it can also be configured to monitor things as they arrive to deal with them on the fly. That’s where the real narrative shift has happened.

When you open a browser window to talk to an LLM you are creating a session that has a finite time limit. You are saying that you are going to work on a project for a specific period of time and that’s that. Once you complete Continue reading

Browser Run: now running on Cloudflare Containers, it’s faster and more scalable

We’ve enabled higher usage limits, faster performance, and better reliability for Browser Run by rebuilding on top of Cloudflare’s Containers.

You can now spin up 60 browsers per minute via the Workers binding and run up to 120 concurrently — 4x the previous limit. Also, Quick Action response times dropped more than 50%. You don't need to change anything: these improvements are live today. On top of that, we’re shipping fixes and new features faster than before. Read on to learn how we did it and see the data.

Remind me: what is Browser Run?

Browser Run enables developers to programmatically control and interact with headless browser instances running on Cloudflare’s global network. That’s useful for end-to-end testing of web applications, securely investigating suspicious URLs, and rendering PDF documents, amongst other quick actions like capturing screenshots and extracting content. More recently, it’s become a critical enabler for AI agents that interact with the web. We’re building Browser Run to be the go-to platform for running automated browsers responsibly and securely at massive scale.

Outgrowing our bunk bed

Before adopting Cloudflare Containers, we shared infrastructure with Browser Isolation (BISO). While technically similar, BISO’s larger container images slowed Continue reading

Meet NFA v26.02, featuring BGP visibility tools, extended threshold matching, and SNMP reporting enhancements.

We’re excited to announce the release of Noction Flow Analyzer v26.02. This version includes a focused set of improvements that enhance BGP visibility, expand threshold-monitoring options, improve flow-processing performance, and refine the SNMP reporting experience. This update builds on the foundation of v26.01 and introduces new tools for network engineers who rely on real-time routing intelligence and traffic analysis.

BGP diagnostics and visibility tools

The biggest addition in v26.02 is a complete set of BGP diagnostics and visibility tools. These give network administrators new insights into routing behavior directly within NFA. The new BGP diagnostics panel introduces ping and traceroute checks, allowing engineers to run connectivity and path diagnostics without leaving the NFA interface. Additionally, a BGP Data Lookup feature enables direct queries against NFA’s internal BGP tables, supporting exact-match and more-specific match modes for precise prefix investigations. Finally, BGP History Lookup provides access to historical route events, including key attributes such as prefix, next-hop, AS path, and more. This makes it easier to trace routing changes over time and connect them with traffic events.
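To make the difference between the two lookup modes concrete, here is a minimal Python sketch using only the standard library. It is not NFA's implementation or API: the route list, the function names, and the choice to treat "more-specific" as strictly longer prefixes inside the queried prefix are all assumptions made for illustration.

```python
import ipaddress

# Hypothetical BGP table contents, used only for this illustration.
ROUTES = ["10.0.0.0/8", "10.1.0.0/16", "10.1.2.0/24", "192.0.2.0/24"]


def exact_match(query, routes):
    """Return routes whose prefix and length equal the queried prefix."""
    q = ipaddress.ip_network(query)
    return [r for r in routes if ipaddress.ip_network(r) == q]


def more_specifics(query, routes):
    """Return routes that fall inside the queried prefix with a longer mask.

    Assumption: the queried prefix itself is excluded from the result.
    """
    q = ipaddress.ip_network(query)
    matches = []
    for r in routes:
        n = ipaddress.ip_network(r)
        # subnet_of() raises TypeError across IP versions, so check first.
        if n.version == q.version and n != q and n.subnet_of(q):
            matches.append(r)
    return matches


if __name__ == "__main__":
    print(exact_match("10.1.0.0/16", ROUTES))
    print(more_specifics("10.0.0.0/8", ROUTES))
```

An exact-match query answers "is this precise prefix in the table?", while a more-specific query is the one to reach for when hunting for deaggregated or leaked routes under a covering prefix.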

Pytest for Automated Network Testing (II)

In part one, we covered the basics of pytest and wrote our first network tests. We tested BGP and OSPF on a single device, then extended it to multiple devices. We also looked at parametrization and how it helps treat each device and each neighbour as an independent test.

In this part, we will cover inventory management with Nornir and pytest fixtures.

Nornir Introduction

Nornir is a Python automation framework designed for network engineers. Instead of writing your own logic to connect to devices, manage inventory, and run tasks in parallel, Nornir handles all of that for you. We have a dedicated series on Nornir, which you can check out here, so we are not going to do a deep dive in this post.

The reason we are using Nornir here is for inventory and task management. Instead of hardcoding a list of IP addresses in our collection file, we define our devices in a hosts file with groups, credentials, and Continue reading

When “idle” isn’t idle: how a Linux kernel optimization became a QUIC bug

CUBIC, standardized in RFC 9438, is the default congestion controller in Linux, and as a result governs how most TCP and QUIC connections on the public Internet probe for available bandwidth, back off when they detect loss, and recover afterward. At Cloudflare, our open-source implementation of QUIC, quiche, uses CUBIC as its default congestion controller, meaning this code is in the critical path for a significant share of the traffic we serve.

In this post, we’ll tell the story of a bug in which CUBIC's congestion window (cwnd) gets permanently pinned at its minimum and never recovers from a congestion collapse event.

The story starts with a Linux kernel change aimed at bringing CUBIC into line with the app-limited exclusion described in RFC 9438 §4.2-12 — a fix to a real problem in TCP that, when ported to our QUIC implementation, surfaced unexpected behaviors in quiche. It has a happy ending: an elegant (near-)one-line fix that broke the cycle.

CUBIC's logic in a nutshell

Before we dive into the core problem, a quick refresher on Congestion Control Algorithms (CCAs) may help to set the stage.
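As a reference point for that refresher, here is a small Python sketch of the window growth function from RFC 9438, with the window measured in segments and time in seconds. It is not quiche's or the Linux kernel's code: the function names are hypothetical, and the clamp in `cubic_k` is an assumption added so the sketch stays well-defined if the epoch starts above `w_max`. The constants C = 0.4 and beta = 0.7 are the values the RFC specifies.

```python
C = 0.4           # RFC 9438 scaling constant
BETA_CUBIC = 0.7  # RFC 9438 multiplicative decrease factor


def cwnd_after_loss(w_max):
    """Window right after a congestion event: reduced to beta * W_max."""
    return BETA_CUBIC * w_max


def cubic_k(w_max, cwnd_epoch):
    """Seconds needed to grow from cwnd_epoch back up to w_max.

    Assumption: the difference is clamped at zero so the cube root
    is always taken of a non-negative number.
    """
    return (max(w_max - cwnd_epoch, 0.0) / C) ** (1.0 / 3.0)


def w_cubic(t, w_max, cwnd_epoch):
    """Target window t seconds into the current congestion avoidance epoch."""
    k = cubic_k(w_max, cwnd_epoch)
    return C * (t - k) ** 3 + w_max


if __name__ == "__main__":
    w_max = 100.0
    start = cwnd_after_loss(w_max)
    for t in range(0, 9, 2):
        print(t, round(w_cubic(float(t), w_max, start), 2))
```

The curve is concave while the window approaches `w_max`, flattens around it, and turns convex once it starts probing beyond it. That shape depends on elapsed time `t` within the epoch, which is why how idle and app-limited periods are accounted for matters so much.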

The central knob a CCA turns is the congestion window (cwnd Continue reading

Reorganized ipSpace.net Segment Routing Resources

I created nine sample SR-MPLS topologies for the ITNOG 10 SR-MPLS workshop, and of course, we ran out of time. I plan to cover those topologies and resulting printouts in a series of blog posts; to prepare for those, I cleaned up and reorganized the Segment Routing blog category, which is now split into two:

Hope you’ll find them useful! Also, if you know of other non-vendor Segment Routing resources, please leave a comment, email me, or submit a pull request.

Quantum safe amateur radio secure shell

I’ve previously pointed out that the AX.25 implementation in the kernel is pretty poor. It’s not really being maintained, and even when it gets a fix after I report a bug, with people running LTS OSs it can take like 5 years before the fix actually reaches users, if ever. So when writing applications, you still have to work around kernel bugs from a decade ago. This makes it kind of pointless to upstream patches.

The exception is security patches, and reading between the lines of why the AX.25 code is now being removed from the kernel, it sounds like maybe some LLM (like the looming “Mythos” and the related Glasswing) may have found some severe problems. But even if there aren’t any known security problems yet, having code is now more of a liability than ever. Code needs to be removed, or someone needs to take responsibility for it. (tangent about ffmpeg at the bottom of this post)

With the kernel code removed, say goodbye to the old walkthrough.

The new API

Well, not “new”, per se, but “replacement”.

With the socket based API about to be gone, we need some other way for applications to send packets and Continue reading
