Kubernetes Operational Maturity: Secure and Resilient Cluster Federation with Cluster Mesh

Practically no one runs a single Kubernetes cluster in production these days. Maybe that’s how it started, but data sovereignty requirements, acquisitions, AI initiatives, and the need for edge servers, among other considerations, have pulled most enterprises into multi-cluster territory whether they planned for it or not. Reaching Kubernetes operational maturity (the point at which a fleet of clusters operates as one secure, observable, policy-consistent system) depends entirely on how those clusters are connected. Operating in a multi-cluster environment has become the unspoken standard, one that requires a careful re-evaluation of the network architectures used to link clusters together.

That re-evaluation rarely happens. Most enterprises connect their clusters with the same networking patterns they were using before Kubernetes existed: load balancers fronting internal services, DNS records published to external zones, and IP-based firewall rules. Those patterns were built for north-south traffic moving in and out of a traditional data center perimeter, not for east-west traffic moving between internal workloads.

Running east-west traffic on north-south plumbing

The conventional way to make services in one cluster reachable from another is to expose them externally: a load balancer in front, a DNS name registered in a public zone, and a firewall rule allowing traffic in. Continue reading

SONiC Part III: SONiC Introduction

SONiC is a vendor-neutral, Linux-based network operating system (NOS) that uses a database-driven architecture. Its software components run in multiple containers and exchange information through Redis. In SONiC, several named databases are defined for different functions, and these databases are mapped to Redis logical database IDs. Through this design, configuration data, application state, operational state, and ASIC-related state move between software layers by means of specialized processes.

Different hardware vendors may add their own platform integrations, transceiver support, monitoring utilities, or management workflows. However, the core SONiC architecture remains the same. This is one of the main reasons why SONiC knowledge, troubleshooting methods, and automation practices are transferable across different hardware platforms.

Vendor neutrality does not mean that every SONiC-based implementation behaves exactly the same in every operational detail. It means that different implementations follow the same architectural model. To organize information clearly, SONiC defines several named databases, each of which is mapped to a Redis logical database ID:

·       CONFIG_DB (Redis DB 4): Stores the user’s intended configuration.

·       APPL_DB (Redis DB 0): Stores application-level objects that are ready for processing by lower software layers.

·       STATE_DB (Redis DB 6): Stores operational state information about system Continue reading
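
The name-to-ID mapping listed above can be captured in a small lookup table. What follows is a minimal Python sketch, not SONiC code: it includes only the three database IDs quoted above, and the helper name `redis_cli_command` is hypothetical. It relies on the standard `redis-cli -n <id>` option, which selects a logical database, and only builds the command without running it.

```python
# Named SONiC databases and their Redis logical database IDs,
# limited to the three entries listed in the text above.
SONIC_DB_IDS = {
    "APPL_DB": 0,
    "CONFIG_DB": 4,
    "STATE_DB": 6,
}


def redis_cli_command(db_name: str, pattern: str = "*") -> list[str]:
    """Build a redis-cli argument list that lists keys in a named database.

    Hypothetical helper for illustration; it builds the command only.
    """
    db_id = SONIC_DB_IDS[db_name]  # raises KeyError for unknown names
    return ["redis-cli", "-n", str(db_id), "KEYS", pattern]


if __name__ == "__main__":
    # Show the command that would list every key in CONFIG_DB.
    print(" ".join(redis_cli_command("CONFIG_DB")))
```

Keeping the mapping in one place means scripts can refer to databases by name and stay readable even when the numeric IDs are not memorable.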

Scaling Akvorado BMP RIB with sharding

To associate routing information, such as AS paths or BGP communities, with flows, Akvorado can import routes through the BGP Monitoring Protocol (BMP). As the Internet routing table contains more than 1 million routes, Akvorado needs to scale to tens of millions of routes. This has been a long-standing challenge, but I expect this issue is now fixed by using RIB sharding, a method that splits the routing database into several parts to enable concurrent updates.
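
As a rough illustration of the idea, the sketch below splits a routing table into shards, each guarded by its own lock, so updates to prefixes in different shards do not block each other. It is a minimal Python sketch and not Akvorado's implementation: the class `ShardedRib`, the shard count, the shard-selection rule (CRC32 of the prefix string), and the plain dictionaries standing in for a prefix tree are all assumptions made for illustration. Lookups are exact-prefix only; longest-prefix matching is out of scope here.

```python
import threading
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    peer: int
    next_hop: str
    as_path: tuple[int, ...]


class ShardedRib:
    """Routing table split into shards, each with its own lock (illustrative)."""

    def __init__(self, shard_count: int = 4) -> None:
        self._locks = [threading.Lock() for _ in range(shard_count)]
        # One dict per shard: prefix -> {peer: Route}. Stands in for a prefix tree.
        self._shards: list[dict[str, dict[int, Route]]] = [
            {} for _ in range(shard_count)
        ]

    def _shard_index(self, prefix: str) -> int:
        # Assumed shard-selection rule: a stable hash of the prefix string.
        return zlib.crc32(prefix.encode()) % len(self._shards)

    def add_route(self, prefix: str, route: Route) -> None:
        idx = self._shard_index(prefix)
        with self._locks[idx]:  # only this shard is blocked during the update
            self._shards[idx].setdefault(prefix, {})[route.peer] = route

    def remove_peer(self, peer: int) -> int:
        """Drop all routes from a peer, one shard at a time; return the count."""
        removed = 0
        for idx, shard in enumerate(self._shards):
            with self._locks[idx]:
                for prefix in list(shard):
                    if shard[prefix].pop(peer, None) is not None:
                        removed += 1
                    if not shard[prefix]:
                        del shard[prefix]
        return removed

    def routes(self, prefix: str) -> list[Route]:
        idx = self._shard_index(prefix)
        with self._locks[idx]:
            return list(self._shards[idx].get(prefix, {}).values())


if __name__ == "__main__":
    rib = ShardedRib()
    # Example values from documentation address and AS number ranges.
    rib.add_route("2001:db8:1::/48", Route(3, "2001:db8::3:1", (65402,)))
    rib.add_route("2001:db8:1::/48", Route(5, "2001:db8::5:1", (65401, 65402)))
    print(rib.routes("2001:db8:1::/48"))
```

With a single lock, removing every route from a peer that goes down would block all other updates for the duration of the scan. In this sketch the same operation holds only one shard's lock at a time.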

Previous implementation

Akvorado connects two elements to build its RIB:

  1. a prefix tree, and
  2. a list of routes attached to each prefix.

Figure: Akvorado BMP RIB implementation without sharding, showing the memory layout of each structure. One single read/write lock.

In the diagram above, the RIB stores five IPv4 prefixes and two IPv6 prefixes. One of them, 2001:db8:1::/48, contains three routes:

  • from peer 3, next hop 2001:db8::3:1, AS 65402, AS path 65402, community 65402:31,
  • from peer 4, next hop 2001:db8::4:1, same ASN, AS path, and community,
  • from peer 5, next hop 2001:db8::5:1, AS 65402, AS path 65401 65402 Continue reading

The Five Pillars of AI Agent Accountability: A Diagnostic Framework for Engineering Leaders

You’re in a board meeting. The CISO is presenting on AI risk. The CFO asks a simple question:

“When that finance agent we deployed last quarter accessed a customer payment record, can we tell who authorized it, what policy permitted it, and produce the full audit trail?”

The CISO looks at the head of the platform. The head of the platform looks at security. Nobody answers.

If you can picture that meeting happening at your company, you’re not alone. McKinsey found that only one-third of organizations have AI agent governance maturity at level 3 or higher. The other two-thirds are exactly the silence in that boardroom.

This post is the diagnostic framework that closes that gap. It’s part 2 of a five-part series on AI agent accountability, and if you only have time to read one post in the series, read this one. By the end you’ll have a five-question assessment to run with your team this week, and a maturity model to score where you stand today.

Not all governance equals AI agent accountability. Many enterprises believe they’re covered because they have network policies or an API gateway, but governance without accountability is security theater: it Continue reading

HN828: How Selector Unifies Cloud and On-Prem Network Observability (Sponsored)

Selector is extending its AI-driven network observability capabilities into public clouds. On today’s sponsored episode, we dig into how Selector gathers and analyzes public cloud network telemetry, how it integrates cloud and on-prem network data to provide end-to-end visibility, how it integrates with third-party Application Performance Monitoring (APM) systems to correlate network and application performance,... Read more »

Hedge 306: RPKI Transport

Synchronizing information across the Internet, at an initial glance, looks like a fairly simple problem to solve. Just copy a file to a host and create a magic protocol, right? Not really. Each kind of data has a fairly unique set of requirements–and RPKI data, used to provide security information for BGP, is no different. Job Snijders joins Tom and Russ to talk about ERIK, a protocol developed to synchronize RPKI records.
 
For more information, check out Job’s web site and the IETF draft.

Technology Short Take 196

Welcome to Technology Short Take 196! Just in time for the US Memorial Day holiday, I am back with another list of articles related to various data center technologies like networking, security, operating systems, and applications. You will find articles on VPNs, Linux local privilege escalation (LPE) vulnerabilities, browser quirks and workarounds, the death of Terraform (again), and so much more. Enjoy your weekend reading!

Networking

Servers/Hardware

Security

Continue reading

Public Videos: OpenFlow Deep Dive

Remember OpenFlow, the One Protocol to Bind Them All? I haven’t heard anyone even mention it in ages, and I never bothered to ask whether anyone is still using it after the dismal results of the 2022 poll.

Anyway, if you still have to deal with that ancient blunder, six hours of deep dive videos I recorded a decade ago might still be useful. You can watch them without an ipSpace.net account.

Looking for more binge-watching materials? You’ll find them here.

Announcing Claude Compliance API support with Cloudflare CASB

Today, we are extending Cloudflare’s cloud access security broker (CASB) to support the Claude Compliance API. Security and compliance teams can now monitor Claude usage directly in the Cloudflare dashboard. No endpoint agents required.

Enterprise security teams have long struggled to see how users interact with sanctioned and unsanctioned applications. The rapid adoption of AI applications has made this harder. Employees spend significant time in these new surface areas, and their interactions differ from traditional SaaS: users upload files, share freeform prompts, and providers generate content that may contain sensitive data.

Cloudflare CASB helps solve this problem. One API integration gives you out-of-band visibility and control over the applications your organization uses. This integration builds on our existing support for AI governance, extending coverage over the most common tools security teams now manage. 

The fast path to safe AI adoption

AI adoption has outpaced security governance. While IT and security teams raced to enable AI tools for productivity, the controls lagged behind. Most organizations today operate with partial visibility: they may block unauthorized AI tools at the network layer, but they cannot see what happens inside sanctioned ones.

This matters because AI tools are not like traditional SaaS Continue reading

Cisco Launches Major Updates to Certifications

Cisco just announced major updates to their certification portfolio. Here’s what’s changing:

  • CCNA v2.0
  • CCIE practical exam AI DOO module
  • CCIE automation v1.2

CCNA v2.0

Effective February 2027, the CCNA is getting a major update. The future networking administrator/engineer will be more of an orchestrator than an operator, meaning that punching commands on the CLI will be only a small part of the job role. Instead, you must be able to design, secure, and optimize increasingly autonomous networks. To be job-ready, you’ll need to learn how to:

  • Troubleshoot production issues under pressure
  • Evaluate what an AI assistant recommends and know when it’s wrong
  • Secure an environment by design, not as an afterthought

The CCNA is about to get a whole lot more practical! Here’s what’s changing:

Troubleshooting gets a front seat. Employers value troubleshooting over reciting commands. Every domain will include diagnostics and problem resolution. Think of the old TSHOOT CCNP exam, but instead of a separate exam, this is the format of the CCNA now. I’m really excited about this!

Security everywhere. We can no longer afford to think of security only as a separate domain; it needs to be part of everything we do. The new exam Continue reading

BGP and Multi-Cloud Routing

For years, enterprise cloud networking was built around a simple assumption: pick a primary cloud provider, connect the data center to it, and expand from there.

That model no longer reflects how many organizations actually operate.

Today, workloads often live across AWS, Azure, and Google Cloud at the same time. Sometimes this is intentional. Sometimes it is the result of acquisitions, separate engineering teams, SaaS dependencies, regional requirements, or SaaS platforms that depend on a specific cloud provider. Either way, the network has to make these environments behave like one reliable system.

That is where the hard part begins.

Cloud-native routing tools are useful inside each provider, but they do not automatically solve routing between providers, between clouds and colocation hubs, or between multiple cloud environments and an enterprise WAN. Once routing needs to become dynamic, policy-driven, and resilient across administrative boundaries, BGP becomes the common language.

BGP is not new, and it is not always simple. But in multi-cloud networking, it remains one of the few mechanisms that AWS, Azure, Google Cloud, carriers, colocation providers, SD-WAN platforms, and enterprise routers can all understand.

What inter-cloud routing actually means

The term “inter-cloud routing” is often used loosely, so it is Continue reading

NAN123: How ION Meets the Out-of-this-World Challenges of Deep-Space Networking

Eric Chou and guest host Drew Conry-Murray sit down with deep space networking specialist Scott Spicer. Following the Artemis 2 mission, they discuss the challenges of long-delay space communications and the essential technologies making it possible, such as the Interplanetary Overlay Network (ION), Delay-Tolerant Networking (DTN), and Contact Graph Routing (CGR). AdSpot Sponsor: Meter Meter... Read more »

My Network is Talking Back Thanks to SuzieQ MCP and it’s Channeling Sam Kinison

Last Updated: 2025-05-19

Every SuzieQ Enterprise release quietly adds things that end up being genuinely useful in day-to-day network operations. Version 3.3 has had a few updates already. The GUI has seen a lot of attention. The new workbench makes it even easier to get at your data without jumping around, and you can now READ MORE

The post My Network is Talking Back Thanks to SuzieQ MCP and it’s Channeling Sam Kinison appeared first on The Gratuitous Arp.

Worth Reading: Agentic AI Setup: Sandboxes and Worktrees

Most of the hyperventilated AI “success stories” are as useful as the “ANSIBLE!!!” movement was a few years ago. It’s thus always a pleasure to find someone with well-established software development chops who took the time to describe what works for them.

One cannot argue with Mike McQuaid’s credentials (at least if you happen to be using homebrew on MacOS, which you REALLY SHOULD), and his Sandboxes and Worktrees: My secure Agentic AI Setup in 2026 article is full of relevant recommendations in case you’re brave enough to let AI agents loose on your GitHub repository.
