Authentication

Why Authentication Needed an Ontology Class

v3.5 kernels had no concept of user identity. Every NATS message was anonymous. The web shell (when it arrived) had no way to distinguish users or restrict actions. This created a fundamental gap: the access: auth field on actions had no enforcement mechanism.

The v3.6 answer is not to bolt on auth as middleware. It is to make authentication a first-class ontological concept -- declared in the same ontology.yaml that declares kernel types, actions, and edges. If auth is not in the ontology, it does not exist in the cluster.

Three Authentication Levels

CKP supports three authentication levels over NATS: anonymous (anon), authenticated (auth), and owner (owner). A single authentication model works identically for browsers (WSS) and server-side kernels (TCP), eliminating the class of bugs where "it works in dev but not in production" because the auth path differs.

| Access Level | NATS Mechanism | CKP Mapping | Typical Use |
|---|---|---|---|
| anon | No auth required on NATS connection | grants.identity: anon actions | Status checks, public queries, read-only browsing |
| auth | JWT token in NATS connection credentials or message headers | grants.identity: auth actions | Data input, tool invocation, session participation |
| owner | JWT with kernel-owner claim | grants.identity: owner actions | Configuration, teardown, grant management |

The three-level model maps directly to the grants block in conceptkernel.yaml, creating a single source of truth for access control. See Namespace Security -- Grants Block for the full grants schema.

Topic ACL Rules per Auth Level

Access level determines which NATS topics a client can publish to and subscribe to. These ACLs are enforced by the NATS server configuration, not by kernel code.

| Level | Can Publish To | Can Subscribe To |
|---|---|---|
| anon | input.{Kernel} | result.{Kernel}, event.{Kernel} |
| auth | input.{Kernel}, admin.{Kernel} | result.{Kernel}, event.{Kernel}, metrics.{Kernel}, stream.{Kernel} |
| owner | All kernel topics | All kernel topics |
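
The ACL table above can be modelled as a small lookup. This is only an illustrative sketch: in a real deployment the rules are enforced by the NATS server configuration, and the `TOPIC_ACL` structure and `is_allowed` helper are hypothetical names introduced here, not part of CKP.

```python
# Topic ACLs per access level, transcribed from the table above.
# "*" stands for "all kernel topics" (owner level).
TOPIC_ACL = {
    "anon": {"publish": {"input"}, "subscribe": {"result", "event"}},
    "auth": {
        "publish": {"input", "admin"},
        "subscribe": {"result", "event", "metrics", "stream"},
    },
    "owner": {"publish": {"*"}, "subscribe": {"*"}},
}


def is_allowed(level: str, operation: str, topic: str) -> bool:
    """Return True if `level` may `publish`/`subscribe` on a topic like 'input.Hello.Greeter'."""
    prefix = topic.split(".", 1)[0]  # 'input', 'result', 'admin', ...
    allowed = TOPIC_ACL[level][operation]
    return "*" in allowed or prefix in allowed
```

For example, `is_allowed("anon", "publish", "admin.Hello.Greeter")` is false, while the same call with `"owner"` is true.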

Why Anon Can Publish to Input

Anonymous users must be able to invoke actions that the kernel has explicitly granted to anon (e.g., status). The grants block controls which actions are permitted, not the transport layer. Allowing anon to publish to input while restricting the action set via grants provides the right separation of concerns.
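
A minimal sketch of the action-level check this separation implies. It assumes a flattened, hypothetical form of the grants block (action name mapped to the lowest identity level allowed to invoke it) and assumes that higher levels inherit lower-level actions; neither detail is confirmed by this page, and the real schema is described under Namespace Security.

```python
# Hypothetical in-memory form of a grants block: action -> minimum identity level.
# Assumption: levels are ordered anon < auth < owner and higher levels inherit.
LEVEL_RANK = {"anon": 0, "auth": 1, "owner": 2}


def action_permitted(grants: dict, action: str, caller_level: str) -> bool:
    """Decide whether a caller at `caller_level` may invoke `action`."""
    required = grants.get(action)
    if required is None:
        return False  # actions not declared in grants are rejected
    return LEVEL_RANK[caller_level] >= LEVEL_RANK[required]
```

With `grants = {"status": "anon", "submit": "auth"}`, an anonymous message on input.{Kernel} reaches the kernel in both cases, but only `status` is dispatched.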

Kernel Type Authentication Requirements

| Kernel Type | Connection Auth | Message Auth | Notes |
|---|---|---|---|
| node:hot | SPIFFE JWT-SVID | SPIFFE JWT-SVID | Always connected, always authenticated |
| node:cold | SPIFFE JWT-SVID | SPIFFE JWT-SVID | Authenticated on startup, connection held for session |
| agent | SPIFFE JWT-SVID | SPIFFE JWT-SVID | Long-lived connection with streaming |
| inline | WSS (anon or Keycloak) | Keycloak JWT in headers | Browser client, per-message auth |
| static | None | None | No NATS connection |

SPIFFE JWT-SVID Integration

Server-side kernel-to-kernel communication uses SPIFFE JWT-SVIDs as NATS connection credentials. Every Concept Kernel is a SPIFFE workload with a stable identity assigned at mint time.

python
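import nats  # nats-py client

async def connect_kernel(spire_agent, domain, ck_class, guid):
    # Caller obtains JWT-SVID from SPIRE agent:
    spiffe_jwt = spire_agent.fetch_jwt_svid(audience=f'nats.{domain}')

    # NATS connection (SPIFFE ID as user, JWT-SVID as password).
    # `ck_class` stands for {class}, which is a reserved word in Python.
    return await nats.connect(f'nats://nats.{domain}:4222',
                              user=f'spiffe://{domain}/ck/{ck_class}/{guid}',
                              password=spiffe_jwt)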

Subject-level ACLs are derived from the grants block:

  • publish: ck.{own-guid}.* is always allowed (own topics).
  • subscribe: ck.{other-guid}.* is allowed only if a grant exists and the requested action matches.
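
The two rules above can be sketched as plain predicates. This assumes subjects have the shape `ck.{guid}.{action}` and that grants are available as a mapping from target GUID to the set of granted actions; both are simplifications made for illustration, and subscription to a kernel's own topics is not covered by the two rules, so it is left out.

```python
from typing import Dict, Set


def may_publish(own_guid: str, subject: str) -> bool:
    """Publishing is always allowed on the kernel's own ck.{own-guid}.* subjects."""
    return subject.startswith(f"ck.{own_guid}.")


def may_subscribe(subject: str, grants: Dict[str, Set[str]]) -> bool:
    """Subscribing to ck.{other-guid}.{action} needs a grant covering that action."""
    parts = subject.split(".")
    if len(parts) < 3 or parts[0] != "ck":
        return False
    target_guid, action = parts[1], parts[2]
    return action in grants.get(target_guid, set())
```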

SPIRE Certificate Lifecycle

SPIRE handles the entire certificate lifecycle automatically:

  1. SVID issued at kernel startup from local SPIRE agent.
  2. SVID valid for 1 hour (configurable TTL).
  3. SPIRE rotates SVID automatically before expiry.
  4. No kernel code manages certificates.

Why This Matters

SPIFFE provides workload identity without shared secrets. Each kernel gets a cryptographically verifiable identity at mint time. SPIRE manages the full lifecycle -- issuance, rotation, revocation -- so kernel code never touches certificates. The SPIFFE ID (spiffe://{domain}/ck/{class}/{guid}) maps directly to the grants block, creating a unified identity model from Kubernetes namespace to NATS topic ACL.

Anonymous-to-Authenticated Escalation

Browser clients MAY connect anonymously and escalate to authenticated mid-session without NATS reconnection. This is a critical UX feature: a user can browse public kernel status anonymously, then log in to invoke authenticated actions without losing their NATS connection.

The escalation flow:

  1. Client connects to NATS WSS without credentials (anonymous).
  2. Client publishes to input.{Kernel} -- only anon-granted actions are permitted by the grants block.
  3. Client authenticates via Keycloak (or equivalent OIDC provider) and obtains a JWT.
  4. Client includes Authorization: Bearer {jwt} in subsequent NATS message headers.
  5. NatsKernelLoop verifies the JWT and grants auth-level access for that message.

Token refresh occurs transparently. The client refreshes via the identity provider and includes the new token in the next message's headers. No NATS reconnection is required because authentication is per-message (via headers), not per-connection.
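
Because authentication travels in message headers, escalation and refresh reduce to changing which headers accompany the next message. The sketch below illustrates that idea only; `Session` and `message_headers` are hypothetical names, not the ck-client.js API.

```python
from typing import Dict, Optional


def message_headers(jwt: Optional[str]) -> Dict[str, str]:
    """Headers for one NATS message: empty while anonymous, Bearer token once logged in."""
    if not jwt:
        return {}
    return {"Authorization": f"Bearer {jwt}"}


class Session:
    """Tracks the current token; the NATS connection itself is never touched."""

    def __init__(self) -> None:
        self.jwt: Optional[str] = None  # starts anonymous

    def login(self, jwt: str) -> None:
        self.jwt = jwt  # escalation: later messages carry the token

    def refresh(self, new_jwt: str) -> None:
        self.jwt = new_jwt  # refresh: the next message carries the new token

    def headers(self) -> Dict[str, str]:
        return message_headers(self.jwt)
```

With a client such as nats-py, the resulting dict would be passed as the `headers` argument of each publish call on the same connection.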

JWT Claims Used by CKP

| Claim | Usage |
|---|---|
| preferred_username | Mapped to X-User-ID for audit and identity |
| sub | Unique subject identifier |
| aud | Audience -- MUST match the kernel's auth.client_id |
| exp | Expiration timestamp -- MUST be checked |
| realm_access.roles | Used for owner level determination |
| azp | Authorised party -- the client_id that obtained the token |
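
A hedged sketch of how these claims might be checked once the token signature has already been verified against the realm's JWKS (signature verification is deliberately out of scope here). The `check_claims` helper and the `owner_role` name are assumptions made for illustration; this page does not state which realm role marks an owner.

```python
import time
from typing import Any, Dict, Optional


def check_claims(
    claims: Dict[str, Any],
    client_id: str,
    owner_role: str = "owner",  # hypothetical role name
    now: Optional[float] = None,
) -> Dict[str, str]:
    """Validate aud and exp, then derive the access level and X-User-ID."""
    now = time.time() if now is None else now

    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if client_id not in audiences:
        raise ValueError("aud does not match auth.client_id")

    exp = claims.get("exp")
    if exp is None or exp <= now:
        raise ValueError("token expired or exp missing")

    roles = claims.get("realm_access", {}).get("roles", [])
    level = "owner" if owner_role in roles else "auth"
    return {"X-User-ID": claims.get("preferred_username", ""), "level": level}
```

The `aud` claim is handled as either a string or a list, since JWTs permit both forms.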

AuthConfig Schema

Every CK.Project instance MAY declare an auth block in its project declaration. If omitted, the project operates in anonymous-only mode -- all actions with access: auth are unreachable.

yaml
auth:
  provider: keycloak       # keycloak | none
  instance: keycloak-name  # Keycloak CR on cluster
  realm: realm-name        # Keycloak realm
  client_id: ck-web        # OIDC public client
  issuer_url: https://id.example.com/realms/realm-name
  create_realm: false      # true = operator creates realm
  redirect_uris: []        # required if create_realm
  web_origins: []          # required if create_realm

The schema is deliberately minimal. CKP is not an auth framework -- it delegates to Keycloak for the actual OIDC machinery. What CKP controls is the declaration (what auth exists) and the provisioning (how auth reaches the cluster).
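
The constraints on this block (required fields for the keycloak provider, and non-empty URI lists when `create_realm` is true) can be sketched as a small validator. The function name and error strings are illustrative assumptions, not operator code.

```python
from typing import Any, Dict, List


def validate_auth(auth: Dict[str, Any]) -> List[str]:
    """Return a list of problems with an auth block; an empty list means valid."""
    provider = auth.get("provider", "none")
    if provider == "none":
        return []  # anonymous-only mode, nothing to validate
    if provider != "keycloak":
        return [f"unknown provider: {provider}"]

    errors: List[str] = []
    for field in ("realm", "client_id", "issuer_url"):
        if not auth.get(field):
            errors.append(f"{field} is required when provider is keycloak")

    if auth.get("create_realm", False):
        # Both lists are marked "required if create_realm" in the schema comments.
        for field in ("redirect_uris", "web_origins"):
            if not auth.get(field):
                errors.append(f"{field} must not be empty when create_realm is true")
    return errors
```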

Two Modes

The create_realm flag governs whether the operator is a consumer or creator of Keycloak infrastructure:

Mode 1: Reuse existing realm (default, create_realm: false)

The project attaches to a Keycloak realm that already exists. The operator verifies the OIDC endpoint, injects the issuer URL into deployments, and moves on. Zero Keycloak write permissions required.

This works when the existing realm has a wildcard redirect URI (e.g., https://*.tech.games/*) that covers the new project's hostname. The reference deployment uses this: delvinator.tech.games reuses the techgames realm.

Mode 2: Create own realm (create_realm: true)

The operator generates a KeycloakRealmImport CR and applies it to the cluster. The Keycloak operator provisions the realm, client, and cryptographic key provider. The new realm is specific to this project.

The reference deployment demonstrates this: hello.tech.games creates its own hello realm because it runs in a different namespace and needs its own redirect URIs.

Why Two Modes?

The reuse/create split reflects a real operational trade-off:

| Concern | Reuse | Create |
|---|---|---|
| Keycloak write permissions | None needed | keycloakrealmimports: get, list, create |
| Shared user base | Yes -- same realm means same users | No -- separate realm, separate users |
| Operator simplicity | Just verify endpoint | Generate CR, wait for provisioning |
| Teardown | Nothing to clean up | Realm retained (identity outlives compute) |

The operator MUST NOT modify or delete existing realms. It can birth realms but never destroy them. This is a deliberate asymmetry: identity is more permanent than compute.

deploy.auth -- The Reconciliation Step

deploy.auth is a step in the CK.Operator reconciliation lifecycle. It executes between deploy.routing and deploy.endpoint:

deploy.namespace    -- create/verify project namespace
deploy.storage.ck   -- create CK loop PV (ReadOnlyMany)
deploy.storage.data -- create DATA loop PV (ReadWriteMany)
deploy.processors   -- create Deployments, Services
deploy.web          -- create web server Deployment
deploy.routing      -- create HTTPRoute
deploy.auth         -- provision auth (THIS STEP)
deploy.endpoint     -- verify external endpoint HTTP 200

Step-by-Step Execution

The deploy.auth step is idempotent -- running it multiple times on the same project produces the same result.

| Step | Action | Failure Mode |
|---|---|---|
| 1 | Read auth block from project declaration | If missing or provider: none: skip remaining steps |
| 2 | If create_realm: true: create KeycloakRealmImport CR | Skip if CR already exists (idempotent) |
| 3 | Inject KEYCLOAK_ISSUER env var into processor deployments | Deployment update |
| 4 | Inject KEYCLOAK_CLIENT_ID env var into processor deployments | Deployment update |
| 5 | Inject auth config into web index.html ConfigMap | ConfigMap update |
| 6 | Verify: OIDC discovery endpoint returns HTTP 200 | Deploy blocks until reachable |
| 7 | Verify: JWKS endpoint returns HTTP 200 with keys | Deploy blocks until reachable |
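
The branching in these steps can be sketched as a pure planning function. It only decides which actions would run; the step labels and the `realm_import_exists` flag are illustrative assumptions rather than CK.Operator internals.

```python
from typing import Any, Dict, List, Optional


def plan_auth_steps(
    auth: Optional[Dict[str, Any]], realm_import_exists: bool = False
) -> List[str]:
    """List the deploy.auth actions that would run for a given auth block."""
    # Step 1: a missing block or provider: none skips everything else.
    if not auth or auth.get("provider", "none") == "none":
        return []

    steps: List[str] = []
    # Step 2: create the CR only when requested and not already present.
    if auth.get("create_realm", False) and not realm_import_exists:
        steps.append("create KeycloakRealmImport")

    # Steps 3-7 apply in both modes.
    steps += [
        "inject KEYCLOAK_ISSUER",
        "inject KEYCLOAK_CLIENT_ID",
        "inject web auth config",
        "verify OIDC discovery",
        "verify JWKS",
    ]
    return steps
```

Calling it a second time with `realm_import_exists=True` drops the create action and leaves the rest of the plan unchanged, which mirrors the idempotency described above.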

KeycloakRealmImport Generation

When create_realm: true, the operator generates a full Keycloak realm import:

yaml
apiVersion: k8s.keycloak.org/v2alpha1
kind: KeycloakRealmImport
metadata:
  name: {subdomain}-realm
  namespace: keycloak-operator
  labels:
    conceptkernel.org/project: {hostname}
spec:
  keycloakCRName: {instance}
  realm:
    realm: {realm}
    displayName: "{subdomain} (CKP)"
    enabled: true
    clients:
      - clientId: {client_id}
        publicClient: true
        standardFlowEnabled: true
        directAccessGrantsEnabled: true
        redirectUris: {redirect_uris}
        webOrigins: {web_origins}
        defaultClientScopes: [openid, profile, email]
        protocolMappers:
          - name: audience
            protocol: openid-connect
            protocolMapper: oidc-audience-mapper
            config:
              access.token.claim: "true"
              id.token.claim: "true"
              included.client.audience: {client_id}
    components:
      org.keycloak.keys.KeyProvider:
        - name: eddsa-key-provider
          providerId: eddsa-generated
          config:
            active: ["true"]
            algorithm: ["EdDSA"]
            enabled: ["true"]
            priority: ["200"]

Key design decisions:

  • Public client (publicClient: true) -- no client secret needed. The web shell is a browser SPA; it cannot keep secrets.
  • Direct access grants -- enables the password grant flow that ck-client.js uses from the browser.
  • EdDSA key provider -- Ed25519 signatures. Faster and shorter than RSA. Priority 200 overrides the default RSA provider.
  • Audience mapper -- ensures the JWT aud claim contains the client ID, which processors validate.
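
As an illustration of how the template above might be filled from an auth block, the sketch below builds the same structure as a plain Python dict. The function name and its `subdomain` and `hostname` parameters are assumptions; how the operator actually renders the CR is not shown on this page.

```python
from typing import Any, Dict


def realm_import_manifest(
    auth: Dict[str, Any], subdomain: str, hostname: str
) -> Dict[str, Any]:
    """Build the KeycloakRealmImport shown above as a plain dict."""
    client_id = auth["client_id"]
    audience_mapper = {
        "name": "audience",
        "protocol": "openid-connect",
        "protocolMapper": "oidc-audience-mapper",
        "config": {
            "access.token.claim": "true",
            "id.token.claim": "true",
            "included.client.audience": client_id,
        },
    }
    client = {
        "clientId": client_id,
        "publicClient": True,  # browser SPA, no client secret
        "standardFlowEnabled": True,
        "directAccessGrantsEnabled": True,
        "redirectUris": list(auth.get("redirect_uris", [])),
        "webOrigins": list(auth.get("web_origins", [])),
        "defaultClientScopes": ["openid", "profile", "email"],
        "protocolMappers": [audience_mapper],
    }
    eddsa_provider = {
        "name": "eddsa-key-provider",
        "providerId": "eddsa-generated",
        "config": {
            "active": ["true"],
            "algorithm": ["EdDSA"],
            "enabled": ["true"],
            "priority": ["200"],
        },
    }
    return {
        "apiVersion": "k8s.keycloak.org/v2alpha1",
        "kind": "KeycloakRealmImport",
        "metadata": {
            "name": f"{subdomain}-realm",
            "namespace": "keycloak-operator",
            "labels": {"conceptkernel.org/project": hostname},
        },
        "spec": {
            "keycloakCRName": auth["instance"],
            "realm": {
                "realm": auth["realm"],
                "displayName": f"{subdomain} (CKP)",
                "enabled": True,
                "clients": [client],
                "components": {"org.keycloak.keys.KeyProvider": [eddsa_provider]},
            },
        },
    }
```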

Verification

Auth verification adds checks to the proof chain. All checks must pass before the deploy is marked ready.

| Check | Method | Expected |
|---|---|---|
| oidc_discovery | curl {issuer_url}/.well-known/openid-configuration | HTTP 200 |
| jwks_reachable | curl {issuer_url}/protocol/openid-connect/certs | HTTP 200 + keys array present |
| env_injected | Inspect processor deployment env vars | KEYCLOAK_ISSUER and KEYCLOAK_CLIENT_ID present |
| web_config_injected | Inspect web ConfigMap | Auth config present in window.__CK_CONFIG |

The first two checks (oidc_discovery and jwks_reachable) bring the proof chain's total from 13 checks (v3.5.2) to 15 (v3.5.5+). The latter two (env_injected and web_config_injected) are internal consistency checks.

Verification Order Matters

oidc_discovery runs before jwks_reachable. If OIDC discovery fails, the JWKS check is skipped -- there is no point validating keys if the issuer endpoint is unreachable. This is the same halt-on-failure principle used throughout the proof chain.
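
The ordering can be sketched as follows. The two URLs are the ones listed in the verification table; the `verify_auth` and `http_get` helpers, the injectable `fetch` parameter, and the "pass"/"fail"/"skipped" labels are assumptions made so the logic can be exercised without a live Keycloak.

```python
import json
from typing import Callable, Dict, Tuple
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

Fetch = Callable[[str], Tuple[int, bytes]]


def http_get(url: str) -> Tuple[int, bytes]:
    """Fetch a URL and return (status, body); status 0 means unreachable."""
    try:
        with urlopen(url, timeout=10) as resp:
            return resp.status, resp.read()
    except HTTPError as err:
        return err.code, b""
    except URLError:
        return 0, b""


def verify_auth(issuer_url: str, fetch: Fetch = http_get) -> Dict[str, str]:
    """Run oidc_discovery first; skip jwks_reachable if discovery fails."""
    status, _ = fetch(f"{issuer_url}/.well-known/openid-configuration")
    if status != 200:
        return {"oidc_discovery": "fail", "jwks_reachable": "skipped"}

    status, body = fetch(f"{issuer_url}/protocol/openid-connect/certs")
    try:
        keys = json.loads(body).get("keys") if status == 200 else None
    except (ValueError, AttributeError):
        keys = None  # body was not a JSON object
    return {
        "oidc_discovery": "pass",
        "jwks_reachable": "pass" if keys else "fail",
    }
```

Passing a stub `fetch` makes the halt-on-failure behaviour easy to check in isolation.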

Teardown Semantics

Auth resources follow a clear lifecycle rule: identity outlives compute. Users who authenticated against a realm retain their identity even after all kernel compute is removed.

| Resource Type | Teardown Action | Rationale |
|---|---|---|
| Deployments | Deleted | Compute is ephemeral |
| Services | Deleted | Routing follows compute |
| ConfigMaps | Deleted | Configuration follows compute |
| HTTPRoutes | Deleted | Routing follows compute |
| ConceptKernel CRs | Deleted | Logical representation of running kernels |
| PersistentVolumes | Retained | Data is the kernel's accumulated knowledge |
| KeycloakRealmImport | Retained | Identity outlives compute |
| Keycloak instance | Untouched | Shared across projects |
| Namespace | Retained | Anchors PVCs and realm |
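
The table can be read as a per-kind policy. The sketch below encodes it that way; the kind strings, the `to_delete` helper, and the choice to retain unknown kinds by default are assumptions for illustration, not documented operator behaviour.

```python
from typing import Dict, List, Tuple

# Teardown policy per resource kind, transcribed from the table above.
TEARDOWN_ACTION: Dict[str, str] = {
    "Deployment": "delete",
    "Service": "delete",
    "ConfigMap": "delete",
    "HTTPRoute": "delete",
    "ConceptKernel": "delete",
    "PersistentVolume": "retain",
    "KeycloakRealmImport": "retain",
    "Keycloak": "untouched",
    "Namespace": "retain",
}


def to_delete(resources: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Filter (kind, name) pairs down to those removed on teardown."""
    # Assumption: kinds missing from the policy are retained, the safer default.
    return [
        (kind, name)
        for kind, name in resources
        if TEARDOWN_ACTION.get(kind, "retain") == "delete"
    ]
```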

All operator-created resources carry a conceptkernel.org/project label, enabling cross-namespace inventory:

bash
kubectl get pv,keycloakrealmimport -A -l conceptkernel.org/project=hello.tech.games

RBAC for Realm Creation

CK.Operator requires additional RBAC for create_realm: true mode:

yaml
- apiGroups: [k8s.keycloak.org]
  resources: [keycloakrealmimports]
  verbs: [get, list, create]

Note the deliberate absence of patch, update, and delete. The operator can birth realms but MUST NOT modify or destroy existing ones. This is enforced at the Kubernetes RBAC level -- not by convention.

Why No Update or Delete

Granting update or delete on realms would allow the operator to destroy user accounts or change security settings. The protocol deliberately limits the operator to create -- it can bring identity into existence but cannot alter it after creation. This matches the broader CKP principle that compute is ephemeral but identity persists. Manual Keycloak administration is required for realm modification or deletion.
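
A small lint-style check can express this constraint, for instance in a CI step that inspects the operator's Role before it is applied. The helper below is a hypothetical sketch; treating `deletecollection` and the `*` wildcard as forbidden is an added assumption that follows from the same intent.

```python
from typing import Any, Dict

# Verbs that would let the operator alter or destroy an existing realm.
FORBIDDEN_VERBS = {"patch", "update", "delete", "deletecollection", "*"}


def realm_rule_is_create_only(rule: Dict[str, Any]) -> bool:
    """True if an RBAC rule grants no mutating verbs on keycloakrealmimports."""
    if "keycloakrealmimports" not in rule.get("resources", []):
        return True  # rule does not concern realm imports
    return not (set(rule.get("verbs", [])) & FORBIDDEN_VERBS)
```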

Multi-Project Auth: The Hello.Greeter Test

v3.5.7 deployed Hello.Greeter to hello.tech.games as a proof that auth works across multiple projects with different realm strategies:

| Project | Hostname | Realm | Mode | Namespace |
|---|---|---|---|---|
| Delvinator | delvinator.tech.games | techgames | Reuse | ck-delvinator |
| Hello | hello.tech.games | hello | Create | ck-hello |

Both projects:

  • Pass 15/15 verification checks (including auth)
  • Show working login in the web shell
  • Have independent namespace isolation

The hello realm was created by the operator via KeycloakRealmImport. The techgames realm was pre-existing. Both produce valid JWTs that processors can verify.

Architectural Consistency Check

Logical Analysis: Auth and the Three-Loop Model

Question: Does auth belong in the CK loop, the TOOL loop, or the DATA loop?

Answer: Auth config is declared in the CK.Project ontology -- this is CK loop territory (TBox). The auth provider, realm, and client ID are identity declarations, not runtime state. They change at design time, not at runtime.

However, JWTs themselves behave like DATA loop artifacts. A JWT is an instance: it has a creation time, an expiry, claims, and a signature. It is produced by Keycloak (an external process) and consumed by kernel processors. The JWT is not physically stored in the DATA loop (it lives in browser memory), but it follows the same pattern: a runtime artifact governed by a design-time schema.

Question: Why not embed auth directly in NatsKernelLoop?

Answer: Because not all kernels need auth. A LOCAL.* kernel running on a developer machine has no Keycloak. An AUTONOMOUS kernel with SPIFFE mTLS uses a different identity model entirely. Auth is a concern of the project, not the kernel. The kernel sees env vars (KEYCLOAK_ISSUER, KEYCLOAK_CLIENT_ID) injected by the operator -- it does not know how those values got there.

Gap identified: The current auth model covers browser-to-kernel authentication (JWT via NATS headers). It does NOT cover kernel-to-kernel authentication, which requires SPIFFE SVIDs. This is acknowledged in the v3.5 spec (Chapter 16) and deferred to a future SPIFFE integration milestone.

Conformance Requirements

NATS Authentication (Chapter 17)

| Criterion | Level |
|---|---|
| Server-side kernels MUST use SPIFFE JWT-SVID for NATS auth | REQUIRED |
| Browser clients MUST connect via WSS | REQUIRED |
| Anonymous-to-authenticated escalation MUST be supported without reconnection | REQUIRED |
| JWT MUST be verified server-side before handler dispatch | REQUIRED |
| Token refresh MUST NOT require NATS reconnection | REQUIRED |
| NATS topic ACLs MUST be enforced by the NATS server | REQUIRED |

AuthConfig and Provisioning (Chapter 18)

| Criterion | Level |
|---|---|
| CK.Operator MUST inject auth env vars when auth is declared | REQUIRED |
| CK.Operator MUST verify OIDC discovery endpoint before marking deploy ready | REQUIRED |
| CK.Operator MUST verify JWKS endpoint before marking deploy ready | REQUIRED |
| CK.Operator MUST NOT modify existing Keycloak realms | REQUIRED |
| CK.Operator MUST NOT delete KeycloakRealmImport on teardown | REQUIRED |
| CK.Operator MUST label all auth resources with conceptkernel.org/project | REQUIRED |
| If auth.provider is keycloak, realm, client_id, issuer_url MUST be present | REQUIRED |
| If create_realm: true, redirect_uris MUST contain at least one entry | REQUIRED |
| Auth config (issuer, realm, client_id) MUST be injected by the operator, not hardcoded in kernel code | REQUIRED |

See also: Namespace Security for grants enforcement and ODRL projection, Loop Isolation for volume-level security, NATS Messaging for topic conventions and transport details, Message Envelope for JWT verification in the NatsKernelLoop processing cycle.

Released under the MIT License.