Asked “why is my app returning 500s?”, an agent runs a full incident triage: it checks pod health, recent Kubernetes events, error logs, and deployment rollout history. The checks are composed across multiple execute calls, and the agent reasons about each result before deciding what to check next.
The triage flow
This isn’t a single code block. It’s how the agent thinks. Each step is one execute call, but the agent decides what to check based on what it finds.
Step 1: Pod health check
async () => {
  const clusterId = "cls_abc123"; // resolved by the agent from conversation
  const namespace = "production"; // resolved by the agent from conversation
  // Small helper: GET a Kubernetes API path through the cluster's kube proxy
  const kube = (path) => cnap.request({
    method: "GET",
    path: `/v1/clusters/${clusterId}/kube_proxy/${path}`,
  }).then(r => r.body);
  const pods = await kube(`api/v1/namespaces/${namespace}/pods`);
  // Return a compact per-pod summary instead of the raw pod objects
  return pods.items.map(p => ({
    name: p.metadata.name,
    phase: p.status.phase,
    restarts: p.status.containerStatuses?.reduce((s, c) => s + c.restartCount, 0) || 0,
    ready: p.status.containerStatuses?.every(c => c.ready) || false,
    containers: p.status.containerStatuses?.map(c => ({
      name: c.name,
      ready: c.ready,
      restarts: c.restartCount,
      state: Object.keys(c.state || {})[0],
      reason: c.state?.waiting?.reason || c.state?.terminated?.reason || null,
    })),
  }));
}
The agent sees a pod in CrashLoopBackOff with 12 restarts. It decides to check events and logs.
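The decision above can be pictured as a filter over the step 1 summary. This sketch is illustrative only: in practice the agent reaches this conclusion by reasoning over the result, not by running fixed code. `findUnhealthyPods`, the restart threshold, and the sample data are hypothetical; the field names match what the step 1 function returns.

```javascript
// Hypothetical helper: pick pods worth investigating from the step 1 summary.
// The restart threshold is an arbitrary assumption for illustration.
function findUnhealthyPods(pods, restartThreshold = 5) {
  return pods
    .filter(p => !p.ready || p.restarts >= restartThreshold)
    .map(p => ({
      name: p.name,
      restarts: p.restarts,
      // Collect distinct non-null container reasons, e.g. "CrashLoopBackOff".
      reasons: [...new Set((p.containers || []).map(c => c.reason).filter(Boolean))],
    }))
    // Most-restarted pods first.
    .sort((a, b) => b.restarts - a.restarts);
}

// Sample data shaped like the step 1 output (names and values are made up).
const samplePods = [
  {
    name: "pod-a", phase: "Running", restarts: 0, ready: true,
    containers: [{ name: "web", ready: true, restarts: 0, state: "running", reason: null }],
  },
  {
    name: "pod-b", phase: "Running", restarts: 12, ready: false,
    containers: [{ name: "api", ready: false, restarts: 12, state: "waiting", reason: "CrashLoopBackOff" }],
  },
];

const unhealthy = findUnhealthyPods(samplePods);
console.log(unhealthy);
```

Only the pod that is not ready (or that exceeds the threshold) survives the filter, which is the one the agent would carry into the next step.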
Step 2: Recent events
async () => {
  const clusterId = "cls_abc123"; // resolved by the agent from conversation
  const namespace = "production"; // from step 1
  const podName = "api-proxy-7f8b4c..."; // from step 1 results
  const kube = (path) => cnap.request({
    method: "GET",
    path: `/v1/clusters/${clusterId}/kube_proxy/${path}`,
  }).then(r => r.body);
  // Only fetch events that involve the failing pod
  const events = await kube(
    `api/v1/namespaces/${namespace}/events?fieldSelector=involvedObject.name=${podName}`
  );
  // Sort by last timestamp, return most recent
  return events.items
    .sort((a, b) => new Date(b.lastTimestamp) - new Date(a.lastTimestamp))
    .slice(0, 15)
    .map(e => ({
      type: e.type,
      reason: e.reason,
      message: e.message,
      count: e.count,
      last: e.lastTimestamp,
    }));
}
Events show OOMKilled. The container ran out of memory. The agent checks logs to confirm.
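When a pod has produced many events, collapsing them by reason makes the dominant signal easier to spot. The following is a minimal sketch, assuming input shaped like the step 2 output; `summarizeEvents` and the sample events are hypothetical and not part of the platform.

```javascript
// Hypothetical helper: aggregate step 2 events by reason.
function summarizeEvents(events) {
  const byReason = new Map();
  for (const e of events) {
    const entry = byReason.get(e.reason) || { reason: e.reason, type: e.type, total: 0, last: null };
    // `count` can be missing on some events, so treat a missing value as 1.
    entry.total += e.count || 1;
    // Keep the most recent timestamp seen for this reason.
    if (!entry.last || new Date(e.last) > new Date(entry.last)) entry.last = e.last;
    byReason.set(e.reason, entry);
  }
  // Warnings first, then by descending total count.
  const rank = t => (t === "Warning" ? 0 : 1);
  return [...byReason.values()].sort(
    (a, b) => rank(a.type) - rank(b.type) || b.total - a.total
  );
}

// Sample events shaped like the step 2 output (values are made up).
const sampleEvents = [
  { type: "Normal", reason: "Pulled", message: "sample", count: 13, last: "2024-01-01T10:05:00Z" },
  { type: "Warning", reason: "BackOff", message: "sample", count: 40, last: "2024-01-01T10:06:00Z" },
  { type: "Warning", reason: "BackOff", message: "sample", count: 2, last: "2024-01-01T09:00:00Z" },
];

const eventSummary = summarizeEvents(sampleEvents);
console.log(eventSummary);
```

Returning one row per reason keeps the result small, in the same spirit as the per-step filtering shown in the triage functions.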
Step 3: Error logs
async () => {
  const clusterId = "cls_abc123"; // resolved by the agent from conversation
  const namespace = "production"; // from step 1
  const podName = "api-proxy-7f8b4c..."; // from step 1 results
  // previous: "true" asks for the logs of the previous (crashed) container instance
  const logs = await cnap.request({
    method: "GET",
    path: `/v1/clusters/${clusterId}/kube_proxy/api/v1/namespaces/${namespace}/pods/${podName}/log`,
    query: { tailLines: "200", previous: "true" },
  }).then(r => r.body);
  // Filter for errors and warnings
  const lines = logs.split("\n");
  const errors = lines.filter(l =>
    /error|fatal|panic|exception|oom|killed/i.test(l)
  );
  return {
    total_lines: lines.length,
    error_lines: errors.length,
    errors: errors.slice(-20),
  };
}
Note the previous: "true" query parameter: it requests logs from the previous, crashed instance of the container rather than from the instance that is currently restarting. The agent finds memory allocation failures in the last 20 error lines.
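Beyond the filtered lines, a per-keyword tally can show at a glance which failure mode dominates. This is a sketch under the assumption that logs arrive as a single newline-separated string, as in step 3; `tallyLogKeywords` and the sample log are hypothetical.

```javascript
// Hypothetical helper: count how many log lines mention each keyword.
// Matching is a case-insensitive substring test, like the regex in step 3,
// so short keywords such as "oom" can also match unrelated words.
function tallyLogKeywords(
  logText,
  keywords = ["error", "fatal", "panic", "exception", "oom", "killed"]
) {
  const counts = Object.fromEntries(keywords.map(k => [k, 0]));
  for (const line of logText.split("\n")) {
    const lower = line.toLowerCase();
    for (const k of keywords) {
      if (lower.includes(k)) counts[k] += 1;
    }
  }
  return counts;
}

// Made-up log excerpt for illustration.
const sampleLog = [
  "INFO starting server",
  "ERROR failed to allocate buffer",
  "FATAL out of memory",
  "INFO request ok",
].join("\n");

const tally = tallyLogKeywords(sampleLog);
console.log(tally);
```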
Step 4: Deployment rollout history
async () => {
  const clusterId = "cls_abc123"; // resolved by the agent from conversation
  const namespace = "production"; // from step 1
  const deploymentName = "api-proxy"; // from step 1 results
  const kube = (path) => cnap.request({
    method: "GET",
    path: `/v1/clusters/${clusterId}/kube_proxy/${path}`,
  }).then(r => r.body);
  // Fetch the deployment and the namespace's ReplicaSets in parallel
  const [deployment, replicaSets] = await Promise.all([
    kube(`apis/apps/v1/namespaces/${namespace}/deployments/${deploymentName}`),
    kube(`apis/apps/v1/namespaces/${namespace}/replicasets`),
  ]);
  // Find ReplicaSets owned by this deployment, newest revision first
  const owned = replicaSets.items
    .filter(rs => rs.metadata.ownerReferences?.some(o => o.name === deploymentName))
    .sort((a, b) => parseInt(b.metadata.annotations?.["deployment.kubernetes.io/revision"] || "0")
      - parseInt(a.metadata.annotations?.["deployment.kubernetes.io/revision"] || "0"));
  return {
    current_image: deployment.spec.template.spec.containers[0]?.image,
    current_limits: deployment.spec.template.spec.containers[0]?.resources?.limits,
    revisions: owned.slice(0, 5).map(rs => ({
      revision: rs.metadata.annotations?.["deployment.kubernetes.io/revision"],
      image: rs.spec.template.spec.containers[0]?.image,
      replicas: rs.status.replicas,
      created: rs.metadata.creationTimestamp,
    })),
  };
}
The agent finds that the latest revision changed the image and also removed the memory limits. Root cause identified.
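Spotting what changed between revisions amounts to a field-by-field comparison of the two newest entries. The sketch below assumes each revision summary also carries a `limits` field, which the step 4 function does not return as written (it reports limits only for the current deployment spec). `diffRevisions`, the image names, and the sample revisions are hypothetical.

```javascript
// Hypothetical helper: list the differences between two revision summaries.
// Assumes each summary has `image` and an optional `limits` object.
function diffRevisions(newer, older) {
  const changes = [];
  if (newer.image !== older.image) {
    changes.push({ field: "image", from: older.image, to: newer.image });
  }
  // Compare every limit key present on either side; a missing value becomes null.
  const keys = new Set([
    ...Object.keys(older.limits || {}),
    ...Object.keys(newer.limits || {}),
  ]);
  for (const key of keys) {
    const from = older.limits?.[key] ?? null;
    const to = newer.limits?.[key] ?? null;
    if (from !== to) changes.push({ field: `limits.${key}`, from, to });
  }
  return changes;
}

// Made-up revisions, newest first, as in the step 4 `revisions` array.
const sampleRevisions = [
  { revision: "8", image: "example/app:2.0", limits: undefined },
  { revision: "7", image: "example/app:1.9", limits: { memory: "512Mi", cpu: "500m" } },
];

const changes = diffRevisions(sampleRevisions[0], sampleRevisions[1]);
console.log(changes);
```

A limit that goes from a value to null is the kind of change described above: the newer revision dropped a limit that the older one set.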
Why this matters
An SRE manually doing this would:
kubectl get pods to check status
kubectl describe pod to read events
kubectl logs --previous to check crash logs
kubectl rollout history to check what changed
That’s 4 separate commands with raw output they need to mentally parse. The agent does it in 4 execute calls, but each one filters and extracts only what’s relevant. The LLM reasons about structured findings, not walls of YAML.
More importantly, the agent adapts. It doesn’t run a fixed checklist. It sees OOMKilled and decides to check previous container logs and deployment history. A traditional MCP tool would need a pre-built “debug pod” tool that tries to anticipate every scenario.
Kubernetes access: The kube proxy and exec endpoints used in each triage step.
Security audit: Proactive security checks before incidents occur.
Parallel log analysis: Fetch and count logs across all pods in a single call.
Hosted agents: Ambient agents that start triage automatically on deploy failures.