
📝 Author

Birat Aryal - birataryal.github.io
Created Date: 2026-03-28
Updated Date: Saturday 28th March 2026 15:35:14
Website - birataryal.com.np
Repository - Birat Aryal
LinkedIn - Birat Aryal
DevSecOps Engineer | System Engineer | Cyber Security Analyst | Network Engineer


System Level Troubleshooting

Q1. You see a production server with 100% CPU. Walk through your diagnostic approach.

Answer: Start broad, narrow down, then act. Never guess and restart — always understand the root cause first.

Bash
# Step 1: Immediate overview
top -b -n1 | head -20          # snapshot — which PID is consuming CPU?
uptime                          # load average trend (1m, 5m, 15m)

# Step 2: Identify the offending process
top -b -n1 -o %CPU | head -15
pidstat -u 2 5                  # per-process CPU, 5 samples at 2s intervals

# Step 3: What is that process doing?
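timeout 10 strace -c -f -p <PID>          # syscall profile; the summary prints when strace detaches after 10s
                                          # strace slows the traced process, so keep the window short on production
perf top -p <PID>                         # hottest functions, user space and kernel (needs perf)
ls /proc/<PID>/fd | wc -l                 # open file descriptor count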

# Step 4: Is it CPU-bound or I/O wait?
vmstat 1 5
# If wa (iowait) > 20%: storage problem, not pure CPU
# If us (user) near 100%: application code
# If sy (system/kernel) elevated: kernel issue, syscall storm

# Step 5: Memory pressure contributing?
free -h
grep -E 'MemFree|MemAvailable|^Cached|SwapTotal|SwapFree|Dirty' /proc/meminfo   # swap in use = SwapTotal - SwapFree

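# Step 6: If it is a Java process, get a thread dump
jstack <PID> > /tmp/threaddump.txt
kill -3 <PID>                   # JVM only: SIGQUIT makes the JVM print a thread dump to its own stdout
                                # on other processes SIGQUIT normally terminates the process
# For .NET, dotnet-stack report -p <PID> prints managed thread stacks (requires the dotnet-stack tool)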

# Step 7: Network causing CPU load?
ss -s                           # socket summary
cat /proc/net/softnet_stat      # per-CPU softirq counters in hex; 2nd column = dropped packets
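
The counters in /proc/net/softnet_stat are hexadecimal, which makes drops easy to miss by eye. The sketch below shows one way to decode the first three columns (processed, dropped, time squeeze). The function name `softnet_drops` is hypothetical, the sample rows are made up, and the row index is only assumed to match the CPU number, which may not hold when CPUs are offline.

```bash
# softnet_drops: read softnet_stat-formatted rows on stdin and print the first
# three counters of each row in decimal. The row index is used as the CPU label
# (an assumption; it can differ from the real CPU id).
softnet_drops() {
    local cpu=0 processed dropped squeezed _rest
    while read -r processed dropped squeezed _rest; do
        # 16#<value> tells bash arithmetic to parse the value as base 16.
        printf 'cpu%d processed=%d dropped=%d squeezed=%d\n' \
            "$cpu" "$((16#$processed))" "$((16#$dropped))" "$((16#$squeezed))"
        cpu=$((cpu + 1))
    done
}

# Sample rows in the kernel's format (values invented for illustration).
softnet_drops <<'EOF'
0000a3f1 00000000 00000002 00000000 00000000
00001b2c 0000000f 00000000 00000000 00000000
EOF
# prints:
# cpu0 processed=41969 dropped=0 squeezed=2
# cpu1 processed=6956 dropped=15 squeezed=0
```

On a live host the same function can read the real file with `softnet_drops < /proc/net/softnet_stat`. A dropped count that keeps growing between runs suggests the network receive path is contributing to the load.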

vmstat Command Details

Mnemonic:

“Run Block - Swap Free BuffCache - SwapIn SwapOut - BlockIn BlockOut - Interrupt Context - User System Idle Wait Steal”

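Processes

| Field | Meaning | Mnemonic |
|-------|---------|----------|
| r | runnable processes (running or waiting for CPU) | Run |
| b | processes blocked in uninterruptible sleep (usually I/O) | Block |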

Signals

r > CPU cores → high CPU contention
b > 0 (sustained) → I/O bottleneck

Disk

| Field | Meaning | Mnemonic |
|-------|---------|----------|
| bi | blocks in (read from a block device) | Block In |
| bo | blocks out (written to a block device) | Block Out |

Signals

High bi/bo + wa → disk bottleneck
Sudden spikes → burst workload or flush

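CPU

| Field | Meaning | Mnemonic |
|-------|---------|----------|
| us | user CPU time (application code) | User |
| sy | system CPU time (kernel) | System |
| id | idle | Idle |
| wa | waiting for I/O | Wait |
| st | time stolen from this VM by the hypervisor | Steal |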
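
As a rough illustration, the us, sy, and wa columns can be turned into a first guess with a few comparisons. In the sketch below the function name `classify_cpu` is hypothetical; the 20% iowait cutoff comes from Step 4 of the diagnostic approach, while the 80% user and 30% system cutoffs are assumptions chosen for illustration, not established limits.

```bash
# classify_cpu US SY WA: apply rules of thumb to vmstat CPU percentages.
# Only the 20% iowait cutoff comes from the text; 80 and 30 are assumed values.
classify_cpu() {
    local us=$1 sy=$2 wa=$3
    if   (( wa > 20 ));  then echo "iowait: storage problem, not pure CPU"
    elif (( us >= 80 )); then echo "user: application code"
    elif (( sy >= 30 )); then echo "system: kernel issue or syscall storm"
    else                      echo "inconclusive"
    fi
}

# Sample values as they might appear in the us, sy, wa columns.
classify_cpu 96 3 0     # prints: user: application code
classify_cpu 10 5 45    # prints: iowait: storage problem, not pure CPU
```

With the classic 17-column vmstat layout, live values could be fed in with something like `classify_cpu $(vmstat 1 2 | tail -1 | awk '{print $13, $14, $16}')`. The column positions are an assumption and may differ between procps versions.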

🟥 CPU Bottleneck

👉 “Run high, Idle low”

r ↑, id ↓


🟥 Memory Bottleneck

👉 “Swap is death”

si/so > 0


🟥 Disk Bottleneck

👉 “Blocked + Waiting”

b > 0, wa ↑


🟥 Thread Contention

👉 “Context explosion”

cs ↑↑


🟥 Virtualization Issue

👉 “Steal means host stealing CPU”

st > 0
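
The five patterns above can be checked mechanically against vmstat output. The following is a minimal sketch, not a tuned monitor: the function name `diagnose_vmstat` is hypothetical, the column positions assume the classic 17-column vmstat layout, and the numeric cutoffs (idle below 10%, iowait above 20%, more than 100000 context switches per second) are assumptions that would need adjusting per workload.

```bash
# diagnose_vmstat CORES: read vmstat output on stdin and print, for each data
# line, which bottleneck patterns match ("ok" if none do).
diagnose_vmstat() {
    awk -v cores="$1" '
        # Header lines do not start with a number, so skip them.
        $1 !~ /^[0-9]+$/ { next }
        {
            r = $1; b = $2; si = $7; so = $8; cs = $12; id = $15; wa = $16; st = $17
            out = ""
            if (r + 0 > cores + 0 && id < 10) out = out " cpu"         # run high, idle low
            if (si > 0 || so > 0)             out = out " memory"      # swapping
            if (b > 0 && wa > 20)             out = out " disk"        # blocked + waiting
            if (cs > 100000)                  out = out " contention"  # assumed cutoff
            if (st > 0)                       out = out " steal"       # hypervisor steal
            print (out == "" ? "ok" : substr(out, 2))
        }'
}

# Sample vmstat output (values invented for illustration), checked for 4 cores.
printf '%s\n' \
    'procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----' \
    ' r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st' \
    ' 9  0      0 812345  10240 204800    0    0     1     2  900 1200 97  2  1  0  0' \
    | diagnose_vmstat 4
# prints: cpu
```

On a live system this could be run as `vmstat 1 5 | diagnose_vmstat "$(nproc)"`. The first data line of vmstat reports averages since boot, so later lines are the ones worth reading.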

🔥 Real-World Workflow (What seniors actually do)

Step 1 — Detect

Bash
vmstat 1

👉 “Something is wrong: memory/disk/cpu”


Step 2 — Narrow Down

If disk suspected:

Bash
iostat -x 1

If CPU:

Bash
top

If memory:

Bash
free -m

Step 3 — Root Cause

  • Identify process
  • Identify pattern
  • Identify spike source
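
For the "identify process" step, a small helper can rank processes by CPU share. This is a sketch under stated assumptions: the function name `top_cpu` is hypothetical, and the input is assumed to be lines of the form "PID %CPU COMMAND".

```bash
# top_cpu [N]: read "PID %CPU COMMAND" lines on stdin and print the N lines
# with the highest %CPU (default 3), highest first.
top_cpu() {
    local n=${1:-3}
    # LC_ALL=C keeps the decimal point handling of the numeric sort predictable.
    LC_ALL=C sort -k2,2nr | head -n "$n"
}

# Sample input (values invented for illustration).
printf '%s\n' '101 2.5 sshd' '202 97.3 java' '303 12.0 nginx' | top_cpu 1
# prints: 202 97.3 java
```

Live input in that format can come from `ps -eo pid,pcpu,comm --no-headers | top_cpu 5`. Running it a few times in a row helps separate a steady consumer from a short spike.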