What Is Server Monitoring?
Server monitoring is the continuous collection, storage, and visualization of CPU, RAM, disk, network, uptime, and application metrics, combined with alerts that fire when defined thresholds are exceeded.
Modern monitoring systems answer the following questions:
- Why did performance drop?
- Which resource is the bottleneck?
- When will a crash occur?
- When will the disk fill up?
A widely used open-source monitoring stack:
- Prometheus
- Grafana
- Node Exporter
- Alertmanager
Why Is Monitoring Important?
| Uptime (availability) | Maximum annual downtime |
|---|---|
| 99% | 3.65 days |
| 99.9% | 8.76 hours |
| 99.99% | 52.6 minutes |
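The figures in the table come from multiplying the allowed unavailability by the length of a year. Here is a minimal Python sketch, assuming a 365-day year with no leap day:

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap, 365-day year


def annual_downtime_hours(availability_pct: float) -> float:
    """Maximum downtime per year permitted by an availability percentage."""
    return (100.0 - availability_pct) / 100.0 * HOURS_PER_YEAR


for pct in (99.0, 99.9, 99.99):
    hours = annual_downtime_hours(pct)
    print(f"{pct}% -> {hours:.2f} h ({hours / 24:.2f} days, {hours * 60:.1f} min)")
```

Each extra "nine" cuts the allowed downtime by a factor of ten.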
Monitoring does not prevent failures on its own. Its main value is reducing MTTR (Mean Time To Recovery): each incident is detected and fixed faster, so total downtime shrinks.
Typical time to detect an issue:
- With monitoring: 30–60 seconds (depending on the scrape interval and alert rules)
- Without monitoring: hours
Critical Metrics to Monitor
| Metric | Normal | Risk | Critical |
|---|---|---|---|
| CPU | 20–60% | 80% | 95% |
| RAM | 40–70% | 85% | 95% |
| Disk usage | up to 50% | 80% | 90% |
| Disk I/O wait | < 5% | 10% | 20% |
| Load average | ≤ number of CPU cores | 1.5× cores | 2× cores |
Example:
- 2 vCPU server
- Load average = 4 (2× the core count, i.e. critical)
- CPU = 95%
→ Response time rose from 200 ms to 1200 ms
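The thresholds table can be turned into a small classifier. In the sketch below, each band is assumed to start at the listed value with inclusive bounds. Load average is divided by the core count before comparison, which is why a load of 4 on 2 vCPUs counts as critical:

```python
# (risk, critical) thresholds in percent, taken from the thresholds table.
THRESHOLDS = {
    "cpu": (80.0, 95.0),
    "ram": (85.0, 95.0),
    "disk_usage": (80.0, 90.0),
    "io_wait": (10.0, 20.0),
}


def classify(value: float, risk: float, critical: float) -> str:
    """Map a reading to 'normal', 'risk' or 'critical' (inclusive bounds assumed)."""
    if value >= critical:
        return "critical"
    if value >= risk:
        return "risk"
    return "normal"


def classify_load(load_avg: float, cpu_cores: int) -> str:
    """Judge load relative to core count: 1.5x cores = risk, 2x cores = critical."""
    return classify(load_avg / cpu_cores, risk=1.5, critical=2.0)


# The 2 vCPU example: both the load average and the CPU reading are critical.
print(classify_load(4.0, cpu_cores=2))
print(classify(95.0, *THRESHOLDS["cpu"]))
```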
Monitoring Tools Comparison
| Tool | Type | Cost |
|---|---|---|
| Prometheus | Self-hosted | Free (open source) |
| Grafana Cloud | SaaS | Paid (free tier available) |
| Datadog | SaaS | Paid (per-host pricing) |
| Zabbix | Self-hosted | Free (open source) |
| New Relic | SaaS | Paid (free tier available) |
Monitoring Architecture
```
Server → Node Exporter → Prometheus → Grafana
                             ↓
                       Alertmanager → Email / Telegram
```
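Node Exporter publishes metrics over HTTP (by default at port 9100, path /metrics) in Prometheus' plain-text exposition format. Prometheus pulls (scrapes) that page at a fixed interval. The sketch below parses a few sample lines of the format to show what a scrape contains. It is a simplified illustration only: it ignores escaped characters in label values, optional timestamps, and other edge cases, and the sample values are made up.

```python
import re

# One sample line: metric_name{label="value",...} number
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Parse simple exposition-format lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # a line this simplified parser does not handle
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Made-up excerpt of a Node Exporter scrape.
SCRAPE = """
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
node_cpu_seconds_total{cpu="0",mode="user"} 2345.5
node_load1 0.42
"""

for name, labels, value in parse_exposition(SCRAPE):
    print(name, labels, value)
```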
Installation Steps
Node Exporter
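Release archives include the version number in their file name, so the download URL must include it too:
```bash
# Example release; check https://github.com/prometheus/node_exporter/releases for the current version
VERSION="1.8.2"
wget "https://github.com/prometheus/node_exporter/releases/download/v${VERSION}/node_exporter-${VERSION}.linux-amd64.tar.gz"
tar xvf "node_exporter-${VERSION}.linux-amd64.tar.gz"
cd "node_exporter-${VERSION}.linux-amd64"
./node_exporter  # serves metrics on http://localhost:9100/metrics
```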
Prometheus
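```bash
VERSION="2.53.0"  # example release; check the GitHub releases page for the current one
wget "https://github.com/prometheus/prometheus/releases/download/v${VERSION}/prometheus-${VERSION}.linux-amd64.tar.gz"
tar xvf "prometheus-${VERSION}.linux-amd64.tar.gz"
cd "prometheus-${VERSION}.linux-amd64"
```
Edit prometheus.yml (a default copy ships in the archive):
```yaml
global:
  scrape_interval: 15s  # Prometheus defaults to 1m if unset

scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]  # Node Exporter's default port
```
Start Prometheus (web UI on port 9090 by default):
```bash
./prometheus --config.file=prometheus.yml
```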
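Once both processes are running, Prometheus should report the node target as up. Its HTTP API accepts instant queries at /api/v1/query. The sketch below assumes Prometheus runs locally on the default port 9090. It asks for the built-in `up` metric and lists targets whose last scrape failed. The JSON handling lives in its own function so it can be checked against a sample response without a live server.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # assumed default address


def query(expr: str, base_url: str = PROMETHEUS_URL) -> dict:
    """Run an instant query against the Prometheus HTTP API and return the JSON body."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def down_targets(response: dict) -> list[str]:
    """Return 'job/instance' for every `up` sample that is not 1 (last scrape failed)."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response}")
    down = []
    for series in response["data"]["result"]:
        _timestamp, value = series["value"]  # instant vectors carry [unix_time, "value"]
        if float(value) != 1.0:
            labels = series["metric"]
            down.append(f'{labels.get("job")}/{labels.get("instance")}')
    return down
```

Usage against a running server: `print(down_targets(query("up")) or "all targets up")`.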
Grafana
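Install the grafana package from Grafana's APT repository (apt.grafana.com), which has to be added first:
```bash
sudo apt-get update
sudo apt-get install -y grafana
sudo systemctl enable --now grafana-server
```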
Grafana listens on port 3000. Add Prometheus (http://localhost:9090) as a data source, then import dashboard ID 1860 (Node Exporter Full).
Alert Rule Example
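```yaml
# Save as e.g. alert_rules.yml and list it under rule_files in prometheus.yml
groups:
  - name: cpu_alert
    rules:
      - alert: HighCPUUsage
        # Busy ratio per instance = 1 - idle ratio averaged over all cores
        expr: (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 0.85
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
```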
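`node_cpu_seconds_total` is a counter of seconds each core has spent in each mode. CPU utilisation over a window is therefore one minus the share of elapsed time spent idle, which is what `1 - avg(rate(...{mode="idle"}[5m]))` computes. The sketch below reproduces that arithmetic from two hypothetical counter snapshots. It also gives a simplified model of how `for: 2m` keeps an alert pending until the condition has held continuously. The snapshot values and the 30-second evaluation step are invented for illustration. Prometheus' own handling of counter resets and rate extrapolation is omitted.

```python
# Hypothetical cumulative node_cpu_seconds_total readings per (cpu, mode), 300 s apart.
T0 = {("0", "idle"): 1000.0, ("0", "user"): 500.0, ("1", "idle"): 1100.0, ("1", "user"): 400.0}
T1 = {("0", "idle"): 1030.0, ("0", "user"): 770.0, ("1", "idle"): 1124.0, ("1", "user"): 676.0}
WINDOW_S = 300.0  # corresponds to the [5m] range in the alert expression


def cpu_busy_ratio(t0: dict, t1: dict, window_s: float) -> float:
    """1 - average per-core idle rate, the same arithmetic as the alert expression."""
    cpus = sorted({cpu for cpu, _mode in t0})
    idle_rates = [(t1[(cpu, "idle")] - t0[(cpu, "idle")]) / window_s for cpu in cpus]
    return 1.0 - sum(idle_rates) / len(idle_rates)


def alert_states(busy_samples: list[float], threshold: float = 0.85,
                 for_s: int = 120, step_s: int = 30) -> list[str]:
    """Simplified model of `for: 2m`: the condition must hold on every evaluation
    for at least `for_s` seconds before the alert moves from pending to firing."""
    states: list[str] = []
    pending_since = None
    for i, busy in enumerate(busy_samples):
        now = i * step_s  # hypothetical 30 s evaluation interval
        if busy <= threshold:
            pending_since = None  # condition broke, so the `for` timer restarts
            states.append("inactive")
            continue
        if pending_since is None:
            pending_since = now
        states.append("firing" if now - pending_since >= for_s else "pending")
    return states


print(f"CPU busy: {cpu_busy_ratio(T0, T1, WINDOW_S):.2f}")
print(alert_states([0.91] * 5))
```

A short spike that drops back below 85% resets the timer, which is how `for` suppresses false alarms from brief bursts.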
Production Scenario (Before / After)
Server:
- 4 GB RAM
- 2 vCPU
- 15k daily traffic
| Metric | Before | After |
|---|---|---|
| Response time | 620 ms | 240 ms |
| Downtime | 3 hrs/month | 15 min/month |
| CPU spike duration | 50 min | 6 min |
| Disk-full outages | Once/month | 0 |
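Monthly downtime can be converted into an availability percentage and compared with the uptime table. Here is a small sketch, assuming a 30-day month:

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # assumes a 30-day month


def availability_pct(downtime_minutes: float) -> float:
    """Availability implied by a given amount of downtime per month."""
    return 100.0 * (1.0 - downtime_minutes / MINUTES_PER_MONTH)


before = availability_pct(3 * 60)  # 3 hrs/month
after = availability_pct(15)       # 15 min/month
print(f"before: {before:.3f}%  after: {after:.3f}%")
```

Under this assumption the "before" state (about 99.58%) falls short of 99.9%. The "after" state (about 99.97%) sits between 99.9% and 99.99%.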
Why the improvement?
- Disk alert → log cleanup
- CPU alert → bot blocking
- RAM alert → swap fix
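A disk alert is most useful when it fires before the disk is full. PromQL's `predict_linear()` fits a simple linear regression over a range of samples. For example, `predict_linear(node_filesystem_avail_bytes[6h], 4 * 3600) < 0` flags filesystems projected to run out of space within four hours. The Python sketch below applies the same least-squares idea to a few hypothetical free-space readings and estimates when free space reaches zero. It is an illustration of the technique, not Prometheus' implementation.

```python
from typing import Optional


def hours_until_full(samples: list[tuple[float, float]]) -> Optional[float]:
    """Least-squares trend over (hour, free_gb) samples, in the spirit of PromQL's
    predict_linear(). Returns hours from the last sample until the fitted line
    reaches 0 GB, or None if there is no downward trend."""
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_f = sum(f for _, f in samples) / n
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        return None  # all samples share one timestamp
    cov = sum((t - mean_t) * (f - mean_f) for t, f in samples)
    slope = cov / var  # GB per hour; negative means the disk is filling
    if slope >= 0:
        return None
    intercept = mean_f - slope * mean_t
    t_zero = -intercept / slope  # hour at which the trend line hits 0 GB free
    return t_zero - samples[-1][0]


# Hypothetical readings: free space shrinking by roughly 2 GB per hour.
readings = [(0, 20.0), (1, 18.0), (2, 16.1), (3, 13.9), (4, 12.0)]
print(f"Disk full in about {hours_until_full(readings):.1f} h")
```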
Risks
| Mistake | Consequence |
|---|---|
| Too many alerts | Alert fatigue (real alerts get ignored) |
| Wrong threshold | False alarms (too low) or missed incidents (too high) |
| Monitoring CPU only | Disk-full outages go unnoticed |
| Not monitoring logs | Root cause is hard to find |
When Should You Set Up Monitoring?
- If you are using a VPS
- If you have a revenue-generating site
- If traffic > 1,000/day
- If you have an API / SaaS
Monitoring = revenue protection system.
For monitoring setup, alerting, and server optimization, you can use a professional server management service.