Network Troubleshooting Scenarios: As an Ops Engineer, Stay Calm When Problems Hit!

Today I'll share common Linux network troubleshooting scenarios and the commands to use for each one.

When your server can't connect, an API returns errors, a page loads slowly, or a port can't be reached... your first reaction may be:

"What went wrong?"

Don't panic! Here are six common network failure scenarios, with the Linux commands to locate and fix each one quickly.

This article covers troubleshooting on the server side only. Network devices (switches, routers, and so on) are the network engineers' job.

1. The server cannot access the public network

(1) Phenomenon

  • Pinging www.baidu.com does not respond
  • curl reports an error: Could not resolve host

(2) Troubleshooting commands

```bash
ip a                   # Is there an IP address?
ip route               # Is there a default gateway?
ping 8.8.8.8           # Can a public IP be reached?
cat /etc/resolv.conf   # Check the DNS settings
dig www.baidu.com      # Does name resolution work?
```

(3) Analysis Guidelines

  • No IP address in ip a? The network interface may not be up; restart it.
  • No "default via ..." line in ip route? There is no default gateway, so traffic can't leave the local subnet.
  • Pinging an IP address works but a domain name doesn't → DNS issue; check /etc/resolv.conf or test with dig.
  • DNS server set to an internal address that isn't responding? Try changing it to 8.8.8.8.

It's also possible that the intranet restricts outbound access, in which case none of these commands will help. Find the network engineer quickly; it's their doing.
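The checks above follow a fixed order: IP address, default route, reachability of a public IP, then DNS. Below is a minimal bash sketch of that order. The function names are hypothetical, and it assumes iproute2 (ip), iputils ping, and glibc's getent are installed, with 8.8.8.8 and www.baidu.com as the test targets.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: triage "server cannot access the public network"
# in the order used above (IP -> default route -> public IP -> DNS).

# Pure decision step: each argument is "yes" or "no".
# Prints the first layer that fails, or "ok".
classify_public_access() {
  local has_ip="$1" has_route="$2" ip_reachable="$3" dns_ok="$4"
  if [ "$has_ip" != yes ]; then
    echo "no-ip"             # interface down or unconfigured
  elif [ "$has_route" != yes ]; then
    echo "no-default-route"  # no gateway configured
  elif [ "$ip_reachable" != yes ]; then
    echo "no-public-ip"      # routing, firewall, or intranet restriction
  elif [ "$dns_ok" != yes ]; then
    echo "dns"               # check /etc/resolv.conf or use dig
  else
    echo "ok"
  fi
}

# Gathers the four facts from the live system, then classifies them.
check_public_access() {
  local has_ip=no has_route=no ip_reachable=no dns_ok=no
  # Any global-scope IPv4 address on any interface?
  [ -n "$(ip -4 -o addr show scope global 2>/dev/null)" ] && has_ip=yes
  # A default route (gateway)?
  [ -n "$(ip route show default 2>/dev/null)" ] && has_route=yes
  # One ICMP echo to a public IP, waiting at most 2 seconds.
  ping -c 1 -W 2 8.8.8.8 >/dev/null 2>&1 && ip_reachable=yes
  # Name resolution through the system resolver.
  getent hosts www.baidu.com >/dev/null 2>&1 && dns_ok=yes
  classify_public_access "$has_ip" "$has_route" "$ip_reachable" "$dns_ok"
}
```

Source the file and run check_public_access; it prints the first failing layer (no-ip, no-default-route, no-public-ip, dns) or ok. Some networks drop ICMP, so a no-public-ip result is worth confirming with curl against an IP before blaming routing.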

2. The service is running, but connections fail

(1) Phenomenon

  • Web page or API requests time out
  • The program cannot connect to a port

(2) Troubleshooting commands

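```bash
netstat -lntup               # Is the service listening on the port? (or use ss -lntup)
netstat -lntup | grep 8080   # Is it the program you expect?
iptables -L -n               # Is the firewall blocking it?
firewall-cmd --list-all      # Use this if the firewall is firewalld; Ubuntu uses ufw (ufw status)
telnet localhost 8080        # Test the port locally; if it works, test it from another server
```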
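telnet is often missing on minimal servers. As a stand-in for the telnet test above, bash can open a TCP connection itself through its /dev/tcp/HOST/PORT redirection. A minimal sketch, assuming bash (not plain sh) and coreutils timeout; tcp_probe is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: test whether HOST:PORT accepts a TCP connection,
# as a replacement for "telnet HOST PORT" when telnet is not installed.
# Usage: tcp_probe HOST PORT [TIMEOUT_SECONDS]
tcp_probe() {
  local host="$1" port="$2" secs="${3:-3}"
  # A child bash opens fd 3 to the target; success means the TCP handshake
  # completed. timeout stops it if packets are silently dropped.
  if timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
    echo "open"
  else
    echo "unreachable"   # refused, filtered (timed out), or name not resolvable
  fi
}
```

For example, run tcp_probe localhost 8080 on the server itself, then tcp_probe <server_ip> 8080 from another machine. open locally but unreachable remotely points at the listen address or a firewall rather than the program itself.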

(3) Analysis Guidelines

  • Not listening on the port? The program isn't running, or its configuration is wrong.
  • Listening only on 127.0.0.1, so no other host can reach it? Change the bind address to 0.0.0.0 (all interfaces).
  • Blocked by a firewall? Check firewall-cmd or ufw, and the cloud server's security group too.

If everything checks out, it's time for the final trick: restart.

3. Hosts on the same LAN cannot ping each other

(1) Phenomenon

  • A cannot ping B, but B can ping the gateway
  • Internal network communication fails

(2) Troubleshooting commands

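```bash
ip a                    # Check the IP address
ip route                # Check the routes
ping <peer_ip>          # Ping each other
arp -a                  # Look at the ARP cache (or: ip neigh)
tcpdump -i eth0 icmp    # Capture ICMP packets and see what arrives
cat /etc/hosts.deny     # Check whether access is restricted (TCP Wrappers)
```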
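When reading the ARP cache, the key question is whether the peer's MAC address was resolved. With iproute2, ip neigh shows this as a state at the end of each line (REACHABLE, STALE, INCOMPLETE, FAILED, and so on). A small sketch that extracts the state for one IP; neigh_state is a hypothetical name, and the parsing assumes the usual ip neigh show line layout.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "ip neigh show" output on stdin and print the
# neighbour state for one IP (e.g. REACHABLE, STALE, INCOMPLETE, FAILED)
# followed by the MAC address, or "-" if none was resolved.
# Usage: ip neigh show | neigh_state 192.168.1.20
neigh_state() {
  awk -v ip="$1" '
    $1 == ip {
      mac = "-"
      # The MAC follows the "lladdr" keyword when present.
      for (i = 2; i < NF; i++) if ($i == "lladdr") mac = $(i + 1)
      # The state is the last field on the line.
      print $NF, mac
      found = 1
      exit
    }
    END { if (!found) print "NONE -" }
  '
}
```

FAILED or INCOMPLETE means ARP requests got no answer (peer down, wrong VLAN, or a cable/switch problem). NONE means this host never tried to ARP for the peer, which often means it is sending the traffic to the gateway instead, so the subnet mask is worth rechecking.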

(3) Analysis Guidelines

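  • Wrong subnet mask → the host misjudges whether the peer is on its own subnet, so it sends traffic to the gateway instead of directly to the peer (or the other way around)
  • Check the ARP cache to see whether the peer's MAC address was resolved
  • Capture packets with tcpdump to see whether ICMP requests arrive and whether replies go out
  • /etc/hosts.deny only restricts services that use TCP Wrappers; it does not block ping, so for ICMP check the firewall rules (iptables) and the net.ipv4.icmp_echo_ignore_all sysctl instead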

Also, if the network engineer has blocked ping, no amount of checking on your side will help. Time to find them and put your blame-shifting skills to use.
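The "wrong subnet mask" case comes down to one calculation: the host masks its own address and the destination with its netmask; if the results match, it treats the peer as on-link and ARPs for it directly, otherwise it hands the packet to the gateway. (Simplified: Linux actually consults its routing table, but the connected route it uses is derived from this mask.) A bash sketch of the check, assuming dotted-decimal IPv4 input; ip_to_int and same_subnet are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: decide whether two IPv4 addresses fall in the same
# subnet for a given prefix length, i.e. "deliver on the LAN" vs "use gateway".

# Dotted decimal -> 32-bit integer (10# avoids octal parsing of e.g. "08").
ip_to_int() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  echo $(( (10#${o[0]} << 24) | (10#${o[1]} << 16) | (10#${o[2]} << 8) | 10#${o[3]} ))
}

# Usage: same_subnet IP_A IP_B PREFIX_LEN   -> prints "same" or "different"
same_subnet() {
  local a b mask
  a=$(ip_to_int "$1")
  b=$(ip_to_int "$2")
  # Build the netmask from the prefix length, kept to 32 bits.
  mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
  if (( (a & mask) == (b & mask) )); then echo same; else echo different; fi
}
```

For example, same_subnet 192.168.1.10 192.168.1.200 24 prints same, but with a /25 mask it prints different: /25 splits the range at .128, so a host misconfigured with /25 sends traffic for .200 to the gateway instead of directly to the peer.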

4. Website access is slow and often hangs

(1) Phenomenon

  • Users frequently report that the website is slow and unresponsive.
  • Logs show 504 Gateway Timeout errors

(2) Troubleshooting commands

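```bash
ping www.lige.com -c 4        # Ping first to test connectivity
traceroute www.lige.com       # Trace the route to see which hops the traffic passes through
curl -w "@curl-format.txt" -o /dev/null -s http://your_site   # Measure request timing with curl
netstat -antp | grep ESTABLISHED | wc -l                      # Count established TCP connections
```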

curl-format.txt is a format file for curl's -w (--write-out) option; it prints the request timing details (DNS lookup, connection time, TTFB, etc.).
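The article doesn't include the file itself. One possible curl-format.txt, written with a heredoc, uses curl's standard --write-out timing variables; the layout is only a suggestion. Each line ends with an explicit \n so the output shows one value per line.

```bash
#!/usr/bin/env bash
# Write a curl --write-out format file with per-phase timings (seconds).
# All values are cumulative from the start of the request:
#   time_namelookup    -> DNS resolution finished
#   time_connect       -> TCP connection established
#   time_appconnect    -> TLS handshake finished (0 for plain HTTP)
#   time_starttransfer -> first byte received (TTFB)
cat > curl-format.txt <<'EOF'
   time_namelookup:  %{time_namelookup}s\n
      time_connect:  %{time_connect}s\n
   time_appconnect:  %{time_appconnect}s\n
  time_pretransfer:  %{time_pretransfer}s\n
time_starttransfer:  %{time_starttransfer}s\n
                     ----------\n
        time_total:  %{time_total}s\n
EOF
```

Then run curl -w "@curl-format.txt" -o /dev/null -s http://your_site. Because the values are cumulative, the differences tell the story: a large time_namelookup points at DNS, time_connect minus time_namelookup roughly reflects network round-trip time, and time_starttransfer minus time_pretransfer is mostly the server's processing time.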

(3) Analysis Guidelines

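  • High ping latency or packet loss? Then the network path itself is suspect.
  • Does latency jump, or do probes start timing out, at a particular traceroute hop? That hop may be the bottleneck. (A single hop that times out while later hops respond is usually just a router that doesn't answer traceroute probes.)
  • In the curl timings, compare DNS lookup → TCP connect → time to first byte to see which stage is slow.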

The ultimate move: restart the service and see if it helps. If it doesn't, ask the developers to optimize the code.
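For the first check, the two numbers that matter in ping -c 4 output are packet loss and average round-trip time. A sketch that pulls them out of the summary lines; it assumes the Linux iputils wording (BSD-style "round-trip" lines are also matched, other implementations may differ), and ping_summary is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read ping output on stdin and print packet loss and
# average RTT. Parses summary lines such as (Linux iputils):
#   4 packets transmitted, 4 received, 0% packet loss, time 3004ms
#   rtt min/avg/max/mdev = 10.123/12.345/15.678/1.234 ms
ping_summary() {
  awk '
    /packet loss/ {
      # The field ending in "%" is the loss percentage.
      for (i = 1; i <= NF; i++) if ($i ~ /%$/) loss = $i
    }
    /^(rtt|round-trip) / {
      # Field 4 is "min/avg/max/mdev"; the second value is the average.
      split($4, v, "/")
      avg = v[2]
    }
    END {
      printf "loss=%s avg_rtt_ms=%s\n", (loss == "" ? "?" : loss), (avg == "" ? "?" : avg)
    }
  '
}
```

Usage: ping -c 4 www.lige.com | ping_summary. When every packet is lost there is no rtt line, so the average is reported as ?.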

5. The service is listening normally, but it can't be accessed from outside!

(1) Phenomenon

  • The service is listening normally, but other hosts cannot access it
  • curl localhost works, but curl to the public IP does not

(2) Troubleshooting commands

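```bash
netstat -lntup | grep <port>   # Check whether the port is listening, and on which address
curl localhost:<port>          # Test the port locally
curl <public_ip>:<port>        # Test via the public IP
iptables -L -n                 # Check iptables rules
firewall-cmd --list-all        # Check firewalld rules
telnet <ip> <port>             # Test the port from the client side (a domain name also works)
```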

(3) Analysis Guidelines

  • The program listens only on 127.0.0.1, so it can't be reached via the public IP?
  • Forgot to open the port on the firewall?
  • Remember to check the cloud service provider’s security group as well!
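The first bullet can be checked mechanically by looking at the local address column for the port in ss -lnt. A sketch that classifies the bind address; it assumes the ss -lnt column layout (State, Recv-Q, Send-Q, Local Address:Port, ...), and listen_scope is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "ss -lnt" output on stdin and report how a port
# is bound: all-interfaces (0.0.0.0, [::], *), loopback-only (127.x, [::1]),
# specific-address (one particular IP), or not-listening.
# Usage: ss -lnt | listen_scope 8080
listen_scope() {
  awk -v port="$1" '
    $1 == "LISTEN" {
      # Split "addr:port" at the LAST colon so IPv6 such as [::]:22 works.
      i = match($4, /:[^:]*$/)
      if (i == 0) next
      addr = substr($4, 1, i - 1)
      p = substr($4, i + 1)
      if (p != port) next
      found = 1
      if (addr == "0.0.0.0" || addr == "*" || addr == "[::]") any_all = 1
      else if (addr ~ /^127\./ || addr == "[::1]") any_lo = 1
      else any_other = 1
    }
    END {
      if (!found) print "not-listening"
      else if (any_all) print "all-interfaces"
      else if (any_other) print "specific-address"
      else print "loopback-only"
    }
  '
}
```

loopback-only means the program's bind address has to change. all-interfaces with the port still unreachable from outside points back to the firewall or the security group.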

6. Want to test whether the bandwidth is too low?

(1) Phenomenon

  • Uplink and downlink throughput never reaches the full bandwidth
  • User access is slow, but the server is not under pressure

(2) Troubleshooting commands

```bash
iperf3 -s                  # On one server: run iperf3 as the server
iperf3 -c <server_ip>      # On the other machine: run the client against it
```

(3) Analysis Guidelines

  • A speed test inside the internal network reveals bottlenecks in network cables or switches
  • A cross-region speed test helps troubleshoot carrier or cross-border network issues
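By default the iperf3 client sends data to the server, so a single run measures only one direction. A sketch that runs both directions using iperf3's -t (duration) and -R (reverse mode: the server sends) options, plus a helper that converts iperf3's Mbits/sec figure into MB/s, the unit many download tools display. bandwidth_test and mbit_to_mbyte are hypothetical names, and it assumes iperf3 -s is already running on the other host.

```bash
#!/usr/bin/env bash
# Hypothetical helpers around iperf3 (server side: "iperf3 -s" already running).

# Usage: bandwidth_test SERVER [SECONDS]
# Runs an upload test (client -> server) and a download test (-R: server -> client).
bandwidth_test() {
  local server="$1" secs="${2:-10}"
  echo "== upload: client -> $server =="
  iperf3 -c "$server" -t "$secs"
  echo "== download: $server -> client (-R) =="
  iperf3 -c "$server" -t "$secs" -R
}

# iperf3 reports bits per second; divide by 8 to get bytes.
# Usage: mbit_to_mbyte 100   -> 12.50
mbit_to_mbyte() {
  awk -v m="$1" 'BEGIN { printf "%.2f\n", m / 8 }'
}
```

If one direction is much slower than the other, the bottleneck is asymmetric (for example, a limited uplink). Adding -P 4 runs four parallel streams, which helps tell a per-connection limit apart from a real link limit on high-latency paths.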

Comparison of scenarios and troubleshooting commands:

| Scenario | Recommended commands |
|---|---|
| Unable to access the public network | ip, ping, ip route, dig |
| API timeout / service unreachable | ss, telnet, iptables |
| LAN machines cannot communicate with each other | arp, ping, tcpdump |
| Slow website access | traceroute, curl, netstat |
| Service not accessible externally | ss, firewall-cmd, curl |
| Network bandwidth test | iperf3 |

Knowing how to use commands does not mean you can troubleshoot problems. Understanding scenarios and using commands to solve problems is what makes a master!