It’s when you feel most confident troubleshooting CPU issues that the Linux Gods grin and throw a new problem your way. This guide will help you deal with those ungodly moments when both the load and frustration levels run high.

Step 1: Gather some basic info

A. Find out what kind of work is consuming the CPU by running the command ‘top’. On the third line you will see something like this: Cpu(s): 0.0 us, 0.0 sy, 0.0 ni, 99.5 id, 0.0 wa. Each value is a percentage of total CPU time. Below is a key to interpret each of these values.

  • us: Time spent running ‘User’ (application) processes such as Apache, mail services, etc.
  • sy: Time spent running ‘System’ (kernel) tasks.
  • ni: Time spent running user processes whose priority was changed with ‘nice’.
  • id: Time the system spends in an idle state.
  • wa: IO Wait, time the CPU sits idle waiting for a hard drive or other block device to complete a request.
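
If you would rather capture those numbers from a script than read them off the screen, the sketch below splits a Cpu(s) line into one value per line. The function name cpu_fields is made up for this example, and it assumes the top layout where each number and its label are separated by a space (as in the sample line above) and a period is the decimal separator.

```shell
#!/bin/sh
# cpu_fields (hypothetical helper, not part of top): reads a "Cpu(s)" summary
# line on stdin and prints "label value" pairs, one per line.
cpu_fields() {
    sed -e 's/^[^:]*: *//' |   # drop everything up to and including "Cpu(s):"
        tr ',' '\n' |          # one field per line
        awk 'NF >= 2 { print $2, $1 }'
}

# Sample line taken from the text above.
echo 'Cpu(s): 0.0 us, 0.0 sy, 0.0 ni, 99.5 id, 0.0 wa' | cpu_fields
```

On a live server you could feed it with top -bn1 | grep 'Cpu(s)' | cpu_fields, where -b runs top in batch mode and -n1 stops it after one iteration.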

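B. Find the processes using up that CPU. The command below sorts every process by CPU usage and shows the 20 heaviest, with the biggest consumer on the last line. The columns are %CPU, PID, user, and command. Run this a couple of times to see which processes stay near the bottom.

  • ps aux | awk '{print $3,$2,$1,$11,$12,$13}' | sort -n | tail -n20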

C. Find the IPs and correlated ports that are getting hit hard. This command shows you how many connections an IP Address has to a specific port.

  • netstat -ntu | cut -d: -f2 | awk '{print $2,$1}' | sort | uniq -c | sort -n
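
The cut-based pipeline splits on the first colon, so it only works for plain IPv4 addresses. As a rough alternative, the sketch below does the same count in awk by stripping the port after the last colon, which should also cope with IPv6 addresses. The function name count_conns and the sample addresses are invented for illustration, and it assumes the usual netstat -ntu column order (local address in column 4, foreign address in column 5).

```shell
#!/bin/sh
# count_conns (hypothetical helper): reads `netstat -ntu` output on stdin and
# prints "connections remote_ip local_port", busiest pair last.
count_conns() {
    awk '
        $1 ~ /^(tcp|udp)/ {
            port = $4; remote = $5
            sub(/.*:/, "", port)        # keep only the local port
            sub(/:[^:]*$/, "", remote)  # drop the remote port, keep the IP
            count[remote " " port]++
        }
        END { for (k in count) print count[k], k }
    ' | sort -n
}

# Made-up sample data; on a real server run: netstat -ntu | count_conns
count_conns <<'EOF'
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 192.168.1.5:80          10.0.0.1:54321          ESTABLISHED
tcp        0      0 192.168.1.5:80          10.0.0.1:54322          ESTABLISHED
tcp        0      0 192.168.1.5:22          10.0.0.2:40000          ESTABLISHED
EOF
```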

What patterns can you notice? Is there an IP that is slamming your server? Is there a process that’s draining those resources? Is the IO Wait (wa%) high? Write these findings down!

Step 2: Dig Deeper

A. If a specific process, or a couple of processes, turned out to be consuming the CPU, then it’s time to find out more about them.

  • How long has it been running, and how much CPU time has it used?
    • ps -o pid,etime,time,args -p [Insert PID]
  • Is it a child process or does it have child processes?
    • ps -elf --forest | grep -A10 -B10 "[Insert PID]"
  • How much data has it been reading and writing to disk?
    • cat /proc/[Insert PID]/io | grep -E "^(read|write)_bytes:"
  • What files is the process connected to or using?
    • lsof -p [Insert PID]
  • Are there error logs for that process? What is going on in them?
    • tail -f /path/to/error.log
  • Has anyone logged into the server recently, or is anyone currently logged in?
    • last -n10
    • w
  • Is there bash history that would indicate a change made to the server?
    • history
    • cat /home/*/.bash_history
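
The disk I/O counters above are raw byte counts, which get hard to read once a process has been busy for a while. Here is a small sketch that converts them to MiB. The function name io_summary is hypothetical, and it assumes the file it is given uses the read_bytes: and write_bytes: labels found in /proc/[PID]/io.

```shell
#!/bin/sh
# io_summary (hypothetical helper): takes the path to a /proc/<PID>/io style
# file and prints the bytes read from and written to disk, in MiB.
io_summary() {
    awk '
        $1 == "read_bytes:"  { r = $2 }
        $1 == "write_bytes:" { w = $2 }
        END { printf "read: %.1f MiB, written: %.1f MiB\n", r / 1048576, w / 1048576 }
    ' "$1"
}

# Usage, with the PID left for you to fill in:
#   io_summary /proc/[Insert PID]/io
```

Matching the whole label keeps cancelled_write_bytes out of the total, which a looser pattern would pick up as well.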

B. If a specific IP address or port was hitting your server, find out more about that port and IP.

  • What service is running on that port?
    • netstat -tunlp
  • Where is that IP Address coming from?
    • curl ipinfo.io/[Insert IP]
  • What is the hostname associated with the IP?
    • nslookup [Insert IP]
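
On a busy server the netstat -tunlp listing can be long, and a plain grep for a port such as 80 also matches 8080. The sketch below filters the listing down to an exact local port. The name port_owner is made up for this example, and it assumes the local address sits in column 4 of the netstat output.

```shell
#!/bin/sh
# port_owner (hypothetical helper): reads `netstat -tunlp` output on stdin and
# prints only the lines whose local port equals the first argument.
port_owner() {
    awk -v port="$1" '
        $1 ~ /^(tcp|udp)/ {
            p = $4
            sub(/.*:/, "", p)   # strip the address, keep the port
            if (p == port) print
        }
    '
}

# Usage (run as root so the PID/Program column is filled in):
#   netstat -tunlp | port_owner 80
```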

With all of this information you should be able to make a very good guess as to what is going on in your system. You can now move onto actually resolving the issue.

Step 3: Resolve the issue

A. If you have determined that the process is safe to end, you can kill it, or hard kill it if it ignores the first request. If you are still unsure what the process is doing or whether it’s safe to kill, it would be wise to consult with your team and get their opinion on the subject. Otherwise, proceed with the execution.

  • kill [Insert PID]
  • kill -9 [Insert PID]
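
A plain kill sends SIGTERM, which gives the process a chance to clean up, while a hard kill (SIGKILL) cannot be caught and leaves no room for cleanup. One common pattern is to try the polite version first and only escalate after a grace period. The sketch below shows that idea. The function name stop_pid and the 10 second default are assumptions made for this example, not something the commands above require.

```shell
#!/bin/sh
# stop_pid (hypothetical helper): send SIGTERM, wait up to GRACE seconds for
# the process to exit, then fall back to SIGKILL.
# Usage: stop_pid PID [GRACE_SECONDS]
stop_pid() {
    pid=$1
    grace=${2:-10}
    kill "$pid" 2>/dev/null || return 1          # SIGTERM; fails if there is no such process
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # signal 0 only checks that the PID exists
        sleep 1
        grace=$((grace - 1))
    done
    kill -0 "$pid" 2>/dev/null || return 0
    kill -9 "$pid" 2>/dev/null                   # still there after the grace period
}
```

Note that kill -0 also succeeds for a zombie that its parent has not reaped yet, so the function may occasionally wait out the full grace period for a process that has already exited.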

B. If you have determined that the IP is safe to block, then proceed by blocking it with your firewall.

  • csf -d [Insert IP]
    • OR
  • iptables -I INPUT -s [Insert IP] -j DROP
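
If you end up blocking addresses often, a thin wrapper can catch typos and avoid stacking duplicate rules. The sketch below is one possible shape for the iptables variant. The names is_ipv4 and block_ip are invented here, it handles IPv4 only, and it has to run as root like the iptables command itself. It relies on iptables -C, which checks whether a matching rule already exists.

```shell
#!/bin/sh
# is_ipv4 (hypothetical helper): succeeds only for four dot-separated
# numbers in the range 0-255.
is_ipv4() {
    echo "$1" | awk -F. '
        NF != 4 { exit 1 }
        { for (i = 1; i <= 4; i++) if ($i !~ /^[0-9]+$/ || $i > 255) exit 1 }
    '
}

# block_ip (hypothetical helper): insert a DROP rule for the address unless
# an identical rule is already present.
block_ip() {
    is_ipv4 "$1" || { echo "not an IPv4 address: $1" >&2; return 1; }
    iptables -C INPUT -s "$1" -j DROP 2>/dev/null && return 0
    iptables -I INPUT -s "$1" -j DROP
}

# Usage:
#   block_ip [Insert IP]
```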

 

You can now verify that the issue is resolved, sit down, and enjoy a nice relaxing cup of tea.
