Leveraging lsof to Troubleshoot Network, Filesystem, Native Library or Device Problems
Last update: Nov 24, 2008
This article is an excerpt from my Addison-Wesley Professional Ruby Series shortcut “Troubleshooting Ruby Processes - Leveraging System Tools When the Usual Ruby Tricks Stop Working”, edited to fit this publication format and republished with permission of Addison-Wesley.
1. What Is lsof?
lsof stands for “LiSt Open Files”. This shell command seems deceptively simple: It lists information about files opened by processes on a UNIX box.
Despite its apparently modest mission statement, lsof is actually one of the most powerful and useful UNIX commands. Its raw power comes from one of UNIX’s design principles, often described as “in UNIX everything is a file”. What this means is that the lsof concept of an open file not only covers regular files but also the following:
- Directories
- Streams or network files (for example, Internet or UNIX domain sockets and NFS files)
- Native libraries (for example, .so or .dylib dynamic libraries linked to a process)
- Block and character special files (for example, disk volume, external hard drive, console, or mouse)
- Pipes
Wait, I Cannot Find lsof on My System!
lsof is such a popular tool that it has been ported to pretty much all UNIX dialects (Linux, Mac OS X, BSD, Solaris, and so on). If it is unavailable on your box, use your usual package management system to install it. You can find lsof packages for Solaris on Sun Freeware.
This wide scope offers monitoring capabilities for a great range of resources. For instance, a single lsof command suffices to check the state of all opened Internet sockets on your machine and figure out which process owns each socket. In fact, lsof is so flexible and powerful that most users, including myself, do not fully utilize its incredible power. Unsurprisingly, lsof is also one of the most popular UNIX tools and has been ported to every UNIX dialect under the (IT) sun. Furthermore, not only will you find lsof on any UNIX machine, but you will find that it also works and behaves in the same way in all environments (as opposed to netstat, for instance).
2. What Is lsof Good For?
lsof is especially good for troubleshooting problems related to file access, network access, and native libraries. It essentially provides a static view of resource usage. If you need a better understanding of the dynamics of the process or have a history of system resource usage, it is time to switch to dynamic tools like strace or DTrace.
3. Usage
This document is not an lsof reference detailing all lsof options. That would be a book in its own right. The objective is rather to give you a concrete picture of lsof and cover its most useful usage scenarios. Hopefully this will get you interested enough to investigate the lsof quick start and the lsof man page.
3.1. Typical Usage Scenarios
If you invoke lsof without an option or parameter, it lists all open files belonging to all active processes. Note that lsof is typically installed with only enough privileges to list resources attached to your own processes, so if you want to peek at all open files on your system, you need to launch lsof with root user privileges. For instance:
sudo lsof
This command generates a lot of output, looking like this:
COMMAND     PID USER       FD TYPE     DEVICE    SIZE   NODE NAME
init          1 root      cwd  DIR        8,2    4096      2 /
...
postmaste  7209 postgres  cwd  DIR        8,2    4096 212984 /var/lib/postgresql/8.1/main
postmaste  7209 postgres  rtd  DIR        8,2    4096      2 /
postmaste  7209 postgres  txt  REG        8,2 2958468 183890 /usr/lib/postgresql/8.1/bin/postgres
postmaste  7209 postgres  mem  REG        8,2  151296 680474 /usr/lib/libk5crypto.so.3.0
postmaste  7209 postgres   0r  CHR        1,3           9845 /dev/null
postmaste  7209 postgres   1w  REG        8,2   65604  18244 /var/log/postgresql/postgresql-8.1-main.log
postmaste  7261 postgres   4w FIFO        0,6          22065 pipe
postmaste  7209 postgres   4u unix 0xf7897e40          21971 /tmp/.s.PGSQL.5433
postmaste  7209 postgres   3u IPv4      21969            TCP localhost:5433 (LISTEN)
postmaste  7209 postgres   5u IPv4      21976            UDP localhost:32769->localhost:32769
postmaste  7209 postgres  DEL  REG        0,8          32769 /SYSV0052e6a9
...
Xorg       7003 root      mem  CHR        1,5           2882 /dev/zero
Xorg       7003 root      mem  CHR      195,0          21517 /dev/nvidia0
Xorg       7003 root      14u  CHR      13,63           4178 /dev/input/mice
...
In practice, though, you will need to scope lsof output in a more controlled manner. To list only files opened by a particular process, use this:
sudo lsof -p <pid>
Replace <pid> in the preceding command with the PID of the process that you want to investigate.
To view the files opened by all processes executing a command starting with ruby, here is a more convenient variant:
sudo lsof -c ruby
To display only files opened by a particular user (say www), use this:
sudo lsof -u www
To find the processes that have the /tmp/obscure.lock file open, use this:
sudo lsof /tmp/obscure.lock
To discover all the files opened on the /dev/sda1 device (suppose that you cannot unmount it because the device is busy), use this:
sudo lsof /dev/sda1
To list all the ports and addresses of current Internet connections on your system, use this:
sudo lsof -i
To make the previous command run faster by skipping the conversion of port numbers to port names (-P) and of network numbers to host names (-n), use this:
sudo lsof -Pni
To discover all applications using any protocol on any port of host ph7spot.com, use this:
sudo lsof -i @ph7spot.com
To display all applications using any protocol on ports 3000, 3001, or 3002 of host ph7spot.com, use this:
sudo lsof -i @ph7spot.com:3000-3002
To figure out which rogue process is holding up the port your application insists on starting on (say 2001), use this:
lsof -i :2001
To inspect all networking related to ports 3000 to 4000, use this:
lsof -i :3000-4000
To list the files whose file descriptor is 24 or 64 for every process, use this:
sudo lsof -d24,64
3.2. Combining Multiple Selections
You can combine multiple options when invoking lsof. By default, selections are ORed. Therefore,
lsof -p 1789 -u mysql
lists all files opened by the mysql user or the process whose PID is 1789.
To AND the selection, use the -a option. For example, to list all opened sockets that belong to processes owned by user www, launch this:
lsof -Pni -a -u www
To inspect Internet connections for Ruby processes, use this:
sudo lsof -i -a -c ruby
Similarly, you can find out which file is open as file descriptor 51 for the process whose PID is 1789 with the following:
lsof -d 51 -a -p 1789
The -u, -g, -p, and -d options provide you with yet another way to combine multiple selections. You can provide multiple IDs as a comma-separated list. All these IDs are then joined in a single ORed set before participating in an AND option selection:
sudo lsof -p 1515,1789,1998
The preceding command lists files being used by processes whose PIDs are 1515, 1789, or 1998. You can even prefix some IDs with a ^ character to exclude any output related to those IDs. For instance, to exclude any file opened by a process running as root from lsof output, use this:
lsof -u ^root
Such exclusion is applied without ORing or ANDing and takes effect before any other selection criteria are applied.
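As a minimal sketch of how these rules combine, the helper below ANDs three selections (-i, -u, and -c) while the comma-separated user list itself stays ORed. The function name sockets_for and the sample users are assumptions made for illustration; they are not part of lsof.

```shell
#!/bin/sh
# Hypothetical helper (not part of lsof): list Internet sockets that belong
# to processes owned by ANY of the given users AND whose command name
# starts with the given prefix.
#   $1: comma-separated user list, ORed together by lsof (e.g. "www,deploy")
#   $2: command name prefix (e.g. "ruby")
sockets_for() {
  # -a ANDs the -i, -u and -c selections; without it they would be ORed.
  lsof -Pni -a -u "$1" -c "$2"
}

# Example invocation (commented out so that sourcing this file is harmless):
# sockets_for www,deploy ruby
```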
4. Concrete Examples Using lsof to Troubleshoot Problems in Real Life
4.1. Checking That a Mongrel Cluster Is Up and Listening on the Right Ports
lsof provides you with a quick and easy way to check that your Mongrel cluster started properly and is listening to incoming connections on the right ports. Suppose that you have a cluster of ten servers running on ports 5000 and up. You can check that everything is fine by using the following:
sudo lsof -Pni :5000-5009
Make sure that all the TCP sockets are in a LISTEN state on all the ports:
COMMAND     PID USER   FD TYPE   DEVICE SIZE NODE NAME
mongrel_r 31657 www    3u IPv4 15065029       TCP localhost:5000 (LISTEN)
mongrel_r 31660 www    3u IPv4 15065035       TCP localhost:5001 (LISTEN)
mongrel_r 31663 www    3u IPv4 15065041       TCP localhost:5002 (LISTEN)
mongrel_r 31666 www    3u IPv4 15065047       TCP localhost:5003 (LISTEN)
mongrel_r 31669 www    3u IPv4 15065053       TCP localhost:5004 (LISTEN)
mongrel_r 31672 www    3u IPv4 15065059       TCP localhost:5005 (LISTEN)
mongrel_r 31675 www    3u IPv4 15065065       TCP localhost:5006 (LISTEN)
mongrel_r 31678 www    3u IPv4 15065071       TCP localhost:5007 (LISTEN)
mongrel_r 31681 www    3u IPv4 15065077       TCP localhost:5008 (LISTEN)
mongrel_r 31684 www    3u IPv4 15065083       TCP localhost:5009 (LISTEN)
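Eyeballing ten lines is easy enough, but the same check can be scripted. The following is only a sketch: missing_listeners is a hypothetical helper that reads lsof output on standard input and prints every port in the given range that has no socket in the LISTEN state. It assumes the -P and -n options were used, so that the port appears as a number at the end of the address.

```shell
#!/bin/sh
# Hypothetical helper: read `lsof -Pni` output on stdin and print every port
# in the range $1..$2 that has no TCP socket in the LISTEN state.
missing_listeners() {
  awk -v first="$1" -v last="$2" '
    # With -P and -n, a listening socket line ends with "host:port (LISTEN)".
    $NF == "(LISTEN)" {
      n = split($(NF - 1), parts, ":")
      listening[parts[n] + 0] = 1
    }
    END {
      for (port = first + 0; port <= last + 0; port++)
        if (!(port in listening)) print port
    }'
}

# Example invocation (commented out so that sourcing this file is harmless):
# sudo lsof -Pni :5000-5009 | missing_listeners 5000 5009
```

An empty result means that every port in the range has a listener.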
If you also feel like checking the database connections while you’re at it (assuming that you only run one Mongrel cluster on that box), use this:
sudo lsof -Pni -a -c mongrel
You get something like this for a Rails application connected to multiple databases:
COMMAND     PID USER   FD TYPE   DEVICE SIZE NODE NAME
mongrel_r 31657 www    3u IPv4 15065029       TCP localhost:5000 (LISTEN)
mongrel_r 31657 www    6u IPv4 15065280       TCP web.server.net:50602->db.server.net:postgres (ESTABLISHED)
mongrel_r 31657 www    8u IPv4 15065284       UDP localhost:36675
mongrel_r 31657 www    9u IPv4 15065286       TCP web.server.net:50603->db.server.net:1521 (ESTABLISHED)
mongrel_r 31657 www   10u IPv4 15066499       UDP localhost:36693
mongrel_r 31657 www   11u IPv4 15066501       TCP web.server.net:50974->db.server.net:1521 (ESTABLISHED)
mongrel_r 31657 www   12u IPv4 15074114       TCP web.server.net:53647->db.server.net:mysql (ESTABLISHED)
mongrel_r 31660 www    3u IPv4 15065035       TCP localhost:5001 (LISTEN)
mongrel_r 31660 www    6u IPv4 15065304       TCP web.server.net:50607->db.server.net:postgres (ESTABLISHED)
mongrel_r 31660 www    8u IPv4 15065308       UDP localhost:36676
mongrel_r 31660 www    9u IPv4 15065310       TCP web.server.net:50608->db.server.net:1521 (ESTABLISHED)
mongrel_r 31660 www   10u IPv4 15066219       UDP localhost:36689
mongrel_r 31660 www   11u IPv4 15066221       TCP web.server.net:50887->db.server.net:1521 (ESTABLISHED)
mongrel_r 31660 www   13u IPv4 15077072       TCP web.server.net:54539->db.server.net:mysql (ESTABLISHED)
...
4.2. Checking That You Are Using a Native Database Driver
If you are using a database from Ruby and plan to deploy your application in production, make sure that you use the native database driver (mysql gem for MySQL, pg gem for PostgreSQL); you will save yourself time and headaches. The pure Ruby drivers bundled with Rails are convenient for getting started, but they are not very stable and run significantly slower than the native ones.
One way to check that the MySQL native drivers are installed properly for a Rails application is to launch script/console and make sure that require_library_or_gem 'mysql' succeeds. Nevertheless, lsof provides a quicker and bulletproof alternative. Just make sure that your Ruby process loaded the mysql native library by typing the following:
lsof -p <pid> -a -d mem | grep mysql
Replace <pid> in the preceding command with the PID of your Ruby process. If you are using the native drivers, you should see a line similar to this:
mongrel_r 20384 www mem REG 8,2 91302 625398 /usr/local/lib/ruby/site_ruby/1.8/i686-linux/mysql.so
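If you want a deployment script to perform this check, the grep above can be turned into a small filter with a meaningful exit status. This is a sketch under assumptions: loaded_libs is a hypothetical helper, it expects the column layout shown above (FD in the fourth column), and it does not handle paths containing spaces.

```shell
#!/bin/sh
# Hypothetical helper: read `lsof -p <pid>` output on stdin and print the
# path of every memory-mapped file whose path contains "/$1." (for example
# "/mysql." as in mysql.so). Exits non-zero when nothing matches.
loaded_libs() {
  awk -v name="$1" '
    # Column 4 is FD; "mem" marks a memory-mapped file such as a library.
    $4 == "mem" && index($NF, "/" name ".") { print $NF; found = 1 }
    END { exit found ? 0 : 1 }'
}

# Example invocation (commented out so that sourcing this file is harmless):
# lsof -p "$pid" | loaded_libs mysql || echo "native mysql driver not loaded"
```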
4.3. Detecting Connection Leaks
You can turn lsof into a powerful monitoring tool by having it periodically refresh its output. Use the -r option:
lsof -r 2 -p <pid> -a -i TCP
The preceding command monitors all TCP connections for a particular process, refreshing its output every 2 seconds. That can be useful for detecting connection leaks.
If you suspect that your Rails application might be leaking Oracle database connections, you could launch:
lsof -r 10 -c mongrel -a -i :1521
(Oracle typically listens on port 1521).
You can then do the following:
- Warm up your application with a reasonable load.
- Look at lsof output and count the number of connections.
- Put your application under load for a while.
- Check lsof output again and verify that the number of connections is the same; if not, it is likely that you do have a connection leak.
You can also use a similar technique to check that your application is not leaking file descriptors and is closing files properly.
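The counting steps above can be sketched as two small helpers. Both names (count_established and report_leak) are hypothetical, and the sketch makes one simplifying assumption: only an increase in the number of connections is reported as a possible leak, since a decrease simply means that connections were closed.

```shell
#!/bin/sh
# Hypothetical helper: read lsof output on stdin and print the number of
# TCP connections in the ESTABLISHED state.
count_established() {
  awk '$NF == "(ESTABLISHED)" { n++ } END { print n + 0 }'
}

# Hypothetical helper: compare two counts and report a possible leak.
#   $1: count after warm-up, $2: count after the load test
# Returns 1 when the count went up, 0 otherwise.
report_leak() {
  if [ "$2" -gt "$1" ]; then
    echo "possible leak: $1 -> $2 connections"
    return 1
  fi
  echo "stable: $1 -> $2 connections"
}

# Example session (commented out so that sourcing this file is harmless):
# before=$(lsof -c mongrel -a -i :1521 | count_established)
# ... put the application under load for a while ...
# after=$(lsof -c mongrel -a -i :1521 | count_established)
# report_leak "$before" "$after"
```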
5. Exploring Other Tricks
Although it’s possible to come up with tons of examples on how to use lsof to solve a full range of problems, I want to keep this article reasonably focused. I encourage you to explore lsof usage on your own and come up with your own tricks. How would you find out the current working directory of a process using lsof, for instance?
The Web is also a great resource for discovering ideas on uses for lsof: from security audit to file recovery, to understanding why you can’t empty your Mac OS trash can.
6. Exploring Other lsof Options
lsof has a lot of options that I won’t cover in this article. My objective is to get you interested enough that you start exploring lsof on your own (please see the beginning of the “Usage” section for pointers to reference documents). In fact, even the options that I did cover are often more flexible than what I presented. For instance, the -c option also understands regular expressions, so you can use it like this:
sudo lsof -Pni -a -c '/^ruby|^m.ngrel|^memcache.?$/'
There are three modes that I want to point out, though, because they are especially useful: repeat mode, field output, and terse output.
6.1. Repeat Mode
As briefly covered when discussing how to use lsof to track connection leaks and introducing the -r option in the previous section, “Detecting Connection Leaks”, you can turn lsof into a powerful monitoring tool by using its repeat mode. It is worth noting that using lsof repeat mode is more efficient than using the watch command or a custom shell script, because you avoid the lsof startup overhead.
When you use the -r repeat mode, lsof exits only if it is interrupted (Ctrl+C) or receives the QUIT signal. There is a variant, though: the +r repeat mode. When you use it, lsof exits the first time that no open file matches the selection criteria. This is useful for triggering a specific action in a script and is often used for supervisory purposes. To make this mode even more script-friendly, lsof’s exit code is meaningful (0 if any open files were ever listed, 1 if none were ever listed). Finally, scripted usage of the +r repeat mode is even more useful when coupled with field output, which we cover next.
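Here is a minimal sketch of such a supervisory script, relying only on the +r behavior and exit codes described above. The function name wait_until_released and the default 5 second interval are assumptions made for illustration. Note that a non-zero exit code can also mean that lsof itself hit an error, which the sketch does not try to distinguish.

```shell
#!/bin/sh
# Hypothetical helper: block until no process has the given file open.
#   $1: path to watch, $2: polling interval in seconds (default 5)
wait_until_released() {
  # +r makes lsof exit on the first cycle in which nothing matches.
  # Exit code 0 means the file was listed at least once before that.
  if lsof +r "${2:-5}" "$1" >/dev/null 2>&1; then
    echo "$1 was in use and has now been released"
  else
    echo "$1 was never seen open (or lsof reported an error)"
  fi
}

# Example invocation (commented out so that sourcing this file is harmless):
# wait_until_released /tmp/obscure.lock 2 && echo "safe to proceed"
```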
6.2. Field Output
By default, lsof output is formatted to be easily read by a human in a terminal window. This mode is usually called “formatted display”. You can change this behavior by switching to field output mode, which is designed for scripting use. lsof then produces output that other programs can easily parse. Read the “Output for Other Programs” and the “-F” sections of the lsof man page for more details. One of the cool things about field output mode is that it is relatively homogeneous across UNIX dialects, making it easier to write portable scripts.
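To give a feel for field output, here is a small sketch that turns it back into one line per open file. In this mode each output line starts with a one-character field identifier; the sketch uses p (process ID), c (command name), and n (file name or Internet address) and ignores every other field. The helper name fields_to_lines is hypothetical, and the exact set of fields emitted may vary between lsof versions, so check the man page on your system.

```shell
#!/bin/sh
# Hypothetical helper: read `lsof -F pcn` output on stdin and print one
# "PID COMMAND NAME" line per open file.
fields_to_lines() {
  awk '
    # The first character of each line identifies the field.
    { id = substr($0, 1, 1); value = substr($0, 2) }
    id == "p" { pid = value }             # process ID, starts a process set
    id == "c" { cmd = value }             # command name
    id == "n" { print pid, cmd, value }   # file name or Internet address
  '
}

# Example invocation (commented out so that sourcing this file is harmless):
# sudo lsof -Pni -a -c mongrel -F pcn | fields_to_lines
```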
6.3. Terse Output
Another useful feature for scripting with lsof is the terse output mode. When you activate it with the -t option, lsof does not print the header line, suppresses all warning messages, and outputs only the PIDs of the processes with open files matching the selection criteria. This mode is especially useful when used in combination with the kill command. The command
kill -9 `lsof -t /tmp/obscure.lock`
forcibly terminates any process that has the /tmp/obscure.lock file open, for instance.
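Since kill -9 gives processes no chance to clean up, a gentler variant may be preferable in scripts. The sketch below is one possible approach, not the only one: release_file is a hypothetical helper that sends SIGTERM instead and does nothing when no process holds the file.

```shell
#!/bin/sh
# Hypothetical helper: ask every process holding the given file open to
# terminate, using SIGTERM rather than SIGKILL so that they can clean up.
release_file() {
  # lsof -t prints one PID per line and exits non-zero when nothing matches.
  pids=$(lsof -t "$1") || true
  if [ -z "$pids" ]; then
    echo "no process has $1 open"
    return 0
  fi
  # $pids is intentionally unquoted so that each PID becomes a separate
  # argument to kill.
  kill -TERM $pids
}

# Example invocation (commented out so that sourcing this file is harmless):
# release_file /tmp/obscure.lock
```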
7. Conclusion
The more time you invest in learning lsof, the more you’ll be amazed by its incredible power and the more you’ll use it in your everyday tasks: With lsof you will quickly answer questions and solve problems that were previously tedious if not impossible to crack.
So what are the next steps? Practice using lsof and incorporating it into your daily routine. Don’t forget to search the web to learn new tricks and to stimulate your imagination. As you start your journey to become an lsof master, make sure to read the lsof quick start and the lsof man page.
I hope you enjoyed this article. As usual, don’t hesitate to provide feedback or share your favorite lsof tricks!
