Leveraging lsof to Troubleshoot Network, Filesystem, Native Library, or Device Problems

Last update: Nov 24, 2008

This article is an excerpt from my Addison-Wesley Professional Ruby Series shortcut “Troubleshooting Ruby Processes - Leveraging System Tools When the Usual Ruby Tricks Stop Working”, edited to fit this publication format and republished with permission of Addison-Wesley.

1. What Is lsof?

lsof stands for “LiSt Open Files”. This shell command seems deceptively simple: It lists information about files opened by processes on a UNIX box.

Despite its (apparently) modest mission statement, lsof is actually one of the most powerful and useful UNIX commands. Its raw power comes from one of UNIX’s design principles, often described as “in UNIX everything is a file”. This means that lsof’s concept of an open file covers not only regular files but also directories, block and character special files, executing text references, libraries, pipes and streams, and network files (Internet sockets, NFS files, and UNIX domain sockets).

This wide scope offers monitoring capabilities for a broad range of resources. For instance, a single lsof command suffices to check the state of all open Internet sockets on your machine and figure out which process owns each socket. In fact, lsof is so flexible and powerful that most users, including myself, do not fully use its power. Unsurprisingly, lsof is also one of the most popular UNIX tools and has been ported to every UNIX dialect under the (IT) sun. Not only will you find lsof on any UNIX machine, but it also works and behaves the same way in all environments (as opposed to netstat, for instance).

Wait, I Cannot Find lsof on My System!

lsof is such a popular tool that it has been ported to pretty much all UNIX dialects (Linux, Mac OS X, BSD, Solaris, and so on). If it is unavailable on your box, use your usual package management system to install it. You can find lsof packages for Solaris on Sun Freeware.

2. What Is lsof Good For?

lsof is especially good for troubleshooting problems related to file access, network access, and native libraries. It essentially provides a static view (a snapshot) of resource usage. If you need a better understanding of how a process behaves over time, or a history of its system resource usage, it is time to switch to dynamic tools like strace or DTrace.

3. Usage

This document is not an lsof reference detailing every lsof option; that would be a book in its own right. The objective is rather to give you a concrete picture of lsof and cover its most useful usage scenarios. Hopefully this will get you interested enough to investigate the lsof quick start and the lsof man page.

3.1. Typical Usage Scenarios

If you invoke lsof without any option or parameter, it lists all open files belonging to all active processes. Note that lsof is typically installed with only enough privileges to list resources attached to your own processes, so if you want to peek at all open files on your system, you need to run lsof with root privileges. For instance:

sudo lsof

This command generates a lot of output, looking like this:

COMMAND    PID   USER      FD    TYPE     DEVICE     SIZE    NODE NAME 
init       1     root      cwd   DIR         8,2     4096       2 / 
... 
postmaste  7209  postgres  cwd   DIR         8,2     4096  212984 /var/lib/postgresql/8.1/main 
postmaste  7209  postgres  rtd   DIR         8,2     4096       2 / 
postmaste  7209  postgres  txt   REG         8,2  2958468  183890 /usr/lib/postgresql/8.1/bin/postgres 
postmaste  7209  postgres  mem   REG         8,2   151296  680474 /usr/lib/libk5crypto.so.3.0 
postmaste  7209  postgres   0r   CHR         1,3             9845 /dev/null 
postmaste  7209  postgres   1w   REG         8,2    65604   18244 /var/log/postgresql/postgresql-8.1-main.log 
postmaste  7261  postgres   4w   FIFO        0,6            22065 pipe 
postmaste  7209  postgres   4u   unix 0xf7897e40            21971 /tmp/.s.PGSQL.5433 
postmaste  7209  postgres   3u   IPv4      21969                  TCP localhost:5433 (LISTEN) 
postmaste  7209  postgres   5u   IPv4      21976                  UDP localhost:32769->localhost:32769 
postmaste  7209  postgres  DEL   REG         0,8            32769 /SYSV0052e6a9 
... 
Xorg       7003  root      mem   CHR         1,5             2882 /dev/zero 
Xorg       7003  root      mem   CHR       195,0            21517 /dev/nvidia0 
Xorg       7003  root      14u   CHR       13,63             4178 /dev/input/mice 
... 
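Because the default output has one line per open file and whitespace-separated columns, ordinary text tools can summarize it. The bash sketch below counts entries per process, assuming the column layout shown above (COMMAND first, PID second, header on the first line); `count_open_files` is a hypothetical helper name. These counts include entries such as cwd, txt, and mem, not only numeric file descriptors.

```bash
# count_open_files
# Hypothetical filter: reads default-format lsof output on stdin and prints
# "<count> <COMMAND> <PID>" for each process, largest count first.
count_open_files() {
  # NR > 1 skips the header line; n[] counts lines per "COMMAND PID" pair
  awk 'NR > 1 { n[$1 " " $2]++ }
       END    { for (k in n) print n[k], k }' |
    sort -rn
}

# Usage (as root, to see every process):
#   sudo lsof | count_open_files | head
```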

In practice, though, you will need to scope lsof output in a more controlled manner. To list only files opened by a particular process, use this:

sudo lsof -p <pid> 

Replace <pid> in the preceding command with the PID (see note 1) of the process that you want to investigate.

To view the files opened by all processes executing a command starting with ruby, here is a more convenient variant:

sudo lsof -c ruby  

To display only files opened by a particular user (say www), use this:

sudo lsof -u www 

To find the processes that have the /tmp/obscure.lock file open, use this:

sudo lsof /tmp/obscure.lock 

To discover all the files opened on the /dev/sda1 device (suppose that you cannot unmount it because the device is busy), use this:

sudo lsof /dev/sda1

To list all Internet sockets on your system, along with their addresses and ports, use this:

sudo lsof -i

To make the previous command run faster by inhibiting the conversion of port numbers to port names (-P) and of network numbers to host names (-n), use this:

sudo lsof -Pni

To discover all applications using any protocol on any port of host ph7spot.com, use this:

sudo lsof -i @ph7spot.com

To display all applications using any protocol on ports 3000, 3001, or 3002 of host ph7spot.com, use this:

sudo lsof -i @ph7spot.com:3000-3002

To figure out which rogue process is holding up the port your application insists on starting on (say 2001), use this:

lsof -i :2001 
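To see the full command line of whatever holds the port, you can combine terse mode (-t, covered in “Terse Output”) with ps. This bash sketch assumes a POSIX ps that accepts -o pid=,user=,args=; `who_holds_port` is a hypothetical helper. Because -i :2001 matches both ends of a connection, clients connected to that port are reported as well.

```bash
# who_holds_port PORT
# Hypothetical helper: prints PID, owner, and full command line of every
# process with an Internet socket on PORT. Run it from a root shell to see
# other users' processes.
who_holds_port() {
  local pid
  # -t prints bare PIDs, one per line, with no header
  for pid in $(lsof -t -i ":$1"); do
    ps -o pid=,user=,args= -p "$pid"
  done
}

# Usage:
#   who_holds_port 2001
```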

To inspect all networking related to ports 3000 to 4000, use this:

lsof -i :3000-4000

To list, for every process, the files open on file descriptors 24 or 64, use this:

sudo lsof -d24,64 

3.2. Combining Multiple Selections

You can combine multiple options when invoking lsof. By default, selections are ORed. Therefore,

lsof -p 1789 -u mysql 

lists all files opened by the mysql user or the process whose PID is 1789.

To AND the selections, use the -a option. For example, to list all open Internet sockets that belong to processes owned by user www, run this:

lsof -Pni -a -u www 

To inspect Internet connections for Ruby processes, use this:

sudo lsof -i -a -c ruby 

Similarly, you can find out which file is open as file descriptor 51 for the process whose PID is 1789 with the following:

lsof -d 51 -a -p 1789 

The -u, -g, -p, and -d options provide you with yet another way to combine multiple selections. You can provide multiple IDs as a comma-separated list. All these IDs are then joined in a single ORed set before participating in an AND option selection:

sudo lsof -p 1515,1789,1998 

The preceding command lists files being used by processes whose PIDs are 1515, 1789, or 1998. You can even prefix some IDs with a ^ character to exclude any output related to that ID. For instance, to exclude any file opened by a process running as root from lsof output, use this:

lsof -u ^root 

Such exclusion is applied without ORing or ANDing and takes effect before any other selection criteria are applied.
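When the processes you care about are easier to identify by their full command line than by their command name, pgrep can build the comma-separated PID list for you. This bash sketch assumes a pgrep that supports -d (output delimiter) and -f (match the full command line); `lsof_matching` is a hypothetical helper, and any extra arguments are passed straight to lsof.

```bash
# lsof_matching PATTERN [LSOF_OPTIONS...]
# Hypothetical helper: lists open files of every process whose full command
# line matches PATTERN, passing any remaining arguments through to lsof.
lsof_matching() {
  local pattern=$1 pids
  shift
  # -d, joins the PIDs with commas, the list format that lsof -p accepts
  if ! pids=$(pgrep -d, -f "$pattern"); then
    echo "no process matches '$pattern'" >&2
    return 1
  fi
  lsof -p "$pids" "$@"
}

# Usage (from a root shell): Internet sockets of all Mongrel processes
#   lsof_matching mongrel_rails -a -i
```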

4. Concrete Examples Using lsof to Troubleshoot Real-Life Problems

4.1. Checking That a Mongrel Cluster Is Up and Listening on the Right Ports

lsof provides you with a quick and easy way to check that your Mongrel cluster started properly and is listening for incoming connections on the right ports. Suppose that you have a cluster of ten servers running on ports 5000 and up. You can check that everything is fine by using the following:

sudo lsof -Pni :5000-5009 

Make sure that all the TCP sockets are in a LISTEN state on all the ports:

COMMAND     PID USER   FD   TYPE  DEVICE SIZE NODE NAME 
mongrel_r 31657  www    3u  IPv4  15065029     TCP localhost:5000 (LISTEN) 
mongrel_r 31660  www    3u  IPv4  15065035     TCP localhost:5001 (LISTEN) 
mongrel_r 31663  www    3u  IPv4  15065041     TCP localhost:5002 (LISTEN) 
mongrel_r 31666  www    3u  IPv4  15065047     TCP localhost:5003 (LISTEN) 
mongrel_r 31669  www    3u  IPv4  15065053     TCP localhost:5004 (LISTEN) 
mongrel_r 31672  www    3u  IPv4  15065059     TCP localhost:5005 (LISTEN) 
mongrel_r 31675  www    3u  IPv4  15065065     TCP localhost:5006 (LISTEN) 
mongrel_r 31678  www    3u  IPv4  15065071     TCP localhost:5007 (LISTEN) 
mongrel_r 31681  www    3u  IPv4  15065077     TCP localhost:5008 (LISTEN) 
mongrel_r 31684  www    3u  IPv4  15065083     TCP localhost:5009 (LISTEN)
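With ten ports, eyeballing the output is manageable; for a larger cluster or a deployment script, a small check can do the work. This bash sketch assumes the -P -n output format shown above; `check_cluster` is a hypothetical helper that takes the first and last port as parameters and returns nonzero if any port is missing.

```bash
# check_cluster FIRST LAST
# Hypothetical helper: reads `lsof -Pn -i` output on stdin and reports
# whether a socket is in the LISTEN state on every port FIRST..LAST.
# Returns 0 only if all ports are listening.
check_cluster() {
  local first=$1 last=$2 listing port rc=0
  listing=$(cat)
  for ((port = first; port <= last; port++)); do
    # fixed-string match on e.g. ":5003 (LISTEN)"
    if grep -qF ":$port (LISTEN)" <<<"$listing"; then
      echo "port $port: listening"
    else
      echo "port $port: NOT listening"
      rc=1
    fi
  done
  return "$rc"
}

# Usage:
#   sudo lsof -Pn -i :5000-5009 | check_cluster 5000 5009
```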

If you also feel like checking the database connections while you’re at it (assuming that you only run one Mongrel cluster on that box), use this:

sudo lsof -Pni -a -c mongrel

You get something like this for a Rails application connected to multiple databases:

COMMAND     PID USER   FD   TYPE DEVICE SIZE NODE NAME 
mongrel_r 31657  www    3u  IPv4 15065029     TCP localhost:5000 (LISTEN) 
mongrel_r 31657  www    6u  IPv4 15065280     TCP web.server.net:50602->db.server.net:postgres (ESTABLISHED) 
mongrel_r 31657  www    8u  IPv4 15065284     UDP localhost:36675 
mongrel_r 31657  www    9u  IPv4 15065286     TCP web.server.net:50603->db.server.net:1521 (ESTABLISHED) 
mongrel_r 31657  www   10u  IPv4 15066499     UDP localhost:36693 
mongrel_r 31657  www   11u  IPv4 15066501     TCP web.server.net:50974->db.server.net:1521 (ESTABLISHED) 
mongrel_r 31657  www   12u  IPv4 15074114     TCP web.server.net:53647->db.server.net:mysql (ESTABLISHED) 
mongrel_r 31660  www    3u  IPv4 15065035     TCP localhost:5001 (LISTEN) 
mongrel_r 31660  www    6u  IPv4 15065304     TCP web.server.net:50607->db.server.net:postgres (ESTABLISHED) 
mongrel_r 31660  www    8u  IPv4 15065308     UDP localhost:36676 
mongrel_r 31660  www    9u  IPv4 15065310     TCP web.server.net:50608->db.server.net:1521 (ESTABLISHED) 
mongrel_r 31660  www   10u  IPv4 15066219     UDP localhost:36689 
mongrel_r 31660  www   11u  IPv4 15066221     TCP web.server.net:50887->db.server.net:1521 (ESTABLISHED) 
mongrel_r 31660  www   13u  IPv4 15077072     TCP web.server.net:54539->db.server.net:mysql (ESTABLISHED) 
... 

4.2. Checking That You Are Using a Native Database Driver

If you are using a database from Ruby and plan to deploy your application in production, make sure that you use the native database driver (the mysql gem for MySQL, the pg gem for PostgreSQL); you will save yourself time and headaches. The pure Ruby drivers bundled with Rails are convenient for getting started, but they are not very stable and run significantly slower than the native ones.

One way to check that the MySQL native driver is installed properly for a Rails application is to launch script/console and make sure that require_library_or_gem 'mysql' succeeds. lsof, however, provides a quicker and more direct alternative: just check that your Ruby process has loaded the mysql native library by typing the following:

lsof -p <pid> -a -d mem | grep mysql 

Replace <pid> in the preceding command with the PID of your Ruby process. If you are using the native drivers, you should see a line similar to this:

mongrel_r 20384 www mem REG 8,2 91302 625398 /usr/local/lib/ruby/site_ruby/1.8/i686-linux/mysql.so
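To run the same check against every process of a cluster at once, terse mode can supply the PIDs. This bash sketch reuses the lsof -p <pid> -a -d mem | grep mysql test from above; `check_native_mysql` is a hypothetical helper, and it assumes that mapped libraries are reported with the mem descriptor name, as in the Linux output shown here (other platforms may label them differently).

```bash
# check_native_mysql COMMAND_PREFIX
# Hypothetical helper: for every process whose command name starts with
# COMMAND_PREFIX, reports whether a file with "mysql" in its path is
# memory-mapped (the same test as `lsof -p <pid> -a -d mem | grep mysql`).
check_native_mysql() {
  local pid
  for pid in $(lsof -t -c "$1"); do
    if lsof -p "$pid" -a -d mem | grep -q mysql; then
      echo "$pid: native mysql library loaded"
    else
      echo "$pid: no native mysql library mapped"
    fi
  done
}

# Usage (from a root shell):
#   check_native_mysql mongrel
```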

4.3. Detecting Connection Leaks

You can turn lsof into a powerful monitoring tool by having it periodically refresh its output. Use the -r option:

lsof -r 2 -p <pid> -a -i TCP

The preceding command monitors all TCP connections for a particular process, refreshing its output every 2 seconds. That can be useful for detecting connection leaks.

If you suspect that your Rails application might be leaking Oracle database connections, you could run:

lsof -r 10 -c mongrel -a -i :1521

(Oracle typically listens on port 1521.)

You can then do the following:

  1. Warm up your application with a reasonable load.
  2. Look at lsof output and count the number of connections.
  3. Put your application under load for a while.
  4. Check lsof output again and verify that the number of connections is the same; if it has grown, you most likely have a connection leak.
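Counting lines by hand gets tedious when lsof refreshes every few seconds. In repeat mode, lsof separates iterations with a marker line (by default a row of = characters), so a small awk filter can print one count per iteration. This bash sketch assumes that default marker; `count_per_iteration` is a hypothetical helper. When lsof output goes through a pipe, counts may show up with some delay because of output buffering.

```bash
# count_per_iteration
# Hypothetical filter: reads `lsof -r ...` output on stdin and prints the
# number of listed files for each iteration, ignoring header lines.
count_per_iteration() {
  awk '
    /^=+$/     { iter++; printf "iteration %d: %d open\n", iter, n; n = 0; next }
    /^COMMAND/ { next }
    NF > 0     { n++ }
    END        { if (n > 0) printf "iteration %d: %d open\n", iter + 1, n }
  '
}

# Usage: watch the number of Oracle connections held by Mongrel processes
#   sudo lsof -r 10 -c mongrel -a -i :1521 | count_per_iteration
```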

You can also use a similar technique to check that your application is not leaking file descriptors and is closing files properly.
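For file descriptors, counting only numeric descriptors filters out entries such as cwd, txt, and mem that are not real descriptors. This bash sketch uses field output (-F, described in “Field Output”), where each open file of the process yields a line starting with f, such as f12 or fcwd; `count_fds` is a hypothetical helper.

```bash
# count_fds PID
# Hypothetical helper: prints how many numeric file descriptors PID holds.
# With -Ff, each open file yields a line such as "f12" or "fcwd"; only the
# numeric ones are real descriptors.
count_fds() {
  lsof -p "$1" -Ff | awk '/^f[0-9]/ { n++ } END { print n + 0 }'
}

# Usage: sample before and after a load test, then compare
#   before=$(count_fds 31657)
#   ... put the application under load ...
#   after=$(count_fds 31657)
#   echo "descriptors: $before -> $after"
```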

5. Exploring Other Tricks

Although it’s possible to come up with tons of examples on how to use lsof to solve a full range of problems, I want to keep this article reasonably focused. I encourage you to explore lsof usage on your own and come up with your own tricks. How would you find out the current working directory of a process using lsof, for instance? (See note 2 for a hint.)

The Web is also a great resource for discovering ideas on uses for lsof: from security audits to file recovery to understanding why you can’t empty your Mac OS trash can.

6. Exploring Other lsof Options

lsof has a lot of options that I won’t cover in this article. My objective is to get you interested enough that you start exploring lsof on your own (see the beginning of the “Usage” section for pointers to reference documentation). In fact, even the options that I did cover are often more flexible than what I presented. For instance, the -c option also understands regular expressions, so you can use it like this:

sudo lsof -Pni -a -c '/^ruby|^m.ngrel|^memcache.?$/'

There are three modes that I want to point out, though, because they are especially useful: repeat mode, field output, and terse output.

6.1. Repeat Mode

As briefly covered in the earlier section “Detecting Connection Leaks”, which introduced the -r option, you can turn lsof into a powerful monitoring tool by using its repeat mode. Using lsof repeat mode is more efficient than using the watch command or a custom shell script, because you avoid paying the lsof startup overhead on every iteration.

When you use the -r repeat mode, lsof exits only if it is interrupted (Ctrl+C) or receives the QUIT signal. There is a variant, though: the +r repeat mode. When you use it, lsof exits the first time that no open file matches the selection criteria. This is useful for triggering a specific action in a script and is often used for supervisory purposes. To make this mode even more script-friendly, lsof’s exit code is meaningful (0 if any open files were ever listed, 1 if none were ever listed). Finally, scripted usage of the +r repeat mode is even more useful when coupled with field output, which we cover next.
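For example, a script can block until every process has released a lock file and then branch on the exit code. This bash sketch relies on the +r behavior and exit codes described above; `wait_for_release` is a hypothetical helper. Since lsof also exits with 1 when it reports an error (for instance, a file that does not exist), a nonzero status does not strictly prove the file was never open.

```bash
# wait_for_release FILE [INTERVAL]
# Hypothetical helper: blocks while any process has FILE open, polling
# every INTERVAL seconds (default 2) via lsof's +r repeat mode.
wait_for_release() {
  if lsof +r "${2:-2}" "$1" >/dev/null 2>&1; then
    # exit code 0: the file was listed at least once and is now closed
    echo "$1 was open and has now been released"
  else
    # exit code 1: never listed, or lsof reported an error
    echo "$1 was never listed as open (or lsof failed)"
  fi
}

# Usage:
#   wait_for_release /tmp/obscure.lock 5
```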

6.2. Field Output

By default, lsof output is formatted to be easily read by a human in a terminal window. This mode is usually called “formatted display”. You can change this behavior by switching to field output mode (the -F option), which is designed for scripting: lsof then produces output that other programs can easily parse. Read the “Output for Other Programs” and “-F” sections of the lsof man page for more details. One of the cool things about field output mode is that it is relatively homogeneous across UNIX dialects, making it easier to write portable scripts.
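As an illustration, selecting the PID (p), command name (c), and file name (n) fields yields one identifier-prefixed value per line, which a few lines of awk can turn into a tab-separated table. This bash sketch assumes the default newline field terminator; `parse_fields` is a hypothetical helper, and any other field lines, such as f (file descriptor), are ignored.

```bash
# parse_fields
# Hypothetical filter: turns `lsof -Fpcn` output into
# "PID<TAB>COMMAND<TAB>NAME" lines. Process-level fields (p, c) precede the
# file-level fields of that process, so they are remembered until the next
# process starts.
parse_fields() {
  awk '
    /^p/ { pid = substr($0, 2) }
    /^c/ { cmd = substr($0, 2) }
    /^n/ { printf "%s\t%s\t%s\n", pid, cmd, substr($0, 2) }
  '
}

# Usage:
#   sudo lsof -Fpcn -i | parse_fields
```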

6.3. Terse Output

Another useful feature for scripting with lsof is the terse output mode. When you activate it with the -t option, lsof omits the header line, suppresses warning messages, and outputs only the PIDs of the processes with open files matching the selection criteria. This mode is especially useful in combination with the kill command. The command

kill -9 `lsof -t /tmp/obscure.lock`

forcibly kills (with SIGKILL) any process holding /tmp/obscure.lock open, for instance.
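SIGKILL gives the process no chance to clean up, and when nothing holds the file, lsof -t prints nothing, so kill is invoked without any PID and complains. This bash sketch is a gentler variant that sends SIGTERM first and escalates only if needed; `free_file` and its grace-period parameter are hypothetical.

```bash
# free_file FILE [GRACE_SECONDS]
# Hypothetical helper: sends SIGTERM to every process holding FILE open,
# waits GRACE_SECONDS (default 5), then sends SIGKILL to any process that
# still holds it.
free_file() {
  local pids
  # lsof exits 1 when nothing matches; "|| true" keeps `set -e` scripts alive
  pids=$(lsof -t "$1" || true)
  if [ -z "$pids" ]; then
    echo "no process holds $1"
    return 0
  fi
  kill $pids                  # unquoted on purpose: one argument per PID
  sleep "${2:-5}"
  pids=$(lsof -t "$1" || true)
  if [ -n "$pids" ]; then
    kill -9 $pids
  fi
}

# Usage (from a root shell):
#   free_file /tmp/obscure.lock 10
```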

7. Conclusion

The more time you invest in learning lsof, the more you’ll be amazed by its incredible power and the more you’ll use it in your everyday tasks: With lsof you will quickly answer questions and solve problems that were previously tedious if not impossible to crack.

So what are the next steps? Practice using lsof and incorporate it into your daily routine. Search the web to learn new tricks and to stimulate your imagination. And as you start your journey to become an lsof master, make sure to read the lsof quick start and the lsof man page.

I hope you enjoyed this article. As usual, don’t hesitate to provide feedback or share your favorite lsof tricks!


  1. PID stands for process identification number. This unique number is assigned by the kernel when the process is created and identifies the process within the kernel. The PID is typically passed to process control functions and commands (for example, kill) to act on a given process.

  2. Hint: -d cwd
