COLLECT

Contents

1. Synopsis
2. Flags
3. Description
4. Data Conversion and Filtering
5. Data Fields
  5.1 Processes
  5.2 Disks
  5.3 Tapes
  5.4 LSM Volumes
  5.5 CPU Summary
  5.6 Single CPU
  5.7 Memory
  5.8 Filesystems
  5.9 Network
  5.10 Message Queues
  5.11 Tty I/O
6. Examples
7. Bugs/Restrictions
8. Acknowledgements

1. SYNOPSIS

Collect to STDOUT                collect [flags]
Collect to a file                collect -f <file> [flags]   (file '-' = stdout)
Playback (from file)             collect -p <file> [flags]   (file '-' = stdin)
Playback (from multiple files)   collect -p <file> [-p <file> [-p <file>]] [flags]
Binary conversion/selection      collect -p <collect data file> -f <new collect data file> [flags]

2. FLAGS

-a collect to STDOUT as well as to a binary file
-i <I>[:<PI>] set interval [process interval] to I [PI] seconds (default: 10 seconds)
-e [pmdlncfh] exclude subsystem(s) from collection/playback. subsystems are:
Proc,Mem,Disk,Lsm-Vol,Net,Cpu,Filesys,Header,Tty
-s [pmdlncfh] select subsystem(s) for collection/playback (default: all subsystems)
-S sort processes by %CPU usage
-n X select only top X processes (useful with -S)
-D rzXX[,rzXY,...,rzZZ] collect data only for disks in list (Regular Expression allowed)
-P pid[,pid,...,pid] collect data only for processes in list ('%' = collect process)
-P Ppid[,pid,...,pid] collect data only for processes whose Parent PID (PPID) is in list OR who are members of a Process Group (PGID) with the same ID.
-P Ccmd[,cmd,...,cmd] collect data only for processes whose Process Names contain above string. This can be a partial string, but must match exactly (no Regular Expressions).
-P Uuser[,user,...,user] collect data only for processes owned by the specified users. UIDs (numeric) are also allowed.
-L vol1[,vol2,...,volN] collect data only for LSM volumes in list. In order to make the names unique, the format is the diskgroup name, a slash ('/') and the volume name, for example: "rootdg/vol01". (Regular Expression allowed)
-T sum MB/sec for all disks (turns off all subsystems by default)
-t prefixes a 'tag' to all data lines to facilitate data-crunching in scripts
-C [<start-time>],[<end-time>] Chop out a series of samples from a binary file. Format is [+]Year:Month:Day:Hour:Minute:Second. The optional '+' sign at the front indicates that the time is relative to the beginning of collection. No plus means absolute time. Every field except 'Second' can be left off. For absent fields, in absolute format, the values from the beginning of collection are used (see DESCRIPTION). If <start-time> is omitted, the start of the collection period is used. If <end-time> is omitted, the end of the collection period is used.
-l seek to last valid record and print it. This is primarily used by collgui to get the ending time of the collection period.
-h print usage summary.
-v enable verbose output.
-V [-p <datafile>] show collect and current datafile version, and optionally, the datafile version of the specified datafile.
-F Print 'full' process lines which are longer than 80 columns. The process priorities are shown and the RSS and VSZ values are always in KB, rather than formatted to fit into 4 columns.
-R Set the duration for collect to (R)un. This can have one of two forms. The first is a sequence of <number><letter>[<number><letter>[...]] where number is an integer and letter is one of w,d,h,m,s where w=week, d=day, h=hour, m=minute, and s=second. For example: 4w2d5h would be 4 weeks, 2 days, and 5 hours. The second form is the same as that used by the -C option, except that a + (plus-sign) indicates the value should be relative to the current time. With no +, the value is an absolute time at which collect should exit.
-o[tmfnzlq] various single options:
t: show absolute sys and user time (proc subsys)
m: show pages instead of MB for memory
f: don't ask before overwriting output file
n: don't set priority higher
z: don't compress output file
l: don't lock pages in memory
q: use instantaneous Q-lengths instead of calculated averages

3. DESCRIPTION

collect is a tool to collect operating system and process data. Although it was originally intended for benchmarking situations, it was also designed to be as flexible as possible. Data can be collected for any subset of the 'subsystems' (Process, Memory, Disk, Tape, Lsm-Volumes, Network, Cpu, Filesystems, Message Queue, Tty) and Header. This subset is constructed by either selecting (-s...) or excluding (-e...) particular subsystems (using the first letter).

Because collect is meant to be as reliable as possible, as of version V1.10 it takes some steps to ensure that it delivers reliable statistics: it locks itself into memory using plock(), so that it cannot be swapped out by the system, and it raises its priority using nice(). These measures should not have any impact on a system under normal load, and they should have only a minimal impact on a system under extremely high load. The locking of pages can be disabled using the -ol switch. The raising of the priority can be disabled using the -on switch.

Playing back multiple data files

As of V1.10, collect can accept multiple binary data files using the -p switch and play them back as one stream, with monotonically increasing sample numbers. It is also possible to combine multiple binary input files into one binary output file, simply by supplying the input files with the -p switch and the output file with the -f switch. collect will combine arbitrary input files in whatever order you give on the command line. That means the input files must be in chronological order if you want to do anything with them later. collect does NOT check this. It is up to YOU! Also, collect will allow you to combine binary input files from different systems, made at different times, with differing subsets of subsystems for which data has been collected. Let common sense guide you in using this option. The usual filtering options (-s<ss>, -e<ss>, -P, -D, etc.) can be used here.

Normalization of data

Where it makes sense, data is presented in units/second. That is, for a disk, data such as KiloBytes transferred or the number of transfers is always "normalized" for 1 second. This happens no matter what interval is chosen. The same is true for cpu interrupts, system calls, and context switches; memory pages out, pages in, pages zeroed, reactivated, and copied on write; network packets in and out, collisions, and KiloBytes in and out. On the other hand, things like free memory pages are (obviously) snapshot values, as are cpu states, disk queue lengths, or process memory and cpu use, to name a few.
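The normalization described above can be illustrated with a minimal sketch. It assumes the underlying values are cumulative counters read at the start and end of an interval; the text does not say how collect obtains them, and the function name per_second is hypothetical.

```python
def per_second(previous_count, current_count, interval_seconds):
    """Convert two cumulative counter readings into a per-second rate."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return (current_count - previous_count) / interval_seconds


# Hypothetical disk counters (in KB): 5120 KB moved during a 10-second
# interval and 256 KB moved during a half-second interval.
rate_long = per_second(100_000, 105_120, 10.0)
rate_short = per_second(100_000, 100_256, 0.5)
```

Both calls yield the same KB-per-second figure, which is the point of normalizing: values stay comparable no matter which interval was chosen.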

Interval

A collection interval can be specified using -i followed by a value, optionally followed (without spaces) by a comma and another value. Both values can be floating-point; for example, a half-second interval can be specified using -i0.5. If the optional second value is given, it is a separate interval that applies only to the process subsystem. The process-interval must be a multiple of the regular interval. Collecting process information is more expensive than everything else, and furthermore is not generally needed at the same frequency as, for example, disk I/O. Also, process data is the single largest space-hog in the binary data-file. Generally, specifying a process-interval of anything greater than 1 will significantly decrease the load the collector places on the system being monitored (< 2 % of a cpu).
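The rule that the process-interval must be a multiple of the regular interval can be checked before starting a run. The following is only a sketch: the function name and the tolerance used for floating-point values are assumptions, and it says nothing about how collect itself validates the values.

```python
import math


def is_valid_process_interval(interval, process_interval, tolerance=1e-9):
    """Return True when process_interval is a whole multiple of interval."""
    if interval <= 0 or process_interval <= 0:
        return False
    ratio = process_interval / interval
    # The ratio must be at least 1 and (within rounding error) an integer.
    return ratio >= 1 - tolerance and math.isclose(
        ratio, round(ratio), abs_tol=tolerance
    )


ok_whole = is_valid_process_interval(1, 5)      # 5 is a multiple of 1
ok_fraction = is_valid_process_interval(0.5, 2)  # 2 is a multiple of 0.5
bad = is_valid_process_interval(2, 5)            # 5 is not a multiple of 2
```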

Specifying what data to collect

To address space problems, the -S -nX options (mnemonics: sort,num) can be used to sort by percent CPU usage and save only X processes, or specific processes can be targeted using -P<list>, where the list is comma-separated without blanks. Also, if there are many (> 100) disks connected to the system being monitored, the -D<list> option can be used to monitor only a particular list of disks. The form for specifying a disk is exactly the same as the way collect displays disk names. That is, for versions 3 and 4 of the operating system, 'rzX', where X is the unit number (examples: rz0, rz22, rz109), and for version 5 (Steel) 'dskX', where X is a sequentially assigned number not related to SCSI IDs. Because of the naming ambiguity under V4 when a disk has a LUN of zero, both -Drz23 and -Drza23 will match both 'rz23' and 'rza23'. There is no ambiguity in Steel. Regular expressions are allowed here (see the grep(1) man page). For example, all raid disks could be selected using -Dre.* (this might need to be single-quoted to keep metacharacters such as "*" from being interpreted by the shell).
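To get a feel for which names a -D list would select, the matching can be approximated outside collect. This sketch makes two assumptions that the text does not confirm: that each pattern must match the whole disk name, and that Python's re syntax is close enough to the grep(1) syntax for simple patterns. The function name select_disks and the disk names are hypothetical, and the special rz/rza handling described above is not modeled.

```python
import re


def select_disks(disk_names, patterns):
    """Keep the disk names that fully match at least one pattern."""
    compiled = [re.compile(pattern) for pattern in patterns]
    return [
        name
        for name in disk_names
        if any(regex.fullmatch(name) for regex in compiled)
    ]


disks = ["rz0", "rz1", "rz8", "re0", "re12"]
raid_only = select_disks(disks, ["re.*"])           # like -Dre.*
listed = select_disks(disks, ["rz0", "rz1", "rz8"])  # like -Drz0,rz1,rz8
```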

Compression

collect can now read and write gnuzip format compressed datafiles, thanks to the authors of the zlib compression library. Compressed output is enabled by default. It can be disabled using the -oz switch. The gnuzip compression format is used, so output files can be decompressed with the gnu tools, and older, uncompressed datafiles can be compressed with gzip, and the resulting files can be read by collect in their compressed form. Compression during collection does not appear to generate any additional CPU load. Because compression uses buffers, it does not write to disk after every sample and therefore makes fewer system calls, so its overall impact appears to be negligible. There is one aspect of compression that deserves mention: because the output is buffered, if collect should be terminated abnormally (due to a system crash, for example) more samples stand to be lost than if compression is not used. This should not, however, be an important consideration for most people. The extension ".gz" will be appended to the output filename, unless it already has this extension.

Specifying a time-range from a playback file

It may be useful to select samples from a sub-period of the time that the collector ran. This can be done using the -C<start>,<end> (mnemonic: chop) option. The format is [+]Year:Month:Day:Hour:Minute:Second. The plus sign ('+') indicates that the time should be interpreted as relative to the beginning of the collection period. For example, if the collection period is from October 21, 1996, 16:44:03 to October 21, 1996, 16:54:55, -C+4:21 means display records beginning 4 minutes and 21 seconds after start of the collection period, or starting at 16:48:24. If there is no plus at the beginning, the time value is taken to be absolute. In this case, fields on the left-hand side can be omitted, in which case the values from the start of the collection period will be used in their places. Using that last collection-period example, it is sufficient to give -C46:00,47:00 which would mean from 16:46:00 to 16:47:00. However, if the collection has run overnight, it is necessary to specify the day as well. For example, if the period were Oct 21 16:44 to Oct 22 9:30, to specify a time-slice from 23:00 to 1:00, -C21:23:00:00,22:1:00:00 would be necessary.
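The rules for relative and absolute -C times can be worked through with a small sketch. It is not collect's own parser: the function name resolve_chop_time is hypothetical, a four-digit year is assumed when the Year field is given, and relative offsets are only handled up to days because the length of a month or year offset is not defined in the text.

```python
from datetime import datetime, timedelta


def resolve_chop_time(spec, start):
    """Resolve one -C style time against the start of the collection period."""
    relative = spec.startswith("+")
    fields = [int(part) for part in spec.lstrip("+").split(":")]
    if not 1 <= len(fields) <= 6:
        raise ValueError("expected 1 to 6 colon-separated fields")
    if relative:
        # Assumption: only Day:Hour:Minute:Second offsets are supported here.
        if len(fields) > 4:
            raise ValueError("relative offsets beyond days are not handled")
        days, hours, minutes, seconds = [0] * (4 - len(fields)) + fields
        return start + timedelta(
            days=days, hours=hours, minutes=minutes, seconds=seconds
        )
    # Absolute: fields missing on the left are taken from the start time.
    defaults = [
        start.year, start.month, start.day,
        start.hour, start.minute, start.second,
    ]
    year, month, day, hour, minute, second = (
        defaults[: 6 - len(fields)] + fields
    )
    return datetime(year, month, day, hour, minute, second)


start = datetime(1996, 10, 21, 16, 44, 3)
begin = resolve_chop_time("+4:21", start)         # 16:48:24, same day
window_start = resolve_chop_time("46:00", start)  # 16:46:00, same day
next_day = resolve_chop_time("22:1:00:00", start)  # Oct 22, 01:00:00
```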

If you want ascii output while collecting to a file, -a will do the trick.

-t (mnemonic: tags) will prefix each real data line with a unique tag. This is intended to make it easier for script-writers to extract data. (Tags are superfluous if the perl script cfilt is used.)

-T (mnemonic: total) will shut off collection for all subsystems except 'disk' and 'tape', and only display a summary value in MB/sec for all disks and tapes in the system. Of course, by using -s, data for subsystems can be collected (or displayed in playback mode) in spite of this.

-R (mnemonic: run) will cause collect to terminate after a specified amount of time.
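The first -R form (<number><letter> pairs, as described under FLAGS) is easy to convert to seconds when planning a run. This is a sketch with a hypothetical function name; it assumes the pairs may appear in any order and rejects anything else, which may be stricter or looser than collect itself.

```python
import re

SECONDS_PER_UNIT = {"w": 604800, "d": 86400, "h": 3600, "m": 60, "s": 1}


def parse_run_duration(text):
    """Convert a string such as '4w2d5h' into a number of seconds."""
    if not re.fullmatch(r"(\d+[wdhms])+", text):
        raise ValueError(f"not a <number><letter> sequence: {text!r}")
    return sum(
        int(number) * SECONDS_PER_UNIT[unit]
        for number, unit in re.findall(r"(\d+)([wdhms])", text)
    )


# 4 weeks, 2 days, and 5 hours expressed in seconds.
total = parse_run_duration("4w2d5h")
```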

Every flag that can reasonably apply to both collection AND playback does. That is, -P<list> during collection will collect data ONLY for the processes specified, whereas during playback it will only display data for the corresponding processes. So, if you're trying to save space in the binary data file, you can limit your collection to specific processes or specific disks, or specific subsystems, but if you want to look at lots of data, different bits at a time, you should collect everything, and then use the flags during playback to see just what you want. Obviously -sh (select Header subsystem) won't be very effective during collection, but you can do it if you want :-).

4. DATA CONVERSION AND FILTERING

collect automatically converts version 7, 8, and 9 datafiles to version 10 when playing back. collect can also be used to 'convert' a version 7, 8 or 9 datafile to version 10. This is done by specifying a playback file using the -p flag and specifying a file to be written using the -f flag. Most of the flags that can be used to collect or extract particular kinds of data also work during conversion, for example: -s and -e to select data only from particular subsystems, -nX and -S to take only X processes and sort them by cpu usage, -D to select disks, -L to select LSM Volumes, and -P, -PC, -PU, -PP to select processes. The timeslice (-C) mechanism does what you would expect. The interval option (-i) has no effect when converting.

5. DATA FIELDS

5.1 Processes

PID The process ID
PPID The Parent PID. Only shown when the -F flag is used.
User The username
%CPU The percent of the cpu(s) the process is currently (more or less) using
RSS Resident Set Size - Physical memory used by process; includes shared memory. When the -F flag is used, this value is in KB, otherwise it is displayed in a compact format using 4 columns. The suffixes 'K', 'M', and 'G' are DECIMAL! That is, 'K' means x1000, 'M' x1000000, and 'G' x1000000000.
VSZ Virtual memory used by process. The format is the same as for RSS.
UsrTim The cpu time in user-mode accumulated by the process in Minutes:Seconds
SysTim The cpu time in kernel-mode accumulated by the process in Minutes:Seconds
Pri The Unix priority of the process. Only shown when the -F (full) flag is used.
IBk Input Block Operations - actual filesystem blocks read or written
OBk Output Block Operations
Maj Major faults - faults that were satisfied by doing I/O (going to disk)
Min Minor faults - faults that were satisfied from cache
Command The name of the running program; arguments are not retrieved.
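Scripts that read the compact RSS and VSZ columns need to apply the decimal multipliers noted above. The sketch below only expands the suffix; the function name is hypothetical, and the exact layout of the 4-column field (for instance whether it can contain a decimal point) is an assumption, as is the unit of the result.

```python
DECIMAL_SUFFIXES = {"K": 1_000, "M": 1_000_000, "G": 1_000_000_000}


def expand_compact_value(text):
    """Expand a compact value such as '12M' using decimal multipliers."""
    text = text.strip()
    if text and text[-1] in DECIMAL_SUFFIXES:
        return float(text[:-1]) * DECIMAL_SUFFIXES[text[-1]]
    return float(text)


rss = expand_compact_value("12M")   # 'M' means x1000000, not x1048576
vsz = expand_compact_value("1.5G")
small = expand_compact_value("640")
```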

5.2 Disks

DSK This is an index into the table that collect spits out - for scripting
NAME The name of the disk: rz<Unit-Number>
B/T/L The Bus/Target/Lun IDs
R/S Reads per second
RKB/S KiloBytes read per second
W/S Writes per second
WKB/S KiloBytes written per second
AVS Average service time; time spent actually servicing the request - no wait time (in milliseconds).
AVW Average wait time; time spent in the wait queue (in milliseconds).
ACTQ The number of requests in the active queue (that is, being serviced by the disk).
WTQ The number of requests in the wait queue (have not yet been submitted to disk).
%BSY Percent Busy - time spent servicing requests in interval divided by interval
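The %BSY definition above is a simple ratio. As a worked sketch (the function name and the sample figures are hypothetical):

```python
def percent_busy(service_seconds, interval_seconds):
    """Time spent servicing requests in an interval, as a percentage."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return 100.0 * service_seconds / interval_seconds


# A disk that spent 2.5 seconds servicing requests in a 10-second interval.
busy = percent_busy(2.5, 10.0)
```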

5.3 Tapes

NUM an index for scripting
NAME the device name, rmt<UNIT>, where UNIT is bus * 8 + target. This does NOT correspond to the device name in /dev, which is arbitrary.
B/T/L The Bus/Target/Lun IDs
R/S Reads per second
RKB/S KiloBytes read per second
W/S Writes per second
WKB/S KiloBytes written per second
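The tape NAME rule above (UNIT is bus * 8 + target) can be reproduced when mapping B/T/L values back to the displayed name. A minimal sketch with a hypothetical function name:

```python
def tape_name(bus, target):
    """Build the displayed tape name from the bus and target IDs."""
    return f"rmt{bus * 8 + target}"


first = tape_name(0, 4)   # bus 0, target 4
second = tape_name(1, 5)  # bus 1, target 5
```

Here first is "rmt4" and second is "rmt13". As noted above, these names do not correspond to the device names in /dev.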

5.4 LSM Volumes

VOL Index for scripting
NAME Name in the form "Diskgroup/Volume" to ensure uniqueness
R/S Reads per second
RKB/S KiloBytes read per second
W/S Writes per second
WKB/S KiloBytes written per second
RAVS Average service time for reads with respect to LSM driver (includes disk driver wait time)
WAVS Average service time for writes with respect to LSM driver (includes disk driver wait time)

5.5 CPU Summary

USER...WAIT CPU states, averaged over all CPUs
INTR Interrupts per second
SYSC System calls per second
CS Context switches per second
RUNQ Number of processes in the run queue
AVG5,30,60 Load average over the last 5, 30, and 60 seconds
FORK Number of forks/Second
VFORK Number of vforks/Second

5.6 Single CPU

CPU Index for scripts
USER percent time (ticks) spent in user-level code. This includes NICE ticks.
SYS percent time (ticks) spent in kernel
IDLE percent time (ticks) spent doing nothing
WAIT Idle ticks while waiting for I/O to happen

5.7 Memory

Free MegaBytes (or Pages -- see -om switch) available
Swap MegaBytes (or Pages) available on swap device(s)
Act MegaBytes (or Pages) "active"
InAc MegaBytes (or Pages) "Inactive" - allocated to a process, but marked as not used in > X seconds
Wire MegaBytes (or Pages) permanently allocated by kernel
UBC Bufcache MegaBytes (or Pages)
PI Pages paged in per second
PO Pages paged out per second
Zer Pages zeroed per second - overwritten with zeroes before handing to a process
Re Pages reactivated - status changed from inactive to active
COW Copies-on-write per second
SW Processes swapped per second
HIT UBC hits per second
PP UBC pages pushed (written to disk) per second
ALL Pages allocated by UBC per second

5.8 Filesystems

FS Index for scripting
Filesystem Name of Filesystem (or Domain#Fileset in the case of AdvFS)
Capacity In MB
Free In MB

5.9 Network

Cnt Index for scripting
Name Name of Network adaptor
Inpck Packets received per second
InErr Input error packets per second
Outpck Packets sent per second
OutErr Output error packets per second
Coll Collisions per second
IKB KiloBytes received per second
OKB KiloBytes sent per second
%BW Percent of theoretical bandwidth being used (ethernet = 10Mbits/sec)

5.10 Message Queues

ID this is the ID according to ipcs
Key the key according to ipcs
OUID the owner UID of the Message Queue
BYTES the number of bytes in use for all messages in this queue
Cnt the number of messages in queue
SPID the PID of the last process to send a message on this queue
RPID the PID of the last process to read a message from this queue
STIME the time (in epoch seconds) of the last send
RTIME the time of the last receive
CTIME the creation time of this queue
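Epoch-second values such as STIME are easier to read once converted. The sketch below assumes RTIME and CTIME use the same epoch-seconds format as STIME, which the list above states only for STIME; the function name is hypothetical and the output is shown in UTC rather than local time.

```python
from datetime import datetime, timezone


def epoch_to_utc(epoch_seconds):
    """Format an epoch-seconds value as a UTC date and time string."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


origin = epoch_to_utc(0)
one_day_later = epoch_to_utc(86400)
```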

5.11 Tty I/O

In number of characters input
Out number of characters output
Can portion of input chars on CANON queue
Raw portion of input chars on RAW queue

6. Examples

collect

display to stdout, show data for all subsystems, interval is 10 seconds. (This is like running 'vmstat 10', 'iostat 10', 'netstat 10', 'volstat -i 10', and doing a 'while true; do ps ax; sleep 10; done' simultaneously.)

collect -sp -S -n10 -p foo.data

playback data-file "foo.data", select only process subsystem, sort by cpu usage, and show top ten.

collect -ef -i1,5 -f foo.data

collect to "foo.data", exclude filesystem data, collect every second, except for process data, which gets collected every 5 seconds.

collect -sh -p foo.data

print info header from binary data-file "foo.data" and exit.

collect -sd -Drz0,rz1,rz8

collect to stdout, select only disk subsystem, and then only disks "rz0", "rz1", and "rz8"

collect -p /tmp/olddata.col -f /tmp/oldconverted.col

convert old datafile /tmp/olddata.col to current datafile version

collect -p data.col -f subset.col -C4:22:00,5:22:00 -sdcml

write a new datafile, selecting only Disk, Cpu, Memory, and LSM subsystems, and only for samples between 4:22:00 and 5:22:00.

7. Bugs/Restrictions

The average service time for raid (SWXCR - /dev/reXX) devices is not available.

collect is not capable of dynamically recognising new devices or hardware. That is, if you run collect, and then shove a new disk onto a bus and start using it, collect will not see the disk until it is run again. The same is true of tape drives and LSM volumes created.

Statistics for ISDN PPP connections are not available.

8. Acknowledgements

Thanks to Jean-loup Gailly (jloup@gzip.org) and Mark Adler (madler@alumni.caltech.edu) for making zlib good and freely available!