CFILT - a filter for collect


1. Synopsis
2. Flags
3. Description
4. Expressions
  4.1 subsystem
  4.2 selection criterion
  4.3 tag expression
5. Examples
6. Bugs


1. Synopsis

cfilt [-p] [-aN] [-f<input-file>] expression[ expression[ ...]]


2. Flags

-p Select only the samples that contain process data. This is useful when a separate process interval was given to collect, such as '-i1,4', but you want to graph process data against some non-process data, such as cpu idle (or anything, really).
-aN Average the values over every N samples.
-f<input-file> Read <input-file> instead of stdin. This can also be a binary collect file, in which case collect is run as a pre-processor.


3. Description

cfilt allows the arbitrary selection of values from the output of collect. It condenses the output of collect into one line per sample, or one line per N samples if the -a flag is used to average N samples. The data in this form can then be graphed using gnuplot or excel (yuck).

cfilt can also be used live, that is, as a filter to collect while it's collecting and writing to standard output. Obviously, this only works if no normalization is being done, as that requires that all samples be seen so that cfilt can determine the highest value, which is then used to normalize.

The first two columns in cfilt's output are always the "epoch-second" and the sample-number. The epoch-second is the internal Unix time format: the number of seconds since the beginning of the "epoch", January 1st, 1970. It is extracted directly from the collect output; at the beginning of each record there is a line similar to the following:

#### RECORD 1 (873230968:160) (Tue Sep 2 22:09:28 1997) ####

In this example, the epoch-second is 873230968. The sample-number is also extracted from this line; in this example it is 1.
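As an illustration of how these two columns can be recovered from a record header, here is a minimal Python sketch. It is not taken from cfilt itself; the names RECORD_RE and parse_record_header are hypothetical, and the pattern assumes that every header has exactly the layout of the example line above.

```python
import re

# Matches a collect record header such as:
#   #### RECORD 1 (873230968:160) (Tue Sep 2 22:09:28 1997) ####
# The number after the colon is not described in this document, so it is
# matched but not interpreted.
RECORD_RE = re.compile(r"^#### RECORD (\d+) \((\d+):\d+\)")

def parse_record_header(line):
    """Return (epoch_seconds, sample_number), or None for any other line."""
    m = RECORD_RE.match(line)
    if m is None:
        return None
    return int(m.group(2)), int(m.group(1))

header = "#### RECORD 1 (873230968:160) (Tue Sep 2 22:09:28 1997) ####"
print(parse_record_header(header))  # (873230968, 1)
```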


4. Expressions

An expression has the following syntax:

<subsystem>[+][:<selection-criterion>]:<tag-expr>[:<tag-expr>...]


4.1 subsystem

The subsystem can be one of: proc, disk, mem, net, cpu, sin, file, tty, and lsm. Only the first 3 characters are significant.

If a plus-sign '+' is appended to the subsystem name, or no selection-criterion is given, then numerical values are summed over all selected lines of the subsystem (all lines, if there is no selection-criterion). If a selection-criterion is given and there is no plus-sign on the end of the subsystem name, then for each value in the selection-criterion, the corresponding value of each <tag-expr> is printed. For example, given the following output from collect:

# DISK Statistics
0	 rz1  0/1/0	5   300	  10  1000   10	   0   70
1	 rz2  0/2/0	7   400	  11  2000   10	   0   80
2	 rz3  0/3/0	9   500	  12  3000   10	   0   90

If cfilt is called with the single expression "disk:r/s", it sums reads/second for all disks, that is, 5+7+9=21. The output of cfilt would be:

<epoch-seconds> <sample#> 21

"disk+:name=rz1,rz2:r/s" would sum reads/second for disks rz1 and rz2, 5+7=12. ("name=rz1,rz2" is a selection-criterion, which is discussed below.) The output of cfilt would be:

<epoch-seconds> <sample#> 12

"disk+:name=rz1,rz2:rkb/s+wkb/s" would sum kilobytes read and written for disks rz1 and rz2, 300+400+1000+2000=3700 ("rkb/s+wkb/s" is a tag-expr, which is discussed below). The output of cfilt would be:

<epoch-seconds> <sample#> 3700

"disk:name=rz1,rz2:r/s" would print reads/second for rz1 and reads/second for rz2, as follows:

<epoch-seconds> <sample#> 5 7
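
The difference between summing and per-value selection can be sketched in Python as follows. This is only an illustration of the behaviour described above, not cfilt's implementation: the select helper and its parameters are hypothetical, the rows hold just the columns used in the examples, and only plain tags (no arithmetic) are handled.

```python
# Rows from the DISK example above, keyed by tag.
rows = [
    {"name": "rz1", "r/s": 5, "rkb/s": 300, "wkb/s": 1000},
    {"name": "rz2", "r/s": 7, "rkb/s": 400, "wkb/s": 2000},
    {"name": "rz3", "r/s": 9, "rkb/s": 500, "wkb/s": 3000},
]

def select(rows, tags, field=None, wanted=None, summed=False):
    """Hypothetical helper: return the output columns for one expression."""
    if field is None:
        # No selection-criterion: every line is taken and values are summed.
        chosen, summed = rows, True
    else:
        # Keep the order in which the values were listed.
        chosen = [r for w in wanted for r in rows if r[field] == w]
    if summed:
        # One column per tag, summed over the chosen lines.
        return [sum(r[t] for r in chosen) for t in tags]
    # One column per (selected value, tag) pair.
    return [r[t] for r in chosen for t in tags]

print(select(rows, ["r/s"]))                                      # [21]
print(select(rows, ["r/s"], "name", ["rz1", "rz2"], summed=True)) # [12]
print(select(rows, ["r/s"], "name", ["rz1", "rz2"]))              # [5, 7]
```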

4.2 selection-criterion (optional)

A selection-criterion consists of a field tag (see "tag-expr") on the left of an equals-sign and, on the right, a comma-separated list of the values of that field that should be selected.


Examples: "pid=1234,1235,8888", "command=init", "name=rz0,rz1"

4.3 tag-expr

Tags are the column labels used by collect; for example, in the disk subsystem, the tags are 'dsk', 'name', 'b/t/l', 'r/s', and so on. A tag-expr can be anything from a complicated arithmetic expression to simply the name of a collect output field, such as "rss".

Arithmetic expressions:

  tag1+tag2         add tag1 and tag2
  tag1-tag2         subtract tag2 from tag1
  tag1*tag2         multiply tag1 by tag2
  tag1~tag2         divide tag1 by tag2
  log(tag1)         functions
  (100-tag1)~tag2   constants and grouping
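
To make the notation concrete, the following Python sketch evaluates such expressions against one line of collect values. It is a hypothetical re-implementation, not cfilt's own evaluator: the evaluate function, the FUNCS table, and the tokenizing rule are assumptions, as is the conventional precedence ('*' and '~' bind tighter than '+' and '-'). Because tags such as r/s contain a slash, the sketch treats '/' as part of a tag name.

```python
import math
import re

# A token is either one operator character or a run of anything else
# (a number, a function name, or a tag such as "rkb/s").
TOKEN_RE = re.compile(r"[-+*~(),]|[^-+*~(),\s]+")

# A few of the documented functions, mapped to Python equivalents.
FUNCS = {"log": math.log, "sqrt": math.sqrt, "abs": abs,
         "int": int, "atan2": math.atan2}

def evaluate(expr, values):
    """Evaluate a tag-expr; `values` maps tags to numbers for one line."""
    tokens = TOKEN_RE.findall(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def factor():
        tok = take()
        if tok == "(":                      # grouping
            val = sums()
            take()                          # consume ")"
            return val
        if tok in FUNCS and peek() == "(":  # function call
            take()
            args = [sums()]
            while peek() == ",":
                take()
                args.append(sums())
            take()                          # consume ")"
            return FUNCS[tok](*args)
        if tok in values:                   # tag
            return values[tok]
        return float(tok)                   # constant

    def products():
        val = factor()
        while peek() in ("*", "~"):
            if take() == "*":
                val *= factor()
            else:
                val /= factor()             # "~" means divide
        return val

    def sums():
        val = products()
        while peek() in ("+", "-"):
            if take() == "+":
                val += products()
            else:
                val -= products()
        return val

    return sums()

line = {"r/s": 5, "rkb/s": 300, "wkb/s": 1000}
print(evaluate("rkb/s+wkb/s", line))    # 1300
print(evaluate("(100-r/s)~r/s", line))  # 19.0
```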

If a percent sign '%' is appended to the <tag-expr>, all values are normalized to 100, that is, scaled so that the highest value becomes 100. If an integer follows the '%', it is used instead of 100. This is useful for graphing several results on the same plot.
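
A minimal Python sketch of this normalization, assuming it is a simple scaling by the highest value seen in the whole input (the function name normalize and the handling of an all-zero column are assumptions, and cfilt's rounding, if any, is not modelled):

```python
def normalize(values, target=100):
    """Scale values so that the highest one maps to `target`."""
    peak = max(values)
    if peak == 0:
        # Assumption: an all-zero column stays all zero.
        return [0.0 for _ in values]
    return [v * target / peak for v in values]

print(normalize([10, 25, 50]))      # [20.0, 50.0, 100.0]
print(normalize([10, 25, 50], 10))  # [2.0, 5.0, 10.0]
```

Because the highest value must be known first, a scaling of this kind needs the complete input before any line can be printed.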

The available functions are: cos, sin, tan, sqrt, log, exp, abs, atan2, int, and convtime, which converts Minutes:Seconds.TenthsHundredths to Seconds.TenthsHundredths.
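
The conversion done by convtime can be sketched in Python as below. This is a hypothetical re-implementation based only on the format named above; how cfilt treats other formats (for example, values with an hours field) is not covered.

```python
def convtime(text):
    """Convert 'Minutes:Seconds.TenthsHundredths' to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)

print(convtime("1:30.50"))  # 90.5
```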

If a subsystem has multiple lines per sample, values are added for all lines that match in one record. (If there is no selection-criterion, all lines are taken.)
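
Putting the three parts of an expression together, the following Python sketch splits an expression into subsystem, selection-criterion, and tag-exprs. It is an assumption-laden illustration, not cfilt's parser: split_expression is a hypothetical name, and the sketch assumes that ':' never appears inside a part and that only a selection-criterion contains '='.

```python
def split_expression(expr):
    """Split one cfilt expression into its documented parts."""
    parts = expr.split(":")
    name = parts[0]
    summed = name.endswith("+")
    subsystem = name.rstrip("+")[:3]   # first 3 characters are significant
    criterion = None
    tag_exprs = parts[1:]
    if tag_exprs and "=" in tag_exprs[0]:
        field, _, wanted = tag_exprs[0].partition("=")
        criterion = (field, wanted.split(","))
        tag_exprs = tag_exprs[1:]
    else:
        summed = True                  # no selection-criterion: always summed
    return subsystem, summed, criterion, tag_exprs

print(split_expression("disk+:name=rz1,rz2:rkb/s+wkb/s"))
# ('dis', True, ('name', ['rz1', 'rz2']), ['rkb/s+wkb/s'])
print(split_expression("pro:pid=1234,8888:rss:vsz"))
# ('pro', False, ('pid', ['1234', '8888']), ['rss', 'vsz'])
```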


5. Examples

cfilt cpu:user+nice:intr%:sysc%:cs%

OUTPUT: <seconds> <sample#> <user+nice> <interrupts> <syscalls> <conswitch> (where interrupts,syscalls,conswitch are normalized)

cfilt proc+:user=urban:rss

OUTPUT: <seconds> <sample#> <RSS (resident set size) for all processes owned by urban>

cfilt cpu:idle net:inpck+outpck% mem:free%

OUTPUT: <seconds> <sample#> <cpu:idle> <net:inpck+outpck(normalized)> <mem:free(normalized)>

cfilt pro:pid=1234,8888:rss:vsz

OUTPUT: <seconds> <sample#> <rss(pid=1234)> <vsz(pid=1234)> <rss(pid=8888)> <vsz(pid=8888)>

cfilt pro+:pid=1234,8888:rss:vsz

OUTPUT: <seconds> <sample#> <rss(sum for pid 1234,8888)> <vsz(sum)>

cfilt pro+:rss:vsz

OUTPUT: <seconds> <sample#> <rss(sum for all procs)> <vsz(sum for all procs)>


6. Bugs

When normalizing and averaging are used together, the highest normalized value will not necessarily be as high as the value used for normalizing.
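
A small Python illustration of this effect, assuming that values are scaled against the highest raw value and then averaged in groups (that order is an inference from the behaviour described, and both helper functions are hypothetical):

```python
def normalize(values, target=100):
    """Scale values so that the highest raw value maps to `target`."""
    peak = max(values)
    return [v * target / peak for v in values]

def average(values, n):
    """Average consecutive groups of n values (a short last group too)."""
    groups = [values[i:i + n] for i in range(0, len(values), n)]
    return [sum(g) / len(g) for g in groups]

raw = [10, 100, 20, 30]
print(average(normalize(raw), 2))  # [55.0, 25.0]
```

The peak of 100 is averaged with its neighbour, so the highest value that reaches the output is 55 rather than 100.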

When summing is enabled and a non-trivial expression is given, the expression is evaluated for each line in the sample, and the results are summed. This is probably not what you'd expect. To get approximately what you'd expect, divide by the number of lines per sample.
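
The arithmetic behind this can be seen with the r/s values from the DISK example and the expression "100-r/s". The Python below is only a worked illustration; the expr function stands in for the tag-expr.

```python
reads = [5, 7, 9]  # r/s for rz1, rz2, rz3 in the DISK example

def expr(r):
    return 100 - r  # stands in for the tag-expr "100-r/s"

per_line_sum = sum(expr(r) for r in reads)
print(per_line_sum)               # 279: expression per line, then summed
print(expr(sum(reads)))           # 79: expression applied to the summed r/s
print(per_line_sum / len(reads))  # 93.0: after dividing by lines per sample
```

Dividing by the number of lines gives the expression applied to the mean r/s (100-7), which is why the result is only approximately what one might expect.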