SCRUB-tcpdump

Documentation

Getting the Source
Using the tool

Getting the Source

To check out the source for the current version of the standalone program, you need an SVN client for your system. The command-line client is sufficient on Linux, Unix, and Mac OS X; Windows users who prefer not to use a command-line client often use TortoiseSVN. With the command-line client, issue the command:

svn co https://scrub-tcpdump.svn.sourceforge.net/svnroot/scrub-tcpdump/trunk/scrubtcpdump scrub-tcpdump

This pulls the latest version of the source from the repository. If you are using TortoiseSVN instead, check out the same URL into a new directory.

Once the source has been pulled, you need a C compiler (the tool currently uses only C, not C++, to stay compatible with tcpdump). It has only been tested with variants of gcc and make: it has been compiled on Mac OS X 10.4.10 with the native Xcode compiler from a bash prompt, on Linux, and on Windows under the Cygwin environment.

In all cases you must first build and install libpcap, which is available from the tcpdump website (if you prefer pre-built packages, install your system's libpcap development package). Then change into the directory containing the scrub-tcpdump source and run 'make'. This should produce a binary called 'scrub-tcpdump', which you can use to anonymize packet captures in libpcap format.

Using the tool

To invoke the tool, one simply has to issue the following command:

scrub-tcpdump -r <input> -w <output> [options]
The following two options are required:

-r filename: file to be read, in pcap format

-w filename: file to be written, in pcap format

The remaining options are optional:

-i device: pcap-style name of a device from which to capture packets

-o "anonymization string": string of anonymization options, as explained below

-k permutation_key: permutation key, needed only when using keyed randomization

-f BPF_filter_string: BPF filter string passed to libpcap
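The option list above maps naturally onto POSIX getopt(). The sketch below is a hypothetical illustration, not scrub-tcpdump's own parser: the struct scrub_opts type and the parse_scrub_args() function are invented names. It accepts the six flags listed above and rejects a command line that is missing -r or -w.

```c
#define _POSIX_C_SOURCE 200809L  /* expose getopt() in strict C modes */
#include <string.h>
#include <unistd.h>              /* getopt, optarg, optind, opterr */

/* Hypothetical holder for the options described above. */
struct scrub_opts {
    const char *input;   /* -r: pcap file to read (required) */
    const char *output;  /* -w: pcap file to write (required) */
    const char *device;  /* -i: capture device */
    const char *anon;    /* -o: anonymization options string */
    const char *key;     /* -k: permutation key for keyed randomization */
    const char *filter;  /* -f: BPF filter string for libpcap */
};

/* Returns 0 on success; -1 on an unknown flag, a flag missing its
 * argument, or a missing required option (-r or -w). */
int parse_scrub_args(int argc, char *argv[], struct scrub_opts *o)
{
    int c;
    memset(o, 0, sizeof *o);
    optind = 1;  /* restart scanning so the function can be called again */
    opterr = 0;  /* report problems via the return value, not stderr */
    while ((c = getopt(argc, argv, "r:w:i:o:k:f:")) != -1) {
        switch (c) {
        case 'r': o->input  = optarg; break;
        case 'w': o->output = optarg; break;
        case 'i': o->device = optarg; break;
        case 'o': o->anon   = optarg; break;
        case 'k': o->key    = optarg; break;
        case 'f': o->filter = optarg; break;
        default:  return -1;  /* '?': unknown flag or missing argument */
        }
    }
    return (o->input && o->output) ? 0 : -1;
}
```

Each value is a pointer into argv, so a quoted -o string such as "srcip bm" arrives as a single argument.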

The input file must be in tcpdump/libpcap format, in a version compatible with the libpcap against which you built the tool. The output file is also written in libpcap format. WARNING: The tool silently overwrites any existing file with the same name as the output file you specify.
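A quick way to check that an input file is a classic libpcap capture is to inspect its first four bytes: the 24-byte pcap global header begins with the magic number 0xa1b2c3d4 (or 0xa1b23c4d for nanosecond-resolution files), written in the byte order of the machine that made the capture. The C sketch below is not part of scrub-tcpdump, and the function and enum names are hypothetical. It does not examine the version fields, so passing it does not guarantee that your particular libpcap build can read the file; pcapng files, a different format, are rejected.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

enum pcap_kind { PCAP_NOT_PCAP = 0, PCAP_MICROSEC, PCAP_NANOSEC };

static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Classifies a buffer holding the start of a capture file. */
enum pcap_kind pcap_kind_of(const unsigned char *hdr, size_t len)
{
    if (len < 24)                     /* shorter than the global header */
        return PCAP_NOT_PCAP;
    uint32_t be = read_be32(hdr);
    /* The same four bytes read in the opposite (little-endian) order. */
    uint32_t le = ((be & 0xffu) << 24) | ((be & 0xff00u) << 8) |
                  ((be >> 8) & 0xff00u) | (be >> 24);
    if (be == 0xa1b2c3d4u || le == 0xa1b2c3d4u) return PCAP_MICROSEC;
    if (be == 0xa1b23c4du || le == 0xa1b23c4du) return PCAP_NANOSEC;
    return PCAP_NOT_PCAP;             /* e.g. pcapng starts 0x0a0d0d0a */
}

/* Reads the first 24 bytes of a file and classifies them. */
enum pcap_kind pcap_kind_of_file(const char *path)
{
    unsigned char hdr[24];
    FILE *f = fopen(path, "rb");
    if (!f)
        return PCAP_NOT_PCAP;
    size_t n = fread(hdr, 1, sizeof hdr, f);
    fclose(f);
    return pcap_kind_of(hdr, n);
}
```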

The anonymization options string is composed of any number of options, in any order, selected from the tables below. It consists of <field function> pairs, where the field is one of the designators from Table 2 and the function used to anonymize that field is one of the specifiers from Table 3. Fields and functions must be paired according to the accepted pairings in Table 1. Table 3 also gives a brief description of what each function does. The string therefore always contains an even number of tokens, alternating between the field to be anonymized and the function to use on it. For example, to anonymize the source IP address with black marker and all the ports with bilateral classification, the anonymization options string should look like:

"srcip bm tcpsrcport bi tcpdstport bi udpsrcport bi udpdstport bi"

and so on. Field designators must be taken from the right-hand column of Table 2, and function names from Table 3. It is up to the user to ensure that each combination of field and function makes sense (e.g. not applying bilateral classification to an IP address or prefix-preserving pseudonymization to a timestamp). The supported pairings are listed in Table 1; unsupported pairings are silently accepted by the system, with undefined and probably harmful effects.
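The pairing rule can be checked mechanically. The following sketch uses hypothetical function names and is stricter than the tool itself, which accepts malformed strings silently. It splits the options string on whitespace, requires an even number of tokens, and requires every token in a field position to be one of the Table 2 designators. It does not validate function names or the field/function pairings of Table 1.

```c
#include <stddef.h>
#include <string.h>

/* Field designators, copied from Table 2. */
static const char *const FIELDS[] = {
    "tcpsrcport", "udpsrcport", "tcpdstport", "udpdstport", "tcpflags",
    "tcpwindow", "payload", "srcip", "dstip", "ttl", "pktlen",
    "transportprotocol", "fragflags", "sequence", "timestamp", "iplen",
    "pktcaplen",
};

static int is_field(const char *tok)
{
    for (size_t i = 0; i < sizeof FIELDS / sizeof FIELDS[0]; i++)
        if (strcmp(tok, FIELDS[i]) == 0)
            return 1;
    return 0;
}

/* Returns the number of <field function> pairs in opts, or -1 if the
 * string has an odd number of tokens or a field position holds something
 * that is not a Table 2 designator. */
int count_option_pairs(const char *opts)
{
    char buf[1024];
    if (strlen(opts) >= sizeof buf)
        return -1;
    strcpy(buf, opts);               /* strtok modifies its input */

    int tokens = 0;
    for (char *tok = strtok(buf, " \t"); tok; tok = strtok(NULL, " \t")) {
        if (tokens % 2 == 0 && !is_field(tok))
            return -1;               /* even positions must be fields */
        tokens++;
    }
    return (tokens % 2 == 0) ? tokens / 2 : -1;
}
```

Applied to the example string above, count_option_pairs() reports five pairs.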

The tables below list all of the fields that scrub-tcpdump anonymizes, the methods that can be used on each field, and the strings used to specify fields and methods in the anonymization options string on the command line.

Table 1: Packet Fields and Corresponding SCRUB-tcpdump Anonymization Options
Field	Anonymizing Options
Fragmentation Flag (3-bit Network-layer field)	Black marker, Pure random permutation, Keyed random permutation
IP address (32-bit Network-layer field; source and destination may be specified independently)	Black marker, Pure random permutation, Keyed random permutation, Prefix-preserving pseudonymization, Truncation
Payload (variable-length Transport-layer field, 0 or more bits)	Black marker, Selective black marker
Port (16-bit Transport-layer field; TCP and UDP source and destination ports are each selected independently)	Black marker, Pure random permutation, Keyed random permutation, Bilateral classification
Sequence Number (32-bit Transport-layer field)	Black marker, Pure random permutation, Keyed random permutation, Grouping
TCP Flags (8-bit Transport-layer field; all flags, or each flag individually)	Black marker, Grouping, Pure random permutation, Keyed random permutation
Time Stamp (pcap field)	Black marker, Time unit annihilation, Truncation, Enumeration, Random time shift, Pure random permutation, Keyed random permutation
Time-to-Live (8-bit Network-layer field)	Black marker, Pure random permutation, Keyed random permutation, Grouping
Total packet length (16-bit field at both the network layer and the pcap layer)	Black marker, Pure random permutation, Keyed random permutation, Grouping
Transport Protocol Number Field (8-bit Network-layer field)	Bilateral classification (sets TCP/UDP/ICMP to 253, all others to 254), Black marker (sets all instances to 254, because 0 is an illegal value), Keyed random permutation, Pure random permutation
Window Size (16-bit Transport-layer field)	Black marker, Pure random permutation, Keyed random permutation, Bilateral classification
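The Transport Protocol Number Field row of Table 1 fully specifies its two deterministic options, so they can be written directly in C. This sketch uses hypothetical function names; the protocol numbers 1, 6, and 17 are the standard IANA values for ICMP, TCP, and UDP.

```c
#include <stdint.h>

/* IANA-assigned IP protocol numbers. */
enum { PROTO_ICMP = 1, PROTO_TCP = 6, PROTO_UDP = 17 };

/* Bilateral classification per Table 1: TCP, UDP and ICMP become 253,
 * every other protocol becomes 254. */
uint8_t proto_bilateral(uint8_t proto)
{
    return (proto == PROTO_TCP || proto == PROTO_UDP || proto == PROTO_ICMP)
               ? 253 : 254;
}

/* Black marker per Table 1: every value becomes 254 rather than 0. */
uint8_t proto_black_marker(uint8_t proto)
{
    (void)proto;  /* the original value is discarded entirely */
    return 254;
}
```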

The following table lists each field and its designator for use in the scrub-tcpdump anonymization options string:

Table 2: Packet Field Entities and Corresponding SCRUB-tcpdump Parameter Strings
Packet Field Entities	SCRUB-tcpdump Parameter String Specifier
TCP Source port	tcpsrcport
UDP Source port	udpsrcport
TCP destination port	tcpdstport
UDP destination port	udpdstport
TCP Flags	tcpflags
Window Size	tcpwindow
Payload	payload
Source IP address	srcip
Destination IP address	dstip
Time To Live	ttl
Total packet length (IP field)	pktlen
Transport Protocol Number Field	transportprotocol
Fragmentation Flag	fragflags
Sequence Number	sequence
Timestamp	timestamp
IP Length	iplen
Total packet length (pcap field)	pktcaplen

The following table lists the anonymization methods, the specifier used for each in the anonymization options string, and a brief description of what each method does.

Table 3: SCRUB-tcpdump Anonymization Options with Parameters and Descriptions
Anonymization Options	Parameter String Specifier	Description
Bilateral classification	bi	When applied to TCP/UDP port fields, this classifies all low ports as port 1 and all high ports as port 65535. It preserves one piece of information from the original input: whether the packet was bound for a low port (below 1024, and therefore one of the 'well-known' ports) or a high port (above 1024, and thus an 'unspecified' port).
Black marker	bm	This sets the specified field to all 0's (or 255 in some cases) and effectively results in a complete loss of the data that is in that field.
Enumeration	en	This destroys all information about the spacing between values in the field but preserves their original order: each successive packet is assigned the next available value for that field. For example, when applied to the timestamp field, it keeps the packets in their original timestamp order but renumbers them starting from 0, so the first packet is sent at 0 seconds, the second at 1 second, the third at 2 seconds, and so on.
Grouping	grN	Grouping partitions all possible values of a field into a number of mutually exclusive, exhaustive ranges, then selects one member of each range to stand in for every occurrence of any value in that range. For instance, if a field can take all values from 1 to 100, one possible partitioning is [1, 5], [6, 15], [16, 50], [51, 100] with representative members {1, 6, 16, 51}; every value in [1, 5] is then replaced by 1, every value in [6, 15] by 6, and so on. The set of ranges for each field is listed here, with the representative values following the set of ranges in curly braces: Sequence Number: [0, 2¹⁰], [2¹⁰+1, 2²⁰], [2²⁰+1, 2³⁰], [2³⁰+1, 2³²−1]; TCP Flags: [RST/SYN/FIN]=0, [URG/ACK/PSH]=0
Keyed randomization	rp	This takes a user-specified key (supplied with -k) and uses it as the basis for a random permutation of the field. The permutation is reproducible: specifying the same key again yields the same mapping.
Prefix-preserving pseudonymization	pp	This method separately anonymizes the network portion and the host portion of an IP address using Ramaswamy's TSA algorithm. All hosts in the same subnet in the source file are therefore mapped to a common subnet in the output, but it will be a different subnet from the original.
Pure randomization	ra	This maps each input value to a random output value. The mapping is consistent within a file: a value X that is randomly mapped to Y at its first occurrence is mapped to Y for the remainder of the file. This is identical to keyed randomization above, except that the mapping cannot be reproduced by supplying a key.
Random uniform window	rwN,M	This replaces each value of the specified field with a random value drawn uniformly from the window [N:M].
Regex catchall	re	This will eliminate all hostnames and URLs from the payload of any TCP and UDP packets.
Truncation	trN	N is a number giving how many bits to truncate (set to 0) in every value of this field.
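Several of the functions in Table 3 are deterministic and can be sketched as small C functions. These are illustrations under stated assumptions, not scrub-tcpdump's code: all function names are hypothetical; port 1024 is treated as a high port; truncation is assumed to zero the N least significant bits; each sequence-number range is assumed to be represented by its lowest member, as in the 1 to 100 grouping example (the table does not list the actual representatives); and enumeration ranks timestamps held as integer microseconds.

```c
#include <stddef.h>
#include <stdint.h>

/* bi on a TCP/UDP port: low ports become 1, high ports become 65535.
 * Assumption: 1024 itself counts as a high port. */
uint16_t port_bilateral(uint16_t port)
{
    return port < 1024 ? 1 : 65535;
}

/* trN: set the N least significant bits to 0 (assumed bit order). */
uint32_t truncate_bits(uint32_t value, unsigned n)
{
    if (n >= 32)
        return 0;                   /* avoid an undefined 32-bit shift */
    return value & ~((UINT32_C(1) << n) - 1);
}

/* grN on sequence numbers, using the four ranges listed in Table 3.
 * Assumption: each range is represented by its lowest member. */
uint32_t group_sequence(uint32_t seq)
{
    if (seq <= (UINT32_C(1) << 10)) return 0;
    if (seq <= (UINT32_C(1) << 20)) return (UINT32_C(1) << 10) + 1;
    if (seq <= (UINT32_C(1) << 30)) return (UINT32_C(1) << 20) + 1;
    return (UINT32_C(1) << 30) + 1;
}

/* en on timestamps: replace each timestamp with its rank in time order,
 * so the earliest packet is placed at 0 seconds, the next at 1 second,
 * and so on. Ties are broken by position in the file. O(n^2). */
void enumerate_timestamps(const uint64_t *ts, uint32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t rank = 0;
        for (size_t j = 0; j < n; j++)
            if (ts[j] < ts[i] || (ts[j] == ts[i] && j < i))
                rank++;
        out[i] = rank;
    }
}
```

Under the low-order-bits assumption, tr8 applied to 192.168.37.201 (0xC0A825C9) yields 192.168.37.0.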

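Keyed randomization, pure randomization, and prefix-preserving pseudonymization all need a pseudorandom function. To run with only the C standard library, the sketch below substitutes simple non-cryptographic primitives (FNV-1a to hash the -k key, splitmix64 as the generator); a real anonymizer should use a keyed cryptographic function instead. keyed_permutation() and pure_permutation() build a consistent one-to-one mapping for a 16-bit field with a Fisher-Yates shuffle. pp_anonymize() shows prefix preservation using the bit-by-bit construction of Crypto-PAn, not the TSA algorithm named in Table 3: each output bit is the input bit XORed with a pseudorandom bit that depends only on the key and the preceding input bits, so two addresses that share exactly a k-bit prefix still share exactly a k-bit prefix afterwards. All function names are hypothetical.

```c
#include <stdint.h>
#include <time.h>

/* splitmix64 generator step; NOT cryptographically secure. */
static uint64_t splitmix64(uint64_t *state)
{
    uint64_t z = (*state += UINT64_C(0x9E3779B97F4A7C15));
    z = (z ^ (z >> 30)) * UINT64_C(0xBF58476D1CE4E5B9);
    z = (z ^ (z >> 27)) * UINT64_C(0x94D049BB133111EB);
    return z ^ (z >> 31);
}

/* 64-bit FNV-1a hash, used to turn the key string into a seed. */
static uint64_t fnv1a64(const char *s)
{
    uint64_t h = UINT64_C(0xcbf29ce484222325);
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= UINT64_C(0x100000001b3);
    }
    return h;
}

/* Fisher-Yates shuffle of 0..65535: perm[x] is the replacement for x, so
 * every occurrence of a value maps the same way and no two values collide. */
static void build_permutation(uint16_t perm[65536], uint64_t seed)
{
    uint64_t state = seed;
    for (uint32_t i = 0; i < 65536; i++)
        perm[i] = (uint16_t)i;
    for (uint32_t i = 65535; i > 0; i--) {
        uint32_t j = (uint32_t)(splitmix64(&state) % (i + 1));
        uint16_t tmp = perm[i];
        perm[i] = perm[j];
        perm[j] = tmp;
    }
}

/* rp: the same key always produces the same table. */
void keyed_permutation(uint16_t perm[65536], const char *key)
{
    build_permutation(perm, fnv1a64(key));
}

/* ra: the seed is not recorded, so the table cannot be rebuilt later.
 * time() is a weak seed, used only to keep the sketch portable. */
void pure_permutation(uint16_t perm[65536])
{
    build_permutation(perm, (uint64_t)time(NULL));
}

/* Crypto-PAn-style prefix-preserving mapping of an IPv4 address. */
uint32_t pp_anonymize(uint32_t addr, const char *key)
{
    uint64_t seed = fnv1a64(key);
    uint32_t out = 0;
    for (int i = 0; i < 32; i++) {
        /* The i most significant bits of the original address. */
        uint32_t prefix = (i == 0) ? 0 : addr >> (32 - i);
        uint64_t state = seed ^ ((uint64_t)i << 32) ^ prefix;
        uint32_t flip = (uint32_t)(splitmix64(&state) & 1);
        uint32_t bit = (addr >> (31 - i)) & 1;
        out = (out << 1) | (bit ^ flip);
    }
    return out;
}
```

For example, 192.168.1.1 and 192.168.1.254 share their first 24 bits, so under the same key their pseudonyms also share exactly their first 24 bits.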