Keep your programs from performing tasks they weren’t meant to do.
NetBSD and OpenBSD include systrace, a
system call access manager. With systrace, a
system administrator can specify which programs can make which system
calls, and how those calls can be made. Proper use of
systrace can greatly reduce the risks inherent
in running poorly written or exploitable programs.
Systrace policies can confine users in a manner completely
independent of Unix permissions. You can even define the errors that
the system calls return when access is denied, to allow programs to
fail in a more proper manner. Proper use of
systrace requires a practical understanding of system calls and
what functionality programs must have to work properly.
First of all, what exactly are system calls? A system call is a function that lets you talk to the operating-system kernel. If you want to allocate memory, open a TCP/IP port, or perform input/output on the disk, you’ll need to use a system call. System calls are documented in section 2 of the manpages.
Unix also supports a wide variety of C library calls. These are often confused with system calls but are actually just standardized routines for things that could be written within a program. For example, you could easily write a function to compute square roots within a program, but you could not write a function to allocate memory without using a system call. If you’re in doubt whether a particular function is a system call or a C library function, check the online manual.
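To make the distinction concrete, here is a minimal Python sketch. The function names newton_sqrt and write_via_syscall are made up for this illustration. The first does its work entirely inside the program, while the second cannot do anything without asking the kernel, because os.write() is a thin wrapper around the write(2) system call.

```python
import os


def newton_sqrt(x, iterations=30):
    """Square root computed entirely in user code; no kernel help needed."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if x == 0:
        return 0.0
    guess = x if x >= 1 else 1.0
    for _ in range(iterations):
        # Newton's method: average the guess with x / guess.
        guess = (guess + x / guess) / 2.0
    return guess


def write_via_syscall(fd, text):
    """Output has to go through the kernel: os.write() wraps write(2)."""
    return os.write(fd, text.encode("ascii"))
```

A policy that denied write access would stop the second function, but it would have no effect on the first.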
You may find an occasional system call that is not documented in the
online manual, such as break().
You’ll need to dig into other resources to identify
these calls. (break() in particular is a very old
system call used within
libc, but not by
programmers, so it seems to have escaped being documented in the
manpages.)
Systrace denies all actions that are not
explicitly permitted and logs the rejection using
syslog. If a program running under
systrace has a problem, you can find out which
system call the program wants to use and decide if you want to add it
to your policy, reconfigure the program, or live with the error.
Systrace has several important pieces:
the policy generation tools, the runtime access management tool, and
the sysadmin real-time interface. This hack gives a brief overview of
policies; the tools themselves are covered separately.
The systrace(1) manpage includes a full
description of the syntax used for policy descriptions, but I
generally find it easier to look at some examples of a working policy
and then go over the syntax in detail. Since named
has been a subject of recent
security discussions, let’s look at the policy that
OpenBSD 3.2 provides for named.
Before reviewing the named policy,
let’s review some commonly known facts about the
name server daemon’s system-access requirements.
Zone transfers and large queries occur on port 53/TCP, while basic
lookup services are provided on port 53/UDP. OpenBSD chroots named into
/var/named by default and logs everything through syslog.
Each systrace policy is stored in a file named
after the full path of the program, replacing slashes with
underscores. The policy file usr_sbin_named
contains quite a few entries that allow access beyond binding to port
53 and writing to the system log. The file starts with:
# Policy for named that uses named user and chroots to /var/named
# This policy works for the default configuration of named.
Policy: /usr/sbin/named, Emulation: native
The Policy statement gives the full path to the
program this policy is for. You can’t fool
systrace by giving the same name to a program
elsewhere on the system. The
Emulation entry shows
which ABI this policy is for. Remember, BSD systems expose ABIs for a
variety of operating systems. Systrace can
theoretically manage system-call access for any ABI, although only
native and Linux binaries are supported at the moment.
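The header line and the file-naming convention can both be sketched in a few lines of Python. This is only an illustration, not code taken from systrace: parse_policy_header and policy_filename are hypothetical helpers, and dropping the leading slash before substituting underscores is an assumption made to arrive at a name like usr_sbin_named.

```python
def parse_policy_header(line):
    """Split 'Policy: <path>, Emulation: <abi>' into (path, abi)."""
    fields = {}
    for part in line.split(","):
        key, _, value = part.partition(":")
        fields[key.strip()] = value.strip()
    return fields["Policy"], fields["Emulation"]


def policy_filename(program_path):
    """Name a policy file after the program's full path.

    Assumption: the leading slash is dropped and every remaining
    slash becomes an underscore.
    """
    return program_path.lstrip("/").replace("/", "_")
```

Given the header shown earlier, parse_policy_header() returns the pair ("/usr/sbin/named", "native"), and policy_filename("/usr/sbin/named") returns "usr_sbin_named". The parser is deliberately naive and would misread a program path that contains a comma.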
The remaining lines define a variety of system calls that the program may or may not use. The sample policy for named includes 73 lines of system-call rules. The most basic look like this:
native-accept: permit
When /usr/sbin/named tries to use the
accept() system call to accept a connection on a
socket, under the native ABI, it is allowed. Other rules are far more
restrictive. Here’s a rule for
bind(), the system call that lets a program request a TCP/IP
port to attach to:
native-bind: sockaddr match "inet-*:53" then permit
sockaddr is the name of an argument taken by the
bind() system call. The
match keyword tells systrace
to compare the given variable with the string
inet-*:53, according to the standard shell
pattern-matching (globbing) rules. So, if the variable
sockaddr matches the string
inet-*:53, the call is permitted. This
program can bind to port 53, over both TCP and UDP protocols. If an
attacker had an exploit to make named attach a
command prompt on a high-numbered port, this
policy would prevent that exploit from working.
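If you want a feel for how a glob pattern such as inet-*:53 behaves, Python’s fnmatch module follows similar shell-style rules. The sketch below is an approximation of the idea rather than systrace’s own matcher. The bind_allowed helper is hypothetical, and the address strings used to exercise it (for example inet-[0.0.0.0]:53) are assumed formats chosen for illustration.

```python
from fnmatch import fnmatchcase

# The pattern from the bind rule in the named policy.
PATTERN = "inet-*:53"


def bind_allowed(sockaddr):
    """Return True if the address string matches the glob pattern."""
    return fnmatchcase(sockaddr, PATTERN)
```

With this pattern, bind_allowed("inet-[0.0.0.0]:53") is true, while an address ending in :31337 or :5353 does not match, because the string must end in exactly :53.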
At first glance, this seems wrong:
native-chdir: filename eq "/" then permit
native-chdir: filename eq "/namedb" then permit
The eq keyword compares one string to another and
requires an exact match. If the program tries to go to the root
directory, or to the directory /namedb,
systrace will allow it. Why would you possibly
want to allow named to access the root
directory? The next entry explains why:
native-chroot: filename eq "/var/named" then permit
We can use the native
chroot() system call to
change our root directory to
/var/named, but to
no other directory. At this point, the /namedb
directory is actually
/var/named/namedb. We also
know that named logs to syslog. To do this, it
will need access to /dev/log:
native-connect: sockaddr eq "/dev/log" then permit
This program can use the native connect() system
call to talk to
/dev/log and only
/dev/log. That device hands the connections off
to syslog.
We’ll also see some entries for system calls that do not exist:
native-fsread: filename eq "/" then permit
native-fsread: filename eq "/dev/arandom" then permit
native-fsread: filename eq "/etc/group" then permit
Systrace aliases certain system calls with very similar functions into groups. You can
disable this functionality with a command-line switch and only use
the exact system calls you specify, but in most cases these aliases
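are quite useful and shrink your policies considerably. The two
aliases are fsread and fswrite.
fsread is an alias for a group of calls that read from the filesystem,
under the native and Linux ABIs.
fswrite is an alias for a group of calls that change the filesystem, including
rmdir(), in both the native and Linux ABIs.
Because open() can be used to either read or write a
file, it is aliased by both fsread and
fswrite, depending on how it is called. So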
named can read certain
files, it can list the contents of the root directory, and it can
access the groups file.
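The way open() ends up under one alias or the other can be pictured with a small Python sketch. It rests on a simplifying assumption, not on systrace’s actual code: a read-only open is treated as fsread and anything that asks for write access as fswrite. The alias_for_open helper is hypothetical.

```python
import os


def alias_for_open(flags):
    """Classify an open() call by its access mode.

    Assumption: read-only opens count as fsread; write-only and
    read-write opens count as fswrite.
    """
    access = flags & os.O_ACCMODE
    return "fsread" if access == os.O_RDONLY else "fswrite"
```

Under this assumption, alias_for_open(os.O_RDONLY) gives "fsread", while alias_for_open(os.O_WRONLY | os.O_CREAT) gives "fswrite". The sketch looks only at the access mode and ignores other flags.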
Systrace supports two optional keywords at the
end of a policy statement, errorcode and log. The
errorcode is the error that is returned when the
program attempts to access this system call. Programs will behave
differently depending on the error that they receive.
named will react differently to a
“permission denied” error than it
will to an “out of memory” error.
You can get a complete list of error codes from the
errno manpage. Use the error name, not the error
number. For example, here we return an error for nonexistent files:
filename sub "<non-existent filename>" then deny[enoent]
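To see why the choice of error matters, consider how a program typically branches on errno. The Python sketch below is illustrative only. The helpers errno_from_policy_name and describe_failure are made up, and the reactions they return are placeholders for whatever a real daemon would do.

```python
import errno
import os


def errno_from_policy_name(name):
    """Translate a policy-style error name such as 'enoent' to its number."""
    return getattr(errno, name.upper())


def describe_failure(exc):
    """Show how a program might react differently to different errors."""
    if exc.errno == errno.ENOENT:
        return "missing file: fall back to defaults"
    if exc.errno == errno.EACCES:
        return "permission denied: report and stop"
    return "unexpected error: " + os.strerror(exc.errno)
```

A program written this way carries on quietly when a denied call returns ENOENT, but gives up when the same call returns EACCES, which is why picking the error a policy returns is worth some thought.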
If you put the word
log at the end of your rule,
successful system calls will be logged. For example, if we wanted to
log each time named attached to port 53, we
could edit the policy statement for the bind()
call to read:
native-bind: sockaddr match "inet-*:53" then permit log
You can also filter rules based on user ID and group ID, as this example demonstrates:
native-setgid: gid eq "70" then permit
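Pulling the rule forms from this hack together, the following Python sketch mimics the overall decision process: check each rule for the system call, and deny anything that is not explicitly permitted. It is a toy model under stated assumptions, not systrace’s implementation. The RULES table layout and the decide function are invented for this example, match is approximated with Python’s fnmatch, and sub is assumed to mean a substring test.

```python
from fnmatch import fnmatchcase

# Comparison keywords; "sub" as a substring test is an assumption.
OPERATORS = {
    "eq": lambda value, operand: value == operand,
    "match": lambda value, operand: fnmatchcase(value, operand),
    "sub": lambda value, operand: operand in value,
}

# (syscall, argument, operator, operand, action); None means "no condition".
RULES = [
    ("accept", None, None, None, "permit"),
    ("bind", "sockaddr", "match", "inet-*:53", "permit"),
    ("chroot", "filename", "eq", "/var/named", "permit"),
    ("connect", "sockaddr", "eq", "/dev/log", "permit"),
    ("setgid", "gid", "eq", "70", "permit"),
]


def decide(syscall, args):
    """Return the action of the first matching rule, or 'deny' by default."""
    for name, arg, op, operand, action in RULES:
        if name != syscall:
            continue
        if arg is None:
            return action
        if arg in args and OPERATORS[op](str(args[arg]), operand):
            return action
    return "deny"
```

For instance, decide("chroot", {"filename": "/var/named"}) returns "permit", while decide("chroot", {"filename": "/tmp"}) and any call with no rule at all, such as decide("execve", {"filename": "/bin/sh"}), return "deny".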
This very brief overview covers the vast majority of the rules you
will see. For full details on the systrace
grammar, read the systrace(1) manpage.
If you want some help with creating your policies, you can also use
systrace’s automated mode.
The original article that this hack is based on is available online at http://www.onlamp.com/pub/a/bsd/2003/01/30/Big_Scary_Daemons.html.