Tokenizing Rules

The sendmail program views the text that makes up rules and addresses as being composed of individual tokens. Rules are tokenized—divided into individual parts—while the configuration file is being read and while they are being normalized. Addresses are tokenized at another time (as we’ll show later), but the process is the same for both.

The text our.domain, for example, is composed of three tokens: our, a dot, and domain. Tokens are separated by special characters that are defined by the OperatorChars option (OperatorChars on page 1062) or, prior to V8.7, by the $o macro:

define(`confOPERATORS', `.:%@!^/[]+')   ← m4 configuration
O OperatorChars=.:%@!^/[]+              ← V8.7 and later
Do.:%@!^=/[]                            ← prior to V8.7

When any of these separation characters are recognized in text, they are considered individual tokens. Any leftover text is then combined into the remaining tokens:

xxx@yyy;zzz    becomes  →   xxx  @   yyy;zzz

Here @ is one of the OperatorChars separation characters, but ; is not. Therefore, the text xxx@yyy;zzz is divided into three tokens: xxx, @, and yyy;zzz.
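The splitting just described can be modeled in a few lines of Python. This is only an illustrative sketch, not sendmail's actual tokenizing code: the function name split_on_operators is hypothetical, OPERATOR_CHARS is based on the OperatorChars value in the configuration lines above, and the sketch ignores quoting, whitespace, and macros.

```python
# Simplified model of operator-based tokenizing (an assumption-laden sketch,
# not sendmail's real implementation).
OPERATOR_CHARS = ".:%@!^/[]+"  # based on the OperatorChars setting above


def split_on_operators(text, operators=OPERATOR_CHARS):
    """Return a list of tokens; each operator character is its own token."""
    tokens = []
    current = ""  # accumulates leftover (non-operator) text
    for ch in text:
        if ch in operators:
            if current:
                tokens.append(current)
                current = ""
            tokens.append(ch)
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


print(split_on_operators("our.domain"))   # ['our', '.', 'domain']
print(split_on_operators("xxx@yyy;zzz"))  # ['xxx', '@', 'yyy;zzz']
```

Because ; is not in OPERATOR_CHARS, the text yyy;zzz stays together as a single token, which matches the three-token result above.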

In addition to the characters in the OperatorChars option, sendmail also defines 10 tokenizing characters internally:

(  )<>,;"\r\n

This internal list, and the list defined by the OperatorChars option, are combined into one master list that is used for all tokenizing. The previous example, when divided by using this master list, becomes five tokens instead of just three:

xxx@yyy;zzz    becomes →   xxx  @   yyy  ;  zzz
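The effect of combining the two lists can be sketched in Python as follows. This is a simplified model under stated assumptions, not sendmail's own code: the names INTERNAL_CHARS, MASTER_LIST, and split_on_master_list are hypothetical, INTERNAL_CHARS transcribes the characters printed above (the exact set sendmail uses internally may differ), and quoting and whitespace handling are ignored.

```python
# Simplified model of tokenizing with the combined "master list"
# (a sketch only; sendmail's real logic handles quoting, spaces, and more).
OPERATOR_CHARS = ".:%@!^/[]+"      # based on the OperatorChars setting
INTERNAL_CHARS = "()<>,;\"\r\n"    # transcribed from the internal list above
MASTER_LIST = OPERATOR_CHARS + INTERNAL_CHARS


def split_on_master_list(text, separators=MASTER_LIST):
    """Return a list of tokens; each separator character is its own token."""
    tokens = []
    current = ""  # accumulates text between separator characters
    for ch in text:
        if ch in separators:
            if current:
                tokens.append(current)
                current = ""
            tokens.append(ch)
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


print(split_on_master_list("xxx@yyy;zzz"))  # ['xxx', '@', 'yyy', ';', 'zzz']
```

With ; now in the separator set, the same input yields five tokens rather than three.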

In rules, quotation marks can be used to override the meaning of tokenizing ...

sendmail, 4th Edition by Bryan Costales, Claus Assmann, George Jansen, Gregory Neil Shapiro
