Chapter 4. Developing Inclusive Forms

Forms allow users to interact directly with a site. They are often the thing that differentiates a website from a web application.

What’s in a Name?

In Dale Carnegie’s influential 1936 self-help book, How to Win Friends and Influence People, he states “a person’s name is, to that person, the sweetest and most important sound in any language.” Names are a core part of our personal identities. We often identify with them, turn at the sound of them said across the room, and intuitively appreciate when a person we have just met remembers our names.

Unfortunately, as web developers, it is possible to make assumptions about names that lead to their incorrect handling. When working with names, we should be prepared for a variety of characters, spacing, and unique international formats.

In his article “Falsehoods Programmers Believe About Names”, Patrick McKenzie lists out 40 common misconceptions, including these assumptions:

  • People have exactly one canonical full name.
  • People’s names fit within a certain defined amount of space.
  • People’s names are written in any single character set.
  • People have last names, family names, or anything else which is shared by folks recognized as their relatives.
  • My system will never have to deal with names from China, Japan, Korea, Ireland, the United Kingdom, the United States, Spain, Mexico, Brazil, Peru, Russia, Sweden, Botswana, South Africa, Trinidad, Haiti, France, or the Klingon Empire, all of which have “weird” naming schemes in common use.

The full list is well worth a read, as it succinctly points out many potential missteps.

In her article “Hello, My Name Is <error>”, Aimee Gonzalez-Cameron shares her story of taking the GRE, an exam administered for admission to graduate school in the United States. One of the first instructions in registering for the exam was as follows:

Important: The name you use when you register for a GRE test must exactly match (excluding accents, apostrophes and spaces) the name on the identification (ID) documents that you will present on the day of your GRE test. If it does not, you may be prohibited from taking the test or your test scores may be canceled after you take the test. For example, a last name of Fernandez de Córdova would be entered as FernandezdeCordova.

As she points out, “Students shouldn’t stress about instructions or worry that their answers will be thrown out because they can’t complete the first step correctly.” A technical system that cannot properly handle a common American surname format is culturally insensitive, and it forces extra instructions onto users to compensate.

Perhaps more relatable for many developers is the case of Christopher Null. Without reading further, you may already be shaking your head at the heartache that a last name of “Null” may cause when dealing with web forms. In his article, “Hello, I’m Mr. Null. My Name Makes Me Invisible to Computers”, he details his experience using the Web with the last name of Null. Because “null” is used to represent a missing or empty value in many programming languages, it is sometimes used to check for blank form fields. As a result, many forms treat his surname as a blank field, report an error, or crash, forcing him to use a different last name.
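This kind of failure often comes from comparing a submitted value against the text “null” instead of checking whether anything was actually submitted. A minimal Python sketch of a safer check follows; the function name is_blank is hypothetical and not taken from any framework.

```python
from typing import Optional


def is_blank(value: Optional[str]) -> bool:
    """Return True only when no usable text was submitted.

    A missing field arrives as None; an empty field arrives as "" or
    whitespace. The literal text "Null" is a real value and is kept.
    """
    if value is None:
        return True
    return value.strip() == ""


# The surname "Null" is treated like any other name.
print(is_blank("Null"))  # False
print(is_blank("   "))   # True
print(is_blank(None))    # True
```

The check never inspects what the text says, only whether there is any, so no particular surname can be mistaken for an empty field.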

As developers, we can take a more inclusive approach to working with names by expecting a wide variety of potential inputs rather than treating them as edge cases.

International Names

Names come in many different formats around the world; however, it is easy to apply our own cultural biases when designing systems that deal with names. As an American, for instance, my bias is to consider names in the format of a first name followed by a surname. Based on that format, I make several potentially false assumptions about things such as familial relationship. However, there are many different ways that a name can be constructed even within a single country or culture. Let’s look at a few of these structures to see how they may challenge our assumptions.

Multiple names

Many names may be longer than the “given name, family name” format. In many Spanish- and Portuguese-speaking countries, it is common to compose a name of one or two given names and two or more family names, drawn from both the father’s and the mother’s surnames. Spanish custom traditionally places the paternal surname first, while Portuguese custom typically places the maternal surname first. In some cases, the particle de (“of”) may appear within or between surnames, and surnames may sometimes reflect geographic origin.

Traditional Arabic names often contain more parts than a given name and a family name, and those parts frequently carry specific meaning. This description from Wikipedia highlights the false assumptions that a non-Arabic-speaking person may make about the traditional Arabic name Abdul Rahman bin Omar al-Ahmad:

With “Abdul”: Arabic names may be written “Abdul (something),” but “Abdul” means “servant of the” and is not, by itself, a name. Thus for example, to address Abdul Rahman bin Omar al-Ahmad by his given name, one says “Abdul Rahman,” not merely “Abdul”. If he introduces himself as “Abdul Rahman” (which means “the servant of the Merciful”), one does not say “Mr. Rahman” (as “Rahman” is not a family name but part of his (theophoric) personal name); instead it would be Mr. al-Ahmad, the latter being the family name.

Name order

Names do not always appear in the format of a given name followed by a family name, meaning that a typical pair of form fields, “First name” followed by “Last name,” may not produce the intended results. As an example, Chinese names place the surname before the personal name.

Rather than a family surname, Icelandic names follow a patronymic (and, occasionally, matronymic) naming format. For example, if an Icelandic man named Birgir has a son named Jón, Jón’s full name would be Jón Birgisson (“Birgir’s son”). If Jón then had a daughter named Sigrún, Sigrún would be named Sigrún Jónsdóttir (“Jón’s daughter”). Because of this, a list of Icelandic names would be expected to be sorted by given name rather than family name.

Characters

Names from many regions may consist of characters outside of the Latin alphabet. Some are written in other scripts entirely, such as Arabic, Cyrillic, or Japanese, though many of these languages also have Romanized versions (for example, the Japanese name 山田太郎 is Romanized as Yamada Tarō). There are also accented characters such as ó, ü, and ñ, and letters such as ß. Names may contain non-letter characters such as apostrophes (e.g., the Irish name Francis O’Neill), which forms may attempt to strip during validation as unacceptable characters.
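A validation routine can avoid stripping these characters by rejecting only what clearly cannot belong to a name. The Python sketch below is one possible approach, not a complete policy: the function name is_acceptable_name is hypothetical, and the choice to reject only empty input and control characters is an assumption made for illustration.

```python
import unicodedata


def is_acceptable_name(raw: str) -> bool:
    """Accept any non-empty name that contains no control characters.

    Letters from any script, accents, apostrophes, hyphens, and internal
    spaces are all allowed; nothing is stripped or transliterated.
    """
    name = raw.strip()
    if not name:
        return False
    # Category "Cc" covers control characters such as tabs and newlines.
    return all(unicodedata.category(ch) != "Cc" for ch in name)


examples = [
    "Francis O’Neill",
    "山田太郎",
    "Fernandez de Córdova",
    "Sigrún Jónsdóttir",
]
for example in examples:
    print(example, is_acceptable_name(example))
```

Every example name passes, because the check describes what is forbidden rather than trying to list every character that is allowed.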

Further Reading

These are only a few examples of how names may differ around the world. Additionally, they assume that a person’s name is derived from a single culture, ignoring the possibility that name attributes from multiple cultures may be applied to a person’s name. W3C’s “Personal Names Around the World” dives into greater detail and links to several additional Wikipedia articles discussing naming formats.

Mojibake

Mojibake is the garbled text produced when characters are decoded with a different encoding than the one used to write them. It typically appears when text passes through a system that does not handle Unicode properly (or at all). Users whose names contain non-ASCII characters may often see mojibake versions of their names. A quick image search for mojibake reveals many encoding issues across the Web, though it is likely that the majority go undocumented or are documented by people who do not know the term.
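The mechanism can be reproduced in a few lines. The Python sketch below is an illustration rather than a diagnosis of any particular service: it encodes a surname as UTF-8 and then misreads the bytes as Latin-1, an encoding pair chosen here as an assumption because it is a common source of this kind of damage.

```python
original = "Córdova"

# Correct handling: the bytes are written and read with the same encoding.
utf8_bytes = original.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

# Mojibake: the same UTF-8 bytes are misread as Latin-1 (ISO-8859-1).
garbled = utf8_bytes.decode("latin-1")

print(round_trip)  # Córdova
print(garbled)     # CÃ³rdova

# The damage can be undone only when the wrong decoding step is known.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)  # True
```

The single letter ó occupies two bytes in UTF-8, and Latin-1 reads each byte as a separate character, which is why the garbled name is one character longer than the original.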

In his talk, “Hello, my name is __________.”, developer Nova Patch found several examples of mojibake affecting users of web services. Perhaps the best-documented and most consistent mojibake mangling of a name belongs to Nóirín Plunkett, who shared several instances of her mojibaked name on Twitter (see Figure 4-1).

Figure 4-1. Nóirín’s tweets displaying mojibake in action

Perhaps one of the more impressive mojibake instances involved a Russian postal worker who hand-corrected the mojibake on a package (see Figure 4-2). This illustrates how common encoding problems can be when working with Cyrillic text. In fact, there is even a Russian-specific term for mojibake: krakozyabry.

Figure 4-2. Hand-decoded mojibake by a Russian postal worker (image source unknown)

What Are We to Do?

Now that we’ve taken a quick look at the importance and value of names, we can consider how we can best implement name-inclusive fields in our forms. We can do this by considering the format of the field itself and the way we handle the character encoding of the field.

Input format

If possible, create name fields that are a single text input. Allow the input field to accept long names as well as special characters and spaces. If possible, avoid limiting the length of the field in your database as well, so that an individual’s name is never truncated when it is returned. See Figure 4-3.

A single text input field with a "Name" label.
Figure 4-3. If possible, use name fields that are a single text input

If you plan to address the user through the web interface, email, or other means, it may be worth adding an additional field that asks “What should we call you?” (see Figure 4-4). This allows users to enter the name they most associate themselves with.

A text input field with a "What should we call you?" label.
Figure 4-4. If you will address the person, add a “What should we call you?” field

Character encoding

As we’ve seen with mojibake, character encoding can present its own unique set of challenges. To avoid the accidental mangling of names, we should permit punctuation (such as hyphens and apostrophes), allow spaces, and avoid changing character encodings between systems, such as between the form and the database. A complete discussion of character encoding is beyond the scope of this book, but as a rule of thumb, use UTF-8 encoding both on the frontend and in the database.

In HTML, simply add the character set meta tag specifying UTF-8 to the <head> of the page:

<meta charset="utf-8">
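The server has to read the submitted form with the same encoding. The sketch below assumes a URL-encoded form body and uses parse_qs and urlencode from Python’s standard urllib.parse module; the field name "name" and the sample body are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode

# A browser submitting from a UTF-8 page percent-encodes the UTF-8 bytes.
body = urlencode({"name": "Nóirín Plunkett"}, encoding="utf-8")
print(body)  # name=N%C3%B3ir%C3%ADn+Plunkett

# Decoding with the matching encoding preserves the name.
fields = parse_qs(body, encoding="utf-8", errors="strict")
print(fields["name"][0])  # Nóirín Plunkett

# Decoding with a mismatched encoding silently produces mojibake.
wrong = parse_qs(body, encoding="latin-1")
print(wrong["name"][0] == "Nóirín Plunkett")  # False
```

Passing errors="strict" is a design choice in this sketch: a body that is not valid UTF-8 raises an error instead of being quietly stored in damaged form.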

Inclusive Gender

For many, gender is not simply the binary sex of either male or female as determined at birth. The advocacy group GLAAD defines transgender as:

An umbrella term (adj.) for people whose gender identity and/or gender expression differs from the sex they were assigned at birth. The term may include but is not limited to: transsexuals, cross-dressers and other gender-variant people.

The most cited study on transgender population numbers in the United States places the transgender population at 0.3%, or roughly 700,000 adults. According to Mona Chalabi’s FiveThirtyEight article “Why We Don’t Know the Size of the Transgender Population”, these numbers may be inaccurate, tending toward low, due to the lack of non-binary gender options on official forms such as the census as well as a reluctance to provide the information when asked.

To be as inclusive as possible, we can build systems that accept and respect non-binary gender options. When including gender in a form, my recommendation is to:

  • Provide male and female options
  • Provide a “custom” text input; if data collection is important, you may provide autocomplete suggestions, but still allow custom inputs
  • Offer a “prefer not to say” option
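One way to handle the result of such a field on the server is sketched below in Python. The class, function, and option names are hypothetical; the point is that a custom entry is kept as the user typed it, and that declining to answer is a valid response rather than a validation error.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical option keys mirroring the recommendations above.
OPTIONS = {"male", "female", "custom", "prefer_not_to_say"}


@dataclass
class GenderResponse:
    option: str                        # one of OPTIONS
    custom_text: Optional[str] = None  # free text, used only with "custom"


def parse_gender(option: str, custom_text: str = "") -> GenderResponse:
    """Validate a submitted gender choice without rewriting custom text."""
    if option not in OPTIONS:
        raise ValueError(f"unknown option: {option!r}")
    if option == "custom":
        text = custom_text.strip()
        if not text:
            raise ValueError("a custom selection needs some text")
        return GenderResponse(option, text)
    # Any stray custom text is ignored for the fixed options.
    return GenderResponse(option)


print(parse_gender("custom", "non-binary"))
print(parse_gender("prefer_not_to_say"))
```

Autocomplete suggestions, if offered, would live in the interface; this sketch deliberately does not check custom text against a list, so that any entry is accepted.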

Both Facebook and Google follow patterns similar to those.

Google offers the choices of “Male,” “Female,” “Decline to state,” and “Custom” in a select menu (see Figure 4-5).

Google's gender options: male, female, decline to state, and custom.
Figure 4-5. Google offers four gender choices in a select menu

If the “Custom” option is selected, the user is presented with a text input box and a choice of pronoun to be addressed by (see Figure 4-6).

Google's "other" option.
Figure 4-6. A selection of “Custom” displays a text input and choice of pronoun

By contrast, Facebook requires a binary gender choice during account creation (see Figure 4-7).

The Facebook sign up form.
Figure 4-7. Facebook’s sign-up form presents users with a binary gender choice

However, once a user has created a Facebook account, it’s possible to select a more inclusive gender. Facebook’s pattern offers three choices: “Male,” “Female,” and “Custom.” When “Custom” is selected, users are given a text input box with autocomplete suggestions as well as a selection of pronouns to be addressed by (see Figure 4-8).

Facebook custom gender option
Figure 4-8. Facebook allows a user to select a custom gender, offers autocomplete gender suggestions, and provides a choice of pronouns

What About Titles?

Forms often include a title field, with gendered choices such as “Mr.,” “Ms.,” and “Mrs.” Making this field optional, or providing a text input option, gives users the most control. Doing so lets people who prefer not to use a title omit it, and it means that people with a non-binary gender are not forced into using a gendered title.

In Summary

When we ask users to complete a form with personal information, we are asking about their personal identity. By considering name formats, internationalization, and gender, we provide online spaces that are welcoming and inclusive to all.
