I discussed this with my housemate, who is a data analyst by trade, and we concluded that there are two sensible options here, depending on how much work you personally want to do. (We’re assuming that collecting gender data is actually useful to you rather than simply of interest; in the latter case it is almost always better to leave it out.)
The simple option is to have three or four discrete options: Male, Female, Other, and possibly Prefer not to say.
In my experience, this is the most acceptable option for gathering data while being both simple and inclusive – it acknowledges that there are people who don’t fit the gender binary, allows users to select a different option, and doesn’t overload your cisgender users with lots of options. It also allows people to opt out completely if they really don’t want to answer (the standard objection is that this will negatively impact your data collection, but in practice it probably doesn’t make much of a difference). Note that if gender identity is particularly important to your application, then this may not be the most sensible or inclusive option.
The ideal but more complex option is to have a text box and suck it up – it’s a data sanitisation problem. A simple find/replace on your dataset will be able to lump your users into a group of man/male/boy responses, a group of woman/female/girl responses, and a group of assorted other responses. Crucially, if you’re doing demographic analysis, whatever is left over probably isn’t statistically significant at an individual level, so in your analysis it is acceptable to put those responses in an internal Other category. You can then preserve that minority data for further study should you find you need it.
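As a rough illustration of that clean-up step, here is a minimal Python sketch. The synonym lists, the function name `normalise_gender` and the `unspecified` bucket are my own assumptions rather than anything prescribed above; you would adjust them after looking at your real responses.

```python
import re
from collections import Counter

# Hypothetical synonym lists; extend them after inspecting your own data.
MALE_TERMS = {"man", "male", "boy", "m"}
FEMALE_TERMS = {"woman", "female", "girl", "f"}


def normalise_gender(raw: str) -> str:
    """Map a free-text answer to 'male', 'female', 'other' or 'unspecified'."""
    # Lower-case and drop punctuation so "Male." and " MALE " both match.
    cleaned = re.sub(r"[^a-z\s-]", "", raw.lower()).strip()
    if not cleaned:
        return "unspecified"
    # Whole-answer matching avoids the classic bug where "female" contains "male".
    if cleaned in MALE_TERMS:
        return "male"
    if cleaned in FEMALE_TERMS:
        return "female"
    # Everything else lands in the internal Other bucket for analysis.
    return "other"


# Keep the raw answers stored alongside the bucket so nothing is lost.
responses = ["Male", " woman ", "non-binary", "BOY", "", "Female.", "genderfluid"]
counts = Counter(normalise_gender(r) for r in responses)
print(counts)
```

Because only whole answers are matched, something like "trans woman" falls into the internal Other bucket here; whether that is right depends on what your analysis needs, which is another reason to keep the raw text.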
Alternatively, as noted in the comments, it may be possible to combine the two approaches. Once a user selects your Other option, you could then display a text box which allows them to specify their gender identity exactly. This has the benefit of minimising cognitive load on cisgender users while also capturing specific minority data. The downsides are that you may still run into issues sanitising this data to make it useful, and your form must be able to handle revealing a hidden element.
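On the receiving end, the combined approach might be handled along these lines. This is only a sketch: the option labels in `CHOICES`, the `GenderAnswer` type and `parse_submission` are hypothetical names I have picked for illustration, and the show/hide behaviour itself would live in your form's front end.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical option labels; use whatever your form actually offers.
CHOICES = ("Male", "Female", "Other", "Prefer not to say")


@dataclass
class GenderAnswer:
    choice: str                      # one of CHOICES
    self_description: Optional[str]  # only kept when choice is "Other"


def parse_submission(choice: str, free_text: str = "") -> GenderAnswer:
    """Validate the selected option and keep free text only for 'Other'."""
    if choice not in CHOICES:
        raise ValueError(f"unknown option: {choice!r}")
    text = free_text.strip()
    # The text box is only revealed for "Other", so ignore stray text otherwise.
    if choice == "Other" and text:
        return GenderAnswer(choice, text)
    return GenderAnswer(choice, None)


print(parse_submission("Other", " agender "))
print(parse_submission("Female"))
```

Leaving the text optional means a user who picks Other is never forced to explain themselves.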
Gender is the correct label for this field, from a descriptive point of view and from a data collection point of view. You’d be surprised how many people think it’s hilarious to answer Sex: with “Yes please”.
If you choose to go with the simple dropdown/radio button approach, then Other is probably the most appropriate label for the third group. It is easily understandable, and non-exclusive in terms of what it might represent. Transgender is probably not an appropriate label here unless you include additional ones, because it excludes people outside the binary who are not transgender or who do not view the label as appropriate for them, and it doesn’t actually tell you the respondent’s gender (transgender just tells you their gender is not the same as their assigned sex at birth).
The problem with the word “other”, though, is that it can itself feel exclusionary: the user may feel like they are being shoved into a box of leftovers – not an ideal experience! For that reason, a text box is probably preferred if you want to make sure you’re being inclusive.
Think Outside The Box mirrors these recommendations and has some other interesting guidelines for form construction.