Gender is part of our identity, influencing how we behave and interact with others (e.g. Witt & Wood, 2010; Wolf, 2000). Knowledge of gendered behaviours is important because gender is a core part of the human experience. Understanding gender may also help those using the social web for communication, from mental health professionals to marketers (Bettany et al., 2010). In particular, information about fine-grained differences in gender expression may give a better understanding of how gender influences online interactions, supporting better decision-making.
Whilst the human biological male and female sexes have genetic definitions, albeit with intersex and other exceptions (WHO, 2020), gender is a complex social construction whose expression varies between societies and over time. For example, higher education was an almost exclusively male trait when females were barred from UK universities, but females now outnumber males in UK university degree awards (57.8% female, 42.2% male, 0.06% nonbinary in 2018) (HESA, 2019), so it has become a moderately female-associated trait. A person may declare or inhabit a gender (their
Male and female genders are associated with culturally-specific sets of behaviours, often similar to those for biological sexes, known as
Nonbinary genders encompass a range of gender identities, not all on the spectrum of male to female, and not necessarily constant over time. A nonbinary person might fall on the spectrum between male and female (demigender, transmasculine, transfeminine, genderfluid), may have multiple genders, or may be agender (no gender) and outside the male/female spectrum. Transgender people, who identify with a gender different from their birth-assigned sex, are not nonbinary if they identify as male or female. Nonbinary status is separate from sexual orientation. One of the few behaviours that might be expected of nonbinary people in general (including those not on the male/female spectrum) is a preference for linguistic terms associated with their status, such as they/them pronouns (e.g. McGlashan & Fitzpatrick, 2017), although this is not universal and is language-specific (Miltersen, 2018).
It is not clear how gender influences self-presentation on the social web, beyond a few aspects that have been tested for (see below). Moreover, it is not clear that there would be
How does self-presentation in UK Twitter biographies differ by gender?
This section introduces research into gendered behaviours to set the context for the results, with additional literature drawn upon later to discuss these results.
Although gender expression differences pervade societies, there is disagreement about their underlying causes. From a social role theory perspective (Eagly, 1987), humans learn gendered behavioural expectations from birth and act out the corresponding gendered roles. Gendered mass media coverage (Garg et al., 2018) is part of the gender socialisation process, but routine daily interactions and expectations are presumably more important. In contrast, the biological perspective suggests that genetic differences between the biological sexes may be the root causes of some or all female-male gender differences. An evolutionary psychology explanation of sex differences emphasises natural selection favouring traits useful for the divisions of labour between the sexes in the hunter-gatherer societies that dominated human existence (Buss, 1995). Gender expectations vary between societies (Costa, Terracciano, & McCrae, 2001) and change over time (Eagly et al., 2020), however. In the USA, for example, expectations for females have changed the most (Diekman & Eagly, 2000) and male traits have become increasingly acceptable for women (Twenge, 1997). The social role theory approach seems better able to explain the complexity of gender differences in modern life, but it is possible that neither explanation is completely wrong.
Previous research has theorised about the nature of underlying gender expression differences. Social role theory argues that males are socialised to become more agentic (e.g. assertiveness, independence, mastery) whereas females are socialised to be more communally oriented (e.g. concerned about others, expressive, friendly). These traits underlie many gender expression differences (Eagly & Wood, 2016). From the biological perspective, human males may be intrinsically more interested in tangible inanimate “things” (e.g. machines) whereas females may be more interested in other humans: a people-things dichotomy. If true, this would explain many gender differences in vocational (e.g. male carpenters vs. female nurses) (Lippa, 1998) and academic (Su & Rounds, 2015) interests, although there are exceptions (e.g. cell biology is a female-majority interest). Social role theory can explain the same differences, since thing-oriented jobs tend to satisfy agentic goals and people-focused jobs tend to satisfy communal goals (Thelwall et al., 2019).
Although nonbinary genders have presumably existed for as long as people, they are under-researched (Matsuno & Budge, 2017). No systematic study has reported the characteristics or preferences of nonbinary people; research has focused instead on health (Richards et al., 2016), lived experiences (Bradford et al., 2019), or anti-nonbinary harassment (Nadal et al., 2016). Although there is no reason to expect many behavioural patterns common to nonbinary genders as a whole, since they encompass a range of identities, one study found that self-description words and narratives are a frequent concern for nonbinary people (Bradford et al., 2019).
People control the information they share to manage the impression that they make on others (Goffman, 1959). For example, teens sometimes post content online to appear interesting and attractive (Yau & Reich, 2019). Many factors influence the extent to which people self-disclose on the social web, including age (Nosko, Wood, & Molema, 2010) and personality (Grieve, March, & Watkinson, 2020). Lower self-esteem is associated with more self-promotional content on Facebook (Mehdizadeh, 2010). Oppressed groups, perhaps including nonbinary and transgender users, may self-censor more because of a greater need for impression management to counteract negativity or negative stereotypes directed at them (Pitcan, Marwick, & Boyd, 2018). Related to this, public social media generate “context collapse”, leaving users unable to tailor their self-presentation to different audiences (e.g. parents, teachers, friends, colleagues, future employers, customers) (Baym & Boyd, 2012). Thus, some aspects of the self may be hidden from public profiles to appeal to a general imagined audience (Marwick & Boyd, 2011).
A content analysis of 2,633 English-language Australian/UK/US Twitter user biographies found five types of information: Relational (relationships with other people); Occupational; Political; Ethnic/religious; and Stigmatized (membership of a stigmatised group) (Priante et al., 2016). Thus, Twitter profiles contain multiple types of information as part of the owner's online identity.
There are gender differences in online presentation styles, although these are perhaps weaker than offline (Oberst et al., 2016). These differences are only known for males and females, and mainly concern US students and/or Facebook. For example, male Facebook photos are more likely to accentuate status whereas female photos are more likely to showcase family relationships (Tifferet & Vilnai-Yavetz, 2014). These differences extend to stylistic variations. For example, in a study of 75,000 Facebook users, males were more likely than females to use the possessive “my” when describing their partner in messages (Schwartz et al., 2013). These Facebook users also showed age, personality and geographic differences in post language.
For Twitter, female tweets have been suggested to be more personal (Weathers et al., 2014). More personal information was also shared by females (and parents) in a study of 187 tweeters (Walton & Rice, 2013). Privacy controls are more important to females than males on social network sites, so the lack of these on Twitter compared to other sites, such as Facebook, may reduce female participation in Twitter (Kuo et al., 2013) and may hide female-associated private issues from Twitter. A study of 69 top-ranked tennis players found the men to focus on being a sports fan whereas the women spent more time on general brand management (Lebel & Danylchuk, 2012), showing that specific types of user may have their own gender differences.
The research design uses word association thematic analysis (WATA) (Thelwall, 2021), a method that finds differences between sets of texts by identifying words that occur statistically significantly more frequently in one subset than another. In this case the texts are Twitter self-descriptions and the subsets are self-descriptions created by different genders or gender groups. This is more appropriate than content analysis or thematic analysis because it can identify fine-grained differences that human coders would be unlikely to notice (Thelwall et al., 2021). The method has three stages (Figure 1).
Automatically create a large systematic sample of UK male, female and nonbinary Twitter users and their Twitter biographies (the top three boxes in Figure 1).
Automatically extract words that occur more often than expected in the Twitter biographies of each gender type (Word Association Detection).
Manually group the extracted words into themes (Word Association Contextualisation and Thematic Analysis).
It is not possible to sample UK Twitter users directly since there are no comprehensive public lists. Instead, users were found indirectly by searching for tweets and identifying the tweeters. The search stage requires queries, and a set of COVID-19 queries was used for this. Because the virus affected almost everyone in the UK, these queries were likely to generate a large sample of active tweeters. The following queries were submitted to Twitter every 15 minutes from 10 March to 30 June 2020, specifying English language: coronavirus, “corona virus”, covid-19, covid19, #stayhome, #socialdistancing, #quarantinelife, #lockdown, #washyourhands, #stayathome, #selfisolation, #workingfromhome, #quarantine. The Twitter Application Programming Interface (API) was used through the free Mozdeh software to submit the queries and download the matching tweets. A complete list of users was extracted from the metadata associated with these tweets.
In July 2020 the Twitter API was used for a second time to download the Twitter biographies and locations of all the users identified above (i.e. users that had tweeted about COVID-19). Users specifying the UK as their location were retained and the remainder discarded. At the time, the Twitter API (v1.1) did not include user information along with tweets, but this is an option in the Twitter API v2.0 introduced later in 2020.
Twitter does not record gender, so this was inferred from usernames and from pronouns in biographies. Lists of male and female gendered first names were obtained from public US 1990 census tables of popular first names by gender, supplemented with Gender-API.com queries for additional UK names. A name was considered to be gendered if at least 90% of its uses were by the same gender (male or female). The sources used do not record nonbinary genders, and there do not seem to be specific nonbinary first names, partly because people are named by their parents before they develop a gender identity. Twitter users were classed as male or female if the first part of their username matched the male or female name list, respectively; otherwise they were classed as unknown.
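The name-based step can be sketched as follows. This is a minimal illustration rather than the study's code: the name counts are invented, the rule used to extract the "first part" of a username (splitting on CamelCase and non-letter characters) is an assumption, and the function names `gendered_names`, `first_name_token` and `name_gender` are hypothetical.

```python
import re


def gendered_names(name_counts, threshold=0.9):
    """Keep names used by one gender at least `threshold` of the time.

    name_counts maps a lowercase first name to (male_count, female_count).
    """
    result = {}
    for name, (male, female) in name_counts.items():
        total = male + female
        if total == 0:
            continue
        if male / total >= threshold:
            result[name] = "male"
        elif female / total >= threshold:
            result[name] = "female"
    return result


def first_name_token(username):
    """Hypothetical heuristic: first alphabetic chunk, splitting CamelCase and non-letters."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", username)
    return parts[0].lower() if parts else ""


def name_gender(username, name_genders):
    """Return 'male', 'female' or 'unknown' from the first part of a username."""
    return name_genders.get(first_name_token(username), "unknown")
```

With invented counts such as `{"john": (990, 10), "leslie": (400, 600)}`, "john" is kept as male while "leslie" falls below the 90% threshold and is dropped, so a user called "Leslie Jones" is classed as unknown.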
A minority of Twitter users specify in their biographies the pronouns that they prefer (e.g. “Pronouns: they/them”). They/them pronouns are often important aspects of identity for people with nonbinary genders (e.g. Bergman & Barker, 2017; Richards et al., 2016), although not all nonbinary people exclusively use they/them pronouns. All user biographies were checked for pronoun declarations of this kind, which were used to assign additional male, female and nonbinary genders. If the pronouns conflicted with the name-based gender assignment, then the pronouns were taken as definitive. All cases of pronoun/first name disagreement were manually checked for accuracy, and the pronouns were always found to be reliable. The assignments for all 169 nonbinary Twitter profiles were also manually checked, since any errors would have a greater impact on the results. All were found to be correct in the sense of being consistent with the wider information in the tweeter's biography.
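Pronoun detection, and the rule that declared pronouns override the name-based assignment, might look like the sketch below. It is deliberately minimal: the regular expression covers only the forms she/her, he/him and they/them, mixed forms such as "she/they" are not handled, and the names `pronoun_gender` and `assign_gender` are hypothetical rather than taken from the study.

```python
import re

# Hypothetical pattern: the study's exact matching rules are not reported.
PRONOUN_PATTERN = re.compile(r"\b(she/her|he/him|they/them)\b", re.IGNORECASE)
PRONOUN_GENDER = {"she/her": "female", "he/him": "male", "they/them": "nonbinary"}


def pronoun_gender(bio):
    """Return the gender implied by the first declared pronoun set, or None."""
    match = PRONOUN_PATTERN.search(bio)
    return PRONOUN_GENDER[match.group(1).lower()] if match else None


def assign_gender(name_based, bio):
    """Declared pronouns, when present, take precedence over the name-based label."""
    from_pronouns = pronoun_gender(bio)
    return from_pronouns if from_pronouns is not None else name_based
```

For example, a user whose first name suggests male but whose biography contains "Pronouns: they/them" would be assigned nonbinary.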
The gender identification is sometimes wrong because: (a) nonbinary people not specifying pronouns may be assigned male or female if this is suggested by their first name, (b) males can have majority-female names and females can have majority-male names for a variety of reasons, including gender differences between cultures (e.g. Maria and Nicola can be Italian male names) and spelling mistakes (e.g. Lesley/Leslie), and (c) Twitter usernames may belong to organisations whose names start with a gendered name (e.g. @MarieCurieSCO) or may not reflect the owner's name (e.g. @NinaSimoneNo1Fan). Nevertheless, this method produces lists of users with a high probability of having the specified gender identity. Errors in the lists would reduce the power of the method used in stage 2 but would not lead to false results, because they would tend to reduce, rather than create, average differences between sets of users. The final sample included 161,938 female, 247,380 male, and 169 nonbinary Twitter users with a UK location. Although the nonbinary sample is much smaller than the others, the statistical methods used in the next step take this into account so that the risk of identifying misleading differences is minimised.
Systematic differences between profiles were identified at the level of individual words in the word association detection (WAD) part of the WATA method. WAD identifies words that occur statistically significantly more often in the biographies of a given gender than in the remaining gendered biographies. For example, the word “grumpy” occurred more often in male biographies than in female biographies. Thus, males may be more frequently bad tempered, more likely to consider bad-temperedness part of their public identity, or more likely to prefer this word to express their bad-temperedness. As this example illustrates, the words identified by the approach are fine-grained linguistic differences, reflecting an unknown combination of personal traits, linguistic style, and presentational preferences. This interpretational ambiguity is a practical limitation.
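The quantity being compared is the proportion of biographies that contain a word at least once, not the raw number of occurrences. A minimal sketch of that counting step follows. The tokenizer (lowercase runs of letters, digits and apostrophes) is an assumption, since the study's exact tokenisation is not described, and the sketch omits the URL and emoji removal and plural merging applied before analysis; the function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(bio):
    """Crude tokenizer (an assumption): the set of lowercase word tokens in a bio."""
    return set(re.findall(r"[a-z0-9']+", bio.lower()))


def document_frequencies(bios):
    """Count how many biographies contain each word (each word counted once per bio)."""
    counts = Counter()
    for bio in bios:
        counts.update(tokenize(bio))
    return counts


def proportions(bios):
    """Proportion of biographies containing each word."""
    n = len(bios)
    return {word: c / n for word, c in document_frequencies(bios).items()}
```

In a toy sample where two of three male biographies and one of four female biographies mention "grumpy", the proportions are 2/3 and 0.25; whether such a gap is larger than chance is then a question for a significance test.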
Before analysis, profiles were processed to remove URLs and emojis and to convert plural words to singular (following English rules), increasing the statistical power of the tests. Words were then identified as gendered if they (1) occurred in at least two profiles (to exclude typos), (2) occurred in a higher proportion of biographies from that gender than from other genders, and (3) passed a test of statistical significance. The test used was a 2x2 chi-squared test, a standard approach for assessing whether the difference between two proportions is statistically significant (Thelwall, 2021). Because this test was applied to every word in the biographies of each gender, there was a high chance of false positives from multiple simultaneous tests. The Benjamini and Hochberg (1995) procedure was used to guard against this by systematically tightening the significance threshold, controlling the expected proportion of false positives among the identified words (the false discovery rate) at 0.1%, so that, if the test assumptions hold, the identified words should overwhelmingly reflect underlying gender differences. The main questionable assumption is independence: the tests assume that the likelihood of a word being used by one person is independent of its use by others in the same group, but there may be a degree of copying. This is commented on below where relevant. Copying (or multiple profiles for the same person) is more likely to influence the nonbinary results because the smaller sample size makes each person more influential in the data.
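The three criteria and the multiple-testing correction can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not the study's implementation (which ran in Mozdeh): the test is assumed to be a Pearson chi-squared test without continuity correction, the false discovery rate level `q=0.001` is an assumption, the correction is applied separately to each gender's candidate words, and all function names are hypothetical. For one degree of freedom, the chi-squared upper tail probability equals `erfc(sqrt(x / 2))`, which avoids a SciPy dependency.

```python
import math


def chi2_2x2(a, n1, b, n2):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table.

    a of n1 biographies in the target gender contain the word;
    b of n2 biographies in the comparison group contain it.
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    table = [[a, n1 - a], [b, n2 - b]]
    row_totals = [n1, n2]
    col_totals = [a + b, (n1 - a) + (n2 - b)]
    total = n1 + n2
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    # Upper tail of chi-squared with 1 df: P(X >= stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def benjamini_hochberg(p_values, q=0.001):
    """Return the set of indices rejected at false discovery rate q (step-up rule)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            cutoff_rank = rank  # largest rank meeting the BH bound
    return set(order[:cutoff_rank])


def gendered_words(counts_group, n_group, counts_rest, n_rest, q=0.001):
    """Apply the three criteria to one gender versus the remaining biographies.

    counts_* map each word to the number of biographies containing it.
    """
    candidates, p_values = [], []
    for word, a in counts_group.items():
        b = counts_rest.get(word, 0)
        if a + b < 2:  # (1) at least two profiles
            continue
        if a / n_group <= b / n_rest:  # (2) higher proportion in this gender
            continue
        candidates.append(word)
        p_values.append(chi2_2x2(a, n_group, b, n_rest)[1])
    significant = benjamini_hochberg(p_values, q)  # (3) significance after BH
    return sorted(candidates[i] for i in significant)
```

With invented counts, `gendered_words({"grumpy": 300, "the": 500, "rare": 1}, 1000, {"grumpy": 100, "the": 500}, 1000)` returns `['grumpy']`: "the" fails criterion (2) and "rare" fails criterion (1).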
This stage identified 1,474 words that are used statistically significantly more often by UK males, females or nonbinaries, with a high degree of certainty. Nonbinary words were the rarest because of the smaller number of nonbinary users found. Female words were more common than male words despite females being a minority in the dataset, suggesting that females are more likely to use general words that could meet the statistical significance thresholds. Average biography lengths were similar (female: 8.6 words; nonbinary: 8.6 words; male: 8.3 words).
The 1,474 gendered words were clustered into groups using a modified thematic analysis (Braun & Clarke, 2013) to help identify broader patterns in the results. This was supported by an initial word association contextualisation step.
Each gendered word was manually assigned an initial theme to illustrate its typical use by the associated gender. This was a two-stage process. First, a sample of at least 10 descriptions was read (selected with a random number generator by Mozdeh) to identify the typical context of the word for its main gender user group. Second, the word was given a theme name generalizing this use, as in thematic analysis. The first stage is necessary because of polysemy and the multiple different possible contexts for a word, even when its meaning is clear. The following examples illustrate the process and the necessity for word sense and context disambiguation.
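Drawing the random sample of biographies for a word can be sketched as below. The study used Mozdeh for this; the function `sample_contexts`, the whole-word regular expression match and the optional seed are illustrative assumptions. Because the analysis merged singular and plural forms, a fuller version would also need to match plural variants, which this sketch does not do.

```python
import random
import re


def sample_contexts(word, bios, k=10, seed=None):
    """Randomly pick up to k biographies containing `word` as a whole word."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    matches = [bio for bio in bios if pattern.search(bio)]
    rng = random.Random(seed)  # seeded generator makes the sample reproducible
    return rng.sample(matches, min(k, len(matches)))
```

The coder then reads the returned biographies to decide, for example, whether “party” is mostly political or celebratory before assigning a theme.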
“ig” (in the female list) usually occurred in links to an Instagram profile and was tagged “Social Media”, because other social media terms were mentioned in similar contexts.
“Party” usually occurred in statements of political affiliations and was tagged “Politics” (even though in other contexts, the word could mean celebration).
“Mitchell” always occurred in the context of employment (e.g. “clerk at Irwin Mitchell, solicitors”), and was tagged “Job”.
“Heels” occurred in the context of shoes worn by the user (e.g. “barrister in high heels”), and was tagged “Appearance”.
“Disabled” occurred mostly in job descriptions (rather than declaring a disability) and was therefore tagged “Job”.
“Egg” occurred in both food preferences and self-evaluations (e.g. “all round good egg”) and was tagged “Multiple”.
As the last example illustrates, terms used in different contexts were classed “Multiple” rather than given additional themes. Two coders (ST and MT) carried out this stage, comparing the themes, checking discrepancies, removing themes judged incorrect, and adding extra themes to words when they represented different valid perspectives.
After the initial grouping of words into themes, the list was revisited multiple times to ensure that the themes were consistent and to recheck words that might fit into revised or new themes. Large themes were also rechecked to split them into smaller sub-themes (e.g. the Interests/Crafts subtheme was generated from the initial large Interests theme). This process continued until the results stabilised. Two coders (ST and MT) worked on this stage.
The 1,474 gendered words identified were grouped into 33 themes. Each word reflects a fine-grained offline, presentational style or linguistic gender difference. Linguistically, the results also partly depend on polysemy, presence in idioms, or common allusions/metaphors. For example, the term
Although experimental results are usually presented in numbers, graphs and tables (quantitative) or descriptive text (qualitative), here they are reported as plain lists of words because of the emphasis on fine-grained differences. The lists serve the dual functions of clarifying the theme name and showing each fine-grained difference. Singular and plural forms of words were combined for testing; the singular forms are reported below, with [s] or [es] added when the plural form is more intuitive. The results are analyzed and compared to previous academic findings here, rather than in a separate Discussion section, because of the wide variety of themes included.
The online supplement reports the percentage of male, female, and nonbinary descriptions using each word. There are too many terms to explain the context of each, but the online supplement gives information in cases where this is unclear and spells out acronyms when there is ambiguity (others can be looked up online). Twitter can also be searched for the terms for more context, since its search matches descriptions as well as tweets. The original Twitter descriptions cannot be shared for privacy reasons and because of Twitter's terms and conditions. In many cases, the terms occurred in lists, giving no extra context. For example,
Twitter users described multiple aspects of themselves in their biographies. Most themes include terms from at least two genders. The absence of nonbinary terms could be due to the smaller amount of data (fewer people in this category) or the absence of underlying differences. A complete list of themes is reported here since this study is attempting to identify all gender differences in Twitter bios.
The negative words for males may reflect a greater use of self-deprecating humour (Greengross & Miller, 2008). The extensive use of qualification terms by females is surprising given that status concerns are more typically associated with males. It is possible that women work more in health fields (e.g. nursing), where explicitly stated qualifications are important, or that they feel a greater need to give evidence of competence in a sexist society.
The large number of gender identity words for nonbinary people is partly a side-effect of the detection method (specifying pronouns). The informal term
Females and nonbinaries seem more likely to share illness information in their Twitter descriptions. Offline, females tend to be more willing to share personal information and are more likely to have disabling illnesses (Crimmins, Kim, & Solé-Auró, 2011; Gordon et al., 2017), despite living longer. This may partly explain the greater reporting of long-term serious conditions by females in Twitter biographies. Males may also hide illness because they view weakness as non-masculine (Jeffries & Grogan, 2012). Given that females on Twitter are more likely than males to value social support when they disclose a disease (De Choudhury et al., 2017), males may have less reason to disclose their conditions.
Previous studies have sometimes suggested that there are higher illness rates amongst nonbinary people, attributing this to the additional stress of being marginalised (Lefevor et al., 2019). Nevertheless, evidence of health differences is inconclusive (Scandurra et al., 2019).
There were gender differences in the words used to express opinions and only females disproportionately used supportive terms. Many of the evaluative terms could also reasonably have been classified as Self-Evaluation but the focus of the terms was on the interest rather than the person. For example, whilst
Females are expected to express emotions more and to be more compassionate (Eagly, 1987). In addition, females seem to express sentiment more directly in the social web (Thelwall, 2018). Thus, the gender differences reflect known patterns. The inclusion of some negative terms for males may reflect their greater tendency for self-deprecating humour (Greengross & Miller, 2008).
Some Twitter biographies described relationships or sexuality. Many relationship terms are gendered (e.g. sister, brother), so their presence in the list only shows that the terms are not rare. The parent relationship is interesting because the greater use by females (10.8% use a synonym of mother) than by males (7.0% use a synonym of father) is probably a presentational difference rather than a demographic one. There are substantially more mothers than fathers in the UK: UK men are 3 years older when they have children (ONS, 2019) and die 3.7 years earlier (ONS, 2020), so they live 6.7 fewer years as fathers. Of people in the UK who have children and are at least 13 (the Twitter minimum age), females spend 48% of their lives as mothers and males spend 42% as fathers. Based on this, and other factors being equal, Twitter mothers could be expected to outnumber Twitter fathers by 1.15 to 1 (per female or per male), but biography-declared mothers outnumber biography-declared fathers by 1.54 to 1 (per female or per male).
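The arithmetic behind this comparison, using the rounded figures given in the text, can be laid out as a short worked calculation. The expected ratio of 1.15 is taken as reported; note that the rounded percentages 48% and 42% on their own give 48/42 ≈ 1.14.

```latex
\begin{aligned}
\text{shorter time as a parent for UK men (years)} &= 3 + 3.7 = 6.7 \\
\text{observed mother:father declaration ratio} &= \frac{10.8\%}{7.0\%} \approx 1.54 \\
\frac{\text{observed ratio}}{\text{expected ratio}} &\approx \frac{1.54}{1.15} \approx 1.34
\end{aligned}
```

In other words, relative to fathers, biography-declared mothers are roughly a third more common than the demographic differences alone would predict.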
The greater tendency for females to report relationships and sexuality is consistent with their greater tendency to share personal information on Twitter (Weathers et al., 2014). The corresponding tendency for nonbinary people does not seem to have been researched before.
Biographies sometimes reported political or religious beliefs. Despite the possibility of astroturfing, there were no clearly fake profiles. Males are more likely to make their political affiliation public on social media (Bode, 2017), and less likely to discuss politics on Twitter (Koc-Michalska et al., 2019), but the results suggest that females are more willing to announce support for specific campaigns. The inclusion of several nonbinary terms perhaps reflects a greater politicisation of nonbinary people (e.g. Ndelu, Dlakavu, & Boswell, 2017), as a result of recent gender-related political conflicts about transgender rights, including around the term TERF (Trans-Exclusionary Radical Feminist) (Pearce, Erikainen, & Vincent, 2020; Weber, 2016).
Women have been historically more religious in the West than men (Miller & Stark, 2002), which aligns with
Biographies sometimes linked to or mentioned other social media sites, positioning the profile as part of a networked online presence or declaring other interests beyond Twitter. The sites reflect some known gender differences in uptake, with females being more likely to use Instagram (Pew Research Center, 2019).
Some biographies contained disclaimers, such as “views/comments my own” or “retweets not endorsements”, explained who would be blocked, or finished with a kiss. Disclaimers and blocking threats seem to be used most by males. This would align with male tweets having a less personal focus and being more likely to contain content that might offend others or express a controversial opinion.
The home is the traditional female sphere, which may explain the greater female concern with food in profile descriptions. Women spend more time on food preparation (Tashiro & Lo, 2012), are still twice as likely as UK men to cook and shop for food (FSA, 2017), and showing an interest in food may act as a display of femininity (Rodney et al., 2017). The greater body image pressure on females may also be a contributing factor. Barbecuing seems to be the most traditionally male cooking activity in some Western cultures (e.g. Molina, 2014). A male preference for spicy food has been found in some previous studies (Byrnes & Hayes, 2015; Spinelli et al., 2018), although no association between curries and masculinity has been reported. The masculine curry association may be specifically British.
Declaring an alcohol preference may be a sociability signal in Twitter profiles. Gender differences in alcohol preferences are unsurprising since alcohol marketing is highly gendered (Atkinson et al., 2017). An early study of US students suggested that males were more likely to declare alcohol consumption online (Peluchette & Karl, 2008).
Biographies often described one or more hobbies or non-work interests, whether to characterise the user's personality or to announce likely Twitter topics to potential followers. Several groups of interests only have terms from one gender, but this is partly a side-effect of the narrow categories used. There are too many interests to comment on all, but several align with known differences: a focus on crafts is a traditionally feminine trait because of its association with the home; females read for pleasure more (DoE, 2012); and women are more likely to be key decision makers for holidays (Mottiar & Quinn, 2004). In contrast, males have a greater interest in “things” (Su & Rounds, 2015), including machines, and gamble more (Gambling Commission, 2019).
The nonbinary interest in games and roleplay (including the combination) does not seem to have been noted before, although nonbinary issues have been discussed in the context of gaming (Jaroszewski et al., 2018). Roleplay may be a natural interest for nonbinary people, since they presumably lived the early part of their lives in a gender role that did not fit them. Cosplay (dressing up as a character) is a form of roleplay that supports both femininity and gender fluidity (Nichols, 2019).
Jobs reflect interests to some extent, although gender differences in job choices have been analysed extensively elsewhere (Cortes & Pan, 2018) and so are not discussed here. The results (see Appendix) confirm systematic employment gender differences, such as female domination of health-related careers (ONS, 2018).
Male terms dominate the broad physical activity category, although the sport-related terms may refer mainly to spectator sports. Competitiveness is a male trait (e.g. Buser, Niederle, & Oosterbeek, 2014). The male habit of declaring sporting team affiliations aligns with a far greater interest in watching competitive sport among UK males (Abraham, 2020), but contrasts with German student social network users, among whom women were more likely to declare group affiliations online, albeit social network groups rather than sporting groups (Haferkamp et al., 2012). Sports teams may also serve as identity markers for males. All the non-competitive physical activity words in the list are female. Fishing, swimming and cycling in the sports list can also be non-competitive physical activities, and two of these are male.
The results have many limitations. Gender identities are probably expressed differently on other social web sites and in other countries. Moreover, the differences found here for the UK may change over time. The first name heuristic used to identify gender will not work for people with names from minority cultures that are not in the list, or for people using fake or shortened names. The results depend on the variety of words available to express a meaning (e.g. soccer, football, footy), the specificity of the terms used (e.g. individual football team names vs. the sport name) and whether key terms have substantial polysemy. Thus, despite the systematic method used to identify gender differences, there may be gaps in the results. Additional gaps are also likely for nonbinary users because the small nonbinary sample size reduces the statistical power of the word association test used. The results are also affected by whether there are gendered versions of relevant words that would not normally be used by at least one gender. Thus, nongendered words (e.g. parent) are more powerful indicators of underlying gender differences in a theme than gendered words (e.g. mum).
At a deeper level, word frequency differences may reflect underlying gender differences in interests, presentation styles or linguistic choice of word to represent a concept. The most likely causes of false positives (spurious gender differences) may be imitation of bio styles between users or multiple profiles for the same user.
The word comparison approach has revealed, for the first time, that there are many fine-grained gender differences in the Twitter profiles of male, female and nonbinary UK users. These differences were clustered into 33 distinct themes. In the case of males and females, most of the themes align with prior research and confirm that these traits transfer online. Other themes, such as the prominent reporting of qualifications by females, do not seem to have been noticed before in any context and could be an online-only gender expression or a previously unnoticed offline gender expression. For nonbinary Twitter users, the results are the first systematic evidence of gender expression differences. These show that, despite the many nonbinary gender identities, there may be some nonbinary gender expression commonalities compared to males and females. The results also serve as a list of fine-grained male, female and nonbinary gender expression differences related to interests and online presentations of the self. Of course, the lists are statistical associations and not in any sense prescriptive or ideal characteristics.
The information about gender-associated aspects of self-presentation on the social web may be useful for those who need to communicate online or analyse online communication. For example, the lists may help organisations identify messages with unintended gender bias, such as messages addressing interests associated with one gender. They may also help to generate gender-targeted messages when these are needed.