People use online platforms to seek out support for their informational and
emotional needs. A key step in interacting with others is choosing a name to
represent one's self. Here, we ask: what effect does revealing one's gender
have on receiving support? Are people who choose gender-indicative names more
likely to receive support than those with relatively gender-anonymous names
(e.g., user23901 or pizzamagic)? And when disparagement is
given, who is more likely to receive it?
To answer this research question, we create (i) a new dataset and
method for identifying supportive replies and (ii) new methods for inferring
gender from how it is performed in a user's text and name. We apply these methods to create a new massive
corpus of 102M online interactions with gender-labeled users, each rated by
degree of supportiveness. Our analysis shows widespread and consistent
disparity in support: identifying as a woman is associated with higher rates
of support, but also higher rates of disparagement.
As a part of the paper, we are releasing our annotated dataset for social
support, all code and materials used to construct the classifiers for support,
a pre-trained version of the gender performance model used in the paper, and, upon
request, a massive new dataset of Reddit post-reply comment pairs labeled by
gender.
Getting started (Code and Models)
- All the code and resources for the support classifier are available on GitHub.
- A pre-trained version of the gender performance classifier is available on pip, and its code is on GitHub.
1. Crowdsourced annotated social support
· Aggregated support annotations (3.4MB)
This data contains aggregated support ratings for the 9,032 instances of
social media data from Reddit, StackExchange sites, and Wikipedia. If you
want to train a model, you probably want this file.
2. Raw crowdsourced social support annotations
· Raw crowdsourced annotations
This data contains the 9,032 instances of social media data from Reddit,
StackExchange sites, and Wikipedia, rated for supportiveness. This file
contains only the raw per-annotator ratings.
3. Gender-labeled Reddit Conversations
· Reddit comment IDs (42MB)
This data contains the Reddit comment IDs for 102M pairs of interactions
where at least one participant was labeled by the classifier with either a
high-confidence gender prediction (p > 0.9 or p < 0.1) or a
low-confidence (0.45 < p < 0.55) prediction. Out of respect for the
users, we are only making this data available for researchers at non-profit
institutions who fill out the following form.
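The confidence bands described above translate directly into a filter over classifier probabilities. As a minimal illustrative sketch (the record format and probabilities here are hypothetical, not the released data layout):

```python
def confidence_bucket(p):
    """Bucket a classifier probability p for one gender class.

    Per the dataset description: high-confidence gender predictions have
    p > 0.9 or p < 0.1; low-confidence (effectively gender-anonymous)
    predictions fall in 0.45 < p < 0.55. Anything else is excluded.
    """
    if p > 0.9 or p < 0.1:
        return "high_confidence"
    if 0.45 < p < 0.55:
        return "low_confidence"
    return None  # excluded from the released pairs

# Hypothetical (comment_id, predicted probability) records
records = [("c1", 0.95), ("c2", 0.50), ("c3", 0.70), ("c4", 0.05)]
kept = {cid: confidence_bucket(p) for cid, p in records
        if confidence_bucket(p) is not None}
# "c3" (p = 0.70) falls in neither band and is dropped
```

Note that both very high and very low probabilities count as high-confidence, since each end of the scale is a confident prediction for one of the two classes.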
Citing the paper, data, or classifier
@inproceedings{wang2018its,
title={It's going to be okay: Measuring Access to Support in Online Communities},
author={Wang, Zijian and Jurgens, David},
booktitle={Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)},
year={2018}
}
1. Gender performance displays matter
Users can choose relatively gender-anonymous names when they create their
accounts, e.g., user1234 or pizzamagic. These names
provide some degree of anonymity, yet this choice may affect how other
people interact with them. We find that, indeed, other individuals treat
gender-anonymous users differently than those users whose names signal some
gender. Most notably, gender-anonymous accounts tend to receive fewer
supportive replies than accounts that signal female in their name and
writing. This result suggests that, in terms of receipt of support, the
assumed default gender online is not male.
2. Individuals performing female in name and writing receive more support, but also more disparagement!
Looking at how male- and female-performing users differ in their receipt of
support, we find that accounts with female performances receive
substantially more support than accounts with male performances—even
when controlling for the subreddit in which those accounts converse.
However, accounts with female performances are also subjected to more
unsupportive, disparaging comments.
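One simple way to read "controlling for the subreddit" is a stratified comparison: measure the support-rate gap between gender performances within each subreddit, then average those gaps, so that a subreddit's overall supportiveness cannot confound the comparison. A toy pure-Python sketch with made-up numbers, not the paper's actual method or data:

```python
from collections import defaultdict

# Hypothetical (subreddit, gender_performance, was_supported) observations;
# the values below are illustrative only, not results from the paper.
observations = [
    ("r/AskReddit", "female", 1), ("r/AskReddit", "female", 1),
    ("r/AskReddit", "male", 1),   ("r/AskReddit", "male", 0),
    ("r/science",   "female", 1), ("r/science",   "female", 0),
    ("r/science",   "male", 0),   ("r/science",   "male", 0),
]

def within_subreddit_gap(obs):
    """Average female-minus-male support-rate gap across subreddits."""
    counts = defaultdict(lambda: [0, 0])  # (subreddit, gender) -> [supported, total]
    for sub, gender, supported in obs:
        counts[(sub, gender)][0] += supported
        counts[(sub, gender)][1] += 1
    rates = {key: s / n for key, (s, n) in counts.items()}
    subs = {sub for sub, _ in rates}
    # Compare genders within each subreddit, then average the gaps.
    gaps = [rates[(s, "female")] - rates[(s, "male")] for s in subs]
    return sum(gaps) / len(gaps)

gap = within_subreddit_gap(observations)  # 0.5 with these toy numbers
```

A regression with subreddit fixed effects accomplishes the same goal more flexibly; the stratified version above is just the most transparent form of the idea.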
3. The difference in support isn't explained by the gender performance of the replier!
Could the difference in receipt of support be due to users who perform the
same gender having some behavioral preference (e.g., female-performing users
being more likely to be supportive to other female-performing users)? No;
in fact, we don't observe that the gender performance of the commenter
explains much of this variance. While there are some gender interactions,
we did not observe any consistent trends. However, we believe this could be
a fruitful area for further research.
GitHub: The support classifier code is available on
GitHub. For bug reports and patches on the code, or for any issues you
might run into with the data, please file a GitHub issue. We also
welcome any pull requests for new features or to make the pipeline work
with other kinds of data.