It's going to be okay: Measuring Access to Support in Online Communities
Zijian Wang, David Jurgens

Introduction

People use online platforms to seek out support for their informational and emotional needs. A key step in interacting with others is choosing a name to represent oneself. Here, we ask: what effect does revealing one's gender have on receiving support? Are people who choose gender-indicative names more likely to receive support than those with relatively gender-anonymous names (e.g., user23901 or pizzamagic)? And when disparagement is given, who is more likely to receive it?


To answer these research questions, we create (i) a new dataset and method for identifying supportive replies and (ii) new methods for inferring gender from its performance in text and names. We apply these methods to create a massive new corpus of 102M online interactions with gender-labeled users, each rated by degree of supportiveness. Our analysis shows widespread and consistent disparity in support: identifying as a woman is associated with higher rates of support, but also higher rates of disparagement.


As a part of the paper, we are releasing our annotated dataset for social support, all code and materials used to construct the classifiers for support, a pre-trained version of the gender performance model used in the paper, and, upon request, a massive new dataset of Reddit post-reply comment pairs labeled by gender.


Getting started (Code and Models)

  • All the code and resources for the support classifier are available on GitHub.
  • A pre-trained version of the gender performance classifier is available via pip, and its code is on GitHub.

Data for download

1.   Crowdsourced annotated social support
 ·  Aggregated support annotations (3.4MB)


This data contains aggregated support ratings for the 9,032 instances of social media data from Reddit, StackExchange sites, and Wikipedia. If you want to train a model, you probably want this file.


2.   Crowdsourced annotated social support
 ·  Raw crowdsourced annotations


This data contains the 9,032 instances of social media data from Reddit, StackExchange sites, and Wikipedia, rated for supportiveness. This file is only the raw annotator ratings.


3.   Gender-labeled Reddit Conversations
 · Reddit comment IDs (42MB)


This data contains the Reddit comment IDs for 102M pairs of interactions where at least one participant was labeled by the classifier with either a high-confidence gender prediction (p > 0.9 or p < 0.1) or a low-confidence (0.45 < p < 0.55) prediction. Out of respect for the users, we are only making this data available for researchers at non-profit institutions who fill out the following form.
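The confidence bands described above can be sketched as a small filter. This is an illustrative sketch only (the function and field names are ours, not from the released code): a prediction counts if it is high-confidence (p > 0.9 or p < 0.1) or deliberately low-confidence (0.45 < p < 0.55), and a post-reply pair is kept when at least one participant's prediction falls in either band.

```python
# Hypothetical sketch of the confidence-based filtering described above.
# Names here (confidence_band, keep_pair) are illustrative, not from the
# released codebase.

def confidence_band(p):
    """Map a gender-prediction probability to the band used for the corpus."""
    if p > 0.9 or p < 0.1:
        return "high"          # high-confidence prediction
    if 0.45 < p < 0.55:
        return "low"           # deliberately low-confidence prediction
    return None                # excluded from the released pairs

def keep_pair(p_author, p_replier):
    """Retain a post-reply pair if at least one participant falls in a band."""
    return any(confidence_band(p) is not None for p in (p_author, p_replier))
```

For example, a pair where the author scores 0.95 is kept regardless of the replier's score, while a pair with scores of 0.7 and 0.3 is excluded, since neither falls in a band.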


Citing the paper, data, or classifier

It's going to be okay: Measuring Access to Support in Online Communities. Zijian Wang and David Jurgens. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). 2018.

@inproceedings{wang2018its,
  title={It's going to be okay: Measuring Access to Support in Online Communities},
  author={Wang, Zijian and Jurgens, David},
  booktitle={Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year={2018}
}

Highlights

1.   Gender performance displays matter

Users can choose relatively gender-anonymous names when they create their accounts, e.g., user1234 or pizzamagic. These names provide some degree of anonymity, yet this choice may affect how other people interact with them. We find that, indeed, other individuals treat gender-anonymous users differently than users whose names signal a gender. Most notably, gender-anonymous accounts tend to receive fewer supportive replies than accounts that signal female in their name and writing. This result suggests that, in terms of receipt of support, the assumed default gender online is not male.


2.   Individuals performing female in name and writing receive more support, but also more disparagement!

Looking at how male- and female-performing users differ in their receipt of support, we find that accounts with female performances receive substantially more support than accounts with male performances—even when controlling for the subreddit in which those accounts converse. However, accounts with female performances are also subjected to more unsupportive, disparaging comments.


3.   The difference in support isn't explained by the gender performance of the replier!

Could the difference in receipt of support be due to users who perform the same gender having some behavioral preference (e.g., female-performing users being more likely to support other female-performing users)? No: we find that the gender performance of the commenter explains little of this variance. While there are some gender interactions, we did not observe any consistent trends. We do believe, however, that more research in this area could be fruitful.


Bugs/Issues/Discussion

GitHub: The support classifier code is available on GitHub. For bug reports and patches on the code, or for any issues you run into with the data, please file a GitHub issue. We also welcome pull requests for new features or to make the pipeline work with other kinds of data.

David Jurgens

Site design courtesy of Will Hamilton via Jason Chuang via Jeffrey Pennington