The Bringing together Language and Behavior for Large-scale Analytical Breakthroughs Laboratory (Blablablab)

Why do we say the things we do and what does our language say about us in return? The Blablablab develops new scientific methods to answer socially-impactful questions about human behavior from what we observe people say and do. Our work broadly intersects with computational social science, natural language processing, computational sociolinguistics, data mining, and social computing. Along the way, we release new software tools and datasets for the community to help reveal the social dimensions of language, and, with a bit of luck, make our online world a better place.

The Blablablab is a part of both the University of Michigan School of Information (UMSI) and the Department of Computer Science (CSE). We widely collaborate on interdisciplinary projects with researchers and students in the Departments of Psychology and Linguistics, the School of Public Health, the Center for the Study of Complex Systems, and the Michigan Institute for Data Science. But mostly, we work with amazing people doing socially-impactful work.

It's the Blablablab (in winter 2024)!

Some Blablablabers: (left-most) Bowen Yi, Neele Falk, Hong Chen, Hua Shen, Ben Litterer, Mingqian Zheng, Lechen Zhang, Haotian Zhang, David Jurgens, Michael Jiang, Sushrita Rakshit, Bangzhao Shu, Jiaxin Pei.

UMSI

News

  • August 2024: New PhD student Eleanor Lin joins! Welcome 👋 .
  • July 2024: Five talks/posters at IC2S2 this year on a range of things from podcasts to pandemics. Come say hello to us in Philly!
  • June 2024: In a new survey paper, we analyze the state of Human-AI alignment and argue that this is best viewed as a bidirectional alignment—humans align their values with AI too! This was a great collaboration with a big team of folks from Stanford, CMU, and Google.
  • May 2024: Lots of new folks coming to the Blablablab. Postdocs Shivani Kumar and Aparup Khatua both join and we're fortunate to have four new visitors, PhD student Anders Giovanni Møller (ITU Copenhagen), postdoc Dustin Wright (U Copenhagen), and undergraduates Jonathan Ivey (Arkansas) and Jiayu Liu (UIUC)! Welcome 👋 .
  • May 2024: New papers at WWW and ICWSM that extend our work on global news to study synchrony across Europe during the pandemic and a new multilingual dataset analyzing the framing of these news articles.
  • April 2024: NLP models are often conceptualized separately from the social environment in which they operate. In a new preprint with Diyi Yang, Dirk Hovy, and Barbara Plank, we argue that NLP as a field needs to directly incorporate social awareness into its models both for understanding and for considering the implications of the models.
  • April 2024: Visiting student Anna Wegmann joins! Welcome 👋 .
  • March 2024: Three papers accepted to NAACL, on estimating LLM personas via psychometrics (short story: they're not meaningful), on memes, and on empathetic alignment. Two of these wrap up efforts by Blablablab alumni Naitian Zhou and Jiamin Yang. The third is a first paper by master's student co-first authors Bangzhao Shu and Lechen Zhang. Congrats all!
  • January 2024: Visiting student Neele Falk joins! Welcome 👋.
  • December 2023: Lots of new preprints out on why most LLMs don't actually have personalities, how LLMs answer differently depending on who you ask them to be, how LLMs' answers on subjective tasks are more correlated with certain groups of people, and memes, so many memes (with sociolinguistics!).
  • November 2023: New work by Minje looking at how people react and behave when revealing new aspects of themselves in social networks; and new work with a collaboration of folks at Williams College and AI2 showing that causal inference with text is hard but we can now evaluate it better.
  • October 2023: Two papers accepted to EMNLP/Findings on assessing how well LLMs understand social knowledge (spoiler: most do not do well!) and quantifying what happens when the news' collective attention gets focused on one event: a media storm! Congrats to Minje, Jiaxin, Sagar, and Ben!
  • September 2023: New folks join the Blablablab: the fantastic incoming PhD student Nancy Xu and the amazing Hua Shen as a postdoc. Looking forward to seeing your research dreams come true!
  • August 2023: Dr. Minje Choi has graduated! The first Blablablaber is off to see the world, with a first stop as a postdoc at GaTech. Congratulations Minje!!!
  • July 2023: Our paper at the Social Influence in Conversations Workshop won Best Paper!! This paper was the result of a whole-lab effort during a two-day "Research Jam" to be creative and have fun researching together as a group—what a great result! Details on the Research Jam and paper are coming soon too!
  • June 2023: Our paper on the role of multilinguals in bridging communication won Best Methodology Paper at ICWSM 2023! Congrats to all!
  • June 2023: Lots of new papers: ACL paper on why you shouldn't say "I love you" to your boss (the contextual appropriateness of messages), LAW paper on how annotator demographics influence different judgments, SICon paper looking at how social influence manifests in style change, and a SemEval task paper on intimacy.
  • May 2023: ICWSM 2023 papers are now live showing how people reach out to different social ties during shocks and the role of multilinguals in bridging communication. Exciting!
  • March 2023: Amazing news that Lavinia has won an NSF Graduate Research Program Fellowship! So proud and looking forward to seeing your research vision come to life! Thanks NSF!
  • March 2023: David gets a grant from the Center for Research on Learning and Teaching (CRLT) for the Improvement of Teaching to work on making his Information Retrieval class more ethical and more technical! Shout out to Safiya Umoja Noble for insight into thorny issues in IR and Nicki Washington for the 3C Fellows and motivation to put ideas to practice—and CRLT for the funding!
  • March 2023: New grant from DSO to study language dialects and culture! Super excited to make more crazy cool culture maps. Thanks DSO!
  • February 2023: David is off to the University of Stuttgart for a talk and meeting lots of great folks.
  • January 2023: Crunch time for the lab's collaboration with Snap Inc Research on a SemEval task on Intimacy in different languages! Get those submissions in.
  • November 2022: David gives a keynote at the Sharing Stories, Lessons Learned workshop on doing interdisciplinary research. This keynote was focused on story-time and used Dall-e to generate storybook image slides.
  • November 2022: Three papers at EMNLP looking at science journalism in the news, the state of empathy research in NLP, and a brand new annotation tool (Potato). Excited to see these all out!
  • October 2022: Excited to start a new project on understanding linguistic style with folks from USC, U Maryland, and U Birmingham (UK) as a part of the IARPA HIATUS program! Lots of great computational sociolinguistics work to come.
  • September 2022: New grant from LG AI to study how chatbots can combine our knowledge of themselves with real-world knowledge to communicate better. Thanks LG AI!
  • August 2022: So many new Blablablabers this year! A warm welcome to Kenan Alkiek, Hong Chen, Lavinia Dunagan, Ben Litterer, Agrima Seth, and Jason Yan! Wow, so much exciting work to come this year.
  • July 2022: Five presentations by Blablablab folks at IC2S2 this year (on many different topics) with great lab attendance in person (wow!). Feel free to say hi if you're around!!
  • July 2022: Christina Lu's amazing work on trans-exclusionary radical feminists (TERFs) gets presented at the Workshop on Online Abuse and Harms (WOAH) at NAACL!!
  • March 2022: David got awarded an NSF CAREER grant to look at prosocial behavior and hopefully make the world a better place through some crazy RCTs. Thanks NSF!!
  • February 2022: Blablablabbers Kenan Alkiek and Bohan Zhang celebrate their acceptance to ACL Findings with a new paper looking at political affiliation in Reddit.
  • December 2021: Naitian Zhou and Xingyao Wang have both received an honorable mention by the CRA for the Outstanding Undergraduate Researcher Award. The Blablablab could not be prouder of these two! Naitian and Xingyao have continued the tradition of award-winning Blablablab-ers, joining Sky Wang and Sayan Ghosh, who received honorable mentions in previous years.
  • November 2021: Sayan Ghosh wins best paper at W-NUT workshop for his work on identifying cultural biases in toxicity models. Great work, Sayan!
  • November 2021: New NIH R01 grant with folks from the School of Public Health and Michigan Medicine looking at how to understand reports from completed suicides across all life stages to identify new risk factors and preventative opportunities! Thanks NIH!
  • September 2021: Kicking off a new NSF Convergence grant to look at how to better reach consensus when online platforms flag something for removal (and have everyone agree the process is fair). Looking forward to working with folks from UW and MIT. Thanks NSF!
  • September 2021: New paper at W-NUT with Sayan Ghosh and Google collaborators Dylan Baker and Vinod Prabhakaran where we show how to uncover geographic biases in pretrained toxicity models—and show, unfortunately, that common sense approaches to fixing the biases in a model don't actually change much.
  • September 2021: Welcome to new PhD student Leopele Raabe, co-advised with Misha Teplitskiy!
  • August 2021: Five papers at EMNLP this year! Congrats to Blablablabers Sky Wang, Jian Zhu, Xingyao Wang, and Jiaxin Pei. More details to come!
  • July 2021: Three talks at IC2S2 this year: one by Jiaxin Pei on his work on intimacy, plus talks on our immigration framing work and upcoming work on bilinguals. Looking forward to seeing all the great IC2S2 keynotes and talks.
  • June 2021: The Blablablab welcomes three REU students this summer: Athena Aghighi, Michael Geraci, and Jackson Sargent! Welcome to summer research!
  • May 2021: Congrats to MSI student Kenan Alkiek for winning the Theresa Noel Urban Blaurock Research Award for his outstanding work. Well-deserved recognition!
  • April 2021: The lab is recruiting two students for NSF funded REU positions this summer. Please see the REU page for details and how to apply!
  • April 2021: Two students from the lab were awarded NSF GRFP fellowships this year: Sky Wang, who has worked on multiple research projects, and Zhizhuo Zhou, who did amazing work on the Alexa Prize team. Fantastic news and congrats to both!
  • March 2021: More good news! UMSI PhD student Minje Choi has his paper on social relationships accepted to ICWSM. Minje's work shows that different types of relationships have strong behavioral differences on Twitter, that these can be predicted, and that the nature of the relationship aids in predicting information diffusion. A great action-packed paper!
  • March 2021: Two papers accepted at NAACL! The first is on computational sociolinguistics with Linguistics PhD student Jian Zhu, showing how the structure of online communities modulates the rates at which they adopt new terms. The second is in computational social science (and political communications!) with UMSI PhD student Julia Mendelsohn, looking at how discussions of immigrants on social media are framed and the impact that has on audience engagement. Congrats to both!
  • March 2021: David gave a talk at GESIS on some of our research on the framing of marginalized/politicized people! One of the silver linings of pandemic times is getting to easily connect to colleagues in Europe (who had great questions)!
  • January 2021: Our paper on prosocial conversations was accepted at the Web Conference (WWW) based on work from U-M undergrads Jiajun Bao and Yiming Zhang and summer visitor Junjie Wu, in collaboration with (now-professor!) Eshwar Chandrasekharan. We look at different dimensions of what can go right in a conversation and show that the prosocial direction of a conversation is actually predictable from its onset. Congrats to all!
  • December 2020: David gives the keynote at the PEOPLES workshop at COLING. What a wonderful group of folks and many interesting conversations and questions. Thank you Malvina, Viviana, and Barbara for the invitation!
  • December 2020: Sky Wang receives an honorable mention by the CRA for the Outstanding Undergraduate Researcher Award. Amazing work, Sky! Blablablab has been incredibly fortunate to have so many fantastic undergraduates and Sky joins Sayan Ghosh, who received an honorable mention last year.
  • September 2020: Two long papers accepted to EMNLP this year: First-year PhD student Jiaxin Pei's work on quantifying intimacy in language (with lots of cool Social Psych) and junior Naitian Zhou's work on condolence and empathy in online communities (work done as a sophomore!). Congrats to both and more details, data, and models to come soon!
  • July 2020: Whoa, our paper on identifying Russian trolls on Twitter was Best Paper Runner Up at WebSci! Congrats to all the co-authors!
  • July 2020: David gave a talk at the AKBC workshop on NLP for Scientific Texts (SciNLP) on bias in which authors are mentioned in the news stories on their published papers, based on work with Hao Peng and Misha Teplitskiy. These informal citations matter and add up to who we think of as a scientist. You can check out all the cool talks here too.
  • July 2020: The NSF has graciously awarded David and co-PI Daniel Romero an NSF grant to study the communicative and behavioral dynamics of social relationships. Thanks NSF for your support!!
  • June 2020: Summer is here!! 😎🌞⛱️ (...well, beginning after the EMNLP submission deadline) Welcome to Christina Lu and Kenan Alkiek who are joining us for the summer.
  • May 2020: Congrats to all the graduating seniors this year: Jiajun Bao (→CMU LTI, MS), Justin Chen (→GaTech, MS), Shengyu Feng, Thomas Horak, Sam Lee, Wenhao Li (→UNC PhD), Junjie Wu (→HKUST PhD), Yiming Zhang, and Zach Zipper (→U-M, MS)!
  • April 2020: Wowza, seven IC2S2 abstracts from the lab made it in! Time to get those talks and posters ready. Looking forward to seeing how the virtual conference turns out and excited we can share videos of this work in the future! ✨
  • April 2020: Congrats to Blablablab collaborators Jane Im, Eshwar Chandrasekharan, Jackson Sargent, Paige Lighthammer, Taylor Denby, and Ankit Bhargava on getting our paper on detecting Russian trolls on Twitter accepted to WebSci 2020! This is the first paper for Jackson, Paige, Taylor, and Ankit and hopefully the start of a great journey.
  • February 2020: Congrats to Blablablab collaborator Ashwin for being awarded a Facebook Fellowship for Computational Social Science! Looking forward to seeing all the great things you'll do.
  • January 2020: Congrats to Minje for getting a paper accepted to the Web Conference based on his summer internship at NOKIA Bell Labs! Great work!
  • December 2019: Sayan was selected for an Honorable Mention by the CRA Outstanding Undergraduate Research awards! Amazing work and he made it in the CSE News — so famous 🤩 !
  • November 2019: Blablablab collaborator Yulia Tsvetkov (CMU) presents our paper on microaggressions at EMNLP! A tough but important classification problem for reducing incivility online. The paper reads pretty well for being written by academics too.
  • November 2019: Our paper at SocInfo on cross-cultural norms for social roles was nominated for best paper!
  • October 2019: How can we make beautiful maps like the one below that show regional variation in language?
    Map of the word 'but' in the US
    We have the answer for you! David is at NWAV presenting a tutorial on this with Jack Grieve. Check out the code and materials here, or just peek at the html-exported Jupyter notebook online! Happy map making!
  • September 2019: A new fall season brings new PhD students. Welcome Aparna and Jiaxin!! 👋
  • July-August 2019: Blablablabbers Zijian and Sayan present the group's research at IC2S2 and ACL respectively. Look at that science delivery—such grace, such poise!
  • July 2019: What attributes do people ascribe to social roles and what those roles do? New paper on cross-cultural norms for social roles examines this question and was accepted at Social Informatics with CSE collaborators MeiXing Dong, Carmen Banea, and Rada Mihalcea! Congrats to MeiXing on her first conference paper!
  • June 2019: New paper out with collaborators from the School of Public Health using NLP to show systematic underrecognition of suicide among people transitioning to or living in long-term care facilities like nursing homes (serious stuff!). Individuals' loss of identity and agency in this setting can have this profound and negative outcome and needs to be better recognized.
  • June 2019: Ashwin, Ram, and David win Best Paper at ICWSM for their work on measuring attitudes about caste discrimination through intercaste marriage!! Incredibly proud of this work!
  • June 2019: Wowza! Our UMich team was one of the 10 teams selected for the Amazon Alexa Prize Socialbot Grand Challenge 3! Way to go team! Can't wait to introduce the world to our super social Audrey!
  • May 2019: The poster for our WebConf paper on demographic inference for more accurate surveys won the Best Poster Presentation award! Check out the poster here--be careful of the sea monsters!
  • May 2019: Two papers accepted at ACL 2019! Congrats to co-authors MSI student Innocent Ndubuisi-Obi and CSE undergrad Sayan Ghosh for our work on understanding English-Naija code-switching and to co-authors UMSI Faculty Libby Hemphill and visiting student Eshwar Chandrasekharan for our position paper arguing what should be the next steps for the NLP community in tackling abusive behavior.
  • April 2019: The NSF has graciously awarded David a CRII grant to study the language of social relationships. Thanks NSF!
  • March 2019: Blablablab celebrates an ICWSM acceptance on a paper about caste discrimination with collaborator Ashwin!
  • February 2019: David travels to UCLA to talk with their Computational Sociology group. Great visit and wonderful folks doing amazing research there!
  • January, 2019: Blablablab celebrates even more as two Web Conference long papers are accepted! Congrats to now-alumnus Zijian for being first author on one!
  • January, 2019: Blablablab celebrates the new year in style... and then furiously sprints to get those ICWSM papers in! Great job everyone!
  • December, 2018: You can't spell breakthrough without break, so the group gets some much-needed rest over the holiday season.
  • November, 2018: Just in time for the election, Jane showed that Russian trolls are still active on Twitter and trying to interact with major news reporters. Timely stuff!
  • November, 2018: Jane presents her work on Wikipedia conflict resolution at CSCW and David and Zijian meet up in Brussels to talk about access to support in online communities. The lab hopes that Jane will bring back Montreal-style bagels too.
  • October, 2018: David is off to NWAV47 to talk about Computational Sociolinguistics! He came back with a mountain of Montreal-style Bagels and a new appreciation for the Northern Cities Vowel Shift.
  • September, 2018: New PhD rotating students Jane Im and Minje Choi arrive at UMSI! Welcome Jane and Minje!
  • August, 2018: New EMNLP paper on supportive/unsupportive language accepted with undergraduate first author Zijian Wang! In online conversations, users who indicate they are women really do receive more unsupportive replies--yet they also receive more supportive replies. Lots of interesting follow-up questions on gendered interactions online #FoodForThought
  • August, 2018: Visiting students Akshita Jha, Qi Sun, and Nan Gu depart physically but remain with us in spirit and co-authorship. Wonderful having you here with us this summer!

People

Professors
Postdocs

MIDAS

School of Information
 

Aparup Khatua
School of Information

School of Information
 
PhD Students

School of Information
   
🏆Theresa Noel Urban Blaurock Research Award

School of Information

Junghwan Kim
Computer Science

Computer Science
 

School of Information
 

School of Information
Masters Students

Bangzhao Shu
MS Information
→ PhD @ Northeastern University

Lechen Zhang
MS Information

Mingqian Zheng
MS Survey and Data Science
→ PhD @ CMU LTI
Undergraduates

Michael Jiang
BS Computer Science

Yijun Pan
BS Computer Science

Rohan Raju
BS Computer Science

Sushrita Rakshit
BS Computer Science

Daniel Tian
BS Computer Science

Omkar Yadev
BS Computer Science

Bowen Yi
BS Computer Science

Haotian Zhang
BS Data Science
External Student/Postdoc Collaborators

Postdoc at GESIS

PhD @ UMSI → Postdoc at GaTech
 

Postdoc at University of Stuttgart

PhD student at IT University of Copenhagen
   

Jonathan Ivey
BS Computer Science @ U Arkansas

Jiayu Liu
BS Statistics @ UIUC

PhD @ UMSI → Faculty at NUS (gap year as Postdoc at Stanford)
   

PhD student at the KU Leuven

PhD student at Technical University of Munich

PhD student at Utrecht University

Postdoc at University of Copenhagen

Alumni

(In reverse order of graduation with approximate date of graduation/collaboration)

School of Information
2027
→ PhD in Social Work
   

School of Information
2025
→ Postdoc @ University of Washington
 

Irena Yi
BS Computer Science
2025

Jenny Lee
BS Computer Science
2024

School of Information
2023

🏆NSF Graduate Research Program Fellow

Catherine Huang
BS Computer Science
2023
→ Scale AI

School of Information
2023
🏆National Intelligence University Professor of Strategic Intelligence Candidate

School of Information
2023
   
🏆2022 Snap Fellowship Honorable Mention

Huaman Sun
MS Computer Science
2023
→ PhD in Sociology @ University of Toronto

Jason Yan
School of Information
2023

Jia Zhu
BS Computer Science
2023
→ MS CS @ UMich

Haley Johnson
Information Science
2022

2022

2022
University of Mannheim; intern at AI2, co-advised with Kyle Lo and Arman Cohan → Postdoc at Bocconi University

2022
PhD @ UMSI (2022) → Postdoc @ Northwestern

Jackson Sargent
BS Computer Science
2022
 

Computer Science
2022
→ PhD @ UIUC
   
🏆CRA Outstanding Undergraduate Researcher, Honorable Mention

Jiamin Yang
Data Science
2022

Bohan Zhang
MS Computer Science
2022
Computer Science MS @ UMich → PhD at UMSI

Computer Science
2022
→ PhD @ University of California Berkeley
 
🏆CRA Outstanding Undergraduate Researcher, Honorable Mention
🏆NSF Graduate Research Program Fellow

Linguistics
2022
Linguistics PhD @ UMich → Faculty at the University of British Columbia

2021
Social Psychology PhD @ UMich → Google Research

MS Data Science
2021

Computer Science PhD @ UMich
2021

Lingyun Gao
MS Information
2021

BS Computer Science
2021
→ PhD @ Northwestern University

Sayan Ghosh
Computer Science
2021
→ PhD @ USC
🏆CRA Outstanding Undergraduate Researcher, Honorable Mention

Michelle Lee
BS Computer Science
2021

Xingyu Lu
Computer Science
2021

Talia Rizika
Cognitive Science
2021
→ Israeli Defense Force
🏆College Honors

Computer Science
2021
→ Columbia University, PhD
 
🏆NSF Graduate Research Program Fellow
🏆U-M CSE Outstanding Research Award
🏆CRA Outstanding Undergraduate Researcher, Honorable Mention

Zach Zipper
MS Computer Science
2021
→ Reservoir Labs

BS Computer Science
2020
→ CMU LTI, MS

MS Data Science
2020

BA in Computer Science and Sociology
2020
→ GaTech MS

Shengyu Feng
BS Computer Science
2020

Thomas Horak
BS Linguistics and Computer Science
2020
U-M → Office of Academic Innovation

Sam Lee
BS Computer Science
2020

Trevor Li
BS Computer Science
2020

Christina Lu
BS Computer Science, Dartmouth
2020

Adi Mannari
BS Computer Science
2020

Yiming Zhang
BS Computer Science
2020

2019
(Computer Science PhD @ Georgia Tech; visiting at UMSI → Faculty at UIUC)

Lingding Chen
BS Computer Science
2019

Rex Chen
BA Linguistics
2019
→ Amazon SDE

Sanjana Kolisetty
BS Computer Science
2019

Yaoyang Lin
BS Computer Science (with Honors)
2019
→ Harvard, MS in Data Science
🏆College Honors

2019
PhD @ UMSI (2022) → Faculty at UT Austin, Communications

🏆Facebook Fellowship

BS Computer Science
2019
→ PhD, Carnegie Mellon University
🏆CRA Outstanding Undergraduate Researcher, Honorable Mention

Innocent Ndubuisi-Obi
Master of Information Science
2018
→ University of Washington, PhD in Computer Science

BS Computer Science
2018
→ Stanford, MS in Symbolic Systems

Xinyi Wu
BS Computer Science
2018
→ Univ Washington, MA in Computational Linguistics

Visiting Students


Michael Geraci
BS Computer Science, University of Buffalo
2022

Athena Aghighi
BA Sociology, University of California, Davis
2021

Kevin Henner
MA Computational Linguistics, University of Washington
2019
→ Seasalt.ai, Senior NLP Engineer

BS Computer Science and Technology, Tsinghua University
2019
→ University of North Carolina PhD

BS Statistics, Sun Yat-sen University
2019
→ HKUST PhD
 

Nan Gu
BS Electrical Engineering @ Tsinghua University
2018

MS Computer Science @ IIIT
2018
→ Virginia Tech, PhD in CS

Qi Sun
BS Computer Science @ Peking University
2018

Shangming Zhao
BS Software Engineering @ Tsinghua University
2018

Projects

The Blablablab is involved in many intersecting directions under the broader umbrella of studying people and language. In all our work, we aim to advance both NLP methodology and our understanding of the social world. Below are a few of our current themes.

Social NLP

NLP is increasingly present in social settings and LLMs have created new opportunities to interact with these models not as tools but as conversation partners. This new interaction style brings NLP even more into the social realm. How do we design models that recognize the social components of interpersonal communication in different settings—and do this in an ethical and responsible way?

NLP and Psychology

Psychology provides a rich set of theories for understanding human behavior. How can we use these approaches to improve how NLP models also understand humans? Conversely, how can we use NLP models to better understand humans? A core thread of this research focus is how to help people behave in healthier, more prosocial ways.

Computational Sociolinguistics

Identities are complex and in any given situation, we may choose different language--or even pronunciation--to signal different aspects of who we are. A core project in the Blablablab is focused on the interplay between linguistic style and the construction of identity. Here, we work in the emerging field of Computational Sociolinguistics to help bring together theories, observational analyses, and large-scale models to advance the fields of Linguistics and NLP.

Information Ecosystems

The interconnected and rapid nature of news and social media means that people can get new information almost anywhere, anytime. How does this news spread and who does it reach, especially as it crosses social, linguistic, or medium boundaries? Our work studies whole ecosystems of how the language of information changes and the social process by which it emerges and evolves.

Publications

2024
The podcast ecosystem, colored by topic. Mapping the Podcast Ecosystem with the Structured Podcast Research Corpus.
Ben Litterer, David Jurgens, and Dallas Card.
preprint.
paper  ·  code (data)  ·  code (paper)  ·  data
Not all good Wikipedia articles stay good. Why is that? Read our paper to find out. A Test of Time: Predicting the Sustainable Success of Online Collaboration in Wikipedia.
Abraham Israeli, David Jurgens, and Daniel Romero.
preprint.
paper  ·  code and data
Optimizing the system and task parts of the prompt can have huge benefits. SPRIG: Improving Large Language Model Performance by System Prompt Optimization.
Lechen Zhang, Tolga Ergen, Lajanugen Logeswaran, Moontae Lee, and David Jurgens.
preprint.
paper  ·  code and data
The prompt matters in how human an LLM can seem. Real or Robotic? Assessing Whether LLMs Accurately Simulate Qualities of Human Responses in Dialogue.
Jonathan Ivey, Shivani Kumar, Jiayu Liu, Hua Shen, Sushrita Rakshit, Rohan Raju, Haotian Zhang, Aparna Ananthasubramaniam, Junghwan Kim, Bowen Yi, Dustin Wright, Abraham Israeli, Anders Giovanni Møller, Lechen Zhang, David Jurgens.
preprint.
paper  ·  code and data
Pathways of linguistic diffusion seen on Twitter. Networks and Identity Drive Geographic Properties of the Diffusion of Linguistic Innovation
Aparna Ananthasubramaniam, David Jurgens, Daniel M. Romero.
npj Complexity. 2024.
pdf
The pipeline for collecting data of traumatic events. The Language of Trauma: Modeling Traumatic Event Descriptions Across Domains with Explainable AI
Miriam Schirmer, Tobias Leemann, Gjergji Kasneci, Jürgen Pfeffer, and David Jurgens.
Findings of EMNLP. 2024.
pdf
Communities respond differently to the same message depending on their underlying values. ValueScope: Unveiling Implicit Norms and Values via Return Potential Model of Social Interactions
Chan Young Park, Shuyue Stella Li, Hayoung Jung, Svitlana Volkova, Tanushree Mitra, David Jurgens, and Yulia Tsvetkov.
Findings of EMNLP. 2024.
paper  ·  code and data
Tables are data too. Maybe they can be text as well! Tab2Text - A framework for deep learning with tabular data
Tong Lin*, Jason Yan*, David Jurgens, and Sabina Tomkins.
Findings of EMNLP. 2024.
preprint forthcoming
LLMs answer questions more or less accurately depending on the social roles in the question prompt. Is "A Helpful Assistant" the Best Role for Large Language Models? A Systematic Evaluation of Social Roles in System Prompts
Mingqian Zheng, Jiaxin Pei, Lajanugen Logeswaran, Moontae Lee, and David Jurgens.
Findings of EMNLP. 2024.
paper  ·  code and data
Human-AI Alignment is bidirectional. Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions.
Hua Shen, Tiffany Knearem, Reshmi Ghosh, Kenan Alkiek, Kundan Krishna, Yachuan Liu, Ziqiao Ma, Savvas Petridis, Yi-Hao Peng, Li Qiwei, Sushrita Rakshit, Chenglei Si, Yutong Xie, Jeffrey P. Bigham, Frank Bentley, Joyce Chai, Zachary Lipton, Qiaozhu Mei, Rada Mihalcea, Michael Terry, Diyi Yang, Meredith Ringel Morris, Paul Resnick, and David Jurgens.
preprint.
paper
Socially aware language technologies and their connections with linguistics, social sciences, and NLP. The Call for Socially Aware Language Technologies.
Diyi Yang, Dirk Hovy, David Jurgens, and Barbara Plank.
preprint.
paper
A Multilingual Similarity Dataset for News Article Frame.
Xi Chen, Mattia Samory, Scott Hale, David Jurgens, Przemyslaw A Grabowicz.
Proceedings of the International AAAI Conference on Web and Social Media (ICWSM).
paper  ·  data
Large language models are bad at answering psychological questionnaires consistently. You don't need a personality test to know these models are unreliable: Assessing the Reliability of Large Language Models on Psychometric Instruments
Bangzhao Shu*, Lechen Zhang*, Minje Choi, Lavinia Dunagan, Dallas Card, and David Jurgens.
Proceedings of the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics.
paper  ·  code and data
Memes are multimodal constructions where the base image template and additional text fills both have semantic value. Social Meme-ing: Measuring Linguistic Variation in Memes
Naitian Zhou, David Jurgens, and David Bamman.
Proceedings of the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics.
paper  ·  code and data
The empathetic alignment between an author and responder on Reddit shows most people just give advice. Modeling Empathetic Alignment in Conversation
Jiamin Yang and David Jurgens.
Proceedings of the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics.
paper  ·  code, models, and data  ·  Jiamin's amazing annotation tool
Strong influence connections in the global news network. Global News Synchrony During the Start of the COVID-19 Pandemic
Xi Chen, Scott A. Hale, David Jurgens, Mattia Samory, Ethan Zuckerman, Przemyslaw Adam Grabowicz.
Proceedings of the 2024 Web Conference.
paper  ·  code and data
The network model for estimating contextual informativeness. Finding Educationally Supportive Contexts for Vocabulary Learning with Attention-Based Models
Sungjin Nam, Kevyn Collins-Thompson, David Jurgens and Xin Tong.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING).
paper
Author mentions in science news reveal widespread disparities across name-inferred ethnicities.
Hao Peng, Misha Teplitskiy, David Jurgens.
Journal of Quantitative Social Sciences.
pdf (preprint)
Characteristics of and variation in suicide mortality related to retirement during the Great Recession: perspectives from the National Violent Death Reporting System.
Aparna Ananthasubramaniam, David Jurgens, Eskira Kahsay, and Briana Mezuk.
The Gerontologist gnae015. 2024.
pdf
2023
The answers of LLMs align with the perceptions of specific social groups. Aligning with Whom? Large Language Models Have Gender and Racial Biases in Subjective NLP Tasks
Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens.
preprint.
paper  ·  code and data
zero-shot LLM performance on social language tasks Do LLMs Understand Social Knowledge? Evaluating the Sociability of Large Language Models with SocKET Benchmark
Minje Choi,* Jiaxin Pei,* Sagar Kumar, Chang Shu and David Jurgens.
Proceedings of the Empirical Methods in Natural Language Processing (EMNLP). 2023.
paper  ·  code and data
Media storms over time with labels When it Rains, it Pours: Modeling Media Storms and the News Ecosystem
Ben Litterer, David Jurgens, and Dallas Card.
Proceedings of the Empirical Methods in Natural Language Processing (EMNLP). 2023.
paper  ·  data and code
The probability that, given an appropriate message for the relationships represented by a row, the message will also be appropriate in another relationship listed in the column. Probabilities are calculated across the entire data Your spouse needs professional help: Determining the Contextual Appropriateness of Messages through Modeling Social Relationships
David Jurgens,* Agrima Seth,* Jackson Sargent, Athena Aghighi, and Michael Geraci.
Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL). 2023.
paper  ·  code and data
Relative use of politeness strategies when annotators rewrite emails to be more polite When Do Annotator Demographics Matter? Measuring The Influence of Annotator Demographics with the POPQUORN Dataset
Jiaxin Pei and David Jurgens.
Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII) at ACL. 2023.
paper  ·  code and data
The causal-estimated effect of banning on users matching the style of others Exploring Linguistic Style Matching in Online Communities: The Role of Social Context and Conversation Dynamics
Aparna Ananthasubramaniam, Hong Chen, Jason Yan, Kenan Alkiek, Jiaxin Pei, Agrima Seth, Lavinia Dunagan, Minje Choi, Benjamin Litterer and David Jurgens.
(Best Paper)
Proceedings of the 1st Workshop on Social Influence in Conversations (SICon) at ACL. 2023.
paper  ·  code and data
Overall performance on each language. The box indicates the lower quartile to the upper quartile and the whisker indicates the maximum and the minimum. Outliers are shown as dots. Participants generally achieve better performances on languages in the training set and achieved good performance on Arabic and Dutch. Predicting intimacy in Hindi and Korean remains challenging. Moreover, performances on unseen languages generally have larger variances. SemEval 2023 Task 9: Multilingual Tweet Intimacy Analysis
Jiaxin Pei, Vítor Silva, Maarten Bos, Yozon Liu, Leonardo Neves, David Jurgens, and Francesco Barbieri.
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval).
paper  ·  data and competition
The effects of personal shocks on people's social media activities Analyzing the Engagement of Social Relationships During Life Event Shocks in Social Media
Minje Choi, David Jurgens, and Daniel Romero.
Proceedings of the International Conference on Web and Social Media (ICWSM). 2023.
paper  ·  code and data
The influence of multilingual individuals on social connectedness in Europe Bridging Nations: Quantifying the Role of Multilinguals in Communication on Social Media
Julia Mendelsohn, Sayan Ghosh, David Jurgens, and Ceren Budak.
(Best Methodology Paper)
Proceedings of the International Conference on Web and Social Media (ICWSM). 2023.
paper  ·  code and data
2022
Work Expectations, Depressive Symptoms, and Passive Suicidal Ideation Among Older Adults: Evidence From the Health and Retirement Study
Briana Mezuk, Linh Dang, David Jurgens, Jacqui Smith.
The Gerontologist 62 (10), 1454-1465 2022.
paper
The way the press portrays certain scientific results differs by where those results were described in the paper Modeling Information Change in Science Communication with Semantically Matched Paraphrases
Dustin Wright, Jiaxin Pei, David Jurgens, and Isabelle Augenstein.
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). 2022.
paper  ·  code and data
Not all empathy papers use empathy in the same way A Critical Reflection and Forward Perspective on Empathy and Natural Language Processing
Allison Claire Lahnala, Charles Welch, David Jurgens, and Lucie Flek.
Proceedings of the Findings of Empirical Methods in Natural Language Processing (EMNLP Findings). 2022.
paper
Potatoes are delicious POTATO: The Portable Text Annotation Tool
Jiaxin Pei, Aparna Kamakshi Ananthasubramaniam, Xingyao Wang, Naitian Zhou, Apostolos Dedeloudis, Jackson Sargent and David Jurgens.
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP): Systems Demonstrations. 2022.
paper  ·  code
Citation context sizes MultiCite: Modeling realistic citations requires moving beyond the single-sentence single-label setting
Anne Lauscher, Brandon Ko, Bailey Kuhl, Sophie Johnson, Arman Cohan, David Jurgens, Kyle Lo.
Proceedings of the 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). 2022.
paper  ·  code and data
The subtle language of exclusion: Identifying the Toxic Speech of Trans-exclusionary Radical Feminists
Christina Lu and David Jurgens.
Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH). 2022.
paper  ·  code and data
Correlations between the ways in which two news articles can be similar. SemEval-2022 Task 8: Multilingual news article similarity
Xi Chen, Ali Zeynali, Chico Camargo, Fabian Flöck, Devin Gaffney, Przemyslaw Grabowicz, Scott Hale, David Jurgens, and Mattia Samory.
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022). 2022.
paper  ·  data
The effect of curriculum ordering on word similarity tasks An Attention-Based Model for Predicting Contextual Informativeness and Curriculum Learning Applications
Sungjin Nam, David Jurgens, and Kevyn Collins-Thompson.
in submission. 2022.
pdf
The effects of mentorship Diversifying the Professoriate
Bas Hofstra, Daniel A. McFarland, Sanne Smith, David Jurgens.
Socius. 2022.
pdf
Similarities in Redditor political affiliations and commenting activity Classification without (Proper) Representation: Political Heterogeneity in Social Media and Its Implications for Classification and Behavioral Analysis
Kenan Alkiek, Bohan Zhang, and David Jurgens.
ACL Findings. 2022.
pdf  ·  code
Multilingual performance on grapheme to phoneme conversion ByT5 model for massively multilingual grapheme-to-phoneme conversion
Jian Zhu, Cong Zhang, and David Jurgens.
Interspeech 2022.
pdf · code
Food healthiness ratings Language in Popular American Culture Constructs the Meaning of Healthy and Unhealthy Eating: Narratives of Craveability, Excitement, and Social Connection in Movies, Television, Social Media, Recipes, and Food Reviews
Bradley P. Turnwald, Margaret A. Perry, David Jurgens, Vinodkumar Prabhakaran, Dan Jurafsky, Hazel R. Markus, Alia J. Crum.
Appetite. 2022.
pdf
Phone-to-audio alignment without text: A Semi-supervised Approach
Jian Zhu, Cong Zhang, and David Jurgens.
Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing.
pdf · code
2021
Latent classes of biased words and their effects on toxicity Detecting Cross-Geographic Biases in Toxicity Modeling on Social Media
Sayan Ghosh, Dylan Baker, David Jurgens, and Vinodkumar Prabhakaran.
(Best Paper)
Proceedings of the 7th Workshop on Noisy User-generated Text (W-NUT).
pdf
Using Sociolinguistic Variables to Reveal Changing Attitudes Towards Sexuality and Gender.
Sky Wang and David Jurgens.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP).
pdf
Idiosyncratic but not Arbitrary: Learning Idiolects in Online Registers Reveals Distinctive yet Consistent Individual Styles.
Jian Zhu and David Jurgens.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP).
pdf  ·  code and data
Measuring Sentence-Level and Aspect-Level Certainty in Science Communications
Jiaxin Pei and David Jurgens.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP).
pdf  ·  code and data
Detecting Community Sensitive Norm Violations in Online Conversations.
Chan Young Park, Julia Mendelsohn, Karthik Radhakrishnan, Kinjal Jain, Tushar Kanakagiri, David Jurgens and Yulia Tsvetkov.
Proceedings of the Findings of the 2021 Conference on Empirical Methods in Natural Language Processing (Findings of EMNLP).
pdf
An Animated Picture Says at Least a Thousand Words: Selecting Gif-based Replies in Multimodal Dialog.
Xingyao Wang and David Jurgens.
Proceedings of the Findings of the 2021 Conference on Empirical Methods in Natural Language Processing (Findings of EMNLP).
pdf  ·  code and data  ·  Slack gif-bot App
Driving cessation pipeline A Data Science Approach to Estimating the Frequency of Driving Cessation Associated Suicide in the US: Evidence From the National Violent Death Reporting System
Tomohiro M. Ko, Viktoryia A. Kalesnikava, David Jurgens, and Briana Mezuk.
Frontiers in Public Health.
pdf
Teaching is serious business Learning PyTorch Through A Neural Dependency Parsing Exercise
David Jurgens.
Proceedings of the Fifth Workshop on Teaching NLP, 2021.
pdf
Teaching is serious business Learning about Word Vector Representations and Deep Learning through Implementing Word2vec
David Jurgens.
Proceedings of the Fifth Workshop on Teaching NLP, 2021.
pdf
Temporal dynamics of relationships on Twitter More than meets the tie: Examining the Role of Interpersonal Relationships in Social Networks
Minje Choi, Ceren Budak, Daniel Romero, and David Jurgens.
International Conference on Web and Social Media (ICWSM), 2021.
pdf  ·  code
The structure of two online subreddits, which is predictive of their rate of lexical change The Structure of Online Social Networks Modulates the Rate of Lexical Change
Jian Zhu and David Jurgens.
Proceedings of the North American Meeting of the Association for Computational Linguistics (NAACL), 2021.
pdf  ·  code
The effects of framing on audience response to immigration tweets Modeling Framing in Immigration Discourse on Social Media
Julia Mendelsohn, Ceren Budak, and David Jurgens.
Proceedings of the North American Meeting of the Association for Computational Linguistics (NAACL), 2021.
pdf  ·  code
The main architecture for forecasting prosocial behavior Conversations Gone Alright: Quantifying and Predicting Prosocial Outcomes in Online Conversations
Jiajun Bao*, Junjie Wu*, Yiming Zhang*, Eshwar Chandrasekharan, and David Jurgens.
Proceedings of the Web Conference (WebConf), 2021.
pdf  ·  code
2020
Quantifying Intimacy In Language
Jiaxin Pei and David Jurgens.
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
pdf  ·  project webpage  ·  code  ·  pip-installable package
Condolence and Empathy in Online Communities
Naitian Zhou and David Jurgens.
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
pdf  ·  project webpage
Still out there: Modeling and Identifying Russian Troll Accounts on Twitter.
(Best Paper Runner-Up)
Jane Im, Eshwar Chandrasekharan, Jackson Sargent, Paige Lighthammer, Taylor Denby, Ankit Bhargava, Libby Hemphill, David Jurgens, Eric Gilbert.
Proceedings of Web Science, 2020.
pdf
Measuring the predictability of life outcomes with a scientific mass collaboration.
Matthew J. Salganik, Ian Lundberg, Alexander T. Kindel, Caitlin E. Ahearn, Khaled Al-Ghoneim, Abdullah Almaatouq, Drew M. Altschul, Jennie E. Brand, Nicole Bohme Carnegie, Ryan James Compton, Debanjan Datta, Thomas Davidson, Anna Filippova, Connor Gilroy, Brian J. Goode, Eaman Jahani, Ridhi Kashyap, Antje Kirchner, Stephen McKay, Allison C. Morgan, Alex “Sandy” Pentland, Kivan Polimis, Louis Raes, Daniel E. Rigobon, Claudia V. Roberts, Diana M. Stanescu, Yoshihiko Suhara, Adaner Usmani, Erik H. Wang, Muna Adem, Abdulla Alhajri, Bedoor AlShebli, Redwane Amin, Ryan B. Amos, Lisa P. Argyle, Livia Baer-Bositis, Moritz Büchi, Bo-Ryehn Chung, William Eggert, Gregory Faletto, Zhilin Fan, Jeremy Freese, Tejomay Gadgil, Josh Gagné, Yue Gaobj, Andrew Halpern-Manners, Sonia P. Hashim, Sonia A. Hausen, Guanhua He, Kimberly Higuera, Bernie Hogan, Ilana M. Horwitz, Lisa M. Hummel, Naman Jain, Kun Jin, David Jurgens, Patrick C. Kaminski, Areg Karapetyan, E. H. Kim, Ben Leizman, Naijia Liu, Malte Möser, Andrew E. Mack, Mayank Mahajan, Noah Mandell, Helge-Johannes Marahrens, Diana Mercado-Garcia, Viola Mocz, Katariina Mueller-Gastell, Ahmed Musse, Qiankun Niu, William P. Nowak, Hamidreza Omidvar, Andrew Or, Karen Ouyang, Katy M. Pinto, Ethan Porter, Kristin E. Porter, Crystal Qian, Tamkinat Rauf, Anahit Sargsyan, Thomas Schaffner, Landon Schnabel, Bryan Schonfeld, Ben Sender, Jonathan D. Tang, Emma Tsurkov, Austin van Loon, Onur Varol, Xiafei Wang, Zhi Wang, Julia Wang, Flora Wang, Samantha Weissman, Kirstie Whitaker, Maria K Wolters, Wei Lee Woon, James Wu, Catherine Wu, Kengran Yang, Jingwen Yin, Bingyu Zhao, Chenyun Zhu, Jeanne Brooks-Gunn, Barbara E. Engelhardt, Moritz Hardt, Dean Knox, Karen Levy, Arvind Narayanan, Brandon M. Stewart, Duncan J. Watts, and Sara McLanahan.

Proceedings of the National Academy of Sciences. Mar 2020, 201915006; DOI: 10.1073/pnas.1915006117
pdf
2019
Finding Microaggressions in the Wild: A Case for Locating Elusive Phenomena in Social Media Posts
Luke Breitfeller, Emily Ahn, David Jurgens, and Yulia Tsvetkov.
Proceedings of 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019.
pdf
Perceptions of social roles across cultures.
(Nominated for Best Paper)
Meixing Dong, David Jurgens, Carmen Banea and Rada Mihalcea.
Proceedings of Social Informatics (SocInfo), 2019.
pdf
Suicide Among Older Adults Living in or Transitioning to Residential Long-term Care, 2003 to 2015
Briana Mezuk, Tomohiro M. Ko, Viktoryia A. Kalesnikava, and David Jurgens.
JAMA Network Open 2019;2(6):e195627
pdf
Wetin dey with these comments? Modeling Sociolinguistic Factors Affecting Code-switching Behavior in Nigerian Online Discussions
Innocent Ndubuisi-Obi*, Sayan Ghosh*, David Jurgens.
Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2019
pdf
The spectrum of abusive behaviors A Just and Comprehensive Strategy for Using NLP to Address Online Abuse
David Jurgens, Libby Hemphill and Eshwar Chandrasekharan.
Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2019
pdf
Caste attitudes Smart, Responsible, and Upper Caste Only: Measuring Caste Attitudes through Large-Scale Analysis of Matrimonial Profiles
(Best Paper Award)
Ashwin Rajadesingan, Ramaswami Mahalingam, David Jurgens.
Proceedings of the AAAI International Conference on Web and Social Media (ICWSM), 2019
pdf
Population inference Demographic Inference and Representative Population Estimates from Multilingual Social Media Data.
Zijian Wang, Scott Hale, David Ifeoluwa Adelani, Przemyslaw Grabowicz, Timo Hartmann, Fabian Flöck and David Jurgens*.
Proceedings of the Web Conference, 2019
*Corresponding senior author
pdf  ·  demo  ·  code
Group success Are All Successful Communities Alike? Characterizing and Predicting the Success of Online Communities.
Tiago Cunha, David Jurgens, Chenhao Tan and Daniel Romero.
Proceedings of the Web Conference, 2019
pdf
2018
It's going to be okay: Measuring Access to Support in Online Communities.
Zijian Wang and David Jurgens.
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018
pdf  ·  supplementary  ·  website and data  ·  code
RtGender: A Corpus of Responses to Gender for Studying Gender Bias.
Rob Voigt, David Jurgens, Vinodkumar Prabhakaran, Dan Jurafsky, and Yulia Tsvetkov.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC), 2018
pdf  ·  data
Measuring the Evolution of a Scientific Field through Citation Frames.
David Jurgens, Srijan Kumar, Raine Hoover, Dan McFarland, Dan Jurafsky.
Transactions of the Association for Computational Linguistics (TACL). 2018.
pdf  ·  website and data  ·  code

We showed up to UMSI!

Software + Data

Software
  • M3: Multi-modal, multilingual, multi-attribute demographic inference. You can also get it from PyPI as a pip-installable package!
  • GenderPerformr: Infers gender performance for any kind of online username, name, handle, etc.
  • Social Support: Infers how supportive, neutral, or unsupportive an online reply is
  • Citation Function: Infers why an author cited another paper
  • Immigration Framing: Infers which frames are used in a short text message about immigration. Neural models available via HuggingFace!
  • Social Relationships: Infers the type of social relationship between two Twitter users based on dialog and network structure
  • Gif-Reply Models: Multimodal models that will pick an appropriate gif reply for a given message.
  • Pepe the Gif-Reply Bot: A Slack App that you can deploy to automatically reply to messages on your workspace with gifs
  • Certainty Estimator (also on pip): A neural library for estimating the certainty/uncertainty of a statement along with which aspects are certain or uncertain (this works much better than hedges).
  • Intimacy Estimator (also on pip): A neural library for estimating the intimacy level of questions in conversation. Very useful for looking at social distance and social norms in conversation!
Data
  • Citation Function Data: labeled training data of citations by their rhetorical function.
  • Gender-Labeled Online Conversations: 100M online dyadic conversations from Reddit, Wikipedia, and StackExchange, labeled by gender salience. To respect the privacy and dignity of individuals, this dataset is available for non-commercial research purposes only; please email the lab PI to obtain access.
  • Reply Supportiveness Ratings: ~9K ratings of replies to authors rated on a 5-point Likert scale for how supportive (or unsupportive) they are. Items are balanced across Reddit, Wikipedia, and StackExchange interactions, as well as by length (equal amounts of short, medium, and long comments).
  • MultiCite: New citation dataset with multiple intents per context and variable-sized contexts expressing that intent (e.g., multiple sentences are needed to understand why an author cited that paper).
  • Gif-based dialog: 1.56M conversation turns on Twitter where the replying user has responded with a gif. Gifs are canonicalized to an image hash, which is re-usable for matching new gifs.

Prospective Students

Graduate students

Prospective PhD graduate students interested in joining Blablablab should apply to one of our affiliated programs. We typically accept most students through UMSI though we occasionally will accept students who apply to the Computer Science division of EECS.

Current U-M graduate students of any program are welcome to email about potential research collaborations. Blablablab is highly interdisciplinary and we especially enjoy working with social scientists.

Undergraduates

Current U-M undergraduates who want to do research during the school year should contact the lab and describe a bit about their background (e.g., have you done research before? what classes have you taken? why do you want to research?) and what kind of project they want to be on. Scanning our current list of publications will give you a sense of what topics we research and can be a good jumping off point for a new project. We often have a few spots for the year for all levels of experience but please know we expect at least 10 hours per week of research. Research takes time and it will be difficult for you (or anyone) to make progress with just a few hours per week.

Undergraduates interested in summer research opportunities should look for an announcement sometime in the early to mid-winter period with a description of the projects. We have hosted lots of wonderful undergraduate students throughout the summer. Women and underrepresented minorities are encouraged to contact us early, as we often can apply for special summer funding opportunities (e.g., through the B.A. Rudolph Foundation or the NSF) to provide financial support for your summer stay.

Visiting Students

We often have self-funded visiting students for the summer and (rarely) during the academic year. In these cases, usually the student has some prior research experience, similar research interests, and (due to luck) there is a current project going on in the Blablablab where they would be a good fit. If you are a self-funded student who wants to come join us for the summer, you should send an email in February or March before the summer that

  • Describes your research experience and states what parts are relevant to the work going on in the Blablablab.
  • Clearly states why you want to work in the Blablablab.
  • Discusses what you want to get out of a summer research experience and what you want to learn — this helps us make sure that the trip is a success for you as well.

We don't consider self-funded visiting students as "free labor" and strongly want to make sure that your stay is productive and a success for your career goals.

Postdocs

We have no openings at this time for postdocs. :(

©2018 Regents of the University of Michigan