The Blablablab

News

July 2025: Five papers at ACL on podcasts, accommodation, citations, morals, demographically-aware LLMs, and two Findings papers on tokenization and new multilingual data for author attribution. Many folks will be at ACL presenting (including David) so come say hi!
July 2025: New grant from OpenAI to study making smaller models better reasoners. Thanks OpenAI!
June 2025: GenAI tools can potentially support us in many ways on social media. What effect do these have on authors and on readers? We actually ran an experiment to find out! Check out the answers in our new preprint here.
May 2025: So many new Blablablabers this summer working on a range of topics in morals, podcasts, and human-AI collaboration. Welcome all!
April 2025: How should we evaluate AI systems? Benchmarks aren't enough, as we argue in our whitepaper written with the AI Lab. Check out the whitepaper here.
March 2025: Federal science funding has been cut and faces even deeper cuts with proposed executive and legislative actions. David has written an op-ed on the topic for the Decatur Herald & Review (screenshot) as a part of the Science Homecoming effort. Please consider calling your representatives to oppose these cuts; if you're not sure who to reach out to, 5calls.org can help get you started.
January 2025: Two NAACL and one WWW papers accepted! Congrats to all the authors.
November 2024: Four papers accepted to EMNLP Findings on implicit norms, the language of trauma, LLM personas, and using LLMs with tabular data. Congrats to all the authors.
October 2024: Are you one of the hundreds of millions of people that listen to podcasts? Ever wondered what they say about you? Check out our new paper with a massive dataset of transcribed podcasts.
October 2024: Visitor Manon Reusens joins from KU Leuven! Welcome 👋 .
September 2024: Visitor Dennis Assenmacher joins from GESIS! Welcome 👋 .
August 2024: New PhD student Eleanor Lin joins! Welcome 👋 .
July 2024: Five talks/posters at IC2S2 this year on a range of things from podcasts to pandemics. Come say hello to us in Philly!
June 2024: In a new survey paper, we analyze the state of Human-AI alignment and argue that this is best viewed as a bidirectional alignment—humans align their values with AI too! This was a great collaboration with a big team of folks from Stanford, CMU, and Google.
May 2024: Lots of new folks coming to the Blablablab. Postdocs Shivani Kumar and Aparup Khatua both join and we're fortunate to have four new visitors, PhD student Anders Giovanni Møller (ITU Copenhagen), postdoc Dustin Wright (U Copenhagen), and undergraduates Jonathan Ivey (Arkansas) and Jiayu Liu (UIUC)! Welcome 👋 .
May 2024: New papers at WWW and ICWSM that extend our work on global news to study synchrony across Europe during the pandemic and a new multilingual dataset analyzing the framing of these news articles.
April 2024: NLP models are often conceptualized separately from the social environment in which they operate. In a new preprint with Diyi Yang, Dirk Hovy, and Barbara Plank, we argue that NLP as a field needs to directly incorporate social awareness into its models both for understanding and considering the implications of the models.
April 2024: Visiting student Anna Wegmann joins! Welcome 👋 .
March 2024: Three papers accepted to NAACL, on estimating LLM personas via psychometrics (short story: they're not meaningful), on memes, and on empathetic alignment. Two of these wrap up efforts by Blablablab alumni Naitian Zhou and Jiamin Yang. The third is a first paper by masters student co-first authors Bangzhao Shu and Lechen Zhang. Congrats all!
January 2024: Visiting student Neele Falk joins! Welcome 👋.
December 2023: Lots of new preprints out on why most LLMs don't actually have personalities, how LLMs answer differently depending on who you ask them to be, how LLMs' answers on subjective tasks are more correlated with certain groups of people, and memes, so many memes (with sociolinguistics!).
November 2023: New work by Minje looking at how people react and behave when revealing new aspects of themselves in social networks; and new work with a collaboration of folks at Williams College and AI2 showing that causal inference with text is hard but we can now evaluate it better.
October 2023: Two papers accepted to EMNLP/Findings on assessing how well LLMs understand social knowledge (spoiler: most do not do well!) and quantifying what happens when the news' collective attention gets focused on one event: a media storm! Congrats to Minje, Jiaxin, Sagar, and Ben!
September 2023: New folks join the Blablablab: the fantastic incoming PhD student Nancy Xu and the amazing Hua Shen as a postdoc. Looking forward to seeing your research dreams come true!
August 2023: Dr. Minje Choi has graduated! The first Blablablaber is off to see the world, with a first stop as a postdoc at GaTech. Congratulations Minje!!!
July 2023: Our paper at the Social Influence in Conversations Workshop won Best Paper!! This paper was the result of a whole-lab effort during a two-day "Research Jam" to be creative and have fun researching together as a group—what a great result! Details on the Research Jam and paper are coming soon too!
June 2023: Our paper on the role of multilinguals in bridging communication won Best Methodology Paper at ICWSM 2023! Congrats to all!
June 2023: Lots of new papers: ACL paper on why you shouldn't say "I love you" to your boss (the contextual appropriateness of messages), LAW paper on how annotator demographics influence different judgments, SICon paper looking at how social influence manifests in style change, and a SemEval task paper on intimacy.
May 2023: ICWSM 2023 papers are now live showing how people reach out to different social ties during shocks and the role of multilinguals in bridging communication. Exciting!
March 2023: Amazing news that Lavinia has won an NSF Graduate Research Program Fellowship! So proud and looking forward to seeing your research vision come to life! Thanks NSF!
March 2023: David gets a grant from the Center for Research on Learning and Teaching (CRLT) for the Improvement of Teaching to work on making his Information Retrieval class more ethical and more technical! Shout out to Safiya Umoja Noble for insight into thorny issues in IR and Nicki Washington for the 3C Fellows and motivation to put ideas to practice—and CRLT for the funding!
March 2023: New grant from DSO to study language dialects and culture! Super excited to make more crazy cool culture maps. Thanks DSO!
February 2023: David is off to the University of Stuttgart for a talk and meeting lots of great folks.
January 2023: Crunch time for the lab's collaboration with Snap Inc Research on a SemEval task on Intimacy in different languages! Get those submissions in.
November 2022: David gives a keynote at the Sharing Stories, Lessons Learned workshop on doing interdisciplinary research. This keynote was focused on story-time and used Dall-e to generate storybook image slides.
November 2022: Three papers at EMNLP looking at science journalism in the news, the state of empathy research in NLP, and a brand new annotation tool (Potato). Excited to see these all out!
October 2022: Excited to start a new project on understanding linguistic style with folks from USC, U Maryland, and U Birmingham (UK) as a part of the IARPA HIATUS program! Lots of great computational sociolinguistics work to come.
September 2022: New grant from LG AI to study how chatbots can combine our knowledge of themselves with real-world knowledge to communicate better. Thanks LG AI!
August 2022: So many new Blablablabers this year! A warm welcome to Kenan Alkiek, Hong Chen, Lavinia Dunagan, Ben Litterer, Agrima Seth, and Jason Yan! Wow, so much exciting work to come this year.
July 2022: Five presentations by Blablablab folks at IC2S2 this year (on many different topics) with great lab attendance in person (wow!). Feel free to say hi if you're around!!
July 2022: Christina Lu's amazing work on trans-exclusionary radical feminists (TERFs) gets presented at the Workshop on Online Abuse and Harms (WOAH) at NAACL!!
March 2022: David got awarded an NSF CAREER grant to look at prosocial behavior and hopefully make the world a better place through some crazy RCTs. Thanks NSF!!
February 2022: Blablablabbers Kenan Alkiek and Bohan Zhang celebrate their acceptance to ACL Findings with a new paper looking at political affiliation in Reddit.
December 2021: Naitian Zhou and Xingyao Wang have both received an honorable mention by the CRA for the Outstanding Undergraduate Researcher Award. The Blablablab could not be prouder of these two! Naitian and Xingyao have continued the tradition of award-winning Blablablab-ers, joining Sky Wang and Sayan Ghosh who received honorable mentions in previous years.
November 2021: Sayan Ghosh wins best paper at W-NUT workshop for his work on identifying cultural biases in toxicity models. Great work, Sayan!
November 2021: New NIH R01 grant with folks from the School of Public Health and Michigan Medicine looking at how to understand reports from completed suicides across all life stages to identify new risk factors and preventative opportunities! Thanks NIH!
September 2021: Kicking off a new NSF Convergence grant to look at how to better reach consensus when online platforms flag something for removal (and have everyone agree the process is fair). Looking forward to working with folks from UW and MIT. Thanks NSF!
September 2021: New paper at W-NUT with Sayan Ghosh and Google collaborators Dylan Baker and Vinod Prabhakaran where we show how to uncover geographic biases in pretrained toxicity models—and show, unfortunately, that common sense approaches to fixing the biases in a model don't actually change much.
September 2021: Welcome to new PhD student Leopele Raabe, co-advised with Misha Teplitskiy!
August 2021: Five papers at EMNLP this year! Congrats to Blablablabers Sky Wang, Jian Zhu, Xingyao Wang, and Jiaxin Pei. More details to come!
July 2021: Three talks at IC2S2 this year: one by Jiaxin Pei on his work on intimacy and two by collaborator Julia Mendelsohn on her immigration framing work and upcoming work on bilinguals. Looking forward to seeing all the great IC2S2 keynotes and talks.
June 2021: The Blablablab welcomes three REU students this summer: Athena Aghighi, Michael Geraci, and Jackson Sargent! Welcome to summer research!
May 2021: Congrats to MSI student Kenan Alkiek for winning the Theresa Noel Urban Blaurock Research Award for his outstanding work. Well-deserved recognition!
April 2021: The lab is recruiting two students for NSF funded REU positions this summer. Please see the REU page for details and how to apply!
April 2021: Two students from the lab were awarded NSF GRFP fellowships this year: Sky Wang, who has worked on multiple research projects, and Zhizhuo Zhou, who did amazing work on the Alexa Prize team. Fantastic news and congrats to both!
March 2021: More good news! UMSI PhD student Minje Choi has his paper on social relationships accepted to ICWSM. Minje's work shows that different types of relationships have strong behavioral differences on Twitter, that these can be predicted, and that the nature of the relationship aids in predicting information diffusion. A great action-packed paper!
March 2021: Two papers accepted at NAACL! One on computational sociolinguistics with Linguistics PhD student Jian Zhu showing how the structure of online communities modulates the rates at which they adopt new terms. The second in computational social science (and political communications!) with UMSI PhD student Julia Mendelsohn looking at how discussions of immigrants on social media are framed and the impact that has on audience engagement. Congrats to both!
March 2021: David gave a talk at GESIS on some of our research on the framing of marginalized/politicized people! One of the silver linings of pandemic times is getting to easily connect to colleagues in Europe (who had great questions)!
January 2021: Our paper on prosocial conversations was accepted at the Web Conference (WWW) based on work from U-M undergrads Jiajun Bao and Yiming Zhang and summer visitor Junjie Wu, in collaboration with (now-professor!) Eshwar Chandrasekharan. We look at different dimensions of what can go right in a conversation and show that the prosocial direction of a conversation is actually predictable from its onset. Congrats to all!
December 2020: David gives the keynote at the PEOPLES workshop at COLING. What a wonderful group of folks and many interesting conversations and questions. Thank you Malvina, Viviana, and Barbara for the invitation!
December 2020: Sky Wang receives an honorable mention by the CRA for the Outstanding Undergraduate Researcher Award. Amazing work, Sky! Blablablab has been incredibly fortunate to have so many fantastic undergraduates and Sky joins Sayan Ghosh who received an honorable mention last year.
September 2020: Two long papers accepted to EMNLP this year: First-year PhD student Jiaxin Pei's work on quantifying intimacy in language (with lots of cool Social Psych) and junior Naitian Zhou's work on condolence and empathy in online communities (work done as a sophomore!). Congrats to both and more details, data, and models to come soon!
July 2020: Whoa, our paper on identifying Russian trolls on Twitter was Best Paper Runner Up at WebSci! Congrats to all the co-authors!
July 2020: David gave a talk at the AKBC workshop on NLP for Scientific Texts (SciNLP) on bias in which authors are mentioned in the news stories on their published papers, based on work with Hao Peng and Misha Teplitskiy. These informal citations matter and add up to who we think of as a scientist. You can check out all the cool talks here too.
July 2020: The NSF has graciously awarded David and co-PI Daniel Romero an NSF grant to study the communicative and behavioral dynamics of social relationships. Thanks NSF for your support!!
June 2020: Summer is here!! 😎🌞⛱️ (...well, beginning after the EMNLP submission deadline) Welcome to Christina Lu and Kenan Alkiek who are joining us for the summer.
May 2020: Congrats to all the graduating seniors this year: Jiajun Bao (→CMU LTI, MS), Justin Chen (→GaTech, MS), Shengyu Feng, Thomas Horak, Sam Lee, Wenhao Li (→UNC PhD), Junjie Wu (→HKUST PhD), Yiming Zhang, and Zach Zipper (→U-M, MS)!
April 2020: Wowza, seven IC2S2 abstracts from the lab made it in! Time to get those talks and posters ready. Looking forward to seeing how the virtual conference turns out and excited we can share videos of this work in the future! ✨
April 2020: Congrats to Blablablab collaborators Jane Im, Eshwar Chandrasekharan, Jackson Sargent, Paige Lighthammer, Taylor Denby, and Ankit Bhargava on getting our paper on detecting Russian trolls on Twitter accepted to WebSci 2020! This is the first paper for Jackson, Paige, Taylor, and Ankit and hopefully the start of a great journey.
February 2020: Congrats to Blablablab collaborator Ashwin for being awarded a Facebook Fellowship for Computational Social Science! Looking forward to seeing all the great things you'll do.
January 2020: Congrats to Minje for getting a paper accepted to the Web Conference based on his summer internship at NOKIA Bell Labs! Great work!
December 2019: Sayan was selected for an Honorable Mention by the CRA Outstanding Undergraduate Research awards! Amazing work and he made it in the CSE News — so famous 🤩 !
November 2019: Blablablab collaborator Yulia Tsvetkov (CMU) presents our paper on microaggressions at EMNLP! A tough but important classification problem for reducing incivility online. The paper reads pretty well for being written by academics too.
November 2019: Our paper at SocInfo on cross-cultural norms for social roles was nominated for best paper!
October 2019: How can we make beautiful maps like the one below that show regional variation in language? We have the answer for you! David is at NWAV presenting a tutorial on this with Jack Grieve. Check out the code and materials here, or just peek at the html-exported Jupyter notebook online! Happy map making!
September 2019: A new fall season brings new PhD students. Welcome Aparna and Jiaxin!! 👋
July 2019: Blablablabbers Zijian and Sayan present the group's research at IC2S2 and ACL respectively. Look at that science delivery—such grace, such poise!
July 2019: What attributes do people ascribe to social roles and what those roles do? New paper on cross-cultural norms for social roles examines this question and was accepted at Social Informatics with CSE collaborators MeiXing Dong, Carmen Banea, and Rada Mihalcea! Congrats to MeiXing on her first conference paper!
June 2019: New paper out with collaborators from the School of Public Health using NLP to show systematic underrecognition of suicide among people transitioning to or living in long-term care facilities like nursing homes (serious stuff!). Individuals' loss of identity and agency in this setting can have this profound and negative outcome and needs to be better recognized.
June 2019: Ashwin, Ram, and David win Best Paper at ICWSM for their work on measuring attitudes about caste discrimination through intercaste marriage!! Incredibly proud of this work!
June 2019: Wowza! Our UMich team was one of the 10 teams selected for the Amazon Alexa Prize Socialbot Grand Challenge 3! Way to go team! Can't wait to introduce the world to our super social Audrey!
May 2019: The poster for our WebConf paper on demographic inference for more accurate surveys won the Best Poster Presentation award! Check out the poster here--be careful of the sea monsters!
May 2019: Two papers accepted at ACL 2019! Congrats to co-authors MSI student Innocent Ndubuisi-Obi and CSE undergrad Sayan Ghosh for our work on understanding English-Naija code-switching and to co-authors UMSI Faculty Libby Hemphill and visiting student Eshwar Chandrasekharan for our position paper arguing what should be the next steps for the NLP community in tackling abusive behavior.
April 2019: The NSF has graciously awarded David a CRII grant to study the language of social relationships. Thanks NSF!
March 2019: Blablablab celebrates an ICWSM acceptance on a paper about caste discrimination with collaborator Ashwin!
February 2019: David travels to UCLA to talk with their Computational Sociology group. Great visit and wonderful folks doing amazing research there!
January 2019: Blablablab celebrates even more as two Web Conference long papers are accepted! Congrats to now-alumnus Zijian for being first author on one!
January 2019: Blablablab celebrates the new year in style... and then furiously sprints to get those ICWSM papers in! Great job everyone!
December 2018: You can't spell breakthrough without break, so the group gets some much-needed rest over the holiday season.
November 2018: Just in time for the election, Jane showed that Russian trolls are still active on Twitter and trying to interact with major news reporters. Timely stuff!
November 2018: Jane presents her work on Wikipedia conflict resolution at CSCW and David and Zijian meet up in Brussels to talk about access to support in online communities. The lab hopes that Jane will bring back Montreal-style bagels too.
October 2018: David is off to NWAV47 to talk about Computational Sociolinguistics! He came back with a mountain of Montreal-style Bagels and a new appreciation for the Northern Cities Vowel Shift.
September 2018: New PhD rotating students Jane Im and Minje Choi arrive at UMSI! Welcome Jane and Minje!
August 2018: New EMNLP paper on supportive/unsupportive language accepted with undergraduate first author Zijian Wang! In online conversations, users who indicate they are women really do receive more unsupportive replies--yet they also receive more supportive replies. Lots of interesting follow-up questions on gendered interactions online #FoodForThought
August 2018: Visiting students Akshita Jha, Qi Sun, and Nan Gu depart physically but remain with us in spirit and co-authorship. Wonderful having you here with us this summer!

People

Professors

David Jurgens

Postdocs

Abraham Israeli

School of Information

Shivani Kumar

School of Information

PhD Students

Kenan Alkiek School of Information 🏆Theresa Noel Urban Blaurock Research Award	Hong Chen School of Information	Junghwan Kim Computer Science	Eleanor Lin Computer Science 🏆NSF Graduate Research Program Fellow
Benjamin Litterer School of Information	Nancy Xu School of Information

Masters Students

Undergraduates

Adarsh Bharathrawaj BS Computer Science	Michael Jiang BS Computer Science	Xinyi Li BS Computer Science	Rose Seidl BS Computer Science
Shea Shin BS Computer Science	Evan Wang BS Computer Science	Nicky Nguyen Yu BS Computer Science	Yusheng Yu BS Computer Science

External Student/Postdoc Collaborators

Dennis Assenmacher Postdoc at GESIS	Neele Falk Postdoc at University of Stuttgart	Anders Giovanni Møller PhD student at IT University of Copenhagen
Jonathan Ivey BS Computer Science @ U Arkansas → PhD Student at Johns Hopkins 🏆CRA Outstanding Undergraduate Researcher	Aparup Khatua	Jiayu Liu BS Statistics @ UIUC
Jiaxin Pei PhD @ UMSI → Postdoc at Stanford	Manon Reusens PhD student at the KU Leuven	Miriam Schirmer PhD student at Technical University of Munich
Anna Wegmann PhD student at Utrecht University	Dustin Wright Postdoc at University of Copenhagen

Alumni

(In reverse order of graduation with approximate date of graduation/collaboration)

Aparna Ananthasubramaniam School of Information 2027 → PhD in Social Work	Rohan Raju BS Computer Science 2026	Bowen Yi BS Computer Science 2026	Mohna Chakraborty MIDAS 2025	Yijun Pan BS Computer Science 2025
Sushrita Rakshit BS Computer Science 2025	Hua Shen School of Information 2025 → Postdoc @ University of Washington	Daniel Tian BS Computer Science 2025	Omkar Yadav BS Computer Science 2025	Irena Yi BS Computer Science 2025
Haotian Zhang BS Data Science 2025	Jenny Lee BS Computer Science 2024	Bangzhao Shu MS Information 2024 → PhD @ Northeastern University	Lechen Zhang MS Information 2024 → PhD @ UIUC	Mingqian Zheng MS Survey and Data Science 2024 → PhD @ CMU LTI
Minje Choi School of Information 2023	Lavinia Dunagan School of Information 2023 🏆NSF Graduate Research Program Fellow	Catherine Huang BS Computer Science 2023 → Scale AI	Leopele Raabe School of Information 2023 🏆National Intelligence University Professor of Strategic Intelligence Candidate	Agrima Seth School of Information 2023 🏆2022 Snap Fellowship Honorable Mention
Huaman Sun MS Computer Science 2023 → PhD in Sociology @ University of Toronto	Jason Yan School of Information 2023	Jia Zhu BS Computer Science 2023 → MS CS @ UMich	Haley Johnson Information Science 2022	Sagar Kumar 2022
Anne Lauscher 2022 University of Mannheim; intern at AI2, co-advised with Kyle Lo and Arman Cohan → Postdoc at Bocconi University	Hao Peng 2022 PhD @ UMSI (2022) → Postdoc @ Northwestern	Jackson Sargent BS Computer Science 2022	Xingyao Wang Computer Science 2022 → PhD @ UIUC 🏆CRA Outstanding Undergraduate Researcher, Honorable Mention	Jiamin Yang Data Science 2022
Bohan Zhang MS Computer Science 2022 Computer Science MS @ UMich → PhD at UMSI	Naitian Zhou Computer Science 2022 → PhD @ University of California Berkeley 🏆CRA Outstanding Undergraduate Researcher, Honorable Mention 🏆NSF Graduate Research Program Fellow	Jian Zhu Linguistics 2022 Linguistics PhD @ UMich → Faculty at the University of British Columbia	Susannah Chandhok 2021 Social Psychology PhD @ UMich → Google Research	Huiyang Ding (丁慧洋) MS Data Science 2021
MeiXing Dong Computer Science PhD @ UMich 2021	Lingyun Gao MS Information 2021	Lily Ge BS Computer Science 2021 → PhD @ Northwestern University	Sayan Ghosh Computer Science 2021 → PhD @ USC 🏆CRA Outstanding Undergraduate Researcher, Honorable Mention	Michelle Lee BS Computer Science 2021
Xingyu Lu Computer Science 2021	Talia Rizika Cognitive Science 2021 → Israeli Defense Force 🏆College Honors	Sky Wang Computer Science 2021 → Columbia University, PhD 🏆NSF Graduate Research Program Fellow 🏆U-M CSE Outstanding Research Award 🏆CRA Outstanding Undergraduate Researcher, Honorable Mention	Zach Zipper MS Computer Science 2021 → Reservoir Labs	Jiajun Bao BS Computer Science 2020 → CMU LTI, MS
Shivika Bisen MS Data Science 2020	Justin Chen BA in Computer Science and Sociology 2020 → GaTech MS	Shengyu Feng BS Computer Science 2020	Thomas Horak BS Linguistics and Computer Science 2020 U-M → Office of Academic Innovation	Sam Lee BS Computer Science 2020
Trevor Li BS Computer Science 2020	Christina Lu BS Computer Science, Dartmouth 2020	Adi Mannari BS Computer Science 2020	Yiming Zhang BS Computer Science 2020	Eshwar Chandrasekharan 2019 (Computer Science PhD @ Georgia Tech; visiting at UMSI → Faculty at UIUC)
Lingding Chen BS Computer Science 2019	Rex Chen BA Linguistics 2019 → Amazon SDE	Sanjana Kolisetty BS Computer Science 2019	Yaoyang Lin BS Computer Science (with Honors) 2019 → Harvard, MS in Data Science 🏆College Honors	Ashwin Rajadesingan 2019 PhD @ UMSI (2022) → Faculty at UT Austin, Communications 🏆Facebook Fellowship
Carol Zhang BS Computer Science 2019 → PhD, Carnegie Mellon University 🏆CRA Outstanding Undergraduate Researcher, Honorable Mention	Innocent Ndubuisi-Obi Master of Information Science 2018 → University of Washington, PhD in Computer Science	Zijian Wang BS Computer Science 2018 → Stanford, MS in Symbolic Systems	Xinyi Wu BS Computer Science 2018 → Univ Washington, MA in Computational Linguistics

Visiting Students

Michael Geraci BS Computer Science, University of Buffalo 2022	Athena Aghighi BA Sociology, University of California, Davis 2021	Kevin Henner MA Computational Linguistics, University of Washington 2019 → Seasalt.ai, Senior NLP Engineer	Wenhao Li BS Computer Science and Technology, Tsinghua University 2019 → University of North Carolina PhD	Junjie Wu BS Statistics, Sun Yat-sen University 2019 → HKUST PhD
Nan Gu BS Electrical Engineering @ Tsinghua University 2018	Akshita Jha MS Computer Science @ IIIT 2018 → Virginia Tech, PhD in CS	Qi Sun BS Computer Science @ Peking University 2018	Shangming Zhao BS Software Engineering @ Tsinghua University 2018

Projects

The Blablablab studies people and language. In all our work, we aim to advance both NLP methodology and our understanding of the social world. Below are our current themes. If you're interested in joining the lab, we're looking for students who are interested in any of these topics!

Social Reasoning

Social settings are complex. We study how people reason about social situations, and how language and behavior change in social contexts. Our work is grounded in social and cognitive psychology, and develops new computational methods to study language and mental models.

Human-AI Collaboration in Evaluation

Most NLP models are designed to do one or more tasks. To train or assess how good those models are, we need some kind of ground truth to evaluate against. Creating this ground truth can be very challenging! Our work examines how and when we can use humans and AI systems together and individually to better evaluate NLP models for even the most complex tasks.

Information Ecosystems

The interconnected and rapid nature of news and social media means that people can get new information almost anywhere, anytime. How does this news spread and who does it reach, especially as it crosses social, linguistic, or medium boundaries? Our work studies whole ecosystems of how the language of information changes and the social process by which it emerges and evolves.

Publications

2025

NUTMEG is an alternative to MACE for identifying ground truth when groups of annotators systematically disagree.

NUTMEG: Separating Signal From Noise in Annotator Disagreement
Jonathan Ivey, Susan Gauch, and David Jurgens
preprint
paper

There are so many papers on linguistic coordination. We tried to make sense of them.

Coordinating Chaos: A Structured Review of Linguistic Coordination Methodologies.
Ben Litterer, David Jurgens, and Dallas Card.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL)
paper

The podcast ecosystem, colored by topic.

Mapping the Podcast Ecosystem with the Structured Podcast Research Corpus.
Ben Litterer, David Jurgens, and Dallas Card.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL)
paper · data · code · code

Moral reasoning is complex and we introduce a new dataset that captures multiple aspects of moral reasoning

Are Rules Meant to be Broken? Understanding Multilingual Moral Reasoning as a Computational Pipeline with UniMoral
Shivani Kumar, David Jurgens
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL)
paper

LLM performance on subjective tasks when fine-tuned on demographic information

Beyond Demographics: Fine-tuning Large Language Models to Predict Individuals' Subjective Text Perceptions
Matthias Orlikowski, Jiaxin Pei, Paul Röttger, Philipp Cimiano, David Jurgens, and Dirk Hovy
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL)
paper

The Noisy Path from Source to Citation: Measuring How Scholars Engage with Past Research
Hong Chen, Misha Teplitskiy, David Jurgens
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL)
paper

Authorship attribution models currently do poorly in cross-lingual settings.

The Million Authors Corpus: A Cross-Lingual and Cross-Domain Wikipedia Dataset for Authorship Verification
Abraham Israeli, Shuai Liu, Jonathan May, and David Jurgens
Findings of ACL
paper

Tokenization methods vary in their sensitivity to language variation.

Tokenization is Sensitive to Language Variation
Anna Wegmann, Dong Nguyen, and David Jurgens
Findings of ACL
paper

Generative AI can change how people use social media for better or worse

The Impact of Generative AI on Social Media: An Experimental Study
Anders Giovanni Møller, Daniel M. Romero, David Jurgens, Luca Maria Aiello
preprint
paper

Evaluation Framework for AI Systems in "the Wild"
Sarah Jabbour, Trenton Chang, Anindya Das Antar, Joseph Peper, Insu Jang, Jiachen Liu, Jae-Won Chung, Shiqi He, Michael Wellman, Bryan Goodman, Elizabeth Bondi-Kelly, Kevin Samy, Rada Mihalcea, Mosharaf Chowdhury, David Jurgens, Lu Wang
AI Lab Whitepaper
paper

Who Reaps All the Superchats? A Large-Scale Analysis of Income Inequality in Virtual YouTuber Livestreaming
Ruijing Zhao, Brian Diep, Jiaxin Pei, Dongwook Yoon, David Jurgens, Jian Zhu
Proceedings of the 2025 Conference on Human Factors in Computing Systems (CHI), 2025
paper

Different genres plotted according to their Biber-derived stylistic regularity

Neurobiber: Fast and Interpretable Stylistic Feature Extraction
Kenan Alkiek, Anna Wegmann, Jian Zhu, David Jurgens
preprint
paper

Unstructured Evidence Attribution for Long Context Query Focused Summarization
Dustin Wright, Zain Muhammad Mujahid, Lu Wang, Isabelle Augenstein, David Jurgens
preprint
paper

Not all definitions of empathy are useful for modeling empathy in language, but some are!

The Muddy Waters of Modeling Empathy in Language: The Practical Impacts of Theoretical Constructs
Allison Lahnala, Charlie Welch, David Jurgens, Lucie Flek
preprint
paper

The persuasive role of generic-you in online interactions
Minxue Niu, Emily Mower Provost, David Jurgens, Susan A. Gelman, Ethan Kross, and Ariana Orvell
Scientific Reports 15(1), 1347
paper

Hashtags spread differently depending on the network structure and the identity of the users who use them

The Role of Network and Identity in the Diffusion of Hashtags
Aparna Ananthasubramaniam, Yufei 'Louise' Zhu, David Jurgens, Daniel Romero
The Web Conference, 2025
paper

When you read an email, does it matter more who you are or how the email is written if you want a reply? Read our paper to find out!

Causally Modeling the Linguistic and Social Factors that Predict Email Response
Yinuo Xu, Hong Chen, Sushrita Rakshit, Aparna Ananthasubramaniam, Omkar Yadav, Mingqian Zheng, Michael Jiang, Lechen Zhang, Bowen Yi, Kenan Alkiek, Abraham Israeli, Bangzhao Shu, Hua Shen, Jiaxin Pei, Haotian Zhang, Miriam Schirmer, and David Jurgens.
Proceedings of the 2025 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL).
paper

The answers of LLMs align with the perceptions of specific social groups.

Aligning with Whom? Large Language Models Have Gender and Racial Biases in Subjective NLP Tasks
Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens.
Proceedings of the 2025 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL).
paper · data · code

People are more likely to read science news depending on how it is written

Modeling Public Perceptions of Science in Media
Jiaxin Pei, Dustin Wright, Isabelle Augenstein, David Jurgens
preprint
paper

Socially aware language technologies and their connections with linguistics, social sciences, and NLP

The Call for Socially Aware Language Technologies.
Diyi Yang, Dirk Hovy, David Jurgens, and Barbara Plank.
Computational Linguistics 51(2).
paper

2024

Not all good Wikipedia articles stay good. Why is that? Read our paper to find out.

A Test of Time: Predicting the Sustainable Success of Online Collaboration in Wikipedia.
Abraham Israeli, David Jurgens, and Daniel Romero.
preprint.
paper · data · code

Optimizing the system and task parts of the prompt can have huge benefits

SPRIG: Improving Large Language Model Performance by System Prompt Optimization.
Lechen Zhang, Tolga Ergen, Lajanugen Logeswaran, Moontae Lee, and David Jurgens.
preprint.
paper · data · code

The prompt matters in how human an LLM can seem

Real or Robotic? Assessing Whether LLMs Accurately Simulate Qualities of Human Responses in Dialogue.
Johnathan Ivey, Shivani Kumar, Jiayu Liu, Hua Shen, Sushrita Rakshit, Rohan Raju, Haotian Zhang, Aparna Ananthasubramaniam, Junghwan Kim, Bowen Yi, Dustin Wright, Abraham Israeli, Anders Giovanni Møller, Lechen Zhang, and David Jurgens.
preprint.
paper · data · code

Pathways of linguistic diffusion seen on Twitter

Networks and Identity Drive Geographic Properties of the Diffusion of Linguistic Innovation
Aparna Ananthasubramaniam, David Jurgens, Daniel M. Romero.
npj Complexity. 2024.
paper

The pipeline for collecting data of traumatic events

The Language of Trauma: Modeling Traumatic Event Descriptions Across Domains with Explainable AI
Miriam Schirmer, Tobias Leemann, Gjergji Kasneci, Jürgen Pfeffer, and David Jurgens.
Findings of EMNLP. 2024.
paper

LLM agents can simulate human trust behaviors

Can Large Language Model Agents Simulate Human Trust Behaviors?
Chengxing Xie, Canyu Chen, Feiran Jia, Ziyu Ye, Kai Shu, Adel Bibi, Ziniu Hu, David Jurgens, James Evans, Philip Torr, Bernard Ghanem, and Guohao Li
NeurIPS 2024
paper

Communities respond differently to the same message depending on their underlying values

ValueScope: Unveiling Implicit Norms and Values via Return Potential Model of Social Interactions
Chan Young Park, Shuyue Stella Li, Hayoung Jung, Svitlana Volkova, Tanushree Mitra, David Jurgens, and Yulia Tsvetkov.
Findings of EMNLP. 2024.
paper · data · code

Tables are data too. Maybe they can be text as well!

Tab2Text - A framework for deep learning with tabular data
Tong Lin*, Jason Yan*, David Jurgens, and Sabina Tomkins.
Findings of EMNLP. 2024.
paper

LLMs answer questions more or less accurately depending on the social roles in the question prompt

Is "A Helpful Assistant" the Best Role for Large Language Models? A Systematic Evaluation of Social Roles in System Prompts
Mingqian Zheng, Jiaxin Pei, Lajanugen Logeswaran, Moontae Lee, and David Jurgens.
Findings of EMNLP. 2024.
paper · data · code

Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions.
Hua Shen, Tiffany Knearem, Reshmi Ghosh, Kenan Alkiek, Kundan Krishna, Yachuan Liu, Ziqiao Ma, Savvas Petridis, Yi-Hao Peng, Li Qiwei, Sushrita Rakshit, Chenglei Si, Yutong Xie, Jeffrey P. Bigham, Frank Bentley, Joyce Chai, Zachary Lipton, Qiaozhu Mei, Rada Mihalcea, Michael Terry, Diyi Yang, Meredith Ringel Morris, Paul Resnick, and David Jurgens.
preprint.
paper

A Multilingual Similarity Dataset for News Article Frame
Xi Chen, Mattia Samory, Scott Hale, David Jurgens, Przemyslaw A Grabowicz
Proceedings of the International AAAI Conference on Web and Social Media (ICWSM).
paper · data

Large language models are bad at answering psychological questionnaires consistently

You don't need a personality test to know these models are unreliable: Assessing the Reliability of Large Language Models on Psychometric Instruments
Bangzhao Shu*, Lechen Zhang*, Minje Choi, Lavinia Dunagan, Dallas Card, and David Jurgens.
Proceedings of the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics.
paper · data · code

Memes are multimodal constructions where the base image template and additional text fills both have semantic value.

Social Meme-ing: Measuring Linguistic Variation in Memes
Naitian Zhou, David Jurgens, and David Bamman.
Proceedings of the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics.
paper · data · code

The empathetic alignment between an author and responder on Reddit shows most people just give advice.

Modeling Empathetic Alignment in Conversation
Jiamin Yang and David Jurgens.
Proceedings of the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics.
paper · data · code
Jiamin's amazing annotation tool: https://github.com/jessicayjm/span_alignment_annotation_tool

Strong influence connections in the global news network.

Global News Synchrony During the Start of the COVID-19 Pandemic
Xi Chen, Scott A. Hale, David Jurgens, Mattia Samory, Ethan Zuckerman, Przemyslaw Adam Grabowicz.
Proceedings of the 2024 Web Conference.
paper · data · code

The network model for estimating contextual informativeness.

Finding Educationally Supportive Contexts for Vocabulary Learning with Attention-Based Models
Sungjin Nam, Kevyn Collins-Thompson, David Jurgens and Xin Tong.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING).
paper

2023

Characteristics of and variation in suicide mortality related to retirement during the Great Recession: perspectives from the National Violent Death Reporting System.
Aparna Ananthasubramaniam, David Jurgens, Eskira Kahsay, and Briana Mezuk.
The Gerontologist gnae015. 2024.
paper

Zero-shot LLM performance on social language tasks

Do LLMs Understand Social Knowledge? Evaluating the Sociability of Large Language Models with SocKET Benchmark
Minje Choi,* Jiaxin Pei,* Sagar Kumar, Chang Shu and David Jurgens.
Proceedings of the Empirical Methods in Natural Language Processing (EMNLP). 2023.
paper · data · code

When it Rains, it Pours: Modeling Media Storms and the News Ecosystem
Ben Litterer, David Jurgens, and Dallas Card.
Proceedings of the Empirical Methods in Natural Language Processing (EMNLP). 2023.
paper · data · code

The probability that, given an appropriate message for the relationships represented by a row, the message will also be appropriate in another relationship listed in the column. Probabilities are calculated across the entire data

Your spouse needs professional help: Determining the Contextual Appropriateness of Messages through Modeling Social Relationships
David Jurgens,* Agrima Seth,* Jackson Sargent,† Athena Aghighi,† and Michael Geraci.†
Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL). 2023.
paper · data · code

Relative use of politeness strategies when annotators rewrite emails to be more polite

When Do Annotator Demographics Matter? Measuring The Influence of Annotator Demographics with the POPQUORN Dataset
Jiaxin Pei and David Jurgens.
Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII) at ACL. 2023.
paper · data · code

The causal-estimated effect of banning on users matching the style of others

Exploring Linguistic Style Matching in Online Communities: The Role of Social Context and Conversation Dynamics
Aparna Ananthasubramaniam, Hong Chen, Jason Yan, Kenan Alkiek, Jiaxin Pei, Agrima Seth, Lavinia Dunagan, Minje Choi, Benjamin Litterer and David Jurgens.
Proceedings of the 1st Workshop on Social Influence in Conversations (SICon) at ACL. 2023.
paper · data · code

Overall performance on each language. The box indicates the lower quartile to the upper quartile and the whisker indicates the maximum and the minimum. Outliers are shown as dots. Participants generally achieve better performances on languages in the training set and achieved good performance on Arabic and Dutch. Predicting intimacy in Hindi and Korean remains challenging. Moreover, performances on unseen languages generally have larger variances.

SemEval 2023 Task 9: Multilingual Tweet Intimacy Analysis
Jiaxin Pei, Vítor Silva, Maarten Bos, Yozon Liu, Leonardo Neves, David Jurgens, and Francesco Barbieri.
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval).
paper · data

The effects of personal shocks on people's social media activities

Analyzing the Engagement of Social Relationships During Life Event Shocks in Social Media
Minje Choi, David Jurgens, and Daniel Romero.
Proceedings of the International Conference on Web and Social Media (ICWSM). 2023.
paper · data · code

The influence of multilingual individuals on social connectedness in Europe

Bridging Nations: Quantifying the Role of Multilinguals in Communication on Social Media
Julia Mendelsohn, Sayan Ghosh, David Jurgens, and Ceren Budak.
Proceedings of the International Conference on Web and Social Media (ICWSM). 2023.
paper · data · code

2022

Work Expectations, Depressive Symptoms, and Passive Suicidal Ideation Among Older Adults: Evidence From the Health and Retirement Study
Briana Mezuk, Linh Dang, David Jurgens, Jacqui Smith.
The Gerontologist 62 (10), 1454-1465. 2022.
paper

The way the press portrays certain scientific results differs by where those results were described in the paper

Modeling Information Change in Science Communication with Semantically Matched Paraphrases
Dustin Wright, Jiaxin Pei, David Jurgens, and Isabelle Augenstein.
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). 2022.
paper · data · code

Not all empathy papers use empathy in the same way

A Critical Reflection and Forward Perspective on Empathy and Natural Language Processing
Allison Claire Lahnala, Charles Welch, David Jurgens, and Lucie Flek.
Proceedings of the Findings of Empirical Methods in Natural Language Processing (EMNLP Findings). 2022.
paper

POTATO: The Portable Text Annotation Tool
Jiaxin Pei, Aparna Kamakshi Ananthasubramaniam, Xingyao Wang, Naitian Zhou, Apostolos Dedeloudis, Jackson Sargent and David Jurgens.
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP): Systems Demonstrations. 2022.
paper · code

MultiCite: Modeling realistic citations requires moving beyond the single-sentence single-label setting
Anne Lauscher, Brandon Ko, Bailey Kuhl, Sophie Johnson, Arman Cohan, David Jurgens, Kyle Lo.
Proceedings of the 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). 2022.
paper · data · code

The subtle language of exclusion: Identifying the Toxic Speech of Trans-exclusionary Radical Feminists
Christina Lu and David Jurgens.
Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH). 2022.
paper · data · code

Correlations between the ways in which two news articles can be similar.

SemEval-2022 Task 8: Multilingual news article similarity
Xi Chen, Ali Zeynali, Chico Camargo, Fabian Flöck, Devin Gaffney, Przemyslaw Grabowicz, Scott Hale, David Jurgens, and Mattia Samory.
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022). 2022.
paper · data

The effect of curriculum ordering on word similarity tasks

An Attention-Based Model for Predicting Contextual Informativeness and Curriculum Learning Applications
Sungjin Nam, David Jurgens, and Kevyn Collins-Thompson.
in submission. 2022.
paper

Diversifying the Professoriate
Bas Hofstra, Daniel A. McFarland, Sanne Smith, David Jurgens.
Socius. 2022.
paper

Similarities in Redditor political affiliations and commenting activity

Classification without (Proper) Representation: Political Heterogeneity in Social Media and Its Implications for Classification and Behavioral Analysis
Kenan Alkiek, Bohan Zhang, and David Jurgens.
ACL Findings. 2022.
paper · code

Multilingual performance on grapheme to phoneme conversion

ByT5 model for massively multilingual grapheme-to-phoneme conversion
Jian Zhu, Cong Zhang, and David Jurgens.
Interspeech 2022.
paper · code

Language in Popular American Culture Constructs the Meaning of Healthy and Unhealthy Eating: Narratives of Craveability, Excitement, and Social Connection in Movies, Television, Social Media, Recipes, and Food Reviews
Bradley P. Turnwald, Margaret A. Perry, David Jurgens, Vinodkumar Prabhakaran, Dan Jurafsky, Hazel R. Markus, Alia J. Crum.
Appetite. 2022.
paper

Phone-to-audio alignment without text: A Semi-supervised Approach
Jian Zhu, Cong Zhang, and David Jurgens.
Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing.
paper · code

2021

Modeling Framing in Immigration Discourse on Social Media
Julia Mendelsohn, Ceren Budak, David Jurgens
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). 2021.
paper · code

Latent classes of biased words and their effects on toxicity

Detecting Cross-Geographic Biases in Toxicity Modeling on Social Media
Sayan Ghosh, Dylan Baker, David Jurgens, and Vinodkumar Prabhakaran.
Proceedings of the 7th Workshop on Noisy User-generated Text (W-NUT).
paper

Using Sociolinguistic Variables to Reveal Changing Attitudes Towards Sexuality and Gender
Sky Wang and David Jurgens.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)
paper

Idiosyncratic but not Arbitrary: Learning Idiolects in Online Registers Reveals Distinctive yet Consistent Individual Styles
Jian Zhu and David Jurgens.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)
paper · data · code

Measuring Sentence-Level and Aspect-Level Certainty in Science Communications
Jiaxin Pei and David Jurgens.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)
paper · data · code

Detecting Community Sensitive Norm Violations in Online Conversations
Chan Young Park, Julia Mendelsohn, Karthik Radhakrishnan, Kinjal Jain, Tushar Kanakagiri, David Jurgens and Yulia Tsvetkov.
Proceedings of the Findings of the 2021 Conference on Empirical Methods in Natural Language Processing (Findings of EMNLP)
paper

An Animated Picture Says at Least a Thousand Words: Selecting Gif-based Replies in Multimodal Dialog.
Xingyao Wang and David Jurgens.
Proceedings of the Findings of the 2021 Conference on Empirical Methods in Natural Language Processing (Findings of EMNLP)
paper · data · code
Slack gif-bot App: https://github.com/xingyaoww/gif-reply-slack-bot

A Data Science Approach to Estimating the Frequency of Driving Cessation Associated Suicide in the US: Evidence From the National Violent Death Reporting System
Tomohiro M. Ko, Viktoryia A. Kalesnikava, David Jurgens, and Briana Mezuk.
Frontiers in Public Health
paper

Learning PyTorch Through A Neural Dependency Parsing Exercise
David Jurgens.
Proceedings of the Fifth Workshop on Teaching NLP, 2021.
paper

Learning about Word Vector Representations and Deep Learning through Implementing Word2vec
David Jurgens.
Proceedings of the Fifth Workshop on Teaching NLP, 2021.
paper

2020

Author mentions in science news reveal widespread disparities across name‐inferred ethnicities.
Hao Peng, Misha Teplitskiy, David Jurgens.
Journal of Quantitative Social Sciences.
paper
(preprint)

Quantifying Intimacy In Language
Jiaxin Pei and David Jurgens.
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
paper · code
project webpage: https://blablablab.si.umich.edu/projects/intimacy/; pip-installable package: https://pypi.org/project/question-intimacy/

Condolence and Empathy in Online Communities
Naitian Zhou and David Jurgens.
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
paper
project webpage: https://blablablab.si.umich.edu/projects/condolence/

Still out there: Modeling and Identifying Russian Troll Accounts on Twitter.
Jane Im, Eshwar Chandrasekharan, Jackson Sargent, Paige Lighthammer, Taylor Denby, Ankit Bhargava, Libby Hemphill, David Jurgens, Eric Gilbert.
Proceedings of Web Science, 2020.
paper

Measuring the predictability of life outcomes with a scientific mass collaboration.
Matthew J. Salganik, Ian Lundberg, Alexander T. Kindel, Caitlin E. Ahearn, Khaled Al-Ghoneim, Abdullah Almaatouq, Drew M. Altschul, Jennie E. Brand, Nicole Bohme Carnegie, Ryan James Compton, Debanjan Datta, Thomas Davidson, Anna Filippova, Connor Gilroy, Brian J. Goode, Eaman Jahani, Ridhi Kashyap, Antje Kirchner, Stephen McKay, Allison C. Morgan, Alex “Sandy” Pentland, Kivan Polimis, Louis Raes, Daniel E. Rigobon, Claudia V. Roberts, Diana M. Stanescu, Yoshihiko Suhara, Adaner Usmani, Erik H. Wang, Muna Adem, Abdulla Alhajri, Bedoor AlShebli, Redwane Amin, Ryan B. Amos, Lisa P. Argyle, Livia Baer-Bositis, Moritz Büchi, Bo-Ryehn Chung, William Eggert, Gregory Faletto, Zhilin Fan, Jeremy Freese, Tejomay Gadgil, Josh Gagné, Yue Gao, Andrew Halpern-Manners, Sonia P. Hashim, Sonia A. Hausen, Guanhua He, Kimberly Higuera, Bernie Hogan, Ilana M. Horwitz, Lisa M. Hummel, Naman Jain, Kun Jin, David Jurgens, Patrick C. Kaminski, Areg Karapetyan, E. H. Kim, Ben Leizman, Naijia Liu, Malte Möser, Andrew E. Mack, Mayank Mahajan, Noah Mandell, Helge-Johannes Marahrens, Diana Mercado-Garcia, Viola Mocz, Katariina Mueller-Gastell, Ahmed Musse, Qiankun Niu, William P. Nowak, Hamidreza Omidvar, Andrew Or, Karen Ouyang, Katy M. Pinto, Ethan Porter, Kristin E. Porter, Crystal Qian, Tamkinat Rauf, Anahit Sargsyan, Thomas Schaffner, Landon Schnabel, Bryan Schonfeld, Ben Sender, Jonathan D. Tang, Emma Tsurkov, Austin van Loon, Onur Varol, Xiafei Wang, Zhi Wang, Julia Wang, Flora Wang, Samantha Weissman, Kirstie Whitaker, Maria K Wolters, Wei Lee Woon, James Wu, Catherine Wu, Kengran Yang, Jingwen Yin, Bingyu Zhao, Chenyun Zhu, Jeanne Brooks-Gunn, Barbara E. Engelhardt, Moritz Hardt, Dean Knox, Karen Levy, Arvind Narayanan, Brandon M. Stewart, Duncan J. Watts, and Sara McLanahan.
Proceedings of the National Academy of Sciences. Mar 2020, 201915006; DOI: 10.1073/pnas.1915006117
paper

2019

Finding Microaggressions in the Wild: A Case for Locating Elusive Phenomena in Social Media Posts
Luke Breitfeller, Emily Ahn, David Jurgens, and Yulia Tsvetkov.
Proceedings of 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019.
paper

Perceptions of social roles across cultures.
Meixing Dong, David Jurgens, Carmen Banea and Rada Mihalcea.
Proceedings of Social Informatics (SocInfo), 2019.
paper

Suicide Among Older Adults Living in or Transitioning to Residential Long-term Care, 2003 to 2015
Briana Mezuk, Tomohiro M. Ko, Viktoryia A. Kalesnikava, and David Jurgens.
JAMA Network Open 2019;2(6):e195627
paper

Wetin dey with these comments? Modeling Sociolinguistic Factors Affecting Code-switching Behavior in Nigerian Online Discussions
Innocent Ndubuisi-Obi*, Sayan Ghosh*, David Jurgens.
Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2019
paper

A Just and Comprehensive Strategy for Using NLP to Address Online Abuse
David Jurgens, Libby Hemphill and Eshwar Chandrasekharan.
Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2019
paper

Smart, Responsible, and Upper Caste Only: Measuring Caste Attitudes through Large-Scale Analysis of Matrimonial Profiles
Ashwin Rajadesingan, Ramaswami Mahalingam, David Jurgens.
Proceedings of the AAAI International Conference on Web and Social Media (ICWSM), 2019
paper

Demographic Inference and Representative Population Estimates from Multilingual Social Media Data.
Zijian Wang, Scott Hale, David Ifeoluwa Adelani, Przemyslaw Grabowicz, Timo Hartmann, Fabian Flöck and David Jurgens*.
Proceedings of the Web Conference, 2019
paper · code
*Corresponding senior author; demo: http://www.euagendas.org/m3demo/

Are All Successful Communities Alike? Characterizing and Predicting the Success of Online Communities.
Tiago Cunha, David Jurgens, Chenhao Tan and Daniel Romero.
Proceedings of the Web Conference, 2019
paper

2018

It's going to be okay: Measuring Access to Support in Online Communities.
Zijian Wang and David Jurgens.
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018
paper · data · code
supplementary: http://anthology.aclweb.org/attachments/D/D18/D18-1004.Attachment.pdf

RtGender: A Corpus of Responses to Gender for Studying Gender Bias.
Rob Voigt, David Jurgens, Vinodkumar Prabhakaran, Dan Jurafsky, and Yulia Tsvetkov.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC), 2018
paper · data

Measuring the Evolution of a Scientific Field through Citation Frames.
David Jurgens, Srijan Kumar, Raine Hoover, Dan McFarland, Dan Jurafsky.
Transactions of the Association for Computational Linguistics (TACL). 2018.
paper · data · code

Software + Data

Software

M3: Multi-modal, multilingual, multi-attribute demographic inference. You can also get it from PyPI as a pip-installable package!
GenderPerformr: Infers gender performance for any kind of online username, name, handle, etc.
Social Support: Infers how supportive, neutral, or unsupportive an online reply is
Citation Function: Infers why an author cited another paper
Immigration Framing: Infers which frames are used in a short text message about immigration. Neural models available via HuggingFace!
Social Relationships: Infers the type of social relationship between two Twitter users based on dialog and network structure
Gif-Reply Models: Multimodal models that will pick an appropriate gif reply for a given message.
Pepe the Gif-Reply Bot: A Slack App that you can deploy to automatically reply to messages on your workspace with gifs
Certainty Estimator (also on pip): A neural library for estimating the certainty/uncertainty of a statement along with which aspects are certain or uncertain (this works much better than hedges).
Intimacy Estimator (also on pip): A neural library for estimating the intimacy level of questions in conversation. Very useful for looking at social distance and social norms in conversation!

Data

Citation Function Data: labeled training data of citations by their rhetorical function.
Gender-Labeled Online Conversations: 100M online dyadic conversations from Reddit, Wikipedia, and StackExchange, labeled by gender salience. To respect the privacy and dignity of individuals, this dataset is available for non-commercial research purposes online; please email the lab PI to obtain access.
Reply Supportiveness Ratings: ~9K ratings of replies to authors rated on a 5-point Likert scale for how supportive (or unsupportive) they are. Items are balanced across Reddit, Wikipedia, and StackExchange interactions, as well as by length (equal amounts of short, medium, and long comments).
MultiCite: New citation dataset with multiple intents per context and variable-sized contexts expressing that intent (e.g., multiple sentences are needed to understand why an author cited that paper).
Gif-based dialog: 1.56M conversation turns on Twitter where the replying user has responded with a gif. Gifs are canonicalized to an image hash, which is re-usable for matching new gifs.

Prospective Students

Masters and Undergraduate Students

For current students, due to the technical work that we do in the lab, I typically require students to have taken a class on NLP or advanced Machine Learning to give them the requisite skills. Without those classes, we end up teaching you many of the same techniques in a less principled way, which takes more time. If you feel like you already have significant research experience (just not in NLP or ML), please explain what you've done. Students are typically expected to join our group's research meetings. During the school year, we'll typically have one meeting a week that is also with the project's co-supervisor (a PhD student or postdoc). Interested students should apply through this form.

PhD students

I admit roughly one PhD student per year. Sometimes students are co-admitted or co-advised, so the number of admits can vary.

For prospective PhDs, I especially like students who come with a strong computational background and some experience in social science. There are no set criteria, but your chances of admission are much better if you've contacted me (or your advisor has) and let me know of your interests and goals. Make sure to look over these pages carefully; the match should be pretty strong. A PhD student is very costly - in time and money - and I select students for my research group carefully.

If you're a current PhD student outside of CSE or SI, I'm open to collaborations. One of the best things about SI and CSE is their interdisciplinary environments, and I'm potentially open to hosting students outside my home departments (but inside UM) in the lab or co-advising on projects where it makes sense. Regardless, I'd love to hear from you, and you're always welcome to come take my classes.

PhD Students not at the University of Michigan

If you're a PhD student somewhere else and want to work with me (while being external), this could happen under the right circumstances. Typically, your advisor at your primary institution and I would co-advise you on a specific project. I typically only do these kinds of arrangements when I know your advisor (more common) or when the collaborative project makes sense (rare). To get this started, have your advisor email me (not you directly) about what the project is.

Non-PhD External Students not at the University of Michigan

I unfortunately rarely work with high schoolers, undergraduates, or master's students who are not physically at the University of Michigan. I still get emails from external students asking if we could work together on something remotely, and I really would love to, but my priority is to advise the current students at UM given the limited bandwidth I have for advising. Your best bet to work with me is to get admitted to one of our programs and then drop me an email.

Postdocs

I would love to have you all in my lab but this is generally dependent on funding (but seriously, I would take you all if I could). Email me if you think you're a good match and tell me why and we might be able to figure something out. That said, at the moment, I'm not currently actively seeking postdocs (due to funding, of course). If you're coming with your own funding, that changes everything, so drop me a line then.

The Bringing together Language and Behavior for Large-scale Analytical Breakthroughs Laboratory (Blablablab)

