The RAISE Lab @ Penn State is a group of Artificial Intelligence (AI) researchers who make foundational contributions to the field of Responsible AI for Social Emancipation. Our goal is to advance the state of the art in AI tools and algorithms to solve critical challenges faced by marginalized communities around the world, while ensuring that our algorithms do not exacerbate existing health/social/economic inequities in society.
Our research is highly interdisciplinary; we closely collaborate with domain experts in public health, social work, agronomy, conservation, and public safety and security (among others) to build an understanding of key societal issues in these domains; we then develop state-of-the-art AI tools and algorithms that address these issues. In particular, we conduct fundamental AI research in the sub-fields of spatiotemporal deep learning, social network analysis, game theory, and FAT-ML (fairness, accountability, and transparency in ML), while drawing on techniques from multi-agent systems and operations research. Aiming to address the most pressing problems in present-day societies, the RAISE Lab intends to bridge the divide between theory and practice in AI research, not only by providing strong methodological contributions, but also by building applications that are tested and deployed in the real world. These applications have fundamentally altered practices in the domains that we have worked in. A unique aspect of our research is that we spend a considerable amount of time in the field, whether in urban settings in Los Angeles or in rural settings in Kenya, Ethiopia, and India, to translate theory into practice, and to ensure that our AI models and algorithms actually get deployed in the real world.
Specifically, we work on advancing AI research motivated by the grand challenges of the American Academy of Social Work and Social Welfare and the UN Sustainable Development agenda. We are particularly interested in problems faced by under-served communities around the world, and in developing AI-driven tools and techniques to tackle these problems. While developing these solutions, a key focus of ours is to ensure that our algorithms do not exacerbate existing health/social/economic inequities in society.
Examples of research projects include (i) AI for raising awareness about HIV among homeless youth; (ii) AI for mitigating the substance abuse crisis among homeless youth; (iii) AI for helping smallholder farmers mitigate the impacts of climate change; and (iv) AI for designing optimal testing policies for COVID-19.
The list of current members and alumni can be found here. A comprehensive list of all publications that have emerged from the RAISE Lab can be found here.
Code-mixing (CM) is commonly observed in face-to-face conversations and online communication in multilingual societies. It refers to the juxtaposition of linguistic units from two or more languages within a conversation, or sometimes even within a single utterance. Code-mixing presents unique challenges, including syntactic incongruence and semantic blending, which are not commonly encountered in monolingual contexts. In addition, code-mixing often involves low-resource languages, which adds further difficulty in processing such text. The ability to process and understand code-mixing is essential for developing truly inclusive AI systems that can accommodate the linguistic diversity of users worldwide, especially speakers of low-resource languages. In recent years, large language models (LLMs) have transformed natural language processing (NLP) by providing unprecedented capabilities for comprehending, producing, and interacting with human language. These models are trained on large text corpora, enabling them to grasp a wide array of linguistic patterns and nuances. LLMs are remarkably adept at handling monolingual text; moreover, multilingual language models have shown decent performance in multilingual settings and natural language understanding tasks. Unfortunately, the effectiveness of current state-of-the-art monolingual and multilingual LLMs on code-mixed text has not yet been fully explored. We open up a new research agenda focused on the design and development of large language models that can handle code-mixed text.
Existing work in Explainable Artificial Intelligence (XAI) has focused on developing techniques to interpret decisions made by black-box machine learning (ML) models. In particular, counterfactual (CF) explanation methods find a new counterfactual example which is similar to the input instance but receives a different (often opposite) prediction from the ML model. Counterfactual and recourse explanations offer a contrastive case, i.e., they attempt to find the smallest modification to the feature values of an instance that changes the ML model's prediction on that instance to a predefined output. CF explanation techniques are often preferred by human end-users because of their ability to provide actionable recourse to individuals who are negatively impacted by algorithm-mediated decisions; for example, CF explanation techniques can be used to provide algorithmic recourse for impoverished loan applicants who have been denied a loan by a bank's ML algorithm. Existing CF explanation methods suffer from two major limitations. First, to the best of our knowledge, all prior methods belong to the post-hoc explanation paradigm, which is designed for use with proprietary ML models. As a result, their procedure for generating CF explanations is uninformed by the training of the ML model, which leads to misalignment between model predictions and explanations. Second, existing CF explanation methods generate recourses under the assumption that the underlying target ML model remains stationary over time. However, due to commonly occurring distribution shifts in training data, ML models are constantly updated in practice, which can render previously generated recourses invalid and diminish end-users' trust in our algorithmic framework. To address these problems, we propose two novel frameworks: (i) CounterNet, an end-to-end learning framework which integrates ML model training and the generation of corresponding counterfactual (CF) explanations into a single end-to-end pipeline; and (ii) RoCourseNet, a training framework that jointly optimizes predictions and recourses that are robust to future data shifts. CounterNet makes a novel departure from the prevalent post-hoc paradigm of generating CF explanations by integrating predictive model training and CF explanation generation into a single pipeline. Built on top of CounterNet, RoCourseNet addresses the issue of data shifts by leveraging adversarial training to solve a tri-level robust recourse generation problem. CounterNet and RoCourseNet outperform state-of-the-art baselines by achieving high (robust) CF validity and low proximity scores.
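To make the post-hoc paradigm (the one CounterNet departs from) concrete, here is a minimal sketch of gradient-based counterfactual search in the style of standard post-hoc CF methods. It is not the CounterNet or RoCourseNet implementation; `model` is assumed to be a differentiable binary classifier that outputs a probability, and the loss weight `lam` is a made-up parameter.

```python
import torch

def find_counterfactual(model, x, lam=0.1, steps=500, lr=0.01):
    """Gradient-based search for a counterfactual example x_cf that is
    close to x (proximity) but is predicted as the positive class (validity)."""
    x_cf = x.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([x_cf], lr=lr)
    target = torch.ones_like(model(x))            # desired (flipped) prediction
    for _ in range(steps):
        optimizer.zero_grad()
        pred = model(x_cf)                        # probability of the positive class
        validity = torch.nn.functional.binary_cross_entropy(pred, target)
        proximity = torch.norm(x_cf - x, p=1)     # stay close to the original instance
        loss = validity + lam * proximity         # trade off validity against proximity
        loss.backward()
        optimizer.step()
    return x_cf.detach()
```

Note how the search above only queries the already-trained model; CounterNet instead trains the predictor and the CF generator jointly, so the explanations are informed by model training rather than bolted on afterwards.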
More than 300,000 farmers have committed suicide in India since 1995. In fact, this problem is not specific to India; it is quite common in many developing countries. It arises primarily from the financial hardship caused by an inability to grow crops successfully, and an inability to sell crops at a profit. In this project, our goal is to develop an easy-to-use, AI-based decision support system for poor, illiterate farmers which can give them data-driven recommendations at multiple stages of the crop-growing lifecycle. For example, when should they plant their crops? When should they irrigate their farm? Is their farm at risk of pest invasion? After the crop is harvested, when and where should they sell it for maximum profit? From a technical perspective, problems in this space translate to spatio-temporal machine learning problems because of the abundance of remote-sensed and crowdsourced data that we have access to (thanks to our collaborators at PlantVillage).
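As a rough illustration of how such questions take a spatio-temporal learning form, the toy model below classifies a plot's pest-invasion risk from a weekly time series of remote-sensed signals. This is a hypothetical sketch, not one of our deployed systems; the choice of features (NDVI, rainfall, temperature) and of an LSTM are assumptions made for the example.

```python
import torch
import torch.nn as nn

class PestRiskLSTM(nn.Module):
    """Toy spatio-temporal model: predict pest-invasion risk for a farm plot
    from a per-plot time series of remote-sensed features."""
    def __init__(self, n_features=3, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                         # x: (plots, weeks, n_features)
        _, (h, _) = self.lstm(x)
        return torch.sigmoid(self.head(h[-1]))    # per-plot risk probability

# Hypothetical usage: 8 plots, 20 weeks of NDVI/rainfall/temperature readings.
model = PestRiskLSTM()
risk = model(torch.randn(8, 20, 3))               # shape (8, 1)
```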
We focus on three problems faced by hard-to-reach populations such as homeless youth. We develop AI/ML algorithmic interventions for (i) HIV (and STI) prevention among homeless youth; (ii) substance abuse prevention among homeless youth; and (iii) suicide prevention among homeless youth. All three projects focus on the study of diffusion processes in friendship-based social networks of homeless youth, and on how these processes can be harnessed to achieve desirable behavior. On a humanitarian level, the end goal of this project is to demonstrably reduce the suffering of disadvantaged populations by influencing and inducing behavior change in homeless youth populations that drives them towards safer practices, such as regular HIV testing and reduced substance abuse. On a scientific level, the goal is not only to model these influence-spread phenomena, but also to develop decision support systems (and the necessary tools/algorithms/mechanisms) with which algorithmic interventions can be conducted in social networks of homeless youth in the most efficient manner. Our primary focus in this project is to develop algorithms and tools that are actually usable and deployable in the real world, i.e., algorithms that can genuinely benefit society. In fact, we strive to validate all our models, algorithms, and techniques in the real world by testing them with actual homeless youth (specifically youth in Los Angeles). Over the past seven years, we have been collaborating with social workers from Safe Place for Youth (SPY) and My Friend's Place (homeless shelters in Los Angeles) to conduct pilot deployment studies of our algorithms with actual homeless youth.
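To give a flavor of the underlying computational problem, the sketch below applies the classic greedy algorithm for influence maximization under an independent cascade model: pick the set of peer leaders whose expected information spread through the friendship network is largest. This is a simplified illustration, not our deployed pipeline; real deployments must additionally handle uncertain network structure, multi-stage interventions, and youth availability, and the propagation probability `p` here is a made-up parameter.

```python
import random
import networkx as nx

def simulate_cascade(G, seeds, p=0.1):
    """One independent-cascade run: each newly informed youth informs
    each uninformed friend independently with probability p."""
    informed, frontier = set(seeds), list(seeds)
    while frontier:
        newly_informed = []
        for u in frontier:
            for v in G.neighbors(u):
                if v not in informed and random.random() < p:
                    informed.add(v)
                    newly_informed.append(v)
        frontier = newly_informed
    return len(informed)

def greedy_peer_leaders(G, k=3, p=0.1, trials=200):
    """Greedily pick k peer leaders that maximize the expected spread,
    estimated by Monte Carlo simulation of the cascade."""
    seeds = []
    for _ in range(k):
        best, best_spread = None, -1.0
        for v in G.nodes:
            if v in seeds:
                continue
            spread = sum(simulate_cascade(G, seeds + [v], p) for _ in range(trials)) / trials
            if spread > best_spread:
                best, best_spread = v, spread
        seeds.append(best)
    return seeds

# Hypothetical usage on a synthetic friendship network of 50 youth.
G = nx.erdos_renyi_graph(50, 0.08, seed=0)
print(greedy_peer_leaders(G, k=3))
```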
COVID-19 is the greatest public health crisis that the world has experienced in the last century. Tackling it requires the collective will of experts from a variety of disciplines. While AI researchers have devoted much effort to developing agent-based models for simulating the transmission of COVID-19, we believe that AI's enormous potential can (and should) be leveraged to design decision support systems (e.g., for the allocation of limited healthcare resources such as testing kits) which can assist epidemiologists and policy makers in their fight against this pandemic. In particular, COVID-19 testing kits are extremely limited, especially in developing countries; therefore, it is very important to utilize these limited testing resources in the most effective manner. In this project, we research adaptive testing policies to optimally mitigate the COVID-19 epidemic in low-resource developing countries such as Panama. Our work is informed by multiple discussions with epidemiologists.
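As a sketch of what an adaptive testing policy can look like (a simplified, hypothetical illustration rather than the policy studied in this project), the snippet below uses Thompson sampling over per-region Beta posteriors of test positivity to decide how to split a day's limited kits across regions.

```python
import numpy as np

def allocate_tests(pos, neg, kits, rng=None):
    """Allocate today's limited testing kits across regions.

    pos, neg: positive/negative test counts observed so far in each region.
    A prevalence estimate is drawn from each region's Beta(pos+1, neg+1)
    posterior (Thompson sampling), and kits are split in proportion to it,
    so regions with high or uncertain positivity get tested more.
    """
    rng = rng or np.random.default_rng()
    sampled_prevalence = rng.beta(np.asarray(pos) + 1, np.asarray(neg) + 1)
    weights = sampled_prevalence / sampled_prevalence.sum()
    return np.floor(weights * kits).astype(int)

# Hypothetical usage: three regions, 500 kits available today.
print(allocate_tests(pos=[30, 5, 12], neg=[200, 180, 60], kits=500))
```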
Security is a critical concern around the world, whether it is the challenge of protecting ports, airports and other critical national infrastructure, or protecting wildlife/forests and fisheries, or suppressing crime in urban areas. In many of these cases, limited security resources prevent full security coverage at all times. Instead, these limited resources must be allocated and scheduled efficiently, avoiding predictability, while simultaneously taking into account an adversary’s response to the security coverage, the adversary’s preferences and potential uncertainty over such preferences and capabilities. Computational game theory can help us build decision-aids for such efficient security resource allocation problems.
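As a small, hedged example of such a decision aid (a simplified maximin formulation with made-up utilities, rather than the full Stackelberg security game models used in deployed systems), the linear program below computes a randomized coverage of targets that maximizes the defender's worst-case utility given a limited number of patrol resources.

```python
import numpy as np
from scipy.optimize import linprog

def maximin_coverage(u_cov, u_unc, resources):
    """Defender's maximin coverage in a simple security game.

    u_cov[t] / u_unc[t]: defender utility if target t is attacked while
    covered / uncovered. Decision variables are the coverage levels c_t in
    [0,1] and the worst-case value v; we maximize v subject to
    v <= u_unc[t] + c_t * (u_cov[t] - u_unc[t]) for every target t.
    """
    n = len(u_cov)
    obj = np.zeros(n + 1)
    obj[-1] = -1.0                                 # maximize v == minimize -v
    A_ub, b_ub = [], []
    for t in range(n):
        row = np.zeros(n + 1)
        row[t] = -(u_cov[t] - u_unc[t])            # -c_t * (u_cov - u_unc)
        row[-1] = 1.0                              # + v
        A_ub.append(row)
        b_ub.append(u_unc[t])
    A_ub.append(np.r_[np.ones(n), 0.0])            # total coverage <= resources
    b_ub.append(resources)
    bounds = [(0.0, 1.0)] * n + [(None, None)]
    res = linprog(obj, A_ub=np.array(A_ub), b_ub=b_ub, bounds=bounds)
    return res.x[:n], res.x[-1]                    # coverage vector, game value

# Hypothetical usage: three targets, one patrol resource.
cov, value = maximin_coverage(u_cov=[0.0, 0.0, 0.0], u_unc=[-5.0, -10.0, -3.0], resources=1)
```

In a full Stackelberg security game, the attacker is modeled as best-responding to the defender's (observable) randomized strategy, which leads to richer formulations than this worst-case linear program.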