On January 25th, 2019, Iowa’s Innovation, Business and Law Center hosted University of Iowa computer science Professor Padmini Srinivasan for a presentation in the IBL Technology for Lawyers Series. Professor Srinivasan’s presentation, titled “Text Mining: A Field of Opportunities and Challenges,” examined the process and challenges of using automated algorithms to collect text from numerous digital sources (social media posts, emails, technical writings, etc.) and parse it without human intervention. Professor Srinivasan also explored the application of text mining to a range of domains.

            Professor Srinivasan began by providing a high-level conceptual view of a “web crawler,” a program that follows the links between webpages to move from page to page, visiting thousands of pages in minutes. Crawlers of this kind have assisted Professor Srinivasan in some of her earlier projects.
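
The crawling idea described above can be illustrated with a minimal Python sketch. It is not the crawler discussed in the presentation: the breadth-first strategy, the `crawl` and `LinkExtractor` names, and the tiny in-memory `PAGES` "web" (used in place of real network requests) are all assumptions made for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl: visit a page, queue its unseen links, repeat.

    `fetch` is a hypothetical callable returning a page's HTML, or None
    if the page cannot be retrieved.
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # skip pages that could not be retrieved
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Toy stand-in for the web (assumed data, no network access needed).
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a>',
    "http://example.test/b": '<a href="/">home</a> <a href="/missing">x</a>',
}

visited = crawl("http://example.test/", PAGES.get)
print(visited)
```

A crawler used on the real web would also need to respect each site's robots.txt rules, limit its request rate, and handle network errors, none of which this sketch attempts.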

            Professor Srinivasan described her use of this technology and other text mining tools in biomedical research. One project she described used automated text processing to analyze patient records and antimicrobial prescription information. Using this data, Professor Srinivasan and her team ascertained the de facto norms of prescribing antimicrobial medication. Over the course of this project, they produced a model that predicts whether a patient will be prescribed antimicrobial medication.

            Another example of Professor Srinivasan’s work was text mining of companies’ financial information in an attempt to predict the fiscal future of those companies. By text mining six years of data from the 10-K filings of over 1300 companies, Professor Srinivasan’s team produced a model that predicted the fiscal outcome of the companies more effectively than a baseline of analyst recommendations.

            Professor Srinivasan’s work also encompasses text mining of social media. For instance, she has worked to develop text mining programs that examine social media (specifically, Twitter) and attempt to gauge the overall life satisfaction or dissatisfaction of the platform’s users, based on the language of the users’ tweets. Another project employing automated text mining studied the positivity or negativity of given social media topics at given times (such as when there are substantially more positive tweets on a topic than negative tweets). Using text mining and language-based analysis of social media posts, Professor Srinivasan analyzed the public’s perception of Barack Obama and Mitt Romney during the 2012 presidential election on particular voter issues (such as the social media participants’ view of the honesty of the candidates).

            To provide deeper context, Professor Srinivasan also outlined many of the challenges facing the field. One challenge is an automated program’s capacity to determine whether text truly means what it says or is instead ironic or sarcastic (such as “I am so glad that it is -35 degrees today”). Another challenge is ensuring that the program can determine the meaning of a text in light of the many synonyms in the English language (“I am so happy/glad/thankful/delighted/blessed/etc. for X”). The field of text mining is working to overcome these challenges, as they can be an important factor in successful textual data analysis.
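
Both challenges can be seen in a minimal word-list sentiment scorer, sketched here in Python. This is an illustration only and not a method attributed to Professor Srinivasan: the `POSITIVE` and `NEGATIVE` word lists and the `score` and `label` functions are hypothetical, and practical systems use far richer models.

```python
import re

# Hypothetical, deliberately tiny word lists (assumptions for illustration).
POSITIVE = {"happy", "glad", "thankful", "delighted", "blessed"}
NEGATIVE = {"sad", "angry", "upset", "miserable"}


def score(text):
    """Count positive words minus negative words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


def label(text):
    """Map the score to a coarse sentiment label."""
    s = score(text)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


# Synonyms: a word is only recognized if it appears in the list.
print(label("I am so delighted for X"))  # "delighted" is listed
print(label("I am so joyful for X"))     # "joyful" is not listed

# Sarcasm: the literal wording is positive, the intended meaning is not.
print(label("I am so glad that it is -35 degrees today"))
```

The scorer labels the sarcastic sentence as positive because it sees only the word “glad,” and it treats “joyful” as neutral because that synonym is missing from its list. Closing those two gaps is the kind of problem the challenges above describe.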

            In her conclusion, Professor Srinivasan described the broad capabilities that text mining may bring and outlined its potential application to various other fields. As one example, text mining may have applications in the legal field, allowing for “tracing” key terms and legal concepts through large numbers of cases. An example of this is the groundbreaking work of the College of Law’s Professor Lea VanderVelde, who is working on a major project involving text mining of antebellum-era texts. This project involves text mining the United States Territorial Papers and travel diaries written in the early 1800s in order to analyze the legal and economic mechanisms of the American frontier. With this knowledge, Professor VanderVelde will contribute to a greater understanding of the expansion of empires and the American identity.