Mihai Lintean, Ph.D.
Research Scientist /
Full Stack Developer

Carney Labs LLC
Alexandria, VA

Research Interests

My research is in Natural Language Processing (NLP), with applications in dialogue-based intelligent tutoring systems and a particular focus on statistical analysis of texts, text-to-text semantic similarity assessment, machine learning, and automatic question generation. I am also interested in related fields such as Human-Computer Interaction, Information Retrieval, Data Mining, and Artificial Intelligence in general.

In my graduate research work at the University of Memphis I was involved in several NLP-related problems:

I. Detection of Functional Tags from parsed Penn Treebank syntactic trees. Functional tags can be very useful in common NLP tasks such as Summarization or Question Answering.

II. Natural language understanding through semantic similarity assessment. My main research work addressed the problem of language understanding using a text-to-text similarity approach. Notable outcomes of this work include my doctoral dissertation ("Measuring Semantic Similarity: Representations and Methods") and a number of journal and conference papers.

III. Natural Language Processing in Intelligent Tutoring Systems. At the University of Memphis' Institute for Intelligent Systems (IIS), I collaborated closely with researchers from various departments (computer science, mathematics, psychology, linguistics) in developing cutting-edge dialogue-based tutoring systems. Specifically, I worked on two projects: MetaTutor, a multi-agent, adaptive system that trains high school and college students to use self-regulatory processes while learning about complex science topics such as human body systems; and DeepTutor, which aims to develop an advanced dialogue-based tutoring system that fosters students' deep understanding of complex science topics through quality interaction and instruction.

IV. Question Generation, the task of automatically generating questions from various inputs such as raw text, databases, or semantic representations. Question generation can be an important language processing component in contexts such as advanced learning technologies, dialogue systems, automated assessment, and search interfaces.

Other specific problems I have worked on:

1. Detection of biomedical entities in biomedical articles, using different statistical prediction models (mainly Bayesian models and decision trees).

2. Extraction of statistical information characterizing word usage in English (IDF weighting) from the Wikipedia collection of documents.

3. Creation of a data set of question-answer pairs extracted from YahooAnswers.com, intended for use in a shared task at future Question Generation workshops.

4. Automatic detection of student mental models during prior knowledge activation in MetaTutor.
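
For readers unfamiliar with the IDF weighting mentioned in item 2, here is a minimal sketch. It assumes the common definition idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t. The function name, the whitespace tokenization, and the toy corpus are hypothetical illustrations, not the actual Wikipedia processing pipeline.

```python
import math
from collections import Counter


def inverse_document_frequency(documents):
    """Return {term: log(N / df)}; df counts the documents that contain the term."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # Count each term once per document, however often it repeats there.
        doc_freq.update(set(doc.lower().split()))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


# Hypothetical toy corpus standing in for a real document collection.
toy_corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a question about the weather",
]
idf = inverse_document_frequency(toy_corpus)
```

Under this definition a word that appears in every document (here "the") gets a weight of zero, while rarer words get larger weights, which is why IDF is useful for down-weighting uninformative terms when comparing texts.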