NLP: The IR Perspective - Final Search Engine project


Lectures Search Engine: about this project

What is this
This search engine was implemented as a final project for the course "NLP: The IR Perspective". The indexed data are the plain-text lecture summaries written by the course's student groups.
Our search engine stores its document collection as an inverted index. This index is used both to add new documents and to answer queries.
Our ranking is based on TF-IDF.
The documents are broken into terms, which are then processed by stemming and stopword removal.
We've created a collection for every supported combination of stemmer and stopword setting.
Thus, you can run a query with any supported stemmer, with or without stopword removal. This capability gives insight into the differences between these techniques.
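
The description above can be illustrated with a minimal sketch. It is not the project's actual code: the tiny stopword set, the crude `toy_stem` suffix stripper (which is not the Porter algorithm), the sample documents, and the plain `tf * log(N / df)` weighting are all assumptions made for the example. It builds one inverted index per stemmer/stopword configuration and ranks documents by a standard TF-IDF score.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stand-in for a real stopword list.
STOPWORDS = {"the", "a", "of", "and", "is", "in", "to"}

def toy_stem(word):
    """Strip a few common suffixes. A crude placeholder, NOT the Porter stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text, stem=None, stopwords=frozenset()):
    """Lowercase, split on whitespace, drop stopwords, then optionally stem."""
    terms = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?()\"'")
        if not word or word in stopwords:
            continue
        terms.append(stem(word) if stem else word)
    return terms

def build_index(docs, stem=None, stopwords=frozenset()):
    """Inverted index: term -> {doc_id: term frequency in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text, stem, stopwords)).items():
            index[term][doc_id] = tf
    return index

def search(query, index, n_docs, stem=None, stopwords=frozenset()):
    """Score each matching document by the sum of tf * idf over the query terms."""
    scores = defaultdict(float)
    for term in tokenize(query, stem, stopwords):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties are broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Made-up sample documents standing in for lecture summaries.
docs = {
    "lecture1": "Stemming reduces words to a common stem.",
    "lecture2": "Stopwords are removed before indexing the documents.",
    "lecture3": "Indexing documents builds the inverted index.",
}

# One collection per (stemmer, stopword removal) configuration.
configs = {}
for stem_name, stem in (("none", None), ("toy", toy_stem)):
    for use_stop in (False, True):
        stopwords = STOPWORDS if use_stop else frozenset()
        configs[(stem_name, use_stop)] = build_index(docs, stem, stopwords)

plain = search("index", configs[("none", False)], len(docs))
stemmed = search("index", configs[("toy", True)], len(docs), toy_stem, STOPWORDS)
print([doc_id for doc_id, _ in plain])
print([doc_id for doc_id, _ in stemmed])
```

In this sketch the query "index" matches only lecture3 in the unstemmed collection, while the stemmed collection also matches lecture2, because "indexing" is reduced to "index". The query must be processed with the same stemmer and stopword setting as the collection it is run against.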

Supported stemmers:

We've found that the Porter stemmer performs best among these stemmers, but try it for yourself.

Stopwords
We've used Verity's stopword list.
We've created collections with and without stopwords so you can view the difference it makes on the results.
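
As a small illustration of that difference, the sketch below shows the terms a query yields with and without stopword removal. The `STOPWORDS` set and the `terms` helper are hypothetical and stand in for the real list, which is not reproduced here.

```python
# Hypothetical stopword set for illustration; not Verity's list.
STOPWORDS = {"the", "is", "of", "a", "to"}

def terms(text, remove_stopwords):
    """Lowercase and split the text, optionally dropping stopwords."""
    words = [w.strip(".,;:!?") for w in text.lower().split()]
    return [w for w in words if w and not (remove_stopwords and w in STOPWORDS)]

query = "the role of the stemmer"
with_stop = terms(query, remove_stopwords=False)
without_stop = terms(query, remove_stopwords=True)
print(with_stop)     # ['the', 'role', 'of', 'the', 'stemmer']
print(without_stop)  # ['role', 'stemmer']
```

With stopwords kept, very common words such as "the" take part in matching; with stopwords removed, only the content-bearing terms remain.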

Source code
Due to many requests, I'm placing the source code for this search engine online.
It compiles under both Windows (using Visual Studio) and Linux (using the Makefile).

  Search Engine project source-code (500KB)

