Lectures Search Engine
What is this
This search engine was implemented as a final project for the
course "NLP: The IR Perspective". The indexed data are the
plain-text lecture summaries written by the course's student
groups.
Our search engine stores its document collection as an inverted
index, a table that maps each term to the documents containing it.
The same structure is updated when new documents are indexed and
consulted when queries are run.
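
As a rough illustration of the idea, here is a minimal Python sketch of
an inverted index. It is not the project's code: the function names
(build_inverted_index, lookup), the whitespace tokenization, and the
sample documents are all assumptions made for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a {doc_id: occurrence count} posting table."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def lookup(index, query):
    """Return the ids of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    # .get() avoids creating empty entries for unknown terms.
    postings = [set(index.get(term, {})) for term in terms]
    return set.intersection(*postings)


# Hypothetical sample documents, not taken from the real collection.
docs = {
    "lecture1": "stemming reduces words to stems",
    "lecture2": "stopwords are very common words",
}
index = build_inverted_index(docs)
```

With this sketch, lookup(index, "stemming words") intersects the posting
tables of both terms, so only documents containing both are returned.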
Our ranking is based on TF-IDF.
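
The following Python sketch shows one common form of TF-IDF scoring,
assuming the ranking above refers to standard TF-IDF weighting. The
exact variant used by the project is not stated here, so the raw term
count, the log(N / df) weight, the function name tfidf_rank, and the
sample documents are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_rank(docs, query):
    """Rank documents by the summed TF-IDF weight of the query terms."""
    n_docs = len(docs)
    term_counts = {
        doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()
    }
    scores = {}
    for term in query.lower().split():
        # Document frequency: number of documents containing the term.
        df = sum(1 for counts in term_counts.values() if term in counts)
        if df == 0:
            continue  # a term that appears nowhere contributes nothing
        idf = math.log(n_docs / df)
        for doc_id, counts in term_counts.items():
            if term in counts:
                scores[doc_id] = scores.get(doc_id, 0.0) + counts[term] * idf
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample documents, not taken from the real collection.
sample_docs = {
    "a": "index index query",
    "b": "query terms",
    "c": "terms only",
}
ranked = tfidf_rank(sample_docs, "index query")
```

Terms that occur in few documents get a larger weight, so document "a",
which contains the rarer term "index" twice, ranks above document "b".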
Documents are broken into terms, which are then normalized by
stemming and, optionally, by removing stopwords.
We've created a separate collection for every supported combination
of stemmer and stopword setting.
Thus, you can run the same query with any supported stemmer, with or
without stopword removal. Comparing the results gives insight into
how these term-processing techniques differ.
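
To make the configurations concrete, here is a small Python sketch of a
term pipeline with an optional stemmer and optional stopword removal.
Everything in it is hypothetical: toy_stem is a crude suffix stripper
standing in for a real stemmer, STOPWORDS is a tiny illustrative list
(not Verity's), and to_terms is a name invented for this example.

```python
import re

# Tiny illustrative stopword list; the real list is much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are"}


def toy_stem(word):
    """Strip a couple of common suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def to_terms(text, stemmer=None, drop_stopwords=False):
    """Split text into lowercase terms under one configuration."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if drop_stopwords:
        words = [w for w in words if w not in STOPWORDS]
    if stemmer is not None:
        words = [stemmer(w) for w in words]
    return words


# One term list per (stemmer, stopword setting) combination.
text = "The indexing of the lectures"
configs = {
    (name, drop): to_terms(text, stemmer, drop)
    for name, stemmer in (("none", None), ("toy", toy_stem))
    for drop in (False, True)
}
```

Each entry of configs corresponds to one collection in the sense
described above: the same text yields different terms, and therefore
different matches, under each configuration.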
Stopwords
We use Verity's stopword list.
We've created collections with and without stopwords so you can
see the difference they make in the results.
Source code
Due to many requests, I'm placing the source code for this
search engine online.
It compiles under both Windows (using Visual Studio) and Linux
(using the Makefile).
Search Engine project source-code (500KB)