The Use of Information Retrieval for a Searchable Database of Audio Teachings

Location

CSU 253/4/5

Start Date

4-4-2011 1:30 PM

End Date

4-4-2011 3:00 PM

Student's Major

Computer Information Science

Student's College

Science, Engineering and Technology

Mentor's Name

Rebecca Bates

Mentor's Department

Integrated Engineering

Mentor's College

Science, Engineering and Technology

Description

Traditional search engines match keywords or phrases based on word frequency, location, similarity, and update time. Ranking results by these criteria does not necessarily reflect what a user is actually interested in. Latent semantic indexing (LSI) instead builds a matrix in which each row corresponds to a document and each column to a word appearing in the collection, with each cell recording how often that word occurs in that document. A truncated singular value decomposition of this matrix groups words that tend to occur together into a smaller set of latent concepts, so search results can then be ranked to match the user's query more accurately.

In this work, a database of audio recordings was created for a small, non-profit organization and augmented with a simple tagging and search engine, providing a framework for semantic indexing and domain-specific search. Many of the audio files had brief summaries with some keywords, while others had none. The goal was to obtain more accurate search results through semantic indexing. The approach, implemented on a desktop system to keep the data private, returns more closely related results, retrieves the desired files faster, and demonstrates how a small organization can benefit from easy search and indexing of its large audio library.

Several open-source APIs and packages were used to implement the project. Apache Lucene, a Java search library, served as the search engine; Apache Derby was the database that stored the words extracted from the files; and Apache POI was used to extract relevant words from the keyword files. Usability testing was performed.
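
The ranking idea behind latent semantic indexing can be illustrated with a small, dependency-free Java sketch. It is not the project's implementation, which relied on Lucene, Derby, and POI; the class `LsiSketch`, its methods, the fixed iteration count, and the toy corpus are all hypothetical. The sketch builds the document-by-word count matrix A described above, approximates its top k right singular vectors by power iteration with deflation on A^T A (a simple stand-in for a library SVD routine), projects each document and the query into that k-dimensional concept space, and scores documents by cosine similarity.

```java
import java.util.*;

// Hypothetical LSI sketch: names and data are illustrative, not the project's actual code.
final class LsiSketch {
    private final Map<String, Integer> vocab = new LinkedHashMap<>(); // word -> column index
    private final List<double[]> basis = new ArrayList<>();          // top-k right singular vectors of A
    private final double[][] docConcepts;                           // row i = document i in concept space

    LsiSketch(List<String> docs, int k) {
        for (String d : docs)
            for (String w : tokenize(d)) vocab.putIfAbsent(w, vocab.size());
        int n = vocab.size();
        // Document-term matrix A: one row per document, one column per word, cells hold raw counts.
        double[][] a = new double[docs.size()][];
        for (int i = 0; i < a.length; i++) a[i] = termVector(docs.get(i));
        // M = A^T A; its leading eigenvectors are the leading right singular vectors of A.
        double[][] m = new double[n][n];
        for (double[] row : a)
            for (int p = 0; p < n; p++)
                for (int q = 0; q < n; q++) m[p][q] += row[p] * row[q];
        Random rng = new Random(42); // fixed seed so runs are repeatable
        for (int c = 0; c < k; c++) {
            double[] v = new double[n];
            for (int j = 0; j < n; j++) v[j] = rng.nextDouble() + 0.1;
            if (!powerIterate(m, v)) break; // remaining eigenvalues are (numerically) zero
            double lambda = dot(v, multiply(m, v)); // Rayleigh quotient of the unit vector v
            basis.add(v);
            // Deflate M so the next pass converges to the next-largest eigenvector.
            for (int p = 0; p < n; p++)
                for (int q = 0; q < n; q++) m[p][q] -= lambda * v[p] * v[q];
        }
        docConcepts = new double[a.length][];
        for (int i = 0; i < a.length; i++) docConcepts[i] = project(a[i]);
    }

    // Cosine similarity between the query and every document, measured in concept space.
    double[] scores(String query) {
        double[] q = project(termVector(query));
        double[] out = new double[docConcepts.length];
        for (int i = 0; i < out.length; i++) out[i] = cosine(q, docConcepts[i]);
        return out;
    }

    private double[] termVector(String text) {
        double[] vec = new double[vocab.size()];
        for (String w : tokenize(text)) {
            Integer j = vocab.get(w);
            if (j != null) vec[j] += 1; // words never seen in the corpus are ignored
        }
        return vec;
    }

    private double[] project(double[] termVec) {
        double[] out = new double[basis.size()];
        for (int c = 0; c < out.length; c++) out[c] = dot(termVec, basis.get(c));
        return out;
    }

    // Repeatedly applies M and renormalizes v; returns false if M v collapses to ~0.
    private static boolean powerIterate(double[][] m, double[] v) {
        for (int it = 0; it < 500; it++) {
            double[] mv = multiply(m, v);
            double norm = Math.sqrt(dot(mv, mv));
            if (norm < 1e-9) return false;
            for (int j = 0; j < v.length; j++) v[j] = mv[j] / norm;
        }
        return true;
    }

    private static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
            if (!t.isEmpty()) out.add(t);
        return out;
    }

    private static double[] multiply(double[][] m, double[] v) {
        double[] out = new double[m.length];
        for (int p = 0; p < m.length; p++) out[p] = dot(m[p], v);
        return out;
    }
    private static double dot(double[] x, double[] y) {
        double s = 0;
        for (int j = 0; j < x.length; j++) s += x[j] * y[j];
        return s;
    }
    private static double cosine(double[] x, double[] y) {
        double denom = Math.sqrt(dot(x, x) * dot(y, y));
        return denom == 0 ? 0 : dot(x, y) / denom;
    }

    public static void main(String[] args) {
        // Toy "summaries" standing in for audio-file descriptions (hypothetical data).
        LsiSketch lsi = new LsiSketch(List.of(
                "meditation breath awareness",
                "breath awareness practice",
                "cooking rice recipe"), 2);
        System.out.println(Arrays.toString(lsi.scores("meditation")));
    }
}
```

With these three toy summaries, k = 2, and the query `meditation`, the second summary (`breath awareness practice`) should score close to 1.0 even though it never contains the word, because it shares `breath` and `awareness` with the first summary, while the cooking summary should score close to 0. A plain keyword match would miss the second summary entirely, which is the behavior LSI is meant to improve on. Power iteration keeps the example self-contained; a real system would more likely call an SVD routine from a numerical library.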

Future work includes comparison with other search techniques to evaluate performance.

Recommended Citation

Ologunde, Adedayo. "The Use of Information Retrieval for a Searchable Database of Audio Teachings." Undergraduate Research Symposium, Mankato, MN, April 4, 2011.
https://cornerstone.lib.mnsu.edu/urs/2011/poster-session-C/17