The Use of Information Retrieval for a Searchable Database of Audio Teachings
Location
CSU 253/4/5
Start Date
4-4-2011 1:30 PM
End Date
4-4-2011 3:00 PM
Student's Major
Computer Information Science
Student's College
Science, Engineering and Technology
Mentor's Name
Rebecca Bates
Mentor's Department
Integrated Engineering
Mentor's College
Science, Engineering and Technology
Description
Traditional search engines match keywords or phrases based on word frequency, location, similarity, and update time. Rankings based on these criteria do not necessarily reflect a user's interests. Latent semantic indexing builds a matrix in which each row represents a document and each column represents a word that appears in the collection. Based upon the frequencies represented in the matrix, search results can then be ranked to more accurately match the user's query. In this work, a database of audio recordings was created for a small non-profit organization and augmented with a simple tagging and search engine in order to provide a framework for semantic indexing and domain-specific search. Many of the audio files had brief summaries with some keywords, while others did not. The goal was more accurate search results using semantic indexing. This approach, implemented on a desktop system to ensure data privacy, returns more closely related results and retrieves the desired files faster, and it demonstrates how a small organization can benefit from easy search and indexing of its large audio library. Several open-source APIs and packages were used to implement the project. The Java-based Apache Lucene library was used as the search engine. Apache Derby was the database used to store the words extracted from the files, and Apache POI was used to extract relevant words from keyword files. Usability testing was performed.
Future work includes comparison with other search techniques to evaluate performance.
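The document-by-word frequency matrix described in the abstract can be illustrated with a minimal sketch in plain Java. This is not the project's implementation: the class name, method names, and sample summaries are hypothetical, and the sketch stops at raw term counts ranked by cosine similarity. A full latent semantic indexing pipeline would additionally reduce the matrix with a truncated singular value decomposition, which is omitted here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: one row per document, one column per word.
public class TermDocumentSketch {

    // Lowercase the text and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Assign each distinct word a column index, in order of first appearance.
    static Map<String, Integer> buildVocabulary(List<String> docs) {
        Map<String, Integer> vocab = new LinkedHashMap<>();
        for (String doc : docs) {
            for (String t : tokenize(doc)) {
                vocab.putIfAbsent(t, vocab.size());
            }
        }
        return vocab;
    }

    // Count how often each vocabulary word occurs; unknown words are ignored.
    static double[] toVector(String text, Map<String, Integer> vocab) {
        double[] v = new double[vocab.size()];
        for (String t : tokenize(text)) {
            Integer col = vocab.get(t);
            if (col != null) v[col]++;
        }
        return v;
    }

    // Cosine similarity between two equal-length frequency vectors.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0.0 : dot / Math.sqrt(na * nb);
    }

    // Return document indices ordered from most to least similar to the query.
    static List<Integer> rank(List<String> docs, String query) {
        Map<String, Integer> vocab = buildVocabulary(docs);
        double[] q = toVector(query, vocab);
        double[] scores = new double[docs.size()];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            scores[i] = cosine(toVector(docs.get(i), vocab), q);
            order.add(i);
        }
        // Sort by descending score; ties keep their original order.
        order.sort(Comparator.comparingDouble((Integer i) -> -scores[i]));
        return order;
    }

    public static void main(String[] args) {
        // Made-up summaries standing in for the keyword files.
        List<String> docs = Arrays.asList(
            "teaching on patience and forgiveness",
            "morning talk on gratitude",
            "forgiveness forgiveness and letting go");
        System.out.println(rank(docs, "forgiveness"));
    }
}
```

In a system like the one described, the rows would presumably be built from the summaries and keywords attached to each audio file, so recordings without summaries would contribute empty rows until they are tagged.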
Recommended Citation
Ologunde, Adedayo. "The Use of Information Retrieval for a Searchable Database of Audio Teachings." Undergraduate Research Symposium, Mankato, MN, April 4, 2011.
https://cornerstone.lib.mnsu.edu/urs/2011/poster-session-C/17