| "Cubum autem in duos cubos, aut quadrato-quadratum in duos quadrato-quadratos, et generaliter nullam in infinitum ultra quadratum potestatem in duos eiusdem nominis fas est dividere cuius rei demonstrationem mirabilem sane detexi. Hanc marginis exiguitas non caperet" ---Pierre de Fermat |

I enjoy reading and studying Latin, particularly its Ecclesiastical form. I wrote a little program to help me study the Vulgate, which you can play with here. (It parses the Vulgate and the Douay-Rheims translation so that you can read the verses in alternating Latin/English, and it gives you a morphological analysis tool for each word, so that you can determine which inflected form is being used.) Have fun! :)
If I had all the free time in the world, I'd also be working on the Free Learn Conversational Latin Project, a project I started to create freely available audio lessons for conversational Latin (and the tools you'd need to create such recordings/materials for other languages).
You might also enjoy these useful Latin links.
| [1] |
J. Scott Olsson.
Combining speech retrieval results with generalized additive models.
In ACL '08, 2008.
[ bib ]
Rapid and inexpensive techniques for automatic transcription of speech have the potential to dramatically expand the types of content to which information retrieval techniques can be productively applied, but limitations in accuracy and robustness must be overcome before that promise can be fully realized. Combining retrieval results from systems built on various errorful representations of the same collection offers some potential to address these challenges. This paper explores that potential by applying Generalized Additive Models to optimize the combination of ranked retrieval results obtained using transcripts produced automatically for the same spoken content by substantially different recognition systems. Topic-averaged retrieval effectiveness better than any previously reported for the same collection was obtained, and even larger gains are apparent when using an alternative measure emphasizing results on the most difficult topics.
|
| [2] |
J. Scott Olsson, Jonathan Wintrode, and Matthew Lee.
Fast Unconstrained Audio Search in Numerous Human Languages.
In ICASSP'07: Proceedings of the IEEE International Conference
on Acoustics, Speech, and Signal Processing, 2007.
[ bib |
.ps.gz |
.pdf ]
We present a system to index and search conversational speech using a scoring heuristic on the expected posterior counts of phone n-grams in recognition lattices. We report significant improvements in retrieval effectiveness on five human languages over a strong 1-best baseline. The method is shown to improve the utility (mean average precision) of the retrieved lattices' rank order and to do so with a search cost negligible compared to the fastest yet known methods for the linear scanning of phonetic lattices.
|
| [3] | J. Scott Olsson. Combining Evidence for Improved Speech Retrieval. In NAACL-HLT '07: Doctoral Consortium and Poster Proceedings, 2007. [ bib ] |
| [4] |
J. Scott Olsson and Douglas W. Oard.
Improving Text Classification for Oral History Archives with
Temporal Domain Knowledge.
In SIGIR '07: Proceedings of the 30th annual international ACM
SIGIR conference on Research and development in information retrieval,
2007.
[ bib |
.pdf slides |
.ps.gz |
.pdf ]
This paper describes two new techniques for increasing the accuracy of topic label assignment to conversational speech from oral history interviews using supervised machine learning in conjunction with automatic speech recognition. The first, time-shifted classification, leverages local sequence information from the order in which the story is told. The second, temporal label weighting, takes the complementary perspective by using the position within an interview to bias label assignment probabilities. These methods, when used in combination, yield between 6% and 15% relative improvements in classification accuracy using a clipped R-precision measure that models the utility of label sets as segment summaries in interactive speech retrieval applications.
|
| [5] |
J. Scott Olsson.
Improved measures for predicting the usefulness of recognition
lattices in ranked utterance retrieval.
In SIGIR '07: Workshop on Searching Spontaneous Conversational
Speech, 2007.
[ bib |
.pdf slides |
.ps.gz |
.pdf ]
We consider the problem of evaluating automatic speech recognition lattices to predict their usefulness in speech retrieval applications. In particular, we focus on ranking utterances by our confidence that they contain a query term. Our purpose is to close the gap between recognition efforts, which have traditionally focused on producing one-best transcripts, and recent retrieval systems, which may utilize multiple transcript hypotheses in indexing and search. We present a simple framework for comparing the ability of two measures to predict how well a system can retrieve a matching lattice. In a comparison with the traditional measure, simple accuracy (or word error rate), we show with statistical significance that two new measures are superior at predicting a vocabulary independent utterance retrieval system's rank ordering of speech utterances.
|
| [6] |
J. Scott Olsson and Douglas W. Oard.
Combining feature selectors for text classification.
In CIKM '06: Proceedings of the 15th ACM international
conference on Information and knowledge management, pages 798-799, New
York, NY, USA, 2006. ACM Press.
[ bib |
DOI |
.ps.gz |
.pdf ]
We introduce several methods of combining feature selectors for text classification. Results from a large investigation of these combinations are summarized. Easily constructed combinations of feature selectors are shown to improve peak R-precision and F1 at statistically significant levels.
|
| [7] |
J. Scott Olsson.
An Analysis of the Coupling between Training Set and Neighborhood
Sizes for the kNN Classifier.
In SIGIR '06: Proceedings of the 29th annual international ACM
SIGIR conference on Research and development in information retrieval, New
York, NY, USA, 2006. ACM Press.
[ bib |
.ps.gz |
.pdf ]
We consider the relationship between training set size and the parameter k for the k-Nearest Neighbors (kNN) classifier. When few examples are available, we observe that accuracy is sensitive to k and that best k tends to increase with training size. We explore the subsequent risk that k tuned on partitions will be suboptimal after aggregation and re-training. This risk is found to be most severe when little data is available. For larger training sizes, accuracy becomes increasingly stable with respect to k and the risk decreases.
|
| [8] |
J. Scott Olsson.
Exploring Feature Selection for Multi-Label Text Classification
using Ranked Retrieval Measures.
In UMD Technical Report: LAMP-TR-134, UMIACS-TR-2006-41,
CS-TR-4824, 2006.
[ bib |
.ps.gz |
.pdf ]
While most classifier studies have focused on set-based evaluation measures, multi-label classification techniques that rank alternatives and then apply a threshold to make binary decisions can also be evaluated before thresholding. This can be done using well understood measures from ranked retrieval (R-precision in this case). Rank-based evaluation is first motivated by using a simple simulation to show that thresholding can introduce effects which obscure differences in the rank ordering of topics. The use of ranked retrieval measures for formative evaluation of multi-label classification is then demonstrated by exploring some techniques for combining evidence of term utility to improve feature selection for k-Nearest-Neighbor text classification. Because this ranked list evaluation framework greatly reduces the computational cost per method variant explored, we are able to investigate a large number of feature selection possibilities. Easily constructed combinations were found that proved to be more robust across a range of feature set sizes and that yielded statistically significant improvements in peak R-precision and microaveraged F1 (a commonly reported set-based measure).
|
| [9] | J. Scott Olsson, Douglas W. Oard, and Jan Hajic. Cross-language text classification. In SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, pages 645-646, New York, NY, USA, 2005. ACM Press. [ bib | DOI | .ps.gz | .pdf ] |
|
|