Publications

Listed in reverse chronological order.

The Value of Bad Recommendations – Injecting Orthogonality in Product Recommendations and Learning Over Time. MIT. Advisor: Dan Ariely. November 2000 pdf

Perhaps the most exciting aspect of electronic commerce is the potential ability to learn the preferences of individual customers and make tailored recommendations to them. When making such recommendations, the recommendation system faces a fundamental dilemma: should it provide the best recommendation based on its current understanding of the customer, or should it try to learn more about the customer in order to provide higher potential payoffs in the future? This dilemma is at the heart of the current work. The dilemma facing a recommendation system is presented conceptually, and an approach for ideal learning is proposed and tested. To test our hypothesis, we modified a commercially available recommendation engine to consider measures of novelty in an initial learning phase, and we analyzed results from the normal and modified engines for different datasets and customer characteristics.
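
A minimal sketch of the trade-off described above, assuming a simple linear blend of a model's predicted rating with a novelty score during an initial learning phase; the class, the scoring scheme, and all numbers are illustrative assumptions, not the commercial engine modified in the paper.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: blend predicted preference with a novelty bonus during
// an initial learning phase, then decay the bonus toward pure exploitation.
public class NoveltyAwareRecommender {

    /** An item with a model-predicted rating and a novelty score
     *  (e.g. its distance from everything the customer has rated so far). */
    record ScoredItem(String id, double predictedRating, double novelty) {}

    /** Exploration weight: high for the first few interactions, then fading to zero. */
    static double explorationWeight(int interactionsSoFar, int learningPhaseLength) {
        if (interactionsSoFar >= learningPhaseLength) return 0.0;
        return 1.0 - (double) interactionsSoFar / learningPhaseLength;
    }

    /** Pick the item that maximizes a blend of exploitation and exploration. */
    static ScoredItem recommend(List<ScoredItem> candidates,
                                int interactionsSoFar, int learningPhaseLength) {
        double w = explorationWeight(interactionsSoFar, learningPhaseLength);
        return candidates.stream()
                .max(Comparator.comparingDouble(
                        item -> (1.0 - w) * item.predictedRating() + w * item.novelty()))
                .orElseThrow();
    }

    public static void main(String[] args) {
        List<ScoredItem> items = List.of(
                new ScoredItem("safe-bet",   4.5, 0.1),   // close to known tastes
                new ScoredItem("orthogonal", 3.0, 0.9));  // would teach us something new
        System.out.println(recommend(items, 1, 10).id());   // early on: "orthogonal"
        System.out.println(recommend(items, 20, 10).id());  // after learning: "safe-bet"
    }
}
```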

Modeling Behavior with Personalities. With M. Mezini and Karl Lieberherr. Proceedings of SEKE '99, Kaiserslautern, Germany, June 1999 pdf

Decoupling behavior modeling from a specific inheritance hierarchy has become one of the challenges for object-oriented software engineering. The goal is to encapsulate behavior on its own, and yet be able to freely apply it to a given class structure. We claim that standard object-oriented languages do not directly address this problem and propose the concept of Personalities as a design and programming artifice to model stand-alone behavior. Allowing behavior to stand alone enables its reuse in different places in an inheritance hierarchy.

A Framework for a Rule-Based Form Validation Engine. Proceedings of ISAS '99. Orlando, Florida, July 1999 pdf

Automatic form validation enables telecommunication carriers to process incoming service order requests more effectively. Validation rules, however, can be nontrivial to test and ultimately depend on the carrier's internal software systems. Traditionally, these validation checks are spread throughout an application's source code, which makes maintaining and evolving the system a very complex task. Our approach to solving this problem involves decoupling the rules, giving them a simple, easy-to-understand representation, and creating an engine to apply these rules to incoming forms automatically. This paper presents our approach in detail, explains its parallelism, and briefly presents how it differs from other common rule-based engines.
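
As a rough illustration of the decoupling described above, the sketch below keeps validation rules as small stand-alone objects that an engine applies independently to an incoming form; the Rule interface, the form representation, and the field names are assumptions made for the example, not the paper's actual design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the decoupling idea: validation rules live outside
// the application code as small objects that the engine iterates over.
public class FormValidationEngine {

    /** A form is represented here simply as field-name -> value. */
    interface Rule {
        String description();
        boolean holds(Map<String, String> form);
    }

    /** Declarative-style rule: a named field must be present and non-empty. */
    static Rule required(String field) {
        return new Rule() {
            public String description() { return field + " is required"; }
            public boolean holds(Map<String, String> form) {
                String v = form.get(field);
                return v != null && !v.isBlank();
            }
        };
    }

    /** Apply every rule independently and collect the violations. */
    static List<String> validate(Map<String, String> form, List<Rule> rules) {
        List<String> violations = new ArrayList<>();
        for (Rule rule : rules) {
            if (!rule.holds(form)) violations.add(rule.description());
        }
        return violations;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(required("customerId"), required("serviceType"));
        Map<String, String> order = Map.of("customerId", "C-1042");
        System.out.println(validate(order, rules)); // [serviceType is required]
    }
}
```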

Designing and Programming with Personalities. Northeastern University. Advisor: Karl Lieberherr. November 1998 pdf ppt

Decoupling behavior modeling from a specific inheritance hierarchy is one of the challenges for object-oriented software engineering. The goal is to encapsulate behavior on its own, and yet be able to freely apply it to a given class structure. We claim that standard object-oriented languages do not directly address this problem and propose the concept of Personalities as a design and programming artifice to model stand-alone behavior that embodies what we have termed the micro-framework style of programming. Allowing behavior to stand alone enables its reuse in different places in an inheritance hierarchy. The micro-framework style ensures that the semantics are preserved during reuse. Furthermore, we show how Personalities can help solve the problem of object migration and how they can easily integrate with frameworks. We present two different Personalities implementations by extending the Java Programming Language.
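
The Personalities mechanism itself is a language extension and cannot be reproduced here, but a plain-Java interface with default methods gives a rough, present-day approximation of the flavor of stand-alone behavior attached at unrelated points of a hierarchy; the class and method names below are purely illustrative.

```java
// A mixin-like approximation only: behavior is defined on its own, against the
// operations it needs, and attached to otherwise unrelated classes. The paper's
// actual Personalities mechanism is richer than this sketch.
public class PersonalitiesSketch {

    /** Behavior encapsulated on its own... */
    interface Printable {
        String describe();                      // operation the behavior requires
        default void print() {                  // the reusable behavior itself
            System.out.println("[" + describe() + "]");
        }
    }

    /** ...and applied at different, unrelated points of a class structure. */
    static class Invoice implements Printable {
        public String describe() { return "invoice #17"; }
    }

    static class SensorReading implements Printable {
        public String describe() { return "temperature 21.5C"; }
    }

    public static void main(String[] args) {
        new Invoice().print();        // [invoice #17]
        new SensorReading().print();  // [temperature 21.5C]
    }
}
```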

Prediction of OCR Accuracy Using Simple Image Features. With J. Kanai and T. Nartker. Proceedings of ICDAR '95. Montreal, Canada, 1995 pdf

A classifier for predicting the character accuracy achieved by any Optical Character Recognition (OCR) system on a given page is presented. This classifier is based on measuring the amount of white speckle, the amount of character fragments, and overall size information in the page. No output from the OCR system is used. The given page is classified as either "good" quality (i.e. high OCR accuracy expected) or "poor" (i.e. low OCR accuracy expected). Results of processing 639 pages show a recognition rate of approximately 85%. This performance compares favorably with the ideal-case performance of a prediction method based upon the number of reject-markers in OCR generated text.
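
One plausible way to measure speckle- and fragment-style features on a binarized page is to count small connected components, as in the sketch below; the component-size threshold and the feature definitions are illustrative assumptions and differ from the paper's exact measures.

```java
// Illustrative only: count tiny black connected components (specks and broken
// character pieces) on a binarized page image. Thresholds are placeholders.
public class PageQualityFeatures {

    /** Count 4-connected black components whose bounding box is below a size threshold. */
    static int smallComponents(boolean[][] black, int maxSide) {
        int h = black.length, w = black[0].length;
        boolean[][] seen = new boolean[h][w];
        int small = 0;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (black[y][x] && !seen[y][x]) {
                    int[] box = flood(black, seen, x, y);   // {minX, minY, maxX, maxY}
                    if (box[2] - box[0] < maxSide && box[3] - box[1] < maxSide) small++;
                }
            }
        }
        return small;
    }

    /** Iterative flood fill; returns the component's bounding box. */
    static int[] flood(boolean[][] black, boolean[][] seen, int sx, int sy) {
        java.util.ArrayDeque<int[]> stack = new java.util.ArrayDeque<>();
        stack.push(new int[]{sx, sy});
        seen[sy][sx] = true;
        int[] box = {sx, sy, sx, sy};
        int[][] dirs = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        while (!stack.isEmpty()) {
            int[] p = stack.pop();
            box[0] = Math.min(box[0], p[0]); box[1] = Math.min(box[1], p[1]);
            box[2] = Math.max(box[2], p[0]); box[3] = Math.max(box[3], p[1]);
            for (int[] d : dirs) {
                int nx = p[0] + d[0], ny = p[1] + d[1];
                if (ny >= 0 && ny < black.length && nx >= 0 && nx < black[0].length
                        && black[ny][nx] && !seen[ny][nx]) {
                    seen[ny][nx] = true;
                    stack.push(new int[]{nx, ny});
                }
            }
        }
        return box;
    }

    public static void main(String[] args) {
        boolean[][] page = new boolean[100][100];
        page[10][10] = true;                              // an isolated speck
        for (int x = 40; x < 60; x++) page[50][x] = true; // a stroke-sized component
        System.out.println(smallComponents(page, 5));     // 1
    }
}
```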

Prediction of OCR Accuracy. With J. Kanai, T. Nartker, and J. Gonzalez. Symposium on Document Analysis and Information Retrieval (SDAIR). Las Vegas, Nevada, April 1995 pdf

The accuracy of all contemporary OCR technologies varies drastically as a function of input image quality. Given high quality images, many devices consistently deliver output text in excess of 99% correct. For low quality images, even images which are easily read by a human, output accuracy is frequently below 90%. This extreme sensitivity to quality is well known in the document analysis field and is the subject of much current research. In this ongoing project, we have been interested in developing measures of image quality. We are especially interested in learning to predict OCR accuracy using some combination of image quality measures independent of the OCR devices themselves. Reliable algorithms for measuring print quality and predicting OCR accuracy would be valuable in several ways. First, in large-scale document conversion operations, they could be used to automatically filter out pages that are more economically recovered via manual entry. Second, they might be employed iteratively as part of an adaptive image enhancement system. At the same time, studies into the nature of image quality can contribute to our overall understanding of the effect of noise on algorithms for classification. In this paper, we propose a prediction technique based upon measuring features associated with degraded characters. In order to limit the scope of the research, the following assumptions are made: (a) pages are printed in black and white (no color); (b) page images have been segmented, and text regions have been correctly identified. The image-based prediction system extracts information from text regions only. This prediction system simply classifies the input images as either good (i.e. high accuracy expected) or poor (i.e. low accuracy expected).

Evaluation of Page Quality using a Simple Feature Classifier. University of Nevada. Advisor: Junichi Kanai. November 1994 pdf

A classifier to determine page quality from an Optical Character Recognition (OCR) perspective is developed. It classifies a given page image as either "good" (i.e., high OCR accuracy is expected) or "bad" (i.e., low OCR accuracy is expected). The classifier is based upon measuring the amount of white speckle, the amount of broken pieces, and the overall size information in the page. Two different sets of test data were used to evaluate the classifier: the Sample 2 dataset containing 439 pages and the Magazines dataset containing 200 pages. The classifier recognized 85% of the pages in the Sample 2 dataset correctly. However, approximately 40% of the low quality pages were misclassified as "good." To solve this problem, the classifier was modified to reject pages containing tables or fewer than 200 connected components. The modified classifier rejected 41% of the pages, correctly recognized 86% of the remaining pages, and did not misclassify any low quality page as "good". Similarly, without rejecting any pages, it correctly recognized 86.5% of the pages in the Magazines dataset and did not misclassify any low quality page as "good".
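
A hypothetical rendering of the resulting decision rule: the rejection criteria (tables, fewer than 200 connected components) come from the abstract, while the feature thresholds below are invented placeholders standing in for the trained decision boundary.

```java
// Sketch of the reject-then-classify structure described above; thresholds are
// placeholders, not the values used in the thesis.
public class PageQualityClassifier {

    enum Verdict { GOOD, BAD, REJECTED }

    static Verdict classify(boolean containsTable, int connectedComponents,
                            double speckleRatio, double fragmentRatio) {
        // Reject pages the simple features are known to mishandle.
        if (containsTable || connectedComponents < 200) return Verdict.REJECTED;
        // Placeholder thresholds for the "too noisy for reliable OCR" decision.
        boolean noisy = speckleRatio > 0.15 || fragmentRatio > 0.20;
        return noisy ? Verdict.BAD : Verdict.GOOD;
    }

    public static void main(String[] args) {
        System.out.println(classify(false, 1500, 0.04, 0.05)); // GOOD
        System.out.println(classify(false, 1500, 0.30, 0.05)); // BAD
        System.out.println(classify(true,  1500, 0.04, 0.05)); // REJECTED
    }
}
```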

Generation, Representation, and Visualization of Spatial Information. Catholic University of Cordoba. With D. Vinas. Advisor: Juan Ronda. September 1992 pdf

This work presents our experience designing a complete low-end 3D visualization system for objects reconstructed from CAT data. Methods and ideas to acquire data, digitally process CT images, isolate relevant information from the images, represent the 3D object, and display it on a computer screen are presented. This work presents our findings in the following areas: (1) Data Acquisition: several suitable ways to collect data from CT scanners are discussed. (2) 2D Image Representation: a file format for 2D images is described, along with the problems associated with the different acquisition systems. The solution to some of these problems (image registration) is presented. A methodology to create a virtual 3D data space and locate the CT slices in it is introduced. Suggestions to make the system completely automatic are also described. (3) 2D Image Digital Processing: different issues related to noise filtering are studied. Mean and median filters are presented. Suggestions about the best combination of filters for specific parts of the human body are also given. Some aspects of colorimetry and its importance in the filtering process are also explained. (4) Object Isolation (Identification): the issues associated with detecting the object of interest in the images are described. Our experience recognizing bone tissue is presented. (5) Object Representation: several approaches (boundary/surface as well as cell enumeration) are introduced. The leveled directed contours and triangular patches representation is presented, and the triangulation algorithm is fully described. Some considerations about data compression and other representations are also given. (6) Object Visualization: our results with a back-to-front display algorithm are presented. Procedures to display and render the different object representations are introduced. (7) Conclusions: a list of pointers and ideas for continuing this work is given.
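
As a small illustration of one building block mentioned in area (3), the sketch below applies a 3x3 median filter to remove an isolated speck of noise from a slice; the image layout (row-major grayscale array) and the border handling are assumptions made for the example.

```java
import java.util.Arrays;

// Minimal 3x3 median filter for speckle-noise removal in a grayscale slice.
public class MedianFilter {

    /** Apply a 3x3 median filter; border pixels are copied unchanged. */
    static int[][] filter(int[][] slice) {
        int h = slice.length, w = slice[0].length;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) out[y] = slice[y].clone();
        int[] window = new int[9];
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                int k = 0;
                for (int dy = -1; dy <= 1; dy++)
                    for (int dx = -1; dx <= 1; dx++)
                        window[k++] = slice[y + dy][x + dx];
                Arrays.sort(window);
                out[y][x] = window[4];   // median of the 9 neighbours
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] slice = {
            {10, 10, 10, 10},
            {10, 200, 10, 10},   // a lone bright speck of noise
            {10, 10, 10, 10},
            {10, 10, 10, 10},
        };
        System.out.println(Arrays.deepToString(filter(slice)));
        // The speck at (1,1) is replaced by the neighbourhood median (10).
    }
}
```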