Ontology Based Web Searching Mechanism for
Information Retrieval
W.A.C.M. Wickrama Arachchi & K.L. Jayarathne
University of Colombo School of Computing, Sri Lanka
chamil.madusanka@gmail.com & klj@ucsc.cmb.ac.lk

Abstract—The World Wide Web, the largest data repository, is a popular research domain in which many experiments investigate various types of search architectures. This paper explores the application of concept-to-concept mapping within a search architecture built on a semantic model of a given domain. This novel search architecture combines classical search techniques with an ontological approach. The research presents an effective mechanism for representing the results of a meaningful web search. For simplicity, the breast cancer domain is used.

Index Terms—ontology, semantic web, web search, semantic search, concept, keyword extraction

I. INTRODUCTION

THE World Wide Web has grown like a tree spreading its branches into every area. It can be identified as the largest data repository in the world and a key driving force for large-scale information technology. With the increase in the amount of content, it has become difficult to build an interactive web search using traditional keyword search alone. The idea presented here is to improve the searching process with information extracted from a semantic model of the domain. Ontology is the backbone of semantic web technologies.
One of the greatest problems of traditional search engines is that they are typically based on keyword processing. Because of the amount of information and the variety of tools used for searching, finding information on the web is increasingly difficult. Moreover, traditional search engines often deliver results that include a number of mismatched items. This causes low precision in search results and can reduce users' inclination to retrieve information through web searching. These factors motivated us to provide an ontology-based web searching mechanism for a particular domain.
The aim of this project is to address the reduced interaction between humans and the web caused by mismatched information in ordinary keyword search. The main goal is to present an effective mechanism for representing the results of a meaningful web search in a particular domain. Currently, for simplicity, the breast cancer domain is used to implement this ontology-based web searching system.
This is a novel web searching mechanism based on an ontology, which means the proposed system is a semantics-enabled web search mechanism. Here, the ontology represents the conceptual space for a particular domain. One special feature is that the same domain ontology is used in two different situations to obtain the best concept for a particular set of keywords:
1) the best concept for the user's keywords;
2) the best concept for a document from the data collection.
The data collection contains HTML, PDF, MS Word, MS PowerPoint, and plain-text documents. We take a multi-concept approach, meaning each document can have more than one concept. This creates the need to find the best concept of each document before the whole process starts, so a pre-processing mechanism is introduced. Pre-processing is applied to the data collection by performing keyword extraction, indexing, and gathering the best concept of each document. Finally, the concept-to-concept mapping component maps the pre-processed data collection, categorized by concept, to the best concept representing the keywords given by the user. The main achievement of this research is the concept-to-concept mapping used to provide more accurate results for the user.
This paper is organized as follows. Section 2 describes previous work related to the semantic web, ontologies, and keyword extraction. Section 3 illustrates the design architecture of the system and its main components. Section 4 describes the implementation aspects of the research. Section 5 describes the evaluation of the system, and Section 6 concludes.
II. RELATED WORK
Augmenting traditional text search with semantic techniques is one of the directions seen in early research on semantic-web-enabled search. Among the approaches that can be classified into this category, in 2003 Guha et al. [10] proposed a Semantic Search interface on top of the TAP infrastructure, which augments traditional keyword search results in a very simple manner. The traditional keyword search is targeted at a document database, while the keywords are also matched against concept labels in an RDF repository and the matches are returned alongside the located documents.
Clever Search [16] provides the ability to select a particular meaning of a word in the WordNet ontology, with the clarification text of that meaning added to the search keywords via the Boolean AND operator. Implementations in this family differ mostly in which properties of the ontology are navigated and which terms are picked [14].
The significant difference between this category of approaches and later ones is that here ontological techniques are used in a multitude of ways to augment keyword search, instead of requiring the bulk of the actual knowledge to be formally annotated.
Besides that, some researchers have introduced hybrid approaches to semantic web searching. Rocha et al. [15] presented a search architecture that combines classical search techniques with spread-activation techniques applied to a semantic model. Their algorithm locates additional information relevant to a query, given a starting set retrieved via text search. First, traditional text search is used in the document searching phase; then an RDF graph traversal, using the spread-activation algorithm, begins from the annotations of those documents. Pharma Ontology [19] is a high-level, patient-centric ontology for translational medicine that builds on existing domain ontologies and allows the integration of data throughout the drug development process. It is used for tailoring drugs, with personalized medicine as the main goal: the right patient receives the right dose of the right drug. Such strategies require that traditionally separate data sets, from early drug discovery through to patients in the clinical setting, be integrated, presented, queried, and analyzed collectively. Ontologies can be used to drive such capabilities; however, at present few ontologies exist that bridge genomics, chemistry, and medicine.
The Ontology-based Support for Human Disease Study paper [21] provides a solution incorporating a prototype Generic Human Disease Ontology that contains common general information about human diseases. Physicians and medical researchers are the main beneficiaries of this contribution. The approach also tries to support the study of complex disorders caused by many different factors acting simultaneously. The biomedical community focuses mainly on research into disorders, diagnosis, prognosis, treatments, and biotechnology, and this area has been a growing concern of the biomedical research community. Researchers must rely on a biomedical knowledge base. Previously, that knowledge existed only in experts' minds, but the situation is changing rapidly: current experiments produce huge amounts of information that accumulate quickly in biomedical databases. These information sources continually expand their content, and each area of biomedical research generates its own databases. Trusted databases therefore exist, but their schemas are not documented for outsiders.
Sharing information within a research area and between different research areas is vital to the biomedical community. Usually, data from different sources must be combined to provide the desired information to the user. A research team examining one type of disorder with a single factor can do so easily, but in the case of complex disorders the team must consider different factors simultaneously, for example combining and examining all genetic and environmental factors together. That is the problem the authors addressed; however, they did not consider environmental causes, which strongly influence human diseases.
Keyword extraction from documents, similarity and categorization [24] presents different perspectives on document categorization and document similarity, and also approaches keyword extraction. The index language is generated automatically from the document corpus, and every document can be represented as a vector. Once documents are represented as vectors, many text-mining techniques (document clustering, document similarity analysis, and document categorization) can be applied. Similarity between documents is measured by the cosine of the angle between the two vectors.
The index language is generated in several phases: document tokenizing, stop-word elimination, word stemming, item weighting, and feature term selection. There are two main approaches to automatic extraction of key phrases from text:
1) syntactic analysis: useful sequences are identified based on lexical patterns;
2) statistical analysis: frequency counts are used to find word pairs.
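The cosine measure over term-weight vectors can be sketched as follows. Python is used here for brevity (the system described in this paper is Java-based), and the example vectors and weights are illustrative, not taken from [24]:

```python
import math

def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two term-weight vectors (dicts: term -> weight)."""
    shared = set(vec_a) & set(vec_b)
    dot = sum(vec_a[t] * vec_b[t] for t in shared)
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Two hypothetical document vectors sharing some terms.
doc1 = {"breast": 2, "cancer": 3, "treatment": 1}
doc2 = {"cancer": 1, "treatment": 2}
print(round(cosine_similarity(doc1, doc2), 3))
```

A value near 1 indicates near-identical term distributions; disjoint vocabularies give 0.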
Two-level learning hierarchy of concept-based keyword extraction for tag recommendation [27] presents a method to filter candidate tags and rank them based on their occurrences in the concepts found in the given resources. The multi-concept approach, which considers more than one concept for each resource, is used in that system; it improves on the performance of the single-concept approach, and the most relevant concept is extracted using it.
III. DESIGN AND IMPLEMENTATION
The current system (OntoWS) is designed to support searching in the breast cancer domain, mainly for simplicity and partly because cancer is the type of disease with the highest death incidence rates in humans. Cancer has become a major problem in health care. Early detection of several types of cancer is vital to improving diagnosis, so public awareness is the best way to avoid cancer.
Figure 1 illustrates the design architecture of the solution.
The user, who (currently) wants to search on breast cancer, provides keywords according to their needs. Sometimes, however, users may not express exactly what they want to find, which can reduce the effectiveness of information retrieval through web searching. The keyword extraction component therefore extracts the optimal keywords from the user's query. Those keywords are processed by the semantic mapping component, which outputs the best concept for the given user keywords.
To find the documents best suited to the concept output by the semantic mapping component, the data collection is pre-processed, yielding a data collection categorized by concept. A keyword weighting mechanism operates between the keyword extraction component and the pre-processing component. Finally, the concept-to-concept mapping component maps the pre-processed, concept-categorized data collection to the best concept representing the keywords given by the user. The main achievement of this research is the concept-to-concept mapping used to provide more accurate results for the user.

Figure 1. Design architecture of the system

A. Components of the system
The proposed system is designed with recent semantic web technologies. An ontology is central to delivering high-quality results for users' searching needs. This information retrieval model consists of an ontology-based search engine. The main components of the system are the keyword extraction component, the semantic mapping component, the indexing component, the pre-processing component for the data collection, the concept-to-concept mapping component, and the cancer domain ontology.
1) Keyword extraction component: Keywords are significant words in a document that give a high-level description of its content. Keywords can be used as a measure of similarity for text categorization because they describe the main points of a text, so they are useful for scanning large numbers of documents in a short time; they give the shortest summary of a document.
This component is used in two situations. The first occurs when taking input from the user. The user interface allows users to specify their search intent in natural language. Not every word the user provides can be considered a keyword, so this component extracts all keywords from the user's query and reduces them to an optimal set by removing stop words, which are language-specific functional words [24].
The other use of this component occurs in the pre-processing phase, where it extracts keywords from each document and removes stop words, as in the user-input phase. Words are then converted into their stems, which are relatively close to the concepts in the text.
Word stemming reduces the number of document terms while expanding terms [24]. For each document in the data collection, there are several places from which to extract keywords: the page title, the page URL, HTML metadata (in HTML pages only), and the page content.
In both uses, the textual content of the user query or of each document (HTML, PDF, Word, text, and PPT) cannot be used directly for concept mapping. The text must first be tokenized, stemmed, filtered, and converted into a keyword vector. This approach uses the Word Vector Tool (WvTool) for keyword extraction: WvTool is a flexible, pure-Java library for statistical language modeling that is simple to use and extend for creating word vectors [28], and it allows textual data to be used directly in various experiments with the RapidMiner data mining tool.
Therefore, WvTool is used to process the textual content provided by the user in the query and the textual content extracted from each document in the collection during the pre-processing phase. Keyword extraction proceeds in the following steps:
a) TextLoader: responsible for opening a stream to the document being processed; streams can be opened both from local files and from URLs.
b) Decoder: converts encoded/wrapped text into plain text. This step has to be completed before vectorization.
c) Tokenizer: splits the whole textual content into individual units. All non-letter characters are treated as separators, so the resulting tokens contain only letters; whether a character is a letter is decided using the Unicode specification. There are two types of tokenizers: simple tokenizers and n-gram tokenizers.
d) WordFilter: removes the stop words to obtain the filtered keyword list. This approach uses the standard English stop-word list that comes with WvTool; a separate file can also be specified as the stop-word list.
e) Stemmer/Reducer: maps different grammatical forms of a word to a common term, avoiding divergences due to alternations of the same term. The Porter stemmer algorithm is used, with the option of an additional custom dictionary or the WordNet thesaurus. After the tokens are counted, the actual keyword vector is created.
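The tokenize, filter, stem, and count steps above can be sketched end to end as follows. This is a minimal Python illustration, not the WvTool API: the stop-word list is a tiny sample, and the crude suffix stripper merely stands in for the Porter stemmer used by the actual system:

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; WvTool ships a full standard English list.
STOP_WORDS = {"the", "is", "a", "an", "of", "and", "in", "to", "for"}

def extract_keyword_vector(text):
    # Tokenizer: keep only letter sequences, lower-cased (cf. step c).
    tokens = re.findall(r"[a-zA-Z]+", text.lower())
    # WordFilter: drop stop words (cf. step d).
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Stemmer: crude suffix stripping as a placeholder for Porter (cf. step e).
    stems = [re.sub(r"(ing|ed|s)$", "", t) if len(t) > 4 else t for t in tokens]
    # Count occurrences to form the keyword vector.
    return Counter(stems)

print(extract_keyword_vector("Screening and early detection of breast cancers"))
```

The resulting Counter is the keyword vector consumed by the semantic mapping component.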
2) Breast cancer domain ontology: An ontology consists of interconnected concepts (representational terms) whose interrelationships describe a target world. This component includes the ontology designed to represent the breast cancer domain. Currently, only the breast cancer domain is represented, for the simplicity of the system; that is, the system currently serves searchers who want to search on breast cancer. Using the concepts and relationships of the breast cancer domain, information is extracted efficiently by traversing the ontology, which facilitates representing the most appropriate concepts. The standard breast cancer ontology from the National Cancer Institute is used because it is defined in a standard way [25]. The ontology is written in the Web Ontology Language (OWL), and the domain ontology resides in the semantic mapping component.
3) Semantic mapping component (conceptual space): Vocabulary expansion and knowledge extension are done by a visual strategy called semantic mapping, which displays words related to one another in categories. In this approach, the semantic mapping component maps the user's requested objective onto the domain ontology.
This component is also used in two situations. The first occurs when mapping the optimal keywords output by the keyword extraction component onto the conceptual space to identify the best matching concept hierarchy for the given keyword set; the semantic mapping component is thus responsible for finding the best concept in the domain ontology for the given keywords.
The other use comes into operation in the pre-processing phase, where the semantic mapping component categorizes the data collection by concept, providing the best concept for each document in the collection. The ultimate requirement in both uses is to identify the best matching concept hierarchy for the given keyword set.
Therefore, the extracted keywords are mapped onto the conceptual space in the semantic mapping component. Each keyword in the keyword vector output by the keyword extraction component is matched against concepts using the browser text of each concept. When a match is found, the hierarchy of the matching concept is stored in a vector, with the number of matches recorded as the last element. This process is repeated for all the keywords in the vector.
Then the best region among the matched hierarchies is selected using the best matching score. The region with the best matching score is selected as the best matching concept hierarchy for that particular keyword list, and those concepts are sent to the concept-to-concept mapping component to be matched against the best matching hierarchies of each document.
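The matching just described can be sketched as follows. The toy concept table, its parent links, and the count-of-matches scoring are all illustrative assumptions; the real system reads its concepts from the NCI breast cancer ontology and its scoring details are not fully specified here:

```python
# Hypothetical concept table: concept -> (parent, label terms).
ONTOLOGY = {
    "Carcinoma": ("Disease", {"carcinoma"}),
    "BreastCarcinoma": ("Carcinoma", {"breast", "carcinoma"}),
    "Chemotherapy": ("Treatment", {"chemotherapy", "drug"}),
}

def hierarchy_of(concept):
    """Walk parent links upward to build the concept hierarchy path."""
    path = [concept]
    while ONTOLOGY.get(path[-1], (None,))[0] in ONTOLOGY:
        path.append(ONTOLOGY[path[-1]][0])
    return path

def best_concept(keywords):
    """Score each concept by how many keywords match its label terms,
    then return the best concept, its hierarchy, and the match count."""
    scores = {c: len(set(keywords) & terms) for c, (_, terms) in ONTOLOGY.items()}
    best = max(scores, key=scores.get)
    return best, hierarchy_of(best), scores[best]

print(best_concept(["breast", "carcinoma", "screening"]))
```

The returned hierarchy plays the role of the "best matching concept hierarchy" passed on to concept-to-concept mapping.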
4) Pre-processing component for data collection: This component is responsible for creating a concept-wise categorized document collection. To find the best-suited documents, the best concept representing the given keyword set has to be matched against the documents of the data collection. However, each document in the collection can contain more than one concept, which creates the need to find the best concept of each document before the whole process starts. The pre-processing component was introduced to meet that requirement. Within it, optimally weighted keywords are first extracted from each document through the keyword extraction component; using those keywords, the semantic mapping component provides the best concept for the document. This is applied to all documents in the data collection, and the same mechanism is applied to newly added documents as well. The result is a data collection categorized by concept.
5) Indexing component: This component is responsible for indexing the document repository. Logically it resides between the keyword extraction component and the pre-processing component, but physically it resides in the keyword extraction component. After the keywords extracted from each document have had stop words removed and word stemming applied, a value is assigned to each keyword; this mechanism is called keyword weighting. The weight of each keyword depends on its importance, position, and relevance. For example, keywords in the document title, abstract, and HTML metadata are more important than keywords elsewhere. The frequency of a keyword is also a considerable factor: of two keywords with different frequencies, the one with the higher frequency is more significant [24]. When identifying content and discriminating between documents, not only the frequency of the keyword should be considered but also the number of documents in which the keyword occurs.
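The paper does not give the exact weighting formula, so the sketch below assumes a TF-IDF-style weight (term frequency times inverse document frequency) with a hypothetical multiplicative boost for title keywords; the boost factor of 2.0 is an illustrative choice, not from the paper:

```python
import math

def keyword_weight(term, title_terms, term_freq, doc_freq, n_docs):
    """Assumed weighting: frequency * inverse document frequency,
    boosted when the term also appears in the document title."""
    idf = math.log((1 + n_docs) / (1 + doc_freq))  # rarer terms weigh more
    weight = term_freq * idf
    if term in title_terms:
        weight *= 2.0  # hypothetical position boost for title keywords
    return weight

# "cancer" appears 5 times in a document whose title mentions it,
# and occurs in 20 of the 150 documents in the collection.
print(keyword_weight("cancer", {"breast", "cancer"}, 5, 20, 150))
```

This reflects the two stated factors: position (title boost) and frequency relative to how many documents contain the term.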
The content and title of each document are extracted and indexed. We used Apache Tika for the extraction and the Apache Lucene library for document indexing [29]. Lucene uses an inverted indexing algorithm [30]; record-level and word-level are the two types of inverted index. The index analyzer tokenizes, normalizes, and removes stop words; the index writer indexes the set of documents and creates the index files.
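A record-level inverted index, the simpler of the two types just mentioned, maps each term to the set of documents containing it. A minimal sketch (illustrative document ids and text, not Lucene's actual data structures):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Record-level inverted index: term -> set of document ids containing it.
    (A word-level index would additionally store positions within each document.)"""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

docs = {
    1: "breast cancer screening",
    2: "cancer treatment options",
    3: "screening guidelines",
}
index = build_inverted_index(docs)
print(sorted(index["cancer"]))     # documents mentioning "cancer" -> [1, 2]
print(sorted(index["screening"]))  # -> [1, 3]
```

Looking up a query term is then a dictionary access rather than a scan over all documents, which is what makes inverted indexing attractive for search.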
6) Concept-to-concept mapping component: The main achievement of this research lies in the concept-to-concept mapping that provides more accurate results for the user. For this component to operate, the pre-processing component and the semantic mapping component must first have produced their outputs. The component matches the concept representing the user's query against the concepts representing the categorized data collection: the matching hierarchy for the keyword list is compared with the result hierarchies from the pre-processing component, yielding the URLs of the documents already categorized under the matching concepts. This mechanism increases the correctness and relevance of the results with respect to the user's requirement.
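One plausible way to realize this matching, sketched below, is to score each document by the overlap between its concept hierarchy and the query's; the scoring function and example URLs are assumptions for illustration, since the paper does not specify the comparison formula:

```python
def concept_match_score(query_hierarchy, doc_hierarchy):
    """Assumed similarity: size of the overlap between two concept
    hierarchies (paths from a concept up toward the root)."""
    return len(set(query_hierarchy) & set(doc_hierarchy))

def retrieve(query_hierarchy, categorized_docs):
    """Rank document URLs by how closely their pre-computed concept
    hierarchy matches the query's; drop documents with no overlap."""
    scored = [(concept_match_score(query_hierarchy, h), url)
              for url, h in categorized_docs.items()]
    return [url for score, url in sorted(scored, reverse=True) if score > 0]

# Hypothetical output of the pre-processing phase: url -> concept hierarchy.
docs = {
    "a.html": ["BreastCarcinoma", "Carcinoma", "Disease"],
    "b.html": ["Chemotherapy", "Treatment"],
    "c.html": ["Carcinoma", "Disease"],
}
print(retrieve(["BreastCarcinoma", "Carcinoma", "Disease"], docs))
```

Documents whose hierarchy shares more concepts with the query's hierarchy rank first, which matches the intent of returning the URLs of documents categorized under the matching concepts.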
7) Browser interface: The browser interface lets users specify the search requirements they wish to browse. Using the Java Desktop Integration Components (JDIC) [31], a browser-like interface was built with the functionality typically available in a default browser. JDIC enables desktop/Java integration, providing Java applications with facilities and functionality offered by the native desktop. Its features include creating JNLP installer packages, embedding the native web browser, creating tray icons on the desktop, registering file type associations, and launching desktop applications [31].
This approach uses the browser component from JDIC, which provides browsing capabilities to the developer through a Java API. A real standalone browser is embedded in the application rather than implementing the browser functionality in Java code; the browser component is responsible for communication with the native web browser.
IV. EVALUATION
The prototype implemented according to the proposed design is tested and evaluated in this section under a realistic web scenario. In any information retrieval (IR) system, the main goal is high precision, that is, high relevance of the results produced. The retrieval process begins when a user enters a query, a natural-language statement of an information need. A query can match several sets of outcomes with different degrees of relevance. In most IR systems, a numeric score measures how well a particular object matches the query, and objects are ranked by that score. In this system, we used the precision-recall method to evaluate the relevance of the results.
The main objective of this experiment was to verify that the proposed web searching mechanism produces the most suitable results for user queries. The main way to evaluate the retrieval precision of the ontology-based web search system is to compare it against a keyword-based web search system using precision-recall values.
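Precision and recall are computed from the retrieved set and the known-relevant set for each query; a minimal sketch with illustrative document ids:

```python
def precision_recall(retrieved, relevant):
    """Precision = relevant retrieved / total retrieved;
    recall = relevant retrieved / total relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# 8 documents returned, 6 of them among the 10 known-relevant ones.
retrieved = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]
relevant = ["d1", "d2", "d3", "d4", "d5", "d6", "d9", "d10", "d11", "d12"]
print(precision_recall(retrieved, relevant))  # (0.75, 0.6)
```

Averaging the precision observed at each recall level over the ten test queries yields the curves compared in Table I.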

TABLE I
AVERAGE PRECISION-RECALL VALUES OF ONTOWS AND KEYWORD SEARCH

Recall | Average Precision of OntoWS | Average Precision of Keyword Search
0.1    | 100%                        | 100%
0.2    | 100%                        | 61.93%
0.3    | 100%                        | 52.39%
0.4    | 98%                         | 32.57%
0.5    | 85.24%                      | 44.38%
0.6    | 71.64%                      | 36.34%
0.7    | 61.85%                      | 25.45%
0.8    | 51.53%                      | 8.63%
0.9    | 43.72%                      | 0%
1.0    | 36.42%                      | 0%

A. Test Sample and Procedure
The conventional way to evaluate an IR system is to select a document collection of more than a thousand documents containing the keywords in the user queries, but that was not feasible within the project time line. We therefore selected a collection of about 150 HTML documents that includes 10 relevant documents for each user query, selected based on the occurrence of keywords in the documents. Ten queries were selected for the evaluation. The same document collection was used for both the ontology-based web search mechanism and the keyword-based web search.
The testing procedure was as follows:
i) The selected document collection was indexed in the ontology-based web searching mechanism according to the OntoWS indexing approach, and in the conventional keyword-based web search according to its indexing approach.
ii) Each of the 10 selected queries was submitted to OntoWS and to the keyword-based web search.
iii) The search results from OntoWS and the conventional keyword search were obtained, and precision and recall values were computed.
iv) From the results, the precision-recall curves were drawn.
B. Results Analysis
The ten selected queries were submitted to OntoWS and to the conventional keyword search, and the precision values at each recall level were measured separately for the two systems. The comparison of precision-recall values between OntoWS and the conventional keyword search uses the average precision values of each. The comparison data are given in Table I, and the comparison between OntoWS and keyword search is plotted in Figure 2.
Figure 2. Comparative precision curves, OntoWS vs. keyword search

The variation of the precision-recall curves of OntoWS and keyword search is shown in Figure 2. Considering each query separately, OntoWS provides a higher precision value than the conventional keyword search. At the opening point, more relevant documents were retrieved by both searching approaches, as shown by the high precision values both produce. The average precision of the keyword search is lower than that of OntoWS. Some exceptional cases that contradict the main conclusion occurred when obtaining precision values for the keyword search; a larger sample is generally required for reliable IR results, and the sample size is the reason for those exceptional states here. According to the precision-recall curves, OntoWS is more efficient and more correct than the conventional keyword search.
V. CONCLUSION
This research focused on addressing the problem that conventional keyword search decreases the interaction between humans and the web, and on presenting an effective mechanism for representing the results of a meaningful web search. The proposed solution is based on a conceptual space, that is, an ontology, and depends entirely on a domain ontology, currently the breast cancer ontology. During implementation, many issues had to be solved. Finding a good domain ontology was the biggest challenge; the requirement for a conceptual space was eventually fulfilled by a standard ontology. The ontology selected at the beginning of the research caused many problems during implementation and had to be replaced, which considerably affected the progress of the research. Another problem was selecting suitable APIs for the implementation of OntoWS. Several APIs exist for ontology processing, such as the Jena Ontology API, the OWL API, and SOFA (Simple Ontology Framework API). The OWL API was selected because it supports OWL 2 and has an efficient in-memory reference implementation, which is highly useful in a web searching approach; however, additional effort and time had to be spent on understanding the API.
The proposed ontology-based web searching mechanism provides a new approach in the information retrieval area that embeds semantic information retrieval. The efficiency of the ontology-based web searching approach is higher than that of ordinary keyword searching, and the approach relies heavily on concepts, properties, hierarchies, classes, and relationships. The evaluation phase for this semantic approach was therefore conducted to assess the performance of OntoWS against the conventional keyword search; OntoWS produced higher precision values for the retrieved documents than the conventional keyword searching approach.

VI. FUTURE WORK
The ontology-based web search mechanism explored in this research is proficient only in a single domain. We hope to expand this research in the future so that it can be applied to a wide range of domains. Since an ontology can import other ontologies, experimentation and evaluation can be extended to integrate several areas.
Nowadays, personalized web searching is a prominent requirement in the web searching area. The current solution does not support web personalization; we hope to extend this research toward a personalization-enabled solution in the future.
ACKNOWLEDGMENT
I would like to thank all who gave me guidance and support throughout this research.
REFERENCES
[1] N. Zhong, J. Liu, and Y. Yao, "Web Intelligence," 2003.
[2] J. Hu, N. Zhong, S. Lu, H. Zhou, and J. Huang, "A Human-Web Interaction Based Trust Model for Trustworthy Web Software Development," in 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, vol. 1, pp. 914-917, 2008.
[3] C. Mazzi, P. Ganguly, and M. Kidd, "Healthcare Applications based on Software Agents," 2001.
[4] M.A.M. Nieto, "An Overview of Ontologies," Center for Research in Information and Automation Technologies, Interactive and Cooperative Technologies Lab, 2003.
[5] A. Maedche and S. Staab, "Ontology Learning for the Semantic Web," 2003.
[6] P. Kataria, R. Juric, S. Paurobally, and K. Madani, "Implementation of Ontology for Intelligent Hospital Wards," in 41st Hawaii International Conference on System Sciences, 2008.
[7] K. Decker, K. Sycara, and M. Williamson, "Intelligent Adaptive Information Agents."
[8] D.I. Moldovan and R. Mihalcea, "Using WordNet and Lexical Operators to Improve Internet Searches," IEEE Internet Computing, vol. 4, no. 1, pp. 34-43, Jan./Feb. 2000.
[9] D. Buscaldi, P. Rosso, and E.S. Arnal, "A WordNet-based query expansion method for geographical information retrieval," in Working Notes for the CLEF Workshop, 2005.
[10] R. Guha, R. McCool, and E. Miller, "Semantic search," in Proceedings of the 12th International Conference on World Wide Web, pp. 700-709, ACM Press, 2003.
[11] J. Heflin and J. Hendler, "Searching the web with SHOE," in Artificial Intelligence for Web Search, AAAI Workshop, WS-00-01, pp. 35-40, AAAI Press, 2000.
[12] "DARPA Agent Markup Language," http://www.daml.org/, November 21, 2009.
[13] "Resource Description Framework," http://www.w3.org/RDF/, November 21, 2009.
[14] "Web Ontology Language," http://www.w3.org/TR/owl-features/, November 23, 2009.
[15] C. Rocha, D. Schwabe, and M.P. de Aragao, "A hybrid approach for searching in the semantic web," in Proceedings of the 13th International Conference on World Wide Web, pp. 374-383, 2004.
[16] P.M. Kruse, A. Naujoks, D. Roesner, and M. Kunze, "Clever Search: A WordNet based wrapper for Internet search engines," January 2005. [Online]. Available: http://arxiv.org/abs/cs/0501086
[17] D. Fensel, "Ontologies: A Silver Bullet for Knowledge Management and Electronic Commerce," Springer, 2001.
[18] N. Guarino, "Formal Ontology in Information Systems," in N. Guarino (ed.), Formal Ontology in Information Systems, Proceedings of FOIS'98, Trento, Italy, 6-8 June 1998, IOS Press, Amsterdam, pp. 3-15, June 1998.
[19] C. Denney, C. Batchelor, O. Bodenreider, S. Cheng, J. Hart, J. Hill, J. Madden, M. Musen, E. Pichler, M. Samwald, S. Szalma, L. Schriml, D. Sedlock, L. Soldatova, K. Sonoda, D. Statham, T. Whetze, E. Wu, and S. Stephens, "Pharma Ontology: Creating a Patient-Centric Ontology for Translational Medicine," Nature Precedings, http://dx.doi.org/10.1038/npre.2009.3686.1, 2009.
[20] A. Daniyal, S.R. Abidi, and S.S.R. Abidi, "Computerizing Clinical Pathways: Ontology-Based Modeling and Execution," in Medical Informatics in a United and Healthy Europe, K.-P. Adlassnig et al. (Eds.), IOS Press, 2009.
[21] M. Hadzic and E. Chang, "Ontology-based support for human disease study," in Proceedings of the 38th Annual Hawaii International Conference on System Sciences (HICSS'05), vol. 6, p. 143a, 2005.
[22] "Intro the Semantic Web," http://www.slis.kent.edu/mzeng/metadata/semanticweb/intro-en.htm
[23] "What is a semantic model," http://fbhalper.wordpress.com/2007/11/29/whats-a-semantic-model-and-why-should-we-care/
[24] Huaizhong
,G.
Gardarin,
”Keywords
Extraction
Document
Similarity and Categorization,” http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.115.2389 [25] ”Semantic Web Reaserch Group”, http://www.mindswap.org/2003/
CancerOntology/
[26] ”Semantic Mapping”, http://literacy.kent.edu/eureka/strategies/ semanticmapping.pdf [27] H. Murfi and K. Obermayer, ”A Two-Level Learning Hierarchy of Concept Based Keyword Extraction for Tag Recommendations,” www.kde.cs.unikassel.de/ws/dc09/papers/paper17.pdf [28] ”The
Word
Vector
Tool”,
http://www-ai.cs.unidortmund.de/SOFTWARE/WVTOOL/doc/wvtool-1.0.pdf/, May 15,2010
[29] Apache Tika”, http://tika.apache.org, May 24,2010
[30] Apache Lucene”, http://en.wikipedia.org/wiki/Lucene, May 24,2010
[31] ”JDIC - JDesktop Integration Components”, https://jdic.dev.java.net,
March 21,2010
