User Profile Construction
Alistair Duke & Jos van der Meer
BT, AIdministrator
Identifier: Deliverable 14
Class: Deliverable
Version: 1
Version date: 27-9-2002
Status: Release
Distribution: Internal
Responsible Partner: BTexact
On-To-Knowledge Consortium
This document is part of a research project funded by the IST Programme of the Commission of the European Communities as project number IST-1999-10132. The partners in this project are: Vrije Universiteit Amsterdam (VU) (co-ordinator), NL; the University of Karlsruhe, Germany; Schweizerische Lebensversicherungs- und Rentenanstalt / Swiss Life, Switzerland; British Telecommunications plc, UK; CognIT a.s, Norway; EnerSearch AB, Sweden; AIdministrator Nederland BV, NL.
Vrije Universiteit Amsterdam (VU)
Faculty of Sciences, Division of Mathematics and Computer Science
De Boelelaan 1081a, 1081 HV Amsterdam, the Netherlands
Fax and Answering machine: +31-(0)20-872 27 22
Mobile phone: +31-(0)6-51850619
Contact person: Dieter Fensel
E-mail: dieter@cs.vu.nl

University of Karlsruhe
Institute AIFB
Kaiserstr. 12, D-76128 Karlsruhe, Germany
Tel: +49-721-6083923, Fax: +49-721-693717
Contact person: R. Studer
E-mail: studer@aifb.uni-karlsruhe.de

Schweizerische Lebensversicherungs- und Rentenanstalt / Swiss Life
Swiss Life Information Systems Research Group
General Guisan-Quai 40, 8022 Zürich, Switzerland
Tel: (41 1) 284 4061, Fax: (41 1) 284 6913
Contact person: Ulrich Reimer
E-mail: Ulrich.Reimer@swisslife.ch

British Telecommunications plc
BT Adastral Park, Martlesham Heath
IP5 3RE Ipswich, UK
Tel: (44 1473) 605536, Fax: (44 1473) 642459
Contact person: John Davies
E-mail: John.nj.Davies@bt.com

CognIT a.s
Busterudgt 1., N-1754 Halden, Norway
Tel: +47 69 1770 44, Fax: +47 669 006 12
Contact person: Bernt. A. Bremdal
E-mail: bernt@cognit.no

EnerSearch AB
SE 205 09 Malmö, Sweden
Tel: +46 40 25 58 25, Fax: +46 40 611 51 84
Contact person: Hans Ottosson
E-mail: hans.ottosson@enersearch.se

AIdministrator Nederland BV
Julianaplein 14B, 3817CS Amersfoort, NL
Tel: (31-33) 4659987, Fax: (31-33) 4659987
Contact person: Jos van der Meer
E-mail: Jos.van.der.Meer@aidministrator.nl

Sirma AI, Ltd. - Artificial Intelligence Labs, Onto-Text Lab.
Sophia, Bulgaria
Tel: (359 2) 981 23 38
Contact person: Atanas Kiryakov
E-mail: naso@sirma.bg
Abstract
This report accompanies the demonstration of deliverable 14 - User Profile Construction. The deliverable is concerned with the construction of user-specific profiles, expressed in ontological terms. The report discusses the need for such profiles and describes the prototypes that were built.
Introduction
One of the main aims of Knowledge Management is a reduction of what is termed information overload. Broader use of IT and communications technology in the workplace has led to people becoming swamped with information both in terms of what they receive (usually via e-mail) and in terms of what is available to them when browsing or searching.
KM tools have attempted to reduce information overload through the use of user profiles. These can take different forms. They can be defined automatically, based upon a user's role or position within an organisation, or they can be user-defined, where users express their interests and refine them over time. KM tools match these profiles against incoming information in an attempt to deliver to each user only the information that interests them, thus reducing the overload.
Within the On-To-Knowledge project, tools have been developed that employ profiles. OntoShare [1] is a system designed to facilitate and encourage the sharing of information between communities of practice within, or perhaps across, organisations. It also encourages people who may not previously have known of each other's existence in a large organisation to make contact where there are mutual concerns or interests.
When a user finds information of sufficient interest to be shared with their community of practice, they can add it to the system's store. The system examines the document and determines to which of a set of concepts it should be added. These concepts comprise a hierarchy of subject areas describing the domain in which the community operates; in other words, a lightweight ontology.
As well as classifying the document, the sharing system notifies other users of its existence. It only informs users to whom it believes the document will be relevant, based upon their membership of the different concepts in the ontology.
Spectacle [3, 6] is a visualisation facility. It can be used to visualise large sets of data or documents with the intention of allowing the user to more easily find what they are looking for and also detect relationships between data that would not be apparent in a textual presentation of the data. In this deliverable Spectacle will be used to visualise OntoShare profiles.
The purpose of this deliverable is to examine the profiling techniques that are used in the On-To-Knowledge tools and to develop prototypes that allow the profiles to be defined in a more expressive way.
OntoShare Profiles
In OntoShare, users can define their profile by subscribing to a set of concepts that are organised in an ontology. Membership of a particular concept is shown in the OntoShare user interface by a red flag icon (see figure 1).
Figure 1. OntoShare concepts and profile.
A user can subscribe to as many concepts as they like at any level of the hierarchy. They can also choose to be informed about information that is added to any sub-concepts of the concepts to which they have subscribed. The user's profile is then defined as a list of concepts. As documents are added to the system, the user is only informed if the document is added to a concept to which they have subscribed. This approach is sufficient for many users; however, it has two limitations. The first is that users may be interested in documents that are added to concepts to which they are not subscribed. The second is that they may receive notification of documents in which they are not actually interested (contributing to the information overload problem). Both of these problems could be overcome if more expressive queries were allowed. Users would be able to specify which documents within a concept interest them, allowing them to ignore uninteresting documents added to a concept to which they are subscribed. They would also be able to express an interest in a specific topic outside their concept membership, so that they receive notification if a document on that topic is added (in effect, every query implicitly defines a subclass which characterises the user's interests more specifically).
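The concept-based notification rule described above can be sketched as follows. This is a minimal illustration only: the hierarchy contents and the names `PARENT`, `ancestors` and `should_notify` are hypothetical and are not taken from the OntoShare implementation.

```python
# Hypothetical concept hierarchy for illustration: child concept -> parent concept.
PARENT = {
    "XML": "Web Technologies",
    "RDF": "XML",
    "Virtual Community": "Knowledge Management",
}


def ancestors(concept, parent=PARENT):
    """Yield the ancestors of a concept, nearest first."""
    while concept in parent:
        concept = parent[concept]
        yield concept


def should_notify(subscriptions, document_concepts, include_subconcepts=True):
    """Return True if a document added to document_concepts matches the profile.

    subscriptions is the user's profile: a set of concept names.
    With include_subconcepts, a subscription also covers all sub-concepts.
    """
    for concept in document_concepts:
        if concept in subscriptions:
            return True
        if include_subconcepts and any(a in subscriptions for a in ancestors(concept)):
            return True
    return False
```

With this sketch, a user subscribed to "XML" is notified about a document added to "RDF" only when sub-concepts are included, which also shows the limitation discussed above: the profile cannot say anything more specific than the name of a concept.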
For reasons of efficiency, OntoShare uses an internal data format and a local database for the storage of ontologies, user profiles, document classification, etc. This ensures that user interface response times are acceptable. In order to interoperate with other Semantic Web tools, it is necessary to export this internal format to RDF. This allows the OntoShare data to be stored in Sesame [4] - the RDF Schema-based Repository and Querying facility, developed by Aidministrator. This export process is described in section 4.
A benefit of this interoperability is that users can further refine their interest profiles using RQL (RDF Query Language) queries. RQL is a query language for both RDF and RDF Schema, and allows the more expressive queries mentioned above to be developed. Sesame contains an RQL query engine which allows an RQL query to be evaluated against the data in its stores. A prototype that allows users to express their OntoShare profiles as RQL and evaluate these using Sesame is described in section 5.
One aim of OntoShare is to develop links between members of a community of interest. It attempts to do this by allowing users to find out about other users that have similar interests to them or have an interest in a particular document that the user is accessing. A further aim of this deliverable was to explore how these facilities could be enhanced with the use of the Spectacle visualisation tool. The intention was to determine whether a visualisation of the OntoShare data in Spectacle would reveal further relationships between the data and users that were not apparent from the OntoShare interface. This facility would also allow people from outside of the community of interest to access the repository without actually using OntoShare itself and is another benefit of the interoperability between the OntoShare tools. This is described in section 6.
RDF Export
As described above, in order to carry out the user profiling task, it was first necessary to express the OntoShare store as RDF and upload it to Sesame. OntoShare was designed to use its own database to store all information, including user profiles and document details such as keywords, URL, concept membership, etc. Although it would be feasible to use Sesame in its current form as the OntoShare database, it would not be as efficient. OntoShare's own internal data structures are optimised for the usage made by OntoShare, rather than for the general-purpose operation provided by Sesame. RDF is seen as an interchange format between Semantic Web applications. A local SQL database is used as the OntoShare store. Both Oracle and MySQL can be used to provide this database to OntoShare.
Figure 2. OntoShare RDF export
The export can be carried out in two ways. Figure 2 shows the first of these via a WWW interface. This allows a knowledge engineer to select one of the available OntoShare stores for export (each community will have its own store). They can select whether to export the keywords that are extracted by the ProSum text summariser or not. These keywords are generally numerous and if not required can be left out to improve speed and reduce the amount of storage required. The knowledge engineer can also choose to export the RDF(S) as text (which can then be saved as an XML file) or directly to Sesame.
The second method of export is via a script. This method can be called automatically, e.g. every night, to upload the latest state of the store to Sesame. It is likely to be the most widely used.
The OntoShare store is exported to an agreed RDF format. The storage of document information is in line with the data model that has been employed throughout the On-To-Knowledge project. In this model, a document or a URL is described as being isAbout a resource. As such, each document can have several isAbout statements if it is about several resources. The relationships between resources are then described by a schema, which in this case is the concept hierarchy that is displayed to the user in the OntoShare interface. Thus, if an OntoShare document is a member of three concepts, it will have three isAbout statements, one for each of the concepts. This is illustrated by the following RDFS fragment:
and a corresponding RDF fragment which also shows the properties that are stored for each document that has been submitted (excluding the keywords which are too numerous to show here):
2002-04-19
MPEG-4
description
http://mpeg.telecomitalialab.com/standards/mpeg-4/mpeg-4.htm
#Profile_6
Good beginners introduction to MPEG-4
< Executive Overview. This document gives an overview of the MPEG-4 standard, explaining which pieces of technology it includes and what sort of applications are supported by this technology.>
The above RDF defines the various attributes that exist for a document or are attributed when a document is stored, e.g. URL, submitted by, annotation, etc. The most important attribute is isAbout, as this links the document to the ontology. This link is made when the document is added to one or more of the concepts in OntoShare. The RDF states that the document isAbout one or more of the concepts defined in the community's ontology. This association is part of the data model that was defined for use in the On-To-Knowledge project. The full RDF(S) for OntoShare can be seen in appendix 1.
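The one-statement-per-concept rule of the isAbout data model can be sketched as below. The two namespace URIs are those that appear in the RQL queries in this report; the function name, the example document URI and the convention of replacing spaces with underscores in concept names are assumptions made for illustration.

```python
# Namespaces as they appear in the RQL queries of this report.
ONTOSHARE = "http://www.bt.com/ontoshare#"
CONCEPTS = "http://i97.labs.bt.com/rdfferret/export.xml#"


def is_about_triples(doc_uri, concept_names):
    """Return one (subject, predicate, object) triple per concept membership.

    Assumption: a concept URI is the concept name with spaces replaced by
    underscores, appended to the CONCEPTS namespace.
    """
    return [
        (doc_uri, ONTOSHARE + "isAbout", CONCEPTS + name.replace(" ", "_"))
        for name in concept_names
    ]


# A document that is a member of three concepts yields three isAbout statements.
triples = is_about_triples(
    "http://example.org/some-document",  # hypothetical document URI
    ["XML", "Virtual Community", "RDF"],
)
```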
Once uploaded to Sesame, the RDF(S) data becomes available for use by all the other On-To-Knowledge tools and in particular, as described below, the Profile Refiner and Spectacle.
Profile Refiner
The Profile Refiner has been developed as a prototype to investigate the refinement of OntoShare profiles into more complex RQL queries. In the long term, its functionality could be incorporated into OntoShare. Alternatively, the interface could exist in its own right as a profile editor for any application that uses Sesame or RDF-based tools. Specifically, the user's profile could be used to create a search agent which periodically queries one or more OntoShare repositories stored in Sesame and then feeds back the results to the user.
The architecture for the Profile Refiner is shown in figure 3.
Figure 3. Profile Refiner Architecture
The Profile Refiner initially allows the user to log on using their OntoShare username and password. This allows the user's OntoShare profile to be extracted from the OntoShare store using the same method employed in the OntoShare client interface (i.e. a CGI call and XML response). The interface then presents this profile to the user and allows them to refine it. This step is described in detail below. Once the user is happy with their profile, they can submit it. An RQL query is then constructed and evaluated against the RDF data stored in Sesame. The Sesame client library [5] is used to communicate with the Sesame server. The results of the query, i.e. the URLs of a set of documents, are then returned, parsed by the Profile Refiner and displayed. The user is then able to click on one of these URLs to access the document.
The interface of the Profile Refiner is shown in figure 4.
The top left of the interface shows the user's profile as a selection of concepts. Initially this is identical to the OntoShare selection. The user can choose whether to use this profile as it stands or to alter it through selections made in the interface. Further selections can be made in a variety of ways. The user can choose to:
Include all subclasses of the selected concepts in the query;
Include all subclasses of the selected concepts in the query apart from any number of subclasses (a "tell me everything about X apart from Y" query);
Include no subclasses at all; and
Include all concepts related to the selected concepts.
Figure 4. Profile Refiner Interface
The fourth option above relies upon a measure of relationship between concepts beyond the predefined taxonomy (effectively crosslinks in the ontological structure). This is calculated by OntoShare using a statistical algorithm based upon the number of keywords that each concept shares. In OntoShare, keywords are used to characterise concepts. Including related concepts requires a nested RQL query, since the related concepts of a given concept must be determined before the documents can be queried.
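The report does not specify the statistical algorithm that OntoShare uses to relate concepts. As a rough illustration of a keyword-overlap measure, the sketch below uses the Jaccard coefficient as a stand-in; the measure itself, the threshold value and all names are assumptions and should not be read as OntoShare's actual method.

```python
def relatedness(keywords_a, keywords_b):
    """Jaccard overlap of two keyword sets (a stand-in, not OntoShare's measure)."""
    a, b = set(keywords_a), set(keywords_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def related_concepts(concept, concept_keywords, threshold=0.25):
    """Return the concepts whose keyword overlap with `concept` meets the threshold.

    concept_keywords maps each concept name to the keywords that characterise it.
    The threshold is an arbitrary illustrative value.
    """
    base = concept_keywords[concept]
    return sorted(
        other
        for other, keywords in concept_keywords.items()
        if other != concept and relatedness(base, keywords) >= threshold
    )
```

Computing the related concepts first and then querying for documents mirrors the two-stage structure of the nested RQL query.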
As well as selecting concepts for the query, the user can also refine it by choosing to receive details of documents which:
Were submitted between specified dates;
Were submitted in a previous number of months or days;
Were submitted by a particular set of users;
Were submitted by all but a particular set of users;
Contain a set of specified keywords;
Do not contain a set of specified keywords.
Appropriate combinations of the above can also be selected. The query can then be executed with the results displayed at the bottom of the interface. This result display could be improved by showing further information such as who submitted it, comments supplied, etc. (as is the case with the OntoShare interface).
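The way these optional refinements combine can be sketched as a single predicate over a document record. The `Doc` record and the `matches` function are hypothetical names introduced for illustration. Following the example query later in this section, a document is assumed to satisfy the keyword condition if it carries any one of the specified keywords.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    """Hypothetical document record holding the attributes used by the filters."""
    url: str
    submitted: date
    submitter: str
    keywords: frozenset


def matches(doc, start=None, end=None, submitters=None,
            excluded_submitters=None, required=(), forbidden=()):
    """Return True if the document passes every filter that was supplied."""
    if start and doc.submitted < start:
        return False
    if end and doc.submitted > end:
        return False
    if submitters is not None and doc.submitter not in submitters:
        return False
    if excluded_submitters and doc.submitter in excluded_submitters:
        return False
    # Assumption: any one of the required keywords is sufficient.
    if required and not (set(required) & doc.keywords):
        return False
    if set(forbidden) & doc.keywords:
        return False
    return True
```

A "previous number of days" filter can be expressed with the same predicate by computing `start` from the current date, e.g. `date.today() - timedelta(days=30)`.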
Queries can be stored using the Profile Refiner. A save function stores the query in the OntoShare store. This allows users to refine their query over a period of time using the interface. It also allows periodic queries to be carried out, with the results presented to the user, perhaps as a digest of activity within a community's OntoShare system, as mentioned above.
The intention of this task was to allow much more complex queries to be expressed. The following example illustrates that this is the case. In OntoShare, a typical profile is just a list of concepts, allowing you to express the query:
Show me documents which are about: Virtual Community or XML
Expressed as RQL this is:
Select url, title
from {doc}http://www.bt.com/ontoshare#isAbout{theme},
{doc}http://www.bt.com/ontoshare#title{title},
{doc}http://www.bt.com/ontoshare#href{url}
where ( ( theme = http://i97.labs.bt.com/rdfferret/export.xml#Virtual_Community)
or ( theme = http://i97.labs.bt.com/rdfferret/export.xml#XML))
Using the Profile Refiner, this can be refined to say:
Show me documents which are about: Virtual Community or XML
or are about: any subclass of Virtual Community or XML
or are about: any group related to Virtual Community or XML
and have keywords: XML, Semantic Web
and do not have keywords: RDF
and are between dates: 2002-01-02 and 2002-05-05
and are submitted by: Martin Crossley or Alistair Duke
Expressed as RQL this query is:
Select url, title
from {doc}http://www.bt.com/ontoshare#isAbout{theme},
{doc}http://www.bt.com/ontoshare#title{title},
{doc}http://www.bt.com/ontoshare#href{url},
{doc_keyword} http://www.bt.com/ontoshare#ofDocument {doc},
{doc_keyword}http://www.bt.com/ontoshare#keyword{keyword},
{doc}http://www.bt.com/ontoshare#submit_date{date},
{doc}http://www.bt.com/ontoshare#submitted_by{sub_profile},
{sub_profile}http://www.bt.com/ontoshare#user_name{submitter}
where ( ( ( theme = http://i97.labs.bt.com/rdfferret/export.xml#Virtual_Community
            or theme in subClassOf ( http://i97.labs.bt.com/rdfferret/export.xml#Virtual_Community ) )
          or ( theme = http://i97.labs.bt.com/rdfferret/export.xml#XML
               or theme in subClassOf ( http://i97.labs.bt.com/rdfferret/export.xml#XML ) ) )
        or ( ( theme = http://i97.labs.bt.com/rdfferret/export.xml#Virtual_Community
               or ( theme in select theme
                    from {theme}http://www.bt.com/ontoshare#isRelatedTo{theme2}
                    where theme2 = http://i97.labs.bt.com/rdfferret/export.xml#Virtual_Community ) )
             or ( theme = http://i97.labs.bt.com/rdfferret/export.xml#XML
                  or ( theme in select theme
                       from {theme}http://www.bt.com/ontoshare#isRelatedTo{theme2}
                       where theme2 = http://i97.labs.bt.com/rdfferret/export.xml#XML ) ) ) )
and ( keyword like "XML" or keyword like "Semantic Web")
and not ( keyword like "RDF")
and date > "2002-01-02" and date < "2002-05-05"
and ( submitter like "Martin Crossley" or submitter like "Alistair Duke" )
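How a list of selected concepts might be turned into query text of this shape can be sketched as below. The sketch only assembles a string that mirrors the form of the queries shown in this section; it has not been validated against Sesame's RQL engine, and the function names and the space-to-underscore convention for concept URIs are assumptions.

```python
ONTOSHARE = "http://www.bt.com/ontoshare#"
CONCEPTS = "http://i97.labs.bt.com/rdfferret/export.xml#"


def concept_clause(name, include_subclasses=False):
    """Build the condition for one concept, optionally covering its subclasses."""
    uri = CONCEPTS + name.replace(" ", "_")
    clause = f"theme = {uri}"
    if include_subclasses:
        clause += f" or theme in subClassOf ( {uri} )"
    return f"( {clause} )"


def build_query(concepts, include_subclasses=False):
    """Assemble query text that selects the URL and title of matching documents."""
    where = " or ".join(concept_clause(c, include_subclasses) for c in concepts)
    return "\n".join([
        "select url, title",
        f"from {{doc}}{ONTOSHARE}isAbout{{theme}},",
        f"     {{doc}}{ONTOSHARE}title{{title}},",
        f"     {{doc}}{ONTOSHARE}href{{url}}",
        f"where ( {where} )",
    ])


query = build_query(["Virtual Community", "XML"], include_subclasses=True)
```

Wrapping the whole concept disjunction in one pair of parentheses keeps any conditions appended later with "and" (keywords, dates, submitters) applying to every concept branch.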
Spectacle Visualisation
The second main activity in this task is the use of Spectacle to visualise the OntoShare store. Note that in our terminology 'visualisation' refers to both graphical visualisation and textual presentation of information [6]. For the OntoShare case study this has resulted in a textual presentation that is as easy to use as an ordinary web-site. In fact, it is an ordinary web-site. Its value lies not in its HTML but in the contents presented and in the structure by which they are presented.
The main information items in OntoShare and their relations are now briefly described, since they will be the basis for the Spectacle presentation.
Information objects:
A user in the OntoShare system is represented by a "profile" object. A profile contains basic information about the human user, such as an e-mail address and a full name.
A document in the OntoShare system is represented by a "document" object. A document contains basic (meta-)information about the real-world document, such as its title, summary and URI.
Concepts are (of course) objects in the OntoShare system too.
Relations between information objects:
When a document is submitted by a user, a "submitted by" relation will exist between the document and the user's profile. When a user adds a comment to a document, a Comment object is created. A "comments on" relation between this object and the document is created, plus an "author of" relation between the user's profile and the Comment object.
Classifications of a document are represented by "is about" relations between the document and concept objects. These relations can be the result of explicit user classification and of automatic system classification by OntoShare.
Interests of users are represented by "member of" relations between the user's profile and concept objects.
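The information objects and relations listed above can be pictured with a few plain record types. This is an illustrative sketch only; the class and field names are chosen to echo the relation names in the text and are not taken from the OntoShare schema.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """A user: basic details plus "member of" relations to concepts."""
    user_name: str
    email: str
    member_of: set = field(default_factory=set)


@dataclass
class Document:
    """Meta-information about a real-world document."""
    title: str
    uri: str
    submitted_by: Profile                           # "submitted by" relation
    is_about: set = field(default_factory=set)      # "is about" relations to concepts


@dataclass
class Comment:
    """A user's comment on a document."""
    text: str
    author: Profile                                 # "author of" relation
    comments_on: Document                           # "comments on" relation


# Example with hypothetical values.
martin = Profile("Martin Crossley", "martin@example.org", {"Collaboration"})
doc = Document("Example document", "http://example.org/doc", martin, {"Collaboration"})
note = Comment("Worth reading.", martin, doc)
```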
OntoShare and Sesame
As described above, Sesame is not the native repository underlying the OntoShare system. In order to share the knowledge accumulated by the OntoShare system with other services, it is exported to Sesame as described in section 4. This follows the overall On-To-Knowledge architecture, in which Sesame plays a pivotal role with respect to interoperability between components.
OntoShare, Sesame and Spectacle
An OntoShare-specific Spectacle application was developed to convey to end-users the information that OntoShare exports to Sesame. It builds on the Sesame client API that was developed by Aidministrator during the On-To-Knowledge project, and on experience with the Spectacle-based generic RDF Explorer developed earlier in the project (reported in deliverable 13).
Spectacle was used to target information for each individual user. Most users prefer a personalised presentation, but do not like the idea of being "excluded" from information that is available. Therefore, even a personalised view includes access to all documents.
Personalisation with respect to the type hierarchy
The first personalisation is with respect to various subsets of the total document set. For each set (see below) the relevant documents are sorted by submission date (most recent first) and presented to the user. Starting at the root node (all documents of the set), the user can narrow down the number of documents by selecting a subtype of the current concept. In this OntoShare personalisation, the concept hierarchy is the most natural selection mechanism when dealing with a large set of documents, for two reasons. The first reason is that the concept type hierarchy is created by humans (as opposed to automatic extraction), which is reflected in its size (small) and its quality (high). The second reason is that the OntoShare end-users are already familiar with the concept type hierarchy.
The concept type hierarchy is also an effective selection mechanism. One reason is that in the (collaboratively created) OntoShare ontology, subclasses always refine their superclass (i.e. they are proper subclasses of their superclass). Furthermore, leaf concepts always contain small sets of documents, since the depth of the concept type hierarchy at a particular node reflects the amount of information. Finally, documents are on average classified by means of the most specific concept(s) contained in the document.
It is important to notice that navigation along the concept type hierarchy is an application-specific design decision. In the Enersearch case study [7], for example, due to the very flat and broad concept type hierarchy (which was extracted completely automatically by OntoExtract), it was decided instead that the conjunction of (possibly unrelated) concepts was the most natural and effective way to narrow down the number of documents.
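Narrowing a document set by descending the concept hierarchy, with the most recent documents first, can be sketched as follows. The hierarchy contents, the dictionary-based document records and the function names are hypothetical; Spectacle itself generates its presentation from the data held in Sesame rather than from in-memory structures like these.

```python
from datetime import date

# Hypothetical hierarchy for illustration: concept -> direct subconcepts.
CHILDREN = {
    "Knowledge Management": ["Collaboration", "Ontologies"],
    "Collaboration": [],
    "Ontologies": [],
}


def descendants(concept, children):
    """Return the concept together with all of its subconcepts."""
    found = [concept]
    for child in children.get(concept, []):
        found.extend(descendants(child, children))
    return found


def documents_under(concept, docs, children):
    """Documents classified under the concept or a subconcept, newest first."""
    wanted = set(descendants(concept, children))
    hits = [d for d in docs if wanted & set(d["is_about"])]
    return sorted(hits, key=lambda d: d["submitted"], reverse=True)
```

Selecting a subconcept simply calls `documents_under` again with the narrower concept, so each navigation step returns a subset of the previous one.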
Figure 5. Navigation through various document sets.
Besides the document-sets determined by the concept-hierarchy, a number of other document-subsets are presented to each user as well:
The documents matching the interests of the user.
The documents that have been annotated by the user.
The documents that have been submitted by the user.
Figure 5 shows a screenshot of the resulting personalised information presentation for one of the users in the BT case study.
Note that navigating to e.g. "Documents annotated by Martin Crossley / Collaboration" is equivalent to generating a non-trivial RQL query on the fly.
The query in question is:
select Document
from {Comment} ontoshare:commentOn {Document},
{Comment} ontoshare:author {Profile},
{Document} ontoshare:isAbout . rdfs:subClassOf {Concept} . rdfs:subClassOf {conceptClass},
{Profile} ontoshare:user_name { MC }
where MC = "Martin Crossley"
and Concept = ns4:Collaboration
using namespace
rdf = http://www.w3.org/1999/02/22-rdf-syntax-ns# ,
rdfs = http://www.w3.org/2000/01/rdf-schema# ,
ns3 = http://i97.labs.bt.com/rdfferret/ ,
ns4 = http://i97.labs.bt.com/rdfferret/export.xml# ,
ontoshare = http://www.bt.com/ontoshare# ,
ns6 = http://i97.labs.bt.com/rdfferret/export.xml#News_:
Personalisation with respect to user-interests
The Spectacle presentation also encourages interaction between different users of the system, based on their mutual concerns or interests.
Wherever a user is mentioned in the (document) presentation, it links to the OntoShare Spectacle presentation environment for that particular user. This is true for e.g. the submitter and the annotators as presented with a document. Following e.g. the "submitter" link, one can find "trivial" information about the submitter (such as name and e-mail address), but also more semantic information such as the concepts of interest of that submitter. Or, e.g., one can navigate through all (other) documents submitted by this person.
Another interesting type of information is presented to the user in each chapter. Each chapter primarily contains (summaries of) documents, but since each chapter corresponds to a concept, it is a natural extension to present in the chapter:
which users have commented on at least one document containing this concept.
which users have submitted at least one document containing this concept.
which users have also expressed their interest in this concept.
Figure 6. Users that might be interesting.
Note that this added functionality is not part of the original OntoShare application. The required information for this functionality is derived from the information stored in Sesame. The OntoShare Spectacle presentation application formulates RQL queries which do not correspond to existing functionality in the OntoShare application.
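The three user lists shown with each concept can be derived from the exported relations alone. The sketch below does this over in-memory records; the record layout and the function name are assumptions made for illustration, whereas the Spectacle application obtains the same information through RQL queries against Sesame.

```python
def users_for_concept(concept, documents, comments, profiles):
    """Group user names by their relation to a concept.

    documents: dicts with "uri", "is_about" (set of concepts), "submitted_by"
    comments:  dicts with "author" and "document" (the URI commented on)
    profiles:  mapping of user name -> set of concepts of interest
    """
    on_topic = {d["uri"] for d in documents if concept in d["is_about"]}
    return {
        "commented": sorted({c["author"] for c in comments if c["document"] in on_topic}),
        "submitted": sorted({d["submitted_by"] for d in documents if d["uri"] in on_topic}),
        "interested": sorted(name for name, concepts in profiles.items() if concept in concepts),
    }
```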
Conclusions
This report has described prototypes that have been developed to explore user profile construction and visualisation. The Profile Refiner has been developed to allow users to refine profiles so that they are delivered the correct information and are not overloaded with it. The interface relies upon an existing system, OntoShare, for which an ontology has been defined in RDF Schema (see appendix 1). It allows users to express their interests in a much more detailed way. However, this expression is limited by the ontology in question. Currently there is little metadata associated with the content of the document (other than membership of a set of concepts and user comments upon it). The realisation of the Semantic Web will mean that much more metadata will exist. In this scenario, users will be able to further refine their queries based upon the metadata associated with the documents themselves. This should allow profiles to be further refined.
In this task, the Spectacle visualisation illustrates an approach in which the semantic information about both the objects (documents) and the users has been exploited to build a better information portal, without adding complexity to the end-user interaction. It is recognised that the expressive power of the Spectacle information presentation is less than that of the Profile Refiner for OntoShare, shown in figure 4 of section 5. Which query and presentation metaphor supports users most effectively in their day-to-day tasks is expected to depend upon the target users.
Both the Profile Refiner and Spectacle applications demonstrate tool interoperability with OntoShare and show the value of OntoShare's RDF-based information repository.
References
[1] Davies, J., Duke, A. & Stonkus, A., "OntoShare: Ontologies for Knowledge Sharing", RDF & Semantic Web Applications workshop, 11th International WWW Conference, Hawaii, USA, May 2002.
[2] Davies, J., Krohn, U. & Weeks, R., "QuizRDF: Search Technology for the Semantic Web", Semantic Web workshop, 11th International WWW Conference, Hawaii, USA, May 2002.
[3] Wester, J. & Fluit, C., "Using Visualization for Information Management Tasks", Workshop on Visualisation of the Semantic Web (VSW '02), London, July 2002.
[4] Broekstra, J., Kampman, A. & van Harmelen, F., "Sesame: An Architecture for Storing and Querying RDF Data and Schema Information", in Semantics for the WWW, D. Fensel, J. Hendler, H. Lieberman & W. Wahlster (Eds.), MIT Press, 2001.
[5] "Communicating with Sesame", published on the WWW at: http://sesame.aidministrator.nl/publications/communication.html
[6] Fluit, C., ter Horst, H. & van der Meer, J., On-To-Knowledge Deliverable 13: Visualization facility.
[7] On-To-Knowledge Deliverable 29: Evaluation Document. To appear.
Appendix 1
On-To-Knowledge: Content-driven Knowledge management Tools through Evolving Ontologies
EU-IST Project IST-1999-10132 On-To-Knowledge