Posts Tagged ‘Metadata’
Project overview
Posted by Dawn on March 31, 2009
It’s always good to reflect, they say. So, as part of checking that everything is in place, I decided to go back over the blog and map what we did, when we did it, and which of the three streams it affected. It is quite clear from this that the primary focus has been on metadata: why, how, and who does this? This has been closely linked to the setting up of the repository and to the discussions around learning objects in general. The following image maps, to some extent, the process we have undertaken and dates the main events for which blog posts are available. The dashed boxes show main events that currently have no formal report or blog post attached. Some of these are still sitting on the laptop and will be updated in due course.
Posted in Events, General, Metadata, Reflections, Repositories, Search | Tagged: Metadata, Repositories, Search
Evaluation… Part 3
Posted by Dawn on March 27, 2009
The final part of the user evaluation (part 3) analyses the participants’ questionnaire responses concerning their use of LOs and their response to the metadata generator. Enjoy :)
Posted in Metadata, Reports | Tagged: Metadata, prototype, user evaluation
Evaluation Study … part 1
Posted by Dawn on March 25, 2009
Over the last few days I have been testing the metadata generator with some willing participants (3 in total). There were four parts to this study. The first comprised a series of find-and-complete tasks aimed at testing the intuitiveness of the interface and terminology. Participants at this stage weren’t privy to help files or detailed information about the application’s processes. The following report presents the findings of this part of the study and makes recommendations regarding the interface and terminology only. The next three parts (the auto-generation process, uploading to intraLibrary and the questionnaire results) will be forthcoming over the next few days. In view of this study I have revisited some of the questionnaires we undertook early on, based on the paper prototyping. We had very few respondents, so statistical analysis was not warranted; however, some of the comments made were taken on board at the time. I will include a brief discussion of these and, if possible, compare them with the more recent questionnaire results.
Posted in General, Metadata, Reports | Tagged: Metadata, report, user evaluation
Updating the overall architecture
Posted by Dawn on December 11, 2008
The last time Mark and I got together to appraise our progress on the search extender for use with IntraLibrary, we also reviewed the general architecture of the systems surrounding the repository. As with most things this creative and dynamic, this was done with pen and paper over a copy of our previous design, and it is only recently that I have had the time to formalise it. An image of this is here. There have been a few refinements since then, and I think the following diagram gives a good overview of what we have at the moment and what **things** are involved.
There are still a few issues. On the upload side we would have loved to have had a go at adapting SWORD to take a metadata file as well as the LO. SWORD’s purpose is to ease upload of multiple objects to multiple repositories as Nick explains here. But what does the user do then if they are not uploading packed objects? Where does the metadata come from? This is particularly relevant in the use of intraLibrary.
The other question is one that has been floating around for some time: how do we authenticate the actual download of the Learning Object? Unlike research papers (shown here as a comparison, as the repository is dual purpose), there are issues and concerns about making learning objects accessible outside the institution. As a result, the search interface is currently two interfaces rather than one. One suggestion is that, as long as the metadata is exposed, individuals could enquire about getting a copy of the LO via the authors or the institution.

Current Streamline Architecture
Posted in General, Metadata, Repositories, Search | Tagged: Architecture, Metadata, repository, Search
Linking to intraLibrary
Posted by Dawn on November 12, 2008
Today I started looking at the repository in more detail. There are several jobs that need to be done to ensure that metadata produced by the automatic generator prototype will be successfully integrated. The first of these is to define the subset of LOM that is to be applied to all LOs. Having come from eCat, the auto-generator currently implements a subset of the LOM standard. Discussions with Ben when we were looking at adapting eCat suggested that the subset it was using was well researched and was the most practical for users.
From general discussion around the university, there are small pockets of individuals producing LOs in a packaged format. Some of these may be produced using Compendle, which is SCORM IMS compliant and contains all the necessary metadata. Others may be using Wimba Create (formerly Course Genie), which I have been told produces little or no metadata (although this contradicts the information on their web site; it may be a version issue). Some may also be using the eCat plug-in for Word, which we promoted last year. For most, learning content will be produced using more familiar tools such as PowerPoint and Word documents, and possibly some web pages for the more adventurous. This type of content is the main focus for the auto-generator. But these different generation processes produce different metadata, and some none at all.
From the perspective of the repository, just about any type of content can be submitted or referenced, so this isn’t a problem. The problem lies in what metadata (the application profile, from intraLibrary’s point of view) should be required. To account for those applications that produce and package LOs, all possible fields should be made available. However, for those having no metadata with their content, and possibly no means of producing any, a very minimal set should be presented to get some basic information from the depositor.
So, bearing these two extremes in mind, I opted to set all metadata fields as optional, thus accounting for all possible subsets of LOM, but then presented the depositor with only a minimal mandatory set. This included the LO’s title, description, keywords, author’s details (for content only), contribution date and rights details.
The second job I needed to do was to test how the upload process works with already packaged objects. For this I used the LOs produced by the replica project. These are IMS content packages, most of which seem to be dated 2005. IntraLibrary should detect this and extract the embedded metadata to populate its own metadata records. This does not appear to be working for most of the objects uploaded. IntraLibrary also has a preview function which should work with most of the file types it can store. Again, this function only worked properly with some of the LOs. Downloading and unpacking these LOs was fine and the content was the same as the originals. This might suggest that either IntraLibrary is not backwards compatible with previous standards or the packages uploaded are not well-formed.
The final job I had to do was to check the upload of external XML LOM files for attachment to content already uploaded. This was one of the features I most liked about IntraLibrary, and it also enabled me to develop a standalone auto-generator with no packaging functionality. Again I was to be disappointed. Nothing happens when I use this facility on IntraLibrary. I’ve double-checked the XML format and it all seems correct, so at the moment I have no idea why this is happening.
Nick is arranging a meeting with IntraLibrary within the next couple of weeks. Hopefully we should be able to resolve these problems then.
Posted in General, Metadata, Repositories | Tagged: IntraLibrary, Metadata, prototype, Repositories
Colour coded output 2
Posted by Dawn on September 5, 2008
In a previous post I discussed using colour to identify for the user the sources of the generated metadata: whether it had come from the documents they had passed into the application or from one of the data collections. While I liked the idea, I felt it was too complex, which effectively defeated the whole reason for using it. So I dropped it and reverted to plain black text.
However, what is important for the user is to know whether or not the metadata is complete. By this I mean that the fields that are required are filled in, and that some warning is passed to the user where this is not the case. I thought of using popup warnings that did not allow the user to export the metadata file (effectively completing the process) unless certain fields were filled in. Again, this was too complex from a coding perspective and I felt it would be irritating to the user. There was also the fact that different repository and packaging systems don’t all follow the LOM standard explicitly.
Thinking back to the colour coding, I decided to use that to provide non-intrusive warnings to the user. The user-readable output is colour coded red if a field is missing and essential to the LOM standard, and grey if it is missing and non-essential, as viewed here. This works much better than the popups and also enables the user to decide whether or not to fill in the relevant fields before exporting, allowing them to tailor the metadata produced to whatever application they intend to use it with.
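As a rough illustration of the red/grey idea (not the prototype’s actual code), the check could look something like the sketch below. The field names, the split into essential and non-essential fields, and the function names are all assumptions made for the example.

```python
# Hypothetical field lists: which fields count as essential is an assumption
# for this example, not taken from the prototype or from the LOM standard.
ESSENTIAL_FIELDS = {"title", "description", "keywords"}
NON_ESSENTIAL_FIELDS = {"version", "status", "rights"}


def is_filled(value):
    """Treat None, blank strings and empty collections as missing."""
    if value is None:
        return False
    if isinstance(value, str):
        return bool(value.strip())
    try:
        return len(value) > 0
    except TypeError:
        # Values without a length (numbers, dates) count as filled.
        return True


def field_colour(name, value):
    """Black for a filled field, red if missing and essential, grey otherwise."""
    if is_filled(value):
        return "black"
    return "red" if name in ESSENTIAL_FIELDS else "grey"


def colour_report(record):
    """Map every known field name to the colour it would be displayed in."""
    names = sorted(ESSENTIAL_FIELDS | NON_ESSENTIAL_FIELDS)
    return {name: field_colour(name, record.get(name)) for name in names}
```

Calling `colour_report({"title": "Intro to IR", "keywords": []})` would mark `description` and `keywords` red and the unfilled non-essential fields grey, while nothing stops the export going ahead.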
Posted in Metadata | Tagged: development, Metadata, prototype
Extracting Metadata (basic)
Posted by Dawn on August 24, 2008
Discussed with Mark Thursday 24th ish before going on holiday.
The basic interface and functionality of the automatic metadata generator is almost complete. As identified in the report on eCat’s use of metadata there are three potential sources of metadata.
- The learning object’s (LO’s) content and supporting documentation.
- Persistent collections of data – data that can be reused with each new metadata file.
- System data – data that can be generated from the system architecture and file formats etc.
The prototype has been designed to use collections of data for contributor information (both content, LOM 2 Lifecycle, and metadata, LOM 3 Meta-metadata), requirement information (LOM 4 Technical) and rights information (LOM 6 Rights). Personal preferences, another collection of data, can be used for LOM 2.1 Version, 2.2 Status, the majority of the LOM 5 Educational section and some aspects of the LOM 4 Technical section.
System data is captured to identify file types and sizes. Ideally, login information from embedded organisational systems should also be used to capture the user’s personal details. This is not implemented at this time, as linking in to university systems is a prolonged and difficult task. A separate flat-file login process has been set up to represent this, enabling the user to write their details once and use them many times.
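As a minimal sketch of capturing system data, assuming the file type is guessed from the file extension and mapped onto the LOM Technical format and size elements, something like the following would do. The function name and the dictionary keys are made up for the example; this is not the prototype’s code.

```python
import mimetypes
import os


def system_metadata(path):
    """Collect file type and size for a file on disk.

    The keys 'format' and 'size' are illustrative stand-ins for the
    LOM 4 Technical format and size elements.
    """
    # guess_type looks only at the file name, not the file's content.
    mime_type, _encoding = mimetypes.guess_type(path)
    return {
        "format": mime_type or "application/octet-stream",
        "size": os.path.getsize(path),  # size in bytes
    }
```

Guessing from the extension is cheap but can be wrong for renamed files, so a fuller version might inspect the content as well.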
The final source of metadata is that of the LO and any organisational documentation or development notes (referred to as Scripts). This data can be used to identify potential keywords and possible classification of the LO. Classification has not been explored at this stage as the LeedsMet repository (the main test bed) has not identified the classification system it is going to use with LOs. The money is currently on JAC, but we shall have to wait and see.
I have focused on Word docs to start with, but should be able to use HTML, PowerPoint and possibly PDF by the end of the project. Mark did point out that there is a substantial difference between Word 2003 (my current version) and Word 2007 (the new XML format for Vista), but there are limitations to what we can achieve here. Maybe someone else can hack that for me.
The aim of the extraction process is to generate a set of potential keywords from the documents supplied by the user. I have been running tests on some student essays at the moment, as their topics are easy to distinguish. This potential set is then presented to the user so they can select what they think is most appropriate or add new words. It’s more of a brain stimulator than a definitive answer to the keyword generation problem.
To do this I’ve started with the basic methods used in Information Retrieval (IR) problems. Simple term frequency (TF) scans the document, counting the number of times each word appears. There is usually some pre-processing of the document, such as the removal of stop words and stemming. I’ve opted for just the stopping process, as stemming returns many words that don’t convey the true contextual meaning from the perspective of keywords. For example, “computing” becomes “compute”.
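A minimal sketch of term frequency with stop-word removal and no stemming is shown here. The tiny stop-word list and the function names are assumptions for the example; a real list would be far longer, and this is not the prototype’s code.

```python
import re
from collections import Counter

# A deliberately tiny, illustrative stop-word list (an assumption).
STOP_WORDS = frozenset(
    "a an and are as at be by for from in is it of on or that the this to was with".split()
)


def term_frequencies(text, stop_words=STOP_WORDS):
    """Count how often each non-stop word appears; no stemming is applied."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(word for word in words if word not in stop_words)


def top_keywords(text, n=5):
    """Return up to n candidate keywords, most frequent first."""
    return [word for word, _count in term_frequencies(text).most_common(n)]
```

Because no stemming is applied, “computing” and “compute” are counted as separate terms, which keeps the candidates in the form the author actually used.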
TF can expand into various other methods, one of the most common being term frequency–inverse document frequency (tf-idf). This is basically a weighted measure across a document set (not a single document). It favours terms that occur frequently in a given document but in few of the other documents in the set. It can only be used if the user submits several documents to the auto-generator, so it has limitations.
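The sketch below shows one common formulation of tf-idf: term frequency normalised by document length, multiplied by the natural log of the number of documents over the number of documents containing the term. Other weightings exist, and the function names are assumptions; it is meant as an illustration rather than the method used in the prototype.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return one {term: score} dictionary per document in the set."""
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(token_lists)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty documents
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores
```

With this formulation a term that appears in every document scores zero, and with a single document every term scores zero, which is the limitation noted above.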
The final method I tested was weighted document structures. This counts terms again but adds greater weight to those that appear in headings and titles. This can be used both on a single document and on a document set.
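A hedged sketch of the structure-weighted count: each piece of text is labelled with where it came from, and terms in titles and headings count for more. The weights, the labels and the function name are invented for the example and are not the values used in the prototype.

```python
import re
from collections import Counter

# Illustrative weights only (an assumption): titles count most, then headings.
STRUCTURE_WEIGHTS = {"title": 5.0, "heading": 3.0, "body": 1.0}


def weighted_term_counts(sections, weights=STRUCTURE_WEIGHTS):
    """Sum term counts over (kind, text) pairs, scaled by the weight of each kind.

    Unknown kinds fall back to a weight of 1.0. Sections from several
    documents can be passed together to score a whole document set.
    """
    totals = Counter()
    for kind, text in sections:
        weight = weights.get(kind, 1.0)
        for word in re.findall(r"[a-z]+", text.lower()):
            totals[word] += weight
    return totals
```

For example, `weighted_term_counts([("title", "Metadata Basics"), ("body", "metadata describes objects")])` gives “metadata” a score of 6.0 and “describes” a score of 1.0. Stop-word removal is left out here for brevity.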
Generally, I found very little difference across the three methods. The top three to five terms tended to be the same (ordering was often a little different), with the next five to ten words being a mix of useful and not so useful terms. No particular method stood out from this, but they all seemed to be putting relevant words at the top of their lists.
Now I need to consider how to mix these basic techniques with the different types of content the user may submit. A textual learning object may benefit from the weighted term frequency, whereas scripts and university documents may perform better using the cross-document-set tf-idf.
Posted in General, Metadata, Reflections | Tagged: Metadata, prototype
April’s Meeting
Posted by johnheap on April 9, 2008
A new team member joined the group for this week’s meeting. This was very useful as it allowed him to ask ‘innocent’ questions from a position of relative ignorance about the project and allowed the rest of us to clarify some of the confusions we have because we often see only our piece of the project. For example, we discussed the topic of Resource Discovery and I think we agreed (though my colleagues will no doubt correct my latest ‘confusion’) that the basic approaches to Resource Discovery – as it relates to Learning Objects in particular – are:
- Searching any metadata (and in terms of the Streamline project, hopefully this means appropriate and reliable metadata that has been automatically generated).
- Semantic analysis of content (Google Plus?)
- Peer recommendation/Social Networking.
Now, it was worth attending the meeting just to clarify that … but it wouldn’t have been clarified without a project newbie!
Posted in General, Metadata, Search | Tagged: Meeting, Metadata, Resource Discovery, Searching, Semantic analysis
Tagging
Posted by Dawn on March 26, 2008
Spurred on by the tagging bug (although I have yet to use the links, they’re all nicely sorted), I bought two interesting books this week. I think they will be useful background to the social networking and personal organisation aspects of the project. OK, the books:
I’ve read most of the former; the latter I will leave for a later post. The book is lightweight and easy to read, presenting the concept in a variety of scenarios. Tagging is used to organise and relate all sorts of digital objects. Photos, links, files: anything that resides on the internet or a computer can be tagged. This is only constrained by the supporting system or tagging application. Not only do tags organise objects so they can be in many pigeonholes at once, they also act as links from one hole to another related hole, and within a community they act as a voting system: a way of communally defining an object, as well as showing interest and suggesting that the thing is more important than those things you don’t bother to tag.
This got me thinking about metadata and whether it might become an obsolete technology. It’s designed in the same frame of mind as the Dewey system in some ways, aimed at experts defining objects in a set and predefined way. This is certainly evident from the repositories…
Posted in Links, Metadata, Reflections | Tagged: books, Metadata, tagging
Search & Extract Metadata Interface designs
Posted by Dawn on March 5, 2008
Here are the interface designs for the two main tools we intend to produce. This one contains a rough sketch of a free-standing metadata generation tool using Denim. Once you’ve installed Denim and opened one of the files in the zip, go to Denim’s menu (top left) and select File > Run. The enclosed Word document gives more details, as well as screenshots of some of the features for those of you who are unable to get Denim to run. The ideas on how to present keywords to the user are contained in the previous Denim release and may be incorporated here, depending on the algorithms used. At its most basic, it should be able to extract keywords from university documents and use a store of information to add the remaining metadata fields. Some of this was discussed in a previous report on eCat’s metadata.
The second tool is for searching a metadata store, initially a folder of flat files. Sketches of three interfaces have been produced for this. The download also contains a Word doc with more details and screenshots. These have been influenced by the investigations into repositories, content management systems and eCat. The basic functionality remains the same across these interfaces, but their presentation is very different. The aim here is to produce a stand-alone application for user evaluation of different search methods and algorithms. Hence the actual search feature has many elements, to test different combinations.
Eventually both these tools can be developed into web services, if appropriate, so they can be used by other applications. This will mean the development of a connection interface and administration tools.
Feedback on all these and the previously released eCat interfaces is essential, so please make comments.
Posted in Downloads, Metadata, Search | Tagged: Denim, download, interfaces, Metadata, paper prototype, Search
