
Blog

Best Practices for TEI in Libraries (v3)

posted Oct 17, 2011, 8:25 AM by Danielle Plumer

The TEI Special Interest Group on Libraries has released version three of the Best Practices for TEI in Libraries: A Guide for Mass Digitization, Automated Workflows, and Promotion of Interoperability with XML Using the TEI. The introduction specifies five levels of practice:
There are many different library text digitization projects, serving a variety of purposes. With this in mind, these Best Practices are meant to be as inclusive as possible by specifying five encoding levels. These levels are meant to allow for a range of practice, from wholly automated text creation and encoding, to encoding that requires expert content knowledge, analysis, and editing. The encoding levels are not strictly cumulative: while higher levels tend to build upon lower levels by including more elements, higher levels are not supersets because some elements used at lower levels are not used at higher levels—often because more specific elements replace generic elements.
One of my disappointments with my time working on the Texas Heritage Online program was that so few libraries in Texas used TEI at any level. Most of the text digitization was at Level 1, at best (which is to say that there was no markup at all, just searchable text behind page images).
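Plain searchable text behind page images carries no structure at all. For contrast, here is a minimal Python sketch of how little it takes to wrap page-level OCR output in a bare TEI skeleton. It assumes one string of OCR text per page image. The header wording and the choice of one `<pb/>` plus one `<p>` per page are illustrative assumptions, not the element set the Best Practices prescribe for any level.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI elements with a default xmlns rather than an "ns0:" prefix.
ET.register_namespace("", TEI_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def minimal_tei(title: str, pages: list[str]) -> str:
    """Wrap OCR text (one string per page) in a bare-bones TEI document.

    Each page becomes a <pb n="..."/> milestone followed by a single <p>.
    This structure is a hypothetical minimum, not a Best Practices level.
    """
    tei = ET.Element(q("TEI"))

    # A TEI header needs at least titleStmt, publicationStmt, and sourceDesc.
    header = ET.SubElement(tei, q("teiHeader"))
    file_desc = ET.SubElement(header, q("fileDesc"))
    title_stmt = ET.SubElement(file_desc, q("titleStmt"))
    ET.SubElement(title_stmt, q("title")).text = title
    pub_stmt = ET.SubElement(file_desc, q("publicationStmt"))
    ET.SubElement(pub_stmt, q("p")).text = "Unpublished draft."
    source_desc = ET.SubElement(file_desc, q("sourceDesc"))
    ET.SubElement(source_desc, q("p")).text = "Converted from page images by OCR."

    # Body: a page-break milestone, then the raw text of that page.
    body = ET.SubElement(ET.SubElement(tei, q("text")), q("body"))
    for n, page_text in enumerate(pages, start=1):
        ET.SubElement(body, q("pb"), n=str(n))
        ET.SubElement(body, q("p")).text = page_text

    return ET.tostring(tei, encoding="unicode")
```

Even this thin layer keeps page boundaries addressable, which plain OCR text behind page images does not.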

Digital Humanities Text Analysis Tools

posted Oct 17, 2011, 8:14 AM by Danielle Plumer

Lisa Spiro has a great post on her blog, showcasing various Digital Humanities resources and tools. Two of the resources I hadn't been aware of are the Text Analysis Developers Alliance (TADA) and TaPoR (Text Analysis Portal for Research). Note that the portal is being redesigned and its URL may change in the future. Since I have a feeling that my next job will involve a lot of text analysis, this was a fantastic find. I also really like the DiRT list of text analysis tools (again, this site is due to be redesigned, and URLs may change). Thank you, Lisa!

Media Preservation (and Digitization)

posted Sep 26, 2011, 9:00 AM by Danielle Plumer

This was an interesting announcement posted to the Archives & Archivists list. I've been interested in digitization and preservation (two separate things, though often conflated) of audio-visual materials for a couple of years now, though I haven't had many opportunities to practice it.

Indiana University Bloomington announces the release of a detailed report entitled "Meeting the Challenge of Media Preservation: Strategies and Solutions." This 128-page report is available for download at http://www.indiana.edu/~medpres/

"Meeting the Challenge" is the result of a year of research and planning by a campus-wide task force charged with addressing the problems identified in the earlier IU Bloomington media preservation survey report published in 2009. "Meeting the Challenge" explores a range of topics related to the preservation and conservation of audio, video, and film, including: guiding preservation principles, facility planning, prioritization, digitization methodologies, strategies for film, principles for access, technological infrastructure needs, and engagement with campus units and priorities. Although developed specifically for the Bloomington campus, the findings and analyses in "Meeting the Challenge" may be useful to universities and other organizations with media holdings.

While conversion into a digital format is a very good way to preserve the intellectual content of audio-visual material, and is thus a "preservation strategy," I am still concerned about losing the artefactual value of the original items. The report addresses physical storage of film materials, acknowledging that preservation issues for film can largely be addressed through appropriate storage, but for most of the other media types it covers, it considers only digitization. I understand why -- digitization meets the preservation goal in large part, and also makes access easier for most users -- but I'd still prefer an approach that includes both strategies for all materials.

Exhibit3

posted Sep 8, 2011, 9:53 AM by Danielle Plumer

This relates to the post I did on the Library of Congress' Recollection project a few weeks ago. One of the components of Recollection is Exhibit, part of the Simile suite of tools developed originally at MIT. Now the Exhibit tool itself has been updated for people who want to use it as a stand-alone component of different sites; the new version is called Exhibit3.

MacKenzie Smith, Research Director at MIT Libraries, and Eric Miller, Founder and President of Zepheira, have written an article about Exhibit3, published on SemanticWeb.com. It's actually a two-part article, but Part II hasn't come out yet. Part I discusses the tool and some of the functional requirements and architectural decisions involved in developing it. It's a good article not only for folks interested in the tool but also for developers interested in building similar tools.

Digital Humanities

posted Sep 6, 2011, 8:17 AM by Danielle Plumer

For the next semester, I'm going to be sitting in on a class in the School of Information at UT Austin. The class is called "Digital Humanities" and is taught by a new faculty member, Tanya Clement. Most assignments will be submitted via blog, so I'll double-post mine here and to the class blog.

For me, Digital Humanities is a way of doing projects in the humanities. It’s a bit broader than the old term, Humanities Computing, which was more about using computers to analyze humanities materials using techniques taken from computer science and related fields. Digital Humanities can include not only the analysis but also the presentation of the materials themselves.

I’ve created a quick Prezi with five DH projects I’m interested in at http://prezi.com/_r-u_27kbuue/digitalhumanities/. The projects are:

I’ll post more information about each later.


Day of Digital Archives, Oct. 6th, 2011

posted Sep 6, 2011, 8:09 AM by Danielle Plumer   [ updated Sep 6, 2011, 8:14 AM ]

I don't know how widespread the movement to create this is, or if it's a one-woman show, but I liked the idea of having a day when people post, tweet, and discuss digital archives. There's a fair amount of interest in preserving digital information, as more and more of our everyday stuff is born digital, and very little useful information about digital preservation available outside of the library-archives community.

The Day of Digital Archives is an initiative to raise awareness of digital archives among both users and managers. On this day, archivists, digital humanists, programmers, or anyone else creating, using, or managing digital archives are asked to devote some of their social media output (e.g., tweets, blog posts, YouTube videos) to describing their work with digital archives. By collectively documenting what we do, we will be answering questions like: What are digital archives? Who uses them? How are they created and managed? Why are they important?

This year's Day of Digital Archives will be held on October 6th and entries will be gathered at the Day of Digital Archives blog <http://dayofdigitalarchives.blogspot.com/>.

What is meant by "digital archives"? Well, primarily archives, repositories, content management systems, and other initiatives that collect or manage born-digital material. These initiatives don't have to collect born-digital materials primarily... in fact, they are more likely to have only some born-digital content as part of their mandate. Or maybe they don't really have a "mandate" at all... maybe someone will contribute their thoughts about managing their own personal digital content or social media presence. The thread that ties the participants together is that they collect, manage, preserve, develop, use, think about, or otherwise love born-digital content.

Do you create, manage, or use digital archives? Would you like to participate? Well then, drop me a line at gretchen[.]gueguen[@]gmail[.]com with your contact info and I'll keep you up to date!

You could contribute in a couple of ways:

1. Create a blog post at http://dayofdigitalarchives.blogspot.com/ for October 6th (rather than writing it on the day, you can pre-write it and schedule it to post automatically on the 6th) about some aspect of your work with digital archives on that day. It could be a specific exploration of a single activity that day, or a broader topic not tied to that particular day (for example, what kinds of tools you could really use to process a born-digital collection).

2. Write a post on your own blog similar to the one described above and post a trackback to the Day of Digital Archives blog.

3. Tweet throughout the day about your work with digital archives using the #digitalArchivesDay hashtag.

Even if you can't contribute a post or a tweet, be sure to keep up with the blog on the 6th and join the discussion in the comments.

Gretchen Gueguen
Digital Archivist, Digital Curation Services
University of Virginia Library
PO Box 400114
Charlottesville, VA 22904
(434) 924-4073


Single Search for Cultural Heritage Collections

posted Aug 23, 2011, 9:01 AM by Danielle Plumer

I chuckled a bit at this one. I'm pretty sure that this report essentially replicates a presentation I gave at the 2006 LITA Annual Meeting (parts of which I've done at other meetings, including the 2007 Texas Conference on Digital Libraries), and we weren't the first ones to work on this problem by a long shot. Still, it's an important topic, and the Texas Heritage Online board remains committed to providing a single search tool for all Texas institutions. In that context, the report is worth reading.

Single Search: The Quest for the Holy Grail

An OCLC Research Report by:

Leah Prescott, Getty Research Institute and
Ricky Erway, OCLC Research

Download the report (.pdf: 163K/29 pp.)

This report summarizes the discussions from an OCLC Research Library Partnership working group of nine single search implementers and highlights the emerging practices in providing single search access to library, archive and museum collections. The goal of the report is to foster successful single search implementations by sharing the experience of the working group with those who want to create single search but don't know where to start.

Library of Congress: Recollection

posted Aug 23, 2011, 8:45 AM by Danielle Plumer   [ updated Aug 23, 2011, 8:53 AM ]

I heard about this at Open Repositories 2011. It looks like it will be useful for people who want to build map-based browsing interfaces for digital collections. The list, chart, and timeline exhibits are updates from the old Simile project (developed at MIT, later maintained at Simile-Widgets.org), and Recollection adds some tools for normalizing data.

The following is a guest post by Trevor Owens, Digital Archivist with the Office of Strategic Initiatives.

We are happy to announce the full open source release of the Recollection software platform. Briefly, Recollection is a web application that enables librarians, archivists, curators, and historians to create dynamic interfaces to cultural heritage collections. If you are unfamiliar with the project, I blogged about how you can use the tool to explore cultural heritage collections last month. If you think Recollection might be useful to your organization, we encourage you to take a minute to request an account for the beta instance of the tool.


An example of a Recollection view, in this case of a set of postcards from Fairfax, VA

For most users, the instance of Recollection that the National Digital Information Infrastructure and Preservation Program maintains will be more than enough for their needs. We will have more information on exciting new features and next steps for our hosted instance in the near future. With that said, this full open source release is an important demonstration of the commitment of this project to openness, transparency and sharing.

With the open source release, anyone is now free to download the code and set up their own instance of Recollection. For example, a newspaper or other media outlet might set up its own instance of the Recollection software and create accounts for its staff to let them quickly share interfaces to data with their readers, or a university could set up its own instance to let its faculty, students, or staff explore cultural heritage data sets. Anyone is now free to take the code, tweak it, and make it their own.

You can find the source code on the loc-recollect project page on SourceForge. If you are curious about what it would take to set up the software, take a look at the readme file, which has some basic documentation for getting started. For those who are interested, these are some of the most important open source components that Recollection depends on: it uses RDF to facilitate the integration of data drawn from diverse sources, Simile Exhibit to create the collection interfaces, and Akara for data transformation. All of this is put together using Django, a Python web framework.
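To make the data side of this a little more concrete, here is a rough Python sketch of turning flat collection records into Exhibit-style JSON, with a small date-normalization step of the kind Recollection's data tools address. It assumes the Exhibit JSON layout as I understand it (a top-level "items" list, optional "types" and "properties" declarations, and a "lat,lng" string for map views); it is not Recollection's own code, and the record keys (title, date, lat, lon), the accepted date formats, and the default "Postcard" type are hypothetical.

```python
import json
from datetime import datetime
from typing import Optional

# Date formats tried in order; these are illustrative assumptions.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y")


def normalize_date(raw: str) -> Optional[str]:
    """Return an ISO 8601 date (YYYY-MM-DD), or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def to_exhibit_json(records: list, item_type: str = "Postcard") -> str:
    """Convert flat records (hypothetical keys: title, date, lat, lon)
    into Exhibit-style JSON with a top-level "items" list."""
    items = []
    for rec in records:
        item = {"label": rec["title"].strip(), "type": item_type}
        date = normalize_date(rec.get("date", ""))
        if date:
            item["date"] = date
        if rec.get("lat") is not None and rec.get("lon") is not None:
            # Map views are assumed to read a single "lat,lng" string.
            item["latlng"] = f"{rec['lat']},{rec['lon']}"
        items.append(item)
    data = {
        "types": {item_type: {"pluralLabel": item_type + "s"}},
        "properties": {"date": {"valueType": "date"}},
        "items": items,
    }
    return json.dumps(data, indent=2)
```

Records whose dates can't be parsed simply omit the date rather than carrying inconsistent strings into timeline views; a real workflow would probably log them for review.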


MXF Application Specification

posted Aug 19, 2011, 10:21 AM by Danielle Plumer   [ updated Aug 19, 2011, 10:26 AM ]

It looks like the Federal Digitization Guidelines Initiative is making some progress on the AV standards and guidelines. Here's some new information:

This email is being sent to multiple lists (apologies for duplication) to notify colleagues that the Federal Agencies Audio-Visual Digitization Working Group has posted an updated version of the in-progress MXF Application Specification for Archive and Preservation (AS-AP). This specification pertains to the wrapper element for digital files produced when reformatting videotapes and other audio-visual materials. In addition to the updated specification, we have also posted a narrative commentary on six of the issues we have encountered in our work, recapping a series of emails I sent to some of you earlier this summer.

Draft specification (27 pages): http://www.digitizationguidelines.gov/guidelines/FADGI_MXF_ASAP_Arch_Pres_1h_20110815.pdf

Issues paper (11 pages): http://www.digitizationguidelines.gov/guidelines/FADGI_MXF_ASAP_Issues_20110815.pdf

URL for the Web page with links to those two documents and four related documents, including a general background paper written in 2010: http://www.digitizationguidelines.gov/guidelines/MXF_app_spec.html

As the issues paper indicates, some matters to be specified in AS-AP depend upon the execution of related work in standards organizations. One key example concerns the clarification of the method(s) and tagging to be employed when mapping interlaced frame images encoded as JPEG 2000 to the MXF container. We have been pleased to learn that there is a move within SMPTE to revise the relevant standard: SMPTE ST 422:2006, Material Exchange Format – Mapping JPEG 2000 Codestreams into the MXF Generic Container.

We seek comments and advice from our readers. Our online response form has been intermittent in its transmission of comments. Please feel free to send me a note directly (cfle@loc.gov) or post a note to this list.

Best wishes.

Carl Fleischhauer
Project Coordinator
Office of Strategic Initiatives
Library of Congress
Washington, DC 20540-1330
202-707-3979

cfle@loc.gov

I've been following this because the Texas Heritage Online Standards and Guidelines for Cultural Heritage Digitization Projects is being revised this summer, and we're including a section on AV digitization. I'll include this in the draft.

OAI --> Flickr importer tool

posted Aug 19, 2011, 10:08 AM by Danielle Plumer   [ updated Aug 19, 2011, 10:12 AM ]

This looks like an interesting new tool:

VCU Libraries has posted new sets of digital collections images on Flickr, with images and metadata pulled from CONTENTdm's OAI service using a tool developed by our web team.

This Flickr uploader is a web-based PHP app that should work with any OAI-enabled image repository (with some configuration; we have not tested this).

The code is available at http://code.google.com/p/contentdm2flickr/ and is very much in beta, but we hope this tool will be useful for others. We'd be excited to hear from you if you have questions, end up using it, or want to join the project to improve it.
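The harvesting half of a tool like this follows the OAI-PMH protocol: request `ListRecords`, read the records, and keep following the `resumptionToken` until the repository stops returning one. The sketch below shows that loop using only the Python standard library. It is not the contentdm2flickr code (which is PHP). It assumes the repository serves unqualified Dublin Core (`oai_dc`), and it skips records flagged as deleted.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Iterator, Optional

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def parse_list_records(xml_text) -> tuple[list[dict], Optional[str]]:
    """Parse one OAI-PMH ListRecords response (str or bytes).

    Returns (records, resumption_token); the token is None on the last page.
    Each record is {"identifier": ..., "dc": {element_name: [values, ...]}}.
    """
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{OAI}record"):
        header = rec.find(f"{OAI}header")
        if header is None or header.get("status") == "deleted":
            continue  # deleted records carry no metadata to import
        fields: dict[str, list[str]] = {}
        for el in rec.iter():
            if el.tag.startswith(DC) and el.text and el.text.strip():
                name = el.tag[len(DC):]
                fields.setdefault(name, []).append(el.text.strip())
        records.append({"identifier": header.findtext(f"{OAI}identifier"),
                        "dc": fields})
    # An empty or missing resumptionToken means this was the final page.
    token_el = root.find(f".//{OAI}resumptionToken")
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return records, token or None


def harvest(base_url: str, metadata_prefix: str = "oai_dc") -> Iterator[dict]:
    """Yield every record from a repository, following resumptionTokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as resp:
            records, token = parse_list_records(resp.read())
        yield from records
        if token is None:
            break
        # Follow-up requests carry only the verb and the resumptionToken.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

Usage would be `for rec in harvest(base_url): ...`, where `base_url` is whatever OAI endpoint your repository exposes. A `noRecordsMatch` error response simply yields nothing, because it contains neither records nor a token.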

Thanks!

--

Erin White
Web Applications Developer, VCU Libraries
804-827-3552 | erwhite@vcu.edu | http://library.vcu.edu/

I'll have to test it out. Of course, I'd prefer a Flickr --> OAI option so that I can get sets out of Flickr without crawling them, but I haven't seen anything with that functionality. Since Flickr metadata is unpredictable anyway, crawling works.
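A Flickr --> OAI option would, at minimum, mean mapping each photo's metadata onto a Dublin Core record that an OAI-PMH provider could then serve. Here is a rough Python sketch of that mapping step only. It assumes the photo data has already been fetched into a dict. The field names (title, description, ownername, datetaken, url, and space-separated tags) are modeled loosely on Flickr API output and should be treated as assumptions, as should the choice of Dublin Core elements.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)

# Hypothetical mapping from Flickr-style field names to Dublin Core elements.
FIELD_MAP = [
    ("title", "title"),
    ("description", "description"),
    ("ownername", "creator"),
    ("datetaken", "date"),
    ("url", "identifier"),
]


def photo_to_oai_dc(photo: dict) -> str:
    """Build an oai_dc record from one photo's metadata dict.

    Empty fields are skipped; each space-separated tag becomes a dc:subject.
    """
    dc_root = ET.Element(f"{{{OAI_DC}}}dc")
    for src, dc_name in FIELD_MAP:
        value = (photo.get(src) or "").strip()
        if value:
            ET.SubElement(dc_root, f"{{{DC}}}{dc_name}").text = value
    for tag in (photo.get("tags") or "").split():
        ET.SubElement(dc_root, f"{{{DC}}}subject").text = tag
    return ET.tostring(dc_root, encoding="unicode")
```

Because Flickr metadata is so uneven, the mapping deliberately drops empty fields instead of emitting empty Dublin Core elements.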
