The TEI Special Interest Group on Libraries has released version three of the Best Practices for TEI in Libraries: A Guide for Mass Digitization, Automated Workflows, and Promotion of Interoperability with XML Using the TEI. The introduction specifies five levels of practice:
There are many different library text digitization projects,
serving a variety of purposes. With this in mind, these
Best Practices are meant to be as inclusive as
possible by specifying five encoding levels. These levels are
meant to allow for a range of practice, from wholly automated
text creation and encoding, to encoding that requires expert
content knowledge, analysis, and editing. The encoding levels
are not strictly cumulative: while higher levels tend to build
upon lower levels by including more elements, higher levels are
not supersets because some elements used at lower levels are not
used at higher levels—often because more specific
elements replace generic elements.
One of my disappointments from my time working on the Texas Heritage Online program was that so few libraries in Texas used TEI at any level. Most of the text digitization was at Level 1 at best (which is to say, essentially no markup at all, just searchable text behind page images).
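To make the levels concrete, here is a minimal sketch of what the low end of that range looks like in practice: raw OCR text wrapped in skeletal, page-level TEI. This is my own illustration using Python and lxml, not an excerpt from the Best Practices, and the header is simplified well below what the schemas actually require.

    # Sketch: wrap raw OCR page text in skeletal, Level-1-style TEI.
    # Illustrative only; the real Best Practices schemas require a
    # fuller teiHeader (publicationStmt, sourceDesc, etc.).
    from lxml import etree

    TEI_NS = "http://www.tei-c.org/ns/1.0"

    def tag(name):
        return "{%s}%s" % (TEI_NS, name)

    def wrap_pages(title, pages):
        """Build a skeletal TEI document: one <pb/> and <p> per OCR'd page."""
        tei = etree.Element(tag("TEI"), nsmap={None: TEI_NS})
        header = etree.SubElement(tei, tag("teiHeader"))
        file_desc = etree.SubElement(header, tag("fileDesc"))
        title_stmt = etree.SubElement(file_desc, tag("titleStmt"))
        etree.SubElement(title_stmt, tag("title")).text = title
        body = etree.SubElement(etree.SubElement(tei, tag("text")), tag("body"))
        for n, page_text in enumerate(pages, start=1):
            etree.SubElement(body, tag("pb"), n=str(n))
            etree.SubElement(body, tag("p")).text = page_text
        return etree.tostring(tei, pretty_print=True, encoding="unicode")

    print(wrap_pages("Sample Pamphlet", ["OCR of page one...", "OCR of page two..."]))

Everything above Level 1 is, roughly, what you cannot get this way: structural divisions, named entities, and editorial judgment.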
Lisa Spiro has a great post on her blog showcasing various Digital Humanities resources and tools. Among the resources I hadn't been aware of are the Text Analysis Developer's Alliance (TADA) and TaPoR (Text Analysis Portal for Research). Note that the portal is being redesigned and the URL may change in the future. Since I have a feeling that my next job will involve a lot of text analysis, this was a fantastic find. I also really like the DiRT list of text analysis tools (again, this site is due to be redesigned and URLs may change). Thank you, Lisa!
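As an aside, the core of what many of these tools expose (word lists, frequencies, simple concordances) is easy to sketch; here's a toy Python example of my own, not code from TAPoR or TADA:

    # Toy text analysis: a word-frequency list of the kind text
    # analysis portals generate. Purely illustrative.
    import re
    from collections import Counter

    def word_frequencies(text, top_n=10):
        """Lowercase, tokenize on letters/apostrophes, and count."""
        words = re.findall(r"[a-z']+", text.lower())
        return Counter(words).most_common(top_n)

    with open("sample.txt", encoding="utf-8") as f:
        print(word_frequencies(f.read()))

The value of the portals, of course, is everything layered on top: shared corpora, visualization, and tools you don't have to write yourself.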
This was an interesting announcement posted to the Archives & Archivists list. I've been interested in digitization and preservation (two separate things, though often conflated) of audio-visual materials for a couple of years now, though I haven't had many opportunities to practice it.
Indiana University Bloomington announces the release of a detailed report entitled "Meeting the Challenge of Media Preservation: Strategies and Solutions." This 128-page report is available for
download at http://www.indiana.edu/~medpres/
"Meeting the Challenge" is the result of a year of research and
planning by a campus-wide task force charged with addressing the
problems identified in the earlier IU Bloomington media preservation survey report published in 2009. "Meeting the Challenge" explores a range of topics related to the preservation and conservation of audio, video, and film, including: guiding preservation principles, facility planning, prioritization, digitization methodologies, strategies for film, principles for access, technological infrastructure needs, and engagement with campus units and priorities. Although developed specifically for the Bloomington campus, the findings and analyses in "Meeting the Challenge" may be useful to universities and other organizations with media holdings.
While conversion into a digital format is a very good way to preserve the intellectual content of audio-visual material, and is thus a "preservation strategy," I am still concerned about losing the artefactual value of the original items. The report addresses the physical storage of film materials, acknowledging that preservation issues there can largely be addressed through appropriate storage, but for the majority of the media types it covers, it considers only digitization. I understand why -- digitization largely meets the preservation goal and also makes access easier for most users -- but I'd still prefer an approach that includes both strategies for all materials.
This relates to the post I did on the Library of Congress' Recollection project a few weeks ago. One of the components of Recollection is Exhibit, part of the Simile suite of tools developed originally at MIT. Now the Exhibit tool itself has been updated for people who want to use it as a stand-alone component of different sites; the new version is called Exhibit3. MacKenzie Smith, Research Director at MIT Libraries, and Eric Miller, Founder and President of Zepheira, have written an article about Exhibit3, published on SemanticWeb.com. It's actually a two-part article, but part II hasn't come out yet. Part I
discusses the tool and some of the functional requirements and architectural decisions that were involved in developing it. It's a good article not only for folks interested in the tool but also for developers interested in developing similar tools.
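For a sense of what feeding Exhibit looks like, its data model is a simple JSON feed of "items"; here's a sketch of my own for generating one from a CSV export of collection metadata. The column names (title, date, subject) are hypothetical, not a schema from the article:

    # Sketch: convert a CSV of collection records into the simple
    # {"items": [...]} JSON structure that Exhibit consumes.
    # The CSV column names here are hypothetical.
    import csv
    import json

    def csv_to_exhibit_json(csv_path, json_path):
        with open(csv_path, newline="", encoding="utf-8") as f:
            items = [
                {"label": row["title"], "type": "Item",
                 "date": row["date"], "subject": row["subject"]}
                for row in csv.DictReader(f)
            ]
        with open(json_path, "w", encoding="utf-8") as out:
            json.dump({"items": items}, out, indent=2)

    csv_to_exhibit_json("postcards.csv", "postcards.json")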
I don't know how widespread the movement to create this is, or if it's a one-woman show, but I liked the idea of having a day when people post, tweet, and discuss digital archives. There's a fair amount of interest in preserving digital information, as more and more of our everyday stuff is born digital, but very little useful information about digital preservation is available outside of the library-archives community.
The Day of Digital Archives is an initiative to raise
awareness of digital archives among both users and managers. On this day,
archivists, digital humanists, programmers, or anyone else creating, using, or
managing digital archives are asked to devote some of their social media output
(e.g., tweets, blog posts, YouTube videos) to describing their work with
digital archives. By collectively documenting what we do, we will be answering
questions like: What are digital archives? Who uses them? How are they created
and managed? Why are they important?
This year's Day of Digital Archives will be held on October 6th, and entries will be gathered at the Day of Digital Archives blog.
What is meant by "digital archives"? Well, primarily archives, repositories, content management systems, and other initiatives that collect or manage born-digital material. These initiatives don't have to primarily collect born-digital materials... in fact, they are more likely to have only some born-digital content as part of their mandate. Or maybe they don't really have a "mandate" at all... maybe someone will contribute their thoughts about managing their own personal digital content or social media presence. The thread that ties the participants together is that they collect, manage, preserve, develop, use, think about, or otherwise love born-digital materials.
Do you create, manage, or use digital archives? Would you like to participate? Well then, drop me a line at gretchen[.]gueguen[@]gmail[.]com with your contact info and I'll keep you up to date.
You could contribute in a few ways:
1. Create a blog post at http://dayofdigitalarchives.blogspot.com/
for the 6th of October (rather than writing it on October 6th, you can pre-write it to post automatically on the 6th) talking about some aspect of your work with Digital Archives on that day. It could be a really specific exploration of a single activity on that day, or it could be a broader topic not really tied to that specific day (e.g., what kinds of tools you could really use to process a born-digital collection).
2. Write a post to your own blog similar to that described above and post a trackback to the Day of Digital Archives blog.
3. Tweet throughout the day about your work with digital archives using the #digitalArchivesDay hashtag.
Even if you can't contribute a post or a tweet, be sure
to keep up with the blog on the 6th and join the discussion in the comments.
Digital Archivist, Digital Curation Services
University of Virginia Library
PO Box 400114
Charlottesville, VA 22904
I chuckled a bit at this one. I'm pretty sure that this report essentially replicates a presentation I gave at the 2006 LITA Annual Meeting (parts of which I've done at other meetings, including the 2007 Texas Conference on Digital Libraries), and we weren't the first ones to work on this problem by a long shot. Still, it's an important topic, and the Texas Heritage Online board remains committed to providing a single search tool for all Texas institutions. In that context, the report is worth reading.
Single Search: The Quest for the Holy Grail
An OCLC Research Report by:
Leah Prescott, Getty Research Institute, and
Ricky Erway, OCLC Research
Download the report (.pdf: 163K/29 pp.)
This report summarizes the discussions from an OCLC Research Library Partnership working group of nine single search implementers and highlights the emerging practices in providing single search access to library, archive and museum collections. The goal of the report is to foster successful single search implementations by sharing the experience of the working group with those who want to create single search but don't know where to start.
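The basic fan-out shape of single search is easy to sketch, even though the hard parts the report deals with (metadata mapping, relevance across heterogeneous sources, institutional politics) are not. Here's a toy Python illustration of my own, with entirely hypothetical endpoints:

    # Toy "single search": fan one query out to several hypothetical
    # JSON search endpoints and merge the results. Real systems must
    # also normalize metadata and reconcile relevance scoring.
    import concurrent.futures
    import requests

    ENDPOINTS = {  # hypothetical per-silo search APIs
        "library": "https://example.edu/library/search",
        "archive": "https://example.edu/archive/search",
        "museum": "https://example.edu/museum/search",
    }

    def search_one(source, url, query):
        resp = requests.get(url, params={"q": query}, timeout=10)
        resp.raise_for_status()
        return [dict(hit, source=source) for hit in resp.json()["results"]]

    def single_search(query):
        with concurrent.futures.ThreadPoolExecutor() as pool:
            futures = [pool.submit(search_one, s, u, query)
                       for s, u in ENDPOINTS.items()]
            hits = [h for f in futures for h in f.result()]
        return sorted(hits, key=lambda h: h.get("score", 0), reverse=True)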
I heard about this at OpenRepositories 2011. It looks like it will be useful for people who want to build map-based browsing interfaces for digital collections; the list, chart, and timeline exhibits are updates from the old Simile project (developed at MIT, later maintained at Simile-Widgets.org), but it adds some tools for normalizing data.
The following is a guest post by Trevor Owens, Digital Archivist with the Office of Strategic Initiatives.
We are happy to announce the full open source release of the
Recollection software platform. Briefly, Recollection is a web
application that enables librarians, archivists, curators, and
historians to create dynamic interfaces to cultural heritage
collections. If you are unfamiliar with the project, I blogged about how you can use the tool to explore cultural heritage collections last month. If you think Recollection might be useful to your organization, we encourage you to take a minute to request an account for the beta instance of the tool.
An example of a Recollection view, in this case of a set of postcards from Fairfax, VA
For most users, the instance of Recollection that the National Digital Information Infrastructure and Preservation Program maintains
will be more than enough for their needs. We will have more information
on exciting new features and next steps for our hosted instance in the
near future. With that said, this full open source release is an
important demonstration of the commitment of this project to openness,
transparency and sharing.
With the open source release anyone is now free to download the code
and set up their own instance of Recollection. For example, a newspaper
or other media outlet might set up their own instance of the
Recollection software and create accounts for their staff to let them
quickly share interfaces to data with their readers, or a University
could set up their own instance of the software to let their faculty,
students or staff explore cultural heritage data sets. Anyone is now
free to take the code, tweak it, and make it their own.
You can find the source code on the loc-recollect project page on SourceForge. If you are curious about what it would take for you to set up the software, you can take a look at the readme file
which has some basic documentation for getting started. For those who
are interested, the following are some of the most important open
source components that the Recollection software depends on.
Recollection uses RDF to facilitate the integration of data drawn from diverse sources, Simile Exhibit to create the collection interfaces, and Akara for data transformation. All of this is put together using Django, a Python web framework.
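To give a sense of where the Django and Exhibit layers meet, here is a minimal sketch of a view that serializes records as the JSON "items" feed Exhibit renders. This is my own illustration with a hypothetical model, not code from the loc-recollect source:

    # Sketch of the Django/Exhibit seam: serialize a model as the
    # {"items": [...]} feed Exhibit renders. Not Recollection's code;
    # the Item model is hypothetical.
    from django.http import JsonResponse

    from myapp.models import Item  # hypothetical model

    def exhibit_data(request):
        items = [
            {"label": i.title, "type": "Item", "date": str(i.date)}
            for i in Item.objects.all()
        ]
        return JsonResponse({"items": items})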
It looks like the Federal Digitization Guidelines Initiative is making some progress on the AV standards and guidelines. Here's some new information:
This email is being sent to multiple lists
(apology for duplication) to notify colleagues that the Federal
Agencies Audio-Visual Digitization Working Group has posted an updated
version of the in-progress MXF Application Specification for Archive
and Preservation (AS-AP). This specification pertains to the wrapper
element for digital files produced when reformatting videotapes and
other audio-visual materials. In addition to the updated
specification, we have also posted a narrative commentary on six of the
issues we have encountered in our work, recapping a series of emails I
sent to some of you earlier this summer.
Draft specification (27 pages): http://www.digitizationguidelines.gov/guidelines/FADGI_MXF_ASAP_Arch_Pres_1h_20110815.pdf
Issues paper (11 pages): http://www.digitizationguidelines.gov/guidelines/FADGI_MXF_ASAP_Issues_20110815.pdf
See this page for links to those two documents and four related documents, including a general background paper written in 2010: http://www.digitizationguidelines.gov/guidelines/MXF_app_spec.html
As the issues paper indicates, some matters to be specified in AS-AP
depend upon the execution of related work in standards organizations.
One key example concerns the clarification of the method(s) and tagging
to be employed when mapping interlaced frame images encoded as JPEG
2000 to the MXF container. We have been pleased to learn that there is
a move within SMPTE to revise the relevant standard: SMPTE ST 422:2006,
Material Exchange Format – Mapping JPEG 2000 Codestreams into the MXF Generic Container. We seek comments and advice from our readers. Our online response form
has been intermittent in its transmission of comments. Please feel
free to send me a note directly (firstname.lastname@example.org) or post a note to this list.
Office of Strategic Initiatives
Library of Congress
Washington, DC 20540-1330
I've been following this because the Texas Heritage Online Standards and Guidelines for Cultural Heritage Digitization Projects is being revised this summer, and we're including a section on AV digitization. I'll include this in the draft.
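As a practical aside, you can already see how a given MXF file reports these properties (codec, field order) using ffprobe from the FFmpeg project. Here's a small wrapper of my own, offered as a sketch rather than a FADGI-sanctioned check:

    # Sketch: use ffprobe (FFmpeg) to check whether an MXF file
    # carries JPEG 2000 essence and how its field order is tagged.
    import json
    import subprocess

    def inspect_mxf(path):
        out = subprocess.run(
            ["ffprobe", "-v", "quiet", "-print_format", "json",
             "-show_streams", path],
            check=True, capture_output=True, text=True,
        ).stdout
        for stream in json.loads(out).get("streams", []):
            if stream.get("codec_type") == "video":
                print("codec:", stream.get("codec_name"))
                print("field order:", stream.get("field_order", "not reported"))

    inspect_mxf("tape_transfer.mxf")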
This looks like an interesting new tool:
VCU Libraries has posted new sets of digital collections
images on Flickr, with images and metadata pulled from CONTENTdm's OAI feed, using a tool developed by our web team.
This Flickr uploader is a web-based PHP app that should
work with any OAI-enabled image repository (with some configuration - we have
not tested this).
The code is available at http://code.google.com/p/contentdm2flickr/
and is very much in beta, but we hope this tool will be useful for others. We'd
be excited to hear from you if you have questions, end up using it, or want to
join the project to improve it.
Web Applications Developer, VCU Libraries
804-827-3552 | email@example.com
I'll have to test it out. Of course, I'd prefer a Flickr --> OAI option so that I can get sets out of Flickr without crawling them, but I haven't seen anything with that functionality. Since Flickr metadata is unpredictable anyway, crawling works.
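In the meantime, pulling a set's photos and metadata out of Flickr directly through its API is straightforward; here's a minimal sketch of my own using the flickr.photosets.getPhotos method (the API key and set ID are placeholders you'd supply):

    # Sketch: list photos and metadata for a Flickr photoset via the
    # flickr.photosets.getPhotos API method. Key and set ID are
    # placeholders.
    import requests

    API = "https://api.flickr.com/services/rest/"

    def get_set_photos(api_key, photoset_id):
        params = {
            "method": "flickr.photosets.getPhotos",
            "api_key": api_key,
            "photoset_id": photoset_id,
            "extras": "description,tags,date_taken",
            "format": "json",
            "nojsoncallback": 1,
        }
        data = requests.get(API, params=params, timeout=10).json()
        return data["photoset"]["photo"]

    for photo in get_set_photos("YOUR_API_KEY", "YOUR_SET_ID"):
        print(photo["id"], photo["title"])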