Ramblings on Librarianship, Technology, and Academia

I never metadiscourse I didn't like

4/2/08 10:22 am - RFC: Operational Preservation Matrix

[Tagged as, among other things, otw, because even though I am dealing with these issues as a professional I think that The Organization for Transformative Works is very well-placed to be one of the few organizations prepared to confront operational preservation from the outset. After all, the OTW has to deal with one even more frightening aspect of operational preservation: it is an entirely volunteer-run organization which promises perpetual preservation. It takes a lot of planning and commitment to be prepared to follow through on a commitment like that. Luckily, the OTW has both.]

( Introductory thoughts on Operational Preservation )

I would love to get comments from the community on this, because I truly believe that this could be a very useful model for organizations designing digitization projects. I know I'm going to prompt my institution to follow this matrix for all new digitization efforts.

Problem Statement: When an archivist deposits material in a digital archive, he or she often assumes that the object is preserved in perpetuity, just as it would be were it a physical object. Depositors of digital material often have the same assumptions, as do institutional administrators. However, the software development and maintenance community does not assume permanence on the same scale on which archivists are accustomed to providing it. Moreover, administrators (and archivists) often have unrealistic assumptions about the labor and costs involved in the daily operational maintenance that digital preservation requires, which are -- if not higher -- certainly different from the operational maintenance costs of physical preservation. Even worse, many digital preservation projects are funded by limited-duration soft money instead of out of an operational budget.

Or, in a nutshell, we need to remember that digital preservation has an ongoing operational cost which cannot be provided within the archive.

Operational Preservation: To that end, I am proposing this matrix for new preservation and archival projects to see if they have thought of the requirements necessary for permanent preservation.

Anything calling itself a digital preservation project has to be prepared, in perpetuity, to provide every item in the top row for every item down the left-hand column. Funding is really a redundant item -- by "Labor", I mean funding for staff to provide all of the work involved, and "Physical facility" is really something which can be provided by funding -- but the fact that digital preservation requires ongoing operational money is too important to ignore. By "Bureaucratic support" I mean policies and procedures in place which support the operational business of preservation at an organizational level.

Operational Preservation Matrix

| Task | Labor | Physical facility | Bureaucratic support | Funding |
|---|---|---|---|---|
| Existence of the datastream in a file system or database | . | . | . | . |
| Object access via handle/doi/uri | . | . | . | . |
| Maintenance, repair, and upgrade of hardware (server, disk, etc.) | . | . | . | . |
| Maintenance, patching, and upgrade of operating system | . | . | . | . |
| (The following tasks are not as essential, but still very important) | . | . | . | . |
| Rolling forward file formats | . | . | . | . |
| Transferring data to more modern repository and software tools when appropriate | . | . | . | . |
| Modernizing user interface as appropriate | . | . | . | . |
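
For anyone who would rather check a project against the matrix programmatically, here is a minimal sketch of how that might look. The row and column names come from the matrix; everything else (the `find_gaps` function, representing commitments as a set of pairs, the example project) is a hypothetical illustration, not an established tool.

```python
from itertools import product

# Column headings (what must be provided) and row headings (what it is provided for).
RESOURCES = ["Labor", "Physical facility", "Bureaucratic support", "Funding"]
ESSENTIAL_TASKS = [
    "Existence of the datastream in a file system or database",
    "Object access via handle/doi/uri",
    "Maintenance, repair, and upgrade of hardware",
    "Maintenance, patching, and upgrade of operating system",
]
IMPORTANT_TASKS = [
    "Rolling forward file formats",
    "Transferring data to more modern repository and software tools",
    "Modernizing user interface",
]


def find_gaps(commitments, tasks):
    """Return every (task, resource) cell the project has not committed to.

    `commitments` is a set of (task, resource) pairs. That representation
    is an assumption of this sketch, not part of the matrix itself.
    """
    return [cell for cell in product(tasks, RESOURCES) if cell not in commitments]


# Hypothetical project: soft money pays for staff time, and nothing else is planned.
project = {(task, "Labor") for task in ESSENTIAL_TASKS + IMPORTANT_TASKS}

gaps = find_gaps(project, ESSENTIAL_TASKS)
print(f"{len(gaps)} essential cells unfilled")
for task, resource in gaps:
    print(f"  missing {resource!r} for {task!r}")
```

A project is only ready when `find_gaps` comes back empty for the essential rows; the less essential rows can be checked the same way by passing `IMPORTANT_TASKS`.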


(Of course, traditional preservation of physical objects is also an ongoing operational cost. Physical objects require extensive physical facilities with narrow environmental limitations, they require re-housing and repair, they require maintenance and supervision. But these ongoing operational tasks can be performed by archivists with traditional skills. The technological operational tasks of a digital archive often can't be performed even by technologically-trained archivists, because the institution will have specific requirements about who is able to, say, maintain the network.)

2/22/08 02:07 pm - real preservation

I've been getting increasingly concerned about what I see as a too-shallow view of sustainability in digital preservation. There's been a lot of lip service paid over the last few years to preservation, and I have certainly heard talks by grant-funding agencies in which they explained that they are now only funding grants which have sustainability written into the grant structure. Yet time and time again, I see soft money being awarded to projects for which the project administrators clearly have only the vaguest idea of what sustainability really means in a software environment.

I don't see this as anyone's fault, mind you. Software developers and IT folks aren't used to thinking of software projects in terms of Permanence. In the traditional software world, the only way something is going to be around forever is if it's going to be used all that time -- for example, a financial application which is in constant use needs to be constantly up. But archival digital preservation has a very different sense of permanence. For us, permanence might mean that you build a digital archival collection once, don't touch its content again for 10 years, but can still discover all of its preserved content at the end of those 10 years.

Meanwhile, in Internet time, a project which has been around for two years is clearly well past its prime and ready to be retired.

Repository managers are putting all of this great work into the repository layer* of preservation: handles and DOIs, PRESERV and PRONOM, JHOVE and audit trails and the RLG checklist. But meanwhile, all of these collections of digital objects -- many of them funded by limited-duration soft money -- are running on operating systems which will need to be upgraded and patched as time passes, on hardware which will need to be upgraded and repaired as time passes, on networks which require maintenance. Software requires sustenance and maintenance, and no project which doesn't take into account that such maintenance requires skilled technical people in perpetuity can succeed as perpetual preservation. Real sustainability means commitment from and communication with the programmers and sysadmins. It requires that the techies understand an archivist's notion of "permanence", and that the librarians and archivists (and grant agencies) understand that a computer needs more than electricity to keep running -- it needs regular care and feeding.

(This, by the way, is one of the reasons I'm so excited by the OTW Archive of One's Own and the Transformative Works and Cultures journal. The individuals responsible for the archive and the journal *do* have a real understanding of and commitment to permanence down to the hardware and network provider level. Admittedly, it's a volunteer-run, donation-supported organization, so its sustainability is an open question. But it's a question the OTW Board is wholeheartedly investigating, because they understand its importance.)

*I'm somewhat tempted to make an archival model of preservation that follows the layered structure of the OSI model of network communication. Collection policy layer, Accession layer, Content layer, Descriptive Metadata layer, Preservation Metadata layer, Application Layer, Operating System layer, Hardware layer. Then you could make sure any new preservation project has all of those checkboxes ticked. Sort of an uber-simplification of the RLG Checklist, in a nice, nerd-friendly format.

6/22/07 11:40 am - jcdl post 2: digital curation and preservation

The first panel I went to was digital curation and preservation. My notes from these sessions are more sparse.

( how to choose a digital preservation strategy )

( factors affecting website reconstruction from the web infrastructure )

( defining what digital curators do and what they need to know, the DigCCurr project )

( generating best-effort preservation metadata for web resources at time of dissemination )

5/31/07 09:34 am - moving on: bittersweet

At the end of June, I will be leaving Brandeis to accept a position as Digital Resources Archivist at Tufts, and I'm experiencing major seller's remorse. Not buyer's remorse -- I am extremely excited about joining the team over at Tufts Digital Collections and Archives -- but seller's remorse. I don't want to leave my baby, my digital collections, with so much exciting work going on here.

The fact is that in only a year we've built the digital collections here from the glint in the milkman's eye to a robust and scalable program which will be ready to launch in a few weeks. What I'm most proud of is that I think we've built something which can live just fine without me while they hunt for a replacement, and what I am most upset about is leaving for somebody else all the great ideas for projects we've been forming as we've approached the finish line: Institutional Repository; ETDs; special faculty projects; integration with the University photography department. So all of you out there who read this humble blog and might have the skills to foster my baby? Apply for this job. The Brandeis Digital Collections deserve the best.

So, Tufts. Why am I so pleased about a position which looks like a step down? I'll be going from driving the entire digital collections initiative at one university to being responsible for a small component (management, ingest, and maintenance of digital objects) of the digital collections at a roughly equivalent university. (Not to mention that I will be moving from DSpace to Fedora, and so far, I very much prefer DSpace.)

Over the years that I've been working, I've learned something startling about myself: I'd rather be a small fish in a big pond than a big fish in a small pond. Which is not to say that I would rather be a peon or cog in the machine -- anything but. Everyone who knows me knows that I am chock full of opinions. But I want learning opportunities, mentors, people to teach me things. At the best working environment I ever had -- The Company Formerly Known As, as we like to call it -- I was smack in the middle of a large group of people which included both some of the best mentors I've ever had and some really terrific entry level people who were eager to learn. There was the opportunity to teach and learn from my peers.

I've had a great time over the last year at Brandeis learning by doing, learning by screwing up, learning by attending classes, learning by attending conferences, learning by reading blogs and mailing lists and conference proceedings. I've had my trial by fire, and now it's time for me to get some solid mentoring. The conferences I've attended over the last year have been chock full of presentations by people in the group I'm about to join. Now is my chance to really learn from people who've been doing this for a long time.

Also, I would be lying if I didn't admit that proximity to my home and a walking commute played a large part in my decision to change. One of the advantages of working in a university is gaining the University community. As a car-free person, I'm so distant from Brandeis geographically that I can't take advantage of that community. At Tufts, I can.

11/13/06 04:24 pm - style vs. substance

I've been seeing occasional resumes from librarians who've paid more attention to whuffie than to skills. Conference presentations, published papers, and frequent contributions to mailing lists and bulletin boards -- but an inability to answer direct questions in an interview. Candidates who are excited by the potential offered by new technologies and Library 2.0, but who can't talk about the practicalities of library work, even after several years' work in a library. The whuffie might get a foot in the door, but it doesn't get anything after that. If it's clear there's no substance to a candidate, we don't continue with that individual.

I find this fairly reassuring, as I've been thinking lately about my own career and what I'd like to do with it. I've been given the opportunity to have a shift at the university's reference and information desk -- a fairly low-profile opportunity, as such shifts generally are. And I love it. Today I helped two students find the resources for semester-long projects, while showing them how to recognize from a citation whether something was a journal or monograph, how to read our catalog system to see whether or not we have the resources electronically or in print, how to find government documents... It was fantastic.

I know many people who are loaded up on social capital are *also* people of substance. But it's good to remind myself that the relationship between social capital and substance isn't 1:1, and that it's fairly easy to see when there is nothing behind a good dose of social capital.

11/8/06 11:15 am - k-federated gets divorced!

Okay, folks, I need your help. I am currently getting soaked in a brainstorm, and I'd like to get this idea down before I lose the details. But since this is a brainstorm, it might make no sense at all. Tell me if what I'm talking about is an incredibly stupid idea that will never work. Alternately, tell me if what I'm suggesting is ridiculously common, and everybody does it this way already, and how could I not have noticed?

The two-part problem:

1. As we investigate products for digital asset management in the library, it's extremely likely that no one product will solve all of our needs. We will perforce find ourselves with digital resources in a number of different products, and will either need to design a single front end or have to accept a certain amount of user confusion at not knowing which tool holds the resources they need.

2. It's entirely possible that a single asset might be simultaneously part of our institutional repository and yet necessary for our learning management software, or similarly dual-purposed. How do these assets get filed? In what product?

My idea: carefully design an institution-specific set of metadata fields for each purpose. One indicating institutional repository, for example, and another indicating learning management. Assign as many of these metadata fields as necessary to each asset, no matter what product the asset is stored in. Store the asset in a product which is best suited for that asset-type. Then, using some kind of harvesting (e.g. Z39.50, OAI), harvest the contents of the various products and repositories. Write an institution-specific search mechanism that knows how to search the harvested data for all, say, institutional repository items. Or for all items in the special collections.
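
To make the brainstorm a little more concrete, here is a rough sketch of the search side of the idea. It assumes the harvesting step (OAI or otherwise) has already produced plain metadata records; the `HarvestedRecord` class, the product names, and the purpose labels are all hypothetical stand-ins rather than anything a real product provides.

```python
from dataclasses import dataclass, field


@dataclass
class HarvestedRecord:
    """One asset's metadata, as harvested from whichever product stores it."""
    identifier: str
    title: str
    source_product: str  # where the asset actually lives
    purposes: set = field(default_factory=set)  # institution-specific purpose fields


def build_index(harvests):
    """Merge the per-product harvests into a single list of records."""
    index = []
    for records in harvests.values():
        index.extend(records)
    return index


def search_by_purpose(index, purpose):
    """Return every record tagged with the purpose, whatever product holds it."""
    return [record for record in index if purpose in record.purposes]


# Hypothetical harvest results from two different products.
harvests = {
    "repository-product": [
        HarvestedRecord("a1", "Faculty preprint", "repository-product",
                        {"institutional-repository", "learning-management"}),
    ],
    "image-product": [
        HarvestedRecord("b7", "Campus photograph", "image-product",
                        {"special-collections"}),
        HarvestedRecord("b8", "Lecture slide scan", "image-product",
                        {"learning-management"}),
    ],
}

index = build_index(harvests)
for record in search_by_purpose(index, "learning-management"):
    print(record.identifier, record.title, "stored in", record.source_product)
```

The point of the sketch is that the preprint is stored once, in the product best suited to it, yet turns up in both the institutional repository search and the learning management search because it carries both purpose fields.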

This idea of course elides several major problems: designing the metadata; building what is effectively a small-scale federated search tool; deciding the appropriate product for the appropriate kind of asset; submitting assets into a multitude of products, possibly by non-librarian users such as faculty members and students. But is there any meat to this idea?

3/10/06 12:04 pm - digital asset management

As an aside, I'm really interested, as I look around the net, to see if other institutions have managed to have needs-driven digital asset management initiatives rather than tools-driven. The problem seems to be that all of these digital asset management projects (course materials, IR needs, exhibits, etc.) occur all over an institution, and existing software projects have been organically slipping into other niches to fill needs. Need course materials stored? Let your course management monopoly package do it. Need to catalogue your e-journals, and then your local pre-prints? Let your OPAC software store them as well. There are exciting projects going on building more comprehensive and planned tools, but the needs are now, and users aren't just clamoring, they're using whatever they can find.

Are there potentially going to be products which will be good at storing IR text documents and websites and internal archival materials for preservation and display and multimedia objects for classroom and research use and an easy-to-use upload server for student work and whatever else comes up? Or should we resign ourselves to the fact that any good system will have to involve a number of technological solutions?

*reburies head in tool research*

11/8/05 01:43 am - Hubris and Ambition

Recently, I had an extended interview for a job for which I was ultimately rejected. I don't know who did get the job, but I'm sure I'll know soon enough. You see, this interview was to become Somebody in the library world. The person in this position will be a Mover and Shaker in the world of librarianship and technology. She'll have the opportunity to see potential improvements in librarianship and make them happen, to change the rules, to be part of the paradigm shift. I'm sure in the coming years I'll see her name at conferences, in books, on papers. And I'll be a little jealous every time.

As luck would have it, my next interview -- before I'd even been rejected from the Somebody position -- was to be a Nobody. A cog in a library system, about 6 steps removed from any reference or research or information. My job would be to make life a little more efficient for those who make life more efficient for those who enable the people who do actual library work. And what I discovered, when I interviewed for the Nobody position, was that I'd been corrupted by the interview for the Somebody position. While I'd not gone into librarianship in the hopes of fame and fortune, suddenly I found all other library positions paling before the reflected glory of my unrealised Somebodyness. All my unrealised hopes and dreams (the novelist I'll never be, despite my mother's constant pressure; the open-source revolution I never made; the PhD I never got; even the BNF I'm not) brought to light in all their unattractive, spotted, warty nakedness. Suddenly the simple library jobs for which I'd dropped my career, gone thousands of dollars into debt, and changed my life seemed petty.

It's hard getting my perspective back. I remind myself that it's easy in this day and age for a smart person to become Somebody if she so chooses. I have this blog: if I think of clever and world-shattering ideas I can post them. I'm a programmer: if I don't like existing library software I can write my own, better software. I'm literate and intelligent: I can write articles, attend conferences, and generally make a Somebody of myself. But only if I want to. It's not going to happen because an employer tells me so, but only if it's so important to me to become Somebody that I do the work.

Is it that important to me? I don't know. I'm happy enough in my life, and don't generally think I need to be on the forefront of changing the world. I don't want to be a name everybody knows, though I'd certainly not mind the private satisfaction of knowing that the Somebodys out there owe some small measure of their success to me. (I always did crew in high school plays. Does it show?) It wouldn't have occurred to me until I interviewed for the Mover and Shaker position and realised the idea thrilled me. (And terrified me, in equal measure.)

I have to remember that being a librarian is, by definition, being Somebody. Remind myself of all my old lessons in social justice and community activism. Think globally, act locally, and all that. And I do remember, usually. Except late at night, when I'm trying to sleep, and I'm drowning in might-have-beens.

Note to self: Self, remember how parenth_blog and mirith convinced you to become a librarian? It was because they showed you how much you'd love reference, and they were right.

Self answers: Doh! I forgot. And Self gets back to the busy game of looking forward to reference and instruction at a conventional librarian position.

3/18/05 08:29 am - collection development

When I started this blog, I thought I would be doing a lot more explorations of the advances and changes so prevalent in library technology. The fact is, though, that right now I'm somewhat focused on being hired as a professional librarian in the job I currently have, which means that my library-like concerns are focused on the needs of this job. That's not a bad thing; this position calls for a broad set of skills, including management, collection development, reference, managing online resources and designing print and digital pathfinders, and the public services and facilities management aspects of a small library. It's also not at all a bad thing that I'm being forced by circumstance to hone traditional librarianship skills instead of following my inclination and leaping off to spend time with the digital shiny before I have a handle on the basics. While I'm no expert, after combining my experience at this job with the cataloguing I've done elsewhere, I believe I've at least touched lightly on all aspects of traditional librarianship except budgeting and construction, and I did both of those extensively in my technology life. Which isn't to say that I believe that after a year of paraprofessional student library jobs I'm a library expert. I'm just glad that I'm getting some breadth and depth in traditional library experience. Heck, I have to keep reminding myself that I don't want to spend all day in front of the computer, anyway. If I didn't want to be in a traditional library, I never would have left IT. Just because I want to spend some time focusing on the digital doesn't mean it will serve me well to shortchange the traditional.

( reference collection development isn't as simple as they taught us in class, if Balay can't help me )

2/10/05 12:01 am - wikipedia

For the past two days, I've been a mad wikipedian. I got email from a classmate who knew I'm a wikipedia advocate asking me to talk about wikipedia to a colleague who'd called it "fun, but not scholarly" (I glowed appropriately, and with the necessary caveats and pro/con links); I watched Jon Udell's fabulous Heavy Metal Umlaut: the Movie (go watch! long, contains sound); and I updated several pages.

I found it rather sad that the library and information science page, of all things, was in a year-old state of semi-stasis because of the early creation of an anti-academic article. After the neutral point of view was disputed, the page languished, unedited, for a year. I've been editing non-controversial Wikipedia pages for a while, but this was the first page I've ever made changes to with a non-neutral point of view warning. It was intriguing reading through all of the style and etiquette guides to make sure I was going about it the right way.

I certainly hope I don't cause any flaming. There are some interestingly contradictory points of Wikipedia etiquette. The first is that you don't unnecessarily delete controversial points of view, you just cite them so that they are statements of fact ("Bill O'Reilly called Jeremy Glick a coward", rather than "Jeremy Glick's cowardice...") and include opposing viewpoints where appropriate. But the second is that you can't use weasel words: "some people say Jeremy Glick was a coward". The trick with the "library and information science" page is that the controversial content contained some unverifiable statements, namely that practicing librarians and LIS professors are frequently at loggerheads.

I suspect that there's truth in that statement, though certainly not as much as the original article implied. The problem is that once practicing librarians start publishing their disagreements with the academy, they are, well, publishing. And therefore somewhat in the academy, or at least in the semi-academic world of self-reflection, publishing, and dissemination. Honestly, do most practicing librarians who aren't interested in LIS even care what happens in library schools once they get out, as long as graduating students are competent to do the work? Library students, now they care, and frequently wish there were more practicing librarians among their professors. I could probably find some evidence of controversy between practicing and scholarly librarians if I spent enough time searching the peer-reviewed literature, but I certainly couldn't find much on the open web (amusingly, several of my Google searches for the great missing controversy led me straight to yarinareth2). Anyway, basic Wikipedia etiquette said that I needed to retain the original author's controversial statements as best I could, but Wikipedia style demanded more evidence than I could find. I did what I could, and weaseled out of it into the discussion page for the article.

Sadly, now I've created a complex framework for the page, but it's midnight, and I have to wake up for work in 6 1/2 hours. I'll work on fleshing out the information, but hopefully other people will contribute as well (hint, hint).