Ramblings on Librarianship, Technology, and Academia

I never metadiscourse I didn't like

4/1/08 12:15 pm - Open Repositories 2008, part 1.

All these papers will eventually be available in the Open Repositories 2008 conference repository. I'm linking to all of the placeholders; papers should be up soon.

This will be very limited liveblogging, because I'm typing in the conference and dictating between sessions, so I can't say much. Hopefully I'll get some good fodder for my upcoming sustainability post.

Keynote:

Repositories for Scientific Data, Peter Murray-Rust

Session 1 – Web 2.0

Adding Discovery to Scholarly Search: Enhancing Institutional Repositories with OpenID and Connotea, Ian Mulvany, David Kane

The margins of scholarship: repositories, Web 2.0 and scholarly practice, Richard Davis

Rich Tags: Cross-Repository Browsing, Daniel Smith, Joe Lambert, mc schraefel

Ow. I'm not doing this for the next session. I can blog at the breaks.

2/22/08 02:07 pm - real preservation

I've been getting increasingly concerned about what I see as a too-shallow view of sustainability in digital preservation. There's been a lot of lip service paid over the last few years to preservation, and I have certainly heard talks by grant-funding agencies in which they explained that they are now only funding grants which have sustainability written into the grant structure. Yet time and time again, I see soft money being awarded to projects for which the project administrators clearly have only the vaguest idea of what sustainability really means in a software environment.

I don't see this as anyone's fault, mind you. Software developers and IT folks aren't used to thinking of software projects in terms of Permanence. In the traditional software world, the only way something is going to be around forever is if it's going to be used all that time -- for example, a financial application which is in constant use needs to be constantly up. But archival digital preservation has a very different sense of permanence. For us, permanence might mean that you build a digital archival collection once, don't touch its content again for 10 years, but can still discover all of its preserved content at the end of those 10 years.

Meanwhile, in Internet time, a project which has been around for two years is clearly well past its prime and ready to be retired.

Repository managers are putting all of this great work into the repository layer* of preservation: handles and DOIs, PRESERV and PRONOM, JHOVE and audit trails and the RLG checklist. But meanwhile, all of these collections of digital objects -- many of them funded by limited-duration soft money -- are running on operating systems which will need to be upgraded and patched as time passes, on hardware which will need to be upgraded and repaired as time passes, on networks which require maintenance. Software requires sustenance and maintenance, and no project which doesn't take into account that such maintenance requires skilled technical people in perpetuity can succeed as perpetual preservation. Real sustainability means commitment from and communication with the programmers and sysadmins. It requires that the techies understand an archivist's notion of "permanence", and that the librarians and archivists (and grant agencies) understand that a computer needs more than electricity to keep running -- it needs regular care and feeding.

(This, by the way, is one of the reasons I'm so excited by the OTW Archive of One's Own and the Transformative Works and Cultures journal. The individuals responsible for the archive and the journal *do* have a real understanding of and commitment to permanence down to the hardware and network provider level. Admittedly, it's a volunteer-run, donation supported organization, so its sustainability is an open question. But it's a question the OTW Board is wholeheartedly investigating, because they understand its importance.)

*I'm somewhat tempted to make an archival model of preservation that follows the layered structure of the OSI model of network communication. Collection policy layer, Accession layer, Content layer, Descriptive Metadata layer, Preservation Metadata layer, Application layer, Operating System layer, Hardware layer. Then you could make sure any new preservation project has all of those checkboxes ticked. Sort of an uber-simplification of the RLG Checklist, in a nice, nerd-friendly format.
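For the nerd-friendly version, here is a rough Python sketch of what that checklist might look like. The layer names come straight from the list above; everything else (the `missing_layers` function, the shape of the plan dictionary, the example notes) is made up for illustration and is not taken from the RLG Checklist or any existing tool.

```python
# Layer names from the post, ordered from policy down to hardware.
PRESERVATION_LAYERS = [
    "Collection policy",
    "Accession",
    "Content",
    "Descriptive metadata",
    "Preservation metadata",
    "Application",
    "Operating system",
    "Hardware",
]


def missing_layers(plan):
    """Return the layers a project plan says nothing about, in stack order.

    `plan` is a hypothetical dict mapping a layer name to a short note on
    who looks after that layer. Absent or blank notes count as missing.
    """
    return [
        layer
        for layer in PRESERVATION_LAYERS
        if not plan.get(layer, "").strip()
    ]


# Illustrative plan: thorough at the repository layers, silent below them.
example_plan = {
    "Collection policy": "Approved by the library board",
    "Accession": "Deposit workflow documented",
    "Content": "Format validation on ingest",
    "Descriptive metadata": "Dublin Core records",
    "Preservation metadata": "Audit trail kept per object",
    "Application": "Repository software maintained by the grant team",
    "Operating system": "",  # nobody named for patching
    # "Hardware" is not mentioned at all
}

unticked = missing_layers(example_plan)
print(unticked)
```

The point of keeping the layers in a fixed order is that the gaps tend to cluster at the bottom of the stack, which is exactly where soft-money projects stop planning.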

12/15/06 10:59 am - faculty driven vs. ideology driven

Yesterday I went to an extremely valuable BLC community of interest institutional repository meeting. It was valuable because most of the people attending were in about the same place we are -- barely started, no software yet chosen or a recently installed software package with a few articles in it. Many of the bloggers, speakers, and presenters on this topic have more established programs, with software, and a program, and administrative support, and invested faculty. Speaking with other librarians who, like us, are barely started in the process of setting up an institutional repository highlighted some valuable questions and concerns.

One which really came to light for me is a problem which is probably so resolved for the established repositories that people don't even consider it a question: should the institutional repository hold what we find valuable or what the faculty find valuable? Specifically, should we be driving toward open access articles, which the faculty aren't demanding, or should we be serving the faculty's actual demands, which for most of us seem to be file management of vast piles of working data (images or datasets, usually)?

My argument is that we should be serving both of these needs, and it is deceptive to think of both of them as "institutional repository". One need is driven by our faculty, and should be thought of as a business process requirement. Their business process requires them to be able to manage terabytes of data. Some libraries might be taking on the responsibility for helping them do this management (in terms of backups, metadata application, etc.). If, in a given university, managing this data is the library's responsibility, then as employees of the University we should of course be fulfilling our requirement.

But this is entirely separate from the open access archives of faculty research. One comment that multiple people made in yesterday's meeting is "but why should we be giving them open access archives, when they don't want them?" And my argument would be that in this particular place we're not responding to a faculty request -- and that's okay. We're being visionaries in our field. We're serving the greater scholarly community, which happens to include our faculty, even if they don't know it yet. We are getting ahead of the game, so when requirements about open deposit of research start coming down from grant funders, we'll be able to provide them with the repository.

Saying "but the faculty don't want Thing B, they want Thing A," is a false dichotomy. We provide them with Thing A, if that's part of our mission, but we also provide them with Thing B. Just because they aren't asking for it doesn't mean it's not our responsibility to give it to them. They don't have to use it -- though many of them, once they learn about increased impact factors, eventually will. But we should still give it to them.

Because open access archives aren't just for our own faculty. They benefit scholarship and education and research in the world, and that doesn't just help our own universities.

Certainly doesn't hurt, though.

11/8/06 11:15 am - k-federated gets divorced!

Okay, folks, I need your help. I am currently getting soaked in a brainstorm, and I'd like to get this idea down before I lose the details. But since this is a brainstorm, it might make no sense at all. Tell me if what I'm talking about is an incredibly stupid idea that will never work. Alternately, tell me if what I'm suggesting is ridiculously common, and everybody does it this way already, and how could I not have noticed?

The two-part problem:

1. As we investigate products for digital asset management in the library, it's extremely likely that no one product will solve all of our needs. We will perforce find ourselves with digital resources in a number of different products, and will either need to design a single front end or have to accept a certain amount of user confusion at not knowing which tool holds the resources they need.

2. It's entirely possible that a single asset might be simultaneously part of our institutional repository and yet necessary for our learning management software, or similarly dual-purposed. How do these assets get filed? In what product?

My idea: carefully design an institution-specific set of metadata fields for each purpose. One indicating institutional repository, for example, and another indicating learning management. Assign as many of these metadata fields as necessary to each asset, no matter what product the asset is stored in. Store the asset in a product which is best suited for that asset-type. Then, using some kind of harvesting (e.g. Z39.50, OAI), harvest the contents of the various products and repositories. Write an institution-specific search mechanism that knows how to search the harvested data for all, say, institutional repository items. Or for all items in the special collections.

This idea of course elides several major problems: designing the metadata; building what is effectively a small-scale federated search tool; deciding the appropriate product for the appropriate kind of asset; submitting assets into a multitude of products, possibly by non-librarian users such as faculty members and students. But is there any meat to this idea?
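To make the brainstorm a bit more concrete, here is a minimal Python sketch of the search side only. It assumes the harvesting has already happened and that each harvested record carries the institution-specific purpose tags described above. The record fields, product names, and function names are all hypothetical; nothing here talks to a real Z39.50 or OAI endpoint.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class HarvestedRecord:
    """One asset as seen after harvesting (hypothetical shape)."""
    identifier: str
    title: str
    source_product: str  # which product actually stores the asset
    purposes: frozenset = field(default_factory=frozenset)


def build_purpose_index(records):
    """Group harvested records by purpose tag, whatever product holds them."""
    index = {}
    for record in records:
        for purpose in record.purposes:
            index.setdefault(purpose, []).append(record)
    return index


def find_by_purpose(index, purpose):
    """Return identifiers of every asset tagged with the given purpose."""
    return [record.identifier for record in index.get(purpose, [])]


# Illustrative harvest from two made-up products.
harvested = [
    HarvestedRecord("img-001", "Slide set, Week 3", "image_store",
                    frozenset({"learning_management"})),
    HarvestedRecord("doc-014", "Faculty preprint", "document_store",
                    frozenset({"institutional_repository"})),
    HarvestedRecord("doc-015", "Lecture notes", "document_store",
                    frozenset({"institutional_repository",
                               "learning_management"})),
]

index = build_purpose_index(harvested)
ir_items = find_by_purpose(index, "institutional_repository")
lm_items = find_by_purpose(index, "learning_management")
print(ir_items, lm_items)
```

Note that `doc-015` shows up under both purposes while being stored only once, which is the dual-purpose case from problem 2. The hard parts listed above (metadata design, submission workflows, choosing products) are untouched by this sketch.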

3/10/06 12:04 pm - digital asset management

As an aside, I'm really interested, as I look around the net, to see if other institutions have managed to have needs-driven digital asset management initiatives rather than tools-driven. The problem seems to be that all of these digital asset management projects (course materials, IR needs, exhibits, etc) occur all over an institution, and existing software projects have been organically slipping into other niches to fill needs. Need course materials stored? Let your course management monopoly package do it. Need to catalogue your e-journals, and then your local pre-prints? Let your OPAC software store them as well. There are exciting projects going on building more comprehensive and planned tools, but the needs are now, and users aren't just clamoring, they're using whatever they can find.

Are there potentially going to be products which will be good at storing IR text documents and websites and internal archival materials for preservation and display and multimedia objects for classroom and research use and an easy-to-use upload server for student work and whatever else comes up? Or should we resign ourselves to the fact that any good system will have to involve a number of technological solutions?

*reburies head in tool research*