Posts categorized “open access”.

sourcecode:binary::???:ppt/odp/pdf

(sourcecode is to binary as ??? is to ppt/odp/pdf)

Ted Gould just posted to the planet about the presentation he gave at the Desktop Summit. At the end of his post you’ll notice that he uploaded his presentation to Launchpad (at lp:~ted/presentations/2009_desktop_summit/).

I think that is a great idea! Not only does it let the community see what others are using for their presentations, it also allows anyone to branch a presentation, which has awesome potential, especially with the format Ted chose: SVGs. The S5 presentation format (XHTML/CSS/JS based) would also be a great candidate for easily branching and editing presentations.

But what if you need to create presentations with others who use PowerPoint or Impress, and you want to harness the power of a version control system? Old PowerPoint (ppt) files are binary blobs that don’t work well in version control systems (they *work*, but not *well*). Impress (odp) and new PowerPoint (pptx) files are effectively zipped archives of XML and images. However, since they are zipped, bzr treats them as binary files. I only tested with bzr, but I don’t foresee any of the other systems behaving differently.
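One possible workaround, sketched here only as an untested assumption (I haven’t tried it against bzr), is to keep the *unzipped* contents of the odp/pptx under version control, so the VCS sees plain XML text, and zip the tree back up whenever you want to open it in Impress or PowerPoint. The Python below uses only the standard `zipfile` module; the function names `unpack` and `repack` are made up for this example. ODF expects the `mimetype` entry to come first in the archive and to be stored uncompressed, so `repack` writes it that way.

```python
import zipfile
from pathlib import Path


def unpack(archive_path, dest_dir):
    """Extract every member of an odp/pptx (really a zip archive) into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest)
        return sorted(zf.namelist())


def repack(src_dir, archive_path):
    """Zip an unpacked tree back into a presentation file."""
    src = Path(src_dir)
    files = sorted(p for p in src.rglob("*") if p.is_file())
    with zipfile.ZipFile(archive_path, "w") as zf:
        # ODF wants 'mimetype' as the first entry, stored without compression.
        mimetype = src / "mimetype"
        if mimetype.exists():
            zf.write(mimetype, "mimetype", compress_type=zipfile.ZIP_STORED)
        # Everything else (XML, images) can be deflated as usual.
        for p in files:
            name = p.relative_to(src).as_posix()
            if name != "mimetype":
                zf.write(p, name, compress_type=zipfile.ZIP_DEFLATED)
```

Usage would be something like `unpack("talk.odp", "talk/")`, committing `talk/`, and running `repack("talk/", "talk.odp")` after each pull. The XML inside these files is often written on very few lines, so diffs may still be noisy unless it is pretty-printed first.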

Why would you want to use a VCS for your presentation files? Especially a DVCS like bzr/git/hg? COLLABORATION!

Some of you may know that I am currently working with Open.Michigan, a project at the University of Michigan that enables the creation of Open Educational Resources (OER). OER is effectively a broader term for the concept of OpenCourseWare: everything used in education is a resource, not just presentations, and is thus useful for others to see, use, and remix. If you are curious to see what kinds of things we produce, see our eduCommons installation.


Back to the topic at hand though: presentations and DVCS.

One of the major areas that the OER community could greatly improve upon is remixing: taking openly licensed materials, using them, adding new material, and creating something original. Remixing, in general, is enabled by having access to the source files of the material being worked with. Sure, you can use a PDF or an MP3 in a remix, but it is usually better to have the original .odt or multitrack file to work from. This is why Open.Michigan provides the public with the ppt files along with the PDFs of the presentations created through the OER program.

But let’s leverage some of the tried-and-true methods of the FLOSS community in the OER community. One of the biggest and most fundamental benefits of the FLOSS world is that everyone has access to the source code and can easily get it, edit it, and (hopefully) compile a new version of the program: effectively a “remix.” How does the FLOSS community lower the barriers and increase efficiency for that workflow? We provide public access to code repositories, instructions on building the software (documentation), and a bug tracker that shows what needs to be worked on next.

I want to mirror much of that in the OER community. One of the first things that needs to happen is to provide an easy way to manage multiple versions of a single resource (e.g., a presentation, video/audio, or book). A VCS seems like the obvious choice. But there must be a better way than just managing binary blobs, right?

That is the part that I need to figure out next: how to utilize the power of a DVCS in this genre. Then I can move on to figuring out what a bug tracker for OER would look like (and if it is even needed). The documentation is actually already there, at least for Open.Michigan.

Do you have any ideas?

University of Michigan Open Access Week

There is a great event coming up at the University of Michigan, sponsored and coordinated by a great team of librarians: Open Access Week 2009.

Molly Kleinman, one of those great librarians, puts it into context for us:

I’m struck by how timely these events are, and how much we could conceivably do under the umbrella of discussing open access and the future of scholarship. … The confluence of circumstances nationally has made this the perfect moment to discuss what’s wrong with existing modes of academic publishing, and to start getting aggressive about making change.

You really should read the rest of Molly’s post for a wonderful explanation of why the current scholarly publishing system is failing for everyone except the Elseviers of the world.

Along with presentations focused on faculty and scholarly publishing models, there is also going to be a talk by my current boss, Nathan Yergler, CTO of Creative Commons. Nathan will be talking about the impact of Creative Commons (CC) licenses on Open Access, what challenges still exist for Open Access, and what Creative Commons is doing to build and support an ecosystem of openness. Everyone is welcome to join this event, and all the events during Open Access Week. For the details about Nathan’s talk, check out the announcement on the OPEN:Michigan blog.

If you are in the Southeast Michigan area and are interested in what Michigan is doing to promote Open Access and make it really work, come by for any of the events; there should be a wide enough range to accommodate most interests.

The HathiTrust – A Report for the ALA Office for Information Technology Policy

This past week was Spring Break at the University of Michigan. So I decided to skip the trip to the beach and instead go to Washington DC to work 9-5 for a week. Really.

My school, the School of Information, has this neat program called Alternative Spring Break where students can go work with some really cool organizations in Washington DC, New York, or Chicago. It is an opportunity to go discover if you actually enjoy doing what you are in Graduate School full-time to learn (my words, not theirs). Also, it is a wonderful networking opportunity; I met some really great people last week and whether or not they can help me find a job is secondary.

I specifically worked for the American Library Association’s Office for Information Technology Policy (OITP). This is basically the “think tank” for the ALA Washington Office. The Washington Office also houses the Office of Government Relations: the people who go out there and make sure that the libraries’ perspective is heard on Capitol Hill. It is a really important perspective: who else is as strong a proponent of open access to knowledge for all people? Who else guards your privacy so carefully? Librarians are wonderful people to have on your side, but watch out if you do something wrong.

My time at the OITP involved writing a report about the HathiTrust, an endeavor originating at the University of Michigan and Indiana University. In the simplest of terms, it is a long-term digital preservation project. It preserves and provides access to the digital scans that member universities receive from the Google Book Search scanning program, as well as those from the libraries’ own scanning operations. But the HathiTrust has some important implications, and those are what I set out to find. I want to give special thanks to John Wilkin, Executive Director of the HathiTrust, for answering my many questions.

If you are curious what the HathiTrust means for you and for libraries in general, feel free to read my report: The HathiTrust – A Report for the ALA Office for Information Technology Policy. It is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License, so feel free to share it with whomever you like.

Scholarly Publishing and Authenticated Reviews

First, a review of a neat new tool that provides a cool function for many academics:

GPeerReview is a very simple Open Source tool that lets you write a review of a work, embed a hash of the work in your review, and sign that review with your digital signature (using your GPG key). The last two things are pretty neat. The hash allows you to be sure that people know which version of a paper you reviewed. Or at least, they will know if the version they have matches the version you had. This would be useful in the case where major changes are made to the paper that contradict your review.

Then, signing your review so that the author (and their publisher/advisor/dean/what have you) knows it is actually from you is pretty neat, and an obvious use of gpg. In fact, GPeerReview is essentially just a wrapper around the GnuPG command-line tool (see the FAQ).
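To make the hash-and-sign idea concrete, here is a small Python sketch of the same workflow. It is not GPeerReview’s code or file format: the SHA-256 choice, the `Reviewed-file-SHA256:` header line, and all of the function names are my own assumptions for illustration. Signing simply shells out to `gpg --clearsign`, in the same spirit as GPeerReview wrapping the GnuPG command-line tool.

```python
import hashlib
import subprocess

HEADER = "Reviewed-file-SHA256: "  # hypothetical header, not GPeerReview's format


def file_digest(path, chunk_size=65536):
    """SHA-256 of the reviewed file, read in chunks so big PDFs are fine."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_review(paper_path, review_body):
    """Embed the paper's hash at the top of the review text."""
    return f"{HEADER}{file_digest(paper_path)}\n\n{review_body}\n"


def matches_review(paper_path, review_text):
    """True if the reader's copy of the paper is the version that was reviewed."""
    for line in review_text.splitlines():
        if line.startswith(HEADER):
            return line[len(HEADER):] == file_digest(paper_path)
    return False


def sign_review(review_text):
    """Clearsign the review with the user's default GPG key (needs gpg installed)."""
    result = subprocess.run(
        ["gpg", "--clearsign"],
        input=review_text, capture_output=True, text=True, check=True,
    )
    return result.stdout
```

A reader who receives the signed review could check the signature with `gpg --verify` and then run `matches_review` against their own copy of the paper to see whether it is the version that was reviewed.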

I think this is a pretty interesting tool that could have some great uses, especially if we can (somehow) integrate it into academics’ workflow. Step one would be to move it from the CLI into some sort of Word/OpenOffice.org plugin. Even better would be a web-based service that does this.

Crazy Idea
Launchpad for Scholarly Articles and GPeerReview

Going back to my crazy idea of a Launchpad for Scholarly Articles: basically, a service that lets users link published articles, whether open access or not, with pre-prints or author-deposited versions in Institutional Repositories. The killer feature of this service would be giving people who DON’T have access to the expensive scholarly journals a way to read and be informed by the pre-prints the authors wrote, which are not restricted by the overzealous journal publishers.

Then, add on the ability for readers of those articles to make comments on and provide useful reviews of the material. Even adding this ability to places like arxiv.org would be great; it provides a mechanism to build community. And as we all know, the community is what makes any service an important resource for people. Without community the service is just a collection of tools.

But, I’ll be honest with you, I don’t know all of the various web-based services out there for scholarly communication; maybe someone has already implemented something like this. Leave a comment if you know of anything out there like this.

Very glad to see this on Slashdot

Background:

I am not one to use Slashdot as a measure of the importance of an issue. I’m sure there is something I could link to right now showing the complete inanity of some stories, but I won’t.

HOWEVER, this just hit the Slashdot homepage: “Non-Profit Org Claims Rights In Library Catalog Data”

This is slightly old news, and I thought about blogging it before. I try to keep the posts on this blog mostly tech-related, with an obvious leaning towards Open Source (Ubuntu specifically), since I am on planet.ubuntu. However, I feel OK posting this now that it is on Slashdot ;).

What’s the Deal?

So, in essence: The OCLC is changing its policies to restrict what its members can do with the bibliographic data it provides. Bibliographic data is simply a collection of facts (author, title, publication date, etc.) and thus cannot be copyrighted. However, nothing stops anyone from restricting what you can do with ANY data via a contract (think: EULA). That is what the OCLC is doing: stopping its members from sharing this collection of facts with other people who might be able to use it. Yes, some people might make commercial use of those facts, but there are also others who, as nonprofits, are simply trying to make a wonderful product for all of humanity to use.

Ok, that last sentence was slightly overdramatic, but I want to get this point across: limiting this knowledge (facts are knowledge) only hurts us as a whole and only helps the OCLC; no one else.

The Code4Lib group, a collection of techies in the Library community, have a nice wiki page with more information on this change of policies, including a diff between the two versions. The page also includes others’ opinions (blog posts) on the matter.

Now, as happens regularly with me on issues related to library policy, others may disagree with me. These others may even be my co-workers and/or bosses. As such, the usual disclaimer of this is only my opinion and no one else’s etc etc applies here.

Google Book Settlement

This is old news now since it happened over a week ago; however, the continued discussion of this settlement is needed and, hopefully, welcome.

I have been silent on this settlement on this site due to a few reasons (full disclosure):

  • I was at the Open Content Alliance’s (OCA) yearly meeting in the Presidio of San Francisco when the settlement was announced. As such, I was privy to the private discussions between members of the OCA and others. I didn’t want to say anything I learned there before they had a chance to say it themselves.
  • I work with a very high-level administrator at the University of Michigan Libraries. The UofM Libraries are one of the Google Book “Fully Participating Libraries” and as such have a special relationship with Google. This relationship may influence, in one direction or another, how people at the UofM Libraries view this settlement.
  • I have a personal moral preference for the methods of the Open Content Alliance and feel that some of Google’s Terms of Use (in the contracts signed with libraries) are less than good.
  • Many people have been saying contradictory things about this settlement; not everyone could be right in their analysis. Just as sunlight is the best disinfectant, time is the best producer of truth.
  • The settlement is one hundred and forty-one (141!) pages long, not including its fifteen (15!) attachments. This is part of why so many people were making false claims: they just didn’t get to the part that explained what would happen in the situation they were talking about.
  • Plus, I was going to be giving a presentation on the Google Library Project for my class on Intellectual Property and Information Law (PubPol 688/SI 519). I decided to wait until after the presentation to post my views. I could have posted a draft of my presentation before to see what sorts of comments I would receive but to be honest, I wasn’t thinking that far in the future. Graduate School does that to me.

 

Here is the presentation I gave yesterday (2008-11-7):

(.odp, .pdf, .ppt)
Unfortunately, for you, my slides don’t contain all of the information I conveyed (because that presentation style sucks). Fortunately, for the students in the class, my slides didn’t contain all of the information I conveyed.

You will notice that my presentation takes a very hard look at the Settlement; I’m not one to see something like this and think it is the best outcome we could have had. Yes, there are some really great things in the settlement, but that doesn’t mean I can’t critique the parts that are bad.

A quick example of one of the really great things the Settlement provides: all “Fully Participating Libraries,” libraries that have signed scanning agreements with Google and have had a sizable percentage of their collections scanned, will have free access to the entire corpus of books Google has scanned. Not just the books that were scanned at that specific library, but the books scanned at all libraries. So, if you are a student at the University of Michigan, University of California, Stanford, or any of the libraries listed in Settlement Attachment G “Approved Libraries,” you can be happy about that.

If, however, you are a student at any other university or college you won’t be as happy. Your school, unless it pays the subscription fee (not yet disclosed), will only be able to have a limited number of “terminals” that can be connected to the Google Library; a more correct term would be the Google Bookstore. Even the UofM’s own Paul Courant said this settlement will create the “Universal Bookstore;” he didn’t say “Universal Library.” But I digress….

These other libraries will have a set number of virtual terminals based on the size of their school (1 per 10,000 students or 1 per 4,000 students, depending on the type of school). They are virtual terminals because access is not tied to one particular physical computer: the number of computers that have access to the service is fixed, but which computers those are can vary with demand, to any computer within the library.

Issues that I didn’t cover in depth in my class presentation, but that are nonetheless important, include:

  • The effective monopoly on the materials that Google now has. Sure, others could join the game at the $145 million price tag, but since this was a settlement, not a legal decision, there isn’t much incentive for groups such as the OCA to enter talks with the AAP and the Authors Guild.
  • To continue my digression from above: the fact that this is going to be a “Universal Bookstore” not a “Universal Library” is slightly saddening.
    • I don’t have a legal reason to feel sad; the copyright holders have every right to charge for these materials. But I feel like everyone other than Google, the authors, and the publishers are being scammed. Again, not for a legal reason, but for a moral reason:
    • Libraries, through public funding, have been keeping these books safe for the last 70 years. Up until the day of the settlement, these books were worthless to the publishers and authors. They are out of print, so all purchases of them have been paid to individual sellers under the first-sale doctrine. Now Google, through its Universal Bookstore, will sell you these books and pay the authors for them. Google will not pay the libraries, which are the ones who made this whole endeavor possible. Sure, the libraries agreed to get only the digital copies back as part of their agreements with Google, but that was before anyone had thought about this possibility. Should those contracts be renegotiated? <end_rant>
  • What Happened to Fair Use?
    • This could be one of my biggest critiques of this settlement: the mere fact that there is a settlement. This was a copyright infringement case brought against Google by two associations, the Association of American Publishers and the Authors Guild. Google had a fairly good Fair Use argument and may indeed have won the case on it. That would (most likely) have been a GREAT THING: others would have had the same rights as Google with respect to scanning and displaying books.
    • Now, however, Google is a “special citizen” in this arena; it has “rights” that others do not. Is that fair? No. Is that what is best for our future, and the future of libraries? No.

 

Hopefully I don’t sound too negative towards this settlement. Ok, let’s be honest, I am pretty darn negative towards it. But hey, that is my job, or at least what I see my job as being. There are plenty of people out there being paid a large sum of money to tell you how good this settlement is. The ones telling you how bad it is are most likely not being paid to do so; I’m not.

If you have read this far and are still interested in this topic, you should check out what the rest of the world has been saying about this settlement. A good place to start would be TechDirt’s opinion on the matter. And, the Open Access News blog has posts that summarize others’ opinions in four parts (1, 2, 3, and 4).

EDIT:
Full Disclosure (thanks to Jon for reminding me): I am employed by Creative Commons and, through that work, have been involved with the OpenLibrary Project. I am also employed by Paul Courant, the Dean of Libraries for the University of Michigan. As such, there may be some conflicting influences on my opinions. I am in a special dual position.

Preservation Entities Should Ignore Copyright

That isn’t me talking, that is the Library of Congress.

The Library of Congress along with the Joint Information Systems Committee (JISC), the Open Access to Knowledge (OAK) Law Project, and the SURFfoundation released a report (pdf) on Monday that basically states just that.

The stated purpose of the report is:

  1. to review the current state of copyright and related laws and their impact on digital preservation;
  2. to make recommendations for legislative reform and other solutions to ensure that libraries, archives and other preservation institutions can effectively preserve digital works and information in a manner consistent with international laws and norms of copyright and related rights; and
  3. to make recommendations for further study or activities to advance the recommendations in the Report.

The key is number 2, “to make recommendations for legislative reform…”  From the release on digitalpreservation.gov:

As the laws of the countries discussed in the report demonstrate, in many cases exceptions and limitations do not accommodate the actions required for digital preservation.

Now, the recommendation doesn’t simply state that anyone who wants to preserve information can do so. So no, you won’t have the LOC on your side if you are sued for “preserving” media on your home machine that you have no legal right to possess.

From the report:

[These suggestions should] apply to all non-profit libraries, archives, museums and other institutions as may be authorized by national law (hereafter, “preservation institutions”) that are open to the public, provided they do not undertake these activities for any purpose of commercial advantage.

These institutions would be able to (1) reproduce as many copies as necessary for effective preservation, (2) transfer those copies to other formats as standards progress, and (3) “communicate” those works within and between various preservation repositories to maintain redundancy.

Why did the Library of Congress et al. produce this report? Because without some changes to the current status quo of copyright law, libraries and archives will be unable to fulfill one of their most important roles in our society: preservation.

[In the current US copyright system] there is no specific authorization for libraries and archives to make preservation copies of published works in their collections.

If you are at all interested in learning more about how copyright affects the preservation of our society’s knowledge, you should read the report. Plus, for those of you who thought that librarians are just quiet, subservient employees of the state who don’t speak up for our rights, think again. Librarians are as much at the forefront of cultural freedom as any other group, if not more so.

Bug Watches

As a part-time bug triager, I’m always curious about new tools that help people work better and more efficiently. One such new project, which I think has real potential, is Stephan Hermann’s Leonov project.

Another thing I just read in my news reader: Lucas Nussbaum added functionality to Debian’s package overview pages that lets maintainers see which version of a package is in Ubuntu and how many bugs are reported against it in Launchpad. This seems like a great idea and could even be expanded upon for better results.

My thought process:

A. Launchpad’s ability to watch other bug trackers for the same bug greatly improves the ability of developers to find and fix bugs.

A.1. People really like that ability.

B. Launchpad can only do that in one direction (it can’t tell the Debian BTS that its bug has been marked “Fix Committed”).

B.1. Expecting the devs/triagers to go back upstream and report the change for every bug is a laudable goal, but as we all know, everyone’s time is precious.

C. The ability to get bug data from LP and use it to improve productivity is there, albeit a little “hacky” (screen scraping is never fun).

D. Wouldn’t it be cool if other Bug Trackers could watch LP in the same way it watches them?
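Here is a rough sketch of what point D could look like: a two-way watch between a bug in one tracker and its counterpart in another. Everything here is hypothetical: the `Tracker` and `BugWatch` classes are in-memory stand-ins, not Launchpad’s or the Debian BTS’s APIs, and the sync rule (last writer wins, based on an update timestamp) is just one simple choice.

```python
from dataclasses import dataclass, field


@dataclass
class Tracker:
    """In-memory stand-in for a bug tracker (hypothetical, not a real API)."""
    name: str
    bugs: dict = field(default_factory=dict)  # bug_id -> (status, updated_at)

    def set_status(self, bug_id, status, updated_at):
        self.bugs[bug_id] = (status, updated_at)

    def get(self, bug_id):
        return self.bugs[bug_id]


@dataclass
class BugWatch:
    """Links one bug in each tracker and keeps their statuses in step."""
    left: Tracker
    left_id: str
    right: Tracker
    right_id: str

    def sync(self):
        l_status, l_time = self.left.get(self.left_id)
        r_status, r_time = self.right.get(self.right_id)
        # Last writer wins: copy the more recently changed status across.
        if l_time > r_time:
            self.right.set_status(self.right_id, l_status, l_time)
        elif r_time > l_time:
            self.left.set_status(self.left_id, r_status, r_time)
        # Equal timestamps: both sides already agree, so do nothing.
```

Real trackers use different status vocabularies (Launchpad’s “Fix Committed” has no exact Debian BTS equivalent), so a real implementation would also need a status mapping in each direction, plus some way to fetch remote data that doesn’t involve screen scraping.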

It seems to me, from both Lucas’ and Stephan’s efforts, that doing D is possible right now. Yes, it would be a ton easier if Lucas’ and Stephan’s concerns were addressed (text/XML export, etc.).

I know the Launchpad developers are currently working on support for reporting certain information back to other bug trackers, but I’m not sure of its progress.

Some Blueprints that might be related but that I cannot read (they are private): Bugs Remote API and Remote Launchpad Python Library (if you know of any other blueprints or bugs with more information, please post them in the comments).

Does anyone know of any other bug trackers that are actively working on, or at least discussing, the ability to grab data about certain bugs from LP (or other BTSes)?

Since it hasn’t been talked about enough already…

So, why should LaunchPad (Malone) be open sourced?*

I’m not going to say it’s because other groups need to use the bug tracking/code hosting/question answering/multi-project-resource unifying features. No, I do believe it wouldn’t make much sense for there to be multiple Launchpads out there dealing with bugs/code/etc. (maybe a little sense, but not much).

That market is already taken by launchpad.net and others (Bugzilla, Trac, Savannah, et al.).

Ok, so what market am I looking at? Scholarly communication <BORING!>

Not really boring, actually. If you haven’t been paying attention to the scholarly communication world lately, let me tell you, a lot is changing. University libraries are spending more and more money every year on electronic journals. Prices for the same product rise faster than inflation, even though the product doesn’t improve (can we say monopoly/oligopoly?). In response, many institutions (university libraries) are beginning to provide competing services. Full disclosure: my current employer is the Scholarly Publishing Office at the University of Michigan, where we publish scholarly journals online in an Open Access fashion. So we are providing an alternative to the current commercial publisher vendor lock-in.

What does this have to do with LaunchPad and Open Source Software? Well, we are now in a global situation where there are many many many many open access journals and publications out there. There are some services that can help you navigate them, like the Directory of Open Access Journals, but that service only indexes Open Access journals. Plus, there are now these things called Institutional Repositories, which are collections of preprints, articles, and data from the “scholars” in a given “institution” (university, research lab, etc.).

Then you have the commercial vendors. They don’t like people looking at their stuff, they don’t play nice with others unless they think they will lose money if they don’t. Libraries are getting better and better at letting their patrons search both sets of journals in one place, but the interface ALWAYS is hideous and creates MANY hoops the user has to jump through. In a word, it is LAME.

I haven’t answered what LP has to do with this yet. I’m getting there, I promise.

What does LaunchPad do really well? Linking various bugtrackers so that people can work together more efficiently to solve problems, right? That was the whole goal of Launchpad, otherwise Ubuntu would have stayed with bugzilla. What is the analogue for the scholarly publishing/communication world? You have those many distinct collections of articles (Open Access journals, Institutional Repositories, and commercial vendors) that do not talk to each other, ever. Yes, there are groups out there trying to improve this situation like the Open Archives Initiative where they are setting metadata standards and standards for transferring that information to others. That is a great thing, but it is only a start.

<The Answer, Finally> If we created a LaunchPad for scholarly works, we could solve many of the beginning access issues associated with this crappy situation. Here’s the idea:

Think of a bug: that is the article in this case. The article (bug) can have a publication status like draft version or published in a journal (analogous to New, Incomplete, Fix Committed). But for it to even be an article in this Scholar’s LP, it needs a reference to where it physically lives. So instead of a bug originating in LP and then being linked to other trackers as time goes on, the article needs an initial link to some place (OA journal, IR, or commercial vendor) using a standard like the Digital Object Identifier or Handle.net (which assigns a unique ID to objects online that can point to any address, so changing URLs won’t affect findability).

Then, this article (bug) can also have different versions linked to it. For example: I publish an article in a prestigious journal, Nature, and I’m proud of it. So I go to the Scholar’s LP and submit a new article. I give it the DOI or Handle.net ID and it automagically retrieves the metadata from the article’s current place of residence (if that place provides it; otherwise I might have to fill it in myself). Then it shows up as a new article in the system. My advisor, who thinks the work I did was cool, thinks that my previous drafts before publication are also pretty good. Since the version in Nature is not available to everyone for free, he links the preprint version that resides in my university’s Institutional Repository to my article. That is just like linking to an upstream bug in LP.

Of course, all the metadata is editable and updatable with information like author(s), publication date, place, copyright status (license), etc. Plus, if we wanted to limit certain metadata elements (like copyright status) to only the article’s author(s), we could do that by verifying email addresses against the article’s actual author list.
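A minimal Python sketch of the data model described above, under my own assumptions: an `Article` must be created with an initial `Version` that has a persistent identifier (a DOI or Handle), other versions can be linked to it much like upstream bug links, and only email addresses on the author list may change the license. The class names, the stages, and any example identifiers are made up for illustration; none of this is an existing service.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # Rough analogue of bug statuses (New, Incomplete, Fix Committed).
    DRAFT = "draft"
    PREPRINT = "preprint"
    PUBLISHED = "published"


@dataclass(frozen=True)
class Version:
    identifier: str    # persistent ID such as a DOI or Handle
    location: str      # e.g. "OA journal", "institutional repository"
    stage: Stage
    open_access: bool  # can anyone read this copy for free?


@dataclass
class Article:
    title: str
    author_emails: frozenset
    home: Version  # an article can't exist without an initial link
    license: str = "unknown"
    versions: list = field(default_factory=list)

    def __post_init__(self):
        self.versions.append(self.home)

    def link_version(self, version):
        """Attach another copy of the article, like linking an upstream bug."""
        if any(v.identifier == version.identifier for v in self.versions):
            raise ValueError(f"{version.identifier} is already linked")
        self.versions.append(version)

    def set_license(self, editor_email, new_license):
        """Only addresses on the author list may change copyright status."""
        if editor_email not in self.author_emails:
            raise PermissionError("only listed authors may change the license")
        self.license = new_license

    def open_versions(self):
        """The 'killer app' query: which copies can everyone read?"""
        return [v for v in self.versions if v.open_access]
```

In the Nature example, the paywalled journal copy would be the `home` version, the advisor’s link to the repository preprint would be a `link_version` call, and `open_versions()` is what a reader without a subscription would care about.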

This Scholar’s LP could provide a wonderful unified interface so that “scholars” (define that however you want) can navigate this crazy mess of publishing easily (or at least easier). The “killer app” part of this is the ability to link a published article which is under crappy copyright restrictions to other versions which are available for everyone via institutional repositories or other places, in one place.

There are plenty of fancy, cool things that could be done with this model, and I will talk about those later. One example is automatically linking works cited to their entries in the Scholar’s LP or to an external link. But for now, I just wanted to get this idea out there and see if anyone has any comments.

* Yes, you are right, we don’t need LaunchPad to be open sourced to do this; it was just a way to get you to read this, sorry.

Gots me a Job, ‘cuz I’m SMRT!

Yes, I have a job now. I currently work for the Scholarly Publishing Office at the University of Michigan Libraries.

Right now it is a pretty mundane job. I convert incoming articles to a standard format (there is a little bit more to it than just Open -> Save As). When I am done with the conversion, they are published! Online! Usually Open Access! Woot!

Yeah, that is the cool part. The SPO does a really cool service. It provides electronic publishing for peer-reviewed scholarly journals, and most times those journals publish as open access. If you don’t know what Open Access is, well, where have you been? It is the Next Big Thing(TM)(R)! Really, if you don’t, read some stuff that Peter Suber has written, for instance his Open Access Overview and his log.

Now I just need to compete with Kathleen (a co-worker) for space in the office!