(sourcecode is to binary as ??? is to ppt/odp/pdf)
Ted Gould just posted to the planet with his presentation that he gave at the Desktop Summit. At the end of his post you’ll notice that he uploaded his presentation to Launchpad (at lp:~ted/presentations/2009_desktop_summit/).
I think that is a great idea! Not only does it provide the ability for the community to see what others are using for their presentations but it allows anyone to branch a presentation, which has awesome potential. Especially with the presentation format that Ted chose, SVGs. The S5 presentation format (XHTML/CSS/JS based) would also be a great candidate for easy branching and editing of presentations.
But what if you need to create presentations with others who use PowerPoint or Impress and you want to harness the power of a Version Control System? Old PowerPoint (ppt) files are binary blobs which don’t work well in version control systems (they *work* but not *well*). Impress (odp) and new PowerPoint (pptx) files are effectively zipped archives of XML and images. However, since they are zipped, bzr treats them as binary files. I only tested with bzr but don’t foresee any of the other systems behaving any differently.
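To make the "zipped archive" point concrete, here is a minimal Python sketch that lists what is inside such a file. The stand-in archive built by `make_fake_odp` is purely hypothetical (three made-up members so the example runs without a real presentation on disk); a real odp or pptx contains many more entries.

```python
import io
import zipfile


def list_members(data: bytes) -> list[str]:
    """Return the member names of a zipped presentation (odp or pptx)."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()


def make_fake_odp() -> bytes:
    """Build a tiny stand-in archive (hypothetical content, not a valid odp)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("mimetype", "application/vnd.oasis.opendocument.presentation")
        archive.writestr("content.xml", "<doc><slide>Hello</slide></doc>")
        archive.writestr("Pictures/logo.png", b"placeholder image bytes")
    return buffer.getvalue()


if __name__ == "__main__":
    # Print each member name, one per line.
    for name in list_members(make_fake_odp()):
        print(name)
```

With a real file you would pass `pathlib.Path("talk.odp").read_bytes()` (the filename is just an example) to `list_members` instead.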
Why would you want to use a VCS for your presentation files? Especially a DVCS like bzr/git/hg? COLLABORATION!
Some of you may know that I am currently working with Open.Michigan, a project at the University of Michigan that enables the creation of Open Educational Resources (OER). OER is effectively a broader term for the concept of Open CourseWare. Basically, everything used in education is a resource, not just presentations, and thus is useful for others to see, use, and remix. If you are curious to see what kinds of things we produce, see our Educommons installation.
Back to the topic at hand though: presentations and DVCS.
One of the major areas that the OER community could greatly improve upon is the area of remixing: taking the openly licensed materials and using them, adding new material, and creating something original. Remixing, in general, is enabled by having access to the source files of the material being worked with. Sure, you can use a PDF or an MP3 in a remix, but it is usually better to have the original .odt or multitrack file to work from. This is why Open.Michigan provides to the public the ppt files along with the PDFs of the presentations created through the OER program.
But let’s leverage some of the tried and true methods of the FLOSS community in the OER community. One of the biggest and most fundamental benefits of the FLOSS world is that everyone has access to the source code, and can easily get it, edit it, and (hopefully) compile a new version of the program; effectively a “remix.” How does the FLOSS community lower the barriers and increase efficiency for that workflow? We provide public access to code repositories, instructions on building the software (documentation), and a bug tracker to inform what needs to be worked on next.
I want to mirror much of that to the OER community. One of the first things that needs to happen is to provide an easy way to manage multiple versions of a single resource (e.g., a presentation, video/audio, or a book). A VCS seems like the obvious choice. But there must be a better way than just managing binary blobs, right?
That is the part that I need to figure out next: how to utilize the power of a DVCS in this genre. Then I can move on to figuring out what a bug tracker for OER would look like (and if it is even needed). The documentation is actually already there, at least for Open.Michigan.
Do you have any ideas?



Using ODF files kind of works if you install scripts to unzip them before commit and only check in the unzipped contents. You have to use a formatter for the XML though, because OpenOffice.org does not use newlines (at least it didn’t the last time I tried).
The format, however, is very verbose and manual merges, even though now possible with a text or XML editor, may still be a pain.
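A rough Python sketch of the unzip-and-format step described above, under some loud assumptions: the function name `explode` is made up for illustration, it treats any member ending in `.xml` as XML, and it does not guard against malicious member paths. Pretty-printing also adds whitespace, which may matter inside text elements, so this is for diffing rather than a guaranteed lossless round trip.

```python
import io
import pathlib
import tempfile
import zipfile
from xml.dom import minidom


def explode(data: bytes, target: pathlib.Path) -> list[str]:
    """Unpack a zipped document into target, pretty-printing XML members."""
    written = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # NOTE: no path sanitising here; only use on archives you trust.
            out = target / info.filename
            out.parent.mkdir(parents=True, exist_ok=True)
            payload = archive.read(info)
            if info.filename.endswith(".xml"):
                # Re-serialise with one element per line so diffs are readable.
                payload = minidom.parseString(payload).toprettyxml(
                    indent="  ", encoding="utf-8"
                )
            out.write_bytes(payload)
            written.append(info.filename)
    return written


if __name__ == "__main__":
    # Hypothetical two-member archive standing in for a real odp file.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", "<doc><slide>One</slide><slide>Two</slide></doc>")
        archive.writestr("Pictures/logo.png", b"placeholder image bytes")
    with tempfile.TemporaryDirectory() as tmp:
        explode(buffer.getvalue(), pathlib.Path(tmp))
        print((pathlib.Path(tmp) / "content.xml").read_text(encoding="utf-8"))
```

The unpacked directory is what would be committed; a matching "re-zip" script would be needed to get a file the office suite can open again.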
Posted by Anonymous on July 7th, 2009.
I think the perfect dvcs format for all fluid formats (docs and presentations) would be HTML5 + CSS3.
Posted by Daeng Bo on July 8th, 2009.
hey, this is slightly off-topic (i.e. not problem-solving your particular problem), but this post made me wonder if I ever told you about Chuck Ransom’s idea for using hip-hop sampling as a way to teach information literacy and highlight the importance of citation in scholarly work. let’s talk about that some time.
I love your description of binary blobs. I have been hoping you will serve as Librations’ copyright specialist, so I thought I’d mention that here, too. we’re going to need help choosing the right CC license… :)
Posted by kdt on July 8th, 2009.
[...] I forgot to ask you all, what do YOU think? http://blog.grossmeier.net/2009/07/07/sourcecodebinarypptodppdf/ [...]
Posted by Greg Grossmeier (greg) 's status on Wednesday, 08-Jul-09 13:26:27 UTC - Identi.ca on July 8th, 2009.
Why not TeX??
I use it for all my documents, from papers to all my presentations. With beamer you can create every bit of presentation you get with that WYSIWYG software!
Posted by Aslash on July 8th, 2009.
@Anonymous: I like that idea, actually. At least then you’ll be able to see what kinds of changes were made. For example, if an image was deleted/added in the “images” directory, or which slide was edited (each slide has its own XML file).
@Daeng: I think that would work, too (or like Aslash suggests, TeX). But the main problem I foresee is converting Powerpoint/Impress files _into_ HTML/CSS (or TeX). Being able to use these other formats as a starting point is important because there is so much out there available right now under some Creative Commons license.
@kdt: I’ll ask about Chuck during our next update call. And of COURSE I will serve as Librations Copyright Specialist! :)
@Aslash: I’m not suggesting TeX because no one uses it in my line of work. And by “my line of work” I mean practically every line of work. Sad but true. Yes, it would solve the VCS problem, but solving that problem cannot itself create a new problem of “everyone should switch to TeX to create presentations because I want to use a VCS.” See the reasoning that I gave Daeng.
@Everyone: I guess I want to be able to take all of these legacy ppt/pptx/odp files and convert them into an HTML/CSS, TeX, or S5 type format, without formatting loss, and with the ability to rebuild those ppt/pptx/odp files after modification. Is that so much to ask?!?!
(I think the answer is “yes”)
Posted by Greg on July 8th, 2009.
As mentioned in a few posts on identi.ca, you could use ReStructured Text as your source; rst2s5 exists (and works pretty well); rst2odp (http://pypi.python.org/pypi/rst2odp/) lets you generate ODP “blobs” from the source. You could of course add some additional scripting to convert to PPT, apply your theme of choice, etc.
Posted by Nathan Yergler on July 8th, 2009.
I use latex for my papers and latex-beamer for my presentations and find that git plays very nicely with these tools.
I generate and edit figures as svg files in Inkscape, then export (“compile”) them as pdfs for pdflatex. Committing the svg’s rather than the pdfs works well for tracking small changes (text changes within a figure, or moving objects relative to each other) but tends to look like blob comparisons for large changes. In this context (and probably the odf context as well) it would be useful to have a tool for putting an xml file in a “normalized” form to give, for example, the same branch ordering for similar files.
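As a partial illustration of such a “normalized” form, Python’s standard library (3.8+) can canonicalize XML so that attribute order, quoting, and empty-element spelling no longer produce spurious differences. This sketch is only a starting point: canonicalization does not reorder child elements, so it does not give the “same branch ordering” asked for above, and the two sample strings are invented for the example.

```python
from xml.etree import ElementTree as ET


def normalize(xml_text: str) -> str:
    """Return a canonical (C14N 2.0) serialisation of xml_text.

    strip_text=True drops leading/trailing whitespace in text content,
    which hides indentation changes but could also hide real edits.
    """
    return ET.canonicalize(xml_text, strip_text=True)


if __name__ == "__main__":
    # Two invented fragments with the same content but different spelling.
    first = '<g id="a" fill="red"><rect y="2" x="1"/></g>'
    second = "<g fill='red' id='a'>\n  <rect x='1' y='2'></rect>\n</g>"
    print(normalize(first) == normalize(second))
```

Running the normalizer on each file before commit would keep diffs focused on actual content changes rather than on how the editor happened to write the file out.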
@greg re: legacy presentations. I think there are two useful assets here, both of which are not too hard to convert:
1) The structure and text of the presentation.
This can be converted to TeX or HTML by hand about as quickly as
you can read the presentation if you can touch type.
2) Reusable images and diagrams. Saving these as png and/or pdf
would be pretty straightforward from openoffice/inkscape and
would be sufficient to make them available for TeX/HTML
presentations.
I do both of the above to cannibalize my old odp presentations for my new latex-beamer presentations. These steps are not rate-limiting relative to thinking about the presentation and choosing the bits that I want to reuse.
I’ve switched presentation tools every couple years since 1999 (from marking up overhead transparencies with sharpies, to powerpoint, to HTML, to openoffice, to latex-beamer). The learning difficulties for each tool were pretty similar (ok, transparencies win), and generally easier than the graphic design, writing, and public speaking aspects.
Posted by Mark on July 8th, 2009.
It’d be great to have some kind of OpenOffice integration with VCS. Something like MS Word’s change control + the collaborative part.
However, right now, I think your best chance is using a plain-text format:
* LaTeX, if you need to produce a high-quality publication. There’s the problem of the learning curve, though.
* An XML-based format like DocBook.
* A plain-text format like ReStructured Text or Markdown. If you don’t need to do too fancy things with the presentation, these will be extremely easy to learn for any collaborator and you can produce output in TeX, PDF and HTML (which can give great results with some good CSS).
I’m inclined to choose the last option for most stuff.
Posted by Owo on July 9th, 2009.
As for your metaphor, this works quite well, actually:
sourcecode : binary :: ps : pdf
PostScript files are distilled (akin to compilation) into PDFs, but you can never positively reconstruct the PS when given only the PDF. PostScript is a Turing-complete programming language, whereas a PDF file is just the result of the compilation.
Posted by Kevin on August 21st, 2009.