The Checklist for Open Sourcing a Python Project


I am happy to announce the first release of docxgen, a Python library we use to render surveys in Microsoft Word format at SurveyMonkey. Here is a quick example of how Moby Dick was written:

from datetime import datetime
# `run` is used below; it is assumed to be exported by docxgen like the other helpers.
from docxgen import Document, title, subtitle, h1, paragraph, run
doc = Document()
  title='Moby Dick', creator='Herman Melville', created=datetime(1851, 1, 1)

body = doc.body
body.append(title('Moby Dick'))
body.append(subtitle('Moby Dick'))
body.append(h1('Chapter 1'))
body.append(paragraph([run('Call me Ishmael. Some years ago ...')]))
... ...'/tmp/moby-dick.docx')

The coding took me less than 5 business days, but finishing the open source process took almost three months of my leisure time. Here is my checklist:

  1. Get senior management approval
  2. Get LCA approval
  3. Develop more comprehensive unit tests for better code coverage, so that we are free to refactor in the future.
  4. Write the documentation (tutorial, API reference, etc.) with Sphinx.
  5. Rework the APIs with the feedback from step #4.
  6. Port the library to Python 3 with the awesome six library.
  7. Test everything with nose and tox.
  8. Release it on GitHub.
  9. Hook up travis-ci for continuous testing.
  10. Register on PyPI for general availability.
  11. Host the documentation on rtfd.
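
Step 3 is the one that pays for the later refactoring and porting steps, so here is a minimal sketch of what it can look like. This is an illustration only, not code from docxgen: collapse_whitespace is a hypothetical helper invented for the example, and the tests use the standard library's unittest module so that nothing extra needs to be installed. A runner such as nose can collect unittest.TestCase classes like this one.

```python
import unittest


def collapse_whitespace(text):
    """Hypothetical helper: trim text and squeeze whitespace runs to one space."""
    return " ".join(text.split())


class CollapseWhitespaceTest(unittest.TestCase):
    """Pins down current behavior so a later refactor can be checked against it."""

    def test_squeezes_inner_runs(self):
        self.assertEqual(collapse_whitespace("Call  me\tIshmael."), "Call me Ishmael.")

    def test_trims_both_ends(self):
        self.assertEqual(collapse_whitespace("  Moby Dick \n"), "Moby Dick")

    def test_empty_input_stays_empty(self):
        self.assertEqual(collapse_whitespace(""), "")


def run_tests():
    """Run the test case programmatically and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CollapseWhitespaceTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    result = run_tests()
    print("tests run:", result.testsRun, "all passed:", result.wasSuccessful())
```

Once tests like these exist, the helper's body can be rewritten freely, and the same test class can be run under each interpreter version when porting.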

In contrast, the checklist for a feature is much shorter:

  1. Get the feature spec from the Product Manager.
  2. Draft the technical spec, and get it reviewed by the team.
  3. Develop in the dev environment, and verify with manual testing.
  4. Check in to the test / staging environment, and run acceptance tests with Selenium.
  5. Release and profit.

Does this mean that open source projects generally have better quality than closed source projects?

Yes, but with a high price tag.

Closed source projects are mostly profit-driven. We want to deliver code of reasonably good quality cheaply, in both the short term and the long term, so that we can move on to the next work item in an overbooked schedule. The balance between quality and productivity is dictated by personal skill and corporate culture.

Open source projects, in contrast, are more likely to be driven by mastery and autonomy. Their success is measured by adoption rather than cold hard cash, so the documentation MUST be comprehensive enough for early adopters to pick the project up. Developers also tend to follow best coding practices, and even “gold-plate” their code bases, because the world is, at least conceptually, watching.