By Bryan Patrick Wood, Senior Data Scientist

Creating a Python package from scratch is annoying. There is no standard library tooling to help. There is no authoritative take on folder structure. So you sling something into a single-file script or a Jupyter notebook, where it languishes as Untitled7.ipynb, to avoid the hassle. It does the job it needs to. Then it needs to be shared and reused …

All of this can be just enough friction to delay starting on a new idea. At least that has been the case for me. Let’s even say a particularly motivated mood strikes. Putting together the project structure will be error-prone and require more effort searching the internet for arcane setup.cfg incantations than writing the actual code for the idea. Maybe that’s all you have time to get done before it’s off to other activities. 

 

Or worse: you don’t even get that far.

 

Not a great use of time. This should be the easy part!

After spinning up a few new projects recently, I decided to take the time to better understand the Python packaging ecosystem and create a project boilerplate template as an improvement over copying a directory tree and doing find and replace.

 

Why?

 
 
Is this reinventing the wheel? A valid question. Certainly somebody has already done this drudgery, you say. And you’d be right. A quick web search will turn up pages of project templates. Same with GitHub and PyPI. So the question remains. In this case, there were some good reasons I did not reach for something already available.

 

PyScaffold seems particularly popular.

 

First, I had been reading a book that went into detail on the topic and felt like applying what I was learning: this is always a good reason.

 

Serious Python is excellent and highly recommended (DISCLAIMER: I took this opportunity to look into affiliate marketing through Amazon so the link above is one; if you care, it’s easy enough to do an internet search on “serious python” and bypass it).

 

Second, this is the type of task a Python expert certainly should be comfortable executing; somewhat ironically, it’ll also often be a task that is already taken care of at a company or on a project unless you’re involved at a very early stage.

Third, I unfortunately can’t find the quote to attribute it properly, but I recall reading something that resonated with me, which I’ll paraphrase:

Don’t use anything you can’t take the time to learn well.

Whether it’s a 4,000+ line .vimrc file or a project template like this, a time will almost certainly come when you need to change something. That’s when the inevitable technical debt comes due, and pay you will. My experience has been that adding just what you need (and understand) and iterating over time is always a better strategy.

 

Definitely not another case of …

[Image: XKCD comic, “Standards”]

 

Lastly, as I became more engrossed in the details of the endeavor, the point became to be more opinionated, especially with respect to dependent packages. I wanted something a coworker, colleague, collaborator, etc., could use immediately with my recommended dependencies for different types of tasks. Turns out this is straightforward to bake in.

 
 

There are many possible approaches, but the one I already had some familiarity with in the Python ecosystem was cookiecutter. From their messaging, cookiecutter is

A command-line utility that creates projects from cookiecutters (project templates), e.g. creating a Python package project from a Python package project template.

Familiarity in the sense that I had used someone else’s cookiecutter template before.

 

Using a cookiecutter someone else has created is trivial, as detailed in the documentation:

cookiecutter https://github.com/bpw1621/ordained

or, when the template has already been pulled down, by passing cookiecutter the path to the local copy.

cookiecutter gh:bpw1621/ordained

is shorthand for accessing a GitHub-hosted cookiecutter template.
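
To make the shorthand concrete, here is a minimal sketch of how a prefix like gh: can be expanded into a full repository URL. The mapping reflects cookiecutter's built-in abbreviations as I understand them; the helper name expand_template_shorthand is hypothetical and is not part of cookiecutter's API.

```python
# Assumed mapping of abbreviation prefixes to URL patterns. This mirrors my
# understanding of cookiecutter's built-in abbreviations; verify it against
# the cookiecutter documentation before relying on it.
ABBREVIATIONS = {
    "gh": "https://github.com/{0}.git",
    "gl": "https://gitlab.com/{0}.git",
    "bb": "https://bitbucket.org/{0}",
}


def expand_template_shorthand(template: str) -> str:
    """Expand 'prefix:rest' into a full URL; return anything else unchanged.

    Hypothetical helper for illustration only.
    """
    prefix, separator, rest = template.partition(":")
    if separator and prefix in ABBREVIATIONS:
        return ABBREVIATIONS[prefix].format(rest)
    return template


print(expand_template_shorthand("gh:bpw1621/ordained"))
```

A full URL or a local path has no known prefix before the first colon, so it passes through untouched.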

 

Typically, you are greeted with a few questions to configure details about the project template instantiation and then you are off to the races. For instance, the ordained cookiecutter template prompts as follows:

[Image: Ordained configuration questions]

 

And they have been improving, viz., here.

 

The defaults attempt to be sane and minimize redundant data entry. At this point a fully functional Python package has been created and the initial boilerplate version controlled in git. Since the options for specifying the type of virtual environment one wants at this point are a little complex, that next step is left out of the automation (at least at the moment, viz., below).

 

I only needed to specify two options: the project name and description. Anyone other than me would have to enter all of their personal information but that can be handled with cookiecutter’s support for user config.

 

Not sure the Python community has coalesced around cookiecutter as the solution, but it’s at least a cut above copying an existing project and editing the various parts. Having used tools in other programming languages (e.g., Yeoman in JavaScript), I think there is room for improvement. That said, one of my favorite quotes is

Le mieux est l’ennemi du bien.
(The perfect is the enemy of the good.)

—Voltaire, Philosophical Dictionary

Since the virtual environment creation is not automated, a good default choice, after creating and activating the project’s virtual environment, is pip install -e .[base,dev,doc,test]. This will pull in those dependencies I typically do not want to live without as a matter of quality of life (i.e., base), those integral for development (i.e., dev), those needed to generate documentation (i.e., doc), and those needed to test (i.e., test). Including any of the other requirements groups will depend on what the project is trying to accomplish.
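
For anyone who would rather script that manual step, the following is a rough sketch using only the standard library. It assumes a plain venv-style environment in a .venv directory inside the project; the function names and the .venv location are my own illustrative choices, not something the ordained template provides.

```python
import os
import subprocess
import venv
from pathlib import Path
from typing import List, Sequence


def venv_python(venv_dir: Path) -> Path:
    """Return the interpreter path inside a venv (layout differs on Windows)."""
    if os.name == "nt":
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def editable_install_command(venv_dir: Path, extras: Sequence[str]) -> List[str]:
    """Build the 'pip install -e .[a,b,c]' command as an argument list."""
    target = ".[" + ",".join(extras) + "]"
    return [str(venv_python(venv_dir)), "-m", "pip", "install", "-e", target]


def bootstrap(project_dir: Path, extras: Sequence[str]) -> None:
    """Create .venv in the project and install it in editable mode.

    Hypothetical helper: assumes project_dir contains the generated package.
    """
    project_dir = project_dir.resolve()
    venv_dir = project_dir / ".venv"
    venv.create(venv_dir, with_pip=True)
    subprocess.run(
        editable_install_command(venv_dir, extras),
        cwd=project_dir,
        check=True,
    )
```

Calling bootstrap(Path("my-project"), ["base", "dev", "doc", "test"]) would perform both steps. Passing the command as a list also sidesteps shell quoting; in some shells, such as zsh, the square brackets in .[base,dev,doc,test] need to be quoted when typed by hand.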

 

So What?

 
 
A large part of the opinionated aspect resides in the specification of recommended project dependencies. This is accomplished using setuptools’ support for options.extras_require to provide groups of dependencies to pull in, covering different topics. Those groups are specified in a requirements group dictionary as part of the cookiecutter JSON configuration. Here’s the snippet from cookiecutter.json:

[Image: Ordained cookiecutter requires configuration]

 

Configuration keys in cookiecutter with two leading underscores stay part of the context but are suppressed from the initial configuration options provided to the user. This is unfortunately still an unreleased feature (as of cookiecutter 1.7.3), so using ordained requires installing cookiecutter from the HEAD of master (a pre-release version of 2.0.0 as of this writing).
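
As a rough illustration of that behavior, the sketch below loads a made-up cookiecutter.json fragment and separates the keys a user would be asked about from the hidden ones. The key name __requirements, the group contents, and the helper prompted_keys are all assumptions for the example; they are not copied from the ordained template.

```python
import json

# Hypothetical cookiecutter.json fragment. The "__requirements" key name and
# the package lists are illustrative assumptions, not the template's own.
COOKIECUTTER_JSON = """
{
  "project_name": "my-project",
  "project_description": "A short description",
  "__requirements": {
    "dev": ["tox", "towncrier"],
    "doc": ["sphinx"],
    "test": ["pytest"]
  }
}
"""


def prompted_keys(context: dict) -> list:
    """Keys the user would be prompted for: those without a leading underscore."""
    return [key for key in context if not key.startswith("_")]


context = json.loads(COOKIECUTTER_JSON)
print(prompted_keys(context))             # the two project_* keys
print(sorted(context["__requirements"]))  # group names still usable in templates
```

The hidden dictionary never shows up as a question, yet it remains available to the template when files are rendered.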

 

These topic-based recommendations are very much a work in progress. They are largely informed, at the moment, by what I have been working on most recently, and there are clearly large gaps. My hope is that as folks use this it will be a wellspring of suggestions, both for Python packages I am not even aware of that should be included and for better alternatives to those I have grown to rely on. I will put aside why I made these choices for a future post after the recommendations are a little more fully fleshed out. At any rate, if you have your own dependency package recommendations, it is trivial to fork the project and change a single JSON object in the top-level cookiecutter.json with them.

 

There is some unseemly vanity in sharing some of these gory technical details just because I think they’re clever. That said, it did take me some time (viz., the - whitespace-control markers all over the place) to get it quite right, given I had never written a complex Jinja2 template like this before.

 

The requirements group dictionary is used in a Jinja2 template to generate setup.cfg in the project

[Image: Ordained cookiecutter project setup.cfg]

which creates the requirements groups lexicographically sorted with a special all group for the kitchen sink.
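
The output of that step can be approximated in plain Python. This is not the Jinja2 template from the project, only a sketch of the same idea under my own assumptions: group names are sorted, each group becomes a multi-line entry under [options.extras_require], and an all group collects every package. Placing all last and de-duplicating its contents are my choices; render_extras_require is a hypothetical name.

```python
def render_extras_require(groups: dict) -> str:
    """Render a setup.cfg [options.extras_require] section from a group dict.

    Plain-Python stand-in for the templating step; not the project's template.
    """
    lines = ["[options.extras_require]"]
    # Emit the named groups in lexicographic order.
    for name in sorted(groups):
        lines.append(f"{name} =")
        lines.extend(f"    {package}" for package in groups[name])
    # The kitchen-sink group: every package from every group, de-duplicated.
    everything = sorted({pkg for pkgs in groups.values() for pkg in pkgs})
    lines.append("all =")
    lines.extend(f"    {package}" for package in everything)
    return "\n".join(lines) + "\n"


example_groups = {"test": ["pytest"], "dev": ["tox", "towncrier"]}
print(render_extras_require(example_groups))
```

Doing this in a template rather than in code keeps the generated project free of any build-time logic: the resulting setup.cfg is static text.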

 

This could have been jammed inline in the project template, but I think it is cleaner to leave it here and less digging through the guts of the template to make additions and modifications.

 

Here are some other capability highlights provided directly out of the box

  • src directory structure (viz., Packaging a python library)
  • project configuration pushed to setup.cfg (i.e., trivial setup.py and no requirements.txt)
  • An example console script
  • pytest configuration and example test under tests (i.e., outside of src)
  • towncrier boilerplate for development / release note generation
  • Minimal Dockerfile for containerized development and deployment
  • Sphinx documentation boilerplate and a Makefile to automatically generate Sphinx API documentation
  • A bunch of other boilerplate configuration files including
      • tox.ini supporting multiple Python version development and testing
      • .editorconfig exported from PyCharm settings
      • .gitignore generated by gitignore.io
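
To give a feel for the console script item in the list above, here is a hypothetical sketch of what such an entry point might look like. The program name, module layout, and greeting are invented for illustration and are not the files the template generates.

```python
import argparse
from typing import Optional, Sequence


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Entry point for a hypothetical console script.

    With setuptools, a function like this is typically wired up through the
    console_scripts entry point group in setup.cfg, for example
    my-project = my_project.cli:main (names assumed here).
    """
    parser = argparse.ArgumentParser(
        prog="my-project", description="Example console script."
    )
    parser.add_argument("name", nargs="?", default="world", help="who to greet")
    args = parser.parse_args(argv)
    print(f"Hello, {args.name}!")
    return 0  # exit status reported to the shell
```

Accepting argv as a parameter keeps the function easy to call from a pytest test under tests, for instance asserting that main([]) returns 0, without touching sys.argv.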

 

Now What?

 
 
I’ll be dogfooding this, but I would love feedback if anyone else decides to give it a spin. Drop me a comment on the blog or the project. GitHub pull requests welcome.

 
Bio: Bryan Patrick Wood (@bpw1621) is a Senior Data Scientist, and is leading a data science team tackling some of the most important challenges facing the nation. Check out his personal website for more.

Original. Reposted with permission.
