Avoiding Dependencies

Andy McKay

1

Guest post on behalf of Matt Basta.

Avoid code duplication and reusing code is always an admirable goal. However, in some occasions, it’s not a bad idea to duplicate a little bit of code in order to make your software better.

An Example

When writing a Node app, accessing a directory listing is fairly simple: the fs.readdir command provides a list of objects in a directory, and a call to fs.stat will tell you whether each object is a directory or a file.

That’s fairly straightforward, but recursing (especially when using callbacks) can require some mental yoga. Each directory needs to keep track of how many subdirectories it expects to receive callbacks for and results need to be aggregated in pieces rather than sequentially.

When I needed a recursive directory listing this past week, I was tempted to use a library: why solve a solved problem? The glob library provides this functionality with very little effort:

Why would you use anything else? Here’s what I saw when I did an npm install glob:

  • glob
    • minimatch
    • lru-cache
  • sigmund
  • grafeful-fs
  • inherits

Simply by adding glob to my project, I’ve added a total of six modules. That’s not unexpected, though, since glob does a whole lot more than I need it to do. In the end, I ended up simply writing my own, and including it in the single file that requires the functionality:

Why are dependencies not good?

First, and most importantly, is the security of those packages. If the SSH keys of the developer of those packages are compromised, an attacker could provide an “updated” version of the library which includes malicious code. At Mozilla, we use internal mirrors of PyPi and NPM to make sure we’re not installing arbitrary modules. Avoiding dependencies helps to avoid requiring a solution.

Second, there’s a performance hit for libraries. If you’re building a one-off fix for a simple problem, using a library increases the overhead of your app’s load and use. If you’re building a web app on Node or Django, you might use an auto-reloader to restart your local server when script files change. In many projects, this takes an imperceptibly small amount of time to reload. As your codebase and dependency list grows, your auto-reloader can take precious seconds to run (we have projects that take almost five seconds to reboot).

Each dependency increases the time it will take to install your application. Zamboni (the code behind addons.mozilla.org and marketplace.firefox.com), for instance, takes 12 minutes to install libraries and dependencies. What starts out as a small number of libraries can very quickly grow out of control and become a serious nuisance. time npm install glob, to reference my earlier example, tells me that glob adds four seconds to my install time. Some simple and unscientific benchmarks from popular node libraries:

  • socket.io: 33s
  • mongoose: 9.5s
  • connect: 5.7s
  • jade: 5.3s
  • express: 4.4s
  • glob: 3.9s

And some from popular Python libraries (run with time pip install <module>):

  • numpy: 110s
  • django: 96s
  • pycrypto: 18s
  • flask: 14s

Third, you’re taking the gamble that the libraries that you depend on don’t have conflicting versions. A great example of this can be illustrated by a web app that I was building: the HTTP requests that I was creating with Python’s requests library didn’t have a .json property, as the docs state. An hour of hair-tearing later, I discovered that another library that I was using had already installed a much older version of requests which lacked the property. Unfortunately, the library wasn’t compatible with the latest version of requests and I had to settle for the old version.

The consequences could have been much worse, and this is a not-well-solved problem (for the Python community, at least).

Last, if your project is used as a library, including dependencies decreases the re-usability of your code. If your code uses a third-party library when it could have taken advantage of some slightly-gnarly standard library tools, all of the above reasons make it that much more difficult for a developer to justify using your tool.

Now hold on just a minute.

Does this mean you should never use an external library? Not at all! You should use libraries whenever it’s appropriate. Sometimes the effort required to perform a task correctly and thoroughly (or at all) is simply too much work to do on your own. OAuth? You’ll probably want a library. Database work or ORM? Use a library. But what if you need to serve a single static HTML file over HTTP? You can probably use Node’s http or Python’s SimpleHTTPServer without too much heartache.

Sometimes it’s just irresponsible to write code yourself: would you trust a developer’s one-off method or a mature ORM’s SQL sanitization code? What about code to generate secure random numbers for encrypting sensitive data? Or code that protects against XSS attacks? There are a lot of instances where it’s in your users’ best interest to use a trusted solution rather than your own solution.

A double-edged sword.

Another reason for choosing libraries is the community: if there are bug fixes to a third-party library, dependents can relatively easily update to the latest version of the library and take advantage of the improvements immediately.

That can be a good thing and a bad thing, though. Libraries without a community around them can contain bugs that don’t get patched in a timely manner. Fixes may be difficult or impossible to upstream.

TL;DR

  1. If you can easily get by without a dependency, you don’t need the dependency.
  2. If you’re building a library, you should avoid dependencies.
  3. Don’t avoid dependencies if it means potentially putting your users at risk.

One response

  1. Max Ogden wrote on ::

    I think dependencies are awesome and I use as many of them as possible, for reasons like avoiding NIH, making friends by forking github repos and many other reasons. Some things to bring up:

    On security: http://nodesecurity.io/ is a new initiative to audit popular vulnerabilities. Rather than avoid using others code this initiative encourages open source collaboration to fix things.

    On security and speed: You can run local NPM registries since it is based on CouchDB which replicates (IrisCouch even offers hosted private NPMs) which helps solve the security issue as well as the install time. If NPM is running on your laptop or in your office then the install latency will be really low.