node-tar

Fast and full-featured Tar for Node.js

The API is designed to mimic the behavior of tar(1) on unix systems. If you are familiar with how tar works, most of this will hopefully be straightforward for you. If not, then hopefully this module can teach you useful unix skills that may come in handy someday :)

Background

A “tar file” or “tarball” is an archive of file system entries (directories, files, links, etc.). The name comes from “tape archive”. If you run man tar on almost any Unix command line, you’ll learn quite a bit about what it can do, and its history.

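Tar has 5 main top-level commands:

  1. c - Create an archive
  2. r - Replace entries within an archive (appends to the archive)
  3. u - Update entries within an archive (appends if newer)
  4. t - List the contents of an archive
  5. x - Extract an archive to disk

The other flags and options modify how these top-level commands work.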

High-Level API

These 5 functions are the high-level API. All of them have a single-character name (for unix nerds familiar with tar(1)) as well as a long name (for everyone else).

All the high-level functions take the following arguments, all three of which are optional and may be omitted.

  1. options - An optional object specifying various options
  2. paths - An array of paths to add or extract
  3. callback - Called when the command is completed, if async. (If sync or no file specified, providing a callback throws a TypeError.)

If the command is sync (i.e., if options.sync=true), then the callback is not allowed, since the action will be completed immediately.

If a file option is specified and the command is async, then a Promise is returned. In this case, a callback may also be provided, which is called when the command is completed.

If a file option is not specified, then a stream is returned. For create, this is a readable stream of the generated archive. For list and extract this is a writable stream that an archive should be written into. If a file is not specified, then a callback is not allowed, because you’re already getting a stream to work with.

replace and update only work on existing archives, and so require a file argument.

Sync commands without a file argument return a stream that acts on its input immediately in the same tick. For readable streams, this means that all of the data is immediately available by calling stream.read(). For writable streams, it will be acted upon as soon as it is provided, but this can be at any time.
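
The rules above for callbacks, promises, and streams can be read as a small decision table. The sketch below only models that documented behavior. describeResult is a hypothetical helper invented for illustration, and it never calls tar itself.

```javascript
// Hypothetical helper that models the documented return-value rules.
// It does not touch any archive; it only encodes the rules above.
function describeResult (options = {}, callback) {
  const { sync = false, file } = options
  const hasFile = file !== undefined

  // A callback is only allowed for async commands that have a file.
  if (callback && (sync || !hasFile)) {
    throw new TypeError('callback not allowed for sync or file-less commands')
  }

  if (!hasFile) {
    // No file: the caller gets a stream to read from or write into.
    return sync ? 'sync stream' : 'stream'
  }

  // With a file: sync commands are finished when they return, async
  // commands return a Promise (and call the callback, if one was given).
  return sync ? 'completed on return' : 'promise'
}

console.log(describeResult({ file: 'a.tgz' }))
console.log(describeResult({ file: 'a.tgz', sync: true }))
console.log(describeResult({}))
console.log(describeResult({ sync: true }))
```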

Warnings

Some things cause tar to emit a warning, but should usually not cause the entire operation to fail. There are three ways to handle warnings:

  1. Ignore them (default). Invalid entries won’t be put in the archive, and invalid entries won’t be unpacked. This is usually fine, but can hide failures that you might care about.
  2. Notice them. Add an onwarn function to the options, or listen to the 'warn' event on any tar stream. The function will get called as onwarn(message, data). Handle as appropriate.
  3. Explode them. Set strict: true in the options object, and warn messages will be emitted as 'error' events instead. If there’s no error handler, this causes the program to crash. If used with a promise-returning/callback-taking method, then it’ll send the error to the promise/callback.
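
To make the three modes concrete, here is a minimal model built on Node's EventEmitter. WarningSource is a hypothetical stand-in for a tar stream, written only for this illustration. It is an assumption about the general shape of the behavior, not the module's actual implementation.

```javascript
const { EventEmitter } = require('events')

// Simplified stand-in for a tar stream, used only to show the three
// handling modes. Real tar streams do much more than this.
class WarningSource extends EventEmitter {
  constructor ({ strict = false, onwarn } = {}) {
    super()
    this.strict = strict
    if (typeof onwarn === 'function') this.on('warn', onwarn)
  }

  warn (message, data) {
    if (!this.strict) {
      // Default: a 'warn' event that nobody is required to handle.
      this.emit('warn', message, data)
      return
    }
    // Strict: promote the warning to an 'error' event.
    const er = data instanceof Error
      ? data
      : Object.assign(new Error(message), { data })
    this.emit('error', er)
  }
}

// 1. Ignore: no listener is attached, so nothing happens.
new WarningSource().warn('invalid entry', { path: 'a' })

// 2. Notice: the onwarn function receives (message, data).
const seen = []
new WarningSource({ onwarn: (message, data) => seen.push([message, data.path]) })
  .warn('invalid entry', { path: 'b' })

// 3. Explode: strict mode emits 'error'. Without this listener the
// emit call would throw, which is what crashes a program.
let caught = null
const strictSource = new WarningSource({ strict: true })
strictSource.on('error', er => { caught = er })
strictSource.warn('invalid entry', { path: 'c' })

console.log(seen, caught.message)
```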

Examples

The API mimics the tar(1) command line functionality, with aliases for more human-readable option and function names. The goal is that if you know how to use tar(1) in Unix, then you know how to use require('tar') in JavaScript.

To replicate tar czf my-tarball.tgz files and folders, you’d do:

tar.c(
  {
    gzip: true, // or an object of gzip options
    file: 'my-tarball.tgz'
  },
  ['some', 'files', 'and', 'folders']
).then(() => { /* tarball has been created */ })

To replicate tar cz files and folders > my-tarball.tgz, you’d do:

tar.c( // or tar.create
  {
    gzip: true // or an object of gzip options
  },
  ['some', 'files', 'and', 'folders']
).pipe(fs.createWriteStream('my-tarball.tgz'))

To replicate tar xf my-tarball.tgz you’d do:

tar.x( // or tar.extract
  {
    file: 'my-tarball.tgz'
  }
).then(() => { /* tarball has been dumped in cwd */ })

To replicate cat my-tarball.tgz | tar x -C some-dir --strip=1:

fs.createReadStream('my-tarball.tgz').pipe(
  tar.x({
    strip: 1,
    C: 'some-dir' // alias for cwd:'some-dir', also ok
  })
)

To replicate tar tf my-tarball.tgz, do this:

tar.t({
  file: 'my-tarball.tgz',
  onentry: entry => { /* do whatever with it */ }
})

To replicate cat my-tarball.tgz | tar t do:

fs.createReadStream('my-tarball.tgz')
  .pipe(tar.t())
  .on('entry', entry => { /* do whatever with it */ })

To do anything synchronous, add sync: true to the options. Note that sync functions don’t take a callback and don’t return a promise. When the function returns, it’s already done. Sync methods without a file argument return a sync stream, which flushes immediately. But, of course, it still won’t be done until you .end() it.

To filter entries, add filter: <function> to the options. Tar-creating methods call the filter with filter(path, stat). Tar-reading methods (including extraction) call the filter with filter(path, entry). The filter is called in the this-context of the Pack or Unpack stream object.

The arguments list to tar t and tar x specifies a list of filenames to extract or list, so it’s equivalent to a filter that tests if the file is in the list.
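
The equivalence between a file list and a filter can be sketched as follows. filterFromList is a hypothetical helper, and this simplified model matches exact paths only. The module itself may treat directories and trailing slashes differently.

```javascript
// Hypothetical helper: turn a list of paths into a filter function with
// the filter(path, entry) shape used by the tar-reading methods.
function filterFromList (fileList) {
  const wanted = new Set(fileList)
  // The entry argument is unused in this model.
  return (path, entry) => wanted.has(path)
}

const listFilter = filterFromList(['package/README.md', 'package/index.js'])

// Paths as they might appear inside an archive.
const archivePaths = [
  'package/README.md',
  'package/index.js',
  'package/test.js'
]

const selected = archivePaths.filter(p => listFilter(p))
console.log(selected)
```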

For those who aren’t fans of tar’s single-character command names:

tar.c === tar.create
tar.r === tar.replace (appends to archive, file is required)
tar.u === tar.update (appends if newer, file is required)
tar.x === tar.extract
tar.t === tar.list

Keep reading for all the command descriptions and options, as well as the low-level API that they are built on.

tar.c(options, fileList, callback) [alias: tar.create]

Create a tarball archive.

The fileList is an array of paths to add to the tarball. Adding a directory also adds its children recursively.

An entry in fileList that starts with an @ symbol is a tar archive whose entries will be added. To add a file that starts with @, prepend it with ./.
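
As a rough illustration of the @ rule, the sketch below classifies fileList items. classifyEntry is a hypothetical helper written for this example. It only models the rule described above and does not read anything from disk.

```javascript
// Hypothetical helper that models how a fileList item is interpreted.
function classifyEntry (item) {
  if (item.charAt(0) === '@') {
    // '@other.tar' means: add the entries found inside other.tar.
    return { kind: 'archive', path: item.slice(1) }
  }
  // Anything else, including './@name', is an ordinary path to add.
  return { kind: 'path', path: item }
}

console.log(classifyEntry('@other.tar'))
console.log(classifyEntry('./@handle.txt'))
console.log(classifyEntry('src'))
```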

The following options are supported:

The following options are mostly internal, but can be modified in some advanced use cases, such as re-using caches between runs.

tar.x(options, fileList, callback) [alias: tar.extract]

Extract a tarball archive.

The fileList is an array of paths to extract from the tarball. If no paths are provided, then all the entries are extracted.

If the archive is gzipped, then tar will detect this and unzip it.
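
One common way to detect gzip data is to check for the two magic bytes that begin every gzip stream. The sketch below shows that technique using only Node's zlib module. looksGzipped is a hypothetical name, and this is not necessarily the exact check tar performs.

```javascript
const zlib = require('zlib')

// Gzip streams start with the magic bytes 0x1f 0x8b (RFC 1952).
function looksGzipped (chunk) {
  return chunk.length >= 2 && chunk[0] === 0x1f && chunk[1] === 0x8b
}

// An all-zero 512-byte block stands in for uncompressed tar data.
const plainBlock = Buffer.alloc(512)
const zippedBlock = zlib.gzipSync(plainBlock)

console.log(looksGzipped(plainBlock))
console.log(looksGzipped(zippedBlock))
```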

Note that all directories that are created will be forced to be writable, readable, and listable by their owner, to avoid cases where a directory prevents extraction of child entries by virtue of its mode.
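
In terms of permission bits, "writable, readable, and listable by their owner" corresponds to setting the owner rwx bits (0o700). A minimal sketch of that idea, where extractionDirMode is a hypothetical helper and not a function the module exports:

```javascript
// Owner read (0o400), write (0o200), and list/search (0o100) bits.
const OWNER_RWX = 0o700

// Hypothetical helper: the mode a directory would get if the owner
// bits are forced on, with all other permission bits left alone.
function extractionDirMode (mode) {
  return (mode & 0o7777) | OWNER_RWX
}

// A read-only directory (0o555) becomes 0o755, so children can be
// created inside it during extraction.
console.log(extractionDirMode(0o555).toString(8))
console.log(extractionDirMode(0o050).toString(8))
```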

Most extraction errors will cause a warn event to be emitted. If the cwd is missing, or not a directory, then the extraction will fail completely.

The following options are supported:

The following options are mostly internal, but can be modified in some advanced use cases, such as re-using caches between runs.

Note that using an asynchronous stream type with the transform option will cause undefined behavior in sync extractions. MiniPass-based streams are designed for this use case.

tar.t(options, fileList, callback) [alias: tar.list]

List the contents of a tarball archive.

The fileList is an array of paths to list from the tarball. If no paths are provided, then all the entries are listed.

If the archive is gzipped, then tar will detect this and unzip it.

Returns an event emitter that emits entry events with tar.ReadEntry objects. However, these entry objects don’t emit 'data' or 'end' events. (If you want to get actual readable entries, use the tar.Parse class instead.)

The following options are supported:

tar.u(options, fileList, callback) [alias: tar.update]

Add files to an archive if they are newer than the entry already in the tarball archive.

The fileList is an array of paths to add to the tarball. Adding a directory also adds its children recursively.

An entry in fileList that starts with an @ symbol is a tar archive whose entries will be added. To add a file that starts with @, prepend it with ./.

The following options are supported:

tar.r(options, fileList, callback) [alias: tar.replace]

Add files to an existing archive. Because later entries override earlier entries, this effectively replaces any existing entries.
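
The "later entries override earlier entries" behavior can be modeled with a plain list and a Map. This is only a sketch of the idea with made-up entries. Nothing here reads or writes a real archive.

```javascript
// Each element stands for one entry, in the order it appears in the archive.
const archiveEntries = [
  { path: 'config.json', body: 'version 1' },
  { path: 'readme.txt', body: 'hello' },
  // Appended later by a replace operation:
  { path: 'config.json', body: 'version 2' }
]

// Model of extraction: entries are written in order, so the last
// entry for a given path is the one left on disk.
const extracted = new Map()
for (const { path, body } of archiveEntries) {
  extracted.set(path, body)
}

console.log(extracted.get('config.json'))
// The archive still holds both copies; only the result is "replaced".
console.log(archiveEntries.length)
```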

The fileList is an array of paths to add to the tarball. Adding a directory also adds its children recursively.

An entry in fileList that starts with an @ symbol is a tar archive whose entries will be added. To add a file that starts with @, prepend it with ./.

The following options are supported:

Low-Level API

class tar.Pack

A readable tar stream.

Has all the standard readable stream interface stuff. 'data' and 'end' events, read() method, pause() and resume(), etc.

constructor(options)

The following options are supported:

add(path)

Adds an entry to the archive. Returns the Pack stream.

write(path)

Adds an entry to the archive. Returns true if flushed.

end()

Finishes the archive.

class tar.Pack.Sync

Synchronous version of tar.Pack.

class tar.Unpack

A writable stream that unpacks a tar archive onto the file system.

All the normal writable stream stuff is supported. write() and end() methods, 'drain' events, etc.

Note that all directories that are created will be forced to be writable, readable, and listable by their owner, to avoid cases where a directory prevents extraction of child entries by virtue of its mode.

'close' is emitted when it’s done writing stuff to the file system.

Most unpack errors will cause a warn event to be emitted. If the cwd is missing, or not a directory, then an error will be emitted.

constructor(options)

class tar.Unpack.Sync

Synchronous version of tar.Unpack.

Note that using an asynchronous stream type with the transform option will cause undefined behavior in sync unpack streams. MiniPass-based streams are designed for this use case.

class tar.Parse

A writable stream that parses a tar archive stream. All the standard writable stream stuff is supported.

If the archive is gzipped, then tar will detect this and unzip it.

Emits 'entry' events with tar.ReadEntry objects, which are themselves readable streams that you can pipe wherever.

Each entry will not emit until the one before it is flushed through, so make sure to either consume the data (with on('data', ...) or .pipe(...)) or throw it away with .resume() to keep the stream flowing.

constructor(options)

Returns an event emitter that emits entry events with tar.ReadEntry objects.

The following options are supported:

abort(message, error)

Stop all parsing activities. This is called when there are zlib errors. It also emits a warning with the message and error provided.

class tar.ReadEntry extends MiniPass

A representation of an entry that is being read out of a tar archive.

It has the following fields:

constructor(header, extended, globalExtended)

Create a new ReadEntry object with the specified header, extended header, and global extended header values.

class tar.WriteEntry extends MiniPass

A representation of an entry that is being written from the file system into a tar archive.

Emits data for the Header, and for the Pax Extended Header if one is required, as well as any body data.

Creating a WriteEntry for a directory does not also create WriteEntry objects for all of the directory contents.

It has the following fields:

constructor(path, options)

path is the path of the entry as it is written in the archive.

The following options are supported:

warn(message, data)

If strict, emit an error with the provided message.

Otherwise, emit a 'warn' event with the provided message and data.

class tar.WriteEntry.Sync

Synchronous version of tar.WriteEntry.

class tar.WriteEntry.Tar

A version of tar.WriteEntry that gets its data from a tar.ReadEntry instead of from the filesystem.

constructor(readEntry, options)

readEntry is the entry being read out of another archive.

The following options are supported:

class tar.Header

A class for reading and writing header blocks.

It has the following fields:

constructor(data, [offset=0])

data is optional. It is either a Buffer that should be interpreted as a tar Header starting at the specified offset and continuing for 512 bytes, or a data object of keys and values to set on the header object, and eventually encode as a tar Header.

decode(block, offset)

Decode the provided buffer starting at the specified offset.

The buffer must contain at least 512 bytes starting at the offset.
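
To show what decoding a 512-byte header block involves, here is a minimal sketch that follows the POSIX ustar field layout. It is not the tar.Header implementation: decodeHeader and its helpers are hypothetical, and the sketch reads only a few fields while ignoring the prefix field, pax headers, and base-256 numbers.

```javascript
// Read a NUL-terminated string field from a header block.
function readString (block, start, length) {
  const raw = block.subarray(start, start + length)
  const nul = raw.indexOf(0)
  return raw.subarray(0, nul === -1 ? length : nul).toString('utf8')
}

// Read an octal number field; empty fields come back as null.
function readOctal (block, start, length) {
  const text = readString(block, start, length).trim()
  return text === '' ? null : parseInt(text, 8)
}

// Sum of all 512 bytes, with the 8 checksum bytes counted as spaces.
function checksum (block, offset) {
  let sum = 0
  for (let i = 0; i < 512; i++) {
    const inChecksumField = i >= 148 && i < 156
    sum += inChecksumField ? 0x20 : block[offset + i]
  }
  return sum
}

function decodeHeader (block, offset = 0) {
  if (block.length < offset + 512) {
    throw new Error('need 512 bytes for header')
  }
  const h = block.subarray(offset, offset + 512)
  return {
    path: readString(h, 0, 100),
    mode: readOctal(h, 100, 8),
    size: readOctal(h, 124, 12),
    type: readString(h, 156, 1),
    cksumValid: readOctal(h, 148, 8) === checksum(h, 0)
  }
}

// Build a minimal header by hand to exercise the decoder.
const block = Buffer.alloc(512)
block.write('hello.txt', 0)         // name
block.write('0000644\0', 100)       // mode, octal
block.write('00000000005\0', 124)   // size, octal
block.write('0', 156)               // typeflag: regular file
block.write('ustar\0' + '00', 257)  // magic and version
block.write(checksum(block, 0).toString(8).padStart(6, '0') + '\0 ', 148)

const decoded = decodeHeader(block)
console.log(decoded)
```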

set(data)

Set the fields in the data object.

encode(buffer, offset)

Encode the header fields into the buffer at the specified offset.

Returns this.needPax to indicate whether a Pax Extended Header is required to properly encode the specified data.

class tar.Pax

An object representing a set of key-value pairs in a Pax extended header entry.

It has the following fields. Where the same name is used, they have the same semantics as the tar.Header field of the same name.

constructor(object, global)

Set the fields present in the object. global is a boolean that defaults to false.

encode()

Return a Buffer containing the header and body for the Pax extended header entry, or null if there is nothing to encode.

encodeBody()

Return a string representing the body of the pax extended header entry.

encodeField(fieldName)

Return a string representing the key/value encoding for the specified fieldName, or '' if the field is unset.

tar.Pax.parse(string, extended, global)

Return a new Pax object created by parsing the contents of the string provided.

If the extended object is set, then also add the fields from that object. (This is necessary because multiple metadata entries can occur in sequence.)
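
The parsing and merging described above can be sketched as follows. parsePaxRecords is a hypothetical function that returns a plain object rather than a Pax instance and keeps every value as a string. The assumption that freshly parsed fields override those in the extended object is part of this model.

```javascript
// Parse a string of "<length> <key>=<value>\n" records into an object.
// Fields from `extended` are copied first, then parsed records are
// applied on top of them.
function parsePaxRecords (string, extended = {}) {
  const fields = { ...extended }
  // Lengths are byte counts, so work on a Buffer rather than a string.
  let rest = Buffer.from(string, 'utf8')
  while (rest.length > 0) {
    const space = rest.indexOf(0x20)
    if (space === -1) break
    const length = parseInt(rest.subarray(0, space).toString(), 10)
    // Stop on a malformed or truncated record.
    if (!Number.isInteger(length) || length <= space || length > rest.length) break
    // Drop "<length> " from the front and the trailing newline.
    const record = rest.subarray(space + 1, length - 1).toString('utf8')
    const eq = record.indexOf('=')
    if (eq !== -1) fields[record.slice(0, eq)] = record.slice(eq + 1)
    rest = rest.subarray(length)
  }
  return fields
}

// Two metadata entries in sequence: the second parse builds on the first.
const first = parsePaxRecords('12 path=foo\n')
const second = parsePaxRecords('20 linkpath=bar/baz\n', first)
console.log(second)
```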

tar.types

A translation table for the type field in tar headers.

tar.types.name.get(code)

Get the human-readable name for a given alphanumeric code.

tar.types.code.get(name)

Get the alphanumeric code for a given human-readable name.
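
A two-way table like this can be modeled with a pair of Maps. The codes below are the standard ustar typeflag values. The human-readable names and the partial coverage are assumptions made for illustration, so check tar.types itself for the real table.

```javascript
// Partial table keyed by the standard ustar typeflag codes.
// The names are illustrative and may differ from the module's own.
const names = new Map([
  ['0', 'File'],
  ['1', 'Link'],
  ['2', 'SymbolicLink'],
  ['3', 'CharacterDevice'],
  ['4', 'BlockDevice'],
  ['5', 'Directory'],
  ['6', 'FIFO'],
  ['x', 'ExtendedHeader'],
  ['g', 'GlobalExtendedHeader']
])

// Reverse lookup: name to code.
const codes = new Map([...names].map(([code, name]) => [name, code]))

// Local model with the same shape as the documented accessors.
const types = { name: names, code: codes }

console.log(types.name.get('5'))
console.log(types.code.get('SymbolicLink'))
```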