ocfl-index

ocfl-index is a command line tool for indexing OCFL Storage Roots. It can be used to index and query the logical state of objects in a storage root. The index is stored as a sqlite3 database (see sqlite/schema.sql for details).

This is a work in progress.

Usage:
  ocfl-index [command]

Available Commands:
  benchmark   benchmark indexing with generated inventories
  help        Help about any command
  index       index an OCFL storage root
  query       query the index

Flags:
  -f, --file string   index filename/connection string (default "index.sqlite")
  -h, --help          help for ocfl-index

Use "ocfl-index [command] --help" for more information about a command.
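
Because the index is an ordinary sqlite3 database, it can also be inspected directly. The following is a minimal sketch, not part of ocfl-index: the helper name list_index_tables is hypothetical, it assumes the sqlite3 command line shell is installed, and it assumes the index is a local file rather than a connection string. It makes no assumptions about table names (see sqlite/schema.sql for the actual schema).

```shell
# Hypothetical helper (not part of ocfl-index): list the tables in an index file.
# Assumes the sqlite3 command line shell is installed and the index is a local file.
list_index_tables() {
  index_file="${1:-index.sqlite}"   # default name taken from the --file flag
  if [ ! -f "$index_file" ]; then
    echo "index file not found: $index_file" >&2
    return 1
  fi
  # sqlite_master is sqlite's built-in catalog of tables and indexes
  sqlite3 "$index_file" "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name;"
}

# Usage: list_index_tables index.sqlite
```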

Indexing

You can index OCFL storage roots on the local filesystem or in an S3 object store.

# index a storage root locally
ocfl-index index --dir ~/my/root

# index a storage root on S3 (first set environment variables)
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_REGION=...
export AWS_S3_ENDPOINT="http://localhost:9000" # for a non-AWS S3 endpoint
ocfl-index index --s3-bucket my-bucket --s3-path store-prefix

Querying

To query, use the query [object-id] [path] subcommand. The path should be a relative path (using / as a separator) referencing a file or directory in the object. Use the -v flag to query the object at a particular version.

# list all objects in the index
ocfl-index query

# list all versions in an object
ocfl-index query object-id

# list names of files and directories in the root of an object's most recent version
ocfl-index query object-id "."

# list names in the 'foo' directory of the object's first version
ocfl-index query object-id "foo" -v v1
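
When query paths come from a script, a rough pre-check can catch paths that are not relative or that do not use / as the separator. This is a sketch only: is_relative_query_path is a hypothetical helper, not part of ocfl-index, and treating any backslash as a Windows-style separator is an assumption (a backslash can be a legitimate character in a file name).

```shell
# Hypothetical helper (not part of ocfl-index): rough check of a query path.
is_relative_query_path() {
  case "$1" in
    "")    return 1 ;;  # empty: not a path ("." refers to the object root)
    /*)    return 1 ;;  # absolute path
    *\\*)  return 1 ;;  # backslash; assumed here to be a Windows-style separator
    *)     return 0 ;;
  esac
}

# Usage: is_relative_query_path "foo" && ocfl-index query object-id "foo" -v v1
```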

Benchmarking

The benchmark command gives a sense of the performance characteristics of the index. It builds the index from generated inventories with randomized states and measures average times for index and query operations. It also shows how the index file grows in size as you add inventories.

# example with 1000 inventories
ocfl-index benchmark --size 100 --num 1000

indexing 1000 generated inventories (1-4 versions, 100 files/version)
indexed 1000/1000 (0.16 sec/op avg)
queried 99 paths (0.0004 sec/op avg)
benchmark complete in 164.5 sec
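
As a rough cross-check of the sample output above, the per-operation averages multiplied by the operation counts should land close to the reported total. The sketch below uses a hypothetical helper, estimate_benchmark_seconds, which is not part of ocfl-index. For the numbers shown it gives about 160 seconds against the reported 164.5; the difference is presumably setup and other overhead, which is an assumption.

```shell
# Hypothetical helper (not part of ocfl-index): estimate total benchmark time.
# $1 = inventories indexed, $2 = avg sec per index op,
# $3 = paths queried,       $4 = avg sec per query op
estimate_benchmark_seconds() {
  awk -v n="$1" -v ti="$2" -v q="$3" -v tq="$4" \
    'BEGIN { printf "%.2f\n", n * ti + q * tq }'
}

# Values taken from the sample output: 1000 * 0.16 + 99 * 0.0004
estimate_benchmark_seconds 1000 0.16 99 0.0004   # prints 160.04
```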

S3 Config

AWS credentials can be set with the AWS CLI. You may also use the following environment variables:

# Access Key ID
AWS_ACCESS_KEY_ID=...
# Secret Access Key
AWS_SECRET_ACCESS_KEY=SECRET
# Region
AWS_REGION=us-east-1
# for a non-AWS S3 endpoint
AWS_S3_ENDPOINT="http://localhost:9000"
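
If you rely on environment variables rather than AWS CLI configuration, a small pre-flight check can report which variables are unset before indexing starts. This is a sketch under that assumption: check_s3_env is a hypothetical helper, not part of ocfl-index, and it leaves out AWS_S3_ENDPOINT because that variable is only needed for a non-AWS endpoint.

```shell
# Hypothetical helper (not part of ocfl-index): report unset S3 variables.
# Only meaningful when credentials are supplied through the environment.
check_s3_env() {
  missing=0
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION; do
    # Indirect lookup of the variable named by $name (POSIX sh compatible)
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_s3_env && ocfl-index index --s3-bucket my-bucket --s3-path store-prefix
```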
