FSort – Sort large files by line


Sometimes we need to sort a file by lines (such as a CSV or JSON file) before processing it, but the file is too big to load fully into memory.

FSort keeps at most two lines in memory at a time. It compares them with a user-defined comparator and swaps them as needed, following the selection sort algorithm.

CLI use

WIP

Package use

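package main

import (
    "bytes"
    "encoding/csv"
    "errors"
    "io"
    "log"
    "os"

    "github.com/francistm/fsort"
)

// lessThanComparator reports whether prev should be ordered before next.
func lessThanComparator(prev, next []byte) bool {
    return bytes.Compare(prev, next) < 0
}

func main() {
    // Example path; replace it with the file you want to sort.
    filePath := "data.csv"

    // Open the file for both reading and writing.
    file, err := os.OpenFile(filePath, os.O_RDWR, 0666)
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()

    if err := fsort.Sort(file, lessThanComparator); err != nil {
        log.Fatal(err)
    }

    // Rewind before reading, in case Sort left the offset elsewhere.
    if _, err := file.Seek(0, io.SeekStart); err != nil {
        log.Fatal(err)
    }

    reader := csv.NewReader(file)

    for {
        line, err := reader.Read()
        if errors.Is(err, io.EOF) {
            break
        }
        // Stop on any other read error instead of looping forever.
        if err != nil {
            log.Fatal(err)
        }

        log.Printf("%+v", line)
    }
}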
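
The comparator does not have to compare raw bytes. As a sketch, assuming CSV-like lines whose second field is an integer, a comparator could order lines by that number. The names secondFieldAsInt and bySecondColumn are hypothetical, and the naive comma split does not handle quoted fields.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
)

// secondFieldAsInt parses the second comma-separated field of line as an
// integer. It returns false if the field is missing or not a number.
// Hypothetical helper for illustration; not part of fsort.
func secondFieldAsInt(line []byte) (int, bool) {
	fields := bytes.Split(bytes.TrimRight(line, "\r\n"), []byte(","))
	if len(fields) < 2 {
		return 0, false
	}
	n, err := strconv.Atoi(string(bytes.TrimSpace(fields[1])))
	if err != nil {
		return 0, false
	}
	return n, true
}

// bySecondColumn reports whether prev should be ordered before next,
// comparing the numeric second column and falling back to a byte
// comparison when the numbers are equal or neither line has one.
func bySecondColumn(prev, next []byte) bool {
	a, okA := secondFieldAsInt(prev)
	b, okB := secondFieldAsInt(next)
	switch {
	case okA != okB:
		return !okA // lines without a numeric field sort first
	case okA && a != b:
		return a < b
	default:
		return bytes.Compare(prev, next) < 0
	}
}

func main() {
	fmt.Println(bySecondColumn([]byte("bob,9"), []byte("alice,30")))
	fmt.Println(bySecondColumn([]byte("alice,30"), []byte("bob,9")))
}
```

Because it has the same shape as lessThanComparator, a function like bySecondColumn could be passed to fsort.Sort in its place.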

Options

  • WithCRLF changes the line break from LF to CRLF
  • WithSkipLine sets the lines to skip, such as a CSV header
  • WithBufferSize sets the size of the buffer loaded into memory; the default is 2 * 10^6
