FSort – Sort large files by lines
Sometimes we need to sort a file by lines (for example CSV or line-delimited JSON) before processing it, but the file is too big to load into memory all at once.
FSort keeps at most two lines in memory at a time: it compares them with a custom comparator and swaps them as needed, following the selection sort algorithm.
CLI use
WIP
Package use
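```go
package main

import (
	"bytes"
	"encoding/csv"
	"errors"
	"io"
	"log"
	"os"

	"github.com/francistm/fsort"
)

// lessThanComparator sorts lines in ascending byte order.
func lessThanComparator(prev, next []byte) bool {
	return bytes.Compare(prev, next) < 0
}

func main() {
	filePath := "data.csv" // hypothetical path to the file to sort
	file, err := os.OpenFile(filePath, os.O_RDWR, 0666)
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()
	if err := fsort.Sort(file, lessThanComparator); err != nil {
		log.Fatal(err)
	}
	// Rewind in case Sort left the file offset elsewhere.
	if _, err := file.Seek(0, io.SeekStart); err != nil {
		log.Fatal(err)
	}
	reader := csv.NewReader(file)
	for {
		line, err := reader.Read()
		if errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		log.Printf("%+v", line)
	}
}
```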
Options
WithCRLF
Set the line break from LF to CRLF.
WithSkipLine
Set the lines to skip, such as a CSV header.
WithBufferSize
Set the size of the buffer loaded into memory. The default is 2 * 10^6 (2,000,000).
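The With… names suggest Go's functional-options pattern. The sketch below shows that pattern under this assumption only: the lowercase names (sortOptions, option, withCRLF, withSkipLine, withBufferSize, newSortOptions) are hypothetical, the skip setting is assumed to be a line count, and the real fsort option signatures and how they are passed to fsort.Sort are not documented here, so check the package source before relying on them.

```go
package main

import "fmt"

// sortOptions is a hypothetical settings struct, not fsort's internals.
type sortOptions struct {
	lineBreak  string // "\n" by default, "\r\n" when CRLF is enabled
	skipLines  int    // assumed: number of leading lines left in place
	bufferSize int    // default from the README: 2 * 10^6
}

// option is a hypothetical functional option that mutates sortOptions.
type option func(*sortOptions)

func withCRLF() option {
	return func(o *sortOptions) { o.lineBreak = "\r\n" }
}

func withSkipLine(n int) option {
	return func(o *sortOptions) { o.skipLines = n }
}

func withBufferSize(size int) option {
	return func(o *sortOptions) { o.bufferSize = size }
}

// newSortOptions starts from the documented defaults and applies each
// option in the order given.
func newSortOptions(opts ...option) sortOptions {
	o := sortOptions{lineBreak: "\n", bufferSize: 2 * 1000 * 1000}
	for _, apply := range opts {
		apply(&o)
	}
	return o
}

func main() {
	o := newSortOptions(withCRLF(), withSkipLine(1))
	fmt.Printf("%q %d %d\n", o.lineBreak, o.skipLines, o.bufferSize)
}
```

The appeal of this pattern is that defaults live in one place and callers only mention the settings they change.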