Download a large list of files in parallel.
    # for linux 64bit
    wget https://github.com/dimkouv/massivedl/releases/download/v1.2/massivedl_linux_amd64
    chmod +x massivedl_linux_amd64
    mv massivedl_linux_amd64 /usr/local/bin/massivedl
Create a .csv file with the downloads
    filename,url
    0.png,https://placehold.it/100x100
    1.png,https://placehold.it/100x101
    2.png,https://placehold.it/100x102
    ...
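For a long list, the CSV is easier to generate than to write by hand. The following is a minimal Python sketch, assuming the two-column `filename,url` layout shown above. The function name `build_csv` and the placeholder rows are illustrative and not part of massivedl.

```python
import csv
import io


def build_csv(rows):
    """Return CSV text: a `filename,url` header followed by one line per row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["filename", "url"])
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical example data: three placeholder images of slightly different sizes.
rows = [(f"{i}.png", f"https://placehold.it/100x{100 + i}") for i in range(3)]

# newline="" keeps the line endings exactly as build_csv produced them.
with open("data.csv", "w", newline="") as f:
    f.write(build_csv(rows))
```

Using the `csv` module rather than string concatenation means URLs that contain commas are quoted correctly.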
Assuming the file is named data.csv, we can download the files using:
    massivedl -p 10 -i data.csv -s 1 -o downloads
Command line parameters
    -p <int> (default=10) : Maximum number of parallel requests
    -s <int> (default=0) : Number of skipped lines from input csv
    -i <str> : Input csv file with the list of urls
    -o <str> (default='downloads') : Directory to place the downloads
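To make the `-s` and `-p` options concrete, here is a rough Python sketch of the same idea: skip a number of leading lines, then process the remaining rows with a bounded pool of workers. This is not massivedl's implementation. `read_jobs`, `run_parallel`, and the `fetch` callback are hypothetical names used only for illustration, and the stand-in `fetch` performs no network I/O.

```python
import csv
from concurrent.futures import ThreadPoolExecutor


def read_jobs(csv_text, skip=0):
    """Parse (filename, url) pairs, ignoring the first `skip` lines (like -s)."""
    lines = csv_text.splitlines()[skip:]
    return [(row[0], row[1]) for row in csv.reader(lines) if len(row) >= 2]


def run_parallel(jobs, fetch, parallel=10):
    """Call fetch(filename, url) for each job, at most `parallel` at once (like -p)."""
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        # pool.map yields results in the same order as the input jobs.
        return list(pool.map(lambda job: fetch(*job), jobs))


sample = (
    "filename,url\n"
    "0.png,https://placehold.it/100x100\n"
    "1.png,https://placehold.it/100x101\n"
)
jobs = read_jobs(sample, skip=1)  # skip=1 drops the header line

# Stand-in fetch: a real one would download `url` and save it as `name`.
results = run_parallel(jobs, lambda name, url: f"{name} <- {url}", parallel=2)
```

With `skip=1` the `filename,url` header row is dropped, which is presumably why the example command passes `-s 1`.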
Stop and continue later
You can stop a download and resume it later.
Press Ctrl+C and you will get the following dialog.
    ...
    Do you want to save progress? [Y/n]: yes
    Progress has been saved!
    Use the following command to continue downloading
    massivedl --load /path/to/savedfile.save
With this tool I was able to download about 1.5 million images (~60GB) for a machine learning project.