Download URLs with aria2
Published:
Prerequisites
- aria2
- A POSIX shell such as bash or zsh, and some familiarity with the CLI.
Introduction
Today I was given the task of downloading a long list of URLs stored in a CSV file. The file has two columns, Name and URL, as shown below (thanks to an online CSV tool):
Name | URL |
---|---|
shakira.jpg | https://www.biography.com/.image/t_share/MTE5NDg0MDU0NzExNDA0MDQ3/shakira-189151-1-402.jpg |
eminem.jpg | https://i.pinimg.com/474x/19/63/da/1963daa666a8030047e2a9f13beb6975.jpg |
cristiano-ronaldo.jpg | https://files.thehandbook.com/uploads/2019/03/ronaldo.jpg |
The requirement is that, for each row, I have to download the file from the given URL and save it under the corresponding name in the left column. This is an interesting task: doing it manually means repeating the same enter, copy, and paste actions for every row, and for a long list of URLs you can’t even estimate how long that would take. The idea, then, is to use an automation tool to handle the tedious part. OK, let’s get started.
Tooling
The first idea that came to my mind was to write it in a programming language such as Java or Bash, but that seemed to take more time and end in a clumsy solution:
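import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class Download {
    public static void main(String[] args) throws IOException {
        for (String row : Files.readAllLines(Paths.get("urls.csv"))) {
            // Split on commas that are not inside double quotes.
            String[] line = row.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1);
            String name = line[0];
            String urlStr = line[1];
            // Stream the remote file to disk under the name from the first column.
            try (InputStream in = new URL(urlStr).openStream()) {
                Files.copy(in, Paths.get(name), StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}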
So, after spending a couple of hours googling, I eventually found a tool that solves my problem with a one-liner command: aria2c, the ultra-fast download utility.
Solution
aria2c supports a so-called option lines feature in input files. From the man page:
-i, --input-file=<FILE>
Downloads the URIs listed in FILE. You can specify multiple sources for a single entity by putting multiple URIs on a single line separated by the TAB character. Additionally, options can be specified after each URI line. Option lines must start with one or more white space characters (SPACE or TAB) and must only contain one option per line.
Later on
These options have exactly same meaning of the ones in the command-line options, but it just applies to the URIs it belongs to. Please note that for options in input file -- prefix must be stripped.
Later on
-o, --out=<FILE>
The file name of the downloaded file. It is always relative to the directory given in --dir option. When the --force-sequential option is used, this option is ignored.
If it’s still not clear what the solution looks like, let me summarize it step by step.
Firstly, in order to use the option lines feature, we need to convert our CSV file to the new format:

https://www.biography.com/.image/t_share/MTE5NDg0MDU0NzExNDA0MDQ3/shakira-189151-1-402.jpg
  out=shakira.jpg
https://i.pinimg.com/474x/19/63/da/1963daa666a8030047e2a9f13beb6975.jpg
  out=eminem.jpg
https://files.thehandbook.com/uploads/2019/03/ronaldo.jpg
  out=cristiano-ronaldo.jpg

Luckily, there is a Linux command called sed (aka Stream Editor) that can convert the old format to the new one. Its general form is:

sed [options] commands [file-to-edit]

and for our file:

sed -E 's/([^,]*),(.*)/\2\n out=\1/' file.csv
Let’s take a moment to examine this command in detail:
- -E enables POSIX Extended Regular Expressions.
- 's/([^,]*),(.*)/\2\n out=\1/' tells sed to substitute text on each line of the CSV file using the s command (as in substitute), which has the form s/SEARCH_REGEX/REPLACEMENT/.
- In our example, SEARCH_REGEX is ([^,]*),(.*). It has two parts separated by a comma: the first captured group ([^,]*) matches the Name value, and the second captured group (.*) matches the URL value.
- REPLACEMENT is \2\n out=\1. It simply rearranges the captured groups: the URL comes first, followed by a new line that starts with a space and sets out= to the name.
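To see the substitution in action without touching the network, the sketch below writes a two-row sample file and prints what sed produces. The file name sample.csv and its example.com rows are made up for illustration. It also assumes GNU sed, since treating \n in the replacement as a newline is a GNU extension and other sed implementations may behave differently.

```shell
# Create a small, hypothetical CSV with the same two-column layout (no header row).
cat > sample.csv <<'EOF'
a.jpg,https://example.com/images/1.jpg
b.jpg,https://example.com/images/2.jpg
EOF

# Swap the two columns: URL first, then an indented "out=" option line.
# Assumption: GNU sed, which turns \n in the replacement into a newline.
converted=$(sed -E 's/([^,]*),(.*)/\2\n out=\1/' sample.csv)
printf '%s\n' "$converted"
```

Each URL should be followed by its own out= line that begins with a space, which is what marks it as an option line for aria2c.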
Next step, we use aria2c to finish our task:

sed -E 's/([^,]*),(.*)/\2\n out=\1/' file.csv | aria2c -i -

- The above command indicates that the output (aka STDOUT) of sed -E 's/([^,]*),(.*)/\2\n out=\1/' file.csv will be used as the input (aka STDIN) of the aria2c -i - command.
- Everything in Linux is a file, so understanding STDIN, STDOUT and STDERR is very important for using the CLI effectively.
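The pipe can also be split into two steps, which makes it easy to inspect the generated list before downloading anything. This is only a sketch under a few assumptions: the function name csv_to_aria2_input and the file name urls.txt are hypothetical, the CSV is assumed to be simple (no quoted commas), and a header row is skipped only if it is literally Name,URL. It uses awk rather than sed so that it does not rely on GNU sed's \n handling.

```shell
# Convert "name,url" rows read from STDIN into aria2c input-file format on STDOUT.
csv_to_aria2_input() {
  awk -F',' '
    NR == 1 && $0 == "Name,URL" { next }   # skip an optional header row
    NF >= 2 {
      name = $1
      url = substr($0, length($1) + 2)     # everything after the first comma
      printf "%s\n out=%s\n", url, name
    }
  '
}

# Step 1: write the converted list to a file so it can be reviewed.
printf 'Name,URL\na.jpg,https://example.com/images/1.jpg\n' | csv_to_aria2_input > urls.txt
cat urls.txt

# Step 2 (not run here): hand the file to aria2c instead of piping via "-i -".
# aria2c -i urls.txt
```

Writing to a file first trades the one-liner for a chance to check the list. Once it looks right, the same function can be piped straight into aria2c -i - as in the command above.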
For further reading:
https://gist.github.com/GAS85/79849bfd09613067a2ac0c1a711120a6
Thank you.