Review
I can now recommend the curl manual. It was definitely worth the read. I discovered things about curl that I didn't know before and learned a great deal about how to use it more effectively. The manual contains a lot of information about niche topics like SOCKS, proxies, and FTP servers, most of which I skimmed. I tried to concentrate on aspects of curl I thought I would use often.
I chose to read it because I wanted to better understand the capabilities of curl and to recognize common spots where employing it could save me time and effort. I found the URL syntax documentation especially informative.
Because I work for a company focused on the climate, many of the examples below involve this index of global hourly NOAA weather data. It is organized by year, and by USAF and WBAN station IDs. Referring to the structure of this directory may help you better understand the examples below.
https://www.ncei.noaa.gov/data/global-hourly/
Basic curl
Output content to stdout
The station ID for SFO (San Francisco International Airport) is 72494023234. This request prints the CSV data to stdout.
curl https://www.ncei.noaa.gov/data/global-hourly/access/2016/72494023234.csv
We can use grep on the output to view data for a particular day.
curl \
https://www.ncei.noaa.gov/data/global-hourly/access/2016/72494023234.csv \
| grep '2016-07-28'
Downloading files with -o, --output and -O, --remote-name
Downloading the SFO data.
mkdir -p scratch/2016/
curl \
https://www.ncei.noaa.gov/data/global-hourly/access/2016/72494023234.csv \
--output scratch/2016/sf-weather.csv
Using the alias -o instead.
curl \
https://www.ncei.noaa.gov/data/global-hourly/access/2016/72494023234.csv \
-o scratch/2016/sf-weather.csv
Using a progress bar with -#, --progress-bar
By default, curl will display a progress meter with the amount of data transferred and how long the request has taken.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 6825k  100 6825k    0     0  1988k      0  0:00:03  0:00:03 --:--:-- 1988k
Providing the -# or --progress-bar argument will display a progress bar for the transfer instead.
curl \
https://www.ncei.noaa.gov/data/global-hourly/access/2016/72494023234.csv \
-o scratch/2016/sf-weather.csv \
-#
############################################################################### 100.0%
If you don't want any progress meter, use -s, --silent.
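For example, the SFO download from earlier can be run without any progress output at all:
curl \
-s \
https://www.ncei.noaa.gov/data/global-hourly/access/2016/72494023234.csv \
-o scratch/2016/sf-weather.csv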
API requests with --request, --data, and --header
The NOAA website doesn't have any use for POST requests, so consider a micro-blogging API which is completely made up. ;-)
curl \
-X POST \
--url 'https://api.microblog.com/1.1/statuses/update.json' \
-d '{"status": "hello"}' \
-H 'Content-Type: application/json' \
-H 'authorization: OAuth ...oauth info...'
Args in this request:
- -X, --request varies the HTTP verb of the request.
- --url is the ordinary positional URL argument as a named argument.
- -d, --data is the data to post, in this case a JSON document.
- -H, --header supplies HTTP headers for the post.
To post data contained in a file, use the @ symbol with the --data arg:
--data @path/to/big_data_file.json
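Putting that together with the made-up micro-blogging API from above, posting a status stored in a file might look like this (the file path is only illustrative):
curl \
-X POST \
--url 'https://api.microblog.com/1.1/statuses/update.json' \
--data @path/to/big_data_file.json \
-H 'Content-Type: application/json' \
-H 'authorization: OAuth ...oauth info...'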
Making HEAD requests with -I, --head
Retrieve only the headers of a document. This is useful when the content of a response is not consequential. I use this sometimes as a simple test of network connectivity to a server, or to see the raw 301 response before a redirect is followed.
curl \
-I \
https://www.ncei.noaa.gov/data/global-hourly/access/2016/72494023234.csv
Produces:
HTTP/1.1 200 OK
Date: Mon, 26 Aug 2019 04:37:49 GMT
Server: Apache
Strict-Transport-Security: max-age=31536000
Last-Modified: Wed, 19 Dec 2018 17:25:32 GMT
ETag: "ae57fe0-6aa4e2-57d634d14db00"
Accept-Ranges: bytes
Content-Length: 6989026
Content-Type: text/csv
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: X-Requested-With, Content-Type
Connection: close
Following redirects with -L, --location
Web pages can be tricky; hitting one might move you to another. Consider the following request to https://google.com.
curl https://google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="https://www.google.com/">here</A>.
</BODY></HTML>
This doesn't produce the full Google web page; it is a redirect to the full Google experience. Looking at the response headers, we can see the details of this redirect.
curl -I https://google.com
HTTP/2 301
location: https://www.google.com/
content-type: text/html; charset=UTF-8
date: Mon, 26 Aug 2019 04:46:00 GMT
expires: Wed, 25 Sep 2019 04:46:00 GMT
cache-control: public, max-age=2592000
server: gws
content-length: 220
x-xss-protection: 0
x-frame-options: SAMEORIGIN
alt-svc: quic=":443"; ma=2592000; v="46,43,39"
By default, curl doesn't follow this redirect to the Google home page the way your web browser does. The -L, --location argument makes the request follow any redirects through to the final destination.
curl -L https://google.com
The response here is the much more complicated HTML page for the Google website.
curl -I -L https://google.com
This will show the header responses for both the redirect and the final destination.
Essential curl
Downloading files with remote information
The -O, --remote-name option lets us use the file name from the server. Similarly, -R, --remote-time attempts to use the server's timestamp for the file if it is available.
This uses the remote name of the file and saves it to the current working directory.
curl -OR https://www.ncei.noaa.gov/data/global-hourly/access/2016/72494023234.csv
The file is downloaded to ./72494023234.csv, and ls -l reveals that even though I am downloading this data in 2019, the last modified time of this file was in 2018.
-rw-r--r-- 1 nackjicholson staff 6989026 Dec 19 2018 72494023234.csv
Storing response header information with -D, --dump-header
Sometimes when inspecting requests or API responses, you may want to save the header information along with the downloaded content.
curl -OR \
-D headers.txt \
https://www.ncei.noaa.gov/data/global-hourly/access/2016/72494023234.csv
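If you only want to see those headers in the terminal rather than keeping a file, curl accepts - as the --dump-header file name, which writes the headers to stdout while the CSV still downloads:
curl -OR \
-D - \
https://www.ncei.noaa.gov/data/global-hourly/access/2016/72494023234.csv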
Sending form data with -F, --form
Just do man curl, search for --form, and read about it. It's going to be more informative than anything I can say here.
Features (sketched in the example below):
- Send files with @
- Send input fields
- Specify content types per field
- Load the content of a field from a file with <
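As a rough sketch of those features together (the host, field names, and file names here are entirely made up), a multipart form post might look like:
curl \
-F 'name=example' \
-F 'photo=@selfie.jpg;type=image/jpeg' \
-F 'notes=<notes.txt' \
https://api.example.com/upload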
Ignore SSL with --insecure
When hitting HTTPS endpoints whose SSL certificate can't be verified, for example because the host name doesn't match the certificate, curl will fail.
curl 'https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxx.us-west-2.elb.amazonaws.com'
curl: (51) SSL: no alternative certificate subject name matches target host name 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxx.us-west-2.elb.amazonaws.com'
I encounter this occasionally when trying to connect to an AWS load balancer before it has been associated with a DNS name and a valid SSL cert.
curl --insecure 'https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxx.us-west-2.elb.amazonaws.com'
You don’t know curl
Sending multiple requests using URL syntax
The URL given to curl can contain multiple values, creating multiple requests. The {foo,bar} brace syntax creates sets of values that replace part of a URL. For example, we can request data for the New York La Guardia Airport (72503014732) and San Francisco International Airport (72494023234) in a single command.
curl \
'https://www.ncei.noaa.gov/data/global-hourly/access/2016/{72494023234,72503014732}.csv' \
-o scratch/sf-weather.csv \
-o scratch/ny-weather.csv
Using two -o arguments writes two output files, one for each request, in the order they are defined. We can also use [..] to create ranges of values. The NOAA data is organized into directories by year. We can create a URL that will get us the SFO and LGA data for a sequence of years. We could then use as many --output arguments as necessary to write all the files out, but there is an easier way: templating the --output names as well.
curl \
'https://www.ncei.noaa.gov/data/global-hourly/access/[2012-2019]/{72494023234,72503014732}.csv' \
--create-dirs \
--progress-bar \
--output 'scratch/#1/#2-weather.csv'
This will download 16 files for SFO and LGA, creating a directory structure using --create-dirs and the names #1 and #2, which correspond respectively to the two substitution parts of our URL in the order they are defined. The first is the range of years, and the second is the set of station IDs.
scratch
├── 2012
│   ├── 72494023234-weather.csv
│   └── 72503014732-weather.csv
├── 2013
│   ├── 72494023234-weather.csv
│   └── 72503014732-weather.csv
├── 2014
│   ├── 72494023234-weather.csv
│   └── 72503014732-weather.csv
├── 2015
│   ├── 72494023234-weather.csv
│   └── 72503014732-weather.csv
├── 2016
│   ├── 72494023234-weather.csv
│   └── 72503014732-weather.csv
├── 2017
│   ├── 72494023234-weather.csv
│   └── 72503014732-weather.csv
├── 2018
│   ├── 72494023234-weather.csv
│   └── 72503014732-weather.csv
└── 2019
    ├── 72494023234-weather.csv
    └── 72503014732-weather.csv
8 directories, 16 files
Saving request configurations with -K, --config
When requests start getting large, difficult to remember, or tedious to type, you can use the --config feature of curl to save the request options to a file.
This saves our complicated configuration.
bulk_station_download.txt
url = "https://www.ncei.noaa.gov/data/global-hourly/access/[2012-2019]/{72494023234,72503014732}.csv"
output = "scratch/#1/#2-weather.csv"
--create-dirs
--progress-bar
Now we can perform this request using:
curl -K bulk_station_download.txt
Separate requests with -:, --next
This lets you chain multiple requests in order. A sample use for this might be to do a POST creating a resource, and then follow it with a GET to make sure it's accessible.
curl \
-X POST \
-H 'Content-Type: application/json' \
-d @my_thing.json \
http://api.example.com/things \
-: \
http://api.example.com/things/my_thing \
-o my_thing.json
Writing out request information and statistics with -w, --write-out
This StackOverflow post has great information about how to do this, and the curl manual has a list of the templating variables you can use in your -w template files.
curl_request_info.txt
time_namelookup: %{time_namelookup}\n
time_connect: %{time_connect}\n
time_appconnect: %{time_appconnect}\n
time_pretransfer: %{time_pretransfer}\n
time_redirect: %{time_redirect}\n
time_starttransfer: %{time_starttransfer}\n
----------\n
time_total: %{time_total}\n
Then make a request. This one doesn't even download the file; it just writes its content to /dev/null and shows the stats.
curl -w "@curl_request_info.txt" -s -o /dev/null https://www.ncei.noaa.gov/data/global-hourly/access/2016/72494023234.csv
And you’ll get back something like:
time_namelookup: 0.001
time_connect: 0.037
time_appconnect: 0.000
time_pretransfer: 0.037
time_redirect: 0.000
time_starttransfer: 0.092
----------
time_total: 0.164
Conclusion
There were a few other things I thought looked cool but didn't play with; a quick sketch of them follows this list.
- Abort slow downloads with -Y, --speed-limit (in bytes per second)
- Abort slow downloads after a given time with -y, --speed-time
- Retries with --retry
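As a rough sketch (the speed threshold, time window, and retry count here are arbitrary), those options could be combined on one of the NOAA downloads like this:
curl \
-OR \
--speed-limit 1000 \
--speed-time 30 \
--retry 3 \
https://www.ncei.noaa.gov/data/global-hourly/access/2016/72494023234.csv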
Remember to RTFM! Thanks for reading my review.