William Vaughn

The curl manual review

Review

I can now recommend the curl manual. It was definitely worth the read. I discovered things about curl that I didn't know before, and a great deal about how to use it more effectively. The manual contains a lot of information about niche topics like SOCKS proxies and FTP servers, most of which I skimmed. I tried to concentrate on the aspects of curl I thought I would use often.

I chose to read it because I wanted to better understand the capabilities of curl and to recognize common spots where employing it could save me time and effort. I found the URL syntax documentation especially informative.

Because I work for a climate-focused company, many of the examples below involve this index of global hourly NOAA weather data. It is organized by year, and by USAF and WBAN station ids. Referring to the structure of this directory may help you better understand the examples below.

https://www.ncei.noaa.gov/data/global-hourly/

Basic curl

Output content to stdout

The station id for SFO (San Francisco International Airport) is 72494023234. This request prints the CSV data to stdout.

curl https://www.ncei.noaa.gov/data/global-hourly/access/2016/72494023234.csv

Using grep on the output to view data for a particular day.

curl \
  https://www.ncei.noaa.gov/data/global-hourly/access/2016/72494023234.csv \
  | grep '2016-07-28'

Downloading files with -o, --output

Downloading the SFO data.

mkdir -p scratch/2016/
curl \
  https://www.ncei.noaa.gov/data/global-hourly/access/2016/72494023234.csv \
  --output scratch/2016/sf-weather.csv

Using the short option -o instead.

curl \
  https://www.ncei.noaa.gov/data/global-hourly/access/2016/72494023234.csv \
  -o scratch/2016/sf-weather.csv

Using a progress bar with -#, --progress-bar

By default, curl displays a progress meter showing the amount of data transferred and how long the request took.

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 6825k  100 6825k    0     0  1988k      0  0:00:03  0:00:03 --:--:-- 1988k

Passing the -# or --progress-bar argument displays a simpler progress bar for the transfer instead.

curl \
  https://www.ncei.noaa.gov/data/global-hourly/access/2016/72494023234.csv \
  -o scratch/2016/sf-weather.csv \
  -#
############################################################################### 100.0%

If you don’t want any progress meter, use -s, --silent.
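For example, to download the SFO data with no progress output at all:

curl -s \
  https://www.ncei.noaa.gov/data/global-hourly/access/2016/72494023234.csv \
  -o scratch/2016/sf-weather.csv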

API requests with --request, --data, and --header

The NOAA website doesn't accept POST requests, so consider a micro-blogging API which is completely made up. ;-)

curl \
    -X POST \
    --url 'https://api.microblog.com/1.1/statuses/update.json' \
    -d '{"status": "hello"}' \
    -H 'Content-Type: application/json' \
    -H 'authorization: OAuth ...oauth info...'

Args in this request:

-X, --request sets the HTTP method, here POST.
--url explicitly specifies the URL to request.
-d, --data supplies the request body.
-H, --header adds a header to the request.

To post data contained in a file, use the @ symbol with the --data arg:

--data @path/to/big_data_file.json
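Putting it together with the made-up micro-blogging API above (the file path here is hypothetical):

curl \
    -X POST \
    --url 'https://api.microblog.com/1.1/statuses/update.json' \
    -H 'Content-Type: application/json' \
    --data @path/to/big_data_file.json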

Making HEAD requests with -I, --head

Retrieve only the headers of a document. This is useful when the content of a response is not consequential. I sometimes use it as a simple test of network connectivity to a server, or to see the raw 301 response before a redirect is followed.

curl \
  -I \
  https://www.ncei.noaa.gov/data/global-hourly/access/2016/72494023234.csv

Produces:

HTTP/1.1 200 OK
Date: Mon, 26 Aug 2019 04:37:49 GMT
Server: Apache
Strict-Transport-Security: max-age=31536000
Last-Modified: Wed, 19 Dec 2018 17:25:32 GMT
ETag: "ae57fe0-6aa4e2-57d634d14db00"
Accept-Ranges: bytes
Content-Length: 6989026
Content-Type: text/csv
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: X-Requested-With, Content-Type
Connection: close

Following redirects with -L, --location

Web pages can be tricky; requesting one might move you to another. Consider the following request to https://google.com.

curl https://google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="https://www.google.com/">here</A>.
</BODY></HTML>

This doesn't produce the full Google web page; it is a redirect to it. Looking at the response headers, we can see the details of this redirect.

curl -I https://google.com
HTTP/2 301
location: https://www.google.com/
content-type: text/html; charset=UTF-8
date: Mon, 26 Aug 2019 04:46:00 GMT
expires: Wed, 25 Sep 2019 04:46:00 GMT
cache-control: public, max-age=2592000
server: gws
content-length: 220
x-xss-protection: 0
x-frame-options: SAMEORIGIN
alt-svc: quic=":443"; ma=2592000; v="46,43,39"

By default, curl doesn't follow this redirect to the Google home page the way your web browser does. The -L, --location argument makes the request follow any redirects through to the final destination.

curl -L https://google.com

The response here is a much more complicated HTML page for the Google website.

curl -I -L https://google.com

This will show the header responses for both the redirect and the final destination.
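To pull out just the hops, you can pipe that through grep (the pattern here is my own, not from the manual):

curl -s -I -L https://google.com | grep -i '^location'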

Essential curl

Downloading files with -O, --remote-name and -R, --remote-time

The -O, --remote-name option lets us use the file name from the remote URL. Similarly, -R, --remote-time attempts to use the server's timestamp for the file if it is available.

This uses the remote name of the file and saves it to the current working directory.

curl -OR https://www.ncei.noaa.gov/data/global-hourly/access/2016/72494023234.csv

The file is downloaded to ./72494023234.csv, and ls -l reveals that even though I am downloading this data in 2019, the last modified time of this file was in 2018.

-rw-r--r--  1 nackjicholson  staff  6989026 Dec 19  2018 72494023234.csv

Storing response header information with -D, --dump-header

Sometimes when inspecting requests or API responses, you may want to save the response headers along with the downloaded content.

curl -OR \
  -D headers.txt \
  https://www.ncei.noaa.gov/data/global-hourly/access/2016/72494023234.csv

Sending form data with -F, --form

Just do man curl, search for --form, and read about it. It's going to be more informative than anything I can say here.

Features:

-F makes curl send a multipart/form-data POST, the same encoding a browser uses for a form with a file upload.
name=@file attaches a file as an upload, while name=<file sends the file's contents as a plain text field.
Appending ;type=... sets a part's MIME type, and ;filename=... overrides the reported file name.
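Still, a minimal sketch of a multipart upload, assuming a completely hypothetical endpoint and field names:

curl \
  -F 'station=72494023234' \
  -F 'data=@scratch/2016/sf-weather.csv;type=text/csv' \
  https://api.example.com/upload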

Ignore SSL with --insecure

When curl can't verify a server's SSL certificate, it will fail.

curl 'https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxx.us-west-2.elb.amazonaws.com'
curl: (51) SSL: no alternative certificate subject name matches target host name 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxx.us-west-2.elb.amazonaws.com'

I encounter this occasionally when trying to connect to an AWS load balancer before it has been associated with a DNS name and a valid SSL certificate.

curl --insecure 'https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxx.us-west-2.elb.amazonaws.com'

You don’t know curl

Sending multiple requests using URL Syntax

The URL given to curl can take multiple values to create multiple requests. The {foo,bar} brace syntax creates sets of values that replace parts of a URL. For example, we can request data for the New York La Guardia Airport (72503014732) and San Francisco International Airport (72494023234) in a single command.

curl \
    'https://www.ncei.noaa.gov/data/global-hourly/access/2016/{72494023234,72503014732}.csv' \
    -o scratch/sf-weather.csv \
    -o scratch/ny-weather.csv

Using two -o arguments writes two output files, one for each request, in the order they are defined. We can also use brackets, such as [2012-2019], to create ranges of values. The NOAA data is organized into directories by year, so we can create a URL that will get us the SFO and LGA data for a sequence of years. We could then use as many --output arguments as necessary to write all the files out, but there is an easier way: templating the --output names as well.

curl \
    'https://www.ncei.noaa.gov/data/global-hourly/access/[2012-2019]/{72494023234,72503014732}.csv' \
    --create-dirs \
    --progress-bar \
    --output 'scratch/#1/#2-weather.csv'

This will download 16 files for SFO and LGA, creating a directory structure using --create-dirs. The variables #1 and #2 correspond to the two substitution parts of our URL in the order they are defined: the first is the range of years, and the second is the set of station ids.

scratch
├── 2012
│   ├── 72494023234-weather.csv
│   └── 72503014732-weather.csv
├── 2013
│   ├── 72494023234-weather.csv
│   └── 72503014732-weather.csv
├── 2014
│   ├── 72494023234-weather.csv
│   └── 72503014732-weather.csv
├── 2015
│   ├── 72494023234-weather.csv
│   └── 72503014732-weather.csv
├── 2016
│   ├── 72494023234-weather.csv
│   └── 72503014732-weather.csv
├── 2017
│   ├── 72494023234-weather.csv
│   └── 72503014732-weather.csv
├── 2018
│   ├── 72494023234-weather.csv
│   └── 72503014732-weather.csv
└── 2019
    ├── 72494023234-weather.csv
    └── 72503014732-weather.csv

8 directories, 16 files

Saving request configurations with -K, --config

When requests start getting large, difficult to remember, or tedious to type, you can use the --config feature of curl to save the request options to a file.

This saves our complicated configuration.

bulk_station_download.txt

url = "https://www.ncei.noaa.gov/data/global-hourly/access/[2012-2019]/{72494023234,72503014732}.csv"
output = "scratch/#1/#2-weather.csv"
--create-dirs
--progress-bar

Now we can perform this request using:

curl -K bulk_station_download.txt

Separate requests with -:, --next

This lets you chain multiple requests, each with its own options, in a single command. A sample use might be a POST that creates a resource, followed by a GET to make sure it's accessible.

curl \
  -X POST \
  -H 'Content-Type: application/json' \
  -d @my_thing.json \
  http://api.example.com/things \
  -: \
  http://api.example.com/things/my_thing \
  -o my_thing.json

Writing out request information and statistics with -w, --write-out

This StackOverflow answer has great information about how to do this, and the curl manual lists the templating variables you can use in your -w template files.

curl_request_info.txt

   time_namelookup:  %{time_namelookup}\n
      time_connect:  %{time_connect}\n
   time_appconnect:  %{time_appconnect}\n
  time_pretransfer:  %{time_pretransfer}\n
     time_redirect:  %{time_redirect}\n
time_starttransfer:  %{time_starttransfer}\n
                   ----------\n
        time_total:  %{time_total}\n

Then make a request. This one doesn't even download the file; it just sends its content to /dev/null and shows the stats.

curl -w "@curl_request_info.txt" -s -o /dev/null https://www.ncei.noaa.gov/data/global-hourly/access/2016/72494023234.csv

And you’ll get back something like:

   time_namelookup:  0.001
      time_connect:  0.037
   time_appconnect:  0.000
  time_pretransfer:  0.037
     time_redirect:  0.000
time_starttransfer:  0.092
                   ----------
        time_total:  0.164
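You don't need a template file for quick checks; -w also accepts an inline format string. For example, using the write-out variable %{http_code} to print just the response status:

curl \
  -w '%{http_code}\n' \
  -s -o /dev/null \
  https://www.ncei.noaa.gov/data/global-hourly/access/2016/72494023234.csv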

Conclusion

There were a few other things that looked cool, but I didn't play with them.

Remember to RTFM! Thanks for reading my review.