8 Working with Data and APIs from the Command Line
Modern software development is largely about data in motion. APIs returning JSON. Webhooks delivering payloads. Databases exporting CSVs. Log pipelines emitting structured events. Configuration files in YAML and TOML. Data files in formats ranging from clean CSV to deeply nested JSON to XML that looks like it was designed to cause suffering.
Most developers handle this data through GUI tools — Postman for API testing, TablePlus for database inspection, Excel for CSV manipulation. These tools are fine for what they are. But the terminal gives you something none of them can: the ability to compose data operations into pipelines, automate them, and apply the same transformation to one file or ten thousand files with equal ease.
This chapter covers the terminal toolkit for working with data and APIs: curl for HTTP, jq for JSON, and a collection of tools for CSV, YAML, and other formats you’ll encounter in the wild. By the end of it, you’ll be able to explore, transform, and manipulate data from the command line as fluently as you manipulate code.
8.1 curl: The Universal HTTP Client
curl is the command-line tool for making HTTP requests. It supports every HTTP method, every authentication scheme, file uploads, cookies, redirects, and dozens of other features. It’s available on virtually every system, which means a curl command you write today will work on any server you SSH into for the foreseeable future.
8.1.1 Basic requests
curl https://api.example.com/users # GET request
curl -s https://api.example.com/users # silent mode (no progress bar)
curl -sS https://api.example.com/users # silent but show errors

The -s flag suppresses the progress bar that curl shows by default — essential when piping output to jq or other tools. -sS keeps errors visible while suppressing progress.
8.1.2 HTTP methods
# POST with JSON body
curl -X POST https://api.example.com/users \
-H "Content-Type: application/json" \
-d '{"name": "Alice", "email": "alice@example.com"}'
# PUT
curl -X PUT https://api.example.com/users/42 \
-H "Content-Type: application/json" \
-d '{"name": "Alice Updated"}'
# PATCH
curl -X PATCH https://api.example.com/users/42 \
-H "Content-Type: application/json" \
-d '{"email": "newemail@example.com"}'
# DELETE
curl -X DELETE https://api.example.com/users/42

8.1.3 Headers and authentication
# Custom header
curl -H "X-API-Key: your-key-here" https://api.example.com/data
# Bearer token authentication
curl -H "Authorization: Bearer eyJhbGc..." https://api.example.com/data
# Basic authentication
curl -u username:password https://api.example.com/data
# Multiple headers
curl -H "Content-Type: application/json" \
-H "Authorization: Bearer token" \
-H "X-Request-ID: abc123" \
https://api.example.com/data

8.1.4 Reading request body from a file
For large or complex request bodies, putting the JSON inline gets unwieldy. Read it from a file instead:
curl -X POST https://api.example.com/users \
-H "Content-Type: application/json" \
-d @payload.json

The @ prefix tells curl to read the body from a file. This is also useful for sending binary data or form submissions.
8.1.5 Response inspection
By default curl only shows the response body. To see more:
curl -i https://api.example.com/users # include response headers
curl -I https://api.example.com/users # HEAD request (headers only)
curl -v https://api.example.com/users # verbose: full request and response
curl -w "\nStatus: %{http_code}\nTime: %{time_total}s\n" \
-s https://api.example.com/users # custom output format

The -w flag (write-out) is particularly powerful. It lets you print specific response metadata after the request:
curl -s -o /dev/null \
-w "Status: %{http_code} | Time: %{time_total}s | Size: %{size_download} bytes\n" \
https://api.example.com/users

-o /dev/null discards the body; -w prints just the metrics you care about. This is the pattern for quick performance checks and health monitoring scripts.
8.1.7 Downloading files
curl -O https://example.com/file.zip # save with original filename
curl -o custom-name.zip https://example.com/file.zip # save with custom name
curl -C - -O https://example.com/large.zip # resume an interrupted download

8.1.8 Using a config file for repeated requests
When you’re working repeatedly with the same API during development, a curl config file saves you from typing the same headers every time:
# ~/.curlrc or project-specific curl.config
header = "Authorization: Bearer your-token"
header = "Content-Type: application/json"
silent

curl -K curl.config https://api.example.com/users

8.2 jq: JSON as a First-Class Citizen
We introduced jq briefly in Chapter 2. Here we go deep. jq is a complete query and transformation language for JSON — think of it as sed and awk combined, but designed specifically for structured data.
8.2.1 Installation
brew install jq # macOS
apt install jq # Ubuntu

8.2.2 The basics
echo '{"name":"Alice","age":30}' | jq '.' # pretty-print
echo '{"name":"Alice","age":30}' | jq '.name' # extract field
echo '{"name":"Alice","age":30}' | jq '.age + 1' # compute value

The . identity filter passes input through unchanged but pretty-prints it. Field access uses dot notation: .name, .user.profile.email, .config.database.host.
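Two more basics are worth knowing early, shown here on inline documents (the sample data is made up): a missing path produces null rather than an error, and the // operator supplies a fallback value.

```shell
# Dot paths walk nested objects:
echo '{"user":{"profile":{"email":"alice@example.com"}}}' | jq -r '.user.profile.email'
# prints: alice@example.com

# A missing path yields null; // substitutes a default:
echo '{}' | jq -r '.user.name // "unknown"'
# prints: unknown
```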
8.2.4 Filtering arrays with select
# Users with role "admin"
cat users.json | jq '.[] | select(.role == "admin")'
# Items with price over 100
cat products.json | jq '.[] | select(.price > 100)'
# Multiple conditions
cat users.json | jq '.[] | select(.role == "admin" and .active == true)'
# Using regex in select
cat users.json | jq '.[] | select(.email | test("@example\\.com$"))'

8.2.5 Transforming and reshaping
This is where jq becomes genuinely powerful. You can reshape JSON into entirely different structures:
# Extract specific fields into a new object
cat users.json | jq '.[] | {id: .id, email: .email}'
# Rename fields
cat users.json | jq '.[] | {userId: .id, emailAddress: .email}'
# Add computed fields
cat orders.json | jq '.[] | . + {total: (.price * .quantity)}'
# Build a flat array from a nested structure
cat users.json | jq '[.[] | .email]'
# Convert array to object keyed by id
cat users.json | jq 'map({(.id | tostring): .}) | add'

8.2.6 Aggregation
# Count items
cat users.json | jq '[.[] | select(.active)] | length'
# Sum a field
cat orders.json | jq '[.[].amount] | add'
# Average
cat metrics.json | jq '[.[].response_time] | add / length'
# Min and max
cat metrics.json | jq '[.[].response_time] | min, max'
# Group by a field
cat orders.json | jq 'group_by(.status)'
# Count by status
cat orders.json | jq 'group_by(.status) | map({status: .[0].status, count: length})'

8.2.7 Practical patterns with curl and jq
The real power of jq in development is as the downstream processor of API responses:
# Fetch users and list their emails
curl -s https://api.example.com/users | jq '.[].email'
# Find users created in the last 7 days
# (GNU date syntax; on macOS/BSD use: date -u -v-7d +%Y-%m-%dT%H:%M:%SZ)
curl -s https://api.example.com/users | \
jq --arg date "$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ)" \
'[.[] | select(.createdAt > $date)]'
# Get a specific user and format output
curl -s https://api.example.com/users/42 | \
jq '{name: .name, email: .email, roles: .roles | join(", ")}'
# Extract pagination info alongside data
curl -s "https://api.example.com/users?page=1" | \
jq '{total: .meta.total, page: .meta.page, users: [.data[].email]}'

8.2.8 Output formatting
jq -r '.name' # raw output (no quotes around strings)
jq -c '.' # compact output (single line)
jq -r '.[] | .name' users.json | sort # raw output suitable for piping

The -r flag is important when piping jq output into other commands — without it, strings are quoted, which causes issues with tools that expect plain text.
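The difference is easy to see side by side on an inline sample input:

```shell
echo '{"name":"Alice"}' | jq '.name'    # prints: "Alice"  (a JSON string, quotes included)
echo '{"name":"Alice"}' | jq -r '.name' # prints: Alice    (plain text, safe for sort/grep/xargs)
```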
8.2.9 Building JSON with jq
jq isn’t just for reading JSON — it can create it:
# Build a JSON object from shell variables
NAME="Alice"
EMAIL="alice@example.com"
jq -n --arg name "$NAME" --arg email "$EMAIL" \
'{name: $name, email: $email}'
# Build from command output
jq -n \
--arg hostname "$(hostname)" \
--arg date "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
--argjson uptime "$(cat /proc/uptime | awk '{print $1}')" \
'{hostname: $hostname, timestamp: $date, uptime: $uptime}'

This pattern — building JSON payloads with jq -n — is useful in deployment scripts and monitoring tools where you need to send structured data to an API.
8.3 Working with CSV Data
JSON is the format of APIs. CSV is the format of everything else — spreadsheet exports, database dumps, analytics data, billing records. The terminal handles CSV well, but with some caveats.
8.3.1 Basic CSV processing with awk and cut
For simple cases, awk and cut from Chapter 4 are sufficient:
cut -d',' -f1,3 data.csv # extract columns 1 and 3
awk -F',' '$3 > 1000 {print $1, $3}' data.csv # filter and extract

The limitation is that basic awk and cut don’t handle quoted fields — a CSV value like "Smith, John" (with a comma inside quotes) will break naive field splitting. For well-formed, quote-free CSVs, they work fine. For real-world CSVs, you need something better.
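A minimal sketch of the failure mode, using a made-up two-row file:

```shell
# The name field contains a comma inside quotes:
printf '%s\n' 'id,name,amount' '1,"Smith, John",250' > /tmp/quoted.csv

# Naive splitting sees four comma-separated fields in the data row, not three:
awk -F',' 'NR==2 {print NF}' /tmp/quoted.csv
# prints: 4
```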
8.3.2 csvkit: the CSV toolkit
csvkit is a suite of command-line tools specifically designed for CSV files. It handles quoted fields, type inference, and encoding issues correctly.
pip install csvkit

The toolkit includes several commands:
csvstat data.csv # summary statistics for each column
csvcut -c 1,3 data.csv # extract columns by index or name
csvgrep -c email -m "@example.com" data.csv # filter rows by column value
csvsort -c amount -r data.csv # sort by column, descending
csvjoin -c id users.csv orders.csv # join two CSVs on a column
csvjson data.csv # convert CSV to JSON

8.3.3 csvstat for quick data profiling
csvstat data.csv

1. id
Type of data: Number
Contains null values: False
Unique values: 1000
Sum: 500500
Min: 1
Max: 1000
2. email
Type of data: Text
Contains null values: True (excluded from calculations)
Unique values: 998
Max length: 42
This gives you a complete statistical profile of every column in a CSV — types, null counts, unique values, min/max — in seconds. It’s the first thing to run when you receive an unfamiliar data file.
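When csvkit isn’t installed on a machine you’re SSHed into, a rough substitute for one csvstat question, the value distribution of a single column, can be improvised from coreutils (quote-free CSVs only; the file and columns here are invented):

```shell
printf '%s\n' 'id,status' '1,ok' '2,error' '3,ok' > /tmp/events.csv

# Frequency count of column 2, skipping the header row:
cut -d',' -f2 /tmp/events.csv | tail -n +2 | sort | uniq -c | sort -rn
# shows "2 ok" above "1 error"
```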
8.3.4 csvgrep for filtering
# Find rows where status is "error"
csvgrep -c status -m "error" events.csv
# Find rows where email matches a pattern
csvgrep -c email -r "@example\.com$" users.csv
# Find rows where amount is above threshold (pipe through awk after)
csvgrep -c status -m "completed" orders.csv | awk -F',' '$3 > 1000'

8.3.5 Converting between formats
csvjson data.csv | jq '.[0]' # CSV to JSON, inspect first row
csvjson data.csv > data.json # convert entire file
in2csv -f json data.json | csvcut -c id,name,email # JSON array to CSV

The ability to convert between formats unlocks cross-tool processing: convert CSV to JSON, process with jq, convert back to CSV, or feed directly into an API with curl.
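jq itself can also produce CSV from JSON with its @csv filter, which quotes fields correctly (inline sample data):

```shell
echo '[{"id":1,"name":"Alice"},{"id":2,"name":"Bob"}]' | \
  jq -r '.[] | [.id, .name] | @csv'
# prints:
# 1,"Alice"
# 2,"Bob"
```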
8.4 Working with YAML
YAML appears constantly in modern development — Kubernetes manifests, Docker Compose files, CI/CD configuration, Ansible playbooks, application config. Unlike JSON, it’s designed to be human-readable, which also makes it harder to process programmatically.
8.4.1 yq: jq for YAML
yq is a YAML processor modeled closely on jq. Its syntax is nearly identical, making it easy to learn if you already know jq.
brew install yq # macOS

# Read a field
yq '.database.host' config.yaml
# Update a field
yq '.database.host = "new-host"' config.yaml
# In-place edit
yq -i '.database.host = "new-host"' config.yaml
# Extract a nested array
yq '.services[].name' docker-compose.yaml
# Filter array items
yq '.jobs[] | select(.name == "deploy")' .github/workflows/deploy.yaml

8.4.2 Practical YAML workflows
# Find all Kubernetes deployments with fewer than 2 replicas
cat deployment.yaml | yq 'select(.kind == "Deployment" and .spec.replicas < 2)'
# Update the image tag in a Kubernetes manifest
yq -i '.spec.template.spec.containers[0].image = "myapp:v2.1.0"' deployment.yaml
# Extract all environment variable names from a Docker Compose file
yq '.services[].environment | keys' docker-compose.yaml
# Compare two config files field by field
diff <(yq -o=json 'sort_keys(..)' config-dev.yaml) \
<(yq -o=json 'sort_keys(..)' config-prod.yaml)

The last example is particularly useful for auditing differences between environment configs: convert both YAML files to sorted JSON, then diff them. The conversion to JSON normalizes formatting differences so the diff shows only actual value changes.
8.4.3 Converting YAML to JSON and back
yq -o=json config.yaml # YAML to JSON
yq -o=json config.yaml | jq '.database' # extract JSON field from YAML source
cat data.json | yq -P '.' # JSON to pretty YAML

Being able to convert between YAML and JSON means you can use jq’s richer query language on YAML files — convert to JSON, process with jq, convert back if needed.
8.5 Working with Other Formats
8.5.1 XML with xmllint
XML appears less frequently in modern development than it used to, but it’s still common in enterprise systems, Maven build files, Android resources, and SOAP APIs. xmllint is available on most systems without installation:
# Pretty-print XML
xmllint --format messy.xml
# Validate against a schema
xmllint --schema schema.xsd data.xml
# XPath queries
xmllint --xpath "//user[@role='admin']/email/text()" users.xml
# Extract elements
xmllint --xpath "string(//*[local-name()='version'])" pom.xml # Maven version (POM elements live in a default namespace, so match by local name)

For heavier XML processing, xsltproc applies XSLT transformations, but that’s beyond the scope of this book.
8.5.2 SQLite from the command line
SQLite databases are surprisingly common in development — mobile apps, local caches, test databases, embedded systems. The sqlite3 command-line client lets you query them directly:
sqlite3 database.db # open interactive shell
sqlite3 database.db "SELECT * FROM users LIMIT 10;" # one-off query
sqlite3 -csv database.db "SELECT * FROM users;" > users.csv # export to CSV
sqlite3 -json database.db "SELECT * FROM users;" | jq '.[0]' # export to JSON

The -csv and -json output flags make sqlite3 a natural part of data pipelines — query a database, convert to JSON, process with jq, post to an API with curl.
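The round trip can be sketched with a throwaway database (table name and schema are invented; -json requires SQLite 3.33 or newer):

```shell
db=/tmp/demo.db
rm -f "$db"
sqlite3 "$db" "CREATE TABLE users(id INTEGER, name TEXT);
               INSERT INTO users VALUES (1,'Alice'),(2,'Bob');"

# Query straight into jq:
sqlite3 -json "$db" "SELECT * FROM users ORDER BY id;" | jq -r '.[0].name'
# prints: Alice
```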
8.5.3 Environment variables and .env files
# Load a .env file into the current shell
export $(grep -v '^#' .env | xargs)
# Check which variables are set
env | grep "API_"
# Run a command with specific environment variables
env API_KEY=test123 node scripts/migrate.js
# Compare .env files across environments
diff <(sort .env.development) <(sort .env.production)

8.6 Building API Workflows
Individual curl and jq commands are useful. Combined into workflows, they become small but complete data processing systems.
8.6.1 Paginating through an API
Many APIs return data in pages. This script fetches all pages automatically:
#!/bin/bash
page=1
all_users="[]"
while true; do
response=$(curl -s "https://api.example.com/users?page=$page&per_page=100" \
-H "Authorization: Bearer $API_TOKEN")
users=$(echo "$response" | jq '.data')
count=$(echo "$users" | jq 'length')
if [ "$count" -eq 0 ]; then
break
fi
all_users=$(echo "$all_users $users" | jq -s '.[0] + .[1]')
((page++))
done
echo "$all_users" | jq 'length'
echo "$all_users" > all-users.json

8.6.2 Transforming and posting data
A common ETL (extract, transform, load) pattern — read data from one API, transform it, post to another:
#!/bin/bash
# Migrate users from old API to new API
curl -s https://old-api.example.com/users \
-H "Authorization: Bearer $OLD_API_TOKEN" | \
jq -c '.[] | {
name: .full_name,
email: .email_address,
role: (if .admin then "admin" else "user" end)
}' | \
while read -r user; do
curl -s -X POST https://new-api.example.com/users \
-H "Authorization: Bearer $NEW_API_TOKEN" \
-H "Content-Type: application/json" \
-d "$user"
echo "Migrated: $(echo "$user" | jq -r '.email')"
done

8.6.3 Testing an API endpoint
A quick smoke test script for an API:
#!/bin/bash
BASE_URL="https://api.example.com"
TOKEN="$API_TOKEN"
PASS=0
FAIL=0
check() {
local description=$1
local expected=$2
local actual=$3
if [ "$actual" = "$expected" ]; then
echo "[PASS] $description"
((PASS++))
else
echo "[FAIL] $description (expected: $expected, got: $actual)"
((FAIL++))
fi
}
# Test: list users returns 200
status=$(curl -s -o /dev/null -w "%{http_code}" \
-H "Authorization: Bearer $TOKEN" \
"$BASE_URL/users")
check "GET /users returns 200" "200" "$status"
# Test: create user returns 201
status=$(curl -s -o /dev/null -w "%{http_code}" -X POST \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"name":"Test User","email":"test@example.com"}' \
"$BASE_URL/users")
check "POST /users returns 201" "201" "$status"
echo ""
echo "Results: $PASS passed, $FAIL failed"

8.6.4 Monitoring an endpoint
#!/bin/bash
# Monitor an endpoint and alert on non-200 responses
URL="https://api.example.com/health"
INTERVAL=30
while true; do
response=$(curl -s -w "\n%{http_code}" "$URL")
body=$(echo "$response" | sed '$d') # everything but the status-code line (portable; GNU head -n -1 is not)
status=$(echo "$response" | tail -n 1)
timestamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)
if [ "$status" != "200" ]; then
echo "[$timestamp] ALERT: $URL returned $status"
echo "Body: $body" | head -5
else
echo "[$timestamp] OK: $URL returned $status"
fi
sleep $INTERVAL
done

8.7 httpie: A More Ergonomic Alternative
httpie (the http command) is a modern alternative to curl with a more readable syntax, automatic JSON formatting, and sensible defaults for API development.
pip install httpie
# or
brew install httpie

# GET request
http https://api.example.com/users
# POST with JSON (automatically sets Content-Type)
http POST https://api.example.com/users name=Alice email=alice@example.com
# With headers
http https://api.example.com/users Authorization:"Bearer token"
# With query parameters
http https://api.example.com/users page==2 per_page==50

The = operator sets JSON string fields, := sets non-string JSON values, == sets query parameters, and : sets headers. It’s more readable than curl for interactive API exploration, though curl is still the right choice for scripts — it’s available everywhere and its behavior is more predictable.
8.8 Putting It Together: A Complete Data Pipeline
Here’s a realistic end-to-end pipeline: fetch sales data from an API, enrich it with user data from a CSV, generate a summary report, and post the results to a Slack webhook.
#!/bin/bash
# 1. Fetch orders from API
echo "Fetching orders..."
curl -s "https://api.example.com/orders?status=completed" \
-H "Authorization: Bearer $API_TOKEN" \
> /tmp/orders.json
# 2. Load user data from CSV and convert to JSON for joining
echo "Loading user data..."
csvjson users.csv > /tmp/users.json
# 3. Join orders with user data, compute metrics
echo "Computing metrics..."
jq -s '
.[0] as $orders |
.[1] as $users |
($users | map({(.id | tostring): .}) | add) as $user_map |
{
total_orders: ($orders | length),
total_revenue: ([$orders[].amount] | add),
average_order: ([$orders[].amount] | add / length),
top_customers: (
$orders |
group_by(.user_id) |
map({
user: $user_map[.[0].user_id | tostring].name,
orders: length,
revenue: ([.[].amount] | add)
}) |
sort_by(-.revenue) |
.[0:5]
)
}
' /tmp/orders.json /tmp/users.json > /tmp/report.json
# 4. Format as readable text
echo "Formatting report..."
REPORT=$(jq -r '
"Sales Report - \(now | strftime("%Y-%m-%d"))\n" +
"Total Orders: \(.total_orders)\n" +
"Total Revenue: $\(.total_revenue | . * 100 | round / 100)\n" +
"Average Order: $\(.average_order | . * 100 | round / 100)\n\n" +
"Top Customers:\n" +
(.top_customers[] | " \(.user): \(.orders) orders, $\(.revenue)")
' /tmp/report.json)
echo "$REPORT"
# 5. Post to Slack
curl -s -X POST "$SLACK_WEBHOOK_URL" \
-H "Content-Type: application/json" \
-d "$(jq -n --arg text "$REPORT" '{text: $text}')"
echo "Report posted to Slack."

This pipeline — fetch, transform, aggregate, report, notify — is the kind of thing that might take an hour to build in a proper programming language and a few minutes to assemble from command-line tools. It’s also easy to debug: you can run each step independently, inspect the intermediate files in /tmp, and modify one stage without touching the others.
8.9 Chapter Summary
The terminal is a complete environment for working with data and APIs. curl handles any HTTP interaction. jq handles any JSON transformation. csvkit and yq cover CSV and YAML. And the Unix pipeline model means all of these tools compose naturally — the output of one becomes the input of the next.
The key habits to build:
- Always use -s with curl in pipelines to suppress progress output
- Use jq -r when piping jq output into other commands — raw strings, not quoted JSON
- Use jq -n with --arg and --argjson for building JSON payloads from shell variables
- Run csvstat first on any unfamiliar CSV to understand its structure before processing
- Convert YAML to JSON with yq -o=json when you need jq’s richer query capabilities
- Build pipelines incrementally — get each step working before adding the next
8.10 Exercises
1. Pick any public REST API (GitHub, Open-Meteo, JSONPlaceholder). Use curl to fetch a collection endpoint and jq to extract a specific field from every item in the response. Then filter the results to items matching a condition.
2. Find a CSV file with at least five columns. Run csvstat on it to understand its structure. Then use csvgrep to filter it, csvsort to sort the result, and csv2json to convert it to JSON for further processing with jq.
3. Write a shell script that fetches data from an API endpoint, checks that the HTTP status code is 200, extracts a count from the response body with jq, and prints a pass/fail result.
4. Take a Kubernetes YAML manifest or Docker Compose file and use yq to extract all image names. If you don’t have one handy, find one in any open source repository.
5. Build a pipeline that fetches JSON from an API, extracts a list of email addresses, sorts them, deduplicates them, and writes the result to a file — one email per line.
8.11 Quick Reference
| Command | What it does |
|---|---|
| curl -s URL | GET request, silent mode |
| curl -X POST -d @file.json URL | POST with JSON body from file |
| curl -H "Authorization: Bearer token" URL | Request with auth header |
| curl -s -o /dev/null -w "%{http_code}" URL | Get HTTP status code only |
| curl -L URL | Follow redirects |
| jq '.' | Pretty-print JSON |
| jq '.field' | Extract a field |
| jq '.[].field' | Extract field from each array item |
| jq 'select(.x == "y")' | Filter by condition |
| jq '{a: .x, b: .y}' | Reshape into new object |
| jq '[.[].field] \| add' | Sum a field across array |
| jq -r '.field' | Raw string output (no quotes) |
| jq -n --arg x "$VAR" '{key: $x}' | Build JSON from shell variables |
| csvstat file.csv | Profile a CSV file |
| csvcut -c 1,3 file.csv | Extract CSV columns |
| csvgrep -c col -m value file.csv | Filter CSV by column value |
| csvjson file.csv | Convert CSV to JSON |
| yq '.field' file.yaml | Extract YAML field |
| yq -i '.field = "value"' file.yaml | Edit YAML in place |
| yq -o=json file.yaml | Convert YAML to JSON |
| sqlite3 -json db.sqlite "SELECT..." | Query SQLite, output JSON |