MJN All Blog Cheatsheets Elasticsearch GCP JS LinuxBash Misc Notes Other ShortcutKeys / - Search

Home / Blog / 22-06-22 - One Liner to Run SQL Over a CSV File Using Sqlite


Assume we have a cvs file people.csv:

> cat people.csv
name,sex,age
martin,m,21
paul,m,50
dave,m,58
sarah,f,33
anna,f,53

We can run a SQL command on this file using a single line (Linux) sqlite3 command:

> sqlite3 :memory: \                             # Use an in memory database
         -cmd '.mode csv' \                      # Switch to csv mode for import
         -cmd '.import <filename> <tablename>' \ # Import file as csv
         -cmd '.mode column'                     # Switch to column mode for output
         -header \                               # Show header for output
         '<sql_statement>'

An example over people.csv:

> sqlite3 :memory: -cmd '.mode csv' -cmd '.import people.csv people' \
          -cmd '.mode column' -header \
          'SELECT sex, COUNT(*), AVG(age) FROM people GROUP BY sex'

sex         COUNT(*)    AVG(age)
----------  ----------  ----------
f           2           43.0
m           3           43.0

How well does it perform? Let’s run over 500,000 records:

# Create a 500,000 record file.  Take 5 rows (no header) from people.csv and write it
# 100 times to people2.csv

> for ((i=0; i<100; i++)); do tail -5 people.csv >> people2.csv; done

# Write people2.csv 1000 times into people3.csv

> for ((i=0; i<1000; i++)); do cat people2.csv >> people3.csv; done

# Add the header and 500,000 lines to people4.csv

> head -1 people.csv > people4.csv; cat people3.csv >> people4.csv

# Check we have 500,000 records and 1 line header

> wc -l people4.csv
500001 people4.csv

# Run the SQL query over people4.csv

> time sqlite3 :memory: -cmd '.mode csv' -cmd '.import people4.csv people' \
       -cmd '.mode column' -header \
       'SELECT sex, COUNT(*), AVG(age) FROM people GROUP BY sex'

sex         COUNT(*)    AVG(age)
----------  ----------  ----------
f           200000      43.0
m           300000      43.0

real    0m0.924s
user    0m0.641s
sys     0m0.188s

Roughly 1 second. Not bad.


This page was generated by GitHub Pages. Page last modified: 24/05/17 16:32