The following screens demonstrate the exploratory data analysis features of Datapup when using it in Python workspace mode.
NOTE: This guide uses data from data.gov.au, which is licensed under Creative Commons Attribution:
– Airport traffic data https://data.gov.au/dataset/ds-dga-cc5d888f-5850-47f3-815d-08289b22f5a8
– NSW COVID-19 cases by age range https://data.gov.au/dataset/ds-nsw-3dc5dc39-40b4-4ee9-8ec6-2d862a916dcf
First, create a Python workspace, here called “example”, and specify the file that will hold your Python script code.
data:image/s3,"s3://crabby-images/de565/de5654300f3fa871c6d2bf26425c05747e32a728" alt=""
Click on DataFrames > Python Script, or click on the code button shown below to access the Python editor window:
data:image/s3,"s3://crabby-images/9465f/9465f7f2bbc53540b4ee89fc58295d168409b827" alt=""
This window allows you to enter Python code that creates Pandas DataFrames. Any variable that holds a DataFrame will be available for querying. Here I load a CSV file into the variable df_pax.
data:image/s3,"s3://crabby-images/3fd33/3fd338a7bac4e0628c3665690807874f9e1b1ebd" alt=""
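For readers following along without the screenshot, a script of this kind might look like the sketch below. It is not the exact script from the screenshot: the sample rows are made up so that the snippet runs on its own, and every column name except Int_pax_in is a hypothetical placeholder.

```python
import io

import pandas as pd

# Made-up sample rows standing in for the downloaded airport traffic CSV.
# Only Int_pax_in is a column name used in this guide; the others are hypothetical.
sample_csv = io.StringIO(
    "Airport,Year,Month,Int_pax_in,Int_pax_out\n"
    "SYDNEY,2019,1,100,90\n"
    "SYDNEY,2019,2,120,110\n"
    "MELBOURNE,2019,1,80,70\n"
)

# In a real script you would pass the path of the CSV file instead,
# for example pd.read_csv("airport_traffic.csv") (hypothetical file name).
df_pax = pd.read_csv(sample_csv)

print(df_pax.head())
```

Any top-level variable that holds a DataFrame, such as df_pax here, is what the guide describes as becoming available for querying.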
DataFrames appear on the right. Right-click on one to run standard EDA operations from the Python Pandas library, such as correlations, data types and describe.
data:image/s3,"s3://crabby-images/1b8b6/1b8b61dd13c62e98df155f1313d489be90032c8a" alt=""
data:image/s3,"s3://crabby-images/16b01/16b019f7f9c2e015d70d5c59e2c718641eb78665" alt=""
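These menu items correspond to familiar Pandas operations. The sketch below shows the plain Pandas equivalents on a small made-up DataFrame; it assumes the menu maps to `dtypes`, `describe()` and `corr()`, which the guide implies but does not spell out, and all columns other than Int_pax_in are hypothetical.

```python
import pandas as pd

# Made-up data; only Int_pax_in is a column name taken from this guide.
df_pax = pd.DataFrame(
    {
        "Airport": ["SYDNEY", "SYDNEY", "MELBOURNE", "MELBOURNE"],
        "Int_pax_in": [100, 120, 80, 95],
        "Int_pax_out": [90, 110, 70, 85],
    }
)

# Data types of each column.
dtypes = df_pax.dtypes

# Summary statistics (count, mean, std, min, quartiles, max) for numeric columns.
summary = df_pax.describe()

# Pairwise correlations, restricted to numeric columns so the text column is ignored.
correlations = df_pax.select_dtypes("number").corr()

print(dtypes)
print(summary)
print(correlations)
```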
Next, write your SQL directly in the SQL window and execute it using the Run button, Ctrl-Enter or Shift-F9. (Press Ctrl-H to see all shortcut keys.)
data:image/s3,"s3://crabby-images/797b9/797b9e570605ddcbc368b526de219676a5aac91a" alt=""
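To get a feel for what querying a DataFrame with SQL means outside the tool, here is a minimal sketch that copies a DataFrame into an in-memory SQLite database and queries it. This is an assumption made for illustration only: the guide does not say which SQL engine Datapup uses, and the data and the query are made up.

```python
import sqlite3

import pandas as pd

# Made-up data; only Int_pax_in is a column name taken from this guide.
df_pax = pd.DataFrame(
    {
        "Airport": ["SYDNEY", "SYDNEY", "MELBOURNE", "MELBOURNE"],
        "Int_pax_in": [100, 120, 80, 95],
    }
)

# Copy the DataFrame into an in-memory SQLite table named after the variable.
conn = sqlite3.connect(":memory:")
df_pax.to_sql("df_pax", conn, index=False)

# Run a SQL query against the table and read the result back as a DataFrame.
result = pd.read_sql_query(
    "SELECT Airport, SUM(Int_pax_in) AS total_in "
    "FROM df_pax GROUP BY Airport ORDER BY Airport",
    conn,
)
conn.close()

print(result)
```

The result is itself a DataFrame, which is why the same describe and correlation commands can be applied to a query's output table.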
Right-click on the results, or use the buttons on the right, to run commands such as correlation and describe on the output table.
data:image/s3,"s3://crabby-images/2ca37/2ca37d139ee8710c0c34476840f98b100a29f795" alt=""
Plots and counts
Right-click on a column header in the results to run commands such as box plots and scatter plots.
data:image/s3,"s3://crabby-images/c6566/c656638ce58c01547606608b18af6b9382aab429" alt=""
Here I’ve chosen Int_pax_in for the Y axis value.
data:image/s3,"s3://crabby-images/7304d/7304da0d55a54436141a1a29517f4dbf8ba72080" alt=""
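A box plot is a picture of a handful of summary statistics, so it can help to see them as numbers. The sketch below computes the quartiles and the conventional 1.5 × IQR fences for Int_pax_in on made-up values; it is an illustration of what a box plot typically displays, not a description of how Datapup draws its plots.

```python
import pandas as pd

# Made-up values for the Int_pax_in column, including one unusually large value.
df = pd.DataFrame({"Int_pax_in": [80, 95, 100, 120, 300]})
col = df["Int_pax_in"]

# The box spans the first to third quartile, with a line at the median.
q1 = col.quantile(0.25)
median = col.quantile(0.50)
q3 = col.quantile(0.75)
iqr = q3 - q1

# By the usual convention, points beyond 1.5 * IQR from the box are drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = col[(col < lower_fence) | (col > upper_fence)]

print(q1, median, q3)
print(outliers.tolist())

# With matplotlib installed, df.plot.box() would draw the corresponding plot.
```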
Importing CSV using the GUI
Back in the main window, choose DataFrames > Import CSV from the menu to import CSV files without writing Python. Here I’ve given the DataFrame the name covidnsw and accepted the defaults.
data:image/s3,"s3://crabby-images/19aba/19abaeccec1612ab8e3750ba8105c372b3968fd4" alt=""
Datapup has automatically updated the Python script, since I left the “update python script” option checked.
data:image/s3,"s3://crabby-images/fd756/fd756716e13cb0c6c2d25d7837da672351be3a6f" alt=""
Back in the main window, add some additional SQL to query the new DataFrame covidnsw, and execute it.
data:image/s3,"s3://crabby-images/d25cc/d25ccdea74138845e4d2f0f0f0e37b14a5ca4859" alt=""
Right-click on the age_group column and choose value counts.
data:image/s3,"s3://crabby-images/dd4b2/dd4b219c0797ec2dc7a25a7a7bdd303f27c3c4aa" alt=""
data:image/s3,"s3://crabby-images/54eb9/54eb98f837552ed6985a1d4f0e14e5d3a3ff923d" alt=""
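Value counts tallies how often each distinct value occurs in a column. Assuming the menu item corresponds to the Pandas `value_counts()` method, the equivalent call looks like the sketch below. The age group labels are made up and will not match the real dataset.

```python
import pandas as pd

# Made-up rows; the real NSW dataset has its own age group labels.
covidnsw = pd.DataFrame(
    {"age_group": ["20-39", "0-19", "20-39", "40-59", "20-39", "40-59"]}
)

# Count the rows for each distinct age group, most frequent first.
counts = covidnsw["age_group"].value_counts()

print(counts)
```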