The following screens demonstrate the exploratory data analysis features of Datapup when using it in Python workspace mode.
NOTE: This guide uses data from data.gov.au, which is licensed under Creative Commons Attribution:
– Airport traffic data https://data.gov.au/dataset/ds-dga-cc5d888f-5850-47f3-815d-08289b22f5a8
– NSW COVID-19 cases by age range https://data.gov.au/dataset/ds-nsw-3dc5dc39-40b4-4ee9-8ec6-2d862a916dcf
First, create a Python workspace, here called “example”. Specify the file that will hold your Python script code.
![](http://thedatapup.com/home/wp-content/uploads/2020/10/ss1.png)
Click on DataFrames > Python Script, or click on the code button shown below to access the Python editor window:
![](http://thedatapup.com/home/wp-content/uploads/2020/10/ss2.png)
This window allows you to enter Python code that creates pandas DataFrames. Any variable that holds a DataFrame will be available for querying. Here I load a CSV file into the variable df_pax.
![](http://thedatapup.com/home/wp-content/uploads/2020/10/ss3.png)
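As a rough sketch of what such a script can look like, the snippet below reads CSV text into df_pax with pandas. The in-memory sample data, the airport labels and every column except Int_pax_in are made up for illustration; in a real workspace you would pass the path of the downloaded airport traffic file to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for the airport traffic CSV (toy values, not real data).
# In the workspace script you would pass a file path to pd.read_csv instead.
sample_csv = io.StringIO(
    "Airport,Year,Int_pax_in,Int_pax_out\n"
    "A,2018,100,110\n"
    "B,2018,200,190\n"
    "C,2018,300,310\n"
)

# A top-level variable that holds a DataFrame, as described in the guide.
df_pax = pd.read_csv(sample_csv)
print(df_pax.head())
```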
DataFrames appear on the right. Right-click one to run standard EDA operations from the pandas library, such as correlations, data types and describe.
![](http://thedatapup.com/home/wp-content/uploads/2020/10/ss5.png)
![](http://thedatapup.com/home/wp-content/uploads/2020/10/ss6.png)
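For reference, these menu commands correspond to ordinary pandas calls. The sketch below shows the plain pandas equivalents on a small made-up DataFrame; I am assuming the menu items map onto calls like these, and the data is hypothetical.

```python
import pandas as pd

# Hypothetical toy data standing in for df_pax.
df_pax = pd.DataFrame(
    {
        "Airport": ["A", "B", "C", "D"],
        "Int_pax_in": [100, 200, 300, 400],
        "Int_pax_out": [110, 190, 310, 390],
    }
)

# Data types of each column.
print(df_pax.dtypes)

# Summary statistics (count, mean, std, min, quartiles, max) for numeric columns.
summary = df_pax.describe()
print(summary)

# Pairwise correlations between the numeric columns only.
correlations = df_pax.select_dtypes("number").corr()
print(correlations)
```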
Next, write your SQL directly in the SQL window and execute it using the Run button, Ctrl-Enter or Shift-F9. (Press Ctrl-H to see all shortcut keys.)
![](http://thedatapup.com/home/wp-content/uploads/2020/10/ss8.png)
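If you want to try the same idea outside Datapup, the sketch below runs a SQL query against a DataFrame using Python's built-in sqlite3 module. This is only a stand-in: the guide does not say which SQL engine Datapup uses, so the dialect may differ, and the data and the query itself are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical toy data standing in for df_pax.
df_pax = pd.DataFrame(
    {
        "Airport": ["A", "B", "C"],
        "Int_pax_in": [100, 200, 300],
    }
)

# Copy the DataFrame into an in-memory SQLite table with the same name.
# SQLite is an assumption here, not necessarily what Datapup uses internally.
conn = sqlite3.connect(":memory:")
df_pax.to_sql("df_pax", conn, index=False)

query = """
SELECT Airport, Int_pax_in
FROM df_pax
WHERE Int_pax_in > 150
ORDER BY Int_pax_in DESC
"""

# The query result comes back as a new DataFrame.
result = pd.read_sql_query(query, conn)
conn.close()
print(result)
```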
Right-click the results, or use the buttons on the right, to run commands such as correlation and describe on the output table.
![](http://thedatapup.com/home/wp-content/uploads/2020/10/ss9.png)
Plots and counts
Right-click a column header in the results to run commands such as box plots and scatter plots.
![](http://thedatapup.com/home/wp-content/uploads/2020/10/pcss1.png)
Here I’ve chosen Int_pax_in for the Y axis value.
![](http://thedatapup.com/home/wp-content/uploads/2020/10/pcss2.png)
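A box plot is drawn from a column's quartiles, so the numbers behind it can be checked directly in pandas. The sketch below computes them for a made-up Int_pax_in column; the 1.5 × IQR whisker limits follow the common Tukey convention, which I am assuming rather than taking from Datapup itself.

```python
import pandas as pd

# Hypothetical toy values for the Int_pax_in column.
df_pax = pd.DataFrame({"Int_pax_in": [100, 200, 300, 400, 500]})

# Quartiles: the box spans q1 to q3, with a line at the median.
q1, median, q3 = df_pax["Int_pax_in"].quantile([0.25, 0.5, 0.75])

# Interquartile range and the usual whisker limits (assumed convention).
iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

print(q1, median, q3, iqr, lower_limit, upper_limit)
```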
Importing CSV using the GUI
Back in the main window, choose DataFrames > Import CSV from the menu to import CSV files without writing Python. Here I’ve named the DataFrame covidnsw and accepted the defaults.
![](http://thedatapup.com/home/wp-content/uploads/2020/10/impss1.png)
Datapup has automatically updated the Python script (since I left the update Python script option checked).
![](http://thedatapup.com/home/wp-content/uploads/2020/10/impss2.png)
Back in the main window, add some additional SQL to query the new DataFrame covidnsw, and execute it.
![](http://thedatapup.com/home/wp-content/uploads/2020/10/impss4.png)
Right-click the age_group column and choose value counts.
![](http://thedatapup.com/home/wp-content/uploads/2020/10/impss5.png)
![](http://thedatapup.com/home/wp-content/uploads/2020/10/impss6.png)
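The value counts command matches what pandas calls `value_counts`. Here is a minimal sketch on a made-up covidnsw DataFrame; the age group labels are hypothetical and will not match the real dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for the covidnsw DataFrame.
covidnsw = pd.DataFrame(
    {"age_group": ["20-29", "30-39", "20-29", "40-49", "20-29", "30-39"]}
)

# Count how many rows fall into each age group, most frequent first.
counts = covidnsw["age_group"].value_counts()
print(counts)
```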