{"id":319,"date":"2020-10-09T07:28:47","date_gmt":"2020-10-09T07:28:47","guid":{"rendered":"http:\/\/thedatapup.com\/home\/?page_id=319"},"modified":"2021-01-15T05:47:07","modified_gmt":"2021-01-15T05:47:07","slug":"sql-equivalent-quick-reference-for-python-and-r","status":"publish","type":"page","link":"https:\/\/thedatapup.com\/home\/index.php\/code-tips\/sql-equivalent-quick-reference-for-python-and-r\/","title":{"rendered":"SQL Statement equivalents in Python and R"},"content":{"rendered":"\n<p>A Python\/R cheat sheet for SQL developers.  The variable &#8220;df&#8221; is the data frame \/ table name.  For python code, pandas has been imported as pd, and numpy as np.  For R, some code assumes dplyr is loaded via libary(dplyr).<\/p>\n\n\n\n<div class=\"wp-block-group\"><div class=\"wp-block-group__inner-container\">\n<h3>Select individual column<\/h3>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"sql\" data-enlighter-theme=\"dracula\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"ind1\">select column_name from df<\/pre>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"dracula\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"ind2\">df.column_name\n\n# Alternative \ndf['column_name']\n\n# Using a variable\nmyvar = 'column_name'\ndf[myvar]<\/pre>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"r\" data-enlighter-theme=\"dracula\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"ind2\">df$column_name\n\n# Alternative, needed if column name has spaces\/weird characters\ndf$`column_name`\n\n# Alternative\ndf[, 'column_name']\n\n# Using a variable\nmyvar &lt;- 'column_name'\ndf[, myvar]<\/pre>\n\n\n\n<h3>Select multiple columns<\/h3>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"sql\" data-enlighter-theme=\"dracula\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"bs1\">select column_a, column_b, column_c from df<\/pre>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"dracula\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"bs2\"># By name\ndf[['column_a', 'column_b', 'column_c']]\n\n# By index, where they are the first three columns\ndf.iloc[:, 0:4]\n<\/pre>\n\n\n\n<div class=\"wp-block-group\"><div class=\"wp-block-group__inner-container\">\n<div class=\"wp-block-group\"><div class=\"wp-block-group__inner-container\">\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"r\" data-enlighter-theme=\"dracula\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"bs2\"># By name\ndf[, c('column_a', 'column_b', 'column_c')]\n\n# By name using dplyr\ndf %>% select(column_a, column_b, column_c)\n\n# By index, where they are the first three columns\ndf[, 1:3]<\/pre>\n\n\n\n<h3>Top 10 rows<\/h3>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"sql\" data-enlighter-theme=\"dracula\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"head1\">select * from df limit 10<\/pre>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"dracula\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"head2\">df.head(10)<\/pre>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"r\" data-enlighter-theme=\"dracula\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"head2\"># standard way\nhead(df, 10)\n\n# with dplyr\ndf %>% head(10)<\/pre>\n\n\n\n<h3>Unique Values and count for a column<\/h3>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"sql\" data-enlighter-theme=\"dracula\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"unq1\"># Just the values\nselect distinct(column_name) from df\n\n# Values with a count\nselect column_name, count(*) from df\ngroup by column_name<\/pre>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"dracula\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"unq2\"># Just the values\ndf.column_name.unique()\n\n# Values with a count\ndf.column_name.value_counts()<\/pre>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"r\" data-enlighter-theme=\"dracula\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"unq2\"># Just the values\nunique(df$column_name)\n\n# Values with a count\ntable(df$column_name)\n\n# Values with a count using dplyr\ndf %>% select(column_name) \n   %>% count(column_name)\n\n<\/pre>\n\n\n\n<h3>Filtering by column value<\/h3>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"sql\" data-enlighter-theme=\"dracula\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"filt1\"># Equal to a specified value\nselect * from df where col_name = 'value'\n\n# Equal to a value in a set\nselect * from df where col_name in ('value1', 'value2')\n\n# Value is empty\nselect * from df where col_name is null\n\n# Find a word within a string\nselect * from df where col_string like '%word%'\n\n# Filter by numeric value greater than 100\nselect * from df where col_numeric > 100<\/pre>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"dracula\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"filt2\"># Equal to a specified value\ndf[df.col_name == 'value']\n\n# Equal to a value in a set\ndf[df.col_name.isin(['value1', 'value2'])]\n\n# Value is empty\ndf[df.col_name.isnull()]\n\n# Find a word within a string\ndf[df.col_string.str.contains('word') == True]\n\n# Filter by numeric value greater than 100\ndf[df.col_numeric > 100]<\/pre>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"r\" data-enlighter-theme=\"dracula\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"filt2\"># Equal to a specified value\ndf[df$col_name == 'value', ]\n\n# Equal to a value in a set\ndf[df$col_name %in% c('value1', 'value2'), ]\n\n# Value is empty\ndf[is.na(df$col_name), ]\n\n# Find a word within a string\ndf[grepl('word', df$col_string, fixed=T), ]\n\n# Filter by numeric value greater than 100\ndf[df$col_name > 100, ]<\/pre>\n\n\n\n<h3>Multi column and multi table\/dataframe conditions<\/h3>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"sql\" data-enlighter-theme=\"dracula\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"mult1\"># Condition between two columns of same table\nselect * from df where col_a != col_b\n\n# Inner join between two tables\nselect df1.col_a, df1.col_b, df2.col_x, df2.col_y from df1, df2\nwhere df1.col_common = df2.col_common <\/pre>\n\n\n\n<p><\/p>\n<\/div><\/div>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"dracula\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"mult2\"># Condition between two columns of same table\ndf[df['col_a'] != df['col_b']]\n\n# Inner join between two tables\npd.merge(df1, df2, on='col_common', how='inner')[['col_a', 'col_b', 'col_x', 'col_y']]<\/pre>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"r\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"mult2\"># Condition between two columns of same table\ndf[df$col_a != df$col_b, ]\n\n# Inner join between two tables (uses dplyr)\nmerge(df1, df2, by=\"col_common\") %>% \n  select(col_a, col_b, col_x, col_y)<\/pre>\n\n\n\n<h3>Update and Alter Statements<\/h3>\n<\/div><\/div>\n\n\n\n<p><\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"sql\" data-enlighter-theme=\"dracula\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"upd1\"># Setting values that are null to fill_value\nupdate df set column_name = 'fill_value' where column_name is null\n\n# Setting values conditionally\nupdate df set column_name = 'new_value'\nwhere column_name in ('old_value1', 'old_value2')\n\n# Conditional delete of rows\ndelete from df where column_name = 'some_value'\n\n# Create copy of column old_column in new column called new_column \n# (Oracle syntax - other DBs differ)\nalter table df add (new_column varchar2(50));\nupdate df set new_column = old_column;\n\n# Renaming a column (Oracle syntax - other DBs differ)\nalter table df rename column \"old column name\" \"newcolumnname\" \n<\/pre>\n\n\n\n<p><\/p>\n<\/div><\/div>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"dracula\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"upd2\"># Setting values that are null to fill_value\ndf['column_name'] = df['column_name'].fillna('fill_value')\n\n# Setting values conditionally\ndf.loc[df['column_name'].isin(['old_value1', 'old_value2']), \n       'column_name'] = 'new_value'\n\n# Conditional delete of rows\ndf.drop(np.where(df['column_name'] == 'some_value')[0], inplace=True)\n\n# Create copy of column old_column in new column called new_column \ndf['new_column'] = df['old_column']\n\n# Renaming a column\ndf.rename(columns = {'old column name':'newcolumnname'}, inplace=True)<\/pre>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"r\" data-enlighter-theme=\"dracula\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"upd2\"># Setting values that are null to fill_value\ndf$column_name[is.na(df$column_name)] &lt;- 'fill_value'\n\n# Setting values conditionally\ndf$column_name[df$column_name %in% c('old_value1', 'old_value2')] &lt;- 'new_value'\n\n# Conditional delete of rows\ndf &lt;- df[df$column_name != 'some_value',]\n\n# Create copy of column old_column in new column called new_column \ndf$new_column &lt;- df$old_column\n\n# Create copy of column old column in new column called new_column (i.e. deal with spaces)\ndf$new_column = df$`old column`\n\n# Renaming a column\nnames(df)[names(df) == \"old column name\"] &lt;- \"newcolumnname\"\n\n# Renaming a column with dplyr\ndf &lt;- df %>% rename(\"newcolumname\" = \"old column name\")   \n\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>A Python\/R cheat sheet for SQL developers. The variable &#8220;df&#8221; is the data frame \/ table name. For python code, pandas has been imported as pd, and numpy as np. For R, some code assumes dplyr is loaded via libary(dplyr). Select individual column Select multiple columns Top 10 rows Unique Values and count for a &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/thedatapup.com\/home\/index.php\/code-tips\/sql-equivalent-quick-reference-for-python-and-r\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;SQL Statement equivalents in Python and R&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":212,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":[],"_links":{"self":[{"href":"https:\/\/thedatapup.com\/home\/index.php\/wp-json\/wp\/v2\/pages\/319"}],"collection":[{"href":"https:\/\/thedatapup.com\/home\/index.php\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/thedatapup.com\/home\/index.php\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/thedatapup.com\/home\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/thedatapup.com\/home\/index.php\/wp-json\/wp\/v2\/comments?post=319"}],"version-history":[{"count":67,"href":"https:\/\/thedatapup.com\/home\/index.php\/wp-json\/wp\/v2\/pages\/319\/revisions"}],"predecessor-version":[{"id":526,"href":"https:\/\/thedatapup.com\/home\/index.php\/wp-json\/wp\/v2\/pages\/319\/revisions\/526"}],"up":[{"embeddable":true,"href":"https:\/\/thedatapup.com\/home\/index.php\/wp-json\/wp\/v2\/pages\/212"}],"wp:attachment":[{"href":"https:\/\/thedatapup.com\/home\/index.php\/wp-json\/wp\/v2\/media?parent=319"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}