From 3b2554e5db2b5a80d988679c5f5def40bbc356e1 Mon Sep 17 00:00:00 2001 From: Maneesha Sane Date: Tue, 16 Dec 2025 14:33:03 -0500 Subject: [PATCH] assorted typos/spelling/grammar/etc --- episodes/01-relational-database.md | 4 ++-- episodes/02-db-browser.md | 6 +++--- episodes/03-select.md | 2 +- episodes/04-missing-data.md | 6 +++--- episodes/05-creating-new-columns.md | 10 +++++----- episodes/06-aggregation.md | 8 ++++---- episodes/07-creating-tables-views.md | 14 +++++++------- episodes/08-sqlite-command-line.md | 10 +++++----- episodes/09-joins.md | 22 +++++++++++----------- episodes/10-other-environments.md | 8 ++++---- instructors/instructor-notes.md | 2 +- 11 files changed, 46 insertions(+), 46 deletions(-) diff --git a/episodes/01-relational-database.md b/episodes/01-relational-database.md index ae2db47d..f5ae1a41 100644 --- a/episodes/01-relational-database.md +++ b/episodes/01-relational-database.md @@ -36,7 +36,7 @@ Databases are designed to allow efficient querying against very large tables, mo ## What is a table? -As were have noted above, a single table is very much like a spreadsheet. It has rows and it has columns. A row represents a single observation and the columns represents the various variables contained within that observation. +As we have noted above, a single table is very much like a spreadsheet. It has rows and it has columns. A row represents a single observation and the columns represent the various variables contained within that observation. Often one or more columns in a row will be designated as a 'primary key' This column or combination of columns can be used to uniquely identify a specific row in the table. The columns typically have a name associated with them indicating the variable name. A column always represents the same variable for each row contained in the table. Because of this the data in each column will always be of the same *type*, such as an Integer or Text, of values for all of the rows in the table. 
Datatypes are discussed in the next section. @@ -108,7 +108,7 @@ for these and use the built-in Date And Time Functions to manipulate them. We wi ## Why do tables have primary key columns? -Whenever you create a table, you will have the option of designating one of the columns as the primary key column. The main property of the primary key column is that the values contained in it must uniquely identify that particular row. That is you cannot have duplicate primary keys. This can be an advantage which adding rows to the table as you will not be allowed to add the same row (or a row with the same primary key) twice. +Whenever you create a table, you will have the option of designating one of the columns as the primary key column. The main property of the primary key column is that the values contained in it must uniquely identify that particular row. That is you cannot have duplicate primary keys. This can be an advantage when adding rows to the table as you will not be allowed to add the same row (or a row with the same primary key) twice. The primary key column for a table is usually of type Integer although you could have Text. For example if you had a table of car information, then the "Reg\_No" column could be made the primary key as it can be used to uniquely identify a particular row in the table. diff --git a/episodes/02-db-browser.md b/episodes/02-db-browser.md index ab2610b6..1daccfe2 100644 --- a/episodes/02-db-browser.md +++ b/episodes/02-db-browser.md @@ -52,7 +52,7 @@ We will make a couple of initial changes to the layout of the screen. These will ![](fig/DB_Browser_run_2.png){alt='Data Browser Preferences'} -Towards the bottom there is a section dealing with Field colors. You will see three bars below the word Text, to the right there are in fact three invisible bars for the Background. Click in the area for the Background color for NULL. A colour selector window will open, select Red. The bar will turn Red. 
This is now the default background cell colour that will be used to display NULL values in you tables. We will discuss the meaning of NULL values in a table in a later episode. +Towards the bottom there is a section dealing with Field colors. You will see three bars below the word Text, to the right there are in fact three invisible bars for the Background. Click in the area for the Background color for NULL. A colour selector window will open, select Red. The bar will turn Red. This is now the default background cell colour that will be used to display NULL values in your tables. We will discuss the meaning of NULL values in a table in a later episode. You can now close the preference window by clicking OK. @@ -76,7 +76,7 @@ These are the same actions that are available from the toolbar at the top of the ![](fig/DB_Browser_run_3.png){alt='Table Actions'} If you select 'Browse Table', the data from the table is loaded into the 'Browse Data' pane from where it can be examined or filtered. -You can also select the table you wish to Browse directly from here. +You can also select the table you wish to browse directly from here. There are options for 'New Record' and 'Delete Record'. As our interest is in analysing existing data not creating or deleting data, it is unlikely that you will want to use these options. @@ -97,7 +97,7 @@ The second pane has the tabular results, and the bottom pane has a message indic On the toolbar at the top there are eight buttons. 
Left to right they are: - Open Tab (creates a new tab in the editor) -- Open SQL file (allows you to load a prepared file of SQL into the editor - the tab takes the name of he file) +- Open SQL file (allows you to load a prepared file of SQL into the editor - the tab takes the name of the file) - Save SQL file (allows you to save the current contents of the active pane to the local file system) - Execute SQL (Executes all of the SQL statements in the editor pane) - Execute current line (Actually executes whatever is selected) diff --git a/episodes/03-select.md b/episodes/03-select.md index 79571261..09efdd3c 100644 --- a/episodes/03-select.md +++ b/episodes/03-select.md @@ -172,7 +172,7 @@ WHERE B17_parents_liv = 'yes' ; ``` -Notice that the columns being used in the `WHERE` clause do not need to returned as part of the `SELECT` clause. +Notice that the columns being used in the `WHERE` clause do not need to be returned as part of the `SELECT` clause. You can ensure the precedence of the operators by using brackets. Judicious use of brackets can also aid readability diff --git a/episodes/04-missing-data.md b/episodes/04-missing-data.md index 263d304b..ad77d263 100644 --- a/episodes/04-missing-data.md +++ b/episodes/04-missing-data.md @@ -23,7 +23,7 @@ exercises: 0 At the beginning of this lesson we noted that all database systems have the concept of a NULL value; Something which is missing and nothing is known about it. In DB Browser we can choose how we want NULLs in a table to be displayed. When we had our initial look at DB Browser, -we used the `View | Preference` option to change the background colour of cells in a table which has a `NULL` values as **red**. +we used the `View | Preference` option to change the background colour of cells in a table which has `NULL` values as **red**. The example below, using the 'Browse data' tab, shows a section of the Farms table in the SQL\_SAFI database showing column values which are `NULL`. 
![](fig/SQL_04_Nulls_01.png){alt='Farms NULLs'} @@ -78,10 +78,10 @@ the value of `NULL` is appropriate. ## Dealing with missing data -There are several statistical techniques that can be used to allow for `NULL` values, which one you might will depend on what has caused the `NULL` value to be recorded. +There are several statistical techniques that can be used to allow for `NULL` values. Which one you might use will depend on what has caused the `NULL` value to be recorded. You may want to change the `NULL` value to something else. For example if we knew that the `NULL` values in the `F14_items_owned` column actually meant that the Farmer had no possessions then we -might want to change the `NULL` values to '[]' to represent and empty list. We can do that in SQL with an `UPDATE` query. +might want to change the `NULL` values to '[]' to represent an empty list. We can do that in SQL with an `UPDATE` query. The update query is shown below. We are not going to run it as it would change our data. You need to be very sure of the effect you are going to have before you change data in this way. diff --git a/episodes/05-creating-new-columns.md b/episodes/05-creating-new-columns.md index 9cbe858e..7d07b427 100644 --- a/episodes/05-creating-new-columns.md +++ b/episodes/05-creating-new-columns.md @@ -65,7 +65,7 @@ Full details of the available built-in functions are available from the SQLite.o We will look at some of the arithmetic and statistical functions when we deal with aggregations in a later lesson. -You may have noticed in the output from are last query that the number of decimal places can change from one row to another. In order to make the output +You may have noticed in the output from our last query that the number of decimal places can change from one row to another. In order to make the output more tidy, we may wish to always produce the same number of decimal places, e.g. 2. We can do this using the `ROUND` function. 
The `ROUND` function works in a similar way as its spreadsheet equivalent, you specify the value you wish to round and the required number of decimal places. @@ -113,10 +113,10 @@ sometimes with different names. | substr(a,b,c) | mid(a,b,c) | | instr(a,b) | find(a,b) | -`instr` can be used to check a character or string of characters occurs within another string. +`instr` can be used to check if a character or string of characters occurs within another string. `substr` can be used to extract a portion of a string based on a starting position and the number of characters required. -In the Farms table, the three columns A01\_interview\_date, A04\_start and A05\_end are all recognisable as a dates with the A04\_start and A05\_end also including times. +In the Farms table, the three columns A01\_interview\_date, A04\_start and A05\_end are all recognisable as dates with the A04\_start and A05\_end also including times. These last two are automatically generated by the eSurvey software when the data is collected, i.e. they are automatically entered. The A01\_interview\_date however is manually input. In all three cases however SQLite thinks that they are all just strings of characters. We can confirm this by selecting the `Database Structure` tab and expanding the `Farms` entry and notice that the data type for all three columns is listed as 'TEXT' @@ -268,7 +268,7 @@ ORDER BY year, month, day ``` By default the `ORDER BY` clause will sort in ascending order, smallest to -biggest; we can make this explicit by usingthe `ASC` keyword. Or if we want to +biggest; we can make this explicit by using the `ASC` keyword. Or if we want to sort in descending order we can use the `DESC` keyword. ```sql @@ -296,7 +296,7 @@ FROM Farms ; ``` -There is a more general form which allows to to perform any kind of test. +There is a more general form which allows us to perform any kind of test. 
## Using SQL syntax to create ‘binned' values diff --git a/episodes/06-aggregation.md b/episodes/06-aggregation.md index 161c6541..840e1769 100644 --- a/episodes/06-aggregation.md +++ b/episodes/06-aggregation.md @@ -22,7 +22,7 @@ exercises: 10 ## Using built-in statistical functions -Aggregate functions are used perform some kind of mathematical or statistical calculation across a group of rows. The rows in each group are determined +Aggregate functions are used to perform some kind of mathematical or statistical calculation across a group of rows. The rows in each group are determined by the different values in a specified column or columns. Alternatively you can aggregate across the entire table. If we wanted to know the minimum, average and maximum values of the 'A11\_years\_farm' column across the whole Farms table, we could write a query such as this; @@ -76,7 +76,7 @@ We get ![](fig/SQL_06_villages.png){alt='Villages'} -The problem with allowing free-form text quite obvious. Having two villages, one called 'Massequece' and the other called 'Massequese' is unlikely. +The problem with allowing free-form text may be quite obvious. Having two villages, one called 'Massequece' and the other called 'Massequese' is unlikely. Detecting this type of problem in a large dataset can be very difficult if you are just 'eyeballing' the content. This small SQL query makes it very clear, and in the OpenRefine lesson we provide approaches to detecting and correcting such errors. SQL is not the best tool for correcting this type of error. @@ -110,7 +110,7 @@ ORDER BY A06_province, A07_district, A08_ward, A09_village; ## The `GROUP BY` clause to summarise data -Just knowing the combinations is of limited use. You really want to know **How many** of each of the values there are. +Just knowing the combinations is of limited use. You really want to know **how many** of each of the values there are. To do this we use the `GROUP BY` clause. 
```sql @@ -124,7 +124,7 @@ This query tells us how many records in the table have each different value in t In the first example of this episode, three aggregations were performed over the single column 'A11\_years\_farm'. In addition to calculating multiple aggregation values over a single column, it is also possible to aggregate over multiple columns by specifying -them in all in the `SELECT` clause **and** the `GROUP BY` clause. +them all in the `SELECT` clause **and** the `GROUP BY` clause. The grouping will take place based on the order of the columns listed in the `GROUP BY` clause. There will be one row returned for each unique combination of the columns mentioned in the `GROUP BY` clause diff --git a/episodes/07-creating-tables-views.md b/episodes/07-creating-tables-views.md index 5c862169..b56ea27b 100644 --- a/episodes/07-creating-tables-views.md +++ b/episodes/07-creating-tables-views.md @@ -77,16 +77,16 @@ If any of the datatypes are not as expected or wanted we can change them. In this particular case DB Browser correctly selected the datatypes. Notice that the `A01_interview_date` was allocated a datatype of 'TEXT'. This isn't a problem as we have to use the Date and Time functions to manipulate dates anyway. -Notice that the bottom pane in the Window shows the SQL DDL statement that would create the table that you modifying. +Notice that the bottom pane in the Window shows the SQL DDL statement that would create the table that you are modifying. When you change one of the columns from TEXT to INTEGER, this is immediately reflected in the Create Table statement. It is slightly misleading because in fact we are modifying an existing table and in SQL-speak, this would be an **Alter Table...** statement. However it does illustrate quite well the fact that whatever you do in the GUI, it is essentially translated into an SQL statement and executed. 
-You could copy and paste this definition into the SQL editor and if you change the table name before you ran it, you would create a new table with that name. +You could copy and paste this definition into the SQL editor and if you changed the table name before you ran it, you would create a new table with that name. This new table would have no data in it. This is how the insert table wizard works. It uses the header row from your data to create a `CREATE TABLE` statement which it runs. It then transforms each of the rows of data into SQL `INSERT INTO...` statements which it also runs to get the data into the table. -In addition to changing the data types there are several other options which can be set when you are creating of modifying a table. +In addition to changing the data types there are several other options which can be set when you are creating or modifying a table. For our tables we don't need to make use of them but for completeness we will describe what they are; **PK** - Or Primary Key, a unique identifier for the row. In the Farms table, there is an `Id` column which uniquely identifies a Farm. @@ -98,7 +98,7 @@ This could act as a unique identifier for the row as a whole. We could mark thi In real datasets missing values are quite common and we have already looked at ways of dealing with them when they occur in tables. If you were to **check** this box and the data did have missing values for this column, the record from the file would be rejected and the load of the file will fail. -**U** - Or Unique. This allows you to say that the contents of the column, which is not the primary key column has to have unique values in it. Like Allow Null this is another way of providing some data validation as the data is imported. Although it doesn't really apply with the DB Browser import wizard as the data is imported before you are allowed to set this option. +**U** - Or Unique. 
This allows you to say that the contents of the column, which is not the primary key column, has to have unique values in it. Like Allow Null, this is another way of providing some data validation as the data is imported (although it doesn't really apply with the DB Browser import wizard as the data is imported before you are allowed to set this option). **Default** - This is used in conjunction with 'Not Null', if a value is not provided in the dataset, then if provided, the default value for that column will be used. @@ -133,7 +133,7 @@ line added. ## Creating a table using an SQL command -You could copy and paste this definition into the SQL editor and if you change the table name before you ran it, you would create a new table with that name. +You could copy and paste this definition into the SQL editor and if you changed the table name before you ran it, you would create a new table with that name. This new table would have no data in it. This is how the insert table wizard works. It uses the header row from your data to create a `CREATE TABLE` statement which it runs. It then transforms each of the rows of data into SQL `INSERT INTO...` statements which it also runs to get the data into the table. @@ -172,7 +172,7 @@ SELECT Id, FROM Farms; ``` -If we wanted to create a table from the Crops table which contains only the rows where the D\_curr\_crop value was 'rice' we could use a query like this: +If we wanted to create a table from the Crops table, which contains only the rows where the D\_curr\_crop value was 'rice' we could use a query like this: ```sql CREATE TABLE crops_rice AS @@ -215,7 +215,7 @@ The advantage of using Views is that it allows you to restrict how you see the d In the example we used above it may be far easier to work with only the 6 columns that we need from the full Farms table rather than the full table with 61 columns. -A View isn't restricted to simple `SELECT` statements it can be the result of aggregations and joins as well. 
+A View isn't restricted to simple `SELECT` statements. It can be the result of aggregations and joins as well. This can help reduce the complexity of queries based on the View and so aid readability. :::::::::::::::::::::::::::::::::::::::: keypoints diff --git a/episodes/08-sqlite-command-line.md b/episodes/08-sqlite-command-line.md index 0a6e7575..4b13cfd1 100644 --- a/episodes/08-sqlite-command-line.md +++ b/episodes/08-sqlite-command-line.md @@ -26,7 +26,7 @@ I will assume that you have added the location of the program to your local PATH will make it easier to refer to the database file and other files we may want to use. The instructions in this episode are written from a Windows user perspective. If you are using Linux or a Mac, -open a terminal window instead a command prompt. +open a terminal window instead of a command prompt. 1. Open a command prompt (cmd.exe) and 'cd' to the folder location of the SQL\_SAFI.sqlite database file. 2. run the command 'sqlite3' This should open the SQLite shell and present a screen similar to that below. @@ -42,7 +42,7 @@ open a terminal window instead a command prompt. It is important to remember the .sqlite suffix, otherwise a new database simply called SQL\_SAFI would be created 4. Once the database is opened you can run queries by typing directly in the shell. Unlike in DB Browser, - you must always terminate your select command with a ";". This is how the shell knows that **You** think the statement is complete. Although easy to forget, it generally works to your advantage as it allows you to split a long query command across lines as you did in the DB Browser application. + you must always terminate your select command with a ";". This is how the shell knows that **you** think the statement is complete. Although easy to forget, it generally works to your advantage as it allows you to split a long query command across lines as you did in the DB Browser application. 
![](fig/SQL_08_SQLite_shell_query_example.png){alt='SQLite shell query example'} @@ -81,7 +81,7 @@ The file will be created if needed or it will overwrite an already existing file ![](fig/SQL_08_SQLite_shell_dot_commands.png){alt='SQLite shell dot commands'} -Yes you can have a file called "my.filename" if you want. The contents of which contains the expected output from the query. +Yes you can have a file called "my.filename" if you want, the contents of which contain the expected output from the query. ![](fig/SQL_08_my_filename.png){alt='SQLite my.filename'} @@ -91,7 +91,7 @@ Notice the use of quotes in the rows where the value of the data item themselves So far we have used the shell in much the same way as we might have used the DB Browser application. We run the program, connect to a database, run a query and save the output. -Because the shell will accept any valid SQL statements as well as have numerous 'dot' commands of it own +Because the shell will accept any valid SQL statements as well as have numerous 'dot' commands of its own to configure how it works it could be considered as powerful as the DB Browser application. You could use it as a replacement in most cases. @@ -122,7 +122,7 @@ Notice that there is no output to the screen and that the shell is closed. The r There are two key advantages of using this approach. -1. It aids automation. It would be straightforward to have the one line command line instruction to be run automatically, perhaps on a timed basis. The SQL statements in the executed file doesn't have to be a simple query. It could be appending rows of data to a series of tables which become available on a regular basis. +1. It aids automation. It would be straightforward to have the one line command line instruction to be run automatically, perhaps on a timed basis. The SQL statements in the executed file don't have to be a simple query. It could be appending rows of data to a series of tables which become available on a regular basis. 
2. It aids reproducibility. Although it is convenient to use the DB Browser application to play around and try things out, eventually you will decide on approach, create relevant queries to perform your analysis or research and at this point you will need to ensure that the complete sequence is documented and is reproducible. This is what the file of SQLite commands will do for you. diff --git a/episodes/09-joins.md b/episodes/09-joins.md index 19c8a323..bedbf3db 100644 --- a/episodes/09-joins.md +++ b/episodes/09-joins.md @@ -76,17 +76,17 @@ ON a.Id = b.Id AND a.B_no_membrs > 12 AND b.D_curr_crop = 'maize' There are several things to notice about this query: -1. We have used alias' for the table names in the same way as we used with columns in a previous lesson. In this case though, it is not to provide - more meaningful names, in fact alias' for tables are often single letters to save key strokes. +1. We have used aliases for the table names in the same way as we used with columns in a previous lesson. In this case though, it is not to provide +more meaningful names, in fact aliases for tables are often single letters to save key strokes. 2. We use the table alias as a prefix, plus a '.' when we refer to a column name from the table. You don't have to do this, but it generally adds clarity to the query. 3. You will need to use an alias when you need to refer to a column with the same name in both tables. In our case we need to compare the `Id` column in both tables. -4. In the select clause, we list all of the columns, from both table that we want in the output. We use the alias' for clarity. If the column name is not ambiguous, i.e it only occurs in one of the tables it - can be omitted, but as we have said it is better to leave it in for clarity. +4. In the select clause, we list all of the columns, from both tables that we want in the output. We use the aliases for clarity.
If the column name is not ambiguous, i.e it only occurs in one of the tables it +can be omitted, but as we have said it is better to leave it in for clarity. 5. The name of the second table is given in the `JOIN` clause. 6. The conditions of the `JOIN` are given in the `ON` clause. The `ON` clause is very much like a `WHERE` clause, in that you specific expressions which restrict what rows are output. In our example we have three expressions. The last two are the individual expressions we used in the previous, single table queries. The first expression `a.Id = b.Id` is the expression which determines how we want the two tables to be joined. - We are only interested in rows from both table where the `Id` values match. + We are only interested in rows from both tables where the `Id` values match. When we run this query we get output like the following: @@ -140,7 +140,7 @@ Although typically the values being matched from the first table are a unique (D to be unique. This is why in the results of our previous query there are two entries with Id 111. In the second table there are two records with Id 111 and so the record from the first table gets combined with both the records in the second table and two records are output. -Because every Farm grows some crops, there will be at least one record for each Id output. I for whatever reason the was a Farm with no crops +Because every Farm grows some crops, there will be at least one record for each Id output. If for whatever reason there was a Farm with no crops then there would be no record output for that Farm Id. Similarly if there was an entry in the Crops table with an Id which didn't match any of the Ids in the Farms table, then it would not be output. There is only an output record when the two columns have matching values. 
@@ -158,8 +158,8 @@ The relational design makes use of multiple tables as a way of avoiding repetiti | Join Type | What it does | | ---------------- | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | Inner Join | Matched rows in both tables are returned | -| Left outer join | All row in the left hand table are returned along with the matches from the right hand table or NULLs if there is no match | -| Right outer join | All row in the right hand table are returned along with the matches from the left hand table or NULLs if there is no match | +| Left outer join | All rows in the left hand table are returned along with the matches from the right hand table or NULLs if there is no match | +| Right outer join | All rows in the right hand table are returned along with the matches from the left hand table or NULLs if there is no match | | Full outer join | All rows from both tables are returned, with NULLs where there are no matches | | Cross join | Each row in the first table will be matched with every row in the second table. It is possible to imagine situations where this is required but in most cases it is a mistake and un-intended. | @@ -180,7 +180,7 @@ However it will also be important for you to establish rows in both of the table - You may not care that some are missing - You may need to explain why some are missing -To do this you will want to use a `FULL OUTER JOIN` or in the case of SQLite a `LEFT OUTER JOIN` run twice using both tables in the `FROM` and `JOIN` clauses. We can demonstrate ability +To do this you will want to use a `FULL OUTER JOIN` or in the case of SQLite a `LEFT OUTER JOIN` run twice using both tables in the `FROM` and `JOIN` clauses. We can demonstrate the `LEFT OUTER JOIN` using the Crops\_rice table we created earlier. 
The query below is similar to our original join except that we are now joining with the crops\_rice table and we have dropped the additional criteria. @@ -203,7 +203,7 @@ table are shown as NULL. Joins are not restricted to just two tables. You can have any number, but the more you have the more unreadable the SQL query is likely to become. Quite often you can create views to hide this complexity. Our original question was: 'Which Farms with more than 12 people in the household grow Maize?' We found the number of people in the household from the Farms table and the crops they grew in the crops table. -Suppose we now wanted to change the question to be: For Farms with more than 12 people in the household how much land is devoted to growing Maize? In addition to the previous +Suppose we now wanted to change the question to be: For Farms with more than 12 people in the household, how much land is devoted to growing Maize? In addition to the previous requirements we now also need the size of the plots growing maize. This information is only contained in the `plots` table. The `plots` table has both an Id column which we can use to join it with the Farms column. There is also a plot\_Id column which is used to indicate the number of the plot within the Farm. The `crops` table also has a plot\_id column used for the same purpose. @@ -229,7 +229,7 @@ Things to notice: 1. There is a `JOIN` clause for each of the additional tables 2. But there is only one `ON` clause containing all of the needed criteria. -3. The two criteria in brackets represents the join of the `plots` table to the `Crops` table. (The brackets aren't needed, I just added them for clarity). +3. The two criteria in brackets represent the join of the `plots` table to the `Crops` table. (The brackets aren't needed, I just added them for clarity). 
The results look like this: diff --git a/episodes/10-other-environments.md b/episodes/10-other-environments.md index 1dbd691f..51a631de 100644 --- a/episodes/10-other-environments.md +++ b/episodes/10-other-environments.md @@ -28,7 +28,7 @@ ODBC - Open Database Connectivity (or Connector) is a piece of software, often The installation of the SQLite ODBC driver for a Windows machine is explained in the [SQL setup document](../learners/setup.md) . -So far in this lesson we have accessed our SQLite database either through the DB Browser application or directly using the command line shell. Each of these methods have their own advantages. The DB Browser application provides a simple GUI (Graphical User Interface), for development and testing new queries. The shell aids automation of tasks such as adding rows to a table or allowing whole scripts of SQL commands to be run consecutively without user intervention. In both of these methods, we have seen that the 'outputs' can be saved to `csv` files from where they can be read into other applications or programs for futher processing. Using ODBC misses out the middle man (the file of output). The application or program connects directly to the SQLite database, sends it an SQL query, receives the output from that query and processes the results in an appropriate fashion. +So far in this lesson we have accessed our SQLite database either through the DB Browser application or directly using the command line shell. Each of these methods has its own advantages. The DB Browser application provides a simple GUI (Graphical User Interface), for development and testing new queries. The shell aids automation of tasks such as adding rows to a table or allowing whole scripts of SQL commands to be run consecutively without user intervention. In both of these methods, we have seen that the 'outputs' can be saved to `csv` files from where they can be read into other applications or programs for further processing.
Using ODBC misses out the middle man (the file of output). The application or program connects directly to the SQLite database, sends it an SQL query, receives the output from that query and processes the results in an appropriate fashion. In the case of Excel the tabular results of the query are displayed in a worksheet. For R and Python the results are assigned to a suitable variable from where they can be examined or further processed. @@ -74,9 +74,9 @@ Select the Farms table and then click the '>' button to select all of the column They will be displayed in the right hand pane. This is the equivalent of the `SELECT *` SQL clause that we have used before. If you click the '+' button to the left of the table name, a full list of the column names is displayed allowing you to select individual columns for inclusion. Click **Next** -6. Subsequent windows allow you to filter the rows returned, this is equivalent to adding a `WHERE` clause to the query and finally you can have the returned rows sorted, equivalent to a `SORT BY` clause. We shall just default these options. The final window asks us if we want to return the data to Excel or further edit the query we have built up using Microsoft query. We will leave the default action of rturning the data to Excel. Click **Finish** +6. Subsequent windows allow you to filter the rows returned, this is equivalent to adding a `WHERE` clause to the query and finally you can have the returned rows sorted, equivalent to an `ORDER BY` clause. We shall just default these options. The final window asks us if we want to return the data to Excel or further edit the query we have built up using Microsoft query. We will leave the default action of returning the data to Excel. Click **Finish** -The overall effect of this wizard is to construct an SQL query, in this case `SELECT * FROM Farms` send it to the SQLite system to be run and then to recieve back the results.
+The overall effect of this wizard is to construct an SQL query, in this case `SELECT * FROM Farms` send it to the SQLite system to be run and then to receive back the results. ![](fig/SQL_10_return_data.png){alt='SQL\_10\_return\_data'} @@ -120,7 +120,7 @@ data dbClearResult(results) ``` -We will not discuss the working of the code, that is covered in the Python and R lessons. +We will not discuss the working of the code; that is covered in the Python and R lessons. Even without coding experience of these languages, you will be able to spot that in both cases we need to specify a connection string (the SQLite database filename) and also the text of the query itself. In both cases the results are stored in a variable object of the language. diff --git a/instructors/instructor-notes.md b/instructors/instructor-notes.md index fa168ad3..3463d441 100644 --- a/instructors/instructor-notes.md +++ b/instructors/instructor-notes.md @@ -79,7 +79,7 @@ How to automate a script is covered. The need for table joins is discussed The different types of joins is discussed and why you may need to do more than just inner joins to investigate your data. -There are examples of usingthe `join` and `on` SQL syntax. +There are examples of using the `join` and `on` SQL syntax. There is more discussion on Alias'. [Using database tables in other environments](../episodes/10-other-environments.md)