4 changes: 2 additions & 2 deletions episodes/01-relational-database.md
@@ -36,7 +36,7 @@ Databases are designed to allow efficient querying against very large tables, mo

## What is a table?

As were have noted above, a single table is very much like a spreadsheet. It has rows and it has columns. A row represents a single observation and the columns represents the various variables contained within that observation.
As we have noted above, a single table is very much like a spreadsheet. It has rows and it has columns. A row represents a single observation and the columns represent the various variables contained within that observation.
Often one or more columns in a row will be designated as a 'primary key'. This column or combination of columns can be used to uniquely identify a specific row in the table.
The columns typically have a name associated with them indicating the variable name. A column always represents the same variable for each row contained in the table. Because of this, the data in each column will always be of the same *type* of values, such as Integer or Text, for all of the rows in the table. Datatypes are discussed in the next section.

@@ -108,7 +108,7 @@ for these and use the built-in Date And Time Functions to manipulate them. We wi

## Why do tables have primary key columns?

Whenever you create a table, you will have the option of designating one of the columns as the primary key column. The main property of the primary key column is that the values contained in it must uniquely identify that particular row. That is you cannot have duplicate primary keys. This can be an advantage which adding rows to the table as you will not be allowed to add the same row (or a row with the same primary key) twice.
Whenever you create a table, you will have the option of designating one of the columns as the primary key column. The main property of the primary key column is that the values contained in it must uniquely identify that particular row. That is you cannot have duplicate primary keys. This can be an advantage when adding rows to the table as you will not be allowed to add the same row (or a row with the same primary key) twice.

The primary key column for a table is usually of type Integer although you could have Text. For example if you had a table of car information, then the "Reg\_No" column could be made the primary key as it can be used to uniquely identify a particular row in the table.
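
The ideas above can be sketched as a table definition. This is a minimal illustration only; the table name and the columns other than Reg\_No are invented for the example.

```sql
-- A sketch of a car table whose primary key is the registration number
-- (the table name and the non-Reg_No columns are illustrative).
CREATE TABLE Cars (
    Reg_No TEXT PRIMARY KEY,  -- must be unique for every row
    Make   TEXT,
    Model  TEXT
);
```

Attempting to insert two rows with the same Reg\_No value would cause the second insert to be rejected.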

6 changes: 3 additions & 3 deletions episodes/02-db-browser.md
@@ -52,13 +52,13 @@

![](fig/DB_Browser_run_2.png){alt='Data Browser Preferences'}

Towards the bottom there is a section dealing with Field colors. You will see three bars below the word Text, to the right there are in fact three invisible bars for the Background. Click in the area for the Background color for NULL. A colour selector window will open, select Red. The bar will turn Red. This is now the default background cell colour that will be used to display NULL values in you tables. We will discuss the meaning of NULL values in a table in a later episode.
Towards the bottom there is a section dealing with Field colors. You will see three bars below the word Text, to the right there are in fact three invisible bars for the Background. Click in the area for the Background color for NULL. A colour selector window will open, select Red. The bar will turn Red. This is now the default background cell colour that will be used to display NULL values in your tables. We will discuss the meaning of NULL values in a table in a later episode.

You can now close the preference window by clicking OK.

## Opening a database

For this lesson we will be making extensive use of the SQL\_SAFI database. If you do not already have a copy of this database you can download it from [here](data/SQL_SAFI.sqlite).

Check warning on line 61 in episodes/02-db-browser.md (GitHub Actions / Build markdown source files if valid): [uninformative link text]: [here](data/SQL_SAFI.sqlite)

To open the database in DB Browser do the following;

@@ -76,7 +76,7 @@
![](fig/DB_Browser_run_3.png){alt='Table Actions'}

If you select 'Browse Table', the data from the table is loaded into the 'Browse Data' pane from where it can be examined or filtered.
You can also select the table you wish to Browse directly from here.
You can also select the table you wish to browse directly from here.

There are options for 'New Record' and 'Delete Record'. As our interest is in analysing existing data not creating or deleting data, it is unlikely that you will want to use these options.

@@ -97,7 +97,7 @@
On the toolbar at the top there are eight buttons. Left to right they are:

- Open Tab (creates a new tab in the editor)
- Open SQL file (allows you to load a prepared file of SQL into the editor - the tab takes the name of he file)
- Open SQL file (allows you to load a prepared file of SQL into the editor - the tab takes the name of the file)
- Save SQL file (allows you to save the current contents of the active pane to the local file system)
- Execute SQL (Executes all of the SQL statements in the editor pane)
- Execute current line (Actually executes whatever is selected)
2 changes: 1 addition & 1 deletion episodes/03-select.md
@@ -172,7 +172,7 @@ WHERE B17_parents_liv = 'yes'
;
```

Notice that the columns being used in the `WHERE` clause do not need to returned as part of the `SELECT` clause.
Notice that the columns being used in the `WHERE` clause do not need to be returned as part of the `SELECT` clause.

You can ensure the precedence of the operators by using brackets. Judicious use of brackets can also aid readability.
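A small sketch illustrating both points: B17\_parents\_liv comes from the query above, while the table name and the other column names are assumptions made for the example.

```sql
-- The columns tested in WHERE need not appear in the SELECT list,
-- and brackets make the precedence of AND/OR explicit
-- (Farms, B16_years_liv and A11_years_farm are illustrative here).
SELECT Id
FROM Farms
WHERE (B17_parents_liv = 'yes' OR B16_years_liv > 10)
  AND A11_years_farm > 5
;
```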

6 changes: 3 additions & 3 deletions episodes/04-missing-data.md
@@ -23,7 +23,7 @@ exercises: 0
At the beginning of this lesson we noted that all database systems have the concept of a NULL value; something which is missing and about which nothing is known.

In DB Browser we can choose how we want NULLs in a table to be displayed. When we had our initial look at DB Browser,
we used the `View | Preference` option to change the background colour of cells in a table which has a `NULL` values as **red**.
we used the `View | Preference` option to change the background colour of cells in a table which has `NULL` values as **red**.
The example below, using the 'Browse data' tab, shows a section of the Farms table in the SQL\_SAFI database with column values which are `NULL`.

![](fig/SQL_04_Nulls_01.png){alt='Farms NULLs'}
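
A query of the following shape would list such rows directly; the F14\_items\_owned column name is taken from later in this episode.

```sql
-- NULL values must be tested with IS NULL, not with '= NULL'
SELECT Id, F14_items_owned
FROM Farms
WHERE F14_items_owned IS NULL
;
```

Note that `= NULL` would not work here: a comparison with `NULL` never evaluates to true, so SQL provides the special `IS NULL` test.
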
@@ -78,10 +78,10 @@ the value of `NULL` is appropriate.

## Dealing with missing data

There are several statistical techniques that can be used to allow for `NULL` values, which one you might will depend on what has caused the `NULL` value to be recorded.
There are several statistical techniques that can be used to allow for `NULL` values. Which one you might use will depend on what has caused the `NULL` value to be recorded.

You may want to change the `NULL` value to something else. For example if we knew that the `NULL` values in the `F14_items_owned` column actually meant that the Farmer had no possessions then we
might want to change the `NULL` values to '[]' to represent and empty list. We can do that in SQL with an `UPDATE` query.
might want to change the `NULL` values to '[]' to represent an empty list. We can do that in SQL with an `UPDATE` query.

The update query is shown below. We are not going to run it as it would change our data.
You need to be very sure of the effect you are going to have before you change data in this way.
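
The query being described would look something like this sketch (again, not to be run against data you wish to keep unchanged):

```sql
-- Replace missing F14_items_owned values with '[]', an empty list
UPDATE Farms
SET F14_items_owned = '[]'
WHERE F14_items_owned IS NULL
;
```
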
10 changes: 5 additions & 5 deletions episodes/05-creating-new-columns.md
@@ -61,11 +61,11 @@
## Using built-in functions to create new values

In addition to using simple arithmetic operations to create new columns, you can also use some of the SQLite built-in functions.
Full details of the available built-in functions are available from the SQLite.org website [here](https://sqlite.org/lang_corefunc.html#instr).

Check warning on line 64 in episodes/05-creating-new-columns.md (GitHub Actions / Build markdown source files if valid): [uninformative link text]: [here](https://sqlite.org/lang_corefunc.html#instr)

We will look at some of the arithmetic and statistical functions when we deal with aggregations in a later lesson.

You may have noticed in the output from are last query that the number of decimal places can change from one row to another. In order to make the output
You may have noticed in the output from our last query that the number of decimal places can change from one row to another. In order to make the output
more tidy, we may wish to always produce the same number of decimal places, e.g. 2. We can do this using the `ROUND` function.

The `ROUND` function works in a similar way as its spreadsheet equivalent: you specify the value you wish to round and the required number of decimal places.
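
A minimal, self-contained illustration of the function:

```sql
-- ROUND(value, number_of_decimal_places)
SELECT ROUND(10.0 / 3, 2);   -- returns 3.33
```
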
@@ -113,10 +113,10 @@
| substr(a,b,c) | mid(a,b,c) |
| instr(a,b) | find(a,b) |

`instr` can be used to check a character or string of characters occurs within another string.
`instr` can be used to check if a character or string of characters occurs within another string.
`substr` can be used to extract a portion of a string based on a starting position and the number of characters required.

In the Farms table, the three columns A01\_interview\_date, A04\_start and A05\_end are all recognisable as a dates with the A04\_start and A05\_end also including times.
In the Farms table, the three columns A01\_interview\_date, A04\_start and A05\_end are all recognisable as dates with the A04\_start and A05\_end also including times.
These last two are automatically generated by the eSurvey software when the data is collected, i.e. they are automatically entered. The A01\_interview\_date however is manually input.
In all three cases, however, SQLite thinks that they are all just strings of characters.
We can confirm this by selecting the `Database Structure` tab and expanding the `Farms` entry and notice that the data type for all three columns is listed as 'TEXT'
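
Because the values are just text, the string functions above can pick them apart. The sketch below assumes the dates are stored in a 'dd/mm/yyyy' style; the positions would need to be checked against the actual data first.

```sql
-- Illustrative only: the positions assume a 'dd/mm/yyyy' text format
SELECT A01_interview_date,
       substr(A01_interview_date, 7, 4) AS year_part,      -- characters 7 to 10
       instr(A01_interview_date, '/')   AS first_slash_pos -- position of first '/'
FROM Farms;
```
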
@@ -268,7 +268,7 @@
```

By default the `ORDER BY` clause will sort in ascending order, smallest to
biggest; we can make this explicit by usingthe `ASC` keyword. Or if we want to
biggest; we can make this explicit by using the `ASC` keyword. Or if we want to
sort in descending order we can use the `DESC` keyword.

```sql
@@ -296,7 +296,7 @@
;
```

There is a more general form which allows to to perform any kind of test.
There is a more general form which allows us to perform any kind of test.
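This general form is the `CASE` expression; the test and the labels in this sketch are invented for illustration.

```sql
-- One label per farm, based on an arbitrary cut-off of 10 years
SELECT Id,
       A11_years_farm,
       CASE
           WHEN A11_years_farm < 10 THEN 'newer farm'
           ELSE 'established farm'
       END AS farm_age_group
FROM Farms;
```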

## Using SQL syntax to create ‘binned' values

8 changes: 4 additions & 4 deletions episodes/06-aggregation.md
@@ -22,7 +22,7 @@

## Using built-in statistical functions

Aggregate functions are used perform some kind of mathematical or statistical calculation across a group of rows. The rows in each group are determined
Aggregate functions are used to perform some kind of mathematical or statistical calculation across a group of rows. The rows in each group are determined
by the different values in a specified column or columns. Alternatively you can aggregate across the entire table.

If we wanted to know the minimum, average and maximum values of the 'A11\_years\_farm' column across the whole Farms table, we could write a query such as this;
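
The collapsed query would be along these lines (a sketch reconstructed from the description above):

```sql
SELECT min(A11_years_farm),
       avg(A11_years_farm),
       max(A11_years_farm)
FROM Farms;
```
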
@@ -38,7 +38,7 @@
This sort of query provides us with a general view of the values for a particular column or field across the whole table.

`min` , `max` and `avg` are built-in aggregate functions in SQLite (and any other SQL database system). There are other such functions available.
A complete list can be found in the SQLite documentation [here](https://sqlite.org/lang_aggfunc.html).

Check warning on line 41 in episodes/06-aggregation.md (GitHub Actions / Build markdown source files if valid): [uninformative link text]: [here](https://sqlite.org/lang_aggfunc.html)

It is more likely that we would want to find such values for a range, or multiple ranges of rows where each range is determined by the
values of some other column in the table. Before we do this we will look at how we can find out what different values are contained in a given column.
@@ -76,7 +76,7 @@

![](fig/SQL_06_villages.png){alt='Villages'}

The problem with allowing free-form text quite obvious. Having two villages, one called 'Massequece' and the other called 'Massequese' is unlikely.
The problem with allowing free-form text may be quite obvious. Having two villages, one called 'Massequece' and the other called 'Massequese' is unlikely.

Detecting this type of problem in a large dataset can be very difficult if you are just 'eyeballing' the content. This small SQL query makes it very clear,
and in the OpenRefine lesson we provide approaches to detecting and correcting such errors. SQL is not the best tool for correcting this type of error.
@@ -110,7 +110,7 @@

## The `GROUP BY` clause to summarise data

Just knowing the combinations is of limited use. You really want to know **How many** of each of the values there are.
Just knowing the combinations is of limited use. You really want to know **how many** of each of the values there are.
To do this we use the `GROUP BY` clause.

```sql
@@ -124,7 +124,7 @@

In the first example of this episode, three aggregations were performed over the single column 'A11\_years\_farm'.
In addition to calculating multiple aggregation values over a single column, it is also possible to aggregate over multiple columns by specifying
them in all in the `SELECT` clause **and** the `GROUP BY` clause.
them all in the `SELECT` clause **and** the `GROUP BY` clause.

The grouping will take place based on the order of the columns listed in the `GROUP BY` clause. There will be one row returned for each unique combination of the columns mentioned in the `GROUP BY` clause.
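As a sketch, grouping by two columns might look like this; the column names here are invented for illustration.

```sql
-- One result row per unique (village, ward) combination
SELECT A09_village,
       A08_ward,
       count(*) AS num_farms
FROM Farms
GROUP BY A09_village, A08_ward;
```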

14 changes: 7 additions & 7 deletions episodes/07-creating-tables-views.md
@@ -77,16 +77,16 @@ If any of the datatypes are not as expected or wanted we can change them.
In this particular case DB Browser correctly selected the datatypes. Notice that the `A01_interview_date` was allocated a datatype of 'TEXT'. This isn't a problem
as we have to use the Date and Time functions to manipulate dates anyway.

Notice that the bottom pane in the Window shows the SQL DDL statement that would create the table that you modifying.
Notice that the bottom pane in the Window shows the SQL DDL statement that would create the table that you are modifying.

When you change one of the columns from TEXT to INTEGER, this is immediately reflected in the Create Table statement.
It is slightly misleading because in fact we are modifying an existing table and in SQL-speak, this would be an **Alter Table...** statement.
However it does illustrate quite well the fact that whatever you do in the GUI, it is essentially translated into an SQL statement and executed.
You could copy and paste this definition into the SQL editor and if you change the table name before you ran it, you would create a new table with that name.
You could copy and paste this definition into the SQL editor and if you changed the table name before you ran it, you would create a new table with that name.
This new table would have no data in it. This is how the insert table wizard works. It uses the header row from your data to create a `CREATE TABLE` statement which it runs.
It then transforms each of the rows of data into SQL `INSERT INTO...` statements which it also runs to get the data into the table.

In addition to changing the data types there are several other options which can be set when you are creating of modifying a table.
In addition to changing the data types there are several other options which can be set when you are creating or modifying a table.
For our tables we don't need to make use of them but for completeness we will describe what they are;

**PK** - Or Primary Key, a unique identifier for the row. In the Farms table, there is an `Id` column which uniquely identifies a Farm.
@@ -98,7 +98,7 @@ This could act as a unique identifier for the row as a whole. We could mark thi

In real datasets missing values are quite common and we have already looked at ways of dealing with them when they occur in tables. If you were to **check** this box and the data did have missing values for this column, the record from the file would be rejected and the load of the file will fail.

**U** - Or Unique. This allows you to say that the contents of the column, which is not the primary key column has to have unique values in it. Like Allow Null this is another way of providing some data validation as the data is imported. Although it doesn't really apply with the DB Browser import wizard as the data is imported before you are allowed to set this option.
**U** - Or Unique. This allows you to say that the contents of the column, which is not the primary key column, has to have unique values in it. Like Allow Null, this is another way of providing some data validation as the data is imported (although it doesn't really apply with the DB Browser import wizard as the data is imported before you are allowed to set this option).

**Default** - This is used in conjunction with 'Not Null': if a value is not provided in the dataset, then the default value for that column, if one has been set, will be used.

@@ -133,7 +133,7 @@ line added.

## Creating a table using an SQL command

You could copy and paste this definition into the SQL editor and if you change the table name before you ran it, you would create a new table with that name.
You could copy and paste this definition into the SQL editor and if you changed the table name before you ran it, you would create a new table with that name.
This new table would have no data in it. This is how the insert table wizard works. It uses the header row from your data to create a `CREATE TABLE` statement which it runs.
It then transforms each of the rows of data into SQL `INSERT INTO...` statements which it also runs to get the data into the table.

@@ -172,7 +172,7 @@ SELECT Id,
FROM Farms;
```

If we wanted to create a table from the Crops table which contains only the rows where the D\_curr\_crop value was 'rice' we could use a query like this:
If we wanted to create a table from the Crops table, which contains only the rows where the D\_curr\_crop value was 'rice' we could use a query like this:

```sql
CREATE TABLE crops_rice AS
@@ -215,7 +215,7 @@ The advantage of using Views is that it allows you to restrict how you see the d
In the example we used above it may be far easier to work with only the 6 columns that we need from the full Farms table
rather than the full table with 61 columns.

A View isn't restricted to simple `SELECT` statements it can be the result of aggregations and joins as well.
A View isn't restricted to simple `SELECT` statements. It can be the result of aggregations and joins as well.
This can help reduce the complexity of queries based on the View and so aid readability.
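As a sketch, a View over an aggregation might be created and used like this; the View name and the grouping column are invented for illustration.

```sql
-- A View wrapping a GROUP BY query
CREATE VIEW farms_per_village AS
SELECT A09_village,
       count(*) AS num_farms
FROM Farms
GROUP BY A09_village;

-- The View can then be queried just like a table
SELECT * FROM farms_per_village;
```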

:::::::::::::::::::::::::::::::::::::::: keypoints