From 20ed9d24da64654db0ee950d9ae2cc9185da1419 Mon Sep 17 00:00:00 2001 From: Stuart Napier Date: Thu, 3 Jul 2025 13:38:39 +0100 Subject: [PATCH 1/2] Changes to session 2 section 3 exercises. Remove comma from working script 3.1.4 Add explanation of UKgas data set in exercise 3.2. Fix link not working in script due to brackets. Add suggested name for values_to variable in pivot_longer in exercise 2. Clarify purpose of bonus questions with more explanation. Add pivot_wider() code to solution. Add ungroup() to exercise solutions. Update question text in .rmd files. --- session2/intro_to_R_session2.Rmd | 204 +++++++++++------- session2/intro_to_R_session2_incomplete.Rmd | 16 +- session2/intro_to_R_session2_solutions.R | 32 +-- session2/intro_to_r_session2_working_script.R | 14 +- 4 files changed, 167 insertions(+), 99 deletions(-) diff --git a/session2/intro_to_R_session2.Rmd b/session2/intro_to_R_session2.Rmd index 637a24d..2da1abf 100644 --- a/session2/intro_to_R_session2.Rmd +++ b/session2/intro_to_R_session2.Rmd @@ -3,6 +3,9 @@ title: "Intro to R session2" author: "" date: "2023" output: html_document +editor_options: + markdown: + wrap: 72 --- ```{r setup, include=FALSE} @@ -11,16 +14,20 @@ knitr::opts_chunk$set(echo = TRUE, eval = TRUE, warning = FALSE, message = FALSE ## 1. Introduction -In this session, we will analyse a mock dataset. +In this session, we will analyse a mock dataset. -Social Security Scotland administers four benefits: Benefit A - Benefit D. You have collected data summarising the monthly applications for each of these benefits between January 2020 and January 2025. +Social Security Scotland administers four benefits: Benefit A - Benefit +D. You have collected data summarising the monthly applications for each +of these benefits between January 2020 and January 2025. You need to: - - 1. Import and explore the data - 2. Manipulate the data to prepare it for plotting - 3. Plot the data over time for each of the benefits as well as the total - 4. 
Plot a bar graph with error bars to compare the mean number of applications for the four benefits + +1. Import and explore the data +2. Manipulate the data to prepare it for plotting +3. Plot the data over time for each of the benefits as well as the + total +4. Plot a bar graph with error bars to compare the mean number of + applications for the four benefits ```{r LoadandSource} @@ -34,7 +41,9 @@ library(lubridate) ### 2.1 Import the data -The data is saved in the session 2 folder, in a file "benefits.csv". We import the data to create a tibble called benefits. We view the first few lines to see what the data is that is included +The data is saved in the session 2 folder, in a file "benefits.csv". We +import the data to create a tibble called benefits. We view the first +few lines to see what the data is that is included ```{r ImportAndExplore} @@ -69,7 +78,8 @@ summary(benefits) Are there any points that are indicative of errors in data capture? -It is good practice to look if there are any missing values in the data. The function is.na() will tell you if there are any missing data. +It is good practice to look if there are any missing values in the data. +The function is.na() will tell you if there are any missing data. ```{r Explore3} @@ -77,20 +87,21 @@ is.na(benefits) ``` - How can we get the total number of missing values in the dataset? + ```{r Explore4} sum(is.na(benefits)) ``` -If you want to return the index of datapoints with missing data, you can use the function which() - +If you want to return the index of datapoints with missing data, you can +use the function which() ### 2.3 Exploratory plots -Now we can create quick exploratory plots of the data. Look at the help function for plot() +Now we can create quick exploratory plots of the data. Look at the help +function for plot() ```{r QuickPlot} @@ -102,44 +113,58 @@ plot(benefits$date, benefits$D) ``` - ## 3. 
Data Wrangling -R has powerful (and pretty plotting) functionality in the library ggplot2, which is part of the tidyverse environment. All of tidyverse uses long data. In long format, each row corresponds to data from one measurement, and repeated measurements are on different rows. +R has powerful (and pretty plotting) functionality in the library +ggplot2, which is part of the tidyverse environment. All of tidyverse +uses long data. In long format, each row corresponds to data from one +measurement, and repeated measurements are on different rows. -We need to get the data in the correct structure. Tidyverse has two functions for reshaping datasets, pivot_wider() and pivot_longer(). -pivot_longer() transforms a dataset so that it is has more rows and fewer columns, e.g. with one measurement per row, whilst pivot_wider() transforms a dataset so that it has more columns and fewer rows e.g. one row per observation. +We need to get the data in the correct structure. Tidyverse has two +functions for reshaping datasets, pivot_wider() and pivot_longer(). +pivot_longer() transforms a dataset so that it is has more rows and +fewer columns, e.g. with one measurement per row, whilst pivot_wider() +transforms a dataset so that it has more columns and fewer rows e.g. one +row per observation. -(Note, pivot_longer and a related function, pivot_wider, became available in a relatively recent version of R. Older code may use gather() and spread()) +(Note, pivot_longer and a related function, pivot_wider, became +available in a relatively recent version of R. Older code may use +gather() and spread()) ### 3.1 Tidying the Data -Our dataset is currently in wide format, with the number of applications for each benefit in different columns. +Our dataset is currently in wide format, with the number of applications +for each benefit in different columns. 
![Wide Data](wide-data.svg) -We want to reshape our data to long format, with a variable called "benefit" specifying the benefit, and one called "apps" for the number of applications. +We want to reshape our data to long format, with a variable called +"benefit" specifying the benefit, and one called "apps" for the number +of applications. ![Long Data](long-data.svg) +As we want to reshape the dataset to have fewer columns and more rows, +we will use the pivot_longer() function. See the help section for all +the options available for pivot_longer() -As we want to reshape the dataset to have fewer columns and more rows, we will use the pivot_longer() function. See the help section -for all the options available for pivot_longer() +pivot_longer() and pivot_wider() are powerful functions that can reshape +datasets in a variety of ways. For pivoting longer we need to provide +the dataset, and the columns we want to pivot to a long format. The +remaining parameters are all optional and are used to tell R how we want +the reshaping to be done. -pivot_longer() and pivot_wider() are powerful functions that can reshape datasets in a variety of ways. -For pivoting longer we need to provide the dataset, and the columns we want to pivot to a long format. The -remaining parameters are all optional and are used to tell R how we want the reshaping to be done. - -Typically the two additional parameters used are names_to and values_to which specify the variables containing -the pivoted variables names and values respectively. +Typically the two additional parameters used are names_to and values_to +which specify the variables containing the pivoted variables names and +values respectively. 
In our case we want to: -* use the benefits dataset -* pivot the columns A, B, C and D -* add a column 'benefit' containing the name of the benefit -* add a column 'apps' containing the values of the number of applications, for each benefit - +- use the benefits dataset +- pivot the columns A, B, C and D +- add a column 'benefit' containing the name of the benefit +- add a column 'apps' containing the values of the number of + applications, for each benefit ```{r WideToLong} @@ -160,10 +185,12 @@ head(benefit_long) ``` -To get the total applications for each month, we need to use group_by() and summarise() from the package dplyr which is part of the tidyverse suite. - -Once we summarise, we can add this to our benefit_long dataframe. We need to specify the benefit, and here we use the label "Total" +To get the total applications for each month, we need to use group_by() +and summarise() from the package dplyr which is part of the tidyverse +suite. +Once we summarise, we can add this to our benefit_long dataframe. We +need to specify the benefit, and here we use the label "Total" ```{r AddTotal} @@ -181,23 +208,23 @@ benefit_long <- bind_rows(benefit_long, benefit_total) ``` - ### 3.2 Exercises -1. Read in "UKgas.csv" and inspect the data. -(The data has been created from one of R datasets https://www.rdocumentation.org/packages/datasets/versions/3.6.2/topics/UKgas) +1. Read in "UKgas.csv" and inspect the data. (The data has been created + from one of R datasets + ). + The data set is a time series of gas consumption in the UK from 1960Q1 to 1986Q4, in millions of therms (a unit of heat energy). -2. Create a new tibble of the data in long format with a column to specify the quarter. +2. Create a new tibble of the data in long format with a column to specify the quarter, and a column called "gas_consumption" to show the values. -3. Compute the mean quarterly UKgas consumption across years (Your new tibble will have four rows and 2 columns) -4. 
Compute the mean gas consumption for each year (Your tibble will have 27 rows and 2 columns) +3. Compute the mean quarterly UKgas consumption across years (Your new + tibble will have four rows and 2 columns) -5. **Bonus:** Convert your long tibble back to wide. This should be the same as the UKgas data. -Compute the mean gas consumption by year. -Hint: Have a look at https://stackoverflow.com/questions/50352735/calculate-the-mean-of-some-columns-using-dplyrmutate -(Your tibble will have 27 rows and 6 columns). As you can see, working with long data is simpler in R. +4. Compute the mean gas consumption for each year (Your tibble will + have 27 rows and 2 columns) +5. **Bonus:** Convert your long tibble back to wide using the 'pivot_wider' function. This should be the same as the UKgas data. Use the wide data set to compute the mean gas consumption by year. Hint: Have a look at (Your final tibble will have 27 rows and 6 columns). As you can see, working with long data is simpler in R. ## 4. Plotting with ggplot @@ -205,12 +232,18 @@ Hint: Have a look at https://stackoverflow.com/questions/50352735/calculate-the- The syntax for ggplot has the following elements: - 1. create a ggplot object which is defined by the dataset, and the aesthetic (including the variables). - 2. Specify geometries (i.e. plot types). Geometries such as plotting points, lines and bars are available. The specification of geometries enables one to plot various plot types on the same set of axes. See help(package = ggplot2) under the letter g for the various geometries available. - 3. Customise the appearance. Colours, background, labels etc. (optional) - +1. create a ggplot object which is defined by the dataset, and the + aesthetic (including the variables). +2. Specify geometries (i.e. plot types). Geometries such as plotting + points, lines and bars are available. The specification of + geometries enables one to plot various plot types on the same set of + axes. 
See help(package = ggplot2) under the letter g for the various + geometries available. +3. Customise the appearance. Colours, background, labels etc. + (optional) + Here, we will plot the data with a point geometry - + ```{r TimeSeriesPlot1} time_series_plot <- ggplot(data = benefit_long, aes(x=date, y = apps)) + @@ -223,8 +256,8 @@ time_series_plot ``` -This has combined the data for all the benefits. We can separate these by colour by using the aesthetic to specify the grouping variable - +This has combined the data for all the benefits. We can separate these +by colour by using the aesthetic to specify the grouping variable ```{r} time_series_plot2 <- ggplot(data = benefit_long, @@ -248,8 +281,8 @@ time_series_plot3 ``` - -Now that we can see the plots, let's fit lines, to the points, and a straight line for the trend +Now that we can see the plots, let's fit lines, to the points, and a +straight line for the trend ```{r TimeSeriesPlot4} @@ -261,12 +294,12 @@ time_series_plot4 ``` - -Finally, let's change the axes labels, legend title and background using themes. -Themes allow us to customize the appearance. ggplot comes with several themes -such as theme_bw(), or you can create your own themes. For examples see the -Scottish Government theme [sgplot](https://github.com/DataScienceScotland/sgplot), -or the [NRS theme](https://github.com/DataScienceScotland/nrsplot) +Finally, let's change the axes labels, legend title and background using +themes. Themes allow us to customize the appearance. ggplot comes with +several themes such as theme_bw(), or you can create your own themes. 
+For examples see the Scottish Government theme +[sgplot](https://github.com/DataScienceScotland/sgplot), or the [NRS +theme](https://github.com/DataScienceScotland/nrsplot) ```{r TimeSeriesPlot5} @@ -280,8 +313,8 @@ time_series_plot5 ``` - -Finally, we could perform the wrangling and plotting in one concise chunk +Finally, we could perform the wrangling and plotting in one concise +chunk ```{r CombineSteps} #Import @@ -329,10 +362,12 @@ time_series_plot ``` - ### 4.2 Plotting the mean applications per year as a bar graph -To take the average by year, we need to create a factor variable for the year which we can use for grouping. The lubridate package which comes with tidyverse has a convenient function year(). We will need to mutate benefit_long to append this factor variable +To take the average by year, we need to create a factor variable for the +year which we can use for grouping. The lubridate package which comes +with tidyverse has a convenient function year(). We will need to mutate +benefit_long to append this factor variable #### Wrangling @@ -346,7 +381,9 @@ benefit_long <- benefit_long %>% head(benefit_long) ``` -Now we group the data and summarise. The summary statistics that we will evaluate are the +- sd which will be used as error bars. We only want the yearly averages by benefit, so we will (un)select "Total" +Now we group the data and summarise. The summary statistics that we will +evaluate are the +- sd which will be used as error bars. We only want +the yearly averages by benefit, so we will (un)select "Total" ```{r SummariseByYear} @@ -363,14 +400,18 @@ head(benefit_by_year) ``` - #### Plotting bar graph -Now we create a ggplot object. The x-axis will show the year, and will use colour for the benefit. +Now we create a ggplot object. The x-axis will show the year, and will +use colour for the benefit. -We will apply the geom_col and geom_errorbar geometries. Note that geom_col by default plots a stacked bar graph. 
To unstack the graph you will need to use the function dodge `<- position_dodge(...)` and then the argument `"position = dodge"` needs to be passed to the geometries. +We will apply the geom_col and geom_errorbar geometries. Note that +geom_col by default plots a stacked bar graph. To unstack the graph you +will need to use the function dodge `<- position_dodge(...)` and then +the argument `"position = dodge"` needs to be passed to the geometries. -We will remove the label from the x-axis and change the labels for the y-axis to "Average yearly applications" +We will remove the label from the x-axis and change the labels for the +y-axis to "Average yearly applications" ```{r BarGraph} @@ -400,14 +441,25 @@ bar_graph_plot ### 4.3 Exercises -1. Plot the UKgas consumption by year as a line graph, with quarters shown in different colours. Change the axes labels to something of your choice and add a title. Use the `UKgas_l` data set created in exercise 3.2. +1. Plot the UKgas consumption by year as a line graph, with quarters + shown in different colours. Change the axes labels to something of + your choice and add a title. Use the `UKgas_l` data set created in + exercise 3.2. -2. Plot the same as above, but include a line for the mean gas consumption across quarters. You will first need to append the UKgas_by_year to your data +2. Plot the same as above, but include a line for the mean gas + consumption across quarters. You will first need to append the + UKgas_by_year to your data -3. Create the same plot as above (including the mean), but use thin lines for quarter, and a thick line for the mean. You will need to add a new numeric variable to the data used in the previous exercise that specifies a value for line thickness. See the examples in `?geom_line` for details around specifying aesthetics for the line graph and how to do this by group. You will also need to look at `?scale_linewidth` +3. 
Create the same plot as above (including the mean), but use thin + lines for quarter, and a thick line for the mean. You will need to + add a new numeric variable to the data used in the previous exercise + that specifies a value for line thickness. See the examples in + `?geom_line` for details around specifying aesthetics for the line + graph and how to do this by group. You will also need to look at + `?scale_linewidth` ## 5. Further training -We have only touched on the capabilities of ggplot2. A good set of tutorials on ggplot can be found at -http://r-statistics.co/Complete-Ggplot2-Tutorial-Part1-With-R-Code.html - +We have only touched on the capabilities of ggplot2. A good set of +tutorials on ggplot can be found at + diff --git a/session2/intro_to_R_session2_incomplete.Rmd b/session2/intro_to_R_session2_incomplete.Rmd index fbcdd08..912ee94 100644 --- a/session2/intro_to_R_session2_incomplete.Rmd +++ b/session2/intro_to_R_session2_incomplete.Rmd @@ -173,15 +173,21 @@ benefit_long <- ### 3.2 Exercises -1. Read in "UKgas.csv" and inspect the data. (The data has been created from one of R datasets ) +1. Read in "UKgas.csv" and inspect the data. (The data has been created + from one of R datasets + ). + The data set is a time series of gas consumption in the UK from 1960Q1 to 1986Q4, in millions of therms (a unit of heat energy). -2. Create a new tibble of the data in long format with a column to specify the quarter. +2. Create a new tibble of the data in long format with a column to specify the quarter, and a column called "gas_consumption" to show the values. -3. Compute the mean quarterly UKgas consumption across years (Your new tibble will have four rows and 2 columns) -4. Compute the mean gas consumption for each year (Your tibble will have 27 rows and 2 columns) +3. Compute the mean quarterly UKgas consumption across years (Your new + tibble will have four rows and 2 columns) -5. **Bonus:** Convert your long tibble back to wide. 
This should be the same as the UKgas data. Compute the mean gas consumption by year. Hint: Have a look at (Your tibble will have 27 rows and 6 columns). As you can see, working with long data is simpler in R. +4. Compute the mean gas consumption for each year (Your tibble will + have 27 rows and 2 columns) + +5. **Bonus:** Convert your long tibble back to wide using the 'pivot_wider' function. This should be the same as the UKgas data. Use the wide data set to compute the mean gas consumption by year. Hint: Have a look at (Your final tibble will have 27 rows and 6 columns). As you can see, working with long data is simpler in R. ## 4. Plotting with ggplot diff --git a/session2/intro_to_R_session2_solutions.R b/session2/intro_to_R_session2_solutions.R index e80f08c..cc9ad4f 100644 --- a/session2/intro_to_R_session2_solutions.R +++ b/session2/intro_to_R_session2_solutions.R @@ -79,14 +79,14 @@ benefit_long <- bind_rows(benefit_long, benefit_total) ### 3.2 Exercises --------------------------------------------------------- #1. Read in "UKgas.csv" and inspect the data. -# (The data has been created from one of R datasets https://www.rdocumentation.org/packages/datasets/versions/3.6.2/topics/UKgas) +# The data has been created from one of R data sets https://www.rdocumentation.org/packages/datasets/versions/3.6.2/topics/UKgas +# The data set is a time series of gas consumption in the UK from 1960Q1 to 1986Q4, in millions of therms (a unit of heat energy). UKgas <- read_csv("./UKgas.csv") head(UKgas) - -#2. Create a new tibble of the data in long format with a column to specify the quarter. +#2. Create a new tibble of the data in long format with a column to specify the quarter, and a column called "gas_consumption" to show the values. UKgas_l <- UKgas %>% pivot_longer(cols = -year, @@ -94,32 +94,40 @@ UKgas_l <- UKgas %>% values_to = "gas_consumption") - #3. 
Compute the mean quarterly UKgas consumption across years (Your new tibble will have four rows and 2 columns) UKgas_by_quarter <- UKgas_l %>% group_by(quarter) %>% summarise(mean_quarterly_gas = mean(gas_consumption, - na.rm = TRUE)) - + na.rm = TRUE)) %>% + ungroup() #4. Compute the mean gas consumption for each year (Your tibble will have 27 rows and 2 columns) - + UKgas_by_year <- UKgas_l %>% group_by(year) %>% summarise(mean_annual_gas = mean(gas_consumption, - na.rm = TRUE)) - + na.rm = TRUE)) %>% + ungroup() -#5. **Bonus:** Convert your long tibble back to wide. This should be the same as the UKgas data. -# Compute the mean gas consumption by year. +#5. **Bonus:** Convert your long tibble back to wide using the 'pivot_wider' function. This should be the same as the UKgas data. +# Use the wide data set to compute the mean gas consumption by year. # Hint: Have a look at https://stackoverflow.com/questions/50352735/calculate-the-mean-of-some-columns-using-dplyrmutate -# (Your tibble will have 27 rows and 6 columns). As you can see, working with long data is simpler in R. +# (Your final tibble will have 27 rows and 6 columns). As you can see, working with long data is simpler in R. +# Use pivot_wider() to return the data to the original form +UKgas <- UKgas_l %>% + pivot_wider(names_from = "quarter", + values_from = "gas_consumption") + +# Or just re-read in the UKgas data set. 
+UKgas <- read_csv("./UKgas.csv") + + UKgas <- UKgas %>% mutate(mean_annual_gas =rowMeans(select(., Qtr1, Qtr2, diff --git a/session2/intro_to_r_session2_working_script.R b/session2/intro_to_r_session2_working_script.R index 4989125..80394a0 100644 --- a/session2/intro_to_r_session2_working_script.R +++ b/session2/intro_to_r_session2_working_script.R @@ -67,7 +67,7 @@ head(benefit_long) benefit_total <- benefit_long %>% group_by() %>% summarise(benefit = - apps = , + apps = ) %>% ungroup() @@ -80,14 +80,16 @@ benefit_long <- ### 3.2 Exercises --------------------------------------------------------- #1. Read in "UKgas.csv" and inspect the data. -# (The data has been created from one of R datasets https://www.rdocumentation.org/packages/datasets/versions/3.6.2/topics/UKgas) +# The data has been created from one of R data sets https://www.rdocumentation.org/packages/datasets/versions/3.6.2/topics/UKgas +# The data set is a time series of gas consumption in the UK from 1960Q1 to 1986Q4, in millions of therms (a unit of heat energy). + UKgas <- -#2. Create a new tibble of the data in long format with a column to specify the quarter. +#2. Create a new tibble of the data in long format with a column to specify the quarter, and a column called "gas_consumption" to show the values. UKgas_l <- @@ -108,10 +110,10 @@ UKgas_by_year <- -#5. **Bonus:** Convert your long tibble back to wide. This should be the same as the UKgas data. -# Compute the mean gas consumption by year. +#5. **Bonus:** Convert your long tibble back to wide using the 'pivot_wider' function. This should be the same as the UKgas data. +# Use the wide data set to compute the mean gas consumption by year. # Hint: Have a look at https://stackoverflow.com/questions/50352735/calculate-the-mean-of-some-columns-using-dplyrmutate -# (Your tibble will have 27 rows and 6 columns). As you can see, working with long data is simpler in R. +# (Your final tibble will have 27 rows and 6 columns). 
As you can see, working with long data is simpler in R. UKgas <- From 2e9c2870d0f8fedf8f75a19e49effa9046cd82b3 Mon Sep 17 00:00:00 2001 From: Stuart Napier Date: Thu, 3 Jul 2025 16:06:44 +0100 Subject: [PATCH 2/2] Insert sgplot examples to section 4.2 and 4.3 Also update section 4 title in .R files. Note in section 3 exercises that data sets created will be re-used in a later exercise. --- session2/intro_to_R_session2.Rmd | 46 +++++++++++++++++-- session2/intro_to_R_session2_incomplete.Rmd | 44 +++++++++++++++++- session2/intro_to_R_session2_solutions.R | 23 +++++++++- session2/intro_to_r_session2_working_script.R | 31 +++++++++++-- 4 files changed, 132 insertions(+), 12 deletions(-) diff --git a/session2/intro_to_R_session2.Rmd b/session2/intro_to_R_session2.Rmd index 2da1abf..7820ed4 100644 --- a/session2/intro_to_R_session2.Rmd +++ b/session2/intro_to_R_session2.Rmd @@ -213,16 +213,16 @@ benefit_long <- bind_rows(benefit_long, benefit_total) 1. Read in "UKgas.csv" and inspect the data. (The data has been created from one of R datasets ). - The data set is a time series of gas consumption in the UK from 1960Q1 to 1986Q4, in millions of therms (a unit of heat energy). + The data set is a time series of gas consumption in the UK from 1960Q1 to 1986Q4, in millions of therms (a unit of heat energy). -2. Create a new tibble of the data in long format with a column to specify the quarter, and a column called "gas_consumption" to show the values. +2. Create a new tibble of the data in long format with a column to specify the quarter, and a column called "gas_consumption" to show the values. Note that the data set created here will be re-used in later exercises. 3. Compute the mean quarterly UKgas consumption across years (Your new tibble will have four rows and 2 columns) 4. Compute the mean gas consumption for each year (Your tibble will - have 27 rows and 2 columns) + have 27 rows and 2 columns). Note that the data set created here will be re-used in later exercises. 
5. **Bonus:** Convert your long tibble back to wide using the 'pivot_wider' function. This should be the same as the UKgas data. Use the wide data set to compute the mean gas consumption by year. Hint: Have a look at (Your final tibble will have 27 rows and 6 columns). As you can see, working with long data is simpler in R. @@ -360,6 +360,28 @@ time_series_plot +``` + +#### sgplot + +An internal R package was developed by Alice Hannah called [sgplot](https://scotgovanalysis.github.io/sgplot/), which allows us to easily style the charts we make in an accessible way with an SG theme. + +We can install sgplot in the same way we would with other packages, by going to Tools > Install Packages. + +The [documentation](https://scotgovanalysis.github.io/sgplot/) of sgplot gives us some instructions on how to use it, but for this demonstration we will just manually implement the theme and the colours. + +To add these on to our last chart, we can use the code below. Note that our chart has 5 colours so we have to use the extended palette. Accessibility publication guidelines recommend no more than 4 colours in one chart. + +```{r sgplot} +# load the sgplot package +library(sgplot) + +time_series_plot_sg <- time_series_plot + + scale_colour_discrete_sg("main-extended") + # set the colours to SG approved palette + theme_sg() # overwrite with SG theme + +time_series_plot_sg + ``` ### 4.2 Plotting the mean applications per year as a bar graph @@ -437,6 +459,24 @@ bar_graph_plot <- bar_graph_plot+ bar_graph_plot +``` +#### sgplot bar chart + +In section 4.1 we saw how we can use sgplot to style our charts. We can also apply this to our bar chart in a similar way to how we did before. + +Look at the [documentation](https://scotgovanalysis.github.io/sgplot/reference/scale_colour_discrete_sg.html) for sgplot to see how to use the functions to change the colours of our bar chart and error bars. 
+ +```{r} + +library(sgplot) + +bar_graph_plot_sg <- bar_graph_plot + + scale_fill_discrete_sg("main") + # set the bar colours to SG approved palette + scale_colour_discrete_sg("main") + # set error bar colours to SG palette + theme_sg() # overwrite with SG theme + +bar_graph_plot_sg + ``` ### 4.3 Exercises diff --git a/session2/intro_to_R_session2_incomplete.Rmd b/session2/intro_to_R_session2_incomplete.Rmd index 912ee94..0aa2173 100644 --- a/session2/intro_to_R_session2_incomplete.Rmd +++ b/session2/intro_to_R_session2_incomplete.Rmd @@ -178,14 +178,14 @@ benefit_long <- ). The data set is a time series of gas consumption in the UK from 1960Q1 to 1986Q4, in millions of therms (a unit of heat energy). -2. Create a new tibble of the data in long format with a column to specify the quarter, and a column called "gas_consumption" to show the values. +2. Create a new tibble of the data in long format with a column to specify the quarter, and a column called "gas_consumption" to show the values. Note that the data set created here will be re-used in later exercises. 3. Compute the mean quarterly UKgas consumption across years (Your new tibble will have four rows and 2 columns) 4. Compute the mean gas consumption for each year (Your tibble will - have 27 rows and 2 columns) + have 27 rows and 2 columns). Note that the data set created here will be re-used in later exercises. 5. **Bonus:** Convert your long tibble back to wide using the 'pivot_wider' function. This should be the same as the UKgas data. Use the wide data set to compute the mean gas consumption by year. Hint: Have a look at (Your final tibble will have 27 rows and 6 columns). As you can see, working with long data is simpler in R. @@ -306,6 +306,27 @@ time_series_plot ``` +#### sgplot + +An internal R package was developed by Alice Hannah called [sgplot](https://scotgovanalysis.github.io/sgplot/), which allows us to easily style the charts we make in an accessible way with an SG theme. 
+ +We can install sgplot in the same way we would with other packages, by going to Tools > Install Packages. + +The [documentation](https://scotgovanalysis.github.io/sgplot/) of sgplot gives us some instructions on how to use it, but for this demonstration we will just manually implement the theme and the colours. + +To add these on to our last chart, we can use the code below. Note that our chart has 5 colours so we have to use the extended palette. Accessibility publication guidelines recommend no more than 4 colours in one chart. + +```{r sgplot} +# load the sgplot package +library(sgplot) + +time_series_plot_sg <- time_series_plot + + scale_colour_discrete_sg("main-extended") + # set the colours to SG approved palette + theme_sg() # overwrite with SG theme + +time_series_plot_sg +``` + ### 4.2 Plotting the mean applications per year as a bar graph #### Wrangling @@ -370,6 +391,25 @@ bar_graph_plot ``` +#### sgplot bar chart + +In section 4.1 we saw how we can use sgplot to style our charts. We can also apply this to our bar chart in a similar way to how we did before. + +Look at the [documentation](https://scotgovanalysis.github.io/sgplot/reference/scale_colour_discrete_sg.html) for sgplot to see how to use the functions to change the colours of our bar chart and error bars. + +```{r} + +library(sgplot) + +bar_graph_plot_sg <- bar_graph_plot + + scale_xxx_sg() + # set the bar colours to SG approved palette + scale_xxx_sg() + # set error bar colours to SG palette + theme_sg() # overwrite with SG theme + +bar_graph_plot_sg + +``` + ### 4.3 Exercises 1. Plot the UKgas consumption by year as a line graph, with quarters shown in different colours. Change the axes labels to something of your choice and add a title. Use the `UKgas_l` data set created in exercise 3.2. 
diff --git a/session2/intro_to_R_session2_solutions.R b/session2/intro_to_R_session2_solutions.R index cc9ad4f..5659d02 100644 --- a/session2/intro_to_R_session2_solutions.R +++ b/session2/intro_to_R_session2_solutions.R @@ -87,7 +87,8 @@ head(UKgas) #2. Create a new tibble of the data in long format with a column to specify the quarter, and a column called "gas_consumption" to show the values. - +# Note that the data set created here will be re-used in later exercises. + UKgas_l <- UKgas %>% pivot_longer(cols = -year, names_to = "quarter", @@ -105,6 +106,7 @@ UKgas_by_quarter <- UKgas_l %>% #4. Compute the mean gas consumption for each year (Your tibble will have 27 rows and 2 columns) +# Note that the data set created here will be re-used in later exercises. UKgas_by_year <- UKgas_l %>% group_by(year) %>% @@ -137,7 +139,7 @@ UKgas <- UKgas %>% -## Section 4: Data Wrangling ---------------------------------------------- +## Section 4: Plotting with ggplot ---------------------------------------------- ### 4.1 Examples ---------------------------------------------------------- @@ -220,8 +222,14 @@ time_series_plot <- ggplot(data = benefit_long, time_series_plot +#4.1.7 using sgplot +library(sgplot) +time_series_plot_sg <- time_series_plot + + scale_colour_discrete_sg("main-extended") + # set the colours to SG approved palette (5 series, so the extended palette is needed) + theme_sg() # overwrite with SG theme +time_series_plot_sg ### 4.2 Examples ---------------------------------------------------------- @@ -264,6 +272,17 @@ bar_graph_plot <- bar_graph_plot+ bar_graph_plot +# use sgplot with the bar chart +library(sgplot) + +bar_graph_plot_sg <- bar_graph_plot + + scale_fill_discrete_sg("main") + # set the bar colours to SG approved palette + scale_colour_discrete_sg("main") + # set error bar colours to SG palette + theme_sg() # overwrite with SG theme + +bar_graph_plot_sg + + ### 4.3 Exercises --------------------------------------------------------- #1. 
Plot the UKgas consumption by year as a line graph, with quarters shown in different colours. diff --git a/session2/intro_to_r_session2_working_script.R b/session2/intro_to_r_session2_working_script.R index 80394a0..916d94f 100644 --- a/session2/intro_to_r_session2_working_script.R +++ b/session2/intro_to_r_session2_working_script.R @@ -90,7 +90,8 @@ UKgas <- #2. Create a new tibble of the data in long format with a column to specify the quarter, and a column called "gas_consumption" to show the values. - +# Note that the data set created here will be re-used in later exercises. + UKgas_l <- @@ -103,7 +104,8 @@ UKgas_by_quarter <- -#4. Compute the mean gas consumption for each year (Your tibble will have 27 rows and 2 columns) +#4. Compute the mean gas consumption for each year (Your tibble will have 27 rows and 2 columns). +# Note that the data set created here will be re-used in later exercises. UKgas_by_year <- @@ -120,9 +122,9 @@ UKgas <- -## Section 4: Data Wrangling ---------------------------------------------- +## Section 4: Plotting with ggplot ---------------------------------------------- -### 4.1 Examples ---------------------------------------------------------- +### 4.1 Examples --------------------------------------------------------------- #4.1.1 Time series plot time_series_plot <- ggplot(data = benefit_long, @@ -203,7 +205,14 @@ time_series_plot <- ggplot(data = benefit_long, time_series_plot +#4.1.7 using sgplot +library(sgplot) + +time_series_plot_sg <- time_series_plot + + scale_colour_discrete_sg("main-extended") + # chart has 5 colours, so use the extended SG approved palette + theme_sg() # overwrite with SG theme +time_series_plot_sg ### 4.2 Examples (incomplete) --------------------------------------------- @@ -245,6 +254,18 @@ bar_graph_plot <- bar_graph_plot + bar_graph_plot +# use sgplot with the bar chart: consult the documentation of sgplot. 
+# This page describes which functions to use to colour the chart +# https://scotgovanalysis.github.io/sgplot/reference/scale_colour_discrete_sg.html +library(sgplot) + +bar_graph_plot_sg <- bar_graph_plot + + scale_xxx_sg() + # set the bar colours to SG approved palette + scale_xxx_sg() + # set error bar colours to SG palette + theme_sg() # overwrite with SG theme + +bar_graph_plot_sg + ### 4.3 Exercises --------------------------------------------------------- @@ -268,7 +289,7 @@ UKgas_l_with_mean <- UKgas_l %>% # Specify the value that should appear in the "quarter" column mutate() %>% # ensure that column names match - rename() + rename()) g2 <- ggplot(UKgas_l_with_mean, aes(x = , y = ,