BCB744 Intro R Example 2

Below is an example of a test or exam question similar to those you may encounter in the BCB744 Intro R course.

This is a practice exercise. While I will not assess your script, I will provide a rubric to guide your self-evaluation. You are expected to complete the task within the allocated time and submit your script to iKamva by the deadline. This allows me to track participation, and prior observations suggest that engagement with these practice tasks correlates with improved performance in the final exam.

For your own benefit, I strongly encourage you to work independently. Doing so will ensure that you develop the problem-solving skills necessary for success in the final assessment.

Due date: Thursday, 13 March 2025, 17:00.

Question 1

The datasets::UKDriverDeaths and datasets::Seatbelts datasets

These datasets are meant to be used together. UKDriverDeaths contains the same data as the variable drivers in Seatbelts, but it also provides information about the temporal structure of the Seatbelts dataset. You will have to devise a way to use this temporal information in your analysis.

  • Produce a dataframe that combines the temporal information provided in UKDriverDeaths with the other information in Seatbelts.
  • Produce a faceted graph (using facet_wrap(), placing drivers, front, rear, and VanKilled in separate facets) showing a timeline of monthly means of deaths (means taken across years), whilst distinguishing between the two levels of the variable law.
  • What do you conclude from your analysis?

Question 2

Please use the nycflights13 package for this exercise.

2.a

What are the 10 most common destinations for flights from NYC airports in 2013, and what is the total distance travelled to each of these airports? Display these data graphically in a 2-panel figure.

2.b

Which airlines have the most flights departing from NYC airports in 2013? Make a table that lists these in descending order of frequency and shows the number of flights for each airline. In your table, list the names of the airlines as well. Hint: you can use the airlines dataset to look up the airline name based on carrier code.

2.c

Consider only flights that have non-missing arrival delay information. Your answer should include the name of the carrier in addition to the carrier code and the values asked for.

  i. Which carrier had the highest mean arrival delay?

  ii. Which carrier had the lowest mean arrival delay?

Make sure that your answer includes the name of the carrier and the calculated mean (±SD) delay times, and use a sensible number of decimal digits.

2.d

What were the mean values for the weather variables at the origin airport on the top 10 days with the highest departure delays? Contrast this with a similar view of the 10 days with the lowest departure delays. Your table(s) should include the names of the origin airports, the dates with the highest (lowest) departure delays, and the mean (±SD) weather variables on these days.

Can you make any inferences about the effect of weather conditions on flight delays? Are there any problems with this analysis, and how might you improve this analysis for a clearer view of the effect of weather conditions on the ability of flights to depart on time?

2.e

Partition each day into four equal time intervals, e.g. 00:01-06:00, 06:01-12:00, 12:01-18:00, and 18:01-00:00.

  i. At each time interval, what is the proportion of flights delayed at departure? Illustrate your finding in a figure.
  ii. Based on your analysis, does the chance of being delayed change throughout the day?

See the answer to Question 2.e.i.

  iii. For each weekday (1-7) aggregated over 2013, which of the time intervals has the most flights? Create a figure to show your finding.

2.f

Find the 10 planes that spent the longest time (cumulatively) in the air.

  i. For each plane, what are the cumulative and mean flight times? In this table, also include its type, manufacturer, model, number of engines, and speed.
  ii. Create a table that lists, for each plane identified in (i.), each flight (and associated destination) that it undertook during 2013.
  iii. Summarise all the information in (ii.) on a map of the USA. Use lines to connect departure and destination locations (each labelled), and use a separate facet for each of the 10 planes. You can use the alpha value in ggplot2 so that the colour intensity of overlapping flight lines increases with the number of flights taken along the path. For bonus marks, ensure that the curvature of the Earth is indicated in the flight lines. Hint: such lines (great-circle routes) would display as curves, not straight lines.

2.g

Limit this analysis to the three coldest winter months and the three warmest summer months (show evidence for how these months were chosen). For each of these two seasons, create a visualisation to explore whether there is a relationship between the mean daily departure delay and the mean daily temperature. Be as economical with your code as possible.

Discuss your answer.

Citation

BibTeX citation:
@online{smit_a_j,
  author = {Smit, A. J.},
  title = {BCB744 {Intro} {R} {Example} 2},
  url = {http://tangledbank.netlify.app/assessments/examples/BCB744_Intro_R_Example_2.html},
  langid = {en}
}
For attribution, please cite this work as:
Smit, A. J. BCB744 Intro R Example 2. http://tangledbank.netlify.app/assessments/examples/BCB744_Intro_R_Example_2.html.