Lake Ginninderra College

Maths - Collecting and Entering Data


Collecting and Entering Data:

In the following exercises you will use Excel to analyse data and generate various types of graphs.  In our discussion so far we have indicated that there are different types of data.  See below.

Once raw data is collected it needs to be organised into frequency tables.  When you analysed the data for the number of children in a household you drew up a table.  This table listed the range in the number of children that we found in the class (from 1 to 6 children per family) and then recorded how many times each number occurred, that is, we determined the frequency of each number.
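The same tallying idea can be sketched in a few lines of Python.  This is only an illustration: the list of household sizes is made up and is not the data collected in class.

```python
from collections import Counter

# Hypothetical raw survey data: number of children in each student's household.
children_per_household = [2, 1, 3, 2, 4, 2, 1, 6, 3, 2, 5, 1, 2, 3, 4]

# Counter tallies how many times each value occurs (its frequency).
frequency = Counter(children_per_household)

# Print the frequency table in order from 1 to 6 children.
print("Children  Frequency")
for number in range(1, 7):
    print(f"{number:>8}  {frequency[number]:>9}")
```

The frequencies always add up to the number of households surveyed, which is a useful check on any tally.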

The information that follows refers to another survey.

This information can be summarised into a table similar to the following.

This data can then be graphed in a variety of ways.  The graph below is a column graph done using Excel.

Another type of graph that could have been drawn is a pie or sector graph.

Your task now is to use Excel to produce graphs just like those shown above.

Open an Excel spreadsheet.  When it is opened it should look in part like the following.

Your task now is to enter the data from the Favourite Sports data table, shown below, into the spreadsheet.  

Data Table:

The following graphic will show you how the data table should look when you type it into Excel.  Don't forget to immediately save your work.  Name it DATAGRAPHING_08.

To graph this data select the two columns that you wish to graph.  See below.

Now click on the Chart Wizard icon from the MENU bar.

The following dialog box will appear.

Choose the options as shown above in the Chart type and the Chart sub-type dialog boxes.  Now click Next.

The following dialog box should appear.

Choose Next again and type in the options as shown.


Click on the Legend tab and select the following options.

Click Next again and choose the option shown below.

Finally click the Finish button.  Resize the graph.  Save your work and print it if a printer is available.

Your graph should look similar to the following.

Your next task is to produce a PIE chart using the Chart Wizard as before.  This time choose the PIE chart option.  See below.

When you have completed this your graph should look similar to that shown below.

Don't forget to change the title to Favourite Sports of 24 Students.  Investigate some of the options that are available in Excel to display the pie chart in various ways.  Generate at least one other type of pie chart based on the favourite sports data.

Using Excel to Create Frequency Tables.

The following table shows a set of data about the make of car owned by students.

Click here to download a frequency spreadsheet.  Your spreadsheet should look like the following.  Do a Save As immediately and place the file in your own folder as you cannot save to the web site.

Type the names of all the makes of cars into column C under the word Groups.  The word Holden has already been entered for you.

Work through the table of cars and every time the word Holden appears type the number 1 into the yellow row next to the word Holden.  Repeat for all the other makes.  See below.

Your task now is to graph this data as a Column Graph and a Pie Graph.  To do this we need to select the Groups column and the Total (frequency) column.  As the columns are not next to each other it is necessary to first select the data from the Groups column, including the word Groups, then press and hold down the Ctrl key and select the data from the Total (frequency) column.  See below.

Create Column and Pie charts for the following data.  With this data set we will tally the numbers in ranges.

Use the Excel Tally sheet to analyse this data.  The table below shows the categories to use.

Graph the data as you did in the previous example.

Using Excel to Display Categorical Coded Data:



Lake G Student Lifestyle Survey Questions:

As mentioned in class each person is expected to write three open-ended questions and three closed questions for the survey.  A Lifestyle Survey will ask questions that relate to activities such as

employment, spare time, use of technology, aspects of school, transportation, family, cost of living

and other aspects of how people interact with the world around them.

I will collate the questions and then we will complete the survey as a class group.  In the previous Shoppers Survey all of the questions were closed type questions.  Read the following material about question types before you write your questions.

Write your six questions in a Word document and then email it to me.  The email address is as follows,

robertd@lakeonline.act.edu.au

Make sure that you include your name as part of the header.  It will then print on every page if there is more than one page.

Worksheet 9.1 - Collecting and Entering Data:

Once you have completed the survey questionnaire and emailed the document to the email address given above then you can work on Worksheet 9.1.  Click here to download the worksheet.  If you need access to the textbook then it can be downloaded by clicking here.

The worksheet will download as a Word document.  You can answer the questions on the worksheet.  Some questions ask you to complete graphs.  These are best completed using Excel.  When the particular graph is completed copy and paste it into the Word document. When you have completed the exercise email it to me for correction. 

Methods of Misrepresenting Data:

Many people have reasons for misrepresenting data: politicians may wish to magnify the progress achieved during their term, or business people may wish to accentuate their reported profits. There are numerous ways of misrepresenting data. In this section, only graphical methods of misrepresentation are considered.

Vertical and Horizontal Axis:

It is a truism that the steeper the graph the better the growth appears.  A ‘rule of thumb’ for statisticians is that, for the sake of appearances, the vertical axis should be two-thirds to three-quarters the length of the horizontal axis.  This rule was established in order to have some comparability between graphs.

Changing the Scale on the Vertical Axis.

Look carefully at the following data: The table gives the holdings for ROPE corporation during 2001.

This data can be represented graphically in the following ways.

These graphs show what happens when you distort the vertical axis.  

Omitting Certain Values:

The graph below shows what happens when some data is actually left out.

Foreshortening the Vertical Axis:

Look carefully at the following graphs showing the rate of growth in the Queensland Police Force.

The graph on the left appears to show a very rapid increase in police numbers.  However, the graph on the right shows the same data, and with this representation the increase looks very different.
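The effect of shortening the vertical axis can be put into numbers.  The short Python sketch below compares how tall the last column looks relative to the first when the axis starts at zero and when it starts just below the smallest value.  The staff numbers and the function name are invented for illustration and are not taken from the police graphs.

```python
def apparent_ratio(first, last, axis_start):
    """Ratio of the drawn column heights when the vertical axis begins at axis_start."""
    return (last - axis_start) / (first - axis_start)

# Hypothetical staff numbers for two years (not taken from the police graphs).
first_year, last_year = 5000, 5500

true_ratio = apparent_ratio(first_year, last_year, axis_start=0)
shortened_ratio = apparent_ratio(first_year, last_year, axis_start=4900)

print(f"Axis starting at 0:    last column is {true_ratio:.1f} times as tall")
print(f"Axis starting at 4900: last column is {shortened_ratio:.1f} times as tall")
```

With these made-up numbers a real increase of 10% is drawn as a column six times as tall once the axis starts at 4900.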

Visual Impression:

Look carefully at the following graph.  It is the height of the money bag that gives the true representation of the quantity being graphed, however, because the volume of the bag also increased the overall effect is to give a false impression of the increase.
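A quick calculation shows why the picture misleads.  The Python sketch below assumes the bag is enlarged in proportion, so that its width and depth grow by the same factor as its height.  The function name is only for illustration.

```python
def scale_factors(height_factor):
    """Return (height, area, volume) growth when a picture is enlarged in proportion."""
    return height_factor, height_factor ** 2, height_factor ** 3

# A bag drawn twice as tall, and enlarged in proportion.
height, area, volume = scale_factors(2)
print(f"height x{height}, area x{area}, volume x{volume}")
```

Under that assumption a quantity that has only doubled is drawn as a bag covering four times the area on the page and suggesting eight times the volume.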

A non-linear scale on one axis or both:

This is a very old trick.  The graph on the left has a linear vertical scale.  The graph on the right has a non-linear vertical scale.

Activity:

Your task is to create an Excel spreadsheet that looks similar to the following.

Important Note:  In the graphing exercise below you will note that the YEAR column is also numerical.  When you try to graph such data using the Column Graph option you will not be able to produce any sort of a graph at all.  The reason for this is that the Column Graph option needs Category type data and Numerical type data.  You cannot have both vertical and horizontal axes having numerical data.  To avoid this problem proceed as follows.  Before you type any of the years into your spreadsheet you must format the cells as Text.  See below.

This tells Excel that the cells you have selected although containing numerical data will be read as though they contained text.  Your graphs should now work properly.

To create the graphs follow the procedures shown below.  

When you have completed Q8 make sure you print or email your graphs.

You are expected to complete Questions 1 - 4 from Exercise 9D.  These are to be completed using Word.  Copy and paste the questions and then give an appropriate answer.  If you are asked to redraw a graph extract the data and use Excel.  Make sure any redrawn graphs are printed or emailed.

Note: For Q1 (above) make sure that the Years column in your spreadsheet is formatted as text as discussed previously.

Other Forms of Graphical Display:

Displaying Data Using Frequency Histograms:

Activity:  Your task now is to use Excel to draw a combined histogram and frequency polygon.  The graph will look something like the following graphic.  Note that this graphing technique will only function correctly in Excel if all of the cells have an entry.  Therefore note the * values before A and after G.

The data table for this graph is shown below,

Click here to download the instructions and spreadsheet template for this activity.  In the spreadsheet that you download you will have to type the table shown above and then, by following the instructions, produce a frequency polygon with histogram similar to the graphic above.  To draw the frequency polygon with histogram it is necessary to duplicate the frequency column.  Remember to always convert the category column to text and make sure all cells contain some form of entry - leave no cell blank.

When you have completed this activity you will need to print it or email it to me.

Complete the following exercises - 9E from your text
Use Excel to complete these activities.  Remember to ensure that all Category data is converted to Text and all cells have some form of entry - don't leave cells blank.

For question 4 you could use the COUNTIF() function to do the tallying for you.
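COUNTIF() counts the entries that meet a condition.  The same kind of tallying can be sketched in Python as shown below.  The scores and class boundaries are made up for illustration and are not the data from Exercise 9E, and count_in_class is just a helper name chosen here.

```python
# Hypothetical test scores (not the data from Exercise 9E).
scores = [45, 52, 58, 61, 63, 67, 70, 72, 74, 75, 78, 81, 85, 90, 96]

# Class boundaries as (lowest, highest) pairs, both inclusive.
classes = [(41, 50), (51, 60), (61, 70), (71, 80), (81, 90), (91, 100)]

def count_in_class(values, low, high):
    """Count the values that lie from low to high inclusive."""
    return sum(1 for value in values if low <= value <= high)

# Build the tally: one frequency per class.
tally = {f"{low}-{high}": count_in_class(scores, low, high) for low, high in classes}

for label, frequency in tally.items():
    print(f"{label:>7}  {frequency}")
```

As with a hand tally, the frequencies should add up to the total number of scores.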

When you have completed Exercise 9E print it or email for correction.

Using a Graphics Calculator:

Your task is to use the TI-83 Graphics Calculator to generate a histogram.  This will be similar to the one you produced using Excel.  Follow the instructions in Worked Example 11, shown below.

With a graphics calculator it is always best to reset it before you start any activities. To do this do the following.

Turn the calculator on by pressing the On button.

Press 2nd MEM (the + key), select 3 (Clear all entries) and press Enter.

Press 2nd MEM (the + key) again, select 7 (Reset), press 2 (Defaults) and press Enter.

You should also check that any equations placed in the Y= register have been deleted or made inactive. To check this press the Y= key.

Now you are ready to start.

When you have completed this exercise complete Q6 from Exercise 9E.  Note you will not be able to do a polygon as suggested in the question using the graphics calculator.




 

How do we compare sets of data?  So far we have looked at STEM and LEAF plots and BOX and WHISKER plots.

If you were asked to compare the results that a class achieved in their first test compared with the results they achieved in the final test you would probably use the MEAN score achieved by the class in each test.

Just how good is the mean value in comparing the class' performance in each test?  Provided there are no extreme scores it is probably a very good measure.  The mean is thus said to be a good measure of central tendency.

In Chapter 9 we also looked at medians, and quartiles.  They also allow us to compare sets of data in a meaningful way.

 

The following table shows measurements of skulls found in Egypt from two different periods in Egyptian history.  Our task is to compare the data by calculating mean (or average) values for breadth, height and length for each set of skulls.  To do this we will use Excel.

The table that follows is to be copied and pasted into Excel.  Once you have done this you will use the =AVERAGE() function to find the average values.  Remember that the =AVERAGE function adds up all of the measurements (scores) in any selected column and then divides by the total number of measurements (scores).

          4000 BC                       150 AD
Breadth   Height   Length     Breadth   Height   Length
131       138      89         137       123      91
125       131      92         136       131      95
131       132      99         128       126      91
119       132      96         130       134      92
136       143      100        138       127      86
138       137      89         126       138      101
139       130      108        136       138      97
125       136      93         126       126      92
131       134      102        132       132      99
134       134      99         139       135      92
129       138      95         143       120      95
134       121      95         141       136      101
126       129      109        135       135      95
132       136      100        137       134      93
141       140      100        142       135      96
131       134      97         139       134      95
135       137      103        138       125      99
132       133      93         137       135      96
129       136      96         133       125      92
132       131      101        145       129      89
126       133      102        138       136      92
135       135      103        131       129      97
134       124      93         143       126      88
128       134      103        134       124      91
130       130      104        132       127      97
138       135      100        137       125      85
128       132      93         129       128      81
127       129      106        140       135      103
131       136      114        147       129      87
124       138      101        136       133      97

Here is a screenshot of how to write the formula.

You will need to calculate the average value for each column.  Copy and paste the information into a Word document.  Look carefully at each mean value.  For each column count the number of measurements that are above the mean and the number that are below the mean.  If the number above is about equal to the number below then this indicates that what you calculated is a good measure of central tendency, that is, the mean is close to the centre of each data set.
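As a check on the method, the same steps can be sketched in Python for one column.  The list below is the Breadth column for the 4000 BC skulls as it appears in the table above.  The variable names are just for illustration, and you still need to do all six columns in Excel.

```python
from statistics import mean

# Breadth measurements for the 4000 BC skulls, copied from the table above.
breadth_4000bc = [
    131, 125, 131, 119, 136, 138, 139, 125, 131, 134,
    129, 134, 126, 132, 141, 131, 135, 132, 129, 132,
    126, 135, 134, 128, 130, 138, 128, 127, 131, 124,
]

# The mean is the sum of the measurements divided by how many there are.
average = mean(breadth_4000bc)

# Count how many measurements fall on each side of the mean.
above = sum(1 for x in breadth_4000bc if x > average)
below = sum(1 for x in breadth_4000bc if x < average)

print(f"mean = {average:.2f}, above = {above}, below = {below}")
```

If the two counts are reasonably close, the mean is sitting near the centre of the column.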

This document can be emailed to me when you have completed Questions 1 to 5 from Exercise 10A.  

You may use Excel to complete all of these calculations.  However Exercise 10A Q2 should be done using a calculator as you would do in a test situation.

Here is an example of how to calculate the mean or, as it is frequently called, the average.

The Σ symbol means sum of, which means to add up.

For each of the following questions copy and paste the question into your Word document and then paste or write the answer.  When you have completed Exercise 10A email it to me at robertd@lakeonline.act.edu.au.  Use MyClasses for this.

1) Copy and complete the following:

Another word commonly used for ‘mean’ is __________. The mean is calculated by finding the __________ of the scores, then dividing by the __________ of scores. The mean is a measure of __________ tendency. Two other measures are __________ and __________.

For Q2 use your calculator or the calculator on the computer.  For the other questions you can use Excel or your calculator.

For Q3 treat the percentages as normal numbers to do the calculation.

For Q4 ignore the c/L when you do the calculations.

For Q5 ignore the m when you do the calculations.

 

When you have large amounts of data it is sometimes easier to approach the calculation of the mean slightly differently.  We still add up all of the scores and divide by the total number of scores.  

Look carefully at the following example and then complete Questions 1, 2 and 3 that follow.

Question 1:

Copy the data in the following table across to your Excel spreadsheet.  There is no need to include a Tally column.  Note you will need to type the data for this question as the table will only copy as a picture.

Your data should look similar to the following spreadsheet.

You will need to complete the frequency column based on the tally column.  To calculate the fx column write a formula to multiply the cell in column C by the cell in column A.  See the example below.

Copy the formula down column D.  You now need to total the values in column C and in column D. To do this write formulas as shown below. The function to use in Excel is the =SUM() function.  Note in the diagram below some of the function name is hidden.

Repeat for column D.  To calculate the mean write a formula as shown below.
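The spreadsheet steps above (an fx column, a total for f, a total for fx, then a division) can be sketched in Python as follows.  The scores and frequencies are made up for illustration and are not the data for Question 1.

```python
# Hypothetical frequency table: each score x and the number of times f it occurred.
scores = [0, 1, 2, 3, 4]
frequencies = [2, 5, 8, 4, 1]

# The fx column: each score multiplied by its frequency.
fx = [f * x for f, x in zip(frequencies, scores)]

# Totals of the f and fx columns (what =SUM() gives in the spreadsheet).
sum_f = sum(frequencies)
sum_fx = sum(fx)

# The mean is the total of fx divided by the total of f.
mean = sum_fx / sum_f

print(f"sum of f = {sum_f}, sum of fx = {sum_fx}, mean = {mean}")
```

This gives the same answer as writing out every score individually and averaging them, but with far less typing.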

Q3) Copy the following table across to Excel and calculate the mean using the technique shown above.

No. of television sets sold (x)   No. of weeks (f)   fx
16                                4
17                                4
18                                3
19                                6
20                                7
21                                12
22                                8
23                                2
24                                4
25                                2
                                  Σf =               Σfx =

Sometimes we need to group data into classes.  Look carefully at the following example.  The problem with classes, however, is that you are unable to enter grouped data into calculators.  You need to select a class centre.  In the following example note the first class is from 25 - 29.  There are actually five numbers included here: 25, 26, 27, 28 and 29.  The middle number, the class centre, is 27.
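The class centre is simply halfway between the lowest and highest values in the class.  The Python sketch below works out the centres and then the mean in the usual way.  Only the 25 - 29 class comes from the example in the text; the other classes and all of the frequencies are made up.

```python
# Classes as (lowest, highest) pairs. Only the 25-29 class comes from the example
# in the text; the other classes and all of the frequencies are made up.
classes = [(25, 29), (30, 34), (35, 39), (40, 44)]
frequencies = [3, 7, 6, 4]

# The class centre is halfway between the lowest and highest value in the class.
centres = [(low + high) / 2 for low, high in classes]

# Use the class centre as x, then calculate the mean from the f and fx totals.
sum_f = sum(frequencies)
sum_fx = sum(f * x for f, x in zip(frequencies, centres))
grouped_mean = sum_fx / sum_f

print(centres)
print(f"mean from grouped data = {grouped_mean}")
```

Because every score in a class is treated as if it sat at the class centre, a mean found this way is an estimate and may differ slightly from the mean of the raw scores.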

Complete the following exercises.  Use Excel.  Paste the questions into a Word document, do the necessary calculations and then paste these answers back into Word.  Email the sheets when you have completed the two questions.




Copy the table across to Excel and complete the calculations.  Note here that there are 10 scores in each class.  The class centre lies halfway between the fifth and sixth scores.  That is, for the class 31 - 40 the class centre is 35.5.

Class     Class centre (x)   Frequency (f)   fx
31–40                        1
41–50                        3
51–60                        4
61–70                        7
71–80                        11
81–90                        2
91–100                       2
                             Σf =            Σfx =



Copy the table across to Excel and determine the mean.  Note the class centre.  There are 100 numbers between 50.01 and 51.00, in increments of 0.01.  The centre position is (100 + 1) divided by 2, i.e., the 50.5th number, which lies halfway between 50.50 and 50.51.  Strictly the class centre is 50.505; here it is taken as 50.50.

Time          Class centre (x)   No. of swimmers (f)   fx
50.01–51.00                      4
51.01–52.00                      12
52.01–53.00                      23
53.01–54.00                      38
54.01–55.00                      15
55.01–56.00                      3
                                 Σf =                  Σfx =

We have already done a considerable amount of work on calculating the median value for a set of scores.  Here is a brief review along with a definition of another term - MODE - the most frequently occurring score.




You will need to be able to determine the median from a cumulative frequency column.

Complete the following exercises.  Include questions and answers in a Word document and then email them.

To do the following question type the numbers in numerical order into an Excel spreadsheet.  Determine the median manually and then calculate the mean using the =AVERAGE() function in Excel.

For questions 6 and 7 use Excel.  You will need to add a cumulative frequency column in Q7.  Use the following example to determine how to calculate cumulative frequency data using Excel formulas.

To generate data in the Cumulative Frequency column you will add the cells in the frequency column together.  This is done by writing an appropriate formula.  That is the value in the top cell of the frequency column is added to the cell immediately below it.  This value is then placed in the cumulative frequency column.

Look carefully at the following diagram.  You must write a formula in the cell with the circle that will be able to be copied down the column.  The formula to create the value "1" in the cumulative frequency column just equals the cell containing the "1" in the frequency column.  However the "1" in the cumulative frequency column must be added to the "3" in the frequency column to give you the value "4".  This pattern must be able to be repeated down the column.
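The running-total pattern described above, and the way it is used to find a median, can be sketched in Python.  The table below is hypothetical apart from starting with the frequencies 1 and 3 mentioned in the text, and it has an odd total so that the median is a single middle score.

```python
from itertools import accumulate

# Hypothetical frequency table: scores and how often each occurred.
scores = [1, 2, 3, 4, 5]
frequencies = [1, 3, 5, 4, 2]

# Each cumulative value is the previous cumulative value plus the next frequency.
cumulative = list(accumulate(frequencies))

# With an odd number of scores the median is the middle one, at position (n + 1) / 2.
total = cumulative[-1]
median_position = (total + 1) // 2

# The median is the first score whose cumulative frequency reaches that position.
median_score = next(
    score for score, running in zip(scores, cumulative) if running >= median_position
)

print(cumulative)
print(f"median position = {median_position}, median = {median_score}")
```

The last cumulative value is always the total number of scores, which is a handy check that the column has been built correctly.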



When to use the Mean, Median or Mode:  Read the following very carefully.  Although we very frequently decide that calculating the mean or average is the best way to measure this "central tendency", we are often better to calculate the median.
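A small made-up example shows why.  In the Python sketch below one extreme wage drags the mean well away from what most people earn, while the median hardly notices it.  The wages are invented for illustration.

```python
from statistics import mean, median

# Hypothetical weekly wages in dollars; the last value is an extreme score.
wages = [400, 420, 450, 470, 500, 520, 3000]

average_wage = mean(wages)
middle_wage = median(wages)

print(f"mean = {average_wage:.2f}, median = {middle_wage}")
```

Six of the seven wages are below the mean here, so the median gives the better picture of a typical wage.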

When you have completed the following exercises email them to me for correction.  

 

 

For Q5 there are two methods of calculating the mean value.  You can just add the scores as written and divide by the number of scores.  However, what you need to do with this set of data is to create a frequency distribution table with class values as indicated.  The mean is then calculated from this table rather than directly from the data.  The mean calculated in this way is slightly different.  Use Excel to create your table

Much of the work in this section is just a review and application of material that we have looked at previously.  In particular QUARTILES are discussed along with INTERQUARTILE RANGE.  RANGE, that is, the difference between the lowest and highest scores, is also discussed.  Note the statement in the following box concerning the use of range and interquartile range.  Interquartile range is often a better measure of dispersion than the range.

When you are calculating the upper and lower quartiles the median value is NOT included in the calculation.
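The rule above can be sketched in Python.  This is one way of applying it, with made-up scores and a helper name chosen here: when there is an odd number of scores the median is left out of both halves, and when there is an even number the scores simply split into two equal halves.

```python
from statistics import median

def five_number_summary(scores):
    """Return (lowest, lower quartile, median, upper quartile, highest)."""
    ordered = sorted(scores)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # For an odd number of scores, skip the median itself.
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])

# Hypothetical scores for illustration.
scores = [3, 5, 7, 8, 9, 11, 13, 15, 18]
lowest, q1, med, q3, highest = five_number_summary(scores)

data_range = highest - lowest
interquartile_range = q3 - q1

print(lowest, q1, med, q3, highest)
print(f"range = {data_range}, interquartile range = {interquartile_range}")
```

Be aware that calculators and spreadsheets do not all use the same rule for quartiles, so their answers can differ slightly from a hand calculation.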

When you have completed this section of work there are sample tests on MyClasses for Chapter 9 and Chapter 10.  Download these and complete them using Excel.

Graphics calculators as well as your notes can be used in the final test.  If you need to practise using the calculator to determine five-number summary data please ask to borrow one during class time.