# Lake Ginninderra College

Maths - Collecting and Entering Data

Collecting and Entering Data:

In the following exercises you will use Excel to analyse data and generate various types of graphs. In our discussion so far we have indicated that there are different types of data, as shown below.

Once raw data is collected it needs to be organised into frequency tables. When you analysed the data for the number of children in a household you drew up a table. This table listed the range in the number of children that we found in the class (from 1 to 6 children per family) and then recorded how many times each value occurred, that is, we determined the frequency of each number occurring.
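The same tallying can be sketched in Python. This is only an illustration: the household sizes below are made-up values, not the results of the class survey.

```python
from collections import Counter

# Hypothetical raw data: number of children in each household (made up).
children = [2, 1, 3, 2, 4, 2, 1, 6, 3, 2, 5, 1, 2, 3, 2]

# Count how many times each value occurs.
frequency = Counter(children)

# Print the frequency table in order of number of children.
print("Children  Frequency")
for value in sorted(frequency):
    print(f"{value:>8}  {frequency[value]:>9}")
```

The frequencies should always add up to the number of households surveyed, which is a quick way to check a table.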

The information that follows refers to another survey.

This information can be summarised into a table similar to the following.

This data can then be graphed in a variety of ways.  The graph below is a column graph done using Excel.

Another type of graph that could have been drawn is a pie or sector graph.

Your task now is to use Excel to produce graphs just like those shown above.

Open an Excel spreadsheet. When it is opened it should look, in part, like the following.

Your task now is to enter the data from the Favourite Sports data table, shown below, into the spreadsheet.

Data Table:

The following graphic will show you how the data table should look when you type it into Excel.  Don't forget to immediately save your work.  Name it DATAGRAPHING_08.

To graph this data select the two columns that you wish to graph. See below.

Now click on the Chart Wizard icon from the MENU bar.

The following dialog box will appear.

Choose the options as shown above in the Chart type and the Chart sub-type dialog boxes. Now click Next.

The following dialog box should appear.

Choose Next again and type in the options as shown.

Click on the Legend tab and select the following options.

Click Next again and choose the option shown below.

Finally click the Finish button. Resize the graph. Save your work and print it if a printer is available.

Your graph should look similar to the following.

Your next task is to produce a PIE chart using the Chart Wizard as before.  This time you choose the PIE chart option.  See below,

When you have completed this your graph should look similar to that shown below.

Don't forget to change the title to Favourite Sports of 24 Students. Investigate some of the options that are available in Excel to display the pie chart in various ways. Generate at least one other type of pie chart based on the favourite sports data.

Using Excel to Create Frequency Tables.

The following table shows a set of data about the make of car owned by students.

Type the names of all the makes of cars into column C under the word Groups. The word Holden has already been entered for you.

Work through the table of cars and every time the word Holden appears type the number 1 into the yellow row next to the word Holden.  Repeat for all the other makes.  See below,

Your task now is to graph this data as a Column Graph and a Pie Graph. To do this we need to select the Groups column and the Total (frequency) column. As the columns are not next to each other it is necessary to first select the data from the Groups column, including the word Groups, then press and hold down the Ctrl key and select the data from the Total (frequency) column. See below.

Create Column and Pie charts for the following data. With this data set we will tally the numbers in ranges.

Use the Excel Tally sheet to analyse this data. The table below shows the categories to use.
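Tallying into ranges can also be sketched in Python. Both the scores and the ranges below are made up for illustration; they are not the data or the categories from this exercise.

```python
# Hypothetical scores and class ranges (both made up for illustration).
scores = [12, 27, 35, 8, 41, 19, 33, 25, 47, 30, 16, 22]
ranges = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49)]

# Tally how many scores fall inside each range (both ends inclusive).
tally = {}
for low, high in ranges:
    tally[f"{low}-{high}"] = sum(1 for s in scores if low <= s <= high)

for label, count in tally.items():
    print(label, count)
```

Each score must fall in exactly one range, so the ranges should not overlap or leave gaps.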

Graph the data as you did in the previous example.

Using Excel to Display Categorical Coded Data:

Lake G Student Lifestyle Survey Questions:

As mentioned in class each person is expected to write three open ended questions and three closed questions for the survey.  A Lifestyle Survey will ask questions that relate to activities such as,

employment, spare time, use of technology, aspects of school, transportation, family, cost of living

and other aspects of how people interact with the world around them.

I will collate the questions and then we will complete the survey as a class group.  In the previous Shoppers Survey all of the questions were closed type questions.  Read the following material about question types before you write your questions.

Write your six questions in a word document and then email it to me.  The email address is as follows,

robertd@lakeonline.act.edu.au

Make sure that you include your name as part of the header.  It will then print on every page if there is more than one page.

Worksheet 9.1 - Collecting and Entering Data:

The worksheet will download as a Word document.  You can answer the questions on the worksheet.  Some questions ask you to complete graphs.  These are best completed using Excel.  When the particular graph is completed copy and paste it into the Word document. When you have completed the exercise email it to me for correction.

Methods of Misrepresenting Data:

Many people have reasons for misrepresenting data: politicians may wish to magnify the progress achieved during their term, or business people may wish to accentuate their reported profits. There are numerous ways of misrepresenting data. In this section, only graphical methods of misrepresentation are considered.

Vertical and Horizontal Axis:

It is a truism that the steeper the graph the better the growth appears. A rule of thumb for statisticians is that, for the sake of appearances, the vertical axis should be two-thirds to three-quarters the length of the horizontal axis. This rule was established in order to have some comparability between graphs.

Changing the Scale on the Vertical Axis.

Look carefully at the following data: The table gives the holdings for ROPE corporation during 2001.

This data can be represented graphically in the following ways.

These graphs show what happens when you distort the vertical axis.

Omitting Certain Values:

The graph below shows what happens when some data is actually left out.

Foreshortening the Vertical Axis:

Look carefully at the following graphs showing the rate of growth in the Queensland Police Force.

The graph on the left appears to show a very rapid increase in police numbers. The graph on the right shows the same data, but with this representation the increase looks very different.
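A rough way to see why a shortened axis exaggerates growth is to compare the drawn heights of the first and last columns. The sketch below uses made-up figures, not the Queensland Police data, and the function name is only for illustration.

```python
def drawn_height_ratio(first, last, axis_start):
    """Ratio of the drawn column heights when the vertical axis begins at axis_start."""
    return (last - axis_start) / (first - axis_start)

# Hypothetical figures (not the Queensland Police data).
first_year, last_year = 5000, 5500

full_axis = drawn_height_ratio(first_year, last_year, axis_start=0)
cut_axis = drawn_height_ratio(first_year, last_year, axis_start=4900)

print(f"Axis from 0:    last column is {full_axis:.1f} times the first")
print(f"Axis from 4900: last column is {cut_axis:.1f} times the first")
```

The data has not changed, only the starting point of the vertical axis, yet the last column is drawn several times taller relative to the first.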

Visual Impression:

Look carefully at the following graph. It is the height of the money bag that gives the true representation of the quantity being graphed. However, because the volume of the bag also increases, the overall effect is to give a false impression of the size of the increase.
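The size of this effect can be sketched with a little arithmetic. The sketch assumes the picture is enlarged in proportion in every direction, which may not be exactly how the graph above was drawn.

```python
def scale_effect(height_factor):
    """Return (height, area, volume) factors for a shape enlarged in proportion."""
    return height_factor, height_factor ** 2, height_factor ** 3

# Assumed example: the quantity doubles, so the bag is drawn twice as tall.
height, area, volume = scale_effect(2)
print(f"height x{height}, printed area x{area}, apparent volume x{volume}")
```

Under that assumption a doubling of the quantity covers four times the area on the page and suggests a bag with eight times the volume.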

A non-linear scale on one axis or both:

This is a very old trick.  The graph on the left has a linear vertical scale.  The graph on the right has a non-linear vertical scale.

Activity:

Important Note:  In the graphing exercise below you will note that the YEAR column is also numerical.  When you try to graph such data using the Column Graph option you will not be able to produce any sort of a graph at all.  The reason for this is that the Column Graph option needs a Category type data and Numerical type data.  You cannot have both vertical and horizontal axes having numerical data.  To avoid this problem proceed as follows.  Before you type any of the years into your spreadsheet you must format the cells as Text.  See below,

This tells Excel that the cells you have selected although containing numerical data will be read as though they contained text.  Your graphs should now work properly.

To create the graphs follow the procedures shown below.

When you have completed Q8 make sure you print or email your graphs.

You are expected to complete Questions 1 - 4 from Exercise 9D.  These are to be completed using Word.  Copy and paste the questions and then give an appropriate answer.  If you are asked to redraw a graph extract the data and use Excel.  Make sure any redrawn graphs are printed or emailed.

Note: For Q1 (above) make sure that the Years column in your spreadsheet is formatted as text, as discussed previously.

Other Forms of Graphical Display:

Displaying Data Using Frequency Histograms:

Activity:  Your task now is to use Excel to draw a combined histogram and frequency polygon.  The graph will look something like the following graphic.  Note that this graphing technique will only function correctly in Excel if all of the cells have an entry.  Therefore note the * values before A and after G.

The data table for this graph is shown below,

Click here to download the instructions and spreadsheet template for this activity. You will note that in the spreadsheet that you download you will have to type the table shown above and then, by following the instructions, you will need to produce a frequency polygon with histogram similar to the graphic above. To draw the frequency polygon with histogram it is necessary to duplicate the frequency column. Remember to always convert the category column to text and make sure all cells contain some form of entry - leave no cell blank.

When you have completed this activity you will need to print it or email it to me.

Complete the following exercises - 9E from your text
Use Excel to complete these activities.  Remember to ensure that all Category data is converted to Text and all cells have some form of entry - don't leave cells blank.

For question 4 you could use the COUNTIF() function to do the tallying for you.
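As a rough picture of what COUNTIF() does, here is a Python sketch that counts matching entries in a list. The responses are made up, and `count_if` is a hypothetical helper written for this illustration, not an Excel or Python built-in.

```python
# Hypothetical responses (made up); in Excel these would sit in one column.
responses = ["Soccer", "Netball", "Soccer", "Tennis", "Soccer", "Netball"]

def count_if(values, target):
    """Count the entries equal to target, roughly what COUNTIF() does for one criterion."""
    return sum(1 for v in values if v == target)

# Tally every distinct response.
for sport in sorted(set(responses)):
    print(sport, count_if(responses, sport))
```

In Excel the same idea is written as one COUNTIF() formula per category, with the range of data cells as the first argument and the category as the second.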

When you have completed Exercise 9E print it or email for correction.

Using a Graphics Calculator:

Your task is to use the TI-83 Graphics Calculator to generate a histogram. This will be similar to that which you produced using Excel. Follow the instructions in Worked Example 11, shown below.

With a graphics calculator it is always best to reset it before you start any activities. To do this do the following.

Turn the calculator on by pressing the On button.

Press 2nd MEM (the + key), select 3 (Clear all entries), then press Enter.

Press 2nd MEM (the + key) again, select 7 (Reset), press 2 (Defaults), then press Enter.

You should also check that any equations placed in the Y= register have been deleted or made inactive. To check this press the Y= key.

Now you are ready to start.

When you have completed this exercise complete Q6 from Exercise 9E.  Note you will not be able to do a polygon as suggested in the question using the graphics calculator.

How do we compare sets of data? So far we have looked at STEM and LEAF plots and BOX and WHISKER plots.

If you were asked to compare the results that a class achieved in their first test compared with the results they achieved in the final test you would probably use the MEAN score achieved by the class in each test.

Just how good is the mean value in comparing the class' performance in each test?  Provided there are no extreme scores it is probably a very good measure.  The mean is thus said to be a good measure of central tendency.

In Chapter 9 we also looked at medians, and quartiles.  They also allow us to compare sets of data in a meaningful way.

The following table shows measurements of skulls found in Egypt from two different periods in Egyptian history. Our task is to compare the data by calculating mean (or average) values for breadth, height and length for each set of skulls. To do this we will use Excel.

The table that follows is to be copied and pasted into Excel.  Once you have done this you will use the =AVERAGE() function to find the average values.  Remember that the =AVERAGE function adds up all of the measurements (scores) in any selected column and then divides by the total number of measurements (scores).

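| 4000 BC Breadth | 4000 BC Height | 4000 BC Length | 150 AD Breadth | 150 AD Height | 150 AD Length |
|---|---|---|---|---|---|
| 131 | 138 | 89 | 137 | 123 | 91 |
| 125 | 131 | 92 | 136 | 131 | 95 |
| 131 | 132 | 99 | 128 | 126 | 91 |
| 119 | 132 | 96 | 130 | 134 | 92 |
| 136 | 143 | 100 | 138 | 127 | 86 |
| 138 | 137 | 89 | 126 | 138 | 101 |
| 139 | 130 | 108 | 136 | 138 | 97 |
| 125 | 136 | 93 | 126 | 126 | 92 |
| 131 | 134 | 102 | 132 | 132 | 99 |
| 134 | 134 | 99 | 139 | 135 | 92 |
| 129 | 138 | 95 | 143 | 120 | 95 |
| 134 | 121 | 95 | 141 | 136 | 101 |
| 126 | 129 | 109 | 135 | 135 | 95 |
| 132 | 136 | 100 | 137 | 134 | 93 |
| 141 | 140 | 100 | 142 | 135 | 96 |
| 131 | 134 | 97 | 139 | 134 | 95 |
| 135 | 137 | 103 | 138 | 125 | 99 |
| 132 | 133 | 93 | 137 | 135 | 96 |
| 129 | 136 | 96 | 133 | 125 | 92 |
| 132 | 131 | 101 | 145 | 129 | 89 |
| 126 | 133 | 102 | 138 | 136 | 92 |
| 135 | 135 | 103 | 131 | 129 | 97 |
| 134 | 124 | 93 | 143 | 126 | 88 |
| 128 | 134 | 103 | 134 | 124 | 91 |
| 130 | 130 | 104 | 132 | 127 | 97 |
| 138 | 135 | 100 | 137 | 125 | 85 |
| 128 | 132 | 93 | 129 | 128 | 81 |
| 127 | 129 | 106 | 140 | 135 | 103 |
| 131 | 136 | 114 | 147 | 129 | 87 |
| 124 | 138 | 101 | 136 | 133 | 97 |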

Here is a screen shot of how to write the formula.

You will need to calculate the average value for each column. Copy and paste the information into a Word document. Look carefully at each mean value. Count the number of measurements that are above the mean for each column and below the mean for each column. If the number above is about equal to the number below then this indicates that what you calculated is a good measure of central tendency, that is, the mean is close to the centre of each data set.
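For comparison with your spreadsheet, the same check can be sketched in Python for one column. The sketch uses only the 4000 BC breadth measurements from the table above; the other five columns are left for you to do in Excel.

```python
# Breadth measurements for the 4000 BC skulls, copied from the table above.
breadth_4000bc = [131, 125, 131, 119, 136, 138, 139, 125, 131, 134,
                  129, 134, 126, 132, 141, 131, 135, 132, 129, 132,
                  126, 135, 134, 128, 130, 138, 128, 127, 131, 124]

# The mean: add up all measurements, then divide by how many there are.
mean = sum(breadth_4000bc) / len(breadth_4000bc)

# Count the measurements on each side of the mean.
above = sum(1 for x in breadth_4000bc if x > mean)
below = sum(1 for x in breadth_4000bc if x < mean)

print(f"mean = {mean:.2f}, above = {above}, below = {below}")
```

If the two counts are close, the mean sits near the centre of that column.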

This document can be emailed to me when you have completed Questions 1 to 5 from Exercise 10A.

You may use Excel to complete all of these calculations. However, Exercise 10A Q2 should be done using a calculator, as you would do in a test situation.

Here is an example of how to calculate the mean or, as it is frequently called, the average.

The Σ symbol means "sum of", which means to add up.

For each of the following questions copy and paste the question into your Word document and then paste or write the answer.  When you have completed Exercise 10A email it to me at robertd@lakeonline.act.edu.au   Use MyClasses for this.

1) Copy and complete the following:

Another word commonly used for mean is __________. The mean is calculated by finding the __________ of the scores, then dividing by the __________ of scores. The mean is a measure of __________ tendency. Two other measures are __________ and____________

For Q2 use your calculator or the calculator on the computer.  For the other questions you can use Excel or your calculator.

For Q3 treat the percentages as normal numbers to do the calculation.

For Q4 ignore the c/L when you do the calculations.

For Q5 ignore the m when you do the calculations.

When you have large amounts of data it is sometimes easier to approach the calculation of the mean slightly differently.  We still add up all of the scores and divide by the total number of scores.

Look carefully at the following example and then complete Questions 1, 2 and 3 that follow.

Question 1:

Copy the data in the following table across to your Excel spreadsheet. There is no need to include a Tally column. Note you will need to type the data for this question as the table will only copy as a picture.

You will need to complete the frequency column based on the tally column.  To calculate the fx column write a formula to multiply the cell in column C by the cell in column A.  See the example below

Copy the formula down column D.  You now need to total the values in column C and in column D. To do this write formulas as shown below. The function to use in Excel is the =SUM() function.  Note in the diagram below some of the function name is hidden.

Repeat for column D.  To calculate the mean write a formula as shown below.
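The calculation behind these formulas can be sketched in Python. The small frequency table below is made up for illustration and is not the table from Question 1.

```python
# Hypothetical frequency table (made up): each pair is (score x, frequency f).
table = [(1, 3), (2, 5), (3, 8), (4, 4)]

# The fx column: each score multiplied by its frequency.
fx = [x * f for x, f in table]

sum_f = sum(f for _, f in table)   # total number of scores
sum_fx = sum(fx)                   # total of all the scores

# The mean is the total of the scores divided by the number of scores.
mean = sum_fx / sum_f
print(f"sum f = {sum_f}, sum fx = {sum_fx}, mean = {mean}")
```

This gives the same answer as listing every score separately and averaging, but with far less typing when the data set is large.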

Q3) Copy the following table across to Excel and calculate the mean using the technique shown above.

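| No. of television sets sold (x) | No. of weeks (f) | fx |
|---|---|---|
| 16 | 4 | |
| 17 | 4 | |
| 18 | 3 | |
| 19 | 6 | |
| 20 | 7 | |
| 21 | 12 | |
| 22 | 8 | |
| 23 | 2 | |
| 24 | 4 | |
| 25 | 2 | |
| | Σf = | Σfx = |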

Sometimes we need to group data into classes.  Look carefully at the following example.  The problem with classes is, however, that you are unable to enter grouped data into calculators.  You need to select a class centre.  In the following example note the first class is from 25 - 29.  There are actually five numbers included here, 25 26 27 28 29.  The middle number, the class centre is 27.
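A class centre is simply the value halfway between the lowest and highest values in the class. A minimal Python sketch follows; `class_centre` is a name chosen for this illustration.

```python
def class_centre(lowest, highest):
    """Midpoint of a class: halfway between its lowest and highest value."""
    return (lowest + highest) / 2

# The class 25 - 29 from the example, and a class with an even number of values.
print(class_centre(25, 29))
print(class_centre(31, 40))
```

When a class holds an even number of values, as in 31 - 40, the centre falls between two of them and is not a whole number.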

Complete the following exercises.  Use Excel.  Paste the questions into a word document, do the necessary calculations and then paste these answers back into Word.  Email the sheets when you have completed the two questions.

Copy the table across to Excel and complete the calculations. Note here that there are 10 scores in each class. The middle position is the five-and-a-half score, that is, halfway between the fifth and sixth scores. For the class 31 - 40 the class centre is therefore 35.5.

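| Class | Class centre (x) | Frequency (f) | fx |
|---|---|---|---|
| 31 - 40 | | 1 | |
| 41 - 50 | | 3 | |
| 51 - 60 | | 4 | |
| 61 - 70 | | 7 | |
| 71 - 80 | | 11 | |
| 81 - 90 | | 2 | |
| 91 - 100 | | 2 | |
| | | Σf = | Σfx = |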

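Copy the table across to Excel and determine the mean. Note the class centre. There are 100 numbers from 50.01 to 51.00 in increments of 0.01. The centre position is (100 + 1) divided by 2, i.e. position 50.5, which lies halfway between the 50th number (50.50) and the 51st number (50.51). The exact midpoint is therefore 50.505; for this exercise use 50.50 as the class centre.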

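| Time | Class centre (x) | No. of swimmers (f) | fx |
|---|---|---|---|
| 50.01 - 51.00 | | 4 | |
| 51.01 - 52.00 | | 12 | |
| 52.01 - 53.00 | | 23 | |
| 53.01 - 54.00 | | 38 | |
| 54.01 - 55.00 | | 15 | |
| 55.01 - 56.00 | | 3 | |
| | | Σf = | Σfx = |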

We have already done a considerable amount of work on calculating the median value for a set of scores.  Here is a brief review along with a definition of another term - MODE - the most frequently occurring score.
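As a quick reminder of both terms, here is a Python sketch using the standard `statistics` module. The scores are made up for illustration.

```python
import statistics

# Hypothetical scores (made up).
scores = [7, 3, 5, 3, 9, 3, 5]

# The median is the middle score once the scores are in order.
ordered = sorted(scores)
print("ordered:", ordered)
print("median:", statistics.median(scores))

# The mode is the score that occurs most often.
print("mode:", statistics.mode(scores))
```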

You will need to be able to determine the median from a cumulative frequency column.

Complete the following exercises.  Include questions and answers in a word document and then email them.

To do the following question type the numbers in numerical order into an Excel spreadsheet. Determine the median manually and then calculate the mean using the =AVERAGE() function in Excel.

For questions 6 and 7 use Excel. You will need to add a cumulative frequency column in Q7. Use the following example to determine how to calculate cumulative frequency data using Excel formulas.

To generate data in the Cumulative Frequency column you will add the cells in the frequency column together.  This is done by writing an appropriate formula.  That is the value in the top cell of the frequency column is added to the cell immediately below it.  This value is then placed in the cumulative frequency column.

Look carefully at the following diagram.  You must write a formula in the cell with the circle that will be able to be copied down the column.  The formula to create the value "1" in the cumulative frequency column just equals the cell containing the "1" in the frequency column.  However the "1" in the cumulative frequency column must be added to the "3" in the frequency column to give you the value "4".  This pattern must be able to be repeated down the column.
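The running-total pattern, and how the median is read from it, can be sketched in Python. Only the first two frequencies (1 and 3) come from the example; the scores and the remaining frequencies are made up, and `score_at` is a hypothetical helper.

```python
from itertools import accumulate

# Scores and frequencies: the first two frequencies (1 and 3) match the
# example above; everything else is made up for illustration.
scores = [0, 1, 2, 3, 4]
frequency = [1, 3, 6, 4, 2]

# Each cumulative value is the previous cumulative value plus the next frequency.
cumulative = list(accumulate(frequency))
print(cumulative)

def score_at(position):
    """Return the score sitting at the given position (1 = lowest score)."""
    for score, running_total in zip(scores, cumulative):
        if position <= running_total:
            return score

# The last cumulative value is the total number of scores.
n = cumulative[-1]
if n % 2 == 1:
    median = score_at((n + 1) // 2)
else:
    median = (score_at(n // 2) + score_at(n // 2 + 1)) / 2
print("median:", median)
```

In Excel the same pattern means the first cumulative cell just equals the first frequency, and every cell below it adds the cumulative cell above to the frequency beside it.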

When to use the Mean, Median or Mode: read the following very carefully. Although we frequently decide that calculating the mean (or average) is the best way to measure "central tendency", we are often better off calculating the median.
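One case where the median does better is when a data set contains an extreme score. The Python sketch below uses made-up wages to show the effect.

```python
import statistics

# Hypothetical weekly wages (made up); the last value is an extreme score.
wages = [400, 420, 450, 470, 500, 2500]

mean = statistics.mean(wages)
median = statistics.median(wages)

print("mean:", mean)
print("median:", median)
```

The single extreme wage pulls the mean well above what most people in the list earn, while the median stays near the middle of the group.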

When you have completed the following exercises email them to me for correction.

For Q5 there are two methods of calculating the mean value. You can just add the scores as written and divide by the number of scores. However, what you need to do with this set of data is to create a frequency distribution table with class values as indicated. The mean is then calculated from this table rather than directly from the data. The mean calculated in this way is slightly different. Use Excel to create your table.
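To see why the two methods give slightly different answers, here is a Python sketch with made-up scores and classes. They are not the Q5 data.

```python
# Hypothetical raw scores (made up).
scores = [31, 34, 38, 42, 45, 47, 49, 53, 56, 58]

# Method 1: add the scores as written and divide by how many there are.
raw_mean = sum(scores) / len(scores)

# Method 2: group the scores into classes, then use the class centres.
classes = [(31, 40), (41, 50), (51, 60)]
sum_f = 0
sum_fx = 0
for low, high in classes:
    f = sum(1 for s in scores if low <= s <= high)  # frequency of the class
    centre = (low + high) / 2                       # class centre
    sum_f += f
    sum_fx += f * centre
grouped_mean = sum_fx / sum_f

print("mean from raw scores:", raw_mean)
print("mean from grouped table:", grouped_mean)
```

The grouped method treats every score in a class as if it were equal to the class centre, which is why its answer is close to, but not exactly, the mean of the raw scores.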

Much of the work in this section is just a review and application of material that we have looked at previously. In particular, QUARTILES are discussed along with INTERQUARTILE RANGE. RANGE, that is, the difference between the lowest and highest scores, is also discussed. Note the statement in the following box concerning the use of range and interquartile range. Interquartile range is often a better measure of dispersion than the range.

When you are calculating the upper and lower quartiles the median value is NOT included in the calculation.
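This rule can be sketched in Python. The scores are made up, and `quartile_summary` is a hypothetical helper that follows the rule stated above; other software may use a slightly different quartile method.

```python
import statistics

def quartile_summary(scores):
    """Return (Q1, median, Q3), leaving the median out of both halves."""
    ordered = sorted(scores)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # With an odd number of scores, skip the middle score (the median).
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))

# Hypothetical scores (made up).
scores = [2, 4, 5, 7, 8, 10, 13]
q1, median, q3 = quartile_summary(scores)

print("Q1:", q1, "median:", median, "Q3:", q3)
print("interquartile range:", q3 - q1)
print("range:", max(scores) - min(scores))
```

Here the median (7) is left out, so the lower quartile is the median of 2, 4, 5 and the upper quartile is the median of 8, 10, 13.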

When you have completed this section of work there are sample tests on MyClasses for Chapter 9 and Chapter 10. Download these and complete them using Excel.

Graphics calculators as well as your notes can be used in the final test. If you need to practise using the calculator to determine 5 number summary data please ask to borrow one during class time.