Project - Hazardous Waste and Statistical Analysis
Materials copyrighted, March 2002 by Greg Langkamp and Joe Hull
Notes to the Instructor
Exercise #6 links the topics of normal distributions and hazardous waste. In this exercise, students are presented with hazardous waste production data from the Federal EPA's Biennial Reporting System (BRS), the only nationwide, uniform database on hazardous waste generation and management (see student background information). The exercise focuses on just the hazardous waste categorized and monitored under the Resource Conservation and Recovery Act (RCRA). RCRA waste, such as lead and toluene, is assigned to one of about 500 federal hazardous waste codes. Statewide data can be found in the BRS database at the Right-to-Know website: http://www.rtk.net/rtkdata.html.
We pre-selected 12 populous states for the students and identified the most populous counties in each (about 30 per state), both to ensure a reasonably sized sample and to ensure that counties reported data for most years. We provided 1990 population data for every county in the 12 states, taken from a US census web site (http://www.census.gov/population/censusdata/90den_stco.txt). Much of the "grunt work" was therefore done in advance for the students; the alternative would have been extremely time consuming and would have diverted the focus away from the mathematical and environmental analysis.
Each group of students is presented with a data sheet (see the data sheet for Illinois given at the bottom of this page). We have students inspect the data and make observations. Because RCRA waste production sometimes fluctuates wildly (by up to two orders of magnitude) from biennium to biennium, the students calculate the mean RCRA waste production over the 8-year period for each of the 30 counties in their state. The county-by-county mean data are fairly skewed for each state. Our goal was to obtain hazardous waste data that showed an approximately normal distribution, but with outliers. Students progressively work towards a normal distribution by computing per capita waste production and then transforming the data using logarithms. Finally, students investigate a regulatory scheme based upon z-scores.
A few additional notes for the instructor:
1. As with all data, the reliability and meaning of the RCRA waste production values should be questioned immediately. Note Madison County, where waste generated increased from 48 kilotons in 1989 to 9.5 megatons in 1991, a factor of 200. The students should be asked to identify such variability in the data, comment on or explain its possible origins, and draw a tentative conclusion about reliability (see question #2). The EPA web site also discusses data reliability.
2. The students normalize the RCRA waste generation data by population, and then transform the result by taking logarithms. We introduce normalization specifically in lecture; most students grasp the concept intuitively when it involves population density. However, examples of normalization that don't involve people also need to be introduced, as many students do not readily grasp how generally normalization applies.
3. We came up with an enforcement/regulatory theme ("sticks and carrots") for this waste exercise, which is carried throughout. The enforcement rubric allowed us to introduce a utilitarian aspect to the mathematical analysis at all major steps in this exercise: how would you punish and reward polluters? The students create increasingly sophisticated punishment/reward schemes, including one that uses z-scores. Are these schemes mathematically sound?
4. How do federal, state, county, or local governments punish and/or reward polluters? Are their enforcement schemes mathematically sound? These are tie-in questions that could be addressed in lecture.
See Data Set #003 for individual state data and more instructor suggestions.
Student Background Information
Hazardous waste is any waste that may be considered toxic, flammable (i.e. burns readily), corrosive, reactive or explosive. Many types of businesses produce hazardous waste. Some are small businesses such as dry cleaners, auto repair shops, hospitals, and photo processing centers. Others are larger firms which may generate large quantities of hazardous waste, such as chemical manufacturers, electroplating companies, and petroleum refineries.
The Resource Conservation and Recovery Act (RCRA) is the main Federal law that regulates hazardous and other wastes to ensure that they are managed properly. RCRA waste is solid waste assigned a federal hazardous waste code and regulated by RCRA either because it was managed subject to RCRA permitting standards or because it was shipped subject to RCRA hazardous transportation requirements. EPA has a list of specific hazardous wastes, defined by 504 different waste codes. Not all hazardous waste is RCRA waste.
Information on hazardous wastes comes from the Federal EPA's Biennial Reporting System (BRS). BRS contains data from Hazardous Waste Report Forms submitted by regulated hazardous waste generators and handlers. BRS represents the only nationally consistent reporting of information on hazardous waste generation and management activities in the United States. Although the information collected is not designed to measure environmental impact, it is the most comprehensive source available for information on the management and generation of hazardous wastes. The data are collected every other year.
Some hazardous wastes are not picked up in the BRS database. Hazardous wastes that are generated in the home, like mineral spirits and old paint, are not regulated by the federal RCRA program. In addition, not all hazardous waste generators are required to report, some waste is exempted from regulation, and some waste is regulated under other environmental statutes (particularly at the state level). Some facilities may fail to report.
RCRA data for this project were taken from the original BRS database. This database is no longer posted on the EPA's website, but can be accessed through the Right To Know Network website (http://www.rtk.net/). The 1990 county population data are taken from a US census web site (http://www.census.gov/population/censusdata/90den_stco.txt).
Project # 6
Name _______________________ Name____________________________
RCRA data: 1991=L1, 1993=L2, 1995=L3, 1997=L4
Population data = L5
Compute "mean RCRA waste" for each county, and store the results in list L6. Transfer the mean values to the data sheet. What are the units of measure for the mean?
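The list arithmetic in this step can also be checked away from the calculator. The Python sketch below is only an illustration and is not part of the original project: the names `rcra_tons` and `mean_waste` are made up here, and only three Illinois counties from the sample data sheet are included.

```python
# Illustrative sketch (not part of the original project materials).
# Biennial RCRA waste in tons/yr for 1991, 1993, 1995, 1997,
# copied from the Illinois sample data sheet for three counties.
rcra_tons = {
    "Adams": [19142, 267, 328, 19515],
    "Champaign": [1751, 804, 2158, 996],
    "Coles": [241, 705, 137, 198],
}


def mean_waste(values):
    """Return the arithmetic mean of the biennial values (tons/yr)."""
    return sum(values) / len(values)


# Plays the role of list L6 on the TI-83+.
mean_tons = {county: mean_waste(v) for county, v in rcra_tons.items()}

for county, m in mean_tons.items():
    print(f"{county}: {m:.0f} tons/yr")
```

The same pattern extends to all 30 counties by adding rows to the dictionary.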
2. Analyze the data
Find the county with the biggest change in RCRA waste generation from one biennium to the next: which county, how much waste one year, how much in the next report? What is the percent change from one biennium to the next?
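As a rough illustration of the percent-change calculation, here is a short Python sketch. The function name `percent_change` is hypothetical, the choice to return `None` when the earlier value is zero is an assumption made for this example, and Champaign County is used only because its numbers are easy to follow.

```python
# Illustrative sketch (not part of the original project materials).
def percent_change(old, new):
    """Percent change from one biennial report to the next.

    Returns None when the earlier value is 0, because the ratio
    is undefined in that case (an assumption made for this sketch).
    """
    if old == 0:
        return None
    return 100.0 * (new - old) / old


# Champaign County, Illinois (tons/yr), from the sample data sheet.
years = [1991, 1993, 1995, 1997]
champaign = [1751, 804, 2158, 996]

changes = [
    (years[i], years[i + 1], percent_change(champaign[i], champaign[i + 1]))
    for i in range(len(champaign) - 1)
]

for start, end, pct in changes:
    print(f"{start} -> {end}: {pct:.1f}%")
```

A positive result is an increase and a negative result is a decrease; a drop to zero is a change of -100%.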
Such extreme changes in hazardous waste production do not seem reasonable; maybe the numbers are in error. But maybe not! Give one reasonable explanation why RCRA waste generation might change so much in one biennium.
Use your TI-83+ to make a frequency histogram of the mean RCRA waste values. (For review information, consult Chapter 3 in your text.) Sketch the histogram on graph paper. Label axes appropriately.
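A frequency histogram is just a count of values per class. The Python sketch below shows one way to tabulate those counts with equal-width classes; it is an illustration only. The function name `frequency_table` and the choice of four classes are assumptions, and the eight mean values are the first eight rows of the Illinois solutions sheet rather than a full state.

```python
# Illustrative sketch (not part of the original project materials).
def frequency_table(values, n_bins):
    """Count how many values fall in each of n_bins equal-width classes."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; no classes can be formed")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The largest value is placed in the last class.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


# Mean RCRA waste (tons/yr) for Adams, Champaign, Coles, Cook,
# DeKalb, Du Page, Henry, and Jackson counties.
mean_tons = [9813, 1427, 320, 1356606, 2268, 24224, 1599, 1368]

edges, counts = frequency_table(mean_tons, 4)
for i, count in enumerate(counts):
    print(f"{edges[i]:>12,.0f} to {edges[i + 1]:>12,.0f}: {'*' * count}")
```

Changing the number of classes changes the picture, which is worth trying before sketching the histogram.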
Are the mean RCRA waste values normally distributed? How can you easily tell without doing any computations?
Compute the mean and standard deviation of the mean RCRA waste values. Use the TI-83+ for assistance.
Is the standard deviation less than, equal to, or greater than the mean?
In your opinion, is the standard deviation "small", "medium" or "large"? Explain briefly.
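Compute the following 7 numbers, where xbar is the mean and s is the standard deviation: xbar - 3s, xbar - 2s, xbar - s, xbar, xbar + s, xbar + 2s, xbar + 3s.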
Do any of the 7 numbers come out negative? ___________ If so, do these numbers have any physical meaning, can you have negative mean RCRA waste in reality? What do the negative numbers tell you?
Sometimes there are data that seem to be "way out of bounds." These numbers can be accurate or they can be caused by error. In either case they tend to dominate the calculations. Statisticians call these numbers outliers; outliers are numbers that lie more than 3 standard deviations away from the mean. Are there any outliers in your mean RCRA waste values? If so, what are the names of the counties?
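The outlier rule stated above can be sketched in Python as follows. This is an illustration only: the variable names are made up, and the sample standard deviation (Sx on the TI-83+) is assumed rather than the population value. The data are the four biennial values for each Illinois county from the sample data sheet.

```python
# Illustrative sketch (not part of the original project materials).
from statistics import mean, stdev

# Biennial RCRA waste (tons/yr) for 1991, 1993, 1995, 1997.
rcra_tons = {
    "Adams": [19142, 267, 328, 19515],
    "Champaign": [1751, 804, 2158, 996],
    "Coles": [241, 705, 137, 198],
    "Cook": [1962005, 323486, 1367858, 1773073],
    "DeKalb": [1528, 1591, 3690, 2263],
    "Du Page": [8259, 8206, 23319, 57110],
    "Henry": [29, 34, 20, 6312],
    "Jackson": [4151, 374, 362, 584],
    "Kane": [6962, 12931, 26794, 6751],
    "Kankakee": [2948, 25320, 14456, 64693],
    "Knox": [2912, 5404, 426, 6197],
    "Lake": [18922, 83161, 25291, 19926],
    "La Salle": [11010, 1418, 4697, 7556],
    "McHenry": [7345, 8503, 5543, 0],
    "McLean": [62737, 2390, 25496, 1045],
    "Macon": [1072, 2755, 4776, 2131],
    "Madison": [9475186, 10284282, 8719444, 8154422],
    "Marion": [446, 562, 2519, 2506],
    "Ogle": [1406, 2290, 24329, 19646],
    "Peoria": [116995, 171412, 137044, 112615],
    "Rock Island": [5896, 31732, 11388, 11147],
    "St. Clair": [65457, 39461, 27951, 24336],
    "Sangamon": [4384, 1692, 1023, 1677],
    "Stephenson": [172645, 197314, 204975, 261332],
    "Tazewell": [514, 369, 401, 1242],
    "Vermilion": [42698, 1321, 713, 483],
    "Whiteside": [38224, 57309, 76979, 82468],
    "Will": [34852, 68873, 1194153, 1096687],
    "Williamson": [330, 122, 3272, 5400],
    "Winnebago": [73630, 89115, 188037, 381133],
}

# Mean RCRA waste per county (tons/yr).
county_means = {county: mean(tons) for county, tons in rcra_tons.items()}
values = list(county_means.values())

xbar = mean(values)
s = stdev(values)  # sample standard deviation (an assumption)

# The mean and the points 1, 2, and 3 standard deviations either side of it.
bands = [xbar + k * s for k in range(-3, 4)]
for k, edge in zip(range(-3, 4), bands):
    print(f"xbar {k:+d}s = {edge:,.0f} tons/yr")

# Outliers: more than 3 standard deviations away from the mean.
outliers = [c for c, m in county_means.items() if abs(m - xbar) > 3 * s]
print("outliers:", outliers)
```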
3. Per Cap Waste
The EPA hires you as a consultant, to impose fines on counties that are "environmentally bad." Your supervisor suggests that counties that generate the most RCRA waste should be fined the most. Discuss why this system might not be fair.
Another method of fines is to punish the people, not the counties. In other words, fine the counties that have the highest mean RCRA waste per capita (per person). Compute the mean RCRA waste per capita for each county. Convert the result so that the units are in pounds per person. (Note: 1 ton = 2000 pounds) Store the final result in L7 and record on the data sheet.
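The unit conversion in this step is tons per year, times 2000 pounds per ton, divided by population. A minimal Python sketch follows, for illustration only; the names `counties` and `per_capita_lbs` are made up, and the mean values are the rounded figures from the Illinois solutions sheet.

```python
# Illustrative sketch (not part of the original project materials).
POUNDS_PER_TON = 2000

# county: (mean RCRA waste in tons/yr, 1990 population)
counties = {
    "Adams": (9813, 66090),
    "Champaign": (1427, 173025),
    "Coles": (320, 51644),
}

# Plays the role of list L7: pounds of RCRA waste per person.
per_capita_lbs = {
    county: mean_tons * POUNDS_PER_TON / population
    for county, (mean_tons, population) in counties.items()
}

for county, lbs in per_capita_lbs.items():
    print(f"{county}: {lbs:.1f} lbs/person")
```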
Use your TI-83+ to make a frequency histogram of the per capita mean RCRA waste values. Sketch the histogram on separate graph paper. Label axes appropriately.
What is the mean of the mean per capita RCRA waste? What is the standard deviation? (Use correct symbols when writing values.)
Is the standard deviation large, medium or small compared to the mean?
Measuring spread in skewed data using standard deviation is problematic because standard deviation is often many times bigger than the mean. Has normalization by population "improved" the standard deviation of the data? In other words, is the per capita waste data less skewed than the unnormalized waste values?
4. Transform the data
When data are skewed to the right, we can often make the distribution more symmetrical by logging the data. Do this now: log the mean per capita RCRA values for each county, and store the results in list L8. Record the logged values on your data sheet. Then sketch a frequency histogram of the logged values. Include units and labels.
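The transformation can be sketched in Python as below. This is an illustration only. It assumes that "logging" means the base-10 logarithm (the LOG key on the TI-83+), which is consistent with the solutions sheet, where 297.0 lbs/person becomes 2.47. The dictionary names are made up for the example.

```python
# Illustrative sketch (not part of the original project materials).
import math

# Per capita mean RCRA waste (lbs/person) from the Illinois solutions sheet.
per_capita_lbs = {
    "Adams": 297.0,
    "Champaign": 16.5,
    "Coles": 12.4,
    "Madison": 73490.7,
}

# Plays the role of list L8. log10 is undefined for 0, so a county
# with zero mean waste would need separate handling.
logged = {county: math.log10(lbs) for county, lbs in per_capita_lbs.items()}

for county, value in logged.items():
    print(f"{county}: {value:.2f}")
```

Note how a range of several thousand-fold in pounds per person shrinks to a range of a few units after the logarithm is taken.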
How does the histogram of the transformed data (log of the per capita mean RCRA values) compare to the two histograms that you sketched previously?
Compute the mean and standard deviation for the transformed data. Include units of measure.
Is the standard deviation less than, equal to, or greater than the mean?
Is the standard deviation "small", "medium" or "large", as compared to the mean? Explain briefly.
For the transformed data, calculate the 7 numbers: xbar - 3s, xbar - 2s, xbar - s, xbar, xbar + s, xbar + 2s, xbar + 3s.
Use these 7 numbers to determine if the transformed data are normally distributed. Show work.
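One common check is to count what share of the data lies within 1, 2, and 3 standard deviations of the mean and compare with the 68%, 95%, and 99.7% expected for a normal curve. The Python sketch below does this for the Illinois data. It is an illustration under assumptions: the helper names are made up, base-10 logarithms are used, and the sample standard deviation is assumed.

```python
# Illustrative sketch (not part of the original project materials).
import math
from statistics import mean, stdev

# county: (1991, 1993, 1995, 1997 RCRA waste in tons/yr, 1990 population)
illinois = {
    "Adams": (19142, 267, 328, 19515, 66090),
    "Champaign": (1751, 804, 2158, 996, 173025),
    "Coles": (241, 705, 137, 198, 51644),
    "Cook": (1962005, 323486, 1367858, 1773073, 5105067),
    "DeKalb": (1528, 1591, 3690, 2263, 77932),
    "Du Page": (8259, 8206, 23319, 57110, 781666),
    "Henry": (29, 34, 20, 6312, 51159),
    "Jackson": (4151, 374, 362, 584, 61067),
    "Kane": (6962, 12931, 26794, 6751, 317471),
    "Kankakee": (2948, 25320, 14456, 64693, 96255),
    "Knox": (2912, 5404, 426, 6197, 56393),
    "Lake": (18922, 83161, 25291, 19926, 516418),
    "La Salle": (11010, 1418, 4697, 7556, 106913),
    "McHenry": (7345, 8503, 5543, 0, 183241),
    "McLean": (62737, 2390, 25496, 1045, 129180),
    "Macon": (1072, 2755, 4776, 2131, 117206),
    "Madison": (9475186, 10284282, 8719444, 8154422, 249238),
    "Marion": (446, 562, 2519, 2506, 41561),
    "Ogle": (1406, 2290, 24329, 19646, 45957),
    "Peoria": (116995, 171412, 137044, 112615, 182827),
    "Rock Island": (5896, 31732, 11388, 11147, 148723),
    "St. Clair": (65457, 39461, 27951, 24336, 262852),
    "Sangamon": (4384, 1692, 1023, 1677, 178386),
    "Stephenson": (172645, 197314, 204975, 261332, 48052),
    "Tazewell": (514, 369, 401, 1242, 123692),
    "Vermilion": (42698, 1321, 713, 483, 88257),
    "Whiteside": (38224, 57309, 76979, 82468, 60186),
    "Will": (34852, 68873, 1194153, 1096687, 357313),
    "Williamson": (330, 122, 3272, 5400, 57733),
    "Winnebago": (73630, 89115, 188037, 381133, 252913),
}


def logged_per_capita(row):
    """Log10 of mean RCRA waste per person, in pounds."""
    *tons, population = row
    return math.log10(mean(tons) * 2000 / population)


def share_within(values, center, spread, k):
    """Fraction of values within k standard deviations of the mean."""
    inside = [v for v in values if abs(v - center) <= k * spread]
    return len(inside) / len(values)


logged = [logged_per_capita(row) for row in illinois.values()]
xbar, s = mean(logged), stdev(logged)

shares = {k: share_within(logged, xbar, s, k) for k in (1, 2, 3)}
for k, target in zip((1, 2, 3), (0.68, 0.95, 0.997)):
    print(f"within {k}s: {shares[k]:.1%} (normal curve: about {target:.1%})")
```

With only 30 counties the shares can move in steps of about 3%, so an approximate match is the most that can be expected.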
5. Carrots and Sticks
You have transformed the county data into a distribution that is closer to normal. Now you come up with the following idea to impose waste fines. Based on the transformed data, impose the highest fines on counties that lie more than 3 standard deviations above the mean, moderate fines on counties that lie between 2 and 3 standard deviations above the mean, small fines on counties that lie between 1 and 2 standard deviations above the mean, and very small fines on counties that lie between the mean and 1 standard deviation above the mean. On your data sheet, under the column "st. dev. category", indicate which counties are in the categories: ">3", "2 to 3", "1 to 2", or "0 to 1".
To reward counties that produce the least amount of RCRA waste per person, you will give waste credits that can be sold in the market. On your data sheet, for those counties whose RCRA wastes are below the mean, mark categories "<-3", "-3 to -2", "-2 to -1", and "-1 to 0".
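The category labels can be assigned mechanically once each county's distance from the mean, in standard deviations, is known. The Python sketch below is an illustration only. The function name `stdev_category` is made up, and the handling of values that fall exactly on a boundary (or exactly on the mean) is an assumption, since the project text does not say which side they belong to.

```python
# Illustrative sketch (not part of the original project materials).
def stdev_category(z):
    """Return the data-sheet category for a distance z from the mean,
    measured in standard deviations (positive means above the mean)."""
    if z > 3:
        return ">3"
    if z < -3:
        return "<-3"
    if z >= 0:
        # Assumption: a value exactly on a boundary goes in the lower band,
        # and a value exactly on the mean is labeled "0 to 1".
        for upper in (1, 2, 3):
            if z <= upper:
                return f"{upper - 1} to {upper}"
    for lower in (-1, -2, -3):
        if z >= lower:
            return f"{lower} to {lower + 1}"
    raise ValueError("unreachable for finite z")


# Distances taken from the Illinois solutions sheet.
for county, z in [("Adams", 0.18), ("Champaign", -1.27), ("Madison", 2.95)]:
    print(f"{county}: {stdev_category(z)}")
```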
Now you get good results with this penalty and reward system. Overall, polluters are given monetary incentives to improve their standard deviation score. In fact, you suggest that all states take up your system. Your boss likes the idea, but she has some questions:
Is it possible that in some state most of the counties would be in the "above 3" or "below -3" categories? This could be seen as politically "heavy handed", with lots of money flowing back and forth in fines and credits. What is your answer?
How would this system work with a state like South Dakota, whose mean per capita RCRA waste is very low? Won't most of the counties in South Dakota be getting pollution credits?
You've convinced your boss that this system will work, but now she has a third question. When two counties lie in the same standard deviation category they are penalized or rewarded the same, even if their mean RCRA waste per capita numbers are different. Is there some way to refine the rewards and incentives so that there is a continuous scale?
A continuous scale can be based on "z-scores" for each county. A z-score is a number that indicates how many standard deviations each county lies above or below the mean. Z-scores are computed with the simple formula:
z = (x - xbar) / s
Here x is each county's logged per capita mean RCRA waste, xbar is the mean of the logged per capita wastes, and s is their standard deviation. A z-score is positive if the county lies above the mean, and negative if it lies below. Fill out the last column on the data sheet with the z-score for each county; round to 2 decimal places.
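The whole chain from raw data to z-scores can be sketched in Python as follows. This is an illustration, not the authors' procedure: the helper names are made up, base-10 logarithms are assumed, and s is taken to be the sample standard deviation (Sx on the TI-83+).

```python
# Illustrative sketch (not part of the original project materials).
import math
from statistics import mean, stdev

# county: (1991, 1993, 1995, 1997 RCRA waste in tons/yr, 1990 population)
illinois = {
    "Adams": (19142, 267, 328, 19515, 66090),
    "Champaign": (1751, 804, 2158, 996, 173025),
    "Coles": (241, 705, 137, 198, 51644),
    "Cook": (1962005, 323486, 1367858, 1773073, 5105067),
    "DeKalb": (1528, 1591, 3690, 2263, 77932),
    "Du Page": (8259, 8206, 23319, 57110, 781666),
    "Henry": (29, 34, 20, 6312, 51159),
    "Jackson": (4151, 374, 362, 584, 61067),
    "Kane": (6962, 12931, 26794, 6751, 317471),
    "Kankakee": (2948, 25320, 14456, 64693, 96255),
    "Knox": (2912, 5404, 426, 6197, 56393),
    "Lake": (18922, 83161, 25291, 19926, 516418),
    "La Salle": (11010, 1418, 4697, 7556, 106913),
    "McHenry": (7345, 8503, 5543, 0, 183241),
    "McLean": (62737, 2390, 25496, 1045, 129180),
    "Macon": (1072, 2755, 4776, 2131, 117206),
    "Madison": (9475186, 10284282, 8719444, 8154422, 249238),
    "Marion": (446, 562, 2519, 2506, 41561),
    "Ogle": (1406, 2290, 24329, 19646, 45957),
    "Peoria": (116995, 171412, 137044, 112615, 182827),
    "Rock Island": (5896, 31732, 11388, 11147, 148723),
    "St. Clair": (65457, 39461, 27951, 24336, 262852),
    "Sangamon": (4384, 1692, 1023, 1677, 178386),
    "Stephenson": (172645, 197314, 204975, 261332, 48052),
    "Tazewell": (514, 369, 401, 1242, 123692),
    "Vermilion": (42698, 1321, 713, 483, 88257),
    "Whiteside": (38224, 57309, 76979, 82468, 60186),
    "Will": (34852, 68873, 1194153, 1096687, 357313),
    "Williamson": (330, 122, 3272, 5400, 57733),
    "Winnebago": (73630, 89115, 188037, 381133, 252913),
}


def logged_per_capita(row):
    """Log10 of mean RCRA waste per person, in pounds."""
    *tons, population = row
    return math.log10(mean(tons) * 2000 / population)


logged = {county: logged_per_capita(row) for county, row in illinois.items()}
values = list(logged.values())
xbar, s = mean(values), stdev(values)  # sample standard deviation assumed

# z = (x - xbar) / s, rounded to 2 decimal places as the data sheet asks.
z_scores = {county: round((x - xbar) / s, 2) for county, x in logged.items()}

for county, z in z_scores.items():
    print(f"{county}: {z:+.2f}")
```

Under these assumptions the results should agree with the z score column of the solutions sheet, which is a useful cross-check on hand calculations.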
Your boss thinks your z-score idea is great. She now gives you enough money to impose fines and give credits. She suggests a $100,000 fine or credit per z-score (fines for positive z-scores, credits for negative z-scores). Will your agency lose money, earn money, or break even? Explain in detail.
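One way to test an answer to this question is to total the fines and credits directly. The Python sketch below does so for a small set of hypothetical logged values; the numbers are invented for illustration and are not taken from any state's data sheet. The $100,000 rate is the one suggested in the text, and the sample standard deviation is assumed.

```python
# Illustrative sketch (not part of the original project materials).
from statistics import mean, stdev

RATE_PER_Z = 100_000  # dollars per unit of z-score, from the project text

# Hypothetical logged per capita values, for illustration only.
logged = [1.2, 1.8, 2.1, 2.5, 3.9]

xbar, s = mean(logged), stdev(logged)
z_scores = [(x - xbar) / s for x in logged]

# Fines are collected for positive z-scores; credits are paid out
# for negative z-scores.
fines = sum(RATE_PER_Z * z for z in z_scores if z > 0)
credits = sum(RATE_PER_Z * -z for z in z_scores if z < 0)
net = fines - credits

print(f"fines collected: ${fines:,.2f}")
print(f"credits paid:    ${credits:,.2f}")
print(f"net to agency:   ${net:,.2f}")
```

Replacing the hypothetical list with your own state's logged values shows whether the pattern depends on the particular data.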
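Sample Student Data Sheet (for Illinois)

| Illinois county | RCRA waste produced 1991 (tons/yr) | RCRA waste produced 1993 (tons/yr) | RCRA waste produced 1995 (tons/yr) | RCRA waste produced 1997 (tons/yr) | Pop. | Mean RCRA waste (tons/year) | Per Capita Mean RCRA Waste (lbs/person) | Log(Per Capita Mean RCRA Waste) | St. Dev. category | z score |
|---|---|---|---|---|---|---|---|---|---|---|
| Adams | 19,142 | 267 | 328 | 19,515 | 66,090 | | | | | |
| Champaign | 1,751 | 804 | 2,158 | 996 | 173,025 | | | | | |
| Coles | 241 | 705 | 137 | 198 | 51,644 | | | | | |
| Cook | 1,962,005 | 323,486 | 1,367,858 | 1,773,073 | 5,105,067 | | | | | |
| DeKalb | 1,528 | 1,591 | 3,690 | 2,263 | 77,932 | | | | | |
| Du Page | 8,259 | 8,206 | 23,319 | 57,110 | 781,666 | | | | | |
| Henry | 29 | 34 | 20 | 6,312 | 51,159 | | | | | |
| Jackson | 4,151 | 374 | 362 | 584 | 61,067 | | | | | |
| Kane | 6,962 | 12,931 | 26,794 | 6,751 | 317,471 | | | | | |
| Kankakee | 2,948 | 25,320 | 14,456 | 64,693 | 96,255 | | | | | |
| Knox | 2,912 | 5,404 | 426 | 6,197 | 56,393 | | | | | |
| Lake | 18,922 | 83,161 | 25,291 | 19,926 | 516,418 | | | | | |
| La Salle | 11,010 | 1,418 | 4,697 | 7,556 | 106,913 | | | | | |
| McHenry | 7,345 | 8,503 | 5,543 | 0 | 183,241 | | | | | |
| McLean | 62,737 | 2,390 | 25,496 | 1,045 | 129,180 | | | | | |
| Macon | 1,072 | 2,755 | 4,776 | 2,131 | 117,206 | | | | | |
| Madison | 9,475,186 | 10,284,282 | 8,719,444 | 8,154,422 | 249,238 | | | | | |
| Marion | 446 | 562 | 2,519 | 2,506 | 41,561 | | | | | |
| Ogle | 1,406 | 2,290 | 24,329 | 19,646 | 45,957 | | | | | |
| Peoria | 116,995 | 171,412 | 137,044 | 112,615 | 182,827 | | | | | |
| Rock Island | 5,896 | 31,732 | 11,388 | 11,147 | 148,723 | | | | | |
| St. Clair | 65,457 | 39,461 | 27,951 | 24,336 | 262,852 | | | | | |
| Sangamon | 4,384 | 1,692 | 1,023 | 1,677 | 178,386 | | | | | |
| Stephenson | 172,645 | 197,314 | 204,975 | 261,332 | 48,052 | | | | | |
| Tazewell | 514 | 369 | 401 | 1,242 | 123,692 | | | | | |
| Vermilion | 42,698 | 1,321 | 713 | 483 | 88,257 | | | | | |
| Whiteside | 38,224 | 57,309 | 76,979 | 82,468 | 60,186 | | | | | |
| Will | 34,852 | 68,873 | 1,194,153 | 1,096,687 | 357,313 | | | | | |
| Williamson | 330 | 122 | 3,272 | 5,400 | 57,733 | | | | | |
| Winnebago | 73,630 | 89,115 | 188,037 | 381,133 | 252,913 | | | | | |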
Student Data Sheet -- With Solutions

| Illinois county | RCRA waste produced 1991 (tons/yr) | RCRA waste produced 1993 (tons/yr) | RCRA waste produced 1995 (tons/yr) | RCRA waste produced 1997 (tons/yr) | Pop. | Mean RCRA waste (tons/year) | Per Capita Mean RCRA Waste (lbs/person) | Log(Per Capita Mean RCRA Waste) | St. Dev. category | z score |
|---|---|---|---|---|---|---|---|---|---|---|
| Adams | 19,142 | 267 | 328 | 19,515 | 66,090 | 9,813 | 297.0 | 2.47 | 0 to 1 | 0.18 |
| Champaign | 1,751 | 804 | 2,158 | 996 | 173,025 | 1,427 | 16.5 | 1.22 | -2 to -1 | -1.27 |
| Coles | 241 | 705 | 137 | 198 | 51,644 | 320 | 12.4 | 1.09 | -2 to -1 | -1.42 |
| Cook | 1,962,005 | 323,486 | 1,367,858 | 1,773,073 | 5,105,067 | 1,356,606 | 531.5 | 2.73 | 0 to 1 | 0.47 |
| DeKalb | 1,528 | 1,591 | 3,690 | 2,263 | 77,932 | 2,268 | 58.2 | 1.76 | -1 to 0 | -0.64 |
| Du Page | 8,259 | 8,206 | 23,319 | 57,110 | 781,666 | 24,224 | 62.0 | 1.79 | -1 to 0 | -0.61 |
| Henry | 29 | 34 | 20 | 6,312 | 51,159 | 1,599 | 62.5 | 1.80 | -1 to 0 | -0.60 |
| Jackson | 4,151 | 374 | 362 | 584 | 61,067 | 1,368 | 44.8 | 1.65 | -1 to 0 | -0.77 |
| Kane | 6,962 | 12,931 | 26,794 | 6,751 | 317,471 | 13,360 | 84.2 | 1.93 | -1 to 0 | -0.46 |
| Kankakee | 2,948 | 25,320 | 14,456 | 64,693 | 96,255 | 26,854 | 558.0 | 2.75 | 0 to 1 | 0.50 |
| Knox | 2,912 | 5,404 | 426 | 6,197 | 56,393 | 3,735 | 132.5 | 2.12 | -1 to 0 | -0.23 |
| Lake | 18,922 | 83,161 | 25,291 | 19,926 | 516,418 | 36,825 | 142.6 | 2.15 | -1 to 0 | -0.19 |
| La Salle | 11,010 | 1,418 | 4,697 | 7,556 | 106,913 | 6,170 | 115.4 | 2.06 | -1 to 0 | -0.30 |
| McHenry | 7,345 | 8,503 | 5,543 | 0 | 183,241 | 5,348 | 58.4 | 1.77 | -1 to 0 | -0.64 |
| McLean | 62,737 | 2,390 | 25,496 | 1,045 | 129,180 | 22,917 | 354.8 | 2.55 | 0 to 1 | 0.27 |
| Macon | 1,072 | 2,755 | 4,776 | 2,131 | 117,206 | 2,684 | 45.8 | 1.66 | -1 to 0 | -0.76 |
| Madison | 9,475,186 | 10,284,282 | 8,719,444 | 8,154,422 | 249,238 | 9,158,334 | 73,490.7 | 4.87 | 2 to 3 | 2.95 |
Marion
446
562
2,519
2,506