Project -
Hazardous Waste and Statistical Analysis

Materials copyrighted, March 2002 by Greg Langkamp and Joe Hull

 


 

Notes to the Instructor    

      

     Exercise #6 links normal distributions with hazardous waste.  In this exercise, the students were presented with hazardous waste production data from the Federal EPA's Biennial Reporting System (BRS), the only nationwide, uniform database on hazardous waste generation and management (see student background information).  The exercise focused on just the hazardous waste categorized and monitored under the Resource Conservation and Recovery Act (RCRA).  RCRA waste, such as lead and toluene, is assigned to one of about 500 federal hazardous waste codes.  Statewide data can be found in the BRS database at the Right-to-Know website: http://www.rtk.net/rtkdata.html.

    We pre-selected 12 populous states for the students and identified the most populous counties in each (about 30 per state), both to ensure a reasonably sized sample and to ensure that counties reported data for most years. We provided 1990 population data for every county in the 12 states, taken from a US census web site (http://www.census.gov/population/censusdata/90den_stco.txt).  Much of the "grunt work" was therefore done in advance for the students; the alternative would have been extremely time consuming and would have diverted the focus away from the mathematical and environmental analysis.

     Each group of students is presented with a data sheet (see the data sheet for Illinois given at the bottom of this page). We have students inspect the data and make observations. Because RCRA waste production sometimes fluctuates wildly (up to two orders of magnitude) from biennium to biennium, the students calculate the mean RCRA waste production over the 8-year period for each of the 30 counties in their state. The county-by-county mean data are fairly skewed for each state. Our goal was to obtain hazardous waste data that showed an approximately normal distribution, but with outliers. Students work progressively towards a normal distribution by computing per capita waste production and then transforming the data using logarithms. Finally, students investigate a regulatory scheme based upon z-scores.

A few additional notes for the instructor:

1.  As with all data, the reliability and meaning of the RCRA waste production values should be questioned immediately.  Note Madison County, where waste generated increased from 48 kilotons in 1989 to 9.5 megatons in 1991, a factor of about 200.  The students should be asked to identify such variability among the data, comment on or explain its possible origins, and draw a tentative conclusion about reliability (see question #2).  The EPA web site also discusses data reliability.

2.  The students normalize the RCRA waste generation data by population, and then also take logarithms of the data. We introduce normalization specifically in lecture; most students grasp the concept intuitively when it involves population density.  However, other examples of normalization that don't involve people need to be introduced, as many students do not readily grasp how generally normalization applies.

3.  We came up with an enforcement/regulatory theme ("sticks and carrots") for this waste exercise, which is carried throughout.  The enforcement rubric allowed us to introduce a utilitarian aspect to the mathematical analysis at all major steps in this exercise: how would you punish and reward polluters?  The students create increasingly sophisticated punishment/reward schemes, including one that uses z-scores.  Are these schemes mathematically sound?

4.  How do federal, state, county, or local governments punish and/or reward polluters?  Are their enforcement schemes mathematically sound?  These are tie-in questions that could be addressed in lecture.
 

See Data Set #003 for individual state data and more instructor suggestions.

 

Student Background Information


Hazardous waste is any waste that may be considered toxic, flammable (i.e. burns readily), corrosive, reactive or explosive.  Many types of businesses produce hazardous waste.  Some are small businesses such as dry cleaners, auto repair shops, hospitals, and photo processing centers.  Others are larger firms which may generate large quantities of hazardous waste, such as chemical manufacturers, electroplating companies, and petroleum refineries.

The Resource Conservation and Recovery Act (RCRA) is the main Federal law that regulates hazardous and other wastes to ensure that they are managed properly.  RCRA waste is solid waste assigned a federal hazardous waste code and regulated by RCRA either because it was managed subject to RCRA permitting standards or because it was shipped subject to RCRA hazardous transportation requirements.  EPA has a list of specific hazardous wastes, defined by 504 different waste codes.  Not all hazardous waste is RCRA waste.

Information on hazardous wastes comes from the Federal EPA's Biennial Reporting System (BRS).  BRS contains data from Hazardous Waste Report Forms submitted by regulated hazardous waste generators and handlers.   BRS represents the only nationally consistent reporting of information on hazardous waste generation and management activities in the United States. Although the information collected is not designed to measure environmental impact, it is the most comprehensive source available for information on the management and generation of hazardous wastes. The data are collected every other year.

Some hazardous wastes are not picked up in the BRS database.  Hazardous wastes that are generated in the home, like mineral spirits and old paint, are not regulated by the federal RCRA program.  In addition, not all hazardous waste generators are required to report, some waste is exempted from regulation, and some waste is regulated under other environmental statutes (particularly at the state level).  Some facilities may fail to report.

RCRA data for this project were taken from the original BRS database. This database is no longer posted on the EPA's website, but it can be accessed through the Right To Know Network website (http://www.rtk.net/).  The 1990 county population data are taken from a US census web site (http://www.census.gov/population/censusdata/90den_stco.txt).


 

Project #6

   
Name _______________________        Name____________________________

 


1. Obtain the Data

Your instructor will provide you with a printed copy showing RCRA waste and population data for 1 of 12 states. Enter (or download) the data into the following TI-83+ lists:

RCRA data: 1991=L1, 1993=L2, 1995=L3, 1997=L4
Population data = L5

Compute "mean RCRA waste" for each county, and store the results in list L6.  Transfer the mean values to the data sheet. What are the units of measure for the mean?
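For anyone checking the list arithmetic away from the calculator, here is a minimal Python sketch of the same step. It assumes the four biennial values for each county are held in a dictionary; the names `waste`, `mean_waste`, and `mean_rcra` are illustrative and not part of the project. The figures are three Illinois counties from the sample data sheet.

```python
# Biennial RCRA waste (tons/yr) for 1991, 1993, 1995, 1997.
# Values copied from the Illinois sample data sheet; the layout is an assumption.
waste = {
    "Adams": [19142, 267, 328, 19515],
    "Champaign": [1751, 804, 2158, 996],
    "Coles": [241, 705, 137, 198],
}

def mean_waste(values):
    """Arithmetic mean of the biennial reports for one county."""
    return sum(values) / len(values)

# Plays the role of list L6 on the TI-83+.
mean_rcra = {county: mean_waste(v) for county, v in waste.items()}

for county, m in mean_rcra.items():
    print(f"{county}: {m:,.0f}")
```

The rounded results can be compared with the "Mean RCRA waste" column on the solutions sheet.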

 

2. Analyze the data
Find the county with the biggest change in RCRA waste generation from one biennium to the next:  which county, how much waste one year, how much in the next report?  What is the percent change from one biennium to the next?
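As a reminder of the arithmetic, percent change is (new value - old value) / old value x 100. The short Python sketch below applies it to one pair of reports; the function name `percent_change` is illustrative, and the county shown is just an example, not necessarily the one with the biggest change.

```python
def percent_change(old, new):
    """Percent change from one report to the next; undefined when old is 0."""
    if old == 0:
        raise ValueError("percent change is undefined when the starting value is 0")
    return (new - old) / old * 100.0

# Kankakee County, Illinois: 2,948 tons in 1991 and 25,320 tons in 1993
# (values from the sample data sheet).
change = percent_change(2948, 25320)
print(f"{change:.0f}% change from 1991 to 1993")
```

Note that a county reporting 0 tons in one biennium has no defined percent change into the next one.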

 

 

Such extreme changes in hazardous waste production do not seem reasonable; maybe the numbers are in error.  But maybe not!  Give one reasonable explanation why RCRA waste generation might change so much in one biennium.

 

 

 

Use your TI-83+ to make a frequency histogram of the mean RCRA waste values. (For review information, consult Chapter 3 in your text.)  Sketch the histogram on graph paper. Label axes appropriately.
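A frequency histogram is just a count of how many values fall in each equal-width bin. The following Python sketch shows that counting step under some assumptions: the bin width of 5,000 tons/yr and the function name `frequency_table` are arbitrary choices for illustration, and only ten Illinois counties (means taken from the solutions sheet) are used.

```python
def frequency_table(values, bin_width, start=0.0):
    """Count how many values fall in each bin [lower, lower + bin_width)."""
    counts = {}
    for v in values:
        k = int((v - start) // bin_width)  # index of the bin containing v
        counts[k] = counts.get(k, 0) + 1
    # Only bins that contain at least one value are returned.
    return {start + k * bin_width: counts[k] for k in sorted(counts)}

# Mean RCRA waste (tons/yr) for ten Illinois counties, from the solutions sheet.
means = [9813, 1427, 320, 2268, 24224, 1599, 1368, 13360, 26854, 3735]

table = frequency_table(means, bin_width=5000)
for lower, count in table.items():
    print(f"{lower:>7,.0f} to {lower + 5000:>7,.0f} | {'#' * count}")
```

Empty bins are skipped in the printout, so gaps between the printed ranges are bins with a count of zero.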

Are the mean RCRA waste values normally distributed?  How can you easily tell without doing any computations?

 

 

Compute the mean and standard deviation of the mean RCRA waste values.  Use the TI-83+ for assistance.

Is the standard deviation less than, equal to, or greater than the mean? 

 

In your opinion, is the standard deviation "small", "medium" or "large"?  Explain briefly.

 

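Compute the following 7 numbers, where $\bar{x}$ is the mean and $s$ is the standard deviation of the mean RCRA waste values:

$$\bar{x} - 3s, \quad \bar{x} - 2s, \quad \bar{x} - s, \quad \bar{x}, \quad \bar{x} + s, \quad \bar{x} + 2s, \quad \bar{x} + 3s$$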

Do any of the 7 numbers come out negative? ___________ If so, do these numbers have any physical meaning, can you have negative mean RCRA waste in reality?  What do the negative numbers tell you?
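The sketch below assumes the 7 numbers are the mean together with the values one, two, and three standard deviations on either side of it, and that the sample standard deviation (Sx on the TI-83+) is intended. The function name `seven_numbers` is illustrative, and only ten Illinois counties from the solutions sheet are used, so the output will not match a full state data set.

```python
import statistics

def seven_numbers(values):
    """Return xbar - 3s, xbar - 2s, ..., xbar + 3s for the given values."""
    xbar = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (an assumption)
    return [xbar + k * s for k in range(-3, 4)]

# Mean RCRA waste (tons/yr) for ten Illinois counties, from the solutions sheet.
means = [9813, 1427, 320, 2268, 24224, 1599, 1368, 13360, 26854, 3735]

numbers = seven_numbers(means)
for k, value in zip(range(-3, 4), numbers):
    print(f"xbar {k:+d}s = {value:,.0f}")
```

When the standard deviation is larger than the mean, as it is for this subset, the values below the mean come out negative.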

 


Sometimes there are data that seem to be "way out of bounds."  These numbers can be accurate or they can be caused by error.  In either case they tend to dominate the calculations.   Statisticians call these numbers outliers; outliers are numbers that lie more than 3 standard deviations away from the mean.  Are there any outliers in your mean RCRA waste values?  If so, what are the names of the counties?
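The 3-standard-deviation rule stated above can be sketched in Python as follows. The function name `find_outliers` is illustrative, the sample standard deviation is assumed, and the data are the 17 Illinois counties from Adams through Madison on the solutions sheet, so the mean and standard deviation here are for that subset only.

```python
import statistics

def find_outliers(data, k=3.0):
    """Names whose value lies more than k standard deviations from the mean."""
    values = list(data.values())
    xbar = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (an assumption)
    return [name for name, x in data.items() if abs(x - xbar) > k * s]

# Mean RCRA waste (tons/yr), Illinois solutions sheet, Adams through Madison.
mean_rcra = {
    "Adams": 9813, "Champaign": 1427, "Coles": 320, "Cook": 1356606,
    "DeKalb": 2268, "Du Page": 24224, "Henry": 1599, "Jackson": 1368,
    "Kane": 13360, "Kankakee": 26854, "Knox": 3735, "Lake": 36825,
    "La Salle": 6170, "McHenry": 5348, "McLean": 22917, "Macon": 2684,
    "Madison": 9158334,
}

print(find_outliers(mean_rcra))
```

One caution when using this rule on small samples: with fewer than about 11 values, no single value can lie more than 3 sample standard deviations from the mean, so the test can never flag anything.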




3. Per Capita Waste
The EPA hires you as a consultant, to impose fines on counties that are "environmentally bad."   Your supervisor suggests that counties that generate the most RCRA waste should be fined the most.  Discuss why this system might not be fair.

 

Another method of fines is to punish the people, not the counties.  In other words, fine the counties that have the highest mean RCRA waste per capita (per person).  Compute the mean RCRA waste per capita for each county.  Convert the result so that the units are in  pounds per person. (Note: 1 ton = 2000 pounds)  Store the final result in L7 and record on the data sheet.
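The unit conversion here is mean waste in tons, times 2000 pounds per ton, divided by population. A minimal Python sketch follows; the names `POUNDS_PER_TON` and `per_capita_pounds` are illustrative, and the two counties use mean and population figures from the Illinois solutions sheet.

```python
POUNDS_PER_TON = 2000

def per_capita_pounds(mean_tons, population):
    """Mean RCRA waste per person, in pounds per person per year."""
    return mean_tons * POUNDS_PER_TON / population

# (mean RCRA waste in tons/yr, 1990 population) from the solutions sheet.
counties = {"Adams": (9813, 66090), "Champaign": (1427, 173025)}

for name, (mean_tons, pop) in counties.items():
    print(f"{name}: {per_capita_pounds(mean_tons, pop):.1f} lbs/person")
```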

Use your TI-83+ to make a frequency histogram of the per capita mean RCRA waste values.  Sketch the histogram on separate graph paper. Label axes appropriately.

What is the mean of the mean per capita RCRA waste? What is the standard deviation?   (Use correct symbols when writing values.)

 

Is the standard deviation large, medium or small compared to the mean? 

 

Measuring spread in skewed data using standard deviation is problematic because standard deviation is often many times bigger than the mean. Has normalization by population "improved" the standard deviation of the data?  In other words, is the per capita waste data less skewed than the unnormalized waste values?



 

4. Transform the data
When data are skewed to the right, we can often make the distribution more symmetrical by logging the data. Do this now: log the mean per capita RCRA values for each county, and store the results in list L8. Record the logged values on your data sheet. Then sketch a frequency histogram of the logged values. Include units and labels.
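A short Python sketch of the logging step is given below. It assumes base-10 logarithms, which is what the LOG key on the TI-83+ computes and what the solutions sheet appears to use; the dictionary names are illustrative.

```python
import math

# Per capita mean RCRA waste (lbs/person) from the Illinois solutions sheet.
per_capita = {
    "Adams": 297.0,
    "Champaign": 16.5,
    "Coles": 12.4,
    "Madison": 73490.7,
}

# Base-10 log of each value; math.log10 requires strictly positive input.
logged = {county: math.log10(value) for county, value in per_capita.items()}

for county, value in logged.items():
    print(f"{county}: {value:.2f}")
```

Notice how a value thousands of times larger than the others ends up only a few units away after logging, which is why the transformation pulls in a long right tail.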

How does the histogram of the transformed data (log of the per capita mean RCRA values) compare to the two histograms that you sketched previously?

Compute the mean and standard deviation for the transformed data. Include units of measure.

 

Is the standard deviation less than, equal to, or greater than the mean? 

Is the standard deviation "small", "medium" or "large", as compared to the mean?  Explain briefly.

 

 

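For the transformed data, calculate the 7 numbers, where $\bar{x}$ and $s$ are now the mean and standard deviation of the logged values:

$$\bar{x} - 3s, \quad \bar{x} - 2s, \quad \bar{x} - s, \quad \bar{x}, \quad \bar{x} + s, \quad \bar{x} + 2s, \quad \bar{x} + 3s$$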

 

Use these 7 numbers to determine if the transformed data are normally distributed. Show work.
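One common way to use the 7 numbers is to compare the fraction of data within 1, 2, and 3 standard deviations of the mean against the roughly 68%, 95%, and 99.7% expected for a normal distribution. The Python sketch below assumes that is the intended check. The function name `fraction_within` is illustrative, and the logged values are the 17 Illinois counties from Adams through Madison on the solutions sheet, so results for a full state will differ.

```python
import statistics

def fraction_within(values, k):
    """Fraction of values lying within k standard deviations of the mean."""
    xbar = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (an assumption)
    inside = sum(1 for x in values if abs(x - xbar) <= k * s)
    return inside / len(values)

# Log(per capita mean RCRA waste), Illinois solutions sheet, Adams through Madison.
logged = [2.47, 1.22, 1.09, 2.73, 1.76, 1.79, 1.80, 1.65, 1.93,
          2.75, 2.12, 2.15, 2.06, 1.77, 2.55, 1.66, 4.87]

for k, normal in [(1, 0.68), (2, 0.95), (3, 0.997)]:
    observed = fraction_within(logged, k)
    print(f"within {k}s: observed {observed:.2f}, normal about {normal}")
```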

 

 

 

5. Carrots and Sticks
You have transformed the county data into a distribution that is closer to normal. Now you come up with the following idea to impose waste fines.  Based on the transformed data, impose the highest fines on counties that lie more than 3 standard deviations above the mean, impose moderate fines on counties that lie between 2 and 3 standard deviations above the mean, impose small fines on counties that lie between 1 and 2 standard deviations above the mean, and very small fines for those counties between the mean and 1 standard deviation above the mean.  On your data sheet, under the column "st. dev. category", indicate which counties are in the categories:   ">3", "2 to 3", "1 to 2", or "0 to 1".      

To reward counties that produce the least amount of RCRA waste per person, you will give waste credits that can be sold in the market.   On your data sheet, for those counties whose RCRA wastes are below the mean, mark categories "<-3", "-3 to -2", "-2 to -1", and "-1 to 0".
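The category labels can be assigned mechanically once each county's distance from the mean is expressed in standard deviations. The Python sketch below is one way to do that; the function name `st_dev_category` is illustrative, and the handling of values that fall exactly on a boundary is an assumption, since the project does not say which side they belong to.

```python
import math

def st_dev_category(z):
    """Map a distance from the mean, in standard deviations, to a category label."""
    if z > 3:
        return ">3"
    if z < -3:
        return "<-3"
    # Boundary values go to the band that starts at them (assumption),
    # except exactly 3, which stays in "2 to 3".
    lower = min(math.floor(z), 2)
    return f"{lower} to {lower + 1}"

# Standard deviation scores for four Illinois counties, from the solutions sheet.
for county, z in [("Adams", 0.18), ("Champaign", -1.27),
                  ("DeKalb", -0.64), ("Madison", 2.95)]:
    print(f"{county}: {st_dev_category(z)}")
```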

Now you get good results with this penalty and reward system.  Overall, polluters are given monetary incentives to improve their standard deviation score.  In fact, you suggest that all states take up your system.  Your boss likes the idea, but she has some questions:

Is it possible that in some state most of the counties would be in the "above 3" or "below -3" categories?  This could be seen as politically "heavy handed", with lots of money flowing back and forth in fines and credits.  What is your answer?

 

How would this system work with a state like South Dakota, whose mean per capita RCRA waste is very low?  Won't most of the counties in South Dakota be getting pollution credits?

 

  


You've convinced your boss that this system will work, but now she has a third question.  When two counties lie in the same standard deviation category they are penalized or rewarded the same, even if their mean RCRA waste per capita numbers are different.  Is there some way to refine the rewards and incentives so that there is a continuous scale?

A continuous scale can be based on "z-scores" for each county.  A z-score is a number that indicates how many standard deviations each county lies above or below the mean.  Z-scores are computed with the simple formula:

$$z = \frac{x - \bar{x}}{s}$$

Here x is each county's logged per capita mean RCRA waste, $\bar{x}$ (xbar) is the mean of the logged per capita wastes, and s is the standard deviation.  A z-score is positive if the county lies above the mean and negative if it lies below.   Fill out the last column on the data sheet with the z-score for each county; round to 2 decimal places of accuracy.
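A minimal Python sketch of the z-score calculation follows, with z taken as (x - xbar) / s as described above. The function name `z_scores` is illustrative and the sample standard deviation is assumed. Only five Illinois counties are used, so xbar and s are for that small subset and the z-scores will not match the solutions sheet, which is based on all counties in the state.

```python
import statistics

def z_scores(logged):
    """z = (x - xbar) / s for every county in the dictionary."""
    values = list(logged.values())
    xbar = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (an assumption)
    return {county: (x - xbar) / s for county, x in logged.items()}

# Log(per capita mean RCRA waste) for five counties, from the solutions sheet.
logged = {"Adams": 2.47, "Champaign": 1.22, "Coles": 1.09,
          "Cook": 2.73, "DeKalb": 1.76}

z = z_scores(logged)
for county, score in z.items():
    print(f"{county}: {score:+.2f}")
```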

Your boss thinks your z-score idea is great.  She now gives you enough money to impose fines and give credits.  She suggests a $100,000 fine or credit per z-score (fines for positive z-scores, credits for negative z-scores).  Will your agency lose money, earn money, or break even?  Explain in detail.
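One way to check your reasoning on this question is to add up the z-scores algebraically. The short derivation below assumes the z-scores are computed from the mean and standard deviation of the same n counties that are being fined or credited; the index i and the symbol n are introduced here for illustration.

$$\sum_{i=1}^{n} z_i = \sum_{i=1}^{n} \frac{x_i - \bar{x}}{s} = \frac{1}{s}\left(\sum_{i=1}^{n} x_i - n\bar{x}\right) = \frac{1}{s}\left(n\bar{x} - n\bar{x}\right) = 0$$

The net flow of money is $100,000 times this sum. In practice, rounding each z-score to 2 decimal places can leave a small nonzero total.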

 

 

 

Sample Student Data Sheet (for Illinois)

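| Illinois county | 1991 RCRA waste (tons/yr) | 1993 RCRA waste (tons/yr) | 1995 RCRA waste (tons/yr) | 1997 RCRA waste (tons/yr) | Pop. | Mean RCRA waste (tons/year) | Per Capita Mean RCRA Waste (lbs/person) | Log(Per Capita Mean RCRA Waste) | St. Dev. category | z score |
|---|---|---|---|---|---|---|---|---|---|---|
| Adams | 19,142 | 267 | 328 | 19,515 | 66,090 | | | | | |
| Champaign | 1,751 | 804 | 2,158 | 996 | 173,025 | | | | | |
| Coles | 241 | 705 | 137 | 198 | 51,644 | | | | | |
| Cook | 1,962,005 | 323,486 | 1,367,858 | 1,773,073 | 5,105,067 | | | | | |
| DeKalb | 1,528 | 1,591 | 3,690 | 2,263 | 77,932 | | | | | |
| Du Page | 8,259 | 8,206 | 23,319 | 57,110 | 781,666 | | | | | |
| Henry | 29 | 34 | 20 | 6,312 | 51,159 | | | | | |
| Jackson | 4,151 | 374 | 362 | 584 | 61,067 | | | | | |
| Kane | 6,962 | 12,931 | 26,794 | 6,751 | 317,471 | | | | | |
| Kankakee | 2,948 | 25,320 | 14,456 | 64,693 | 96,255 | | | | | |
| Knox | 2,912 | 5,404 | 426 | 6,197 | 56,393 | | | | | |
| Lake | 18,922 | 83,161 | 25,291 | 19,926 | 516,418 | | | | | |
| La Salle | 11,010 | 1,418 | 4,697 | 7,556 | 106,913 | | | | | |
| McHenry | 7,345 | 8,503 | 5,543 | 0 | 183,241 | | | | | |
| McLean | 62,737 | 2,390 | 25,496 | 1,045 | 129,180 | | | | | |
| Macon | 1,072 | 2,755 | 4,776 | 2,131 | 117,206 | | | | | |
| Madison | 9,475,186 | 10,284,282 | 8,719,444 | 8,154,422 | 249,238 | | | | | |
| Marion | 446 | 562 | 2,519 | 2,506 | 41,561 | | | | | |
| Ogle | 1,406 | 2,290 | 24,329 | 19,646 | 45,957 | | | | | |
| Peoria | 116,995 | 171,412 | 137,044 | 112,615 | 182,827 | | | | | |
| Rock Island | 5,896 | 31,732 | 11,388 | 11,147 | 148,723 | | | | | |
| St. Clair | 65,457 | 39,461 | 27,951 | 24,336 | 262,852 | | | | | |
| Sangamon | 4,384 | 1,692 | 1,023 | 1,677 | 178,386 | | | | | |
| Stephenson | 172,645 | 197,314 | 204,975 | 261,332 | 48,052 | | | | | |
| Tazewell | 514 | 369 | 401 | 1,242 | 123,692 | | | | | |
| Vermilion | 42,698 | 1,321 | 713 | 483 | 88,257 | | | | | |
| Whiteside | 38,224 | 57,309 | 76,979 | 82,468 | 60,186 | | | | | |
| Will | 34,852 | 68,873 | 1,194,153 | 1,096,687 | 357,313 | | | | | |
| Williamson | 330 | 122 | 3,272 | 5,400 | 57,733 | | | | | |
| Winnebago | 73,630 | 89,115 | 188,037 | 381,133 | 252,913 | | | | | |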



Student Data Sheet -- With Solutions


Illinois county

RCRA waste produced (tons/yr)

Pop.

Mean RCRA waste (tons/year)

Per Capita Mean RCRA Waste (lbs/person)

Log(Per Capita Mean RCRA Waste)

St. Dev. category

z score

1991

1993

1995

1997

Adams

19,142

267

328

19,515

66,090

9,813

297.0

2.47

0 to 1

0.18

Champaign

1,751

804

2,158

996

173,025

1,427

16.5

1.22

-2 to -1

-1.27

Coles

241

705

137

198

51,644

320

12.4

1.09

-2 to -1

-1.42

Cook

1,962,005

323,486

1,367,858

1,773,073

5,105,067

1,356,606

531.5

2.73

0 to 1

0.47

DeKalb

1,528

1,591

3,690

2,263

77,932

2,268

58.2

1.76

-1 to 0

-0.64

Du Page

8,259

8,206

23,319

57,110

781,666

24,224

62.0

1.79

-1 to 0

-0.61

Henry

29

34

20

6,312

51,159

1,599

62.5

1.80

-1 to 0

-0.60

Jackson

4,151

374

362

584

61,067

1,368

44.8

1.65

-1 to 0

-0.77

Kane

6,962

12,931

26,794

6,751

317,471

13,360

84.2

1.93

-1 to 0

-0.46

Kankakee

2,948

25,320

14,456

64,693

96,255

26,854

558.0

2.75

0 to 1

0.50

Knox

2,912

5,404

426

6,197

56,393

3,735

132.5

2.12

-1 to 0

-0.23

Lake

18,922

83,161

25,291

19,926

516,418

36,825

142.6

2.15

-1 to 0

-0.19

La Salle

11,010

1,418

4,697

7,556

106,913

6,170

115.4

2.06

-1 to 0

-0.30

McHenry

7,345

8,503

5,543

0

183,241

5,348

58.4

1.77

-1 to 0

-0.64

McLean

62,737

2,390

25,496

1,045

129,180

22,917

354.8

2.55

0 to 1

0.27

Macon

1,072

2,755

4,776

2,131

117,206

2,684

45.8

1.66

-1 to 0

-0.76

Madison

9,475,186

10,284,282

8,719,444

8,154,422

249,238

9,158,334

73490.7

4.87

2 to 3

2.95

Marion

446

562

2,519

2,506