Office of National Drug Control Policy

What America's Users Spend on Illegal Drugs 1988–1998

December 2000

Appendix D
Imputations for Missing Data on Marijuana Use

Calculating the amount of marijuana used by household members was straightforward. We multiplied the number of marijuana users per month by the average number of joints smoked per user and by the average weight of a joint, then multiplied the result by twelve to obtain an annual estimate. The principal problems in making this calculation are missing data and responses reported as a range, because a range cannot be used directly in the calculation. Missing data about recent use was not a problem, because the Substance Abuse and Mental Health Services Administration (SAMHSA) had already imputed those responses. This appendix explains how we imputed responses when either the number of joints smoked or the amount of marijuana smoked was missing or was reported as a range.
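The annual calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name and the input values are hypothetical and do not come from the report.

```python
def annual_marijuana_ounces(users_per_month, joints_per_user, ounces_per_joint):
    """Yearly use in ounces: users x joints per user x weight per joint x 12 months."""
    monthly_ounces = users_per_month * joints_per_user * ounces_per_joint
    return monthly_ounces * 12


# Hypothetical inputs, chosen only to show the arithmetic.
example_ounces = annual_marijuana_ounces(
    users_per_month=1_000_000,
    joints_per_user=20,
    ounces_per_joint=0.0125,
)
print(example_ounces)
```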

Imputing the Number of Joints Smoked

From the 1991 National Household Survey on Drug Abuse (NHSDA), analysts selected respondents who said they used marijuana in the past month and who gave valid responses to three related questions. The first question was the number of days they smoked marijuana in the past month (DAYS). Valid responses were 1–30 days. The second question was the number of marijuana cigarettes smoked per day in the past month (JOINTS). From the responses to these two questions, analysts created the variable

TOTAL JOINTS = DAYS*JOINTS.

The third question was the amount of marijuana used during the last month (AMOUNT). This is exactly the question that the analysts sought to answer, but the AMOUNT question was not directly useful for this purpose because it was specified as a range. The acceptable answers to AMOUNT were:

  • 1–10 joints
  • 11–20 joints
  • 1 ounce
  • 2 ounces
  • 3–4 ounces
  • 5–6 ounces

The analysts' problem was to infer the amount of marijuana used by people who said they used marijuana in the last month based on the variables TOTAL JOINTS and AMOUNT.
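As a minimal sketch of the two variables involved, the Python below computes TOTAL JOINTS from DAYS and JOINTS and lists the AMOUNT answers in the order given above. The function name, the list name, and the validation choices are assumptions made for illustration; the report does not describe how the survey data were coded.

```python
# AMOUNT answers in the order the survey lists them (list name is hypothetical).
AMOUNT_ANSWERS = [
    "1-10 joints",
    "11-20 joints",
    "1 ounce",
    "2 ounces",
    "3-4 ounces",
    "5-6 ounces",
]


def total_joints(days, joints_per_day):
    """TOTAL JOINTS = DAYS * JOINTS, with DAYS restricted to the valid 1-30 range."""
    if not 1 <= days <= 30:
        raise ValueError("DAYS must be between 1 and 30")
    if joints_per_day < 0:
        raise ValueError("JOINTS cannot be negative")
    return days * joints_per_day


print(total_joints(10, 2))  # a respondent smoking 2 joints on each of 10 days
```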

As shorthand, let J represent TOTAL JOINTS, let A represent AMOUNT, and let W equal the weight of marijuana used in ounces. The analysts wanted to estimate W.

Now, W is unknown, but it might be represented as:

$W = \beta J + \varepsilon$   [1]

where $\beta$ is the weight per joint and $\varepsilon$ is a random error term, which will be discussed below. Equation [1] says that, on average, a person who smokes J joints will use $\beta J$ ounces of marijuana, because $\beta$ is the average weight of a single joint. Of course, some people who smoke J joints use a little less; some use a little more. This variation about what is typical is reflected in the term $\varepsilon$.

Assume that $\varepsilon$ is distributed normally with a mean of zero and a standard deviation of $\sigma$, and that the error terms are independently and identically distributed. It turns out that these assumptions about the distribution of $\varepsilon$ are hard to justify, and alternative assumptions are adopted later. However, this simple, if somewhat unrealistic, specification is useful for explaining the approach.

Although W is unknown to the analysts, it is known to the respondent, and by assumption the value of W determines the respondent's answer for AMOUNT. Specifically, the respondent will say that he used

  • 1–10 joints if $W < \mu_1$
  • 11–20 joints if $\mu_1 \le W < \mu_2$
  • 1 ounce if $\mu_2 \le W < 1.5$
  • 2 ounces if $1.5 \le W < 2.5$
  • 3–4 ounces if $2.5 \le W < 4.5$
  • 5–6 ounces if $W \ge 4.5$

The logic here is that the respondent will select the usage category that most closely describes his use, although it seems reasonable to suppose that he makes errors when making this translation. Two terms are unknown, $\mu_1$ and $\mu_2$. The first, $\mu_1$, is presumably the weight of 10.5 joints. The second is harder to interpret, but $\mu_2$ is some value that distinguishes the response "11–20 joints" from "1 ounce," at least in the eyes of the respondent.

There are four parameters to be estimated here: the weight per joint $\beta$, the error standard deviation $\sigma$, and the thresholds $\mu_1$ and $\mu_2$. These parameters can be estimated by maximum likelihood once a probability has been assigned to every response.


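$P(a \le W < b \mid J) = \Phi\left(\frac{b - \beta J}{\sigma}\right) - \Phi\left(\frac{a - \beta J}{\sigma}\right)$

Specifically,

$P(\text{1–10 joints}) = \Phi\left(\frac{\mu_1 - \beta J}{\sigma}\right)$
$P(\text{11–20 joints}) = \Phi\left(\frac{\mu_2 - \beta J}{\sigma}\right) - \Phi\left(\frac{\mu_1 - \beta J}{\sigma}\right)$
$P(\text{1 ounce}) = \Phi\left(\frac{1.5 - \beta J}{\sigma}\right) - \Phi\left(\frac{\mu_2 - \beta J}{\sigma}\right)$
$P(\text{2 ounces}) = \Phi\left(\frac{2.5 - \beta J}{\sigma}\right) - \Phi\left(\frac{1.5 - \beta J}{\sigma}\right)$
$P(\text{3–4 ounces}) = \Phi\left(\frac{4.5 - \beta J}{\sigma}\right) - \Phi\left(\frac{2.5 - \beta J}{\sigma}\right)$
$P(\text{5–6 ounces}) = 1 - \Phi\left(\frac{4.5 - \beta J}{\sigma}\right)$

where $\Phi$ is the standard normal distribution function, $\beta$ is the weight per joint, $\sigma$ is the standard deviation of the error term, and $\mu_1$ and $\mu_2$ are the two unknown thresholds.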

This approach is similar to an ordered probit model, with one important difference. In a traditional ordered probit model all of the thresholds are unknown. Here the threshold values of 1.5, 2.5, and 4.5 are known, although the thresholds $\mu_1$ and $\mu_2$ are unknown. The known thresholds allow the error standard deviation $\sigma$ to be identified and estimated. In turn, this allows the coefficient $\beta$ on J to be identified and interpreted as the weight of a marijuana cigarette.

One further extension is to assume that:

$\mu_1 = 10.5\beta$

That is, the threshold $\mu_1$ equals the weight of 10.5 joints, because the weight of 10.5 joints is the threshold value between the responses "1–10 joints" and "11–20 joints." There are only three remaining parameters to estimate: the weight per joint $\beta$, the error standard deviation $\sigma$, and the threshold $\mu_2$.
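The likelihood for this simple normal-error model can be sketched in Python. This is a minimal illustration under stated assumptions, not the analysts' code. The names beta (weight per joint in ounces), sigma (error standard deviation), and mu2 (the unknown threshold between "11–20 joints" and "1 ounce") are assumed notation. The first threshold is set to the weight of 10.5 joints, and the parameter values in the usage lines are hypothetical.

```python
import math


def std_normal_cdf(x):
    """Standard normal distribution function, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def amount_probabilities(j, beta, sigma, mu2):
    """Probability of each of the six AMOUNT answers for a respondent with j joints.

    Assumes W = beta * j + error, error ~ Normal(0, sigma), and thresholds
    10.5 * beta < mu2 < 1.5 < 2.5 < 4.5 (in ounces).
    """
    cuts = [10.5 * beta, mu2, 1.5, 2.5, 4.5]
    cdf = [std_normal_cdf((c - beta * j) / sigma) for c in cuts]
    bounds = [0.0] + cdf + [1.0]
    return [bounds[k + 1] - bounds[k] for k in range(6)]


def log_likelihood(data, beta, sigma, mu2):
    """Sum of log probabilities over (total_joints, answer_index) pairs."""
    return sum(
        math.log(amount_probabilities(j, beta, sigma, mu2)[a]) for j, a in data
    )


# Hypothetical parameter values and responses, for illustration only.
probs = amount_probabilities(j=30, beta=0.014, sigma=0.3, mu2=0.5)
ll = log_likelihood([(5, 0), (30, 2)], beta=0.014, sigma=0.3, mu2=0.5)
```

A numerical optimizer would then search over beta, sigma, and mu2 for the values that maximize log_likelihood. An answer whose probability underflows to zero would make math.log fail, so a production version would need to guard against that case.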

As stated, this model is an unacceptable representation of the relationship between the number of joints smoked and the amount of marijuana smoked. A more convincing model is:


Formula

This implies that the average joint weighs - ounces, but that the weight varies across users. This variation is represented by the distribution of Symbol. The model would be complete once the distribution of Symbol is specified.

The distribution of Symbol has to satisfy some a priori constraints. First, W must be positive, so Symbol has a lower limit that depends on -J. Second, the distribution of Symbol should account for an apparent upward skew: inspection of the data shows that some users seem to use much more than the average amount of marijuana, but nobody can use much less because zero is a lower limit. Third, the error term is heteroscedastic.

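A new specification is more useful, given these a priori constraints:

$W = \gamma J$   [3]

where $\gamma = e^{\alpha + \varepsilon}$ is the weight per joint. Here, $\gamma$ has a lognormal distribution, and thus $\gamma J$ is always positive and $\gamma$ is skewed upward. In this specification:

$E(\gamma) = \exp\left(\alpha + \sigma^2/2\right)$

$\mathrm{Var}(\gamma) = \left[\exp(\sigma^2) - 1\right]\exp\left(2\alpha + \sigma^2\right)$

Taking logarithms on both sides of [3], we have

$\ln W = \ln \gamma + \ln J$
$\ln W = \alpha + \ln J + \varepsilon$

where $\varepsilon \sim N(0, \sigma^2)$. As with the earlier, less realistic model, the parameters can be estimated using maximum likelihood. A simple extension is to let $\alpha = \alpha_0 + \alpha_1 (J/100)$. The "100" is just a scale factor that has no effect on analysis. This specification allows frequent smokers to smoke larger or smaller joints than average smokers.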

The most important estimate is $E(\gamma)$, the average weight of a marijuana cigarette, where $\gamma$ denotes the weight per joint. An estimate of W, then, is:

$W = E(\gamma) J$

This tells us that if a respondent says he smoked J joints during the month (TOTAL JOINTS), then $E(\gamma) J$ is the best estimate of the quantity (in ounces) of marijuana smoked.
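The estimate above can be sketched in Python under a stated assumption about the lognormal specification: the logarithm of the weight per joint is taken to be normal with mean alpha0 + alpha1 * J / 100 and standard deviation sigma, so its expected value is exp(mean + sigma^2 / 2). The parameter names and the values used below are hypothetical. The actual estimates are those reported in Table D–1.

```python
import math


def expected_weight_per_joint(j, alpha0, alpha1, sigma):
    """Mean of a lognormal weight per joint (ounces) for a respondent with j joints."""
    log_mean = alpha0 + alpha1 * j / 100.0
    return math.exp(log_mean + 0.5 * sigma ** 2)


def estimated_ounces(j, alpha0, alpha1, sigma):
    """Estimated monthly quantity: expected weight per joint times TOTAL JOINTS."""
    return expected_weight_per_joint(j, alpha0, alpha1, sigma) * j


# Hypothetical parameters: with alpha1 = 0 and sigma = 0 every joint weighs
# exp(alpha0) ounces, so 40 joints at 0.0125 ounces each is half an ounce.
flat_example = estimated_ounces(40, alpha0=math.log(0.0125), alpha1=0.0, sigma=0.0)

# With alpha1 > 0 the expected weight per joint rises with the number of joints.
light_user = expected_weight_per_joint(1, math.log(0.0125), 0.1, 0.5)
heavy_user = expected_weight_per_joint(120, math.log(0.0125), 0.1, 0.5)
```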


Table D–1 presents parameter estimates based on an analysis of 1,623 smokers who reported DAYS, JOINTS, and AMOUNT. Before estimating these parameters, the analysts adjusted some of the data, as described after Table D–1.

Table D–1
Regression Results: The Total Amount of Marijuana Smoked in the Past Month

Before calculating TOTAL JOINTS, responses of more than 30 for JOINTS (number of marijuana cigarettes smoked per day in the past month) were truncated to 30. These extreme responses represented only about 0.1% of the total number of monthly users.

After calculating TOTAL JOINTS, analysts compared TOTAL JOINTS with AMOUNT and corrected for extreme inconsistencies between (or highly unlikely combinations of) the two variables. If JOINTS >= 100 and AMOUNT <= 20 joints or if JOINTS >= 200 and AMOUNT <= 2 ounces, then analysts assumed that the respondents had mistakenly given the total number of joints they had smoked in the past month for the question on JOINTS (number of marijuana cigarettes smoked per day in the past month). For these respondents, analysts treated JOINTS as TOTAL JOINTS in calculating the quantity estimates.
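The two cleaning steps can be sketched in Python as follows. This is one possible reading, not the analysts' code, and it rests on three assumptions: AMOUNT is coded by its position in the list of answers, the consistency thresholds (100 and 200) are applied to TOTAL JOINTS computed from the truncated JOINTS value, and the truncated JOINTS value is the one reused as the monthly total. The function and constant names are hypothetical.

```python
# AMOUNT answers in survey order; the index is an assumed coding.
AMOUNT_ANSWERS = [
    "1-10 joints",
    "11-20 joints",
    "1 ounce",
    "2 ounces",
    "3-4 ounces",
    "5-6 ounces",
]
MAX_JOINTS_PER_DAY = 30


def cleaned_total_joints(days, joints_per_day, amount_index):
    """Return TOTAL JOINTS after truncation and the consistency correction."""
    # Step 1: truncate extreme per-day responses to 30.
    joints = min(joints_per_day, MAX_JOINTS_PER_DAY)
    total = days * joints

    # Step 2: flag highly unlikely combinations of TOTAL JOINTS and AMOUNT.
    at_most_20_joints = amount_index <= AMOUNT_ANSWERS.index("11-20 joints")
    at_most_2_ounces = amount_index <= AMOUNT_ANSWERS.index("2 ounces")
    inconsistent = (total >= 100 and at_most_20_joints) or (
        total >= 200 and at_most_2_ounces
    )

    # For flagged respondents, treat the JOINTS answer as the monthly total.
    return joints if inconsistent else total


print(cleaned_total_joints(10, 20, 1))  # flagged: 200 joints vs. "11-20 joints"
```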

Results from the analysis imply that a person who smokes 1 joint per month uses 0.013 ounces (0.37 grams per joint) of marijuana. A person who smokes 30 joints per month uses 0.4 ounces (0.38 grams per joint) of marijuana. A person who smokes 120 joints per month uses 1.79 ounces (0.43 grams per joint) of marijuana. Applying the parameter estimates from Table D–1, Equation [7] was then used to compute the average weight per joint (W/J) for every respondent in each year of the NHSDA. Results, which appear in Table 6 of the main report, are used in the calculations reported in the body of this report.

Imputing Joints

A related problem is that the variable JOINTS was sometimes missing. We could not just substitute the average response from cases where JOINTS was known, because respondents with missing data seemed to have different usage patterns from those without missing data. Instead, we estimated regressions where JOINTS was the dependent variable and MJFREQ was the independent variable. MJFREQ is "frequency used marijuana in the past 12 months." We used results from these regressions to impute responses when JOINTS was missing.

MJFREQ is coded:

1 — several times a day;
2 — daily;
3 — almost daily (3 to 6 days a week);
4 — 1 or 2 times a week;
5 — several times a month (about 25 to 51 days a year);
6 — 1 or 2 times a month (12 to 24 days a year);
7 — every other month or so (6 to 11 days a year);
8 — 3 to 5 days in the past 12 months;
9 — 1 or 2 days in the past 12 months.

We treated this variable as a continuous measure. To capture nonlinearities, we added a second independent variable, MJFREQ2 = MJFREQ × MJFREQ.
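A minimal Python sketch of the two regressors follows. The dictionary and function names are hypothetical, and the leading constant for an intercept is an assumption, since the regression formula itself is not reproduced here.

```python
# MJFREQ codes as listed above (dictionary name is hypothetical).
MJFREQ_CODES = {
    1: "several times a day",
    2: "daily",
    3: "almost daily (3 to 6 days a week)",
    4: "1 or 2 times a week",
    5: "several times a month (about 25 to 51 days a year)",
    6: "1 or 2 times a month (12 to 24 days a year)",
    7: "every other month or so (6 to 11 days a year)",
    8: "3 to 5 days in the past 12 months",
    9: "1 or 2 days in the past 12 months",
}


def mjfreq_regressors(mjfreq):
    """Return (constant, MJFREQ, MJFREQ2), treating the code as a continuous measure."""
    if mjfreq not in MJFREQ_CODES:
        raise ValueError("MJFREQ must be an integer code from 1 to 9")
    return (1.0, float(mjfreq), float(mjfreq * mjfreq))


print(mjfreq_regressors(3))
```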

The regression had two special features. The first was that the respondent could have said that he used zero joints during the month before the interview. After all, marijuana use during the year (MJFREQ) does not imply marijuana use during the month before the survey (JOINTS). To take this special feature into account, the regression specification was written:


Formula


Formula

where


Formula

Note that in this specification the error term is heteroscedastic and a linear function of the underlying latent variable Z.

Table D–2 shows regression results.

Table D–2
Regression Results: The Average Number of Joints Smoked in the Past Month


The table shows two regressions. Model 1 was estimated for the 1418 respondents who reported use of marijuana in the 1991 NHSDA survey. Model 2 was estimated for the 190 respondents whose use of marijuana was imputed by SAMHSA. We estimated two separate models because specification testing showed that estimates based on the 1418 cases did not work well for the 190 cases and vice versa.

The regressions overpredict slightly. Based on the 1418 cases, the regressions predict 23.4 joints on average per month. In reality, respondents said they used an average of 21.6 joints per month. For the 190 cases, the prediction was 10.7 joints on average per month and the actual average was 8.5 joints. Because these predictions were only used when responses were missing for the variable JOINTS, we considered them to be close enough for our purposes.
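The size of the gap between predicted and reported averages can be worked out directly from the figures above. The short Python sketch below does only that arithmetic, and the function name is hypothetical. It gives a gap of about 1.8 joints (roughly 8 percent) for the 1418 cases and about 2.2 joints (roughly 26 percent) for the 190 cases.

```python
def overprediction(predicted, actual):
    """Return the gap in joints per month and the gap as a percentage of the actual."""
    gap = predicted - actual
    return gap, 100.0 * gap / actual


# Figures reported in the text for the two models.
model_1_gap, model_1_pct = overprediction(predicted=23.4, actual=21.6)
model_2_gap, model_2_pct = overprediction(predicted=10.7, actual=8.5)

print(round(model_1_gap, 1), round(model_1_pct, 1))
print(round(model_2_gap, 1), round(model_2_pct, 1))
```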

