Talk:Sampling (statistics)/Archives/2012

This page is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

Section on case studies and sampling

Editors Krakenflies and Gsaup recently added this extensive section on "sampling" in case studies. Although I enjoyed reading the linked article, I believe the section does not relate to the topic of this article, statistical sampling, so I will delete it. Perhaps it could go in a new Sampling (case studies) article instead. -- Avenue 11:51, 11 August 2006 (UTC)

Or maybe combined with Theoretical sampling? -- Avenue 12:09, 11 August 2006 (UTC)

Added definitions for types of data and levels of measurement.

I thought these subheadings would be relevant. Let me know. JT Pickering 22:20, 3 September 2006 (UTC)

Outdented 'levels of measurement'

Seemed to flow better when not considered as part of the types of data. JT Pickering 18:29, 12 November 2006 (UTC)

Sampling used as Democratic Process

I would like to know of any movements, past or present, that advocate using random sampling to select the members of official decision-making bodies. For example, what if the US House of Representatives were actually a representative sample? This seems worth exploring, since it may be one of the only ways to remove bias toward the rich, and a non-competitive climate would almost certainly be more conducive to productive intellectual dialogue than degenerate partisan contests. Maybe then politics wouldn't be such a turn-off to so many people. LordBrain 23:21, 6 October 2006 (UTC)

I believe the phrase is 'government by lot.' To my knowledge, it's never been a primary governmental method, but it does show up here and there in different ways. In the US, the most prominent example is jury duty. The problem--jury duty is a good example--is that it is extraordinarily vulnerable to manipulation. In any event, it shouldn't really be included in this article. Ethan Mitchell 20:00, 22 January 2007 (UTC)

Stratified sampling

At the end of the Stratified sampling section it says:

"Typically, strata should be chosen to have:

   * means which differ substantially from one another
   * variances which are different from one another, and lower than the overall variance."

Wouldn't the second bullet be more clearly expressed as "minimise variance within strata and maximise variance between strata."?

It might also be worth mentioning that stratified sampling can introduce bias when selecting strata.
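
To illustrate the "within strata / between strata" wording suggested above, here is a minimal sketch of the variance decomposition it relies on. The stratum labels and values are made up for illustration, and the function name `variance_decomposition` is hypothetical. It uses population (divide-by-N) variances, for which total variance equals the within-strata part plus the between-strata part.

```python
from statistics import fmean

def variance_decomposition(strata):
    """Split population variance into within- and between-strata parts.

    `strata` maps a stratum label to a list of values (hypothetical data).
    Uses divide-by-N variances so that total = within + between.
    """
    values = [v for group in strata.values() for v in group]
    n = len(values)
    grand_mean = fmean(values)
    total = sum((v - grand_mean) ** 2 for v in values) / n
    within = sum(
        sum((v - fmean(group)) ** 2 for v in group) for group in strata.values()
    ) / n
    between = sum(
        len(group) * (fmean(group) - grand_mean) ** 2 for group in strata.values()
    ) / n
    return total, within, between

# Made-up values for two strata whose means differ substantially.
strata = {"urban": [50.0, 52.0, 54.0], "rural": [20.0, 22.0, 24.0]}
total, within, between = variance_decomposition(strata)
print(total, within, between)
```

With strata like these, nearly all of the variance sits between strata, which is the situation in which stratification helps most.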

Selective Sampling

There is no mention of selective sampling! -- Flutefluteflute (Talk | Contributions) 11:31, 29 March 2007 (UTC)

I don't know much about selective sampling, but I gather from skimming these articles [1] [2] [3] that it is a machine learning term for techniques for selecting your sample to reduce the numbers needed to get good predictions, i.e. it would be part of experimental design (specifically adaptive designs) rather than sampling in the usual statistical sense. If I'm right, it wouldn't hurt to add a note to that effect. -- Avenue 13:34, 29 March 2007 (UTC)

Section on Graduate degree programs specializing in sampling/survey methods

How is this section relevant to the rest of the article? It merely contains links with no further description. It is even placed after the See Also section. I suggest removing it. Mdanh2002 (talk) 16:00, 9 April 2008 (UTC)

I agree entirely so I have removed it in line with WP:EL. Qwfp (talk) 17:33, 9 April 2008 (UTC)

Types of data

I think that this section might be improved (generalised). A link to Level of measurement might be worthwhile anyway. There may be some question about how much needs to be in the present article, but something about ordinal data (at least) seems necessary here. Melcombe (talk) 09:47, 15 April 2008 (UTC)

I think that section doesn't really belong in this article. How about we move it to Statistical survey? I agree it could be improved. -- Avenue (talk) 12:35, 16 April 2008 (UTC)

Unsupported claims

"There is, however, a strong but unnoticed division of views about the acceptability of representative sampling across different domains of study. To the philosopher or doctor, the representative sampling procedure has no justification whatsoever because it is not how truth is pursued in philosophy." - If it's genuinely 'unnoticed', it fails WP:RS and doesn't belong here. The claim that philosophers and doctors don't acknowledge the worth of representative sampling is extremely sweeping (I can think of several exceptions just among my personal acquaintance!), and needs better sourcing than is given here if it's to be included. --144.53.226.18 (talk) 03:33, 18 December 2008 (UTC)

'Bias'

"If periodicity is present and the period is a multiple of 10, then bias will result." - Technically, this isn't bias; bias occurs when the expected (average) estimate that would result from the sampling scheme is higher or lower than the true value. Periodicity produces many estimates that are substantially higher or substantially lower than the true value (i.e. large sampling error), but on average these cancel out.

(Basically, 'bias' is the component of error that can't be eliminated by performing a sufficiently large number of samples and averaging the results; 'sampling error' is the component that can.) --144.53.226.18 (talk) 04:28, 18 December 2008 (UTC)
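
A minimal sketch of the distinction drawn above, under assumed numbers: a made-up population of 100 units whose values repeat with period 10, sampled systematically with interval 10 and a random start. Averaging the estimate over every equally likely start gives the design's expected estimate, so the bias and the sampling variance can be computed exactly rather than simulated.

```python
from statistics import fmean, pvariance

N, k = 100, 10                            # population size and sampling interval (assumed)
population = [i % 10 for i in range(N)]   # values repeat with period 10

true_mean = fmean(population)

# One estimate per possible random start; each start is equally likely.
estimates = [fmean(population[start::k]) for start in range(k)]

expected_estimate = fmean(estimates)       # average over the sampling design
bias = expected_estimate - true_mean
sampling_variance = pvariance(estimates)   # spread of estimates around their mean

print(estimates)
print(bias, sampling_variance)
```

In this toy case every unit in a given sample has the same value, so individual estimates can be far from the true mean, yet the bias is zero because the estimates average out to it.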

What difference?

"To the scientist, however, representative sampling is the only justified procedure for choosing individual objects for use as the basis of generalization, and is therefore usually the only acceptable basis for ascertaining truth." (Andrew A. Marino) [1]. It is important to understand this difference to steer clear of confusing prescriptions found in many web pages.

Does the author mean the difference between representative sampling and sampling that is not representative? ( Martin | talk • contribs 00:46, 5 March 2009 (UTC))

I also find this section a bit confusing (speaking of confusing prescriptions in web pages). IIRC I edited it down from a longer discussion that seemed to be verging on somebody's personal opinion; I was reluctant to delete the whole thing, but I'm not convinced that paragraph really adds to the article. --GenericBob (talk) 02:31, 5 March 2009 (UTC)

Identifying Voters

The article said there is no way to identify every voter in an election, before the election.

This seems to assume all electoral systems are elective. In a system of compulsory voting, the electoral roll is a list of people who will vote.

It's only a minor point, but I changed the wording slightly. Pavium (talk) 04:23, 30 April 2009 (UTC)

Good point. I tweaked it a bit more to make it clearer that we're talking about specifically identifying voters (not possible), not just identifying all people who *might* vote (easy to do from electoral roll). --GenericBob (talk) 11:07, 30 April 2009 (UTC)

Bias in stratified sampling?

Removed this bit: "Requires accurate information about the population, or introduces bias as a result of either measurement error (effects of which can be modeled by the errors-in-variables model) or selection bias."

This seemed to be confusing stratification with post-stratification. As long as you know selection probabilities for the units you actually select (and all units have nonzero probability), stratified sampling isn't biased. It might be very *inaccurate* if you don't know your population well and choose strata sample sizes poorly, but that's not the same thing. --GenericBob (talk) 08:50, 29 June 2009 (UTC)
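
As a rough check of the point above, the sketch below enumerates every possible stratified sample for a tiny made-up population with a deliberately poor allocation, and weights each stratum's sample mean by its known population share. The strata, values and allocation are assumptions chosen for illustration, not anything from the article.

```python
from itertools import combinations, product
from statistics import fmean

# Hypothetical population split into two strata; the allocation below is
# deliberately poor (one unit from the large, variable stratum).
strata = {"A": [1.0, 9.0, 20.0, 30.0], "B": [5.0, 6.0]}
allocation = {"A": 1, "B": 2}

N = sum(len(units) for units in strata.values())
true_mean = fmean(v for units in strata.values() for v in units)

def stratified_estimate(samples):
    """Weight each stratum's sample mean by its population share N_h / N."""
    return sum(len(strata[h]) / N * fmean(s) for h, s in samples.items())

# Enumerate every equally likely stratified sample under the design.
per_stratum = [list(combinations(strata[h], allocation[h])) for h in strata]
estimates = [
    stratified_estimate(dict(zip(strata, choice)))
    for choice in product(*per_stratum)
]

print(true_mean, fmean(estimates))
print(min(estimates), max(estimates))
```

The average of the estimates matches the true mean (no bias), while the wide gap between the smallest and largest estimate shows how inaccurate a badly allocated design can still be.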

Removed Dillman advertising

I removed this advertising (by effect if not by intention), which doesn't have any references or citation. There are plenty of other references on improving response, so why give such prominence to one guy? Kiefer.Wolfowitz (talk) 17:28, 7 July 2009 (UTC)

Dillman Total Design Method (TDM)

The Dillman Total Design Method (TDM), developed by professor Don Dillman, is a method of reducing non-response for telephone, mail, and internet surveys at a low cost. TDM is based on Social Exchange, which suggests the likelihood that individuals will respond to a survey questionnaire is a function of how much effort is required to respond, and what they feel they are likely to get in exchange for completing the questionnaire. The basic elements and procedures of the TDM are:

Minimize the burden on the respondent by designing questionnaires that are attractive in appearance and easy to complete; printing mail questionnaires in booklet format; placing personal questions at the end; creating a vertical flow of questions; and creating sections of questions based on their content.

Personalize all communication with the respondent by printing letters and envelopes individually, using blue ball point pens for signatures and a first class stamp on outgoing and return envelopes; and constructing a persuasive letter.

Provide information about the survey in a cover letter to respondents, interviewers, and clerical personnel. If possible, also send out letters in advance informing respondents that a survey is forthcoming.

Follow-up contact with non-respondents is essential, and TDM includes more than 30 steps intended to maximize the response rate (and reduce non-response).

A sample of the precisely laid-out steps includes:

1. The survey population is sent a questionnaire booklet containing as many as 12 pages.
2. The booklet has an illustrated front cover and a specified instruction format, a means of identifying respondents to allow for removal of their names from the mailing list, and a return envelope.
3. The covering letter, which clearly describes the purpose of the study and explains why the respondent's opinion is being sought, must be signed by hand, in blue ink.
4. An optional return postcard contains the respondent's name and permits the questionnaire to be returned anonymously; alternatively, the questionnaires may be prenumbered.
5. Follow-up procedures adhere to the following:

One week after the initial mail-out a reminder postcard is sent; three weeks and seven weeks after the initial mail-out, nonresponders are sent duplicate packets. The seven-week packet is sent by registered mail. Follow-up letters to non-responders are precisely formatted, and Dillman provides detailed advice on how to construct the questionnaire.

Dillman's methods have been criticized on the grounds that they are too rigid. While it is the nature of "how to" books to give exact instructions, experienced researchers suggest adapting the components of any method to their own situation.

Dillman's "cookbook" method has been cited in more than 1750 scientific publications.

In the 2007 update to his original work, Dillman includes new and updated material that covers both the principles behind and directions for how to:

Conduct Web surveys
Visually design questionnaires
Use paper mailed surveys

Citation needed

This quotation appears in a whole bunch of wiki pages on statistics, each time without any citation:

"Nature has established patterns originating in the return of events but only for the most part. New illnesses flood the human race, so that no matter how many experiments you have done on corpses, you have not thereby imposed a limit on the nature of events so that in the future they could not vary. —Gottfried Leibniz"

The only Google hits for the phrase are wiki pages where someone (maybe the same person?) has used the quotation, each time as an illustration of a slightly different concept! We need a citation.

adaptive sampling

Adaptive sampling designs are those in which the selection procedure may depend sequentially on observed values of the variable of interest. Gains in efficiency can be achieved through adaptive sampling procedures. — Preceding unsigned comment added by 208.65.73.145 (talk) 18:32, 8 June 2011 (UTC)

Bibliography clean up

I removed non-leading references and replaced them with high-quality, more reliable ones.

I correctly named a "Further reading" section, which now excludes non-leading sources. Kiefer.Wolfowitz 11:18, 16 September 2011 (UTC)

Sample Size Table Source

I could not find the "Sample Size Table," nor the reference cited in that section (Cohen). I offer in its stead the table at http://www.research-advisors.com/tools/SampleSize.htm (Authored by Paul C. Boyd ... me). — Preceding unsigned comment added by PBoyd (talk • contribs) 12:49, 2 October 2011 (UTC)

"wishfully"?

The phrase "wishfully representative" seems a bit pejorative to me; it suggests that representative samples are somehow elusive or mystical. On the contrary, although abuses and mistakes occur, there are, every day, many representative statistical samples taken. I'd replace this with "ideally representative" myself, except that 1) it's at the very top of the article; 2) this seems like a contentious issue already. — Preceding unsigned comment added by 160.39.140.189 (talk) 04:28, 11 November 2011 (UTC)

Accidental/convenience vs random sampling

Why is accidental/convenience sampling less efficient than random sampling? First, every time you select from the population to generate a random sample, you change the size of the remaining population, thus the members of the population do not have an equal likelihood of being selected. Second, it's possible through random chance that a random sample is made up of exactly the same units that would have been chosen out of convenience, so regardless of whether this group is chosen randomly or out of convenience, the ability of the sample to represent the population remains the same. Thus, convenience sampling is just as effective as random sampling because the random sample might actually be the convenient sample anyway. — Preceding unsigned comment added by 98.220.9.116 (talk • contribs)

"every time you select from the population... you change the size of the remaining population, thus the members of the population do not have an equal likelihood of being selected" - I'm not sure what you're saying here. Let's take a population of six people (Alice, Bob, Carol, Dave, Evelyn, Fred) and pick two of them: first select one at random, then select another at random from the five remaining.

I maintain that each of these people has a 1/3 chance of being selected overall. If you disagree, could you indicate which people have greater or lesser probabilities?

On your second point: yes, it's possible that a random sample will give the same outcome as a convenience sample, but (for large sample sizes) it's extremely unlikely. A random sample has a much better chance of giving an accurate estimate. --GenericBob (talk) 09:28, 20 August 2012 (UTC)
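
For anyone who wants to check the six-person example above, here is a minimal sketch that enumerates every ordered draw of two people without replacement and counts how often each person appears. It also computes the chance that the random sample coincides with one particular pair; the choice of Alice and Bob as the "convenient" pair is an assumption for illustration only.

```python
from fractions import Fraction
from itertools import permutations

people = ["Alice", "Bob", "Carol", "Dave", "Evelyn", "Fred"]

# Every ordered draw of two distinct people: the first pick is uniform over six,
# the second uniform over the remaining five, so all 30 ordered pairs are
# equally likely.
draws = list(permutations(people, 2))

# Overall probability that each person ends up in the sample.
inclusion = {
    person: Fraction(sum(person in draw for draw in draws), len(draws))
    for person in people
}
print(inclusion)

# Chance that the random sample is one particular pair (say, a "convenient" one).
convenient = {"Alice", "Bob"}
p_same = Fraction(sum(set(draw) == convenient for draw in draws), len(draws))
print(p_same)
```

Each person's inclusion probability comes out the same even though the second pick is made from a smaller pool. The number of possible samples grows rapidly with population and sample size (`math.comb(N, n)` gives the count), which is why a random sample matching any given convenience sample becomes very unlikely for large samples.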