“
Every car on every train on every line holds a surprise, a random sampling of humanity brought together in a confined space for a minute or two - a living Rubik's Cube.
”
Bill Hayes (Insomniac City: New York, Oliver, and Me)
“
Systematic sampling is like random sampling, except the members are not chosen totally randomly. You can choose members at regular intervals here.
”
Pooja Agnihotri (Market Research Like a Pro)
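Agnihotri's description of systematic sampling can be sketched in a few lines of Python. This is a minimal illustration, not taken from her book: the function name `systematic_sample`, the 100-member population, and the seed are hypothetical. Only the starting offset is random; after that, every k-th member is taken.

```python
import random


def systematic_sample(population, n, seed=None):
    """Take n members at a fixed interval k, starting from a random offset.

    Hypothetical helper for illustration: k = len(population) // n, and the
    only random choice is the starting position in [0, k).
    """
    if not 0 < n <= len(population):
        raise ValueError("n must be between 1 and len(population)")
    k = len(population) // n                      # sampling interval
    start = random.Random(seed).randrange(k)      # random starting offset
    return [population[start + i * k] for i in range(n)]


members = list(range(1, 101))   # hypothetical population of 100 member IDs
print(systematic_sample(members, 10, seed=42))
```

Because the interval is fixed, this behaves well on a list in arbitrary order but can mislead if the list has a periodic pattern that happens to line up with k.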
“
After all, a flight is just a random sample of the general population, a classic bell curve. A few assholes and a few exemplars, but primarily, a whole bunch of sheep.
”
T.J. Newman (Falling)
“
Nonetheless, gazing out the train window at a random sample of the Western world, I could not avoid noticing a kind of separation between human beings and all other species. We cut ourselves off by living in cement blocks, moving around in glass-and-metal bubbles, and spending a good part of our time watching other human beings on television. Outside, the pale light of an April sun was shining down on a suburb. I opened a newspaper and all I could find were pictures of human beings and articles about their activities. There was not a single article about another species.
”
Jeremy Narby (The Cosmic Serpent: DNA and the Origins of Knowledge)
“
I cannot take a subway without marveling at the lottery logic that brings together a random sampling of humanity for one minute or two, testing us for kindness and compatibility. Is that not what civility is?
”
Bill Hayes (Insomniac City: New York, Oliver, and Me)
“
Rule of Five: There is a 93.75% chance that the median of a population is between the smallest and largest values in any random sample of five from that population.
”
Douglas W. Hubbard (How to Measure Anything: Finding the Value of Intangibles in Business)
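Hubbard's 93.75% follows from a short argument, sketched below under the assumption that the five draws are independent and the population is continuous (no ties at the median). Each draw lands below the median with probability 1/2, so the median falls outside the sample's range only when all five land on the same side: 2 × (1/2)^5 = 1/16. The simulation half uses a hypothetical population of the integers 0 to 100,000, large enough that sampling without replacement is effectively independent.

```python
import random
import statistics
from fractions import Fraction

# Exact: the median is missed only if all five draws fall on one side of it.
exact = 1 - 2 * Fraction(1, 2) ** 5
print(exact, float(exact))   # 15/16 0.9375


def rule_of_five_rate(population, trials=20_000, seed=0):
    """Share of random five-item samples whose [min, max] contains the median."""
    rng = random.Random(seed)
    median = statistics.median(population)
    hits = 0
    for _ in range(trials):
        sample = rng.sample(population, 5)
        hits += min(sample) <= median <= max(sample)
    return hits / trials


population = list(range(100_001))   # hypothetical population; median is 50,000
print(rule_of_five_rate(population))   # close to 0.9375
```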
“
Lisak and Miller examined a random sample of 1,882 men, all of whom were students at the University of Massachusetts Boston between 1991 and 1998. Their average age was twenty-four. Of these 1,882 students, 120 individuals—6.4 percent of the sample—were identified as rapists, which wasn’t a surprising proportion. But 76 of the 120—63 percent of the undetected student rapists, amounting to 4 percent of the overall sample—turned out to be repeat offenders who were collectively responsible for at least 439 rapes, an average of nearly 6 assaults per rapist. A very small number of men in the population, in other words, had raped a great many women with utter impunity. Lisak’s study also revealed something equally disturbing: These same 76 individuals were also responsible for 49 sexual assaults that didn’t rise to the level of rape, 277 acts of sexual abuse against children, 66 acts of physical abuse against children, and 214 acts of battery against intimate partners. This relative handful of male students, as Lisak put it, “had each, on average, left 14 victims in their wake…. And the number of assaults was almost certainly underreported.”
”
Jon Krakauer (Missoula: Rape and the Justice System in a College Town)
“
The basic sample is the kind called “random.” It is selected by pure chance from the “universe,” a word by which the statistician means the whole of which the sample is a part.
”
Darrell Huff (How to Lie with Statistics)
“
You can present all the random sample studies you want to prove that it's safe to walk under a ladder, but a superstitious person will still avoid that ladder.
”
Chris Hadfield (An Astronaut's Guide to Life on Earth)
“
As Atwood concludes after a random and informal sampling, men and women differ markedly in the 'scope of their threatenability': 'Why do men feel threatened by women?' I asked a male friend of mine.... '[M]en are bigger, most of the time...and they have on the average a lot more money and power.' 'They're afraid women will laugh at them,' he said. 'Undercut their world view.' Then I asked some women students in a quickie poetry seminar I was giving, 'Why do women feel threatened by men?' 'They're afraid of being killed,' they said.'
”
Shuli Barzilai (Tales of Bluebeard and His Wives from Late Antiquity to Postmodern Times (Routledge Studies in Folklore and Fairy Tales))
“
On an evening when Perdita's away on a school trip, Harriet sits in front of her computer eating sample squares of lavender shortbread and practicing her favorite form of procrastination: writing highly positive reviews of her eBay, Etsy, and Amazon purchases. Five stars for everybody. She didn't finish one of the books she just gave five stars to. She just liked the author photo. Five stars for the portrait photographer, then. She's been doing this ever since some of her students told her they do this with one-star reviews. Opposing random negativity with random positivity - that's the main thing.
”
Helen Oyeyemi (Gingerbread)
“
They’re making a movie, Joe is doing the camera work, he’s never done it before but David says they’re the new Renaissance Men, you teach yourself what you need to learn. It was mostly David’s idea, he calls himself the director: they already have the credits worked out. He wants to get shots of things they come across, random samples he calls them, and that will be the name of the movie too: Random Samples.
”
Margaret Atwood (Surfacing)
“
If you engaged in a Russian roulette–type strategy with a low probability of large loss, one that bankrupts you every several years, you are likely to show up as the winner in almost all samples—except in the year when you are dead.
”
Nassim Nicholas Taleb (Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets (Incerto, #1))
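Taleb's point about survivors can be illustrated with a small simulation. The setup is hypothetical, not his: each year every surviving trader either gains or, with probability `p_blowup`, is wiped out and drops out of the data. Because the dead stop contributing observations, the observed years are overwhelmingly winning years, even though the expected number of survivors shrinks as (1 - p_blowup) ** years.

```python
import random


def cohort_snapshot(n_traders, p_blowup, years, seed=0):
    """Follow a cohort running a hypothetical 'small gains, rare ruin' strategy.

    Each year a surviving trader wins with probability 1 - p_blowup or blows
    up and leaves the sample for good. Returns (survivors, share of observed
    trader-years that were winning years).
    """
    rng = random.Random(seed)
    alive = n_traders
    winning_years = observed_years = 0
    for _ in range(years):
        blowups = sum(rng.random() < p_blowup for _ in range(alive))
        observed_years += alive
        winning_years += alive - blowups
        alive -= blowups
    return alive, winning_years / observed_years


survivors, share = cohort_snapshot(10_000, p_blowup=0.1, years=10)
print(survivors, round(share, 3))
print("expected survivors:", round(10_000 * 0.9 ** 10))   # 3487
```

With `p_blowup = 0.1`, roughly nine in ten observed trader-years are wins, while only about a third of the cohort is still around after ten years.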
“
Amos and I called our first joint article “Belief in the Law of Small Numbers.” We explained, tongue-in-cheek, that “intuitions about random sampling appear to satisfy the law of small numbers, which asserts that the law of large numbers applies to small numbers as well.” We also included a strongly worded recommendation that researchers regard their “statistical intuitions with proper suspicion and replace impression formation by computation whenever possible.”
”
Daniel Kahneman (Thinking, Fast and Slow)
“
The test of the random sample is this: Does every name or thing in the whole group have an equal chance to be in the sample? The purely random sample is the only kind that can be examined with entire confidence by means of statistical theory, but there is one thing wrong with it. It is so difficult and expensive to obtain for many uses that sheer cost eliminates it. A more economical substitute, which is almost universally used in such fields as opinion polling and market research, is called stratified random sampling.
”
Darrell Huff (How to Lie with Statistics)
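Huff's stratified random sampling can be sketched as proportional allocation: split the universe into strata, then take a simple random sample inside each stratum, sized by that stratum's share of the whole. The function, the strata labels, and the household counts below are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(units, stratum_of, n, seed=None):
    """Proportional stratified random sample (hypothetical helper).

    Groups `units` by `stratum_of(unit)`, then draws a simple random sample
    from each stratum with size proportional to the stratum's share. Rounding
    can make the total differ from n by a unit or two.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(units))   # proportional allocation
        sample.extend(rng.sample(members, k))       # random within the stratum
    return sample


# Hypothetical universe: 600 city, 300 small-town and 100 farm households.
households = ([("city", i) for i in range(600)]
              + [("town", i) for i in range(300)]
              + [("farm", i) for i in range(100)])
picked = stratified_sample(households, stratum_of=lambda h: h[0], n=50, seed=1)
print(Counter(kind for kind, _ in picked))   # 30 city, 15 town, 5 farm
```

The economy Huff mentions comes from guaranteeing each stratum its share, so a smaller sample can stand in for the universe, provided the strata and their sizes are known accurately.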
“
Imagine going to Mexico with a notebook and trying to figure out the average wealth of the population from talking to people you randomly encounter. Odds are that, without Carlos Slim in your sample, you have little information. For out of the hundred or so million Mexicans, Slim would (I estimate) be richer than the bottom seventy to ninety million all taken together. So you may sample fifty million persons and unless you include that “rare event,” you may have nothing in your sample and underestimate the total wealth.
”
Nassim Nicholas Taleb (Antifragile: Things That Gain From Disorder)
“
Boston University’s CTE Center has established a “brain bank,” where former athletes with symptoms consistent with CTE can donate their brains upon death. Now with 425 brains, the center published a study in 2017 of former professional and amateur football players. Among 111 NFL players, the brains of all but one of them showed signs of severe CTE. That’s sobering and frightening, but it’s extremely important to keep in mind that this study involved former players who already showed personality and mental changes consistent with brain injury. It was not a random sample of NFL players, most of whom never show such changes despite having experienced concussions.
”
Rahul Jandial (Life Lessons From A Brain Surgeon: Practical Strategies for Peak Health and Performance)
“
I feel as though dispossessed from the semblances of some crystalline reality to which I’d grown accustomed, and to some degree, had engaged in as a participant, but to which I had, nevertheless, grown inexplicably irrelevant. But the elements of this phenomenon are now quickly dissolving from memory and being replaced by reverse-engineered Random Access actualizations of junk code/DNA consciousness, the retro-coded catalysts of rogue cellular activity. The steel meshing titters musically and in its song, I hear a forgotten tale of the Interstitial gaps that form pinpoint vortexes at which fibers (quanta, as it were) of Reason come to a standstill, like light on the edge of a Singularity. The gaps, along their ridges, seasonally infected by the incidental wildfires in the collective unconscious substrata.
Heat flanks passageways down the Interstices. Wildfires cluster—spread down the base trunk Axon in a definitive roar: hitting branches, flaring out to Dendrites to give rise to this release of the very chemical seeds through which sentience is begotten.
Float about the ether, gliding a gentle current, before skimming down, to a skip over the surface of a sea of deep black with glimmering waves. And then, come to a stop, still inanimate and naked before any trespass into the Field, with all its layers that serve to veil. Plunge downward into the trenches. Swim backwards, upstream, and down through these spiraling jets of bubbles. Plummet past the threshold to trace the living history of shadows back to their source virus. And acquire this sense that the viruses as a sample, all of the outlying populations withstanding: they have their own sense of self-importance, too. Their own religion. And they mine their hosts barren with the utilitarian wherewithal that can only be expected of beings with self-preservationist motives.
”
Ashim Shanker (Sinew of the Social Species)
“
As you consider the next question, please assume that Steve was selected at random from a representative sample: An individual has been described by a neighbor as follows: “Steve is very shy and withdrawn, invariably helpful but with little interest in people or in the world of reality. A meek and tidy soul, he has a need for order and structure, and a passion for detail.” Is Steve more likely to be a librarian or a farmer?
”
Daniel Kahneman (Thinking, Fast and Slow)
“
Mark Twain didn’t dabble in psychological focus groups, but he certainly knew something about human nature when he wrote, “Twenty years from now you will be more disappointed by the things that you didn’t do than by the ones you did do. So throw off the bowlines. Sail away from the safe harbor. Catch the trade winds in your sails. Explore. Dream. Discover.” A series of surveys explored the premise that time is an important variable in this equation. Researchers asked a random sampling of people, “When you look back on your experiences in life and think of those things that you regret, what would you say you regret more, those things that you did, but wish you hadn’t, or those things that you didn’t do, but wish you had?” The results found that regrettable “failures to act” outnumbered “regrettable actions” by a two-to-one margin and that this was true for both sexes.
”
Chip Conley (Emotional Equations: Simple formulas to help your life work better)
“
As it happens, there’s a way of presenting data, called the funnel plot, that indicates whether or not the scientific literature is biased in this way.15 (If statistics don’t excite you, feel free to skip straight to the probably unsurprising conclusion in the last sentence of this paragraph.) You plot the data points from all your studies according to the effect sizes, running along the horizontal axis, and the sample size (roughly)16 running up the vertical axis. Why do this? The results from very large studies, being more “precise,” should tend to cluster close to the “true” size of the effect. Smaller studies by contrast, being subject to more random error because of their small, idiosyncratic samples, will be scattered over a wider range of effect sizes. Some small studies will greatly overestimate a difference; others will greatly underestimate it (or even “flip” it in the wrong direction). The next part is simple but brilliant. If there isn’t publication bias toward reports of greater male risk taking, these over- and underestimates of the sex difference should be symmetrical around the “true” value indicated by the very large studies. This, with quite a bit of imagination, will make the plot of the data look like an upside-down funnel. (Personally, my vote would have been to call it the candlestick plot, but I wasn’t consulted.) But if there is bias, then there will be an empty area in the plot where the smaller samples that underestimated the difference, found no differences, or yielded greater female risk taking should be. In other words, the overestimates of male risk taking get published, but various kinds of “underestimates” do not. When Nelson plotted the data she’d been examining, this is exactly what she found: “Confirmation bias is strongly indicated.”17
”
Cordelia Fine (Testosterone Rex: Myths of Sex, Science, and Society)
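The shape Fine describes can be reproduced numerically without drawing the plot. The sketch below is hypothetical throughout (the true effect of 0.2, the study sizes, and the rule "small studies get published only if their estimate is above zero" are assumptions, not Nelson's data): small studies scatter far more widely around the true effect than large ones, and suppressing the small studies that landed on the "wrong" side leaves their published average inflated.

```python
import random
import statistics

TRUE_EFFECT = 0.2   # hypothetical true difference, in standard-deviation units


def simulate_study(n_per_group, rng):
    """One study: difference between two group means, each group of size n."""
    group_a = [rng.gauss(TRUE_EFFECT, 1) for _ in range(n_per_group)]
    group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
    return statistics.fmean(group_a) - statistics.fmean(group_b)


rng = random.Random(7)
small = [simulate_study(20, rng) for _ in range(300)]
large = [simulate_study(2_000, rng) for _ in range(30)]

# The funnel: spread shrinks as sample size grows.
print("spread, small studies:", round(statistics.stdev(small), 3))
print("spread, large studies:", round(statistics.stdev(large), 3))

# Publication bias: keep only small studies whose estimate is above zero.
published_small = [d for d in small if d > 0]
print("mean of all small studies:     ", round(statistics.fmean(small), 3))
print("mean of 'published' small ones:", round(statistics.fmean(published_small), 3))
print("mean of large studies:         ", round(statistics.fmean(large), 3))
```

On an actual funnel plot, the dropped studies would show up as the empty region near the bottom, on the low-effect side.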
“
Thanks largely to the attempts to integrate women into the armed forces of many modern countries, the physical differences between the sexes have been precisely measured.[296] One study found the average U.S. Army female recruit to be 12 centimeters shorter and 14.3 kilograms lighter than her male brethren. Compared to the average male recruit, females had 16.9 fewer kilograms of muscle and 2.6 more kilograms of fat, as well as 55 percent of the upper body strength and 72 percent of the lower body strength. Fat mass is inversely related to aerobic capacity and heat tolerance, hence women are also at a disadvantage when performing activities such as carrying heavy loads, working in the heat and running. Even when the samples were controlled for height, women possessed only 80 percent of the overall strength of men. Only the upper 20 percent of women could do as well physically as the lower 20 percent of men. Had the 100 strongest individuals out of a random group consisting of 100 men and 100 women been selected, 93 would be male and only seven female.[297] Yet another study showed that only the upper 5 percent of women are as strong as the median male.[298]
”
Martin van Creveld (The Privileged Sex)
“
There are at least three levels of sampling involved. Dr. Kinsey’s samples of the population (one level) are far from random ones and may not be particularly representative, but they are enormous samples by comparison with anything done in his field before and his figures must be accepted as revealing and important if not necessarily on the nose. It is possibly more important to remember that any questionnaire is only a sample (another level) of the possible questions and that the answer the lady gives is no more than a sample (third level) of her attitudes and experiences on each question.
”
Darrell Huff (How to Lie with Statistics)
“
A random event, by definition, does not lend itself to explanation, but collections of random events do behave in a highly regular fashion. Imagine a large urn filled with marbles. Half the marbles are red, half are white. Next, imagine a very patient person (or a robot) who blindly draws 4 marbles from the urn, records the number of red balls in the sample, throws the balls back into the urn, and then does it all again, many times. If you summarize the results, you will find that the outcome “2 red, 2 white” occurs (almost exactly) 6 times as often as the outcome “4 red” or “4 white.” This relationship is a mathematical fact.
”
Daniel Kahneman (Thinking, Fast and Slow)
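Kahneman's 6-to-1 ratio is a binomial calculation. Treating the urn as large enough (and the marbles as returned after each round) that every draw is red with probability 1/2, the chance of k red marbles in 4 draws is C(4, k)/16: there are 6 ways to get 2 red and only 1 way to get 4 red. The seeded simulation is a hypothetical stand-in for the "very patient person".

```python
import random
from collections import Counter
from fractions import Fraction
from math import comb


def prob_red(k, draws=4, p=Fraction(1, 2)):
    """Binomial probability of exactly k red marbles in `draws` draws."""
    return comb(draws, k) * p**k * (1 - p) ** (draws - k)


for k in range(5):
    print(k, "red:", prob_red(k))          # 1/16, 1/4, 3/8, 1/4, 1/16
print("2-red vs 4-red:", prob_red(2) / prob_red(4))   # 6

# The patient robot: 100,000 rounds of four draws each.
rng = random.Random(0)
rounds = Counter(sum(rng.random() < 0.5 for _ in range(4)) for _ in range(100_000))
print("simulated ratio:", rounds[2] / rounds[4])
```

The same 1/16 applies to "4 white" (k = 0), which is why the passage can compare "2 red, 2 white" against either extreme.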
“
“Belief in the Law of Small Numbers” teased out the implications of a single mental error that people commonly made—even when those people were trained statisticians. People mistook even a very small part of a thing for the whole. Even statisticians tended to leap to conclusions from inconclusively small amounts of evidence. They did this, Amos and Danny argued, because they believed—even if they did not acknowledge the belief—that any given sample of a large population was more representative of that population than it actually was. The power of the belief could be seen in the way people thought of totally random patterns—like, say, those created by a flipped coin. People knew that a flipped coin was equally likely to come up heads as it was tails. But they also thought that the tendency for a coin flipped a great many times to land on heads half the time would express itself if it were flipped only a few times—an error known as “the gambler’s fallacy.” People seemed to believe that if a flipped coin landed on heads a few times in a row it was more likely, on the next flip, to land on tails—as if the coin itself could even things out. “Even the fairest coin, however, given the limitations of its memory and moral sense, cannot be as fair as the gambler expects it to be,” they wrote. In an academic journal that line counted as a splendid joke.
”
Michael Lewis (The Undoing Project: A Friendship That Changed Our Minds)
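The coin's lack of memory is easy to check by simulation. This is a minimal, hypothetical sketch (the function name, the 200,000 flips, and the run length of three are arbitrary choices): it looks at every flip that follows three heads in a row and counts how often it comes up tails. The answer stays near one half, not above it as the gambler expects.

```python
import random


def tails_after_heads_run(n_flips, run=3, seed=0):
    """Fraction of tails on flips that immediately follow `run` heads in a row."""
    rng = random.Random(seed)
    flips = [rng.random() < 0.5 for _ in range(n_flips)]   # True means heads
    following = [flips[i] for i in range(run, n_flips) if all(flips[i - run:i])]
    return following.count(False) / len(following)


print(round(tails_after_heads_run(200_000), 3))   # close to 0.5
```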
“
Base your understanding of the world on data, rather than journalism.
Journalism is a highly nonrandom sample of the worst things that have happened in any given period.
It is an availability machine, in the sense of Tversky and Kahneman's availability heuristic; namely - our sense of risk, danger and prevalence is driven by anecdotes, images and narratives that are available in memory.
A lot of good things are either things that "don't happen" (like a country at peace, or a city that has not been attacked by terrorists, which almost by definition are not news), or things that build up incrementally, a few percentage points a year, and then compound (like the decline of extreme poverty).
We can be unaware, out to lunch about what's happening in the world if we base our view on the news. If instead we base our view on data, then not only do we see that many (although not all) things have gone better (not linearly, not without setbacks and reversals, but in general a lot better... and that paradoxically, as I've cheekily put it, progressives hate progress), but also that the best possible case for progress - that is, for striving for more progress in the future, for being a true progressive - is not to have some kind of foolish hope, but to look at the fact that progress has taken place in the past; and that means: why should it stop now?
”
Steven Pinker
“
To understand my doctor’s error, let’s employ Bayes’s method. The first step is to define the sample space. We could include everyone who has ever taken an HIV test, but we’ll get a more accurate result if we employ a bit of additional relevant information about me and consider only heterosexual non-IV-drug-abusing white male Americans who have taken the test. (We’ll see later what kind of difference this makes.) Now that we know whom to include in the sample space, let’s classify the members of the space. Instead of boy and girl, here the relevant classes are those who tested positive and are HIV-positive (true positives), those who tested positive but are not positive (false positives), those who tested negative and are HIV-negative (true negatives), and those who tested negative but are HIV-positive (false negatives). Finally, we ask, how many people are there in each of these classes? Suppose we consider an initial population of 10,000. We can estimate, employing statistics from the Centers for Disease Control and Prevention, that in 1989 about 1 in those 10,000 heterosexual non-IV-drug-abusing white male Americans who got tested were infected with HIV.6 Assuming that the false-negative rate is near 0, that means that about 1 person out of every 10,000 will test positive due to the presence of the infection. In addition, since the rate of false positives is, as my doctor had quoted, 1 in 1,000, there will be about 10 others who are not infected with HIV but will test positive anyway. The other 9,989 of the 10,000 men in the sample space will test negative. Now let’s prune the sample space to include only those who tested positive. We end up with 10 people who are false positives and 1 true positive. In other words, only 1 in 11 people who test positive are really infected with HIV.
”
Leonard Mlodinow (The Drunkard's Walk: How Randomness Rules Our Lives)
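Mlodinow's pruning of the sample space is Bayes's rule with counts. The sketch below plugs in the figures from the passage (prevalence of 1 in 10,000, a false-positive rate of 1 in 1,000, and a false-negative rate taken as zero); the function name is hypothetical. It gives about 0.091, matching his 1 in 11 once the 9.999 expected false positives are rounded to 10.

```python
from fractions import Fraction


def positive_predictive_value(prevalence, false_positive_rate, sensitivity=1):
    """P(infected | tested positive), by Bayes's rule."""
    true_positives = prevalence * sensitivity                 # infected and positive
    false_positives = (1 - prevalence) * false_positive_rate  # healthy but positive
    return true_positives / (true_positives + false_positives)


ppv = positive_predictive_value(Fraction(1, 10_000), Fraction(1, 1_000))
print(ppv, round(float(ppv), 4))   # 1000/10999 0.0909
```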
“
Recall that someone with only casual knowledge about the problems of randomness would believe that an animal is at the maximum fitness for the conditions of its time. This is not what evolution means; on average, animals will be fit, but not every single one of them, and not at all times. Just as an animal could have survived because its sample path was lucky, the “best” operators in a given business can come from a subset of operators who survived because of overfitness to a sample path—a sample path that was free of the evolutionary rare event. One vicious attribute is that the longer these animals can go without encountering the rare event, the more vulnerable they will be to it. We said that should one extend time to infinity, then, by ergodicity, that event will happen with certainty—the species will be wiped out! For evolution means fitness to one and only one time series, not the average of all the possible environments.
”
Nassim Nicholas Taleb (Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets)
“
There is no shortage of more stable generalizations about dangerous dogs, though. A 1991 study in Denver, for example, compared 178 dogs that had a history of biting people with a random sample of 178 dogs with no history of biting. The breeds were scattered: German shepherds, Akitas, and Chow Chows were among those most heavily represented. (There were no pit bulls among the biting dogs in the study, because Denver banned pit bulls in 1989.) But a number of other, more stable factors stand out. The biters were 6.2 times as likely to be male than female, and 2.6 times as likely to be intact than neutered. The Denver study also found that biters were 2.8 times as likely to be chained as unchained. “About twenty percent of the dogs involved in fatalities were chained at the time, and had a history of long-term chaining,” Lockwood said. “Now, are they chained because they are aggressive or aggressive because they are chained? It’s a bit of both. These are animals that have not had an opportunity to become socialized to people. They don’t necessarily even know that children are small human beings. They tend to see them as prey.” In many cases, vicious dogs are hungry or in need of medical attention. Often, the dogs had a history of aggressive incidents, and, overwhelmingly, dog-bite victims were children (particularly small boys) who were physically vulnerable to attack and may also have unwittingly done things to provoke the dog, like teasing it, or bothering it while it was eating. The strongest connection of all, though, is between the trait of dog viciousness and certain kinds of dog owners. In about a quarter of fatal dog-bite cases, the dog owners were previously involved in illegal fighting. The dogs that bite people are, in many cases, socially isolated because their owners are socially isolated, and they are vicious because they have owners who want a vicious dog. 
The junkyard German shepherd — which looks as if it would rip your throat out — and the German-shepherd guide dog are the same breed. But they are not the same dog, because they have owners with different intentions.
”
Malcolm Gladwell (What the Dog Saw and Other Adventures)
“
With Britain preoccupied by World War II and the United States not yet in it, the quest to produce bulk penicillin moved to a U.S. government research facility in Peoria, Illinois. Scientists and other interested parties all over the Allied world were secretly asked to send in soil and mold samples. Hundreds responded, but nothing they sent proved promising. Then, two years after testing had begun, a lab assistant in Peoria named Mary Hunt brought in a cantaloupe from a local grocery store. It had a “pretty golden mold” growing on it, she recalled later. That mold proved to be two hundred times more potent than anything previously tested. The name and location of the store where Mary Hunt shopped are now forgotten, and the historic cantaloupe itself was not preserved: after the mold was scraped off, it was cut into pieces and eaten by the staff. But the mold lived on. Every bit of penicillin made since that day is descended from that single random cantaloupe. Within a year, American pharmaceutical companies were producing 100 billion units of penicillin a month.
”
Bill Bryson (The Body: A Guide for Occupants)
“
We pay a high price for this ingenious neural machinery, though, because the default mode network is responsible for mind-wandering. “Experience sampling”—which involves asking people about their mood and thoughts at random moments throughout the day—suggests that our minds wander from what we’re actually doing an amazing 30 percent to 50 percent of the time that we’re awake, and that this is often associated with feelings of unhappiness.6–8 According to Harvard psychologists Matthew Killingsworth and Daniel Gilbert, who created an iPhone app, Rate Your Happiness, to gather some of this data, fluctuations in happiness depend more on what we’re thinking than what we’re doing. Crucially, the results suggest that mind-wandering is the cause rather than the consequence of negative emotions. As the opening verse of the Dhammapada expresses it, “Our life is shaped by our mind; we become what we think. Suffering follows an evil thought as the wheels of a cart follow the oxen that draw it.”9 Less poetically, the psychologists concluded that “the ability to think about what is not happening is a cognitive achievement that comes at an emotional cost.”
”
James Kingsland (Siddhartha's Brain: Unlocking the Ancient Science of Enlightenment)
“
way to respond to such a test is to give an ambiguous answer and then change the topic. For example, you could respond by saying - “It’s hard to know what people mean to say when you cannot see their body language, mannerisms, etc.” Never qualify yourself in your emails. If she mentions in an email that she loves the car that you are standing next to in one of your photographs, get her talking about why she loves it. Ask her about her interest in automobiles. You could even ask her if she has a need for speed. Do not begin talking about how you bought that car last year and it cost you a pretty penny. Do not talk about how it goes from zero to 60 miles per hour in under five seconds or how people always ask you to give them a joyride in it. Do not bite on her bait. A woman will do this to see if a man might slip up and show her exactly how desperate he is to get validation from other people, especially women.
Sample questions
Which of the following animals do you like?
a. Komodo dragon (+5)
b. Bonobo (+3)
c. Dog (0)
d. Cat (-1)
Your friends would describe you as:
a. Sweet and supportive (+5)
b. Feisty, fun and sassy (+3)
c. Strong and independent (0)
d. Totally random (-1)
”
Strategic Lothario (Become Unrejectable: Know what women want and how to attract them to avoid rejection)
“
In the opinion of the A. C. Nielsen Company, the ideal radio research service must:
1. Measure the entertainment value of the program (probably best indicated by the size of the audience, bearing in mind the scope of the broadcasting facilities).
2. Measure the sales effectiveness of the program.
3. Cover the entire radio audience; that is:
a. All geographical sections.
b. All sizes of cities.
c. Farms.
d. All income classes.
e. All occupations.
f. All races.
g. All sizes of family.
h. Telephone and non-telephone homes, etc., etc.
4. Sample each of the foregoing sections of the audience in its proper portion; that is, there must be scientific, controlled sampling — not wholly random sampling.
5. Cover a sufficiently large sample to give reliable results.
6. Cover all types of programs.
7. Cover all hours of the day.
8. Permit complete analysis of each program; for example:
a. Variations in audience size at each instant during the broadcast.
b. Average duration of listening.
c. Detection of entertainment features or commercials which cause gain or loss of audience.
d. Audience turnover from day to day or week to week, etc., etc.
9. Reveal the true popularity and listening areas of each station and each network; that is, furnish an "Audit Bureau of Circulations" for radio.
A study was made by A. C. Nielsen Company of all possible methods of meeting these specifications. After careful investigation, they decided to use a graphic recording instrument known as the "audimeter" for accurately measuring radio listening. . . .
The audimeter is installed in radio receivers in homes.
”
Judith C. Waller (Radio: The Fifth Estate)
“
Auditors are to the world of finance what anti-doping lab technicians are to the Tour de France; they both test thousands of random samples each year and find nothing wrong.
”
Donald Roper (THE TOTALLY ACCOUNTANT PERSON: A CERTIFIED NUMBER CRUNCHING NUTCASE!)
“
For some reason human beings did not see it that way. “People’s intuitions about random sampling appear to satisfy the law of small numbers, which asserts that the law of large numbers applies to small numbers as well,” Danny and Amos wrote.
”
Michael Lewis (The Undoing Project: A Friendship That Changed Our Minds)
“
“So, here it is,” I say. “This is a real-time view of the sample from the integrated electron microscope.” I step back, and give them a view of the screen. It shows a mass of spheres. They move randomly through the frame, occasionally bouncing off of one another. In among them, though, are other shapes. These are far fewer, larger, and more irregular. “See the balls?” I continue. “Those are what should have been produced. They’re temperature-sensitive cages, with serotonin inside. Those other things, though—they’re not supposed to be there. They look a bit like big viruses, but their mass is much higher than you’d expect from a biological. I’m guessing these are what the crypted code tacked onto the configuration file is producing.” “I thought we’d decided that Hagerstown couldn’t have been a virus,” Gary says. “I didn’t say these are viruses,” I say. “I said the protein coat we can see looks like what you’d see on a virus. That’s just the delivery mechanism. I’d be willing to bet that these things bind to cells like a virus, but what’s inside them is definitely not RNA.”
”
”
Edward Ashton (Three Days in April)
“
Extremistan, you will have trouble figuring out the average from any sample since it can depend so much on one single observation. The idea is not more difficult than that. In Extremistan, one unit can easily affect the total in a disproportionate way. In this world, you should always be suspicious of the knowledge you derive from data. This is a very simple test of uncertainty that allows you to distinguish between the two kinds of randomness. Capish?
”
”
Nassim Nicholas Taleb (The Black Swan: The Impact of the Highly Improbable (Incerto, #2))
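Taleb's test for telling the two kinds of randomness apart can be tried numerically. The sketch below is a minimal illustration with made-up parameters: Gaussian heights (mean 170 cm, sd 10 cm) stand in for Mediocristan, and a Pareto tail with alpha = 1.1 loosely stands in for something like wealth in Extremistan, with 10,000 draws each. It measures how much of each sample's total comes from its single largest observation. Only Python's standard `random.gauss` and `random.paretovariate` are used; none of the numbers come from the book.

```python
import random

def largest_share(values):
    """Fraction of the sample total contributed by its single largest value."""
    return max(values) / sum(values)

rng = random.Random(42)  # fixed seed so the run is reproducible
n = 10_000

# Hypothetical "Mediocristan" variable: heights in cm, roughly Gaussian.
heights = [rng.gauss(170, 10) for _ in range(n)]

# Hypothetical "Extremistan" variable: a Pareto tail with alpha = 1.1.
wealth = [rng.paretovariate(1.1) for _ in range(n)]

print(f"largest height's share of total: {largest_share(heights):.6f}")
print(f"largest wealth's share of total: {largest_share(wealth):.6f}")
```

With thin tails no single person moves the total; with the heavy tail one draw can account for a large slice of it, which is why the sample average is so unreliable there.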
“
in real-life situations we often make the opposite error: we assume that a sample or a series of trials is representative of the underlying situation when it is actually far too small to be reliable.
”
”
Leonard Mlodinow (The Drunkard's Walk: How Randomness Rules Our Lives)
“
Statistics to the layman can appear rather complex, but the concept behind what is used today is so simple that my French mathematician friends call it deprecatorily "cuisine". It is all based on one simple notion: the more information you have, the more you are confident about the outcome. Now the problem: by how much? Common statistical method is based on the steady augmentation of the confidence level, in nonlinear proportion to the number of observations. That is, for an n-fold increase in the sample size, we increase our knowledge by the square root of n. Suppose I'm drawing from an urn containing red and black balls. My confidence level about the relative proportion of red and black balls after 20 drawings is not twice the one I have after 10 drawings; it's merely multiplied by the square root of 2.
”
”
Nassim Nicholas Taleb (Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets (Incerto))
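Taleb's square-root claim can be checked with the usual standard error of a sample proportion, sqrt(p(1 - p)/n). Reading his "confidence level" as the precision of the estimate is an interpretation, and the 50/50 urn below is hypothetical.

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion p estimated from n independent draws."""
    return math.sqrt(p * (1 - p) / n)

p = 0.5  # hypothetical share of red balls in the urn
se_10 = standard_error(p, 10)
se_20 = standard_error(p, 20)

# Doubling the draws shrinks the standard error by sqrt(2), not by 2.
print(f"SE after 10 draws: {se_10:.4f}")
print(f"SE after 20 draws: {se_20:.4f}")
print(f"ratio: {se_10 / se_20:.4f}")
```

The same ratio holds for any p strictly between 0 and 1, because p cancels when the two standard errors are divided.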
“
The health benefits from regular activity are widely acknowledged and can be achieved by any adult willing to complete the weekly target of just 150 minutes of moderate intensity physical activity. This is the equivalent of just under 22 minutes per day so we would hardly be surprised if most able-bodied adults achieved these targets. Yet, survey data in the United States suggests that only 49 per cent of adults achieve these minimum recommendations, although some states fare better. For example, 60 per cent of Alaskans meet the minimum recommendations compared to only 39 per cent of Louisianans. Adults in the United Kingdom appear to struggle even more, with only 35 per cent of men and women achieving the same 150 minute weekly target. To make matters worse, these percentages are all based on official government statistics which were obtained by asking random samples of people to estimate how much activity they usually do. Using these types of self-report questionnaires introduces considerable bias, especially when the respondents are aware that they don’t do as much exercise as they believe they should.
A better way to check how much exercise adults really do is to use electronic sensors worn on the body to record the number of minutes spent performing physical activity of moderate intensity or above. Using this more accurate measurement technique, only 6 per cent of men and 4 per cent of women in the United Kingdom actually achieved the minimum weekly amounts of recommended physical activity. Similar results have been revealed in other Western countries, including the United States. If most adults believe that regular exercise is important, then the low participation statistics suggest that it must be difficult to achieve in practice.
”
”
Jim Flood (The Complete Guide to Indoor Rowing (Complete Guides))
“
The null hypothesis of normality is that the variable is normally distributed: thus, we do not want to reject the null hypothesis. A problem with statistical tests of normality is that they are very sensitive to small samples and minor deviations from normality. The extreme sensitivity of these tests implies the following: whereas failure to reject the null hypothesis indicates normal distribution of a variable, rejecting the null hypothesis does not indicate that the variable is not normally distributed. It is acceptable to consider variables as being normally distributed when they visually appear to be so, even when the null hypothesis of normality is rejected by normality tests. Of course, variables that are supported by both visual inspection and normality tests are preferred.
In Greater Depth … Box 12.1: Why Normality?
The reasons for the normality assumption are twofold: First, the features of the normal distribution are well-established and are used in many parametric tests for making inferences and hypothesis testing. Second, probability theory suggests that random samples will often be normally distributed, and that the means of these samples can be used as estimates of population means. The latter reason is informed by the central limit theorem, which states that the means of an infinite number of relatively large samples will be normally distributed, regardless of the distribution of the population. An infinite number of samples is also called a sampling distribution. The central limit theorem is usually illustrated as follows. Assume that we know the population distribution, which has only six data elements with the following values: 1, 2, 3, 4, 5, or 6. Next, we write each of these six numbers on a separate sheet of paper, and draw repeated samples of three numbers each (that is, n = 3). We
”
”
Evan M. Berman (Essential Statistics for Public Managers and Policy Analysts)
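Berman's illustration breaks off mid-sentence, so what follows is only a sketch under an assumption: that each sample of three is drawn with replacement. Under that assumption all 6^3 = 216 ordered samples can be enumerated exactly (no simulation is needed), and the mean and variance of the sample means can be compared with the population's. Exact `Fraction` arithmetic avoids rounding noise.

```python
from fractions import Fraction
from itertools import product
from statistics import mean, pvariance

population = [1, 2, 3, 4, 5, 6]
n = 3

# Every ordered sample of three draws *with replacement* (6**3 = 216 samples).
sample_means = [Fraction(sum(s), n) for s in product(population, repeat=n)]

pop_mean = Fraction(sum(population), len(population))
pop_var = pvariance([Fraction(x) for x in population])

print("number of samples:", len(sample_means))
print("mean of sample means:", mean(sample_means))           # equals pop_mean
print("variance of sample means:", pvariance(sample_means))  # equals pop_var / n
```

The sampling distribution is centred on the population mean, and its spread is the population variance divided by the sample size, which is what lets sample means serve as estimates of the population mean.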
“
observation is simply an observation for which a specified outcome has not yet occurred. Assume that data exist from a random sample of 100 clients who are seeking, or have found, employment. Survival analysis is the statistical procedure for analyzing these data. The name of this procedure stems from its use in medical research. In clinical trials, researchers want to know the survival (or disease) rate of patients as a function of the duration of their treatment. For patients in the middle of their trial, the specified outcome may not have occurred yet. We obtain the following results (also called a life table) from analyzing hypothetical data from welfare records (see Table 18.3). In the context shown in the table, the word terminal signifies that the event has occurred. That is, the client has found employment. At start time zero, 100 cases enter the interval. During the first period, there are no terminal cases and nine censored cases. Thus, 91 cases enter the next period. In this second period, 2 clients find employment and 14 do not, resulting in 75 cases that enter the following period. The column labeled “Cumulative proportion surviving until end of interval” is an estimate of probability of surviving (not finding employment) until the end of the stated interval.5 The column labeled “Probability density” is an estimate of the probability of the terminal event occurring (that is, finding employment) during the time interval. The results also report that “the median survival time is 5.19.” That is, half of the clients find employment in 5.19 weeks.
Table 18.2 Censored Observations. Note: Obs = observations (clients); Emp = employment; 0 = has not yet found employment; 1 = has found employment.
Table 18.3 Life Table Results
”
”
Evan M. Berman (Essential Statistics for Public Managers and Policy Analysts)
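The first two intervals of the life table described above can be reproduced with a short sketch. It assumes a common actuarial convention in which censored cases count as exposed for half of the interval, and it reads the "14 do not" find employment as censored cases. The passage does not say which convention produced Table 18.3, so the survival figures here are illustrative, and the reported median of 5.19 is not recomputed.

```python
def actuarial_life_table(n_start, terminal, censored):
    """Rows of (entering, terminal, censored, exposed, cumulative_surviving).

    Assumes the actuarial adjustment: censored cases are treated as
    exposed to the event for half of the interval.
    """
    rows = []
    entering = n_start
    cumulative = 1.0
    for d, c in zip(terminal, censored):
        exposed = entering - c / 2
        cumulative *= 1 - d / exposed  # survive this interval given survival so far
        rows.append((entering, d, c, exposed, cumulative))
        entering -= d + c  # cases carried into the next interval
    return rows, entering

# First two intervals from the passage: (0 found jobs, 9 censored), then (2, 14).
rows, next_entering = actuarial_life_table(100, terminal=[0, 2], censored=[9, 14])
for row in rows:
    print("entering=%d terminal=%d censored=%d exposed=%.1f surviving=%.4f" % row)
print("entering third interval:", next_entering)
```

The bookkeeping matches the passage: 100 cases enter, 91 enter the second interval, and 75 enter the third.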
“
Very innovative companies, such as Twitter, know how important this type of cross-pollination is to creativity in their businesses, and they make an effort to hire people with unusual skills, knowing that diversity of thinking will certainly influence the development of their products. According to Elizabeth Weil, the head of organizational culture at Twitter, a random sampling of people at the company would reveal former rock stars, a Rubik’s cube champion, a world-class cyclist, and a professional juggler. She said that the hiring practices at Twitter guarantee that all employees are bright and skilled at their jobs, but are also interested in other unrelated pursuits. Knowing this results in random conversations between employees in the elevator, at lunch, and in the hallways. Shared interests surface, and the web of people becomes even more intertwined. These unplanned conversations often lead to fascinating new ideas. Elizabeth is a great example herself; she is a top ultramarathon runner, professional designer, and former venture capitalist. Although these skills aren’t required in her day-to-day work at Twitter, they naturally influence the ideas she generates. Her artistic talents have deeply influenced the ways Elizabeth builds the culture at Twitter. For instance, whenever a new employee starts, she designs and prints a beautiful handmade welcome card on her 1923 antique letterpress.
”
”
Tina Seelig (inGenius: A Crash Course on Creativity)
“
it is not uncommon for experts in DNA analysis to testify at a criminal trial that a DNA sample taken from a crime scene matches that taken from a suspect. How certain are such matches? When DNA evidence was first introduced, a number of experts testified that false positives are impossible in DNA testing. Today DNA experts regularly testify that the odds of a random person’s matching the crime sample are less than 1 in 1 million or 1 in 1 billion. With those odds one could hardly blame a juror for thinking, throw away the key. But there is another statistic that is often not presented to the jury, one having to do with the fact that labs make errors, for instance, in collecting or handling a sample, by accidentally mixing or swapping samples, or by misinterpreting or incorrectly reporting results. Each of these errors is rare but not nearly as rare as a random match. The Philadelphia City Crime Laboratory, for instance, admitted that it had swapped the reference sample of the defendant and the victim in a rape case, and a testing firm called Cellmark Diagnostics admitted a similar error.20 Unfortunately, the power of statistics relating to DNA presented in court is such that in Oklahoma a court sentenced a man named Timothy Durham to more than 3,100 years in prison even though eleven witnesses had placed him in another state at the time of the crime. It turned out that in the initial analysis the lab had failed to completely separate the DNA of the rapist and that of the victim in the fluid they tested, and the combination of the victim’s and the rapist’s DNA produced a positive result when compared with Durham’s. A later retest turned up the error, and Durham was released after spending nearly four years in prison.21 Estimates of the error rate due to human causes vary, but many experts put it at around 1 percent. However, since the error rate of many labs has never been measured, courts often do not allow testimony on this overall statistic. 
Even if courts did allow testimony regarding false positives, how would jurors assess it? Most jurors assume that given the two types of error—the 1 in 1 billion accidental match and the 1 in 100 lab-error match—the overall error rate must be somewhere in between, say 1 in 500 million, which is still for most jurors beyond a reasonable doubt. But employing the laws of probability, we find a much different answer. The way to think of it is this: Since both errors are very unlikely, we can ignore the possibility that there is both an accidental match and a lab error. Therefore, we seek the probability that one error or the other occurred. That is given by our sum rule: it is the probability of a lab error (1 in 100) + the probability of an accidental match (1 in 1 billion). Since the latter is 10 million times smaller than the former, to a very good approximation the chance of either error is the same as the chance of the more probable error—that is, the chances are 1 in 100. Given both possible causes, therefore, we should ignore the fancy expert testimony about the odds of accidental matches and focus instead on the much higher laboratory error rate—the very data courts often do not allow attorneys to present! And so the oft-repeated claims of DNA infallibility are exaggerated.
”
”
Leonard Mlodinow (The Drunkard's Walk: How Randomness Rules Our Lives)
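Mlodinow's sum-rule argument is plain arithmetic. The sketch below uses the two rates from the passage and, as an added assumption, treats the two errors as independent so that inclusion-exclusion applies; the product term it subtracts is exactly the "both errors" case the passage sets aside as negligible.

```python
p_lab_error = 1 / 100                # lab-error rate cited in the passage
p_random_match = 1 / 1_000_000_000   # random-match probability cited in the passage

# Probability that at least one of the two errors occurs, assuming independence
# (inclusion-exclusion). The product term is negligible at these rates.
p_either = p_lab_error + p_random_match - p_lab_error * p_random_match

print(f"P(either error) = {p_either:.10f}  (about 1 in {1 / p_either:,.0f})")
print("jurors' 'in between' guess: 1 in 500,000,000")
```

The combined rate is dominated by the larger term, so it sits at roughly 1 in 100, five million times more likely than the jurors' intuitive "1 in 500 million".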
“
2012 My Response to Andy Dearest Andy, It would be splendid to revisit the canal city and reminisce of our time at the Falcon’s Den – especially that fateful evening when I ended up at Dr. Fahrib’s private hospital. I have no idea why I blacked out. I recalled the vivid dream I experienced while comatose. You and Zac were in such a panic, worried if I’d ever wake. LOL! The final thing I remember in ARGOS before I collapsed was the unpleasant smell within the ‘bathroom’. Quick-witted Zac ushered me to the open courtyard for air. We weren’t quick enough; I fainted just as we reached the doorway. I was out like a light. I remember you guys trying to revive me. I didn’t come around. You carried me back to the Falcon’s Den hurriedly. Thank Allah, the good doctor was home. He was already asleep, but you woke him for help. I faintly recall inhaling some kind of smelling salt. It didn’t help. Fahrib had to rush me to his private clinic for urgent care. I remained unconscious until the first ray of light the following day. When I finally came around, I was hooked to an IV. The doctor couldn’t diagnose the problem until he took a sample of my urine and discovered LSD in my system. The ARGOS pineapple juice had tasted strange. I suspect the barman had added several drops of the hallucinogenic drug to my drink. I wouldn’t be surprised if he did this to his customers randomly. But why didn’t the rest of our group fall ill? Have you any idea…?
”
”
Young (Turpitude (A Harem Boy's Saga Book 4))
“
I suggest this passage from the German “philosopher” (this passage was detected, translated, and reviled by Karl Popper): Sound is the change in the specific condition of segregation of the material parts, and in the negation of this condition; merely an abstract or an ideal ideality, as it were, of that specification. But this change, accordingly, is itself immediately the negation of the material specific subsistence; which is, therefore, real ideality of specific gravity and cohesion, i.e.—heat. The heating up of sounding bodies, just as of beaten and or rubbed ones, is the appearance of heat, originating conceptually together with sound. Even a Monte Carlo engine could not sound as random as the great philosophical master thinker (it would take plenty of sample runs to get the mixture of “heat” and “sound”). People call that philosophy and frequently finance it with taxpayer subsidies!
”
”
Nassim Nicholas Taleb (Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets (Incerto, #1))
“
A pair of researchers named Kristen Schilt and Matthew Wiswall wanted to systematically examine what happens to the salaries of people who switched gender as adults. It is not quite the experiment we proposed above—after all, the set of folks who switch gender aren’t exactly a random sample, nor are they the typical woman or man before or after—but still, the results are intriguing. Schilt and Wiswall found that women who become men earn slightly more money after their gender transitions, while men who become women make, on average, nearly one-third less than their previous wage.
”
”
Steven D. Levitt (SuperFreakonomics: Global Cooling, Patriotic Prostitutes And Why Suicide Bombers Should Buy Life Insurance)
“
Consider a 2012 study, led by psychologists Wilhelm Hofmann and Roy Baumeister, that outfitted 205 adults with beepers that activated at randomly selected times (this is the experience sampling method discussed in Part 1). When the beeper sounded, the subject was asked to pause for a moment to reflect on desires that he or she was currently feeling or had felt in the last thirty minutes, and then answer a set of questions about these desires. After a week, the researchers had gathered more than 7,500 samples. Here’s the short version of what they found: People fight desires all day long. As Baumeister summarized in his subsequent book, Willpower (co-authored with the science writer John Tierney): “Desire turned out to be the norm, not the exception.” The five most common desires these subjects fought include, not surprisingly, eating, sleeping, and sex. But the top five list also included desires for “taking a break from [hard] work… checking e-mail and social networking sites, surfing the web, listening to music, or watching television.” The lure of the Internet and television proved especially strong: The subjects succeeded in resisting these particularly addictive distractions only around half the time. These results are bad news for this rule’s goal of helping you cultivate a deep work habit. They tell us that you can expect to be bombarded with the desire to do anything but work deeply throughout the day,
”
”
Cal Newport (Deep Work: Rules for Focused Success in a Distracted World)
“
The challenge was to ensure that the device - which worked by sampling “white noise” generated by a neon gas discharge tube - was genuinely random. The key (without going into too much detail) was to use two such devices for each number that needed to be generated, subtracting the output from the second from the output from the first, in case one of them went wrong. This meant - we thought - that 32 devices were needed to generate the requisite 16-figure number. Then our youngest lab boy suggested that we could achieve the same result with just 16 devices, pairing the first with the second, the second with the third, the third with the fourth, and so on, instead of having 16 discrete pairs. At a stroke, the cost was halved. The genius of Tommy Flowers - or part of it - was to run his team in a way that allowed such suggestions to be both made and heard
”
”
Stephanie Shirley (LET IT GO : The Entrepreneur Turned Ardent Philanthropist)
“
Who comes to writers’ conferences?” you ask. A random sample of twenty students will contain six recent divorcées, three wives in middle life, five schoolteachers of no particular age or sex, two foxy grandmas, one sweet old widower with true tales to tell about railroading in Idaho, one real writer, one not merely angry but absolutely furious young man, and one physician with forty years’ worth of privileged information that he wants to sell to the movies for a blue million.
”
”
Kurt Vonnegut Jr. (Wampeters, Foma & Granfalloons: (Opinions))
“
The auditors reported a scene of pure chaos. “Drugs were given to the wrong babies, documents were altered, and there was infrequent follow-up, even though one third of the mothers were marked ‘abnormal’ in their charts at discharge. The infants who did receive follow-up care were, in many cases, small and alarmingly underweight. ‘It was thought to be likely that some, perhaps many, of these infants had serious health problems.’”16 When Westat chose a random sample of forty-three of those infants to examine, all of them had “adverse events” twelve months after the study terminated. Only eleven of them were HIV positive.17 When Westat confronted Dr. Jackson’s researchers with study discrepancies, they admitted that they routinely applied more lenient standards for their Black Ugandan subjects than FDA rules required for US safety studies.18 The PIs admitted to systematically downgrading standardized definitions of serious adverse events to adapt to “local standards.” Injuries that researchers would score as “serious” or “deadly” if they happened to white Americans became “minor” injuries when Black Africans were the victims. Under their relaxed rubric, clinical trials staff scored “life-threatening” injuries as “not serious.” When they reported them at all, NIAID classified mortalities among its African volunteers as “serious adverse events,” rather than “death.” NIAID’s Ugandan team had entirely neglected to report thousands of adverse events and at least fourteen deaths.19
”
”
Robert F. Kennedy Jr. (The Real Anthony Fauci: Bill Gates, Big Pharma, and the Global War on Democracy and Public Health)
“
Mathematicians of probability give that a fancy name: ergodicity. It means, roughly, that (under certain conditions) very long sample paths would end up resembling each other. The properties of a very, very long sample path would be similar to the Monte Carlo properties of an average of shorter ones. The janitor in Chapter 1 who won the lottery, if he lived one thousand years, cannot be expected to win more lotteries. Those who were unlucky in life in spite of their skills would eventually rise. The lucky fool might have benefited from some luck in life; over the longer run he would slowly converge to the state of a less-lucky idiot. Each one would revert to his long-term properties.
”
”
Nassim Nicholas Taleb (Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets (Incerto, #1))
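Ergodicity can be illustrated with a toy Markov chain. The two-state chain and its transition probabilities below are hypothetical choices made for this sketch: the share of time one very long path spends in state 1 is compared with the Monte Carlo average over many shorter paths, and both should settle near the chain's stationary share P_ENTER / (P_ENTER + P_LEAVE) = 0.25.

```python
import random

# Hypothetical two-state chain: 0 = "ordinary", 1 = "lucky streak".
P_ENTER = 0.1  # chance of moving 0 -> 1 on a step
P_LEAVE = 0.3  # chance of moving 1 -> 0 on a step
STATIONARY = P_ENTER / (P_ENTER + P_LEAVE)  # long-run share of time in state 1

def fraction_in_state_1(steps, rng, state=0):
    """Time average: share of steps a single path spends in state 1."""
    visits = 0
    for _ in range(steps):
        if state == 0:
            state = 1 if rng.random() < P_ENTER else 0
        else:
            state = 0 if rng.random() < P_LEAVE else 1
        visits += state
    return visits / steps

# One very long path ...
long_path = fraction_in_state_1(200_000, random.Random(1))

# ... versus the average over many shorter, independent paths.
rng = random.Random(2)
short_paths = [fraction_in_state_1(1_000, rng) for _ in range(200)]
ensemble = sum(short_paths) / len(short_paths)

print(f"stationary share : {STATIONARY:.4f}")
print(f"one long path    : {long_path:.4f}")
print(f"average of short : {ensemble:.4f}")
```

Any single short path can wander well away from 0.25, much as a lucky fool can look skilled for a while, but the long path and the ensemble average agree.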
“
Data could be misleading sometimes, too. In the early days, Dropbox was growing so fast that it was often hard to do analyses on what types of content people were putting in their folders. One of the simplest analyses was to randomly sample snapshots of folders, and count the file extensions. Perhaps it is not surprising to some that the most popular files were photos—lots and lots of photos, especially on mobile. Combined with the natural virality of this media type, Dropbox embarked on a road map of photos-related features, culminating in the launch of Carousel, a separate app to let consumers manage and view their photos on Dropbox. It did okay, but underperformed relative to expectations and was eventually shut down so that the company could invest in what is now its core focus: businesses.
”
”
Andrew Chen (The Cold Start Problem: How to Start and Scale Network Effects)
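The quick analysis described, randomly sampling what people store and counting file extensions, might look like this. The snapshot is invented, and for simplicity the sketch samples individual files from one pooled listing rather than sampling whole folder snapshots as the passage describes. Note that such counts say how many files of each type there are, not how much users value them, which is one way this kind of data can mislead.

```python
import random
from collections import Counter
from pathlib import PurePosixPath

def extension_counts(paths, sample_size, seed=0):
    """Count file extensions in a simple random sample of file paths."""
    rng = random.Random(seed)
    sample = rng.sample(paths, sample_size)
    return Counter(PurePosixPath(p).suffix.lower() or "(none)" for p in sample)

# Hypothetical pooled listing; real data would come from many users' folders.
snapshot = (
    [f"photos/img_{i}.jpg" for i in range(600)]
    + [f"docs/report_{i}.docx" for i in range(250)]
    + [f"docs/sheet_{i}.xlsx" for i in range(100)]
    + [f"misc/notes_{i}" for i in range(50)]
)

counts = extension_counts(snapshot, sample_size=200)
print(counts.most_common())
```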
“
The Monte Carlo tree search method is naturally suited to non-deterministic settings such as card games or backgammon. Minimax trees are not well suited to non-deterministic settings because of the inability to predict the opponent’s moves while building the tree. On the other hand, Monte Carlo tree search is naturally suited to handling such settings, since the desirability of moves is always evaluated in an expected sense. The randomness in the game can be naturally combined with the randomness in move sampling in order to learn the expected outcomes from each choice of move.
”
”
Charu C. Aggarwal (Artificial Intelligence: A Textbook)
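Evaluating moves "in an expected sense" can be shown with flat Monte Carlo evaluation: each candidate first move is scored by the average outcome of many random playouts. This is a deliberately simpler technique than full Monte Carlo tree search (there is no tree and no UCT-style selection), and the two-turn dice game is hypothetical. As the passage describes, the randomness of the die and the randomness of the sampled follow-up moves both feed into the same average.

```python
import random

GOAL, TURNS = 6, 2  # hypothetical toy game: reach 6 points within two turns

def apply_move(score, move, rng):
    """'safe' adds 2; 'risky' adds a die roll, but rolling a 1 busts the score to 0."""
    if move == "safe":
        return score + 2
    roll = rng.randint(1, 6)
    return 0 if roll == 1 else score + roll

def rollout(score, turns_left, rng):
    """Play out the remaining turns with uniformly random moves; 1 = win, 0 = loss."""
    for _ in range(turns_left):
        score = apply_move(score, rng.choice(["safe", "risky"]), rng)
    return 1 if score >= GOAL else 0

def evaluate_first_moves(n_rollouts=20_000, seed=0):
    """Estimate the win rate of each possible first move by random playouts."""
    rng = random.Random(seed)
    estimates = {}
    for move in ("safe", "risky"):
        wins = sum(
            rollout(apply_move(0, move, rng), TURNS - 1, rng)
            for _ in range(n_rollouts)
        )
        estimates[move] = wins / n_rollouts
    return estimates

estimates = evaluate_first_moves()
best = max(estimates, key=estimates.get)
print(estimates, "->", best)
```

Under this random playout policy the exact win probabilities are 0.25 for "safe" and 41/72 (about 0.57) for "risky", so the estimates should land near those values and favour the risky opening.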
“
I visualized opening accounts as planting acorns in the hope of getting a crop of oak trees. Only these were strange acorns. They could lie dormant for months or years, perhaps forever; but once in a while, at random, a mighty tree of money would explode out of the ground. Was this “farm” worth operating? Our hundreds of accounts took capital away from other investments. Paid low interest rates on our passbooks and certificates of deposit (CDs), we sacrificed an expected 10 to 15 percent differential to maintain our accounts. We also had expenses and the so-called opportunity cost. Fortunately, Judy McCoy in my office managed the project competently and efficiently. The harvest from our crop of S&L accounts sometimes netted a million dollars in a year. The game has slowly wound down over the last two decades. Mutual S&Ls have converted, leaving fewer opportunities. The gain has also diminished because more people have opened accounts, thus spreading the profits among more players. Investors also have posted larger balances in CDs, savings accounts, and checking accounts in the hope of being allocated more shares in a future conversion. Tying up more capital increases the cost to stay in the game. Our profits have been dwindling. Currently we’re keeping our old accounts but are spending less effort in trying to open new ones. Even so, a quarter of a century after we began opening accounts, 2014 was a good year.
”
”
Edward O. Thorp (A Man for All Markets: From Las Vegas to Wall Street, How I Beat the Dealer and the Market)
“
The ‘quantitative revolution’ in geography required the discipline to adopt an explicitly scientific approach, including numerical and statistical methods, and mathematical modelling, so ‘numeracy’ became another necessary skill. Its immediate impact was greatest on human geography as physical geographers were already using these methods. A new lexicon encompassing the language of statistics and its array of techniques entered geography as a whole. Terms such as random sampling, correlation, regression, tests of statistical significance, probability, multivariate analysis, and simulation became part both of research and undergraduate teaching. Correlation and regression are procedures to measure the strength and form, respectively, of the relationships between two or more sets of variables. Significance tests measure the confidence that can be placed in those relationships. Multivariate methods enable the analysis of many variables or factors simultaneously – an appropriate approach for many complex geographical data sets. Simulation is often linked to probability and is a set of techniques capable of extrapolating or projecting future trends.
”
”
John A. Matthews (Geography: A Very Short Introduction)
“
When teaching various psychology courses over time, I’ve conducted informal polls of my students regarding what they would prefer in a situation similar to Alvin Ford’s. About two-thirds to three-fourths have preferred the delusion, at least when queried on the fly. Although my classes have not exactly comprised a random sample of the population at large, their position corroborates my hunch that most Americans prefer the delusion over the truth.
”
”
David Landers (Optimistic Nihilism: A Psychologist's Personal Story & (Biased) Professional Appraisal of Shedding Religion)
“
True randomness doesn’t play favorites. It’s just as likely to give you fifty heads in a row as an equal split of heads and tails. Then again we don’t have a truly random sample, not with us holding ten out of fifty-two cards. Whatever he picks up won’t be any of these. I bite my lip, running through numbers in my head, determined to make use of what little data I have, running simulations in these precious few seconds. “God, you’re incredible,” he says, sounding reverent. Only then do I realize I’d been lost in thought. And he’s staring at me, intent and for once serious. Brennan had looked at me that way and called me pretty. Damon looked at me like I was some other creature, more than a human—a goddess.
”
”
Skye Warren (The King (Masterpiece Duet, #1))
“
One reason for this “dirty little secret” is the positive publication bias described in Chapter 7. If researchers and medical journals pay attention to positive findings and ignore negative findings, then they may well publish the one study that finds a drug effective and ignore the nineteen in which it has no effect. Some clinical trials may also have small samples (such as for rare diseases), which magnifies the chances that random variation in the data will get more attention than it deserves. On top of that, researchers may have some conscious or unconscious bias, either because of a strongly held prior belief or because a positive finding would be better for their career. (No one ever gets rich or famous by proving what doesn’t cure cancer.)
”
”
Charles Wheelan (Naked Statistics: Stripping the Dread from the Data)
“
Per your mission instructions, you can reject the null hypothesis that this bus contains a random sample of 60 Changing Lives study participants at the .05 significance level. This means (1) the mean weight on the bus falls into a range that we would expect to observe only 5 times in 100 if the null hypothesis were true and this were really a bus full of Changing Lives passengers; (2) you can reject the null hypothesis at the .05 significance level; and (3) on average, 95 times out of 100 you will have correctly rejected the null hypothesis, and 5 times out of 100 you will be wrong, meaning that you have concluded that this is not a bus of Changing Lives participants, when in fact it is. This sample of Changing Lives folks just happens to have a mean weight that is particularly high or low relative to the mean for the study participants overall.
”
”
Charles Wheelan (Naked Statistics: Stripping the Dread from the Data)
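The bus example is a one-sample z-test. The numbers below (a study mean of 162 pounds, a standard deviation of 36 pounds, and a bus of 60 passengers averaging 180 pounds) are hypothetical placeholders, and the sketch assumes the population standard deviation is known so that the normal distribution applies.

```python
import math

def one_sample_z_test(sample_mean, pop_mean, pop_sd, n, critical_z=1.96):
    """Two-sided z-test at the .05 level when the population sd is known."""
    standard_error = pop_sd / math.sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p_value, abs(z) > critical_z

# Hypothetical numbers: study mean 162 lb, sd 36 lb, bus of 60 averaging 180 lb.
z, p, reject = one_sample_z_test(sample_mean=180, pop_mean=162, pop_sd=36, n=60)
print(f"z = {z:.2f}, p = {p:.4f}, reject null at .05: {reject}")
```

The cutoff of 1.96 is the z value that leaves 5 percent of a normal distribution in the two tails combined.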
“
Here is a quick intuitive example. Suppose your null hypothesis is that male professional basketball players have the same mean height as the rest of the adult male population. You randomly select a sample of 50 professional basketball players and a sample of 50 men who do not play professional basketball. Suppose the mean height of your basketball sample is 6 feet 7 inches, and the mean height of the non–basketball players is 5 feet 10 inches (a 9-inch difference). What is the probability of observing such a large difference in mean height between the two samples if in fact there is no difference in average height between professional basketball players and all other men in the overall population? The nontechnical answer: very, very, very low.* The autism research paper has the same basic methodology
”
”
Charles Wheelan (Naked Statistics: Stripping the Dread from the Data)
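The basketball comparison can be made concrete with a two-sample z statistic for a difference in means. The means and sample sizes come from the passage; the 3-inch standard deviation for each group is a hypothetical assumption, so the exact p-value is illustrative only.

```python
import math

def two_sample_z(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """z statistic and two-sided p-value for a difference in two sample means."""
    standard_error = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    z = (mean_a - mean_b) / standard_error
    return z, math.erfc(abs(z) / math.sqrt(2))

# Heights in inches from the passage: 6'7" = 79, 5'10" = 70.
# The 3-inch standard deviations are a hypothetical assumption.
z, p = two_sample_z(79, 70, sd_a=3, sd_b=3, n_a=50, n_b=50)
print(f"z = {z:.1f}, two-sided p = {p:.1e}")
```

A 9-inch gap against a standard error of 0.6 inches puts the difference about 15 standard errors from zero, which is what "very, very, very low" means in numbers.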
“
With sufficiently much English text we can get pretty good estimates not just for probabilities of single letters or pairs of letters (2-grams), but also for longer runs of letters. And if we generate “random words” with progressively longer n-gram probabilities, we see that they get progressively “more realistic”: But let’s now assume—more or less as ChatGPT does—that we’re dealing with whole words, not letters. There are about 40,000 reasonably commonly used words in English. And by looking at a large corpus of English text (say a few million books, with altogether a few hundred billion words), we can get an estimate of how common each word is. And using this we can start generating “sentences”, in which each word is independently picked at random, with the same probability that it appears in the corpus. Here’s a sample of what we get: Not surprisingly, this is nonsense. So how can we do better? Just like with letters, we can start taking into account not just probabilities for single words but probabilities for pairs or longer n-grams of words.
”
”
Stephen Wolfram (What Is ChatGPT Doing... and Why Does It Work?)
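A toy version of the word-level generation described above: first each word is drawn independently with its corpus frequency, then each word is drawn conditioned on the previous one (a 2-gram model). The miniature corpus is hypothetical and far too small to be realistic; it only shows the mechanics.

```python
import random
from collections import Counter, defaultdict

# Tiny hypothetical corpus; the passage has in mind hundreds of billions of words.
corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat saw the dog . the dog saw the cat ."
).split()

def unigram_sentence(words, length, rng):
    """Pick each word independently, weighted by its corpus frequency."""
    counts = Counter(words)
    vocab = list(counts)
    return rng.choices(vocab, weights=[counts[w] for w in vocab], k=length)

def bigram_sentence(words, length, rng, start="the"):
    """Pick each word conditioned on the previous word (a 2-gram model)."""
    followers = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        followers[prev][nxt] += 1
    out = [start]
    while len(out) < length:
        options = followers[out[-1]]
        if not options:  # dead end: no observed successor
            break
        candidates = list(options)
        out.append(rng.choices(candidates, weights=[options[w] for w in candidates])[0])
    return out

rng = random.Random(0)
print("1-gram:", " ".join(unigram_sentence(corpus, 10, rng)))
print("2-gram:", " ".join(bigram_sentence(corpus, 10, rng)))
```

Every adjacent pair in the 2-gram output has been seen in the corpus, which is why it reads as more "realistic" than the independent draws.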
“
If there are 60,000 blue marbles and 40,000 red marbles in a giant urn, then the most likely composition of a sample of 100 marbles drawn randomly from the urn would be 60 blue marbles and 40 red marbles.
”
”
Charles Wheelan (Naked Statistics: Stripping the Dread from the Data)
“
some might have 62 blue marbles and 38 red marbles, or 58 blue and 42 red. But the chances of drawing any random sample that deviates hugely from the composition of marbles in the urn are very, very low.
”
”
Charles Wheelan (Naked Statistics: Stripping the Dread from the Data)
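Wheelan's urn can be checked exactly. Drawing without replacement from a finite urn follows the hypergeometric distribution; the sketch below (the function name `hypergeom_pmf` is hypothetical) computes the probability of every possible count of blue marbles in a sample of 100, confirms that 60 blue is the single most likely outcome, and measures how little probability lies far from it.

```python
from math import comb


def hypergeom_pmf(k, blue=60_000, red=40_000, n=100):
    """P(exactly k blue marbles in a random sample of n drawn without replacement)."""
    return comb(blue, k) * comb(red, n - k) / comb(blue + red, n)


pmf = {k: hypergeom_pmf(k) for k in range(101)}
mode = max(pmf, key=pmf.get)                       # most likely blue count
within_5 = sum(pmf[k] for k in range(55, 66))      # 55 to 65 blue marbles
far_off = sum(p for k, p in pmf.items() if abs(k - 60) >= 25)

print(mode, round(pmf[mode], 3), round(within_5, 3), far_off)
```

Even the most likely outcome, exactly 60 blue, has a probability of only about 8%; the point of the quote is that samples cluster tightly around 60/40, while a sample that deviates hugely (here, by 25 or more marbles) is vanishingly rare.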
“
Why is it that able, public-spirited people produce such different results according to whether they operate in the political or the economic market? Why is it that if a random sample of the people who read this essay and are not at present in Washington were to replace those who are in Washington, our policies would very likely not be improved? That is the real puzzle for me.
”
”
Milton Friedman (Why Government Is the Problem (Essays in Public Policy Book 39))
“
sample size. Sample sizes can be calculated not only for randomized trials but
”
”
Leon Gordis (Epidemiology)
“
So for a survey of 1,000 people (the industry standard), the margin of error is generally quoted as ± 3%: if 400 of them said they preferred coffee, and 600 of them said they preferred tea, then you could roughly estimate the underlying percentage of people in the population who prefer coffee as 40 ± 3%, or between 37% and 43%. Of course, this is only accurate if the polling company really did take a random sample, and everyone replied, and they all had an opinion either way and they all told the truth. So although we can calculate margins of error, we must remember that they only hold if our assumptions are roughly correct. But can we rely on these assumptions?
”
”
David Spiegelhalter (The Art of Statistics: Learning from Data)
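The ±3% figure can be reproduced with the usual normal-approximation formula for a sample proportion, z·sqrt(p̂(1 − p̂)/n). The 95% confidence level (z = 1.96) is an assumption here, since the quote does not state one, and `margin_of_error` is a hypothetical name; as Spiegelhalter stresses, the result only holds if the sample really is random and everyone answers truthfully.

```python
import math


def margin_of_error(p_hat, n, z=1.96):
    """Approximate margin of error for a sample proportion (normal approximation).

    z = 1.96 corresponds to 95% confidence. Valid only for a genuine random
    sample with full, truthful response.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


n = 1000
p_hat = 400 / n                    # 400 of 1,000 prefer coffee
moe = margin_of_error(p_hat, n)
print(f"{p_hat:.0%} +/- {moe:.1%}")  # prints 40% +/- 3.0%
```

At p̂ = 0.5, the worst case, the same formula gives about 3.1%, which is why roughly ±3% is quoted for a poll of 1,000 whatever the result turns out to be.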
“
If a random sampling of one thousand American Christians were taken today, the majority would define faith as belief in the existence of God. In earlier times it did not take faith to believe that God existed—almost everybody took that for granted. Rather, faith had to do with one’s relationship to God—whether one trusted in God. The difference between faith as “belief in something that may or may not exist” and faith as “trusting in God” is enormous. The first is a matter of the head, the second a matter of the heart. The first can leave us unchanged; the second intrinsically brings change.
”
”
Brennan Manning (The Ragamuffin Gospel: Good News for the Bedraggled, Beat-Up, and Burnt Out)
“
• If it’s really that important, it’s something you can define. If it’s something you think exists at all, it’s something you’ve already observed somehow.
• If it’s something important and something uncertain, you have a cost of being wrong and a chance of being wrong.
• You can quantify your current uncertainty with calibrated estimates.
• You can compute the value of additional information by knowing the “threshold” of the measurement where it begins to make a difference compared to your existing uncertainty.
• Once you know what it’s worth to measure something, you can put the measurement effort in context and decide on the effort it should take.
• Knowing just a few methods for random sampling, controlled experiments, or even merely improving on the judgments of experts can lead to a significant reduction in uncertainty.
”
”
Douglas W. Hubbard (How to Measure Anything: Finding the Value of Intangibles in Business)
“
Social Justice approaches that focus solely on group identity and neglect individuality and universality are doomed to fail for the simple reasons that people are individuals and share a common human nature. Identity politics is not a path to empowerment. There is no “unique voice of color” or of women or of trans, gay, disabled, or fat people. Even a relatively small random sample drawn from any of those groups will reveal widely varying individual views. This does not negate the likelihood that prejudice still exists and that the people who experience it are the most likely to be aware of it. We still need to “listen and consider,” but we need to listen to and consider a variety of experiences and views from members of oppressed groups, not just a single one that has been arbitrarily labeled “authentic” because it represents the view essentialized by Theory.
”
”
Helen Pluckrose (Cynical Theories: How Activist Scholarship Made Everything about Race, Gender, and Identity—and Why This Harms Everybody)
“
When both males and females were considered as a single group, the impact of having a roommate classified as a frequent or occasional precollege drinker was to reduce a student’s end-of-year GPA by more than a tenth of a point on a four-point scale. But the effect was dramatically larger for males than for females. Relative to males whose roommates were nondrinkers, those whose roommates were frequent precollege drinkers had end-of-year GPAs that were 0.28 lower; for those whose roommates were occasional drinkers, the corresponding deficit was almost as great, 0.26 lower. These effects are comparable to the effect of a student’s own high school GPA being lower by half a point, or to having scored fifty points lower on the Scholastic Aptitude Test. By far the most dramatic impact observed in this study was for males who were themselves frequent precollege drinkers and were randomly assigned to a roommate who was also a frequent precollege drinker. Relative to the overall sample GPA, these males had end-of-year GPAs that were almost a full point lower.
”
”
Robert H. Frank (Under the Influence: Putting Peer Pressure to Work)
“
The story of the Eridania Basin and the possible scientific promise it holds was pieced together by using the results from different instruments on different spacecraft over many years, spanning several scientific disciplines: geology, chemistry, spectroscopy, laser altitude ranging and photography. The estimate of the age of the surface required the Apollo lunar rock samples from 50 years ago, and radiometric dating techniques which require an understanding of nuclear physics. The estimate of the age of the surface requires a model of the entire Solar System in order to interpret the measured crater density, which illustrates another important idea. The Solar System is a system; no planet is an island; no planet can be understood in isolation, just as the structure of any one living thing on Earth cannot be understood in isolation. Organisms are a product of evolution by natural selection, the interaction of the expression of genetic mutations and mixing with other organisms, in the ecosystem and the wider environment. The planets formed in a chaotic maelstrom from motions as random as the impact of a cosmic ray on a strand of primordial DNA, and whatever worlds emerged from the chaos have had their histories shaped profoundly by their mutual interactions throughout their evolution; the Late Heavy Bombardment is a beautiful example.
”
”
Brian Cox (The Planets)
“
many of the alien plants that have succeeded in North America are not a random sample of all plants that evolved elsewhere, but rather are a subset that were imported specifically because of their unpalatability to insects
”
”
Douglas Tallamy
“
The word sample stresses that one sees only one realization among a collection of possible ones. Now, a sample path can be either deterministic or random, which brings the next distinction.
”
”
Nassim Nicholas Taleb (Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets (Incerto, #1))
“
The Mysterious Letter
You get an anonymous letter on January 2nd informing you that the market will go up during the month. It proves to be true, but you disregard it owing to the well-known January effect (stocks have gone up historically during January). Then you receive another one on Feb 1st telling you that the market will go down. Again, it proves to be true. Then you get another letter on March 1st: same story. By July you are intrigued by the prescience of the anonymous person until you are asked to invest in a special offshore fund. You pour all your savings into it. Two months later, your money is gone. You go spill your tears on your neighbor's shoulder and he tells you that he remembers that he received two such mysterious letters. But the mailings stopped at the second letter. He recalls that the first one was correct in its prediction, the other incorrect. What happened? The trick is as follows. The con operator pulls 10,000 names out of a phone book. He mails a bullish letter to one half of the sample, and a bearish one to the other half. The following month he selects the names of the persons to whom he mailed the letter whose prediction turned out to be right, that is, 5000 names. The next month he does the same with the remaining 2500 names, until the list narrows down to 500 people. Of these there will be 200 victims. An investment in a few thousand dollars’ worth of postage stamps will turn into several million.
”
”
Nassim Nicholas Taleb (Fooled by Randomness)
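The con rests on simple halving, which a few lines make concrete. Assuming each mailing splits the remaining list exactly in half (the function name `survivors_by_month` is hypothetical), the number of people who have seen only correct predictions falls as shown. Strict halving never lands on exactly 500: four mailings leave 625 and five leave 312, so the figure in the story appears to be a round number. The neighbor, who got one right call and then a wrong one, was in the half dropped after the second mailing.

```python
def survivors_by_month(initial=10_000, months=6):
    """People who have received only correct predictions after each mailing.

    Each month the remaining list is split in half: one half gets a bullish
    letter, the other a bearish one, so exactly one half is always "right".
    """
    remaining = initial
    history = []
    for _ in range(months):
        remaining //= 2  # keep only the half that received the correct call
        history.append(remaining)
    return history


print(survivors_by_month())  # prints [5000, 2500, 1250, 625, 312, 156]
```

No one in the chain has any forecasting skill; the track record seen by the survivors is produced entirely by discarding everyone who saw a wrong call.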
“
Once a person is in the sample, you must pursue that person with relentless dedication to get his or her response. Any substitution violates the randomness of the sample.
”
”
Philip N. Meyer (Precision Journalism: A Reporter's Introduction to Social Science Methods)
“
Selection Bias: basically, that your inferences will be biased if you use a non-random sample and pretend that it’s random.
”
”
Uri Bram (Thinking Statistically)
“
People are in flow relatively rarely in daily life. Sampling people’s moods at random reveals that most of the time people are either stressed or bored, with only occasional periods of flow; only about 20 percent of people have flow moments at least once a day. Around 15 percent of people never enter a flow state during a typical day.
”
”
Anonymous
“
Ken used obfuscation. He kept random people’s DNA samples—hair, spare tissue, saliva, whatever—in Tupperware containers. Sometimes he found the samples in public restrooms, disgusting as that might sound. One very good spot was at summer camp. Many of the counselors used the disposable razors, which he could easily swipe. Urinals provided pubic hair. Showers gave you more. With
”
”
Harlan Coben (Stay Close)
“
Many natural phenomena, such as the heights of a group of people or the lengths of their middle fingers, fall into a normal distribution. As Galton suggested, two conditions are necessary for observations to be distributed normally, or symmetrically, around their average. First, there must be as large a number of observations as possible. Second, the observations must be independent, like rolls of the dice. Order is impossible to find unless disorder is there first. People can make serious mistakes by sampling data that are not independent. In 1936, a now-defunct magazine called the Literary Digest took a straw vote to predict the outcome of the forthcoming presidential election between Franklin Roosevelt and Alfred Landon. The magazine sent about ten million ballots in the form of returnable postcards to names selected from telephone directories and automobile registrations. A high proportion of the ballots were returned, with 59% favoring Landon and 41% favoring Roosevelt. On Election Day, Landon won 39% of the vote and Roosevelt won 61%. People who had telephones and drove automobiles in the mid-1930s hardly constituted a random sample of American voters: their voting preferences were all conditioned by an environment that the mass of people at that time could not afford.
”
”
Peter L. Bernstein (Against the Gods: The Remarkable Story of Risk)
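The Literary Digest failure is a sampling-frame problem, the kind of selection bias Uri Bram's quote describes: a huge sample drawn from an unrepresentative list stays biased. The simulation below illustrates this with entirely hypothetical numbers (not the 1936 figures): 30% of voters are "listed" (own a telephone or car) and favor candidate L with probability 0.6, unlisted voters favor L with probability 0.3, and we compare a simple random sample with one drawn only from listed voters. `simulate_frame_bias` is a hypothetical name.

```python
import random


def simulate_frame_bias(pop_size=200_000, sample_size=10_000, seed=1):
    """Compare a simple random sample with a sample drawn from a biased frame.

    All rates are hypothetical: 30% of voters are 'listed' and favor
    candidate L with probability 0.6; unlisted voters favor L with
    probability 0.3.
    """
    rng = random.Random(seed)
    population = []
    for _ in range(pop_size):
        listed = rng.random() < 0.30
        votes_l = rng.random() < (0.60 if listed else 0.30)
        population.append((listed, votes_l))

    def share_l(people):
        return sum(votes for _, votes in people) / len(people)

    truth = share_l(population)
    srs = share_l(rng.sample(population, sample_size))   # whole population
    frame = [person for person in population if person[0]]
    biased = share_l(rng.sample(frame, sample_size))     # listed voters only
    return truth, srs, biased


truth, srs, biased = simulate_frame_bias()
print(f"true {truth:.3f}, random sample {srs:.3f}, listed-only {biased:.3f}")
```

Increasing `sample_size` would shrink the random noise in both estimates, but it would not move the listed-only estimate toward the true share; only fixing the frame does that.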