“
The point is, the brain talks to itself, and by talking to itself changes its perceptions. To make a new version of the not-entirely-false model, imagine the first interpreter as a foreign correspondent, reporting from the world. The world in this case means everything out- or inside our bodies, including serotonin levels in the brain. The second interpreter is a news analyst, who writes op-ed pieces. They read each other's work. One needs data, the other needs an overview; they influence each other. They get dialogues going.
INTERPRETER ONE: Pain in the left foot, back of heel.
INTERPRETER TWO: I believe that's because the shoe is too tight.
INTERPRETER ONE: Checked that. Took off the shoe. Foot still hurts.
INTERPRETER TWO: Did you look at it?
INTERPRETER ONE: Looking. It's red.
INTERPRETER TWO: No blood?
INTERPRETER ONE: Nope.
INTERPRETER TWO: Forget about it.
INTERPRETER ONE: Okay.
Mental illness seems to be a communication problem between interpreters one and two.
An exemplary piece of confusion.
INTERPRETER ONE: There's a tiger in the corner.
INTERPRETER TWO: No, that's not a tiger; that's a bureau.
INTERPRETER ONE: It's a tiger, it's a tiger!
INTERPRETER TWO: Don't be ridiculous. Let's go look at it.
Then all the dendrites and neurons and serotonin levels and interpreters collect themselves and trot over to the corner.
If you are not crazy, the second interpreter's assertion, that this is a bureau, will be acceptable to the first interpreter. If you are crazy, the first interpreter's viewpoint, the tiger theory, will prevail.
The trouble here is that the first interpreter actually sees a tiger. The messages sent between neurons are incorrect somehow. The chemicals triggered are the wrong chemicals, or the impulses are going to the wrong connections. Apparently, this happens often, but the second interpreter jumps in to straighten things out.
”
Susanna Kaysen (Girl, Interrupted)
“
good data organized effectively was the most important commodity for any analyst.
”
Tom Clancy (Command Authority)
“
As AI technology matures paired with the continued implementation of Blockchain technology, I think we'll see a lot of analyst, educator and lawyer jobs for example be repositioned into the consulting industry. Consulting will pretty much be a broad category for all jobs involved in the gathering, utilization and sale of actionable data.
”
Hendrith Vanlon Smith Jr.
“
The other buzzword that epitomizes a bias toward substitution is “big data.” Today’s companies have an insatiable appetite for data, mistakenly believing that more data always creates more value. But big data is usually dumb data. Computers can find patterns that elude humans, but they don’t know how to compare patterns from different sources or how to interpret complex behaviors. Actionable insights can only come from a human analyst (or the kind of generalized artificial intelligence that exists only in science fiction).
”
Peter Thiel (Zero to One: Notes on Startups, or How to Build the Future)
“
Big data analyst Seth Stephens-Davidowitz reports in the New York Times that Google searches for “sexless marriage” outnumber searches related to any other marital issue.3
”
Esther Perel (The State of Affairs: Rethinking Infidelity)
“
Excel suffers from an image problem. Most people assume that spreadsheet programs such as Excel are intended for accountants, analysts, financiers, scientists, mathematicians, and other geeky types. Creating a spreadsheet, sorting data, using functions, and making charts seems daunting, and best left to the nerds.
”
Ian Lamont (Excel Basics In 30 Minutes)
“
NSA analyst touches something in the database,
”
Bruce Schneier (Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World)
“
Without hypothesis tests, you risk drawing the wrong conclusions and making bad decisions. That can be costly, either in business dollars or for your reputation as an analyst or scientist.
”
Jim Frost (Hypothesis Testing: An Intuitive Guide for Making Data Driven Decisions)
“
It was a scientific success, bringing back data enough to keep the analysts busy for years… but there was no glib, slick way to explain the full meaning of its observations in layman’s terms. In public relations the mission was a failure; the public, seeking to understand on their own terms, looked for material benefit, treasure, riches, dramatic findings.
”
C.J. Cherryh (Downbelow Station (The Company Wars, #1))
“
Like the story of the steam drill against John Henry, the machine will be victorious because it doesn’t get tired and keeps on going long after a human worker will have dropped dead from exhaustion. The modern-day steam drill is likely to be an AI system, and John Henry is played by the planner, doctor, analyst, stockbroker or accountant who believes that they can process more data and crunch more numbers than the new machine overlords. They can’t.
”
Sean A. Culey (Transition Point: From Steam to the Singularity)
“
Once the NSA embraced the Internet and a drift-net style of data collection, the agency was transformed. The bulk collection of phone and e-mail metadata, both inside the United States and around the world, has now become one of the NSA’s core missions. The agency’s analysts have discovered that they can learn far more about people by tracking their daily digital footprints through their metadata than they could ever learn from actually eavesdropping on their conversations. What’s more, phone and e-mail logging data comes with few legal protections, making it easy for the NSA to access.
”
James Risen (Pay Any Price: Greed, Power, and Endless War)
“
The most-studied evidence, by the greatest number of economists, concerns what is called short-term dependence. This refers to the way price levels or price changes at one moment can influence those shortly afterwards (an hour, a day, or a few years, depending on what you consider "short"). A "momentum" effect is at work, some economists theorize: Once a stock price starts climbing, the odds are slightly in favor of it continuing to climb for a while longer. For instance, in 1991 Campbell Harvey of Duke (he of the CFO study mentioned earlier) studied stock exchanges in sixteen of the world's largest economies. He found that if an index fell in one month, it had slightly greater odds of falling again in the next month, or, if it had risen, greater odds of continuing to rise. Indeed, the data show, the sharper the move in the first, the more likely it is that the price trend will continue into the next month, although at a slower rate. Several other studies have found similar short-term trending in stock prices. When major news about a company hits the wires, the stock will react promptly, but it may keep on moving for the next few days as the news spreads, analysts study it, and more investors start to act upon it.
”
Benoît B. Mandelbrot (The (Mis)Behavior of Markets)
“
Due to the various pragmatic obstacles, it is rare for a mission-critical analysis to be done in the “fully Bayesian” manner, i.e., without the use of tried-and-true frequentist tools at the various stages. Philosophy and beauty aside, the reliability and efficiency of the underlying computations required by the Bayesian framework are the main practical issues. A central technical issue at the heart of this is that it is much easier to do optimization (reliably and efficiently) in high dimensions than it is to do integration in high dimensions. Thus the workhorse machine learning methods, while there are ongoing efforts to adapt them to Bayesian framework, are almost all rooted in frequentist methods. A work-around is to perform MAP inference, which is optimization based.
Most users of Bayesian estimation methods, in practice, are likely to use a mix of Bayesian and frequentist tools. The reverse is also true—frequentist data analysts, even if they stay formally within the frequentist framework, are often influenced by “Bayesian thinking,” referring to “priors” and “posteriors.” The most advisable position is probably to know both paradigms well, in order to make informed judgments about which tools to apply in which situations.
”
Jake VanderPlas (Statistics, Data Mining, and Machine Learning in Astronomy: A Practical Python Guide for the Analysis of Survey Data (Princeton Series in Modern Observational Astronomy, 1))
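The VanderPlas passage above notes that MAP inference sidesteps high-dimensional integration by optimizing instead. A minimal Python sketch of that idea on a toy model (Gaussian likelihood with a Gaussian prior on the mean); the data, prior values, and function names are invented for illustration and are not from the book:

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    data = rng.normal(loc=2.0, scale=1.0, size=50)   # toy observations

    def neg_log_posterior(theta, x, prior_mean=0.0, prior_sd=10.0):
        # Negative log posterior for a Gaussian mean with a Gaussian prior
        # (unit-variance likelihood); additive constants are dropped.
        mu = theta[0]
        nll = 0.5 * np.sum((x - mu) ** 2)                # -log likelihood
        nlp = 0.5 * ((mu - prior_mean) / prior_sd) ** 2  # -log prior
        return nll + nlp

    # MAP estimate: optimize the posterior instead of integrating over it.
    result = minimize(neg_log_posterior, x0=[0.0], args=(data,))
    print("MAP estimate of the mean:", result.x[0])

Because this toy posterior is Gaussian, the MAP estimate coincides with the posterior mean; in realistic high-dimensional models, the point of the work-around is that the optimization stays tractable where the integration does not.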
“
The need for managers with data-analytic skills
The consulting firm McKinsey and Company estimates that “there will be a shortage of talent necessary for organizations to take advantage of big data. By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.” (Manyika, 2011). Why 10 times as many managers and analysts as those with deep analytical skills? Surely data scientists aren’t so difficult to manage that they need 10 managers! The reason is that a business can get leverage from a data science team for making better decisions in multiple areas of the business. However, as McKinsey is pointing out, the managers in those areas need to understand the fundamentals of data science to effectively get that leverage.
”
Foster Provost (Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking)
“
As we’ve seen, one of the most frequently pursued paths for achievement-minded college seniors is to spend several years advancing professionally and getting trained and paid by an investment bank, consulting firm, or law firm. Then, the thought process goes, they can set out to do something else with some exposure and experience under their belts. People are generally not making lifelong commitments to the field in their own minds. They’re “getting some skills” and making some connections before figuring out what they really want to do. I subscribed to a version of this mind-set when I graduated from Brown. In my case, I went to law school thinking I’d practice for a few years (and pay down my law school debt) before lining up another opportunity. It’s clear why this is such an attractive approach. There are some immensely constructive things about spending several years in professional services after graduating from college. Professional service firms are designed to train large groups of recruits annually, and they do so very successfully. After even just a year or two in a high-level bank or consulting firm, you emerge with a set of skills that can be applied in other contexts (financial modeling in Excel if you’re a financial analyst, PowerPoint and data organization and presentation if you’re a consultant, and editing and issue spotting if you’re a lawyer). This is very appealing to most any recent graduate who may not yet feel equipped with practical skills coming right out of college. Even more than the professional skill you gain, if you spend time at a bank, consultancy, or law firm, you will become excellent at producing world-class work. Every model, report, presentation, or contract needs to be sophisticated, well done, and error free, in large part because that’s one of the core value propositions of your organization. The people above you will push you to become more rigorous and disciplined, and your work product will improve across the board as a result. You’ll get used to dressing professionally, preparing for meetings, speaking appropriately, showing up on time, writing official correspondence, and so forth. You will be able to speak the corporate language. You’ll become accustomed to working very long hours doing detail-intensive work. These attributes are transferable to and helpful in many other contexts.
”
Andrew Yang (Smart People Should Build Things: How to Restore Our Culture of Achievement, Build a Path for Entrepreneurs, and Create New Jobs in America)
“
A good metric is a ratio or a rate. Accountants and financial analysts have several ratios they look at to understand, at a glance, the fundamental health of a company. You need some, too.
There are several reasons ratios tend to be the best metrics:
• Ratios are easier to act on. Think about driving a car. Distance traveled is informational. But speed—distance per hour—is something you can act on, because it tells you about your current state, and whether you need to go faster or slower to get to your destination on time.
• Ratios are inherently comparative. If you compare a daily metric to the same metric over a month, you’ll see whether you’re looking at a sudden spike or a long-term trend. In a car, speed is one metric, but speed right now over average speed this hour shows you a lot about whether you’re accelerating or slowing down.
• Ratios are also good for comparing factors that are somehow opposed, or for which there’s an inherent tension. In a car, this might be distance covered divided by traffic tickets. The faster you drive, the more distance you cover—but the more tickets you get. This ratio might suggest whether or not you should be breaking the speed limit.
”
Alistair Croll (Lean Analytics: Use Data to Build a Better Startup Faster)
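The Croll passage above argues that rates, not raw totals, are what you can act on. A tiny Python sketch of the driving example; the trip figures are made up for illustration:

    # Raw totals are informational; ratios are actionable.
    distance_km = 180.0        # distance covered so far (made-up figure)
    elapsed_hours = 2.0
    remaining_km = 120.0
    hours_until_deadline = 1.0

    current_speed = distance_km / elapsed_hours            # km per hour so far
    required_speed = remaining_km / hours_until_deadline   # km per hour needed

    # Comparing two rates tells you what to do next; the raw distance does not.
    print(f"current {current_speed:.0f} km/h vs required {required_speed:.0f} km/h")
    if current_speed < required_speed:
        print("Need to speed up to arrive on time.")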
“
Many aspects of the modern financial system are designed to give an impression of overwhelming urgency: the endless ‘news’ feeds, the constantly changing screens of traders, the office lights blazing late into the night, the young analysts who find themselves required to work thirty hours at a stretch. But very little that happens in the finance sector has genuine need for this constant appearance of excitement and activity. Only its most boring part—the payments system—is an essential utility on whose continuous functioning the modern economy depends. No terrible consequence would follow if the stock market closed for a week (as it did in the wake of 9/11)—or longer, or if a merger were delayed or large investment project postponed for a few weeks, or if an initial public offering happened next month rather than this. The millisecond improvement in data transmission between New York and Chicago has no significance whatever outside the absurd world of computers trading with each other. The tight coupling is simply unnecessary: the perpetual flow of ‘information’ part of a game that traders play which has no wider relevance, the excessive hours worked by many employees a tournament in which individuals compete to display their alpha qualities in return for large prizes. The traditional bank manager’s culture of long lunches and afternoons on the golf course may have yielded more useful information about business than the Bloomberg terminal. Lehman
”
John Kay (Other People's Money: The Real Business of Finance)
“
To understand why it is no longer an option for geneticists to lock arms with anthropologists and imply that any differences among human populations are so modest that they can be ignored, go no further than the “genome bloggers.” Since the genome revolution began, the Internet has been alive with discussion of the papers written about human variation, and some genome bloggers have even become skilled analysts of publicly available data. Compared to most academics, the politics of genome bloggers tend to the right—Razib Khan17 and Dienekes Pontikos18 post on findings of average differences across populations in traits including physical appearance and athletic ability. The Eurogenes blog spills over with sometimes as many as one thousand comments in response to postings on the charged topic of which ancient peoples spread Indo-European languages,19 a highly sensitive issue since as discussed in part II, narratives about the expansion of Indo-European speakers have been used as a basis for building national myths,20 and sometimes have been abused as happened in Nazi Germany.21 The genome bloggers’ political beliefs are fueled partly by the view that when it comes to discussion about biological differences across populations, the academics are not honoring the spirit of scientific truth-seeking. The genome bloggers take pleasure in pointing out contradictions between the politically correct messages academics often give about the indistinguishability of traits across populations and their papers showing that this is not the way the science is heading.
”
David Reich (Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past)
“
Organizations seeking to commercialize open source software realized this, of course, and deliberately incorporated it as part of their market approach. In a 2013 piece on Pando Daily, venture capitalist Danny Rimer quotes then-MySQL CEO Mårten Mickos as saying, “The relational database market is a $9 billion a year market. I want to shrink it to $3 billion and take a third of the market.” While MySQL may not have succeeded in shrinking the market to three billion, it is interesting to note that growing usage of MySQL was concurrent with a declining ability of Oracle to sell new licenses. Which may explain both why Sun valued MySQL at one third of a $3 billion dollar market and why Oracle later acquired Sun and MySQL. The downward price pressure imposed by open source alternatives has become sufficiently visible, in fact, as to begin raising alarm bells among financial analysts. The legacy providers of data management systems have all fallen on hard times over the last year or two, and while many are quick to dismiss legacy vendor revenue shortfalls to macroeconomic issues, we argue that these macroeconomic issues are actually accelerating a technology transition from legacy products to alternative data management systems like Hadoop and NoSQL that typically sell for dimes on the dollar. We believe these macro issues are real, and rather than just causing delays in big deals for the legacy vendors, enterprises are struggling to control costs and are increasingly looking at lower cost solutions as alternatives to traditional products. — Peter Goldmacher Cowen and Company
”
Stephen O’Grady (The Software Paradox: The Rise and Fall of the Commercial Software Market)
“
Many organizations we encounter lament their spreadsheet-driven culture. Every department has its own mechanism for gathering, analyzing, and reporting on its unique data. No consistent “source of truth” exists and data analysts become indispensable because they are the only people in the organization who know how a financial model works, how to access and understand the data sources, and its strengths and weaknesses. People in these organizations wish for a technology solution that could bring all the information together and make it available to all decision makers in interactive, visual dashboards.
”
Zach Gemignani (Data Fluency: Empowering Your Organization with Effective Data Communication)
“
Global temperatures have been irregularly declining for at least 3,000 years based on Greenland ice core data similarly to what has occurred near the end of previous interglacial periods. The increases during the Twentieth Century have been well within normal bounds over the last 800,000 years for which we have ice core data from Antarctica.
”
Alan Carlin (Environmentalism Gone Mad: How a Sierra Club Activist and Senior EPA Analyst Discovered a Radical Green Energy Fantasy)
“
Trump’s data analysts gave them a nickname: “double haters.” These were people who disliked both candidates but traditionally showed up at the polls to vote.
”
Joshua Green (Devil's Bargain: Steve Bannon, Donald Trump, and the Storming of the Presidency)
“
The Italian-owned Benetton label, for example, manufactures its entire clothing line in white. Once the clothes are delivered to distribution centers, Benetton’s analysts assess what color or length is in vogue, at which point workers dye and cut the company’s shirts, jackets, pants and infant apparel to replicate the style and color preferences popular at the time.
”
Martin Lindstrom (Small Data: The Tiny Clues That Uncover Huge Trends)
“
Beyond collecting comprehensive data about the online activities of hundreds of millions of people, X-KEYSCORE allows any NSA analyst to search the system’s databases by email address, telephone number, or identifying attributes such as an IP address.
”
Glenn Greenwald (No Place to Hide: Edward Snowden, the NSA, and the U.S. Surveillance State)
“
With all of the data and analytical tools at our disposal, you would not expect this, but a substantial proportion of business and investment decisions are still based on the average. I see investors and analysts contending that a stock is cheap because it trades at a PE that is lower than the sector average or that a company has too much debt because its debt ratio is higher than the average for the market. The average is not only a poor central measure on which to focus in distributions that are not symmetric, but it strikes me as a waste to not use the rest of the data.
”
Aswath Damodaran (Narrative and Numbers: The Value of Stories in Business (Columbia Business School Publishing))
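To make the Damodaran point concrete, here is a short Python sketch comparing the mean against the median and percentiles on a skewed set of hypothetical PE ratios; the numbers are invented and not from the book:

    import numpy as np

    # Hypothetical sector PE ratios: mostly moderate, with one extreme value.
    pe_ratios = np.array([8, 10, 11, 12, 13, 14, 15, 16, 18, 95])

    print("mean:  ", np.mean(pe_ratios))      # 21.2, pulled up by the outlier
    print("median:", np.median(pe_ratios))    # 13.5, a more robust center
    print("25th/75th percentiles:", np.percentile(pe_ratios, [25, 75]))

A stock trading at a PE of 16 looks cheap against the mean of 21.2 but sits above the median, which is the kind of distortion the passage warns about.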
“
There is a story in your data. But your tools don’t know what that story is. That’s where it takes you—the analyst or communicator of the information—to bring that story visually and contextually to life.
”
Cole Nussbaumer Knaflic (Storytelling with Data: A Data Visualization Guide for Business Professionals)
“
Koch hired teams of analysts who worked alongside each trader to provide reams of data and analysis. The importance of this analysis was reflected in Koch’s pay structure—the company changed its payment structure so that profits were split between the trader and her supporting team of analysts. This put the analysts on equal footing with the traders. Melissa Beckett, who worked on several of Koch’s trading desks as both an analyst and trader, said Koch was unique in this regard. Other trading shops might consider analyst reports to be an afterthought; at Koch, those reports were the bedrock where a trade began.
”
Christopher Leonard (Kochland: The Secret History of Koch Industries and Corporate Power in America)
“
The smart guess matters to leaders now more than ever precisely because they face such a deluge of data—often with no clear map of what it portends for the future. As Richard Fairbank, CEO at Capital One, put it, “Finding a visionary strategy you believe as a leader is a very intuitive thing. There are many things a leader can’t predict using data. How do you know what you will need to have in three years? Yet you’ve got to start development now or you won’t have it when you need it. Our company hires brilliant data analysts; we have one of the biggest Oracle databases in the world. But at the end of the day, I find that all the data does is push us out farther on the frontier where it’s uncertain all over again.
”
Daniel Goleman (Primal Leadership, With a New Preface by the Authors: Unleashing the Power of Emotional Intelligence (Unleashing the Power of Emotinal Intelligence))
“
MMO users became so great in number that in 2008, the CIA, the NSA, and DARPA launched a covert data-mining effort, called Project Reynard, to track World of Warcraft subscribers and discern how they exist and interact in virtual worlds. To do so, CIA analysts created their own avatars and entered the virtual world of World of Warcraft.
”
Annie Jacobsen (The Pentagon's Brain: An Uncensored History of DARPA, America's Top-Secret Military Research Agency)
“
I was in a state of perpetual disbelief. I would have thought that someone would have recognized what was coming before June 2007. If it really took that June remit data to cause a sudden realization, well, it makes me wonder what a ‘Wall Street analyst’ really does all day.” By
”
Michael Lewis (The Big Short: Inside the Doomsday Machine)
“
The null hypothesis of normality is that the variable is normally distributed: thus, we do not want to reject the null hypothesis. A problem with statistical tests of normality is that they are very sensitive to small samples and minor deviations from normality. The extreme sensitivity of these tests implies the following: whereas failure to reject the null hypothesis indicates normal distribution of a variable, rejecting the null hypothesis does not indicate that the variable is not normally distributed. It is acceptable to consider variables as being normally distributed when they visually appear to be so, even when the null hypothesis of normality is rejected by normality tests. Of course, variables are preferred that are supported by both visual inspection and normality tests.
In Greater Depth … Box 12.1 Why Normality?
The reasons for the normality assumption are twofold: First, the features of the normal distribution are well-established and are used in many parametric tests for making inferences and hypothesis testing. Second, probability theory suggests that random samples will often be normally distributed, and that the means of these samples can be used as estimates of population means. The latter reason is informed by the central limit theorem, which states that an infinite number of relatively large samples will be normally distributed, regardless of the distribution of the population. An infinite number of samples is also called a sampling distribution. The central limit theorem is usually illustrated as follows. Assume that we know the population distribution, which has only six data elements with the following values: 1, 2, 3, 4, 5, or 6. Next, we write each of these six numbers on a separate sheet of paper, and draw repeated samples of three numbers each (that is, n = 3). We
”
Evan M. Berman (Essential Statistics for Public Managers and Policy Analysts)
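The central-limit illustration at the end of the Berman passage above (drawing repeated samples of three values from the six-element population 1 through 6) can be simulated directly. A minimal Python sketch; the number of repeated samples and the without-replacement reading of "draw three numbers" are my assumptions:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    population = np.array([1, 2, 3, 4, 5, 6])

    # Draw many samples of n = 3 (three of the six sheets, without replacement)
    # and keep each sample mean.
    sample_means = np.array([
        rng.choice(population, size=3, replace=False).mean()
        for _ in range(10_000)
    ])

    # The population is uniform, yet the sampling distribution of the mean
    # is roughly symmetric and centered on the population mean of 3.5.
    print("mean of sample means:", sample_means.mean())
    print("skewness:", stats.skew(sample_means))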
“
Assume that a welfare manager in our earlier example (see discussion of path analysis) takes a snapshot of the status of the welfare clients. Some clients may have obtained employment and others not yet. Clients will also vary as to the amount of time that they have been receiving welfare. Examine the data in Table 18.2. It shows that neither of the two clients, who have yet to complete their first week on welfare, has found employment; one of the three clients who have completed one week of welfare has found employment. Censored observations are observations for which the specified outcome has yet to occur. It is assumed that all clients who have not yet found employment are still waiting for this event to occur. Thus, the sample should not include clients who are not seeking employment. Note, however, that a censored observation is very different from one that has missing data, which might occur because the manager does not know whether the client has found employment. As with regression, records with missing data are excluded from analysis. A censored
”
Evan M. Berman (Essential Statistics for Public Managers and Policy Analysts)
“
observation is simply an observation for which a specified outcome has not yet occurred. Assume that data exist from a random sample of 100 clients who are seeking, or have found, employment. Survival analysis is the statistical procedure for analyzing these data. The name of this procedure stems from its use in medical research. In clinical trials, researchers want to know the survival (or disease) rate of patients as a function of the duration of their treatment. For patients in the middle of their trial, the specified outcome may not have occurred yet. We obtain the following results (also called a life table) from analyzing hypothetical data from welfare records (see Table 18.3). In the context shown in the table, the word terminal signifies that the event has occurred. That is, the client has found employment. At start time zero, 100 cases enter the interval. During the first period, there are no terminal cases and nine censored cases. Thus, 91 cases enter the next period. In this second period, 2 clients find employment and 14 do not, resulting in 75 cases that enter the following period. The column labeled “Cumulative proportion surviving until end of interval” is an estimate of probability of surviving (not finding employment) until the end of the stated interval.5 The column labeled “Probability density” is an estimate of the probability of the terminal event occurring (that is, finding employment) during the time interval. The results also report that “the median survival time is 5.19.” That is, half of the clients find employment in 5.19 weeks.
Table 18.2 Censored Observations. Note: Obs = observations (clients); Emp = employment; 0 = has not yet found employment; 1 = has found employment.
Table 18.3 Life Table Results
”
Evan M. Berman (Essential Statistics for Public Managers and Policy Analysts)
“
Note: The median survival time is 5.19. Survival analysis can also examine survival rates for different “treatments” or conditions. Assume that data are available about the number of dependents that each client has. Table 18.3 is readily produced for each subset of this condition. For example, by comparing the survival rates of those with and those without dependents, the probability density figure, which shows the likelihood of an event occurring, can be obtained (Figure 18.5). This figure suggests that having dependents is associated with clients’ finding employment somewhat faster.
Beyond Life Tables
Life tables require that the interval (time) variable be measured on a discrete scale. When the time variable is continuous, Kaplan-Meier survival analysis is used. This procedure is quite analogous to life tables analysis. Cox regression is similar to Kaplan-Meier but allows for consideration of a larger number of independent variables (called covariates). In all instances, the purpose is to examine the effect of treatment on the survival of observations, that is, the occurrence of a dichotomous event.
Figure 18.5 Probability Density
FACTOR ANALYSIS
A variety of statistical techniques help analysts to explore relationships in their data. These exploratory techniques typically aim to create groups of variables (or observations) that are related to each
”
Evan M. Berman (Essential Statistics for Public Managers and Policy Analysts)
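A minimal Python sketch of the survival idea in the Berman passages above: a Kaplan-Meier-style estimator computed by hand on a small invented set of weeks-until-employment, where event = 0 marks a censored client (still unemployed when last observed). The data are made up for illustration and are not the book's Table 18.3:

    import numpy as np

    # Weeks observed, and whether the terminal event (finding employment) occurred.
    weeks = np.array([1, 2, 2, 3, 4, 5, 5, 6, 7, 8])
    event = np.array([0, 1, 1, 0, 1, 1, 0, 1, 1, 0])  # 0 = censored observation

    survival = 1.0
    for t in np.unique(weeks[event == 1]):
        at_risk = np.sum(weeks >= t)                 # clients still "surviving" at t
        events = np.sum((weeks == t) & (event == 1))
        survival *= 1 - events / at_risk             # Kaplan-Meier product step
        print(f"week {t}: estimated S(t) = {survival:.3f}")

A life table groups the same computation into fixed intervals; Kaplan-Meier, as the passage notes, works from the exact event times, which is why it suits a continuous time variable.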
“
other and distinct from other groups. These techniques usually precede regression and other analyses. Factor analysis is a well-established technique that often aids in creating index variables. Earlier, Chapter 3 discussed the use of Cronbach alpha to empirically justify the selection of variables that make up an index. However, in that approach analysts must still justify that variables used in different index variables are indeed distinct. By contrast, factor analysis analyzes a large number of variables (often 20 to 30) and classifies them into groups based on empirical similarities and dissimilarities. This empirical assessment can aid analysts’ judgments regarding variables that might be grouped together. Factor analysis uses correlations among variables to identify subgroups. These subgroups (called factors) are characterized by relatively high within-group correlation among variables and low between-group correlation among variables. Most factor analysis consists of roughly four steps: (1) determining that the group of variables has enough correlation to allow for factor analysis, (2) determining how many factors should be used for classifying (or grouping) the variables, (3) improving the interpretation of correlations and factors (through a process called rotation), and (4) naming the factors and, possibly, creating index variables for subsequent analysis. Most factor analysis is used for grouping of variables (R-type factor analysis) rather than observations (Q-type). Often, discriminant analysis is used for grouping of observations, mentioned later in this chapter. The terminology of factor analysis differs greatly from that used elsewhere in this book, and the discussion that follows is offered as an aid in understanding tables that might be encountered in research that uses this technique. An important task in factor analysis is determining how many common factors should be identified. Theoretically, there are as many factors as variables, but only a few factors account for most of the variance in the data. The percentage of variation explained by each factor is defined as the eigenvalue divided by the number of variables, whereby the
”
Evan M. Berman (Essential Statistics for Public Managers and Policy Analysts)
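The "eigenvalue divided by the number of variables" step described at the end of the factor-analysis passage above can be shown with a small numpy sketch on an invented correlation matrix of four variables; the values are illustrative, and the extraction and rotation steps the passage mentions are not shown:

    import numpy as np

    # Hypothetical correlation matrix: variables 1-2 hang together, as do 3-4.
    R = np.array([
        [1.0, 0.8, 0.1, 0.1],
        [0.8, 1.0, 0.1, 0.1],
        [0.1, 0.1, 1.0, 0.7],
        [0.1, 0.1, 0.7, 1.0],
    ])

    eigenvalues = np.linalg.eigvalsh(R)[::-1]   # largest first
    explained = eigenvalues / R.shape[0]        # eigenvalue / number of variables

    for i, (ev, share) in enumerate(zip(eigenvalues, explained), start=1):
        print(f"factor {i}: eigenvalue {ev:.2f}, {share:.0%} of variance")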
“
SUMMARY
A vast array of additional statistical methods exists. In this concluding chapter, we summarized some of these methods (path analysis, survival analysis, and factor analysis) and briefly mentioned other related techniques. This chapter can help managers and analysts become familiar with these additional techniques and increase their access to research literature in which these techniques are used. Managers and analysts who would like more information about these techniques will likely consult other texts or on-line sources. In many instances, managers will need only simple approaches to calculate the means of their variables, produce a few good graphs that tell the story, make simple forecasts, and test for significant differences among a few groups. Why, then, bother with these more advanced techniques? They are part of the analytical world in which managers operate. Through research and consulting, managers cannot help but come in contact with them. It is hoped that this chapter whets the appetite and provides a useful reference for managers and students alike.
KEY TERMS
Endogenous variables
Exogenous variables
Factor analysis
Indirect effects
Loading
Path analysis
Recursive models
Survival analysis
Notes
1. Two types of feedback loops are illustrated as follows:
2. When feedback loops are present, error terms for the different models will be correlated with exogenous variables, violating an error term assumption for such models. Then, alternative estimation methodologies are necessary, such as two-stage least squares and others discussed later in this chapter.
3. Some models may show double-headed arrows among error terms. These show the correlation between error terms, which is of no importance in estimating the beta coefficients.
4. In SPSS, survival analysis is available through the add-on module in SPSS Advanced Models.
5. The functions used to estimate probabilities are rather complex. They are so-called Weibull distributions, which are defined as h(t) = αλ(λt)^(α–1), where α and λ are chosen to best fit the data.
6. Hence, the SSL is greater than the squared loadings reported. For example, because the loadings of variables in groups B and C are not shown for factor 1, the SSL of shown loadings is 3.27 rather than the reported 4.084. If one assumes the other loadings are each .25, then the SSL of the not reported loadings is [12*.25² =] .75, bringing the SSL of factor 1 to [3.27 + .75 =] 4.02, which is very close to the 4.084 value reported in the table.
7. Readers who are interested in multinomial logistic regression can consult on-line sources or the SPSS manual, Regression Models 10.0 or higher. The statistics of discriminant analysis are very dissimilar from those of logistic regression, and readers are advised to consult a separate text on that topic. Discriminant analysis is not often used in public
”
Evan M. Berman (Essential Statistics for Public Managers and Policy Analysts)
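Two formulas from the notes above, worked in a short Python sketch: the Weibull hazard h(t) = αλ(λt)^(α–1) from note 5, and the sum-of-squared-loadings arithmetic from note 6. The α, λ, and t values below are arbitrary placeholders:

    def weibull_hazard(t, alpha, lam):
        # Weibull hazard: h(t) = alpha * lam * (lam * t) ** (alpha - 1)
        return alpha * lam * (lam * t) ** (alpha - 1)

    print(weibull_hazard(t=5.0, alpha=1.5, lam=0.2))   # arbitrary example values

    # Note 6 arithmetic: 12 unreported loadings of .25 each add
    # 12 * .25**2 = .75 to the sum of squared loadings (SSL).
    print(12 * 0.25 ** 2)          # 0.75
    print(3.27 + 12 * 0.25 ** 2)   # 4.02, close to the reported 4.084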
“
The Scheffe test is the most conservative, the Tukey test is best when many comparisons are made (when there are many groups), and the Bonferroni test is preferred when few comparisons are made. However, these post-hoc tests often support the same conclusions.3 To illustrate, let’s say the independent variable has three categories. Then, a post-hoc test will examine hypotheses for whether μ1 = μ2, μ1 = μ3, and μ2 = μ3. In addition, these tests will also examine which categories have means that are not significantly different from each other, hence, providing homogeneous subsets. An example of this approach is given later in this chapter. Knowing such subsets can be useful when the independent variable has many categories (for example, classes of employees).
Figure 13.1 ANOVA: Significant and Insignificant Differences
Eta-squared (η²) is a measure of association for mixed nominal-interval variables and is appropriate for ANOVA. Its values range from zero to one, and it is interpreted as the percentage of variation explained. It is a directional measure, and computer programs produce two statistics, alternating specification of the dependent variable. Finally, ANOVA can be used for testing interval-ordinal relationships. We can ask whether the change in means follows a linear pattern that is either increasing or decreasing. For example, assume we want to know whether incomes increase according to the political orientation of respondents, when measured on a seven-point Likert scale that ranges from very liberal to very conservative. If a linear pattern of increase exists, then a linear relationship is said to exist between these variables. Most statistical software packages can test for a variety of progressive relationships.
ANOVA Assumptions
ANOVA assumptions are essentially the same as those of the t-test: (1) the dependent variable is continuous, and the independent variable is ordinal or nominal, (2) the groups have equal variances, (3) observations are independent, and (4) the variable is normally distributed in each of the groups. The assumptions are tested in a similar manner. Relative to the t-test, ANOVA requires a little more concern regarding the assumptions of normality and homogeneity. First, like the t-test, ANOVA is not robust for the presence of outliers, and analysts examine the presence of outliers for each group. Also, ANOVA appears to be less robust than the t-test for deviations from normality. Second, regarding groups having equal variances, our main concern with homogeneity is that there are no substantial differences in the amount of variance across the groups; the test of homogeneity is a strict test, testing for any departure from equal variances, and in practice, groups may have neither equal variances nor substantial differences in the amount of variances. In these instances, a visual finding of no substantial differences suffices. Other strategies for dealing with heterogeneity are variable transformations and the removal of outliers, which increase variance, especially in small groups. Such outliers are detected by examining boxplots for each group separately. Also, some statistical software packages (such as SPSS) now offer post-hoc tests when equal variances are not assumed.4
A Working Example
The U.S. Environmental Protection Agency (EPA) measured the percentage of wetland loss in watersheds between 1982 and 1992, the most recent period for which data are available (government statistics are sometimes a little old).5 An analyst wants to know whether watersheds with large surrounding populations have
”
”
Evan M. Berman (Essential Statistics for Public Managers and Policy Analysts)
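A small Python sketch of the eta-squared measure described in the passage above, computed from its definition (between-group sum of squares over total sum of squares) on invented groups; the data are illustrative only:

    import numpy as np

    groups = [
        np.array([2.1, 2.4, 2.0, 2.6]),   # made-up values for three categories
        np.array([3.0, 3.3, 2.9, 3.4]),
        np.array([4.1, 3.8, 4.4, 4.0]),
    ]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()

    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_total = np.sum((all_values - grand_mean) ** 2)

    eta_squared = ss_between / ss_total   # share of variation explained, 0 to 1
    print(f"eta-squared = {eta_squared:.2f}")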
“
suffered greater wetland loss than watersheds with smaller surrounding populations. Most watersheds have suffered no or only very modest losses (less than 3 percent during the decade in question), and few watersheds have suffered more than a 4 percent loss. The distribution is thus heavily skewed toward watersheds with little wetland losses (that is, to the left) and is clearly not normally distributed.6 To increase normality, the variable is transformed by twice taking the square root, x^0.25. The transformed variable is then normally distributed: the Kolmogorov-Smirnov statistic is 0.82 (p = .51 > .05). The variable also appears visually normal for each of the population subgroups. There are four population groups, designed to ensure an adequate number of observations in each. Boxplot analysis of the transformed variable indicates four large and three small outliers (not shown). Examination suggests that these are plausible and representative values, which are therefore retained. Later, however, we will examine the effect of these seven observations on the robustness of statistical results. Descriptive analysis of the variables is shown in Table 13.1. Generally, large populations tend to have larger average wetland losses, but the standard deviations are large relative to (the difference between) these means, raising considerable question as to whether these differences are indeed statistically significant. Also, the untransformed variable shows that the mean wetland loss is less among watersheds with “Medium I” populations than in those with “Small” populations (1.77 versus 2.52). The transformed variable shows the opposite order (1.06 versus 0.97). Further investigation shows this to be the effect of the three small outliers and two large outliers on the calculation of the mean of the untransformed variable in the “Small” group. Variable transformation minimizes this effect. These outliers also increase the standard deviation of the “Small” group. Using ANOVA, we find that the transformed variable has unequal variances across the four groups (Levene’s statistic = 2.83, p = .041 < .05). Visual inspection, shown in Figure 13.2, indicates that differences are not substantial for observations within the group interquartile ranges, the areas indicated by the boxes. The differences seem mostly caused by observations located in the whiskers of the “Small” group, which include the five outliers mentioned earlier. (The other two outliers remain outliers and are shown.) For now, we conclude that no substantial differences in variances exist, but we later test the robustness of this conclusion with consideration of these observations (see Figure 13.2).
Table 13.1 Variable Transformation
We now proceed with the ANOVA analysis. First, Table 13.2 shows that the global F-test statistic is 2.91, p = .038 < .05. Thus, at least one pair of means is significantly different. (The term sum of squares is explained in note 1.)
Getting Started: Try ANOVA on some data of your choice.
Second, which pairs are significantly different? We use the Bonferroni post-hoc test because relatively few comparisons are made (there are only four groups). The computer-generated results (not shown in Table 13.2) indicate that the only significant difference concerns the means of the “Small” and “Large” groups. This difference (1.26 - 0.97 = 0.29 [of transformed values]) is significant at the 5 percent level (p = .028). The Tukey and Scheffe tests lead to the same conclusion (respectively, p = .024 and .044). (It should be noted that post-hoc tests also exist for when equal variances are not assumed. In our example, these tests lead to the same result.7) This result is consistent with a visual reexamination of Figure 13.2, which shows that differences between group means are indeed small. The Tukey and
”
Evan M. Berman (Essential Statistics for Public Managers and Policy Analysts)
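The workflow in the working example above (Levene's test for homogeneity, the global F-test, then Bonferroni-adjusted pairwise comparisons) can be sketched with scipy. The groups below are synthetic, loosely echoing the group means quoted in the passage, and will not reproduce the EPA wetland results:

    from itertools import combinations
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    # Synthetic "transformed wetland loss" values for four population groups.
    groups = {
        "Small": rng.normal(0.97, 0.25, 30),
        "Medium I": rng.normal(1.06, 0.25, 30),
        "Medium II": rng.normal(1.15, 0.25, 30),   # mean is an invented filler
        "Large": rng.normal(1.26, 0.25, 30),
    }

    print("Levene:", stats.levene(*groups.values()))     # homogeneity of variances
    print("ANOVA: ", stats.f_oneway(*groups.values()))   # global F-test

    # Bonferroni post-hoc: multiply each pairwise p-value by the number of comparisons.
    pairs = list(combinations(groups, 2))
    for a, b in pairs:
        t_stat, p = stats.ttest_ind(groups[a], groups[b])
        print(f"{a} vs {b}: adjusted p = {min(1.0, p * len(pairs)):.3f}")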
“
Chapter 3), the resulting index variables typically are continuous as well. When variables are continuous, we should not recode them as categorical variables just to use the techniques of the previous chapters. Continuous variables provide valuable information about distances between categories and often have a broader range of values than ordinal variables. Recoding continuous variables as categorical variables is discouraged because it results in a loss of information; we should use tests such as the t-test. Statistics involving continuous variables usually require more test assumptions. Many of these tests are referred to as parametric statistics; this term refers to the fact that they make assumptions about the distribution of data and also that they are used to make inferences about population parameters. Formally, the term parametric means that a test makes assumptions about the distribution of the underlying population. Parametric tests have more test assumptions than nonparametric tests, most typically that the variable is continuous and normally distributed (see Chapter 7). These and other test assumptions are also part of t-tests. This chapter focuses on three common t-tests: for independent samples, for dependent (paired) samples, and the one-sample t-test. For each, we provide examples and discuss test assumptions. This chapter also discusses nonparametric alternatives to t-tests, which analysts will want to consider when t-test assumptions cannot be met for their variables. As a general rule, a bias exists toward using parametric tests because they are more powerful than nonparametric tests. Nonparametric alternatives to parametric tests often transform continuous testing variables into other types of variables, such as rankings, which reduces information about them. Although nonparametric statistics are easier to use because they have fewer assumptions, parametric tests are more likely to find statistical evidence that two variables are associated; their tests often have lower p-values than nonparametric statistics.1
”
Evan M. Berman (Essential Statistics for Public Managers and Policy Analysts)
“
usually does not present much of a problem. Some analysts use t-tests with ordinal rather than continuous data for the testing variable. This approach is theoretically controversial because the distances among ordinal categories are undefined. This situation is avoided easily by using nonparametric alternatives (discussed later in this chapter). Also, when the grouping variable is not dichotomous, analysts need to make it so in order to perform a t-test. Many statistical software packages allow dichotomous variables to be created from other types of variables, such as by grouping or recoding ordinal or continuous variables. The second assumption is that the variances of the two distributions are equal. This is called homogeneity of variances. The use of pooled variances in the earlier formula is justified only when the variances of the two groups are equal. When variances are unequal (called heterogeneity of variances), revised formulas are used to calculate t-test test statistics and degrees of freedom.7 The difference between homogeneity and heterogeneity is shown graphically in Figure 12.2. Although we needn’t be concerned with the precise differences in these calculation methods, all t-tests first test whether variances are equal in order to know which t-test test statistic is to be used for subsequent hypothesis testing. Thus, every t-test involves a (somewhat tricky) two-step procedure. A common test for the equality of variances is the Levene’s test. The null hypothesis of this test is that variances are equal. Many statistical software programs provide the Levene’s test along with the t-test, so that users know which t-test to use—the t-test for equal variances or that for unequal variances. The Levene’s test is performed first, so that the correct t-test can be chosen.
Figure 12.2 Equal and Unequal Variances
The term robust is used, generally, to describe the extent to which test conclusions are unaffected by departures from test assumptions. T-tests are relatively robust for (hence, unaffected by) departures from assumptions of homogeneity and normality (see below) when groups are of approximately equal size. When groups are of about equal size, test conclusions about any difference between their means will be unaffected by heterogeneity. The third assumption is that observations are independent. (Quasi-) experimental research designs violate this assumption, as discussed in Chapter 11. The formula for the t-test test statistic, then, is modified to test whether the difference between before and after measurements is zero. This is called a paired t-test, which is discussed later in this chapter. The fourth assumption is that the distributions are normally distributed. Although normality is an important test assumption, a key reason for the popularity of the t-test is that t-test conclusions often are robust against considerable violations of normality assumptions that are not caused by highly skewed distributions. We provide some detail about tests for normality and how to address departures thereof. Remember, when nonnormality cannot be resolved adequately, analysts consider nonparametric alternatives to the t-test, discussed at the end of this chapter. Box 12.1 provides a bit more discussion about the reason for this assumption. A combination of visual inspection and statistical tests is always used to determine the normality of variables.
Two tests of normality are the Kolmogorov-Smirnov test (also known as the K-S test) for samples with more than 50 observations and the Shapiro-Wilk test for samples with up to 50 observations. The null hypothesis of
”
Evan M. Berman (Essential Statistics for Public Managers and Policy Analysts)
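The two-step procedure described above (run Levene's test first, then choose the equal- or unequal-variance t-test) in a minimal scipy sketch; the two samples are invented:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    group_a = rng.normal(50, 10, 40)   # made-up scores for two groups
    group_b = rng.normal(55, 18, 35)

    levene_stat, levene_p = stats.levene(group_a, group_b)
    equal_var = levene_p >= 0.05       # fail to reject: treat variances as equal

    t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=equal_var)
    print(f"Levene p = {levene_p:.3f} -> equal_var = {equal_var}")
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}")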
“
For comparison, we use the Mann-Whitney test to compare the two samples of 10th graders discussed earlier in this chapter. The sum of ranks for the “before” group is 69.55, and for the “one year later group,” 86.57. The test statistic is significant at p = .019, yielding the same conclusion as the independent-samples t-test, p = .011. This comparison also shows that nonparametric tests do have higher levels of significance. As mentioned earlier, the Mann-Whitney test (as a nonparametric test) does not calculate the group means; separate, descriptive analysis needs to be undertaken for that information. A nonparametric alternative to the paired-samples t-test is the Wilcoxon signed rank test. This test assigns ranks based on the absolute values of these differences (Table 12.5). The signs of the differences are retained (thus, some values are positive and others are negative). For the data in Table 12.5, there are seven positive ranks (with mean rank = 6.57) and three negative ranks (with mean rank = 3.00). The Wilcoxon signed rank test statistic is normally distributed. The Wilcoxon signed rank test statistic, Z, for a difference between these values is 1.89 (p = .059 > .05). Hence, according to this test, the differences between the before and after scores are not significant.
Getting Started: Calculate a t-test and a Mann-Whitney test on data of your choice.
Again, nonparametric tests result in larger p-values. The paired-samples t-test finds that p = .038 < .05, providing sufficient statistical evidence to conclude that the differences are significant. It might also be noted that a doubling of the data in Table 12.5 results in finding a significant difference between the before and after scores with the Wilcoxon signed rank test, Z = 2.694, p = .007.
Table 12.5 Wilcoxon Signed Rank Test
The Wilcoxon signed rank test can also be adapted as a nonparametric alternative to the one-sample t-test. In that case, analysts create a second variable that, for each observation, is the test value. For example, if in Table 12.5 we wish to test whether the mean of variable “before” is different from, say, 4.0, we create a second variable with 10 observations for which each value is, say, 4.0. Then using the Wilcoxon signed rank test for the “before” variable and this new,
”
Evan M. Berman (Essential Statistics for Public Managers and Policy Analysts)
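A short scipy sketch of the nonparametric tests discussed above, run alongside their parametric counterparts; the ten before/after scores are invented and only echo the structure, not the values, of Table 12.5:

    import numpy as np
    from scipy import stats

    before = np.array([3, 4, 5, 4, 3, 5, 4, 4, 3, 5])   # made-up scores
    after = np.array([4, 5, 5, 5, 4, 6, 4, 5, 4, 6])

    # Treated as independent samples: t-test vs Mann-Whitney.
    print(stats.ttest_ind(before, after))
    print(stats.mannwhitneyu(before, after, alternative="two-sided"))

    # Treated as paired samples: paired t-test vs Wilcoxon signed rank test.
    print(stats.ttest_rel(before, after))
    print(stats.wilcoxon(before, after))

As the passage notes, the nonparametric versions typically return somewhat larger p-values than their parametric counterparts on the same data.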
“
different from 3.5. However, it is different from larger values, such as 4.0 (t = 2.89, df = 9, p = .019). Another example of this is provided in Box 12.2. Finally, note that the one-sample t-test is identical to the paired-samples t-test for testing whether the mean D = 0. Indeed, the one-sample t-test for D = 0 produces the same results (t = 2.43, df = 9, p = .038).
In Greater Depth … Box 12.2 Use of the T-Test in Performance Management: An Example
Performance benchmarking is an increasingly popular tool in performance management. Public and nonprofit officials compare the performance of their agencies with performance benchmarks and draw lessons from the comparison. Let us say that a city government requires its fire and medical response unit to maintain an average response time of 360 seconds (6 minutes) to emergency requests. The city manager has suspected that the growth in population and demands for the services have slowed down the responses recently. He draws a sample of 10 response times in the most recent month: 230, 450, 378, 430, 270, 470, 390, 300, 470, and 530 seconds, for a sample mean of 392 seconds. He performs a one-sample t-test to compare the mean of this sample with the performance benchmark of 360 seconds. The null hypothesis of this test is that the sample mean is equal to 360 seconds, and the alternate hypothesis is that they are different. The result (t = 1.030, df = 9, p = .330) shows a failure to reject the null hypothesis at the 5 percent level, which means that we don’t have sufficient evidence to say that the average response time is different from the benchmark 360 seconds. We cannot say that current performance of 392 seconds is significantly different from the 360-second benchmark. Perhaps more data (samples) are needed to reach such a conclusion, or perhaps too much variability exists for such a conclusion to be reached.
NONPARAMETRIC ALTERNATIVES TO T-TESTS
The tests described in the preceding sections have nonparametric alternatives. The chief advantage of these tests is that they do not require continuous variables to be normally distributed. The chief disadvantage is that they are less likely to reject the null hypothesis. A further, minor disadvantage is that these tests do not provide descriptive information about variable means; separate analysis is required for that. Nonparametric alternatives to the independent-samples test are the Mann-Whitney and Wilcoxon tests. The Mann-Whitney and Wilcoxon tests are equivalent and are thus discussed jointly. Both are simplifications of the more general Kruskal-Wallis’ H test, discussed in Chapter 11.19 The Mann-Whitney and Wilcoxon tests assign ranks to the testing variable in the exact manner shown in Table 12.4. The sum of the ranks of each group is computed, shown in the table. Then a test is performed to determine the statistical significance of the difference between the sums, 22.5 and 32.5. Although the Mann-Whitney U and Wilcoxon W test statistics are calculated differently, they both have the same level of statistical significance: p = .295. Technically, this is not a test of different means but of different distributions; the lack of significance implies that groups 1 and 2 can be regarded as coming from the same population.20 Table 12.4 Rankings of
”
Evan M. Berman (Essential Statistics for Public Managers and Policy Analysts)
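The response-time benchmark in Box 12.2 above can be reproduced directly with scipy's one-sample t-test; the ten response times and the 360-second benchmark are the ones given in the passage:

    import numpy as np
    from scipy import stats

    response_times = np.array([230, 450, 378, 430, 270, 470, 390, 300, 470, 530])
    benchmark = 360  # required average response time in seconds

    t_stat, p_value = stats.ttest_1samp(response_times, benchmark)
    print(f"sample mean = {response_times.mean():.0f} seconds")
    print(f"t = {t_stat:.3f}, df = {len(response_times) - 1}, p = {p_value:.3f}")
    # Matches the passage: t is about 1.030 with df = 9 and p about .330,
    # so we fail to reject the null at the 5 percent level.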
“
In the past, firms could employ teams of statisticians, modelers, and analysts to explore datasets manually, but the volume and variety of data have far outstripped the capacity of manual analysis.
”
Foster Provost (Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking)
“
Remedies exist for correcting substantial departures from normality, but these remedies may make matters worse when departures from normality are minimal. The first course of action is to identify and remove any outliers that may affect the mean and standard deviation. The second course of action is variable transformation, which involves transforming the variable, often by taking log(x) of each observation, and then testing the transformed variable for normality. Variable transformation may address excessive skewness by adjusting the measurement scale, thereby helping variables to better approximate normality.8 Substantively, we strongly prefer to make conclusions that satisfy test assumptions, regardless of which measurement scale is chosen.9 Keep in mind that when variables are transformed, the units in which results are expressed are transformed, as well. An example of variable transformation is provided in the second working example. Typically, analysts have different ways to address test violations. Examination of the causes of assumption violations often helps analysts to better understand their data. Different approaches may be successful for addressing test assumptions. Analysts should not merely go by the result of one approach that supports their case, ignoring others that perhaps do not. Rather, analysts should rely on the weight of robust, converging results to support their final test conclusions.
Working Example 1
Earlier we discussed efforts to reduce high school violence by enrolling violence-prone students into classes that address anger management. Now, after some time, administrators and managers want to know whether the program is effective. As part of this assessment, students are asked to report their perception of safety at school. An index variable is constructed from different items measuring safety (see Chapter 3). Each item is measured on a seven-point Likert scale (1 = strongly disagree to 7 = strongly agree), and the index is constructed such that a high value indicates that students feel safe.10 The survey was initially administered at the beginning of the program. Now, almost a year later, the survey is implemented again.11 Administrators want to know whether students who did not participate in the anger management program feel that the climate is now safer. The analysis included here focuses on 10th graders. For practical purposes, the samples of 10th graders at the beginning of the program and one year later are regarded as independent samples; the subjects are not matched. Descriptive analysis shows that the mean perception of
”
”
Evan M. Berman (Essential Statistics for Public Managers and Policy Analysts)
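As a rough companion to the normality remedies described above, here is a minimal Python/SciPy sketch (tooling assumed; the skewed variable is synthetic, not the school-safety data) that checks normality before and after a log transformation.

```python
# Checking normality before and after a log transformation.
# A minimal sketch with hypothetical, right-skewed data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.lognormal(mean=3.0, sigma=0.8, size=60)   # skewed "raw" variable

# Shapiro-Wilk test: H0 = the variable is normally distributed.
w_raw, p_raw = stats.shapiro(x)
w_log, p_log = stats.shapiro(np.log(x))           # transformed variable

print(f"raw:  W = {w_raw:.3f}, p = {p_raw:.4f}")  # typically p < .05 (normality rejected)
print(f"log:  W = {w_log:.3f}, p = {p_log:.4f}")  # typically p > .05 (normality plausible)
# Remember: any results are now expressed in log units, as the excerpt cautions.
```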
“
Simple Regression
CHAPTER OBJECTIVES
After reading this chapter, you should be able to
Use simple regression to test the statistical significance of a bivariate relationship involving one dependent and one independent variable
Use Pearson’s correlation coefficient as a measure of association between two continuous variables
Interpret statistics associated with regression analysis
Write up the model of simple regression
Assess assumptions of simple regression
This chapter completes our discussion of statistical techniques for studying relationships between two variables by focusing on those that are continuous. Several approaches are examined: simple regression; the Pearson’s correlation coefficient; and a nonparametric alternative, Spearman’s rank correlation coefficient. Although all three techniques can be used, we focus particularly on simple regression. Regression allows us to predict outcomes based on knowledge of an independent variable. It is also the foundation for studying relationships among three or more variables, including control variables mentioned in Chapter 2 on research design (and also in Appendix 10.1). Regression can also be used in time series analysis, discussed in Chapter 17. We begin with simple regression. SIMPLE REGRESSION Let’s first look at an example. Say that you are a manager or analyst involved with a regional consortium of 15 local public agencies (in cities and counties) that provide low-income adults with health education about cardiovascular diseases, in an effort to reduce such diseases. The funding for this health education comes from a federal grant that requires annual analysis and performance outcome reporting. In Chapter 4, we used a logic model to specify that a performance outcome is the result of inputs, activities, and outputs. Following the development of such a model, you decide to conduct a survey among participants who attend such training events to collect data about the number of events they attended, their knowledge of cardiovascular disease, and a variety of habits such as smoking that are linked to cardiovascular disease. Some things that you might want to know are whether attending workshops increases
”
”
Evan M. Berman (Essential Statistics for Public Managers and Policy Analysts)
“
knowledge of cardiovascular disease and whether such knowledge reduces behaviors that put people at risk for cardiovascular disease. Simple regression is used to analyze the relationship between two continuous variables. Continuous variables assume that the distances between ordered categories are determinable.1 In simple regression, one variable is defined as the dependent variable and the other as the independent variable (see Chapter 2 for the definitions). In the current example, the level of knowledge obtained from workshops and other sources might be measured on a continuous scale and treated as an independent variable, and behaviors that put people at risk for cardiovascular disease might also be measured on a continuous scale and treated as a dependent variable. Scatterplot The relationship between two continuous variables can be portrayed in a scatterplot. A scatterplot is merely a plot of the data points for two continuous variables, as shown in Figure 14.1 (without the straight line). [Figure 14.1: Scatterplot] By convention, the dependent variable is shown on the vertical (or Y-) axis, and the independent variable on the horizontal (or X-) axis. The relationship between the two variables is estimated as a straight line relationship. The line is defined by the equation y = a + bx, where a is the intercept (or constant), and b is the slope. The slope, b, is defined as the ratio of the change in y to the change in x, or (y2 – y1)/(x2 – x1). The line is calculated mathematically such that the sum of distances from each observation to the line is minimized.2 By definition, the slope indicates the change in y as a result of a unit change in x. The straight line, defined by y = a + bx, is also called the regression line, and the slope (b) is called the regression coefficient. A positive regression coefficient indicates a positive relationship between the variables, shown by the upward slope in Figure 14.1. A negative regression coefficient indicates a negative relationship between the variables and is indicated by a downward-sloping line. Test of Significance The test of significance of the regression coefficient is a key test that tells us whether the slope (b) is statistically different from zero. The slope is calculated from a sample, and we wish to know whether it is significant. When the regression line is horizontal (b = 0), no relationship exists between the two variables. Then, changes in the independent variable have no effect on the dependent variable. The following hypotheses are thus stated: H0: b = 0, or the two variables are unrelated. HA: b ≠ 0, or the two variables are (positively or negatively) related. To determine whether the slope equals zero, a t-test is performed. The test statistic is defined as the slope, b, divided by the standard error of the slope, se(b). The standard error of the slope is a measure of the distribution of the observations around the regression slope, which is based on the standard deviation of those observations around the regression line: t = b / se(b). Thus, a regression line with a small slope is more likely to be statistically significant when observations lie closely around it (that is, the standard error of the observations around the line is also small, resulting in a larger test statistic). By contrast, the same regression line might be statistically insignificant when observations are scattered widely around it. Observations that lie farther from the
”
”
Evan M. Berman (Essential Statistics for Public Managers and Policy Analysts)
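The regression mechanics in the excerpt above, fitting y = a + bx and testing the slope with t = b / se(b), can be sketched as follows in Python with SciPy (a tooling assumption; the workshop and knowledge scores are hypothetical stand-ins for the survey described).

```python
# Simple regression: y = a + b*x, with a t-test on the slope b.
# A minimal sketch using hypothetical data.
from scipy import stats

# Hypothetical survey data: workshops attended (x) and knowledge score (y).
x = [1, 2, 2, 3, 4, 4, 5, 6, 7, 8]
y = [52, 55, 58, 60, 63, 61, 67, 70, 72, 78]

res = stats.linregress(x, y)
print(f"intercept a = {res.intercept:.2f}")
print(f"slope b     = {res.slope:.2f}")
print(f"se(b)       = {res.stderr:.3f}")

# The test statistic for H0: b = 0 is the slope divided by its standard error.
t_stat = res.slope / res.stderr
print(f"t = {t_stat:.2f}, p = {res.pvalue:.4f}")   # same p-value linregress reports
print(f"r = {res.rvalue:.3f}, R-squared = {res.rvalue**2:.3f}")
```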
“
(e). Hence the expressions are equivalent, as is y = ŷ + e. Certain assumptions about e are important, such as that it is normally distributed. When error term assumptions are violated, incorrect conclusions may be made about the statistical significance of relationships. This important issue is discussed in greater detail in Chapter 15 and, for time series data, in Chapter 17. Hence, the above is a pertinent but incomplete list of assumptions. Getting Started Conduct a simple regression, and practice writing up your results. PEARSON’S CORRELATION COEFFICIENT Pearson’s correlation coefficient, r, measures the association (significance, direction, and strength) between two continuous variables; it is a measure of association for two continuous variables. Also called the Pearson’s product-moment correlation coefficient, it does not assume a causal relationship, as does simple regression. The correlation coefficient indicates the extent to which the observations lie closely or loosely clustered around the regression line. The coefficient r ranges from –1 to +1. The sign indicates the direction of the relationship, which, in simple regression, is always the same as the slope coefficient. A “–1” indicates a perfect negative relationship, that is, that all observations lie exactly on a downward-sloping regression line; a “+1” indicates a perfect positive relationship, whereby all observations lie exactly on an upward-sloping regression line. Of course, such values are rarely obtained in practice because observations seldom lie exactly on a line. An r value of zero indicates that observations are so widely scattered that it is impossible to draw any well-fitting line. Figure 14.2 illustrates some values of r. Key Point Pearson’s correlation coefficient, r, ranges from –1 to +1. It is important to avoid confusion between Pearson’s correlation coefficient and the coefficient of determination. For the two-variable, simple regression model, r2 = R2, but whereas 0 ≤ R ≤ 1, r ranges from –1 to +1. Hence, the sign of r tells us whether a relationship is positive or negative, but the sign of R, in regression output tables such as Table 14.1, is always positive and cannot inform us about the direction of the relationship. In simple regression, the regression coefficient, b, informs us about the direction of the relationship. Statistical software programs usually show r rather than r2. Note also that the Pearson’s correlation coefficient can be used only to assess the association between two continuous variables, whereas regression can be extended to deal with more than two variables, as discussed in Chapter 15. Pearson’s correlation coefficient assumes that both variables are normally distributed. When Pearson’s correlation coefficients are calculated, a standard error of r can be determined, which then allows us to test the statistical significance of the bivariate correlation. For bivariate relationships, this is the same level of significance as shown for the slope of the regression coefficient. For the variables given earlier in this chapter, the value of r is .272 and the statistical significance of r is p ≤ .01. Use of the Pearson’s correlation coefficient assumes that the variables are normally distributed and that there are no significant departures from linearity.7 It is important not to confuse the correlation coefficient, r, with the regression coefficient, b. Comparing the measures r and b (the slope) sometimes causes confusion. 
The key point is that r does not indicate the regression slope but rather the extent to which observations lie close to it. A steep regression line (large b) can have observations scattered loosely or closely around it, as can a shallow (more horizontal) regression line. The purposes of these two statistics are very different.8 SPEARMAN’S RANK CORRELATION
”
”
Evan M. Berman (Essential Statistics for Public Managers and Policy Analysts)
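To make the distinction between r and R2 discussed above concrete, this small Python/SciPy sketch (tooling assumed; the data are hypothetical) computes Pearson's r for a negatively related pair of variables and shows that its square matches the R2 of a simple regression on the same data, while only r and the slope b carry the sign.

```python
# Pearson's r versus R-squared in the two-variable case.
# A minimal sketch with hypothetical, negatively related data.
from scipy import stats

x = [10, 12, 15, 18, 20, 23, 25, 28, 30, 33]
y = [48, 45, 44, 40, 38, 35, 33, 30, 29, 25]   # decreases as x increases

r, p = stats.pearsonr(x, y)
reg = stats.linregress(x, y)

print(f"Pearson r = {r:.3f} (p = {p:.4f})")     # negative: direction is visible in r
print(f"slope b   = {reg.slope:.3f}")           # negative: direction also visible in b
print(f"R-squared = {r**2:.3f}")                # always positive: no direction info
# For simple regression, r**2 equals the model's R-squared (reg.rvalue**2).
```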
“
COEFFICIENT The nonparametric alternative, Spearman’s rank correlation coefficient (ρ, or “rho”), looks at correlation among the ranks of the data rather than among the values. The ranks of data are determined as shown in Table 14.2 (adapted from Table 11.8). [Table 14.2: Ranks of Two Variables] In Greater Depth … Box 14.1 Crime and Poverty An analyst wants to examine empirically the relationship between crime and income in cities across the United States. The CD that accompanies the workbook Exercising Essential Statistics includes a Community Indicators dataset with assorted indicators of conditions in 98 cities such as Akron, Ohio; Phoenix, Arizona; New Orleans, Louisiana; and Seattle, Washington. The measures include median household income, total population (both from the 2000 U.S. Census), and total violent crimes (FBI, Uniform Crime Reporting, 2004). In the sample, household income ranges from $26,309 (Newark, New Jersey) to $71,765 (San Jose, California), and the median household income is $42,316. Per-capita violent crime ranges from 0.15 percent (Glendale, California) to 2.04 percent (Las Vegas, Nevada), and the median violent crime rate per capita is 0.78 percent. There are four types of violent crimes: murder and nonnegligent manslaughter, forcible rape, robbery, and aggravated assault. A measure of total violent crime per capita is calculated because larger cities are apt to have more crime. The analyst wants to examine whether income is associated with per-capita violent crime. The scatterplot of these two continuous variables shows that a negative relationship appears to be present. The Pearson’s correlation coefficient is –.532 (p < .01), and the Spearman’s correlation coefficient is –.552 (p < .01). The simple regression model shows R2 = .283. The regression model is as follows (t-test statistic in parentheses): [regression equation not reproduced in this excerpt]. The regression line is shown on the scatterplot. Interpreting these results, we see that the R-square value of .283 indicates a moderate relationship between these two variables. Clearly, some cities with modest median household incomes have a high crime rate. However, removing these cities does not greatly alter the findings. Also, an assumption of regression is that the error term is normally distributed, and further examination of the error shows that it is somewhat skewed. The techniques for examining the distribution of the error term are discussed in Chapter 15, but again, addressing this problem does not significantly alter the finding that the two variables are significantly related to each other, and that the relationship is of moderate strength. With this result in hand, further analysis shows, for example, by how much violent crime decreases for each increase in household income. For each increase of $10,000 in average household income, the violent crime rate drops 0.25 percent. For a city experiencing the median 0.78 percent crime rate, this would be a considerable improvement, indeed. Note also that the scatterplot shows considerable variation in the crime rate for cities at or below the median household income, in contrast to those well above it. Policy analysts may well wish to examine conditions that give rise to variation in crime rates among cities with lower incomes.
Because Spearman’s rank correlation coefficient examines correlation among the ranks of variables, it can also be used with ordinal-level data.9 For the data in Table 14.2, Spearman’s rank correlation coefficient is .900 (p = .035).10 Spearman’s rho-squared coefficient has a “percent variation explained” interpretation, similar
”
”
Evan M. Berman (Essential Statistics for Public Managers and Policy Analysts)
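Box 14.1's Community Indicators data are not reproduced in this excerpt, so the Python/SciPy sketch below (a tooling assumption) uses hypothetical income and crime figures simply to show how Spearman's rank correlation is computed alongside Pearson's r, the comparison the box reports.

```python
# Spearman's rank correlation alongside Pearson's r.
# A minimal sketch; the paired values below are hypothetical,
# not the Community Indicators data used in Box 14.1.
from scipy import stats

income = [26, 31, 35, 40, 42, 47, 55, 63, 68, 72]             # median income ($000s)
crime  = [1.9, 1.6, 1.8, 1.2, 0.9, 1.0, 0.6, 0.5, 0.4, 0.3]   # violent crime rate (%)

r_pearson, p_pearson = stats.pearsonr(income, crime)
rho, p_rho = stats.spearmanr(income, crime)

print(f"Pearson r    = {r_pearson:.3f} (p = {p_pearson:.4f})")
print(f"Spearman rho = {rho:.3f} (p = {p_rho:.4f})")
print(f"rho-squared  = {rho**2:.3f}")  # 'percent variation in ranks explained'
# Spearman works on ranks, so it can also be used with ordinal variables.
```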
“
to the measures described earlier. Hence, 90 percent of the variation in one variable can be explained by the other. For the variables given earlier, the Spearman’s rank correlation coefficient is .274 (p < .01), which is comparable to r reported in preceding sections. Box 14.1 illustrates another use of the statistics described in this chapter, in a study of the relationship between crime and poverty.
SUMMARY
When analysts examine relationships between two continuous variables, they can use simple regression or the Pearson’s correlation coefficient. Both measures show (1) the statistical significance of the relationship, (2) the direction of the relationship (that is, whether it is positive or negative), and (3) the strength of the relationship. Simple regression assumes a causal and linear relationship between the continuous variables. The statistical significance and direction of the slope coefficient are used to assess the statistical significance and direction of the relationship. The coefficient of determination, R2, is used to assess the strength of relationships; R2 is interpreted as the percent variation explained. Regression is a foundation for studying relationships involving three or more variables, such as control variables. The Pearson’s correlation coefficient does not assume causality between two continuous variables. A nonparametric alternative to testing the relationship between two continuous variables is the Spearman’s rank correlation coefficient, which examines correlation among the ranks of the data rather than among the values themselves. As such, this measure can also be used to study relationships in which one or both variables are ordinal.
KEY TERMS
Coefficient of determination, R2
Error term
Observed value of y
Pearson’s correlation coefficient, r
Predicted value of the dependent variable y, ŷ
Regression coefficient
Regression line
Scatterplot
Simple regression assumptions
Spearman’s rank correlation coefficient
Standard error of the estimate
Test of significance of the regression coefficient
Notes
1. See Chapter 3 for a definition of continuous variables. Although the distinction between ordinal and continuous is theoretical (namely, whether or not the distance between categories can be measured), in practice ordinal-level variables with seven or more categories (including Likert variables) are sometimes analyzed using statistics appropriate for interval-level variables. This practice has many critics because it violates an assumption of regression (interval data), but it is often
”
”
Evan M. Berman (Essential Statistics for Public Managers and Policy Analysts)
“
safety at the beginning of the program was 4.40 (standard deviation, SD = 1.00), and one year later, 4.80 (SD = 0.94). The mean safety score increased among 10th graders, but is the increase statistically significant? Among other concerns, the standard deviations are considerable for both samples. As part of the analysis, we conduct a t-test to answer the question of whether the means of these two distributions are significantly different. First, we examine whether test assumptions are met. The samples are independent, and the variables meet the requirement that one is continuous (the index variable) and the other dichotomous. The assumption of equality of variances is addressed as part of conducting the t-test, and so the remaining question is whether the variables are normally distributed. The distributions are shown in the histograms in Figure 12.3.12 Are these normal distributions? Visually, they are not the textbook ideal; real-life data seldom are. The Kolmogorov-Smirnov tests for both distributions are insignificant (both p > .05). Hence, we conclude that the two distributions can be considered normal. Having satisfied these t-test assumptions, we next conduct the t-test for two independent samples. Table 12.1 shows the t-test results. The top part of Table 12.1 shows the descriptive statistics, and the bottom part reports the test statistics. Recall that the t-test is a two-step test. We first test whether variances are equal. This is shown as the “Levene’s test for equality of variances.” The null hypothesis of the Levene’s test is that variances are equal; this is rejected when the p-value of this Levene’s test statistic is less than .05. The Levene’s test uses an F-test statistic (discussed in Chapters 13 and 15), which, other than its p-value, need not concern us here. In Table 12.1, the level of significance is .675, which exceeds .05. Hence, we fail to reject the null hypothesis: the variances of the two distributions shown in Figure 12.3 are equal.
[Figure 12.3: Perception of High School Safety among 10th Graders]
[Table 12.1: Independent-Samples T-Test: Output. Note: SD = standard deviation.]
Now we go to the second step, the main purpose. Are the two means (4.40 and 4.80)
”
”
Evan M. Berman (Essential Statistics for Public Managers and Policy Analysts)
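The two-step procedure in the working example above, Levene's test for equal variances followed by the independent-samples t-test, looks like this in Python with SciPy (a tooling assumption; the safety-index scores are hypothetical, not the excerpt's samples).

```python
# Levene's test followed by an independent-samples t-test.
# A minimal sketch with hypothetical safety-index scores (1-7 scale).
from scipy import stats

before = [3.2, 4.1, 4.5, 5.0, 3.8, 4.4, 5.2, 4.0, 4.6, 5.1]   # year 1 sample
after  = [4.0, 4.6, 5.1, 5.5, 4.3, 4.9, 5.6, 4.5, 5.0, 5.4]   # year 2 sample (independent)

# Step 1: Levene's test. H0: the two groups have equal variances.
lev_stat, lev_p = stats.levene(before, after)
equal_var = lev_p > 0.05
print(f"Levene: F = {lev_stat:.3f}, p = {lev_p:.3f} -> equal_var = {equal_var}")

# Step 2: independent-samples t-test, using the variance assumption from step 1.
t_stat, t_p = stats.ttest_ind(before, after, equal_var=equal_var)
print(f"t = {t_stat:.3f}, p = {t_p:.3f}")
```

Passing `equal_var` based on Levene's result mirrors the two-step logic in the excerpt; SciPy then applies the pooled-variance or Welch formula accordingly.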
“
12.2. The transformed variable has equal variances across the two groups (Levene’s test, p = .119), and the t-test statistic is –1.308 (df = 85, p = .194). Thus, the differences in pollution between watersheds in the East and Midwest are not significant. (The negative sign of the t-test statistic, –1.308, merely reflects the order of the groups for calculating the difference: the testing variable has a larger value in the Midwest than in the East. Reversing the order of the groups results in a positive sign.)
[Table 12.2: Independent-Samples T-Test: Output]
For comparison, results for the untransformed variable are shown as well. The untransformed variable has unequal variances across the two groups (Levene’s test, p = .036), and the t-test statistic is –1.801 (df = 80.6, p = .075). Although this result also shows that differences are insignificant, the level of significance is higher; there are instances in which using nonnormal variables could lead to rejecting the null hypothesis. While our finding of insignificant differences is indeed robust, analysts cannot know this in advance. Thus, analysts will need to deal with nonnormality. Variable transformation is one approach to the problem of nonnormality, but transforming variables can be a time-intensive and somewhat artful activity. The search for alternatives has led many analysts to consider nonparametric methods. TWO T-TEST VARIATIONS Paired-Samples T-Test Analysts often use the paired t-test when applying before and after tests to assess student or client progress. Paired t-tests are used when analysts have a dependent rather than an independent sample (see the third t-test assumption, described earlier in this chapter). The paired-samples t-test tests the null hypothesis that the mean difference between the before and after test scores is zero. Consider the following data from Table 12.3.
[Table 12.3: Paired-Samples Data]
The mean “before” score is 3.39, and the mean “after” score is 3.87; the mean difference is 0.54. The paired t-test tests the null hypothesis by testing whether the mean of the difference variable (“difference”) is zero. The paired t-test test statistic is calculated as t = (mean of D) / (sD/√n), where D is the difference between before and after measurements, sD is the standard deviation of these differences, and n is the number of paired observations. Regarding t-test assumptions, the variables are continuous, and the issue of heterogeneity (unequal variances) is moot because this test involves only one variable, D; no Levene’s test statistics are produced. We do test the normality of D and find that it is normally distributed (Shapiro-Wilk = .925, p = .402). Thus, the assumptions are satisfied. We proceed with testing whether the difference between before and after scores is statistically significant. We find that the paired t-test yields a t-test statistic of 2.43, which is significant at the 5 percent level (df = 9, p = .038 < .05).17 Hence, we conclude that the increase between the before and after scores is significant at the 5 percent level.18 One-Sample T-Test Finally, the one-sample t-test tests whether the mean of a single variable is different from a prespecified value (norm). For example, suppose we want to know whether the mean of the before group in Table 12.3 is different from a value of, say, 3.5. Testing against a norm is akin to the purpose of the chi-square goodness-of-fit test described in Chapter 11, but here we are dealing with a continuous variable rather than a categorical one, and we are testing the mean rather than its distribution.
The one-sample t-test assumes that the single variable is continuous and normally distributed. As with the paired t-test, the issue of heterogeneity is moot because there is only one variable. The Shapiro-Wilk test shows that the variable “before” is normal (.917, p = .336). The one-sample t-test statistic for testing against the test value of 3.5 is –0.515 (df = 9, p = .619 > .05). Hence, the mean of 3.39 is not significantly
”
”
Evan M. Berman (Essential Statistics for Public Managers and Policy Analysts)
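Below is a minimal Python/SciPy sketch of the paired-samples and one-sample t-tests described above, including the equivalence noted earlier between the paired test and a one-sample test of the mean difference against zero (the before/after scores are hypothetical, not Table 12.3).

```python
# Paired-samples t-test, and its equivalence to a one-sample t-test on D.
# A minimal sketch with hypothetical before/after scores.
import numpy as np
from scipy import stats

before = np.array([3.0, 3.2, 3.5, 3.1, 3.6, 3.4, 3.8, 3.3, 3.7, 3.5])
after  = np.array([3.4, 3.5, 3.9, 3.3, 4.1, 3.8, 4.2, 3.6, 4.0, 3.9])

# Paired t-test: H0 is that the mean difference is zero.
t_paired, p_paired = stats.ttest_rel(after, before)
print(f"paired t = {t_paired:.3f}, p = {p_paired:.4f}")

# Equivalent one-sample test on the difference variable D against 0.
d = after - before
t_one, p_one = stats.ttest_1samp(d, popmean=0)
print(f"one-sample t on D = {t_one:.3f}, p = {p_one:.4f}")   # identical result

# One-sample test against a prespecified norm, e.g. 3.5.
t_norm, p_norm = stats.ttest_1samp(before, popmean=3.5)
print(f"t vs. 3.5 = {t_norm:.3f}, p = {p_norm:.4f}")
```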
“
The ability to quickly analyze data is vital for a system of countering the laundering of the proceeds of crime, and computerized databases and analytical tools are an important element in achieving this goal. Nevertheless, it is important to keep in mind that electronic databases and software can only facilitate the work of analysts, not replace it.
”
”
International Monetary Fund (Financial Intelligence Units: An Overview)
“
What, then, are the practical steps that pricing managers can take to master Big Data? Companies must recruit a new generation of pricing talent with more of a “trader” profile than an “analyst” one.
”
”
McKinsey Chief Marketing & Sales Officer Forum (Big Data, Analytics, and the Future of Marketing & Sales)
“
Some people are Accommodators; others—like me—are basically Assertive; and the rest are data-loving Analysts.
”
”
Chris Voss (Never Split the Difference: Negotiating as if Your Life Depended on It)
“
The guy with the buzz cut said, “I’m Casey Waterman, FBI.” “Jack Reacher, United States Army.” The guy with the hair said, “John White, CIA.” They all shook hands, and then they lapsed into the same kind of silence Reacher had heard when he stepped in. They had run out of things to say. He sat on a desk near the back of the room. Waterman was ahead of him on the left, and White was ahead of him on the right. Waterman was very still. But watchful. He was passing the time and conserving his energy. He had done so before. He was an experienced agent. No kind of a rookie. And neither was White, despite being different in every other way. White was never still. He was twitching and writhing and wringing his hands, and squinting into space, variably, focusing long, focusing short, sometimes narrowing his eyes and grimacing, looking left, looking right, as if caught in a tortuous sequence of thoughts, with no way out. An analyst, Reacher guessed, after many years in a world of unreliable data and double, triple, and quadruple bluffs. The guy was entitled to look a little agitated. No one spoke. Five
”
”
Lee Child (Night School (Jack Reacher, #21))
“
NASA satellites might have discovered the effect earlier but analysts were blinded by theory; they threw away the data showing very low ozone levels in the Antarctic spring because their data analysis software assumed that the readings must be instrumental error. Beyond
”
”
David Keith (A Case for Climate Engineering (The MIT Press))
“
there is a mismatch between available workers’ existing skill sets and jobs. This mismatch is most acute in data-oriented jobs. Research by MGI and McKinsey’s Business Technology Office suggests that the U.S. is facing a shortage of 140,000 to 190,000 individuals with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of data, and this estimate could easily be off by multiple factors.
”
”
Shawn DuBravac (Digital Destiny: How the New Age of Data Will Transform the Way We Work, Live, and Communicate)
“
By 2018, the United States will experience a shortage of 190,000 skilled data scientists, and 1.5 million managers and analysts capable of reaping actionable insights from the big data deluge.
”
”
Sebastian Gutiérrez (Data Scientists at Work)
“
He was no longer, at least in his mind, the most wanted man in America. He was an analyst, a seer, a prognosticator going over his reams of data, moving their pieces, twisting them, testing them, discounting some, fleshing out others, slowly transforming disjointed intelligence into something that made sense.
”
”
David Baldacci (The Escape (John Puller #3))
“
Hydraulic fracturing has been used safely in over a million wells, resulting in America’s rise as a global energy superpower, growth in energy investments, wages, and new jobs," added Mr. Milito in the statement. Environmental groups have countered that the isolated incidents of contamination confirm their fears about the environmental impacts of hydraulic fracturing. John Noël, of the group Clean Water Action, said in a statement that the report "smashes the myth that there can be oil and gas development without impacts to drinking water." Amy Mall, a senior policy analyst for the Natural Resources Defense Council, said that the EPA study, "while limited, shows fracking can and has impacted drinking water sources in many different ways," according to the Beacon Journal. The EPA report acknowledges that the findings may be due to a lack of data collected, inaccessible information, a scarcity of long-term systematic and base-line studies, and other factors. Bloomberg reported that EPA couldn't come to terms with energy companies including Range Resources Corp. and Chesapeake Energy to conduct water tests near their wells before and after they were fracked, meaning if the agency did find instances of contamination, it was harder to prove that fracking was the cause. "These elements significantly limit EPA’s ability to determine the actual frequency of impacts," the agency said in a fact sheet released with the report.
”
”
Anonymous
“
High-level knowledge of the fundamentals helps creative business analysts see novel formulations.
”
”
Foster Provost (Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking)
“
Amazon could use the data it has about buying behavior to help make these ads much more effective," said Karsten Weide, an analyst at researcher IDC. "Marketers would love to have another viable option beyond Google and Facebook for their advertising.
”
”
Anonymous
“
NSA lawyers even altered some otherwise plain definitions, so that doing this didn’t constitute “collecting” data from American citizens, as that would be illegal: under the new terminology, the NSA was just storing the data; the collecting wouldn’t happen until an analyst went to retrieve it from the files, and that would be done only with proper FISA Court approval. Under
”
”
Fred Kaplan (Dark Territory: The Secret History of Cyber War)
“
Ecommerce data derived from digital experiences, ranging from search-engine keywords and phrases to the frequency of purchases of various customer segments, is most often not normally distributed. Thus, many of the classic and Bayesian statistical methods taught in schools are not immediately applicable to digital ecommerce data. That does not mean that the classic methods you learned in college or business school do not apply to digital data; it means that the best analysts understand this fact.
”
”
Judah Phillips (Ecommerce Analytics: Analyze and Improve the Impact of Your Digital Strategy (FT Press Analytics))
“
I believe that the presence of nuclear weapons in the former USSR played a part as well. At the end of 1991, Ukraine had almost one-fifth of the ground-based warheads in the strategic triad. The total number of strategic weapons there was greater than the total in England and France combined. Data on the distribution of nuclear weapons on the territory of the former Soviet Union are not completely reliable. This is even more evidence of how dangerous the situation was for the country at the end of 1991. See tables 8-6 and 8-7 for (the sometimes conflicting) data provided by informed analysts who have studied the history of the USSR’s nuclear endeavors.
”
”
Yegor Gaidar (Collapse of an Empire: Lessons for Modern Russia)
“
Other than when dealing with exception-unsafe legacy code (which we'll discuss later in this Item), offering no exception safety guarantee should be an option only if your crack team of requirements analysts has identified a need for your application to leak resources and run with corrupt data structures. As
”
”
Scott Meyers (Effective C++: 55 Specific Ways to Improve Your Programs and Designs)
“
Market research plays a critical role in understanding consumer behavior, market trends, and competition in any industry. Market research surveys are essential for businesses looking to stay ahead of the competition and make well-informed decisions in the context of Myanmar, a rapidly changing market with growing opportunities and challenges. This article examines the importance of market research surveys in Myanmar, presents insights from a recent study conducted by AMT Market Research, and offers recommendations for businesses operating in this dynamic market environment.
# Introduction to Market Research in Myanmar
When it comes to understanding consumer behavior, preferences, and trends, market research plays a critical role. In Myanmar, a country with a rapidly developing market landscape, conducting thorough market research is essential for businesses to make informed decisions. By gathering valuable insights through surveys and data analysis, businesses can tailor their products and services to meet the specific needs of Myanmar's diverse consumer base.
## Understanding the Market Landscape
Myanmar's market landscape is dynamic and diverse, with a growing economy and an increasingly educated population. Businesses must keep up with the latest market trends and consumer preferences in order to stay ahead of the curve as the country continues to open up to foreign investment and trade. Conducting market research surveys is a practical way to gain a deeper understanding of the behavior and needs of Myanmar's consumers, helping businesses identify opportunities for growth and development.
# The Importance of Conducting Market Research Surveys
Market research surveys are valuable tools for businesses looking to gain a competitive edge in Myanmar's bustling market. By gathering data directly from consumers through surveys, businesses can collect insights that inform their strategic decision-making. From identifying emerging trends to understanding customer satisfaction levels, market research surveys provide businesses with meaningful information that can shape their marketing strategies and product development initiatives.
## Benefits of Market Research for Businesses
The benefits of conducting market research surveys are substantial. By understanding consumer preferences and behavior, businesses can tailor their products and services to meet the needs of their target audience effectively. In addition, market research surveys help businesses identify new market opportunities, assess customer satisfaction levels, and evaluate the effectiveness of their marketing campaigns. Ultimately, market research enables businesses to make data-driven decisions that drive growth and success in Myanmar's competitive market environment.
# Overview of AMT Market Research
AMT Market Research is a leading market research firm in Myanmar, known for its innovative research methodologies and insightful analysis. AMT Market Research has a team of experienced researchers and analysts who specialize in providing tailored research solutions to help businesses navigate the complexities of Myanmar's market landscape.
## About AMT Market Research
AMT Market Research is committed to delivering high-quality research services that provide meaningful insights to clients across various industries. From market segmentation and consumer behavior analysis to competitor profiling and trend forecasting, AMT Market Research offers a comprehensive
”
”
market research survey in Myanmar
“
Instead, I bashed “data analyst” and “Parker” into Google. A box appeared on my screen and demanded to know if I was a robot or not. “How many times do I need to tell you, Google, I am not a robot,” I snapped. The prompt asked me to pick all the images of boats I could see. “Could a robot do this?” I said dramatically, and quickly clicked on the four images of boats. A message popped up: Please try again.
”
”
L.M. Chilton (Swiped)
“
nudge.” We are seduced by pundits and data analysts, soothsayers who are often wrong, but rarely uncertain. When given the choice between complex uncertainty and comforting—but wrong—certainty, we too often choose comfort. Perhaps the world isn’t so simple. Can we ever understand a world so altered by apparent flukes?
”
”
Brian Klaas (Fluke: Chance, Chaos, and Why Everything We Do Matters)
“
As a reader and analyst of data myself, I get a joyful thrill every time I zoom out on the English language and realize that we’re somewhere in the middle of its story, not at the beginning or end. I don’t know how we’ll be writing in the twenty-second century, but I feel a responsibility to help its linguists gain a broad cross-section of the language of the twenty-first by not lingering overlong in the twentieth. To that end, I’ve chosen to lowercase “internet” and social acronyms like “lol” and “omg” and to write “email” rather than “e-mail,” and when I’ve needed to make a decision on other spelling choices, I’ve looked up which ones are more common in the Corpus of Global Web-Based English and tweets by ordinary people rather than which ones are favored by usage manuals, which has led me to close many compound words.
”
”
Gretchen McCulloch (Because Internet: Understanding the New Rules of Language)
“
Another common issue is the lack of interdisciplinarity in the transformation. In our experience, the highest impact is the result of multi-lever end-to-end process automation – not small, siloed implementations, focused on one single technology lever. To achieve this, management should advocate for getting the right talents from across the different parts of an organization to work together (e.g., data scientists, developers, business analysts). Interdisciplinarity is also about avoiding limiting the transformation to the implementation of one single technology lever (e.g., RPA), and about implementing IA on end-to-end processes instead of only a few process tasks. By combining talents and technology levers and targeting end-to-end processes, the organization will create synergies, build economies of scale, and remove potential bottlenecks. Organizations failing to achieve this are not able to scale their IA transformation.
”
”
Pascal Bornet (INTELLIGENT AUTOMATION: Learn how to harness Artificial Intelligence to boost business & make our world more human)
“
The talent required within the CoE is wide and ranges from business and operations excellence to risk and IT departments. According to McKinsey’s survey, the CoE of top-performing companies includes a large variety of profiles such as delivery managers, data scientists, data engineers, workflow integrators, system architects, developers, and, most critically, translators and business analysts.152 A
”
”
Pascal Bornet (INTELLIGENT AUTOMATION: Learn how to harness Artificial Intelligence to boost business & make our world more human)
“
Putin demanded that St. Petersburg companies register with the Committee for External Relations to turn over data on their finances. Working with the tax inspectorate, Putin’s analysts examined firms’ tax payment records.
”
”
Chris Miller (Putinomics: Power and Money in Resurgent Russia)
“
Often, no matter how careful you tried to be, sheer exhaustion would lead to errors that weren’t caught until it was too late. Sometimes it was due to what we called the F9 mistake. Back then, computers were very slow, so you didn’t want to wait for the spreadsheet program to recalculate automatically every time you made a change. You would instead turn off that feature, but then you needed to be careful to remember to hit F9 at the end, which would trigger the recalculation of data throughout the model. There were always stories about analysts who made a bunch of changes and then forgot to hit F9, printing the books with faulty numbers. They might realize during the client presentation, or perhaps after the meeting, that the wrong data had been utilized. The models were so complicated that usually no one would notice, but people were making big decisions based on erroneous information. How many deals were done, we wondered, or people laid off because some sleep-deprived analyst got a model wrong? Steve forgot to hit F9; ten thousand people got fired.
”
”
Christopher Varelas (How Money Became Dangerous: The Inside Story of Our Turbulent Relationship with Modern Finance)
“
HouJeng provides expert betting picks and tips for horse races around Hong Kong using artificial intelligence. We are a collective of data analysts and machine learning enthusiasts. We enjoy the complexities of data forecasting and data analysis. Boost your horse racing betting with our horse racing betting tips for today.
”
”
HouJeng
“
The caucus room was essentially a makeshift trading floor. Harrison and his team developed a point of view on the large, multiyear trade to which they were committing Koch Industries. They evaluated the multiyear price that Koch should pay for the IBU workers’ labor, and they treated this trade exactly as Koch treated a multiyear hedge on oil prices. They sucked in data from diverse sources like federal labor statistics, private financial services, and even other labor unions. Backed by a team of analysts with spreadsheets, they analyzed the market and figured out their view on what the true price of the labor should be. This was a technique that Koch Industries had used since at least the 1990s. Randy Pohlman, the former Koch human resources executive, said Koch’s team in the caucus room used spreadsheets to tweak and tailor the numbers even as negotiators worked next door.
”
”
Christopher Leonard (Kochland: The Secret History of Koch Industries and Corporate Power in America)
“
Given the recent remarkable advances in artificial intelligence, scouting will probably involve “algorithmic warfare,” with competing AI systems plowing through vast amounts of data to identify patterns of enemy behavior that might elude human analysts. Identifying enemy operational tendencies may also aid commanders in employing their forces more effectively, similar to the way the introduction of operations research aided the allies in identifying effective convoy operations during the Battle of the Atlantic in World War II.30 AI could potentially assist efforts to develop malware, which could be used to erase or corrupt enemy scouting information, including the enemy’s AI algorithms themselves. If these efforts are successful, enemy commanders may lose confidence in their scouts, producing a “mission kill,” in which much of the enemy’s scouting force continues to operate but where its product is suspect.
”
”
Andrew F. Krepinevich (The Origins of Victory: How Disruptive Military Innovation Determines the Fates of Great Powers)
“
fundamental analysts focus their attention on company finances and economic data about the industries in which the stocks trade. They are concerned with factors like corporate earnings reports, profit margins, unemployment rates, and gross domestic product (GDP) growth rates. They examine these economic factors to determine how they will affect the demand and supply of a particular stock.
”
”
Andrew Elder (Technical Analysis for Beginners: Candlestick Trading, Charting, and Technical Analysis to Make Money with Financial Markets Zero Trading Experience Required (Day Trading Book 3))
“
fundamental analysts turn to earnings reports and other data released by companies that provide some indication of how well or poorly they are performing. The fundamental analyst looks at how the company is doing as a whole and tries to get an overall grasp of how the market reacts to these reports.
”
”
Andrew Elder (Technical Analysis for Beginners: Candlestick Trading, Charting, and Technical Analysis to Make Money with Financial Markets Zero Trading Experience Required (Day Trading Book 3))
“
Technical analysis is more concerned with the price movements of a stock or an index by examining historical records of trading activity. A technical analyst looks at past data to predict future price movements. They believe that history tends to repeat itself in the stock market and that past performance is the best indicator of what will happen in the future.
”
”
Andrew Elder (Technical Analysis for Beginners: Candlestick Trading, Charting, and Technical Analysis to Make Money with Financial Markets Zero Trading Experience Required (Day Trading Book 3))
“
It appeared that OP-20-G analysts were nourishing King’s anxieties by sending him cherry-picked data that could be interpreted as pointing to such attacks. The men who had King’s ear were unduly alarmist, and their impulsive theories might incite the fleet to chase its own tail.
”
”
Ian W. Toll (Pacific Crucible: War at Sea in the Pacific, 1941–1942)
“
context in which data is collected. For this reason, we cannot emphasize enough the importance of collaboration between hospital staff and research analysts. Some examples of common issues to consider when
”
”
Mit Critical Data (Secondary Analysis of Electronic Health Records)
“
Thinking about the projector as a performance tool, a display mechanism, a playback machine, a decompressor of content, an image-enlarger, a sound amplifier, a recording device, and an audiovisual interface carries far richer interpretive possibilities than thinking about it as the poor cousin of the movie theater. It also helps us to explain more about why film has long mattered across many realms of cultural and institutional activity. Critically shifting how we conceptualize what a projector is and does opens a window to a wider array of other media devices that performed the work of storing, decompressing, and yielding content, as well as interfacing with users, viewers, and analysts. Drawing on innovations in precision mechanics, chemistry, optics, and electrical and eventually acoustic and magnetic engineering, projectors catalyzed alternate ways of presenting recorded images and sounds, converting celluloid and its otherwise indecipherable inscriptions into visible and audible content, usable data, productive lessons, and persuasive messaging. In doing so they shaped performance and presentation for audiences of
”
”
Haidee Wasson (Everyday Movies: Portable Film Projectors and the Transformation of American Culture)
“
Early in the new millennium it became apparent to anyone with eyes to see that we had entered an informational order unprecedented in the experience of the human race. I can quantify that last statement. Several of us—analysts of events—were transfixed by the magnitude of the new information landscape, and wondered whether anyone had thought to measure it. My friend and colleague, Tony Olcott, came upon (on the web, of course) a study conducted by some very clever researchers at the University of California, Berkeley. In brief, these clever people sought to measure, in data bits, the amount of information produced in 2001 and 2002, and compare the result with the information accumulated from earlier times. Their findings were astonishing. More information was generated in 2001 than in all the previous existence of our species on earth. In fact, 2001 doubled the previous total. And 2002 doubled the amount present in 2001, adding around 23 “exabytes” of new information—roughly the equivalent of 140,000 Library of Congress collections.1 Growth in information had been historically slow and additive. It was now exponential. Poetic minds have tried to conjure a fitting metaphor for this strange transformation. Explosion conveys the violent suddenness of the change. Overload speaks to our dazed mental reaction. Then there are the trivially obvious flood and the most unattractive firehose. But a glimpse at the chart above should suggest to us an apt metaphor. It’s a stupendous wave: a tsunami.
”
”
Martin Gurri (The Revolt of the Public and the Crisis of Authority in the New Millennium)
“
Trademark
A trademark is essentially a form of intellectual property consisting of designs, logos, and marks. Organizations use various designs, logos, or words to distinguish their products and services from others. Marks that help distinguish a product or service from others and help customers identify the brand, its quality, and even the source of the product are known as trademarks.
Unlike patents, a trademark is registered for 10 years, and thereafter it can be renewed for an additional 10 years after payment of the renewal fees.
Trademark Objection
After a trademark is filed for registration, an examiner/registrar or a third party can raise a trademark objection. As per Sections 9 (Absolute Grounds of Refusal) and 11 (Relative Grounds of Refusal) of the Act, an objection can be raised on two grounds:
The application contains incorrect information, or
Similar or identical trademarks already exist.
Whenever the trademark registrar raises an objection, the applicant has an opportunity to send a written reply, along with solid evidence, facts, and reasons why the mark should be assigned to them, within 30 days of the objection.
If the examiner/registrar finds the reply adequate, it addresses all of the concerns raised in the examination report, and there is no conflict, then he may give the applicant permission to publish the application in the Trademark Journal before registration.
How to respond to an objection
A trademark examination report is published on the Trademark Office website along with the details of the trademark application, and the applicant or an agent has the opportunity to send a written reply, which is known as a trademark objection reply.
The reply can be submitted as a “Reply to the examination report” either online, or it can be submitted by post or in person along with supporting documents or an affidavit.
Once the application is filed, the applicant should be given a notice about the objection and the grounds for the objection. Further requirements are:
There should be a counterstatement to the application,
It should be filed within 2 months of the application,
If the applicant fails to file a counterstatement within that time, the status of the application will be marked as abandoned.
After the counterstatement is filed, the registrar will call the applicant for a hearing. If the ruling is in the applicant's favor, the trademark will be registered, and if the reply is not satisfactory, the application for registration will be rejected.
Trademark Objection Reply Fees
Although I have gone through various sites, finding a well-drafted formal reply is quite difficult. But Professional Utilities provides a proper reply through experts, and the trademark objection reply fees are really affordable. They provide the service for just 1,499/- only.
”
”
Shweta Sharma
“
Total retail sales rose 0.7 percent in November, as holiday shopping began, and that came despite a sharp tumble in gasoline prices that reduced the dollar value of sales at gas stations by 0.8 percent. Analysts had expected a rise of only 0.4 percent. Read narrowly, the results show that some survey data suggesting weak post-Thanksgiving Black Friday sales was misleading at best; retail trade groups said at the time that they believed consumers spread their spending more evenly through November than they have in the past, and that appears to hold up.
”
”
Anonymous
“
I once had a foreign exchange trader who worked for me who was an unabashed chartist. He truly believed that all the information you needed was reflected in the past history of a currency. Now it's true there can be less to consider in trading currencies than individual equities, since at least for developed country currencies it's typically not necessary to pore over their financial statements every quarter. And in my experience, currencies do exhibit sustainable trends more reliably than, say, bonds or commodities. Imbalances caused by, for example, interest rate differentials that favor one currency over another (by making it more profitable to invest in the higher-yielding one) can persist for years. Of course, another appeal of charting can be that it provides a convenient excuse to avoid having to analyze financial statements or other fundamental data. Technical analysts take their work seriously and apply themselves to it diligently, but it's also possible for a part-time technician to do his market analysis in ten minutes over coffee and a bagel. This can create the false illusion of being a very efficient worker. The FX trader I mentioned was quite happy to engage in an experiment whereby he did the trades recommended by our in-house market technician. Both shared the same commitment to charts as an under-appreciated path to market success, a belief clearly at odds with the in-house technician's avoidance of trading any actual positions so as to provide empirical proof of his insights with trading profits. When challenged, he invariably countered that managing trading positions would challenge his objectivity, as if holding a losing position would induce him to continue recommending it in spite of the chart's contrary insight. But then, why hold a losing position if it's not what the chart said? I always found debating such tortured logic a brief but entertaining use of time when lining up to get lunch in the trader's cafeteria. To the surprise of my FX trader if not to me, the technical analysis trading account was unprofitable. In explaining the result, my Kool-Aid drinking trader even accepted partial responsibility for at times misinterpreting the very information he was analyzing. It was along the lines of that he ought to have recognized the type of pattern that was evolving but stupidly interpreted the wrong shape. It was almost as if the results were not the result of the faulty religion but of the less than completely faithful practice of one of its adherents. So what use to a profit-oriented trading room is a fully committed chartist who can't be trusted even to follow the charts? At this stage I must confess that we had found ourselves in this position as a last-ditch effort on my part to salvage some profitability out of a trader I'd hired who had to this point been consistently losing money. His own market views expressed in the form of trading positions had been singularly unprofitable, so all that remained was to see how he did with somebody else's views. The experiment wasn't just intended to provide a “live ammunition” record of our in-house technician's market insights, it was my last best effort to prove that my recent hiring decision hadn't been a bad one. Sadly, his failure confirmed my earlier one and I had to fire him. All was not lost though, because he was able to transfer his unsuccessful experience as a proprietary trader into a new business advising clients on their hedge fund investments.
”
”
Simon A. Lack (Wall Street Potholes: Insights from Top Money Managers on Avoiding Dangerous Products)
“
Finally, to determine the specific service running on the port, we will send garbage data and read the banner results sent back by the specific application.
”
”
T.J. O'Connor (Violent Python: A Cookbook for Hackers, Forensic Analysts, Penetration Testers and Security Engineers)
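The banner-grabbing step the excerpt describes can be sketched in a few lines of Python sockets. This is a minimal illustration written for this page, not the book's own recipe; the host and port are placeholders, and such probes should only be run against systems you are authorized to test.

```python
# Minimal banner grab: connect, send throwaway data, read what the service returns.
# Hypothetical target values; only scan hosts you have permission to test.
import socket

def grab_banner(host: str, port: int, timeout: float = 2.0) -> str:
    """Return whatever banner the service sends back, or '' on failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            sock.send(b"hello\r\n")          # garbage data to provoke a response
            return sock.recv(1024).decode(errors="replace").strip()
    except OSError:                          # covers timeouts and refused connections
        return ""

if __name__ == "__main__":
    banner = grab_banner("192.168.1.10", 21)   # placeholder host and port
    print(banner or "no banner received")
```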