Getting ahead in the Lucrative Field of Data Massaging

Evan Warfel has an excellent comment on a post of Andrew Gelman's. Reproduced in full:
Perhaps we are teaching statistics backwards. Instead of teaching students to try and come up with the correct result, we could teach what it feels like to rationalize one’s way through to non-objectivity.

A final exam question might go: This dataset consists of 5 completely uncorrelated variables — I’ve labeled the columns as ‘weight of cat’, ‘probability of attrition’, ‘color of cat [in RGB]’, ‘current age of subject’ and ‘SAT verbal score’. Find a way to make 3 statistically significant correlations and one non-significant correlation. You get an extra point for each spurious t-test you can come up with. The catch is that your entire analysis has to form part of a coherent story. Bonus points go to the 5 most concise answers.
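The exam question is easier than it sounds, which is rather the point. As a rough sketch: five independent columns give ten pairwise correlations, and if the ten tests were independent, the chance of at least one false positive at the 5% level would be 1 - 0.95^10, or about 0.40. The simulation below checks this on pure noise. The sample size, the number of simulated datasets, and the use of the Fisher z approximation (instead of an exact t-test) are my assumptions, chosen to keep the example short.

```python
import math
import random
from itertools import combinations
from statistics import NormalDist


def pearson_r(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value via the Fisher z approximation (an assumption;
    an exact t-test would give very similar answers at this sample size)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def share_with_a_hit(n_datasets=1000, n_rows=50, n_vars=5, alpha=0.05, seed=1):
    """Fraction of pure-noise datasets with at least one 'significant' pair."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_datasets):
        # Every column is independent noise, so any true correlation is zero.
        cols = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_vars)]
        if any(approx_p_value(pearson_r(a, b), n_rows) < alpha
               for a, b in combinations(cols, 2)):
            hits += 1
    return hits / n_datasets


if __name__ == "__main__":
    print(share_with_a_hit())
```

And that is before the student starts trying subgroups, transformations, or outlier rules, each of which adds further chances for a spurious hit.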


Predictions Concerning Migration to Germany

1. The current love-fest, reminiscent of the opening of the Berlin Wall, will soon end and something in the range between disillusionment and xenophobia will set in. Like the post-reunification hangover, really, only on steroids, coke and speed.

2. Family reunification legislation (Familienzusammenführung) will be severely tightened within the next three years.


Playlists by Year: A Tape Side's Worth of 1961

The greatest songs (and non-song tracks) from 1961, as far as I can tell.


Robin Hanson's Final Words on Signaling

"Falsifiability is just not a very useful concept in social science. Really."

A Two-Step Model of Class-typical Behaviour

Let's start with the example: In the U.S., high-SES people used to smoke more than low-SES people until about 1965. Then the lines crossed, once, and they never crossed again. These days, there are many high-SES people that you don't have to tell about health risks: to them, smoking is prole. And who wants to be prole?

More generally, there are many behaviours that low-SES people show more frequently than high-SES people, and vice versa. Why? Let me propose a two-step model. First, there is some initial reason why a certain behaviour is shown more often by low-SES people. Second, the behaviour becomes associated with being low SES, and high-SES people reduce it even further as a result.

Smoking is, I think, a good example. Initially, high-SES people may have had access to better information, or have been better at processing the information, or had more self-control, or have put a higher value on health, or what have you, or all of the above. This created an initial smoking gap. This helped associate smoking with being prole. This, in turn, caused people who don't want to be seen as prole to smoke less.

In some cases, the initial difference could simply be due to chance.

The model implies that SES differences in smoking were easier to explain in terms of the psychological factors mentioned above (more self-control, etc.) in 1970 than today. Generalizing this is left as an exercise to the reader.


How to Keep Your Man Happy

1. Have sex with him when he wants to.

2. Don't question his respectability.

It seems to me these are the two rules that are true for almost every man, and at the same time are specific to keeping your man happy, rather than keeping your spouse happy.


Negative Externalities

It is not from the benevolence of the butcher, the brewer, or the baker that we expect our dinner, but from their regard to their own interest.
Adam Smith, The Wealth of Nations

Overjoyed, yet slightly appalled that Richard would even think of, let alone do, such a thing, the man gave him ten thousand dollars for the contract, and a second ten thousand dollars for the incredible suffering the mark had experienced.

"You did a good job," he said. Richard liked to please his customers; that was how his business had grown over the years.
 Philip Carlo, The Ice Man: Confessions of a Mafia Contract Killer


Yeah, sort of.

(It worked on reload.)

Economics, Sociology, Extrinsic and Intrinsic Motivation, and Variance, or, Advice for the Tribal Social Scientist

Someone whose name I've forgotten said that economics is all about how people act rationally and sociology is all about how they don't. That's catchy, but not very helpful, because it makes you think about the exact meaning of the term "rational", and before you know it, you're writing a book. Let me propose instead (and of course I'm using the broad brush here) that economics is about how people are driven by extrinsic motivation and sociology is about how they're driven by intrinsic motivation.

Of course, people are driven by both, so a good social scientist should consider both. But there's more to be said about the two. Extrinsic and intrinsic motivation are functional equivalents, and the less variance there is in one of the two, the more variance in your dependent variable the other is going to explain.

That's a little abstract, so here's an example. For the purposes of the example, please accept the simplification that the two types of motivation are completely independent of each other.

Consider a company unit in which variance in intrinsic work motivation is low, and the mean is also low - that is, everybody's a lazy bastard. Then variance in extrinsic motivation will explain a lot of variance in behaviour. That is, those who have a higher incentive, such as financial rewards, to work, will work harder, while those that have little incentive will work little (high variance, middling mean).

Now consider a company unit in which variance in intrinsic work motivation is low, and the mean is high. Here, everyone will work hard (low variance, high mean), and differences in incentives will have little effect.

Next, start by thinking about extrinsic motivation. If extrinsic motivation's variance is low with a low mean (everyone gets minimum wage, regardless), then how hard people work will be driven by their intrinsic motivation (high variance, middling mean).

Finally, if extrinsic motivation's variance is low with a high mean, everyone will try hard (low variance, high mean).

What does that leave us with? First, if variance in one independent variable is lower, then variance in the other independent variable will explain more of the variance in the dependent variable. Admittedly, that's a mathematical necessity, and not a new insight. More specifically, it means that economists who want to stick it to the guys in the other building had better select topics where variance in people's intrinsic motivation is low, while sociologists who want to teach those arrogant dicks in the Adam Smith building a lesson should select research areas in which variance in people's extrinsic motivation is low.
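For those who like to see the mathematical necessity in action, here is a minimal sketch. It assumes, more crudely than the company examples above, that effort is simply the sum of two independent, normally distributed motivations; the function name and all numbers are made up for illustration. Under that assumption, the share of variance in effort explained by extrinsic motivation is its variance divided by the sum of both variances.

```python
import math
import random


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy / math.sqrt(sxx * syy)) ** 2


def share_explained_by_extrinsic(sd_intrinsic, sd_extrinsic, n=20000, seed=7):
    """Simulate effort = intrinsic + extrinsic (an assumed additive model)
    and return the share of effort variance explained by extrinsic motivation."""
    rng = random.Random(seed)
    intrinsic = [rng.gauss(0, sd_intrinsic) for _ in range(n)]
    extrinsic = [rng.gauss(0, sd_extrinsic) for _ in range(n)]
    effort = [i + e for i, e in zip(intrinsic, extrinsic)]
    return r_squared(extrinsic, effort)


if __name__ == "__main__":
    # Low variance in intrinsic motivation: the economists' home turf.
    print(share_explained_by_extrinsic(sd_intrinsic=0.2, sd_extrinsic=1.0))
    # Equal variances: an even split.
    print(share_explained_by_extrinsic(sd_intrinsic=1.0, sd_extrinsic=1.0))
    # Low variance in extrinsic motivation: the sociologists' home turf.
    print(share_explained_by_extrinsic(sd_intrinsic=1.0, sd_extrinsic=0.2))
```

The additive model ignores the ceiling effects in the examples (everyone working hard once either motivation is high), so take it as an illustration of the variance point only.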

And if you own a company, you ought to put a lot of time into selecting people with high intrinsic work motivation, and also think hard about how to tailor rewards to employees' behaviour. You'll want to do both, because you'll be perfect at neither.


Playlists by Year: A Tape Side's Worth of 1958

The greatest songs and other tracks, according to me, from 1958:

Virtual Reality, or, What's So Great about Knausgaard?

I've only read the first two volumes so far, but here's the best answer I've yet seen to the question Why is Knausgaard so great - when really, with all the detail and mundane plotlines, he should be boring:
The answer lies not in Knausgaard’s depth of revelation so much as the intensity of focus he brings to the subject of his life. He seems to punch a hole in the wall between the writer and reader, breaking through to a form of micro-realism and emotional authenticity that makes other novels seem contrived, “made up”, irrelevant. As [Zadie] Smith put it: “You live his life with him. You don’t simply ‘identify’ with the character, effectively you ‘become’ them.”
There's so much talk about literature's ability to put the reader in someone else's head - this is often portrayed as the feature that most differentiates writing from other forms of art, and Steven Pinker even singled out the increase in putting-yourself-in-other-people's-heads caused by the invention of the printing press as the trigger that started the long-term decline in violence ca. 1500-2000. I've read a fair bit of fiction and some memoirs, but have never seen anyone doing it like Knausgaard does.


Genetics, Human Capital, and the Thomas Theorem

In this short presentation on "Genetics and Society" (video), Gregory Cochran points out an obvious incompatibility between human capital theory and empirical findings in behaviour genetics: Human capital theory proceeds as though differences in human capital were solely the product of environmental influences, and especially decisions made by parents, but this is known to be false. Cochran goes on to say this has an impact on a point made in human capital theory concerning the quality-quantity tradeoff: The standard view is that you can have many kids and low investment per kid, leading to relatively low human capital in each kid, or you can have few kids, invest heavily in them, and expect them to exhibit high human capital. To the extent that human capital is influenced by genetics, and to the extent that it is not influenced by parents, this tradeoff does not exist. For example, IQ shows practically no response to differences in parental behaviour, hence your kids' expected IQ is independent of your investments, hence independent of the number of kids you have.

But that's a normative point. Empirically, how much people plan to invest in their kids and how many kids they hence choose to have, should be influenced not by the truth itself, but by what people believe to be true. Belief in genetic influences on people's characteristics decreased ca. 1920-1950 and is now low. While research results from the 1970s onwards have shown the popular view to be wrong, these results are not widely known and believed. Hence, decreases in the number of kids people have might still in part be explained by the above-mentioned aspect of human capital theory, if it is combined with the Thomas theorem: "If men define situations as real, they are real in their consequences."


The Greatest Tracks 2010-2014, Part 3

And, finally, the cream of the crop.


The Best Tracks from the First Half of the 2010s, Part 2

Installment 2/3 of the greatest songs from the first half of the 10s. Enjoy!


Playlist: The Greatest Songs 2010-2014, Part 1

Hey, it's already been half a decade! Here's 50 choice tracks from it. Well, actually, here's the first bunch of 20. Next installment tomorrow.


Does Altruism Exist? Against the "Warm Glow" Argument

Models of human decision-making are just that: models, not the real thing. Another way to put this is to say that models are wrong. As the famous saying goes, all models are wrong, but some are useful. This leads to the question, useful for what? That is, just because a model is useful in one domain, doesn't mean it's useful in another domain.

Case in point: economists. Traditionally, they have been working with a model of rational, egoistic decision-makers. Pretty much everybody knows that model is wrong because people sometimes act altruistically, but the model is useful for many purposes. The trouble starts when people forget that it's just a model that is supposed to be good for some purposes (and not others) and start to defend the model as though it were an entirely accurate description - a view that obviously does not conform to the empirical evidence.

Bryan Caplan is not one of those people. He points out that altruism is real. Anticipating counter-arguments, he adds:
Sure, true believers in ubiquitous selfishness can grasp at straws to protect their dogma.  Perhaps people donate blood for the free cookie, join the army because they might run for office one day, or give to charity in order to make business connections.  Or maybe millions of average joes are clueless enough to believe that the blood supply, the safety of the free world, and the availability of charity hinge on whatever they personally choose to do. 

Anything is possible, but that doesn't mean that anything is plausible.  [...] Genuine altruism is all around us.  Benevolence doesn't explain why bakers bake bread for paying customers, but it does explain why blood donors give blood to strangers for free.
Naturally, this prompts his readers to double down on the altruism-is-really-egoism stuff. For example, his reader Caliban Darklock writes:
I suggest that if there exists an incentive, the activity is not altruistic.

If I give a person $10 for drugs, and then I take the drugs and they give me an endorphin rush, that is not altruistic.

If I give a person $10 because it makes me feel good about myself, which gives me an endorphin rush, how is that altruistic?

I traded $10 for an endorphin rush either way. What is the rational distinction between them?
This is generally known as the "warm glow" argument: If you give money to charity, for example, you get a positive feeling ("warm glow") in return. A counterargument to this comes from Jon Elster (cited from memory): If you could raise your future utility by taking a pill that erases all your altruistic motivations, would you do it? Probably not, because you think it would induce you to do things that are morally wrong.

Another counterargument that I've just thought up: If, for example, giving a person $10 makes you feel good about yourself, this already presupposes you're an altruist. If you weren't, it wouldn't make you feel good about yourself. That is, the argument presupposes what it tries to disprove. I guess philosophers have a name for this; I don't.


Cops Don't Shoot People, Guns Do

I'm a fan of the right to keep and bear arms. But I prefer unarmed police and restricted gun rights to strong gun rights combined with a police force that regularly shoots civilians 'by accident'.
This is in the context of a discussion that uses the U.S. as an example of a country where cops bear arms as a default and New Zealand as an example of a country where they don't. The danger of police carrying guns is readily apparent given recent events and discussions about them in the U.S.: if they have guns, cops might use them all too often. I'm guessing a comparison of death by cop statistics between New Zealand and the U.S. would support that view.

But there's another important variable: the availability of guns to citizens. Apparently (based on information in the thread I link to above), it is pretty limited in NZ, whereas in some U.S. states, any Tom, Dick and Harry can buy a gun. Let me submit the theory that this is what really counts. I'm basing this view on a third data point: Germany. Here, cops routinely carry guns. If you play your music too loud at 10.01 p.m., the cops you'll find knocking on your door will be carrying fully loaded pistols. And yet, in 2011, police fired only 85 bullets while on duty (presumably not counting training), of which 49 were warning shots and 36 aimed at people; 15 people were injured and 6 killed. The numbers for 2010 were 96, 59, 37, 17 and 7, respectively.

Let me wildly generalize from that small heap of data and assumptions: When the probability is high that the other person has a gun, police will be quick to shoot. Part of this is split-second rational(ish) decision making, but there is also a wider institutional context in which this occurs - such as police guidelines about when to shoot and where to aim. The way to reduce police killings of citizens is hence to make it hard for citizens to bear arms.

Playlists by Year: A Tape Side's Worth of 1956

Merry Afterchristmas everybody!


Playlists by Year: A Tape Side's Worth of 1955

The greatest songs (including instrumentals) from that year, as far as I can tell:


The Thinker's Advantage

Men are often startled when, without any warning, their dearly beloved suddenly asks “What are you thinking about right now?”

Naturally, the last thing a man should give is a truthful answer. Endless trouble will ensue if the man innocently replies: “Having sex with your best friend”. Therefore, in the unaccustomed role of agony uncle, I would suggest that men prepare a response in advance, and trot it out when required.
Good advice. Although, in honesty, I can remember only one time when I was the recipient of that clichéd question. To which I answered, truthfully, "I was thinking, 'Should I ever get rich, it would be nice to have a separate room to put a pool table in'". That taught her, I guess.

What happens to me more often is that people think I'm looking at something specific, when actually I'm just staring into undefined space, usually reflecting on what was just said. (I'm not particularly quick.) With some delight I've noticed that a colleague of mine has memorized this as a characteristic of mine after I'd repeatedly explained to her that, no I wasn't looking at her shoes, I was just thinking, and my eyeballs have to point somewhere. I know she's memorized this as she recently started saying something about how I was probably hungry, the way I was looking at her meal, oh, no, wait, I was probably just thinking, right? That'll come in handy the next time she starts thinking I'm staring at her tits, when I'm actually, you know, staring at her tits. They're lovely, and I can't help it.


I Get the Impression That rebuy's Recommendation Algorithm Still Needs Some Work

I've never read anything by Jojo Moyes, but I suspect I wouldn't like it. I like to judge books by their covers, you see. After all, covers aren't assigned at random.

Florian Illies, Shamelessly Jumping on Angrist & Pischke's Train

Good books both.


Unromantic Advice for Women

The less attractive you are, compared to other women of your age, the earlier you should look to getting married, assuming you're so inclined at all.


Playlists by Year: A Tape Side's Worth of 1954

Never mind the best-of-the-decade-type lists. Starting today, we'll go year by year. Each playlist will be as long as one standard tape's side. That is, no longer than 45 minutes.

The first is 1954, for the simple reason that it is the first year for which I could get a good 45 minutes together; it is also the year that rock'n'roll broke.

There is no regular schedule for the release of new yearly lists, but I guess it's going to be about once a month.

And here's the first list:


Biological Reality of Race? What Does It Even Mean? (Also: Free access to Sage journals)

Via Dan Hirschman at Scatterplot comes a debate in Sociological Theory about the nature of race: is it social and/or biological? The new contributions consist of three critical reactions to an article in the 2012 volume of the same journal by Jiannbin Lee Shiao, Thomas Bode, Amber Beyer and Daniel Selvig called "The Genomic Challenge to the Social Construction of Race", and a rejoinder by Shiao. 

The topic isn't new, and the sub-exchange between Shiao in one corner and Daniel Martinez HoSang in the other confirms something I've long been thinking about this.

As you may know, variants of cluster analysis can be used to group individuals' genomes on the basis of similarities and dissimilarities, and it has been shown that the resulting clusters correspond to racial categories, as measured by self-identification, for example. One of the two main arguments in the initial Shiao et al. paper is that this clearly shows that the view that race has no biological basis, held by so many sociologists, is wrong.

HoSang's article ends in an attempt at character assassination that stops just short of holding Shiao et al. personally responsible for the gas chambers in Auschwitz, but the earlier portions actually have serious content. HoSang voices misgivings about the validity of the cluster analyses and their interpretation by Shiao et al. and others, but then goes on to say (p. 233):
And even if one accepts the (contested) finding that self-identified race or ethnicity correlates with population structure, this finding does not justify a conclusion that “race” (or clinal class) has a biological basis. At the most quotidian level, the findings suggest that a statistical analysis of genetic ancestry informative markers of a population in the United States that self-identifies as “black” is likely to bear a relationship to an analysis of populations sampled in some region of sub-Saharan Africa. And a population that self-identifies as Chinese is likely to be statistically related with a population in China (Dupré 2008). That a new statistical technique has validated a high probability of such histories of migration is hardly revelatory; it does not establish a biological basis of race.
But Shiao et al. clearly think just that: These findings show that race has a biological basis.

I suggest that people who wish to have this debate take a step back and start by reaching an agreement on the following:

1. What does it mean to say, "Race has a biological basis"? What does it mean to say "Race is a biologically meaningful concept"? Are the two the same?

2. What evidence, if it existed, would show that race has a biological basis/is a biologically meaningful concept? What evidence, if it existed, would refute those claims?

If you don't do that, you'll debate ad infinitum.

Added: Along similar lines, Fabio Rojas comments.


By the way, you can download all of the articles above, as Sage allows open access to all of its journals until October 31st (registration required).


Spoiler Alert!

An homage to Alfred Hitchcock's second-best movie, by one Jeff Desom.

Those who prefer vimeo, look here.


Stereotypes in Narrative Art

Recently watched one of those shows in which four critics - it's always four - debate the quality of new books. Critic A said that a character in the novel in question was clichéd. No, said critic B, such women really exist!

Sigh. First, it would be a strange complaint about a fiction book to say something's not realistic. Second, a cliché is but a stereotype, and the existence of a stereotype does not mean it's not true. If anything, the opposite is the case.

Stereotypes in literature and other forms of narrative art can be a problem for a different reason: they're not cognitively challenging, allowing the mind to quickly call up information about the character, because it has this information stored and connected to the characteristics you're given. If your mind's in the mood for a bit of a challenge, it's likely to be bored by cut-out characters. But the unchallenging nature of clichéd characters can also be a virtue: It allows the storyteller to quickly dispense information.

Hence, a rule of thumb for telling stories: If the character's only there for performing a function, you might want to reach for the cliché. Minimum fuss for the reader, info received, move on to the important stuff. On the other hand, if the character's a main attraction of the story, you want to make the character somewhat interesting, hence somewhat challenging, hence somewhat non-clichéd. Of course, another aspect is your intended audience. It's no coincidence that children's TV shows feature very bland characters, and that the villains are particularly bland. Often, the villains are not themselves meant to be interesting, their only function is to act as adversaries and allow the heroes to defeat them.

It seems all of this is but a special case of a more general rule. After all, the same argument has been made with respect to clichéd metaphors.


A Proposal for a New Norm

There are three types of favours one might ask of a friend: Unacceptable requests, acceptable requests, and the grey area between the two of them. Acceptable requests are fine to just ask, unacceptable requests shouldn't normally be asked. Here I am interested in grey area-type requests: stuff that is not unacceptable, but that might place so heavy a burden on the askee that you wouldn't be cross with her if she refused.

Let me propose a norm for how you should go about it if you've decided you still want to ask the favour. Your aim should be to make it as easy as possible for the friend to say no. This means you want to do the opposite of what's done by a power salesman, who tries to get people to say yes. A power salesman will try to get as close to you as he can. Ideally, he wants you in person; the telephone is the next best thing. Conversely, you should try to keep a distance from the askee when you ask and when she gives her answer. This means you put the request in writing: e-mail, letter, fax - I don't care.

Also, including something along the lines of "of course you can say no" helps.


Low Status and Economic Inequality: Two Points Often Overlooked

Lots of talk about economic inequality around U.S. blogs recently, on the occasion of the translation of Piketty's book. Here are two points that I think are often overlooked. Each of the points could hold if the other does not.

1. Let us say we know with certainty that low-status people suffer because most others are higher up the ladder. This suffering may come in the form of envy or of more distal outcomes such as poor health. This phenomenon, considered well-established by many, is often presented as an argument for reducing inequality. But does it follow that high-inequality societies are worse off, all other things equal? Of course not! Presumably, if low-status people suffer because they occupy a low rung, high-status people benefit because they occupy a high rung. The benefit experienced by high-status people might outweigh the suffering experienced by low-status people. Put differently, it is conceivable that, net of the influence of other factors, the average utility per person is as high or higher in a high-inequality society as it is in a low-inequality society. You might say that similarity of utility is desirable in and of itself, but then you'd be introducing an additional moral principle that not everybody might share. I'm saying "additional" because, as soon as you're arguing on the basis of people's suffering, you're already arguing on a utilitarian basis, whether or not you're aware of it.

2. Again, let us say people experience psychological costs because others do better than they do. But, clearly, there are positive externalities, too. In any society which uses taxation to pay for free or subsidized goods, poorer people benefit from having rich people around. That's because rich people pay disproportionate shares of the cost of amenities such as public libraries, clean drinking water, and a functioning criminal justice system. Low-income earners pay less than their share, even in flat tax regimes. Put differently, they get more than what they pay for. Would they really be better off if they switched to a regime in which they were less envious, but got Zimbabwe-level sewage and criminal justice systems? Probably not.


Seth Roberts Is Dead

Today, from his sister Amy, via his blog, came the message that Seth Roberts has passed away. My condolences to his family and friends.

I never met him, only had a few exchanges with him on this and his blog. Generally, I felt he went too far in his criticism of standard approaches, and put too much weight on low-quality evidence. But, as long as I knew of his work - and I certainly include his blogging here - I valued him as an original, unusual, and stimulating thinker. I believe that once the great weight gain in affluent countries ca. 1970-present is better understood, his learning theory of the set point will be a large part of the explanation.

Here are posts in which I discuss Seth's work (some of them quite critical):
Two of his posts made it onto my year-end "Best Blogposts of..." lists:
Here are quotes of his that I found worth keeping. Here is his paper on self-experimentation in Behavioral and Brain Sciences. Here is his paper "What Makes Food Fattening?" Blowhard, Esq. remembers. Ben Casnocha remembers. Andrew Gelman remembers.

Social Scientist of the Month

The best answer in quite a while to the question, "Why do people look down on social scientists?" comes from Roger Matthews, professor of criminology at the University of Kent. The context is the idea that the removal of lead from gasoline may have played a role in falling crime rates, given that higher lead levels have been linked to aggression at the individual level. Here comes Matthews, as quoted by Dominic Casciani (via):
"I don't see the link," he says. "If this causes some sort of effect, why should those effects be criminal?

"The things that push people into crime are very different kinds of phenomena, not in the nature of their brain tissue. The problem about the theory is that a lot of these [researchers] are not remotely interested or cued into the kinds of things in the mainstream.

"There has been a long history of people trying to link biology to crime - that some people have their eyes too close together, or an extra chromosome, or whatever.

"This stuff gets disproved and disproved. But it keeps popping up. It's like a bad penny."
If you tried to come up with a parody of the daftness of those mushy-heads in the social sciences, could you think of anything better?


The Beatles' Please Please Me in Cover Versions and Originals

The Beatles' Please Please Me album in cover versions. Or originals in the many cases in which the Beatles' version is a cover. Have a nice weekend!


Intelligence Researchers: "Regression to the mean [...] is purely a statistical artifact"

Whoa Nelly! In a very interesting ask-a-researcher thread on Psychological Comments, researcher Michael A. Woodley drops the following, which surprised me a fair bit. The context is that there is intergenerational regression to the mean in intelligence. That is, very smart parents tend to have children who are less smart than they are; very dull parents tend to have children who are less dull than they are. Or so I thought - and not just I, I'm sure. Woodley disagrees. He quotes from a book by himself and Aurelio Jose Figueredo. Here's the central bit:
Furthermore in the case of parent-offspring correlations on g, oversampling parental scores with positive errors of measurement on IQ, as by selecting those identified as high-g individuals based on high observed IQ scores for special study, will produce regression to the mean when assessing the IQ of their offspring, even if the offspring were genetically identical to the parents, given the nature of this statistical artifact. This can be confirmed by retesting the parents themselves, which is rarely done, because one will then no doubt observe regression to the mean of the parental IQ scores in the parents themselves, presumably without having undergone any genetic recombination whatsoever. The proposition that offspring are necessarily closer to the mean of the general population in their actual latent g-factor (as opposed to their observed IQ scores) is therefore a fallacy, especially under conditions of assortative mating.
Quite a claim. Is this generally accepted, or perhaps Woodley & Figueredo's minority position? When they say that the claim "can be confirmed by retesting the parents themselves, which is rarely done", does this mean it has been done? Repeatedly?

This would explain a puzzle, though: If there were regression to the mean in a substantial sense, then it should not be over after a generation, which would mean that, by now, we should all be pretty much equally intelligent, right? With the above interpretation, that problem does not exist.

I just hope he means to restrict his statements about the nonexistence of regression to the mean to the context at hand. The phenomenon is certainly real in other contexts - unless you want to redefine, for example, a particularly hot day in a certain city as just an expression of a city's underlying latent hotness measured with upward error, and the like.
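The retesting argument in the quote can at least be illustrated with a toy simulation. The sketch below assumes that an observed score is a fixed latent score plus independent measurement error on each testing occasion. The scale (mean 100, SD 15) is the conventional IQ scale; the error SD of 7 and the cutoff of 130 are numbers I made up. Nobody's latent score changes between the two tests, and yet the selected group scores lower the second time around.

```python
import random
from statistics import mean


def retest_selected(n=50000, error_sd=7.0, cutoff=130.0, seed=11):
    """Select people on a noisy first test, then retest them.

    Returns the selected group's mean first score, mean second score,
    and mean latent score. All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    latent = [rng.gauss(100, 15) for _ in range(n)]
    # Two testing occasions with independent errors around the same latent score.
    test1 = [g + rng.gauss(0, error_sd) for g in latent]
    test2 = [g + rng.gauss(0, error_sd) for g in latent]
    chosen = [k for k in range(n) if test1[k] >= cutoff]
    return (mean(test1[k] for k in chosen),
            mean(test2[k] for k in chosen),
            mean(latent[k] for k in chosen))


if __name__ == "__main__":
    first, second, true_score = retest_selected()
    print(f"first test: {first:.1f}, retest: {second:.1f}, latent: {true_score:.1f}")
```

This shows only that measurement error alone is enough to produce regression in observed scores. Whether offspring also regress in their latent scores is a separate empirical question that a simulation like this cannot answer.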