Big Issues, Successor to the Scholars' Survival Manual: March 2014

Monday, March 31, 2014

Climate Impacts and Other Impacts

I am sure there are impacts of global warming, and the latest report suggests they are substantial. But I always wonder: Compared to what?

Locally we are quite capable of destroying our livelihoods: war is very effective, as is genocide, major local pollution sources, earthquake, etc. I include things like earthquake, because we choose where to live, and how prudently to build. There will be major surprises, but they are fewer than are the anticipated disasters.

As for food and fuel: Food and starvation are matters of market and logistics, and in general many of the famines have been a matter of the food not being available although it is there. Let us put aside Stalin and the Ukraine, or the Irish potato famine. The green revolution should make most famine much less likely, but again it is a matter of politics and market (the latter being a matter of whether those who are supported by government in their agricultural activities then have total say about how food is distributed).

As for fuel and energy, the usual observation is that the big consumers pollute and send harm to the poor, although poor countries can be very effective local polluters, using highly polluting heating stoves etc. I don't know if the current models include effects of scarcity on consumption and the modes of consumption. As for the dangers from nuclear power, given the possibilities of accidents, my suspicion is that many more people are killed in coal mining and other resource extraction activities.

I recall from elementary school the major concern being nuclear bomb catastrophe, the wonderful setup of mutual assured destruction without second strike capabilities, etc. There was nuclear winter. There are the economic consequences not only of war, but of terrorism and fear thereof, etc.

I just need to know, Compared to what? Surely, there may well be long term impacts, although one suspects that such longer term impacts will encourage invention and innovation since the payoffs there are much surer than short-term impacts. And perhaps there will be major consequences in terms of shorelines, storms, etc. But I know that we have been very successful in doing ourselves in, albeit even the World Wars were not quite so overreaching as is global warming.

I do not find it helpful to think apocalyptically. I need to see particulars. And it is surely the case that these climate reports are detailed. I would love to see them professionally critiqued, to give me a sense of the reliability of their claims. I know they have been critiqued, but it would help me to know in detail where the uncertainties are.

Sunday, March 30, 2014

Wilson had two papers published when Cornell tenured him, after two years. You are NOT KG Wilson.

"The people at Cornell had more of an interest to know what I was doing than people at Harvard, because they were going to have to make a decision... Of course one of the things that happened was, as you may or may not be aware, is that they gave me tenure after only two years and with no publication record. In fact, there was one or two papers on the publications list when I was taken for tenure and Francis Low complained that I should have made sure there was none. Just to prove that it was possible. "

Kenneth G. Wilson, Nobel Prize 1982, re his first assistant professorship. He had done his PhD at Caltech under Gell-Mann, with Feynman as his examiner (so to speak); was a Junior Fellow at Harvard (spending lunchtime conversations with MIT--Low, Johnson,...); was at CERN at some point; and then went to Cornell. From his Caltech days on he was working lots on approximations, computer ways of doing this, as well as more "analytical" work.

Saturday, March 29, 2014

What I have learned about modeling, statistics, and quantitative methods

1. Have a theory, a mechanism that connects the various aspects of what you are studying.
2. From fieldwork, other studies, whatever, be sure the right aspects are included. Thinking won't tell you much, compared to going out and talking to people, getting a sense of what might matter. Your intuitions are likely to reflect your own experience, not those of what you are studying.
3. Sensitivity analyses are crucial--what if a parameter were somewhat different, what if you have a polluted data set, what if your a priori or estimates of certain factors could be different by 50% or maybe a factor of ten.
4. If you can understand your model without invoking fancy physics theory or whatever, and if what you say is in the end independent of that theory, don't bring it up. Hence if you talk about complex adaptive systems, but never actually model one using real data, perhaps you are just using some useful metaphors.
5. On the other hand, if fancy theory really helps, go for it. For example, Cuernavaca, Mexico bus drivers' schedules are in fact understandable using what is called random matrix theory. If the theory not only predicts the distributions you see--the Tracy-Widom distribution and the Wigner semi-circle law--but you can actually model the process, you are really in great shape. [See: The statistical properties of the city transport in Cuernavaca (Mexico) and random matrix ensembles, J. Phys. A: Math. Gen. 33 (2000): L229.]
6. If you do statistical analysis, it would really help to have some mechanism behind your regression or whatever statistical reduction process you employ. You might well do some such analysis to get an idea of what matters, but having something like a theory (#1 above) will serve you well.
7. Keep in mind that numbers that come out of your programs and packages are unlikely to be reliable to more than two significant figures, most likely one. Of course, you can decide if some factor or coefficient is close to zero or to one, or is important or not very important. But the actual number is a 1.5 significant figure number. You can locate a mean to high precision with lots of data points (since the standard error of the mean is SD/SQRT(N)), but it's hard to believe that you have a conventional gaussian, or whatever distribution, to that precision. You need robust and resilient methods, and you also need a sense of how polluted your data set is. High N does not save you from pollution. You might do lots of tests about the quality of the discovered distribution, at least.
8. How much would you bet on the quality of your reported figures? For example, in some of the natural sciences, you can measure something to anywhere from 3-12 significant figures. And you have theoretical connections of what you measure to other measurements. No economic or social science studies have this level of quality or connection--although for the most part it does not matter. Now, when you make a claim about economic growth or unemployment rates, what matters is to get it roughly right and in the correct direction. And in general you can measure such to 1% or so. Some numbers can be known quite well: you might say things changed 0.1%, since the denominator is very large. But the uncertainty in that is still likely to reflect the uncertainty in the numerator.
9. Recurrently, there is the call for measurable consequences of policies, and the claim that if you can't measure it, it does not exist. What is needed is a much better sense of what you should be measuring, and that means you have to go out into the field and find out more about how your policy is working. Ahead of time, you want to lower unemployment, but you had better go out and find out just what you are measuring when you do such surveys. Of course, this is well known.
10. But in less well trodden circumstances than national statistics, you have to find out what is going on, the ground truth that corresponds to what you measure, or perhaps what to measure that corresponds to the ground truth.
11. If there are very substantial consequences to your claims, you have to be sure that your modeling, statistics, and quantitative methods are good enough. You have to reveal their uncertainties when you make your claims. Of course, there are interested actors who will take what you say and selectively present it. But when the consequences are serious, you owe it to society to allow others to have a sense of how reliable are your claims.
12. Some of the time, there are policies or programs that are justified by what they do on the input side. Perhaps you cannot show any connection between income inequality, however understood, and economic growth, or whatever. But it might make sense to lower inequality for the purposes of social integrity and political legitimacy. Maybe it won't change much, but at least you don't look so unequal compared to other places. Of course, others will argue that income is a matter of desert. But one of the problems in saying this is that we have a system of taxes and incentives--tax expenditures--that allow some people to have much larger incomes than they might have. We might decide those are OK, but they don't look so good when you compare them to food stamps.
13. Keep in mind that physics envy is likely to shift to biology envy. That is, biology, genetics, computational biology, etc. will be providing some of the models. More to the point, physics, biology, chemistry,... all of them are crutches to allow you to think about what is going on. It's very hard to identify actual processes and measurable parameters that correspond to these models. Econometrics may well allow you to take a messy data set and make it speak to you. But be sure that it is speaking in some understandable language about how the world works.
14. Simple models are often very illuminating, suggesting mechanisms that might be behind what you see. If the models get very complicated, it's hard to discern the mechanism in this black box.
15. Before you do your analysis, write down what you think the numbers will be. You are allowed to analyze 1/10 of the data, and then to write down your expectations again, having seen the results from that 1/10. And you may well need to fix things at this point, since the results may be surprising. But then what you need to do is to open up the black box of your total data analysis and accept its outcome, or at least be quite reluctant to fiddle with things.
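Item 7's point, that high N does not save you from pollution, can be sketched numerically. What follows is only an illustration under made-up assumptions: the "clean" data are a gaussian with mean 6 and SD 2, the junk values come from a gaussian centered at 20, and the pollution fractions are hypothetical. None of these numbers come from a real data set.

```python
import random
import statistics


def mean_and_se(values):
    """Return the sample mean and its standard error, SD / sqrt(N)."""
    n = len(values)
    sd = statistics.stdev(values)
    return statistics.fmean(values), sd / n ** 0.5


def polluted_sample(n, bad_fraction, rng):
    """Hypothetical data: clean gaussian(6, 2) values mixed with a
    fraction of junk values drawn from gaussian(20, 2)."""
    n_bad = int(n * bad_fraction)
    clean = [rng.gauss(6.0, 2.0) for _ in range(n - n_bad)]
    junk = [rng.gauss(20.0, 2.0) for _ in range(n_bad)]
    return clean + junk


rng = random.Random(0)  # fixed seed so the run is repeatable
for bad_fraction in (0.0, 0.02, 0.05):
    mean, se = mean_and_se(polluted_sample(10_000, bad_fraction, rng))
    print(f"pollution {bad_fraction:.0%}: mean = {mean:.2f}, SE = {se:.3f}")
```

In a run like this the standard error stays small at every pollution level, while the mean itself moves by many standard errors. The reported precision says nothing about whether the data set was clean.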

Sunday, March 23, 2014

The Stupid Zone

Someone gave me a great notion, The Stupid Zone.

Put simply, you know that doing something may have some short term benefits or pleasures, but you will soon after regret it, or find that the blowback is harmful. You have entered The Stupid Zone when you think about engaging in such actions.

You know what you are doing is not good for you, or will hurt you, and you know that it is stupid to pursue it, but you cannot stop yourself.

At least you can label what you are doing, The Stupid Zone.

I have no cure or even short term preventive to offer, although much of The Scholar's Survival Manual is meant to deter you from spending much time in The Stupid Zone. I keep discovering that I am in the Zone.

None of this is new or inventive. It's just that the label works for me.

Friday, March 21, 2014

The credibility and reliability of claims concerning harassment and bias.

In the news these days are stories of military situations where those who have claimed to have been sexually harassed have been judged wanting in credibility or reliability. Some of the time, where there is smoke, there must be fire, but some of the time it's just fog. My suspicion is that something untoward has occurred, the claims are likely the case, but the claimants have been disbelieved.

This is not unusual in the history of sexual harassment, and much effort has been made to give the claimants a fairer shake. In the past, it is clear that there has been a big effort to get rid of the claim and the claimant (almost always a woman, against a man). So the current efforts are appropriate.

But, once a legal or even an investigatory process begins, it is likely that the claimant will be tested as to their credibility (are they truth-tellers?) and reliability (is their account accurate?).

If the claimant has a history of making such claims, or a history of unreliability (in job performance, or other such), it is likely that their claims will be subject to challenge. (Yet, of course, that claim of harassment may well be true.)

Also, even in non legal contexts, the investigators are likely to take into account the impact of the claim on the supposed perpetrator's reputation, and want to be sure that the claimant is credible and reliable. Hence, it behooves the investigator to check the background of the claimant without impugning the current claim. Of course, the investigator will be checking on the counter-claims of the supposed perpetrator.

For example, if the claimant is known to be hyperbolic in their other claims in other contexts, how do you handle their current claim? If the claimant turns out to have been otherwise trying for some favor from the accused (job advancement, for example), again the current claim is subject to some doubt. And perhaps the claimant has misinterpreted what happened, and there are witnesses or video evidence of the event that suggest that it is a misinterpretation.

This is not a happy situation. Let us say that the claimant feels that their claim is warranted, and is not up to some sort of revenge. The claimant may well feel that they have not been vindicated, largely because their claim is polluted by their past behavior. And the accused rarely has a chance to fully clear themselves.

What's the lesson? Likely, it is that our adjudicatory and investigatory processes are imperfect--especially when it is one person's word against another's. More to the point, social processes don't solve every problem and issue.

Saturday, March 15, 2014

Multiple Authors, Grants Raised, Getting the Research Done

1. We are in fields in which it is now the norm to have multiply authored publications. Perhaps, virtually none of one's work is singly authored. Your co-authors may be graduate students, post-docs, other faculty, and directors of labs (who may have been the one who went out and got the research funds, and so are the PI).

Now we might count articles discounting for joint authorship, where there were N articles, with M authors each, and say that you had published N/(M-1) articles, if M is greater than 2, and N/1.5 for M=2. Or some such. In any case, you don't want to say X published 24 articles and take that to be stronger than someone else who published 8 articles where they were the single author. And of course, one wants to take into account the strength of venue.
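As a rough sketch of the discounting rule just described, here is one way to compute it. The rule for two authors (divide by 1.5) and for more than two (divide by M-1) is taken from the paragraph above; giving a sole-authored article full credit is my assumption, and the function name is hypothetical.

```python
def discounted_count(authors_per_article):
    """Sum a per-article credit over a list of author counts.

    Credit is 1/(M-1) for M > 2 authors and 1/1.5 for M = 2, as in the text.
    Full credit (1.0) for a sole author is an assumption.
    """
    total = 0.0
    for m in authors_per_article:
        if m < 1:
            raise ValueError("an article needs at least one author")
        if m == 1:
            total += 1.0
        elif m == 2:
            total += 1.0 / 1.5
        else:
            total += 1.0 / (m - 1)
    return total


# 24 articles with four authors each, versus 8 sole-authored articles.
print(f"{discounted_count([4] * 24):.1f}")  # prints 8.0
print(f"{discounted_count([1] * 8):.1f}")   # prints 8.0
```

Under this rule the two records come out even, which is the sort of comparison the raw counts of 24 and 8 would hide. It still says nothing about the strength of venue.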

One solution is to compare the candidate or researcher with top-most others at the same stage in their careers in the same field, as well as with more typical researchers. This strikes me as more useful.

2. But what counts in the end is your contribution to scholarship and "advancing" the field, your ideas and discoveries and their impact. I believe it would be useful for all faculty to write a one page statement (single-spaced), better less, where they describe their contribution to the field, and their contribution to joint research. Collaborators and external letter writers can judge those claims.

3. It would help us to evaluate CVs and other such lists if it were clear what role the person had in obtaining funding for the research. Were they the PI, were there several more or less equal Co-PIs (I think there are restrictions on the number, so perhaps there can be only two Co-PIs), were they in charge of one project in the research grant, were they taking on a particular task? And if you are a Co-PI, please list the other PI. And it should be clear if the amount listed is before or after overheads. It's vital that none of this be fudged, because someone will check up, and once there is a sense of unreliability in the CV, the rest comes under question.

4. Research faculty might well be expected to have raised their own grant monies. But I can imagine that their main role is making sure the research gets done (think of biostatisticians). A unit has to decide how to evaluate those contributions.

Friday, March 14, 2014

Tenure Idiocies

Don't hide anything. Someone is bound to go deeper and if you are found out, your case will not be so credible.

Be balanced.

Don't impugn your own witnesses, whether faculty or letter writers. It's ok to say something to the effect that some of them come from very different traditions than the candidate, and so are not so sympathetic. But deal with their concerns substantively.

You are more credible if you don't sound like a sales job.

There are some candidates who are terrific. Most are OK. At the few very strongest universities, more than a few are terrific, but then often your problem is how to choose from among them.

Usually there are about five or ten tenurings that are mistakes compared to one tenure-denial that is a mistake. Unfortunately, there is some bias in these statistics--so you have to be diligent about tenuring people in some groups (usually white males).

Tenure, Lemons, and Devil's Advocates

Given today's proposed discussion of our tenuring etc. procedures, the following may be of interest. (I once wrote two reports on Jesus of Nazareth, both "tenuring" and not. Meant to be fair but coming out on different sides.) Keep in mind that the UCAPT has diverse members, and any claim you make is likely to be tested by one of the members (citation counts, choices of referees, summary of letters, quality of work by being read by someone else, ...)

1. When you are proposing someone for appointment, promotion, or tenure, there are two intrinsic problems: the Lemons problem, the Devil's Advocate.

a. There is an asymmetry of information between one level and the further up ones. You know more about the candidate than does the UCAPT. So you are much like Akerlof's used car salesman: your price will be discounted by the buyer, unless you offer a warranty or ...

b. Reports need to be balanced and fair so that the Lemons do not immediately come up. A Devil's Advocate should be on the committee to take the opposite position to force the committee to face problems directly. For example, have the letters' contents been fairly assessed rather than picked for one position.

2. The ad hoc committee should include one person outside the department/field, in effect an agent of the School. The letter writers should be seen as fair, not bunched in any way, and authoritative. Detailed analysis, of strengths and weaknesses, detailed reports on the scholarship (read the papers!), teaching evidence. Also, always assume that someone up there is likely to read your report with a critical eye. Professors are trained to be skeptics. If you hide something, it is more than likely to be discovered, and your whole report will be discounted. There are almost no candidates who are flawless, and often deep weaknesses do not lead to negative reports at all.

The comparison cohort should be the top people in the field, those at roughly the same stage in their careers as the candidate, and perhaps some more senior. Of course, if the university's aspirations are modest, you need to have comparable modest comparisons.

3. The committee should have a frank discussion of the strengths and weaknesses. In general, what you see is what you get, so expectations of much weaker or stronger performance five years down the line are unlikely to be fulfilled.

4. In the departmental/field discussion, there should be enough time to air issues, and the committee report should have all the information needed, such as citation counts or whatever. One member of the department might well take the Devil's Advocate role here. It may be important to have a small enough group (say all tenured for tenure decisions) so that negative remarks are seen as being in camera and not as a sign of being a bad member. The report of the discussion should never dismiss negatives, or interpret the vote. It should be substantive.

5. At all levels, do not discount a dissenting referee letter. Deal with its content. You chose your witnesses.

6. The School's APT is there to represent the School, and to make sure that departments/fields have comparable and high standards. They are concerned with scholarship (and teaching and service), not School needs. Again, enough time needs to be allowed for real discussion and dissent.

7. If you have gone this far, the dean is not making excuses for problems in earlier levels, but summarizing, weighing, and taking into account institutional needs.

Thursday, March 13, 2014

Credible Claims in Statistical Analyses

You have a data set. Let us for the moment claim that there is no measurement error in each measurement or data point.

1. You might find the mean and the standard deviation of some measurement. It's likely that the standard error of the mean is small if the N is large. But it is hard for me to believe that the mean +- SE should be taken too seriously if the standard deviation is substantial. That is, the location of the mean may be statistically sharp, but given the substantial dispersion of the measurements, I would be surprised if I should bet on more than two significant figures, often just one. Put differently, you measure 6+-2, and the SE is .02. I would find it hard to distinguish 6 from perhaps 7, given the spread of the measured values. That is, another measurement might have gotten 7+-2, with similarly small SE. I would not believe 6 is different from 7 in this context.

2. When I say that I would not believe, what I am saying is there is enough noise, non-gaussian intermixture, junk in the data so that the spread given by the SD prevents me from making very sharp claims about the difference of two different means.

3. You do a variety of statistical studies. Ahead of time, before you do the studies, you might estimate what you think the statistics will be. The mean, the SD, the regression coefficient. Roughly estimate them. Maybe only relative sizes. Maybe there is previous research that gives you a decent idea.

Then do your studies. Are you surprised by any of the statistics that come out? Keep in mind that it is hard to believe more than two significant figures, often one, whatever the statistical error.

4. You are trying to measure the effect of an intervention and the like. I suspect that any effect smaller than 1% is not credible, again whatever its statistical error. Maybe 10%, maybe 30%. Your problem is that typically you might account for a fraction of the R-squared, and you have to assure yourself that the rest is truly random noise or randomized by your research design. A small amount of impurity or contamination in the data will be problematic.

5. Whatever you measure, can you think of a mechanism that would lead to the number, roughly, that you measure. This is after you have done your analysis. Before, see #3 above. Could you eliminate a range of mechanisms by your statistical work?

6. If you make claims about a discount rate, say, why should I believe your claim? Have you done sensitivity analyses with different rates, to see if your conclusions are robust? And how many years ahead would you want to use such a notion as discount rate and believe it is credible in reflecting our attitude about the more distant future?

7. In the natural sciences, measurements almost always are about actual objects and their properties, say their mass, their energy, etc. Usually those properties are connected to measurements of other properties and perhaps to theories that predict values or connect the value of one property to that of another. In the social sciences, that is rarely the case, as far as I can tell. And you believe you could have, in some particular measurements, many significant figures (high accuracy). I don't see such a belief in social science.
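The sensitivity analysis asked for in #6 can be as simple as recomputing one figure under several discount rates and horizons. The sketch below uses ordinary exponential discounting; the benefit of 100, the three rates, and the three horizons are all hypothetical choices for illustration, not figures from any study.

```python
def present_value(amount, rate, years):
    """Standard exponential discounting: amount / (1 + rate) ** years."""
    return amount / (1.0 + rate) ** years


benefit = 100.0  # hypothetical future benefit, arbitrary units
for years in (10, 50, 100):
    row = ", ".join(
        f"{rate:.0%}: {present_value(benefit, rate, years):7.2f}"
        for rate in (0.01, 0.03, 0.07)
    )
    print(f"{years:3d} years ahead -> {row}")
```

At ten years the three rates give values of the same order. At a hundred years they differ by factors of ten or more, so any conclusion that rests on the more distant future is mostly a statement about the rate chosen.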

Saturday, March 8, 2014

Precision in Social Science Research

For many good reasons, social science research and its statistics and numbers function differently than what many natural scientists are able to do. This is not to impugn such social science research, for often it is of very great import, appropriate import.

1. When numbers are quoted, it is likely that they have between 1 and 2 significant figures at best. That is, even if the quoted errors or dispersions are small, one would not bet on the actual uncertainty (whatever that might mean) being so small--so 0.21 and 0.24 are likely indistinguishable in actuality, even if their errors are quite small.

2. Hence, when one compares two numbers, it is unlikely that a difference in say the second figure is credible. So it is quite unlikely that 0.21 differs from 0.24, no matter what the errors, in actuality.

3. Rarely is research repeated in such a way that the numbers reported in one project are checked to better than one significant figure. That is 0.2 and 0.3 are the same, no matter what the errors.

4. Dispersions, such as standard errors or standard deviations, are measures either of variation in the data (which you have no way of accounting for) or of the reliability of, say, regression coefficients. But they have no intrinsic significance. In finance, by contrast, the dispersions are measures of volatility and so of risk in the context of some equilibrium.

5. Explanations, even if they have good reason to be taken as causal, will explain some fraction of the variation (the R-squared). Enormous effort goes into showing that the unexplained variation is not much influencing the explained fraction, and is merely noise.

6. What you are aiming for is a sense of what's influential, what is much less influential, what's worth attending to in making a decision. Rarely is the actual number, even with its claimed precision, so crucial.

7. When a particle physicist or an atomic physicist quotes a number, it often matters at the level of 3-10 significant figures, and if done properly, the error is in the last figure or so. Often there is a good reason to have the number not only be precise, but accurate in the sense that you have a good reason to believe what the number should be in the context of theory and other numbers. Differences of two means, for example, may represent mass differences, or masses that are or are not zero, and these differences or tests for nonzero have great importance, and may well be a matter of differences in the last digit of three significant figures. Often the differences can be measured with much greater precision than can the two numbers that are being compared. Research is repeated very often, if not often enough, in the sense that a particular claimed number is checked by a different method, a method that claims to investigate the same fact. And dispersions are not always measures of variation, but often have deep meaning much as in finance. In general, physicists would bet the farm on the claimed number, with its precision, as being accurate.

Thursday, March 6, 2014

What you are measuring...

I was trained as a physicist. When we made a single measurement, of an event or whatever, we had a sense of the measurement error, what we called systematic error. It usually depended on the quality of our apparatus and its limits. In the case of rare events, we might even have a sense of a Poisson process and so say that we saw an event at time T. In any case, whatever it was that we were measuring was as real as anything, not an artifact.

When we combined measurements, a combination of the data, that is we were being statistical, we might expect to get something like a gaussian with a mean and standard deviation (or some other expected or surprising distribution). The more events we had, the better known were those statistics. And we believed that those statistics were referring to something as real as anything--say the mass of the particle and its lifetime (the inverse of the standard deviation) with an unavoidable systematic error that affected our statement of the standard deviation and perhaps that of the mean.

If you wondered whether the mean were different from zero, you wondered whether some real thing were different from zero--say the mass of the tau neutrino. If you wondered whether the means of two different measured quantities were different, you were wondering whether the masses of the K particles were different. You would take the width or standard deviation seriously, because in fact there was surely some systematic error and there was an intrinsic width of the measured quantity, its lifetime, but still the mean, the mass, was something real, and you would quote the mean with an error and the standard deviation with an error (systematic and statistical).

In social science studies, as far as I can tell (and I have become acutely aware of this only recently), again we make a single measurement and have a sense of measurement error (if that makes sense in this context: sometimes it may be a matter of whether people are reliable reporters in a survey, whether the data is dirty,...). Again, we might say that whatever we are measuring or surveying is as real as anything.

Again, "When we combined measurements, a combination of the data, that is we were being statistical, we might expect to get something like a gaussian with a mean and standard deviation (or some other expected or surprising distribution). The more events we had, the better known were those statistics." And we act as if those statistics were referring to something as real as anything--say the average height of a population. But almost always there is no reason to take those statistics as real; they are artifacts of our combination, and we have no theory that gives that number a deep reality--or so is my observation of actual practice. And of course there might well be "an unavoidable systematic error [or measurement error] that affected our statement of the standard deviation and perhaps that of the mean."

If you wondered whether the mean were different from zero, you would check the power of your statistic, you would see how well measured it was (standard error) and so you might get a good sense of whether the mean were different from zero. But, there was nothing real about it in the sense that a particle mass were real. It was a statistical measure of that artifact, the height of a population. Presumably the width is substantial but not overwhelming, but it shows the dispersion of heights.

However, say you wanted to check whether the difference in heights of two populations were significant. Surely you can do much as the physicists would do, and see if the difference of the means were statistically significant (and say that the systematic or measurement errors were not important). But say as well that the distributions overlapped substantially. You could surely say something about whether the means were different. But, I would find it hard to take such a statement very seriously, since the distributions overlap so much and so any problems in the distributions would make me skeptical that the measured difference were credible in actuality.
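To put rough numbers on this, here is a small sketch with two hypothetical populations: mean heights of 170 and 171 cm, a common SD of 7 cm, and 10,000 people measured in each. All of these figures are invented for illustration, and both populations are assumed to be exactly gaussian, which is precisely the assumption I would hesitate to bet on.

```python
from statistics import NormalDist


def z_for_difference(mean_a, mean_b, sd, n):
    """z-score for the difference of two sample means, assuming equal SD
    and n independent measurements in each group."""
    se_diff = sd * (2.0 / n) ** 0.5
    return abs(mean_a - mean_b) / se_diff


# Hypothetical populations; none of these figures come from real data.
mean_a, mean_b, sd, n = 170.0, 171.0, 7.0, 10_000

z = z_for_difference(mean_a, mean_b, sd, n)
# Overlapping coefficient of the two gaussians: 1.0 would mean identical.
overlap = NormalDist(mean_a, sd).overlap(NormalDist(mean_b, sd))

print(f"difference of means is {z:.1f} standard errors from zero")
print(f"the two height distributions overlap by {overlap:.0%}")
```

The difference of the means is many standard errors from zero, and yet the two distributions are nearly indistinguishable person by person. A small departure from the gaussian, or a little dirt in the data, is then enough to make the measured difference doubtful.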