Outcome measures – On the other hand…

Moving to outcome measures won’t make things perfect in terms of evaluating charities.

There are some downsides to consider, as mentioned by Nonprofit Quarterly in Want Charities to be Evaluated Based on Impact? Be Careful What You Wish For.

The article raises three concerns, all of which we need to think about very carefully. I will mention the three issues and comment on each.


Impact measures are less reliable

The article suggests that outcome measures will be more uncertain and less verifiable than the functional allocation of expenses.

I agree.

If you think the debate over valuing donated medicine is messy, just wait until we open up discussion of the recovery rate of addiction treatment programs or changed literacy rates in a community created by one NPO.

The article mentions accusations that one cancer hospital allegedly screens out potential patients who are not likely to be treatable. What would be the motivation?

Marketing. Survival rates are a key metric people look at when selecting a hospital for cancer treatment. If a hospital doesn’t admit the sickest patients, as the rumors allege, then its survival rate rises.
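To see how powerful that screening effect is, here is a toy calculation. All the numbers are invented for illustration; they don’t come from the article or any real hospital.

```python
# Toy illustration (made-up numbers): how screening out the
# hardest cases inflates a hospital's reported survival rate.

def survival_rate(patients):
    """Fraction of admitted patients who survive."""
    return sum(p["survived"] for p in patients) / len(patients)

# 100 treatable patients with a 70% survival chance (70 survive),
# 50 very sick patients with a 10% survival chance (5 survive).
treatable = [{"severe": False, "survived": i < 70} for i in range(100)]
very_sick = [{"severe": True, "survived": i < 5} for i in range(50)]

everyone = treatable + very_sick
screened = [p for p in everyone if not p["severe"]]  # turn away the very sick

print(f"Admit everyone:      {survival_rate(everyone):.0%}")   # 50%
print(f"Screen out the sick: {survival_rate(screened):.0%}")   # 70%
```

Nothing about the quality of treatment changed; only the intake policy did. That is the manipulation risk in a nutshell.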

Outcomes will be complicated, messy, subject to manipulation, hard to explain, and hard to verify. The range of uncertainty will go up. The verifiability will go down.

Some people say that adopting SFAS 157 contributed to the meltdown that led to the Great Recession. Professionally, I find those arguments invalid and bordering on silly, but many people have reached the conclusion that booking assets at what they are worth caused a recession.

Outcome measures will be even softer than fair value. They will be more subject to disagreement than the question of what value to put on purchased 500 mg mebendazole.

The conversation over GIK might be tiny compared to what we could eventually have with outcome measures.


Impact measures are less easily compared

The article suggests outcome measures will be less comparable across sectors and within sectors.

I agree, but that same issue applies today.

How do you compare a literacy organization with X% learning rate with an alcohol recovery charity that sees a Z% short-term recovery rate?

For that matter, how do you compare a literacy organization working with at-risk children who are already at the point of learning to read versus a literacy organization that works with teens who just dropped out of school? Or a literacy organization that works with functionally illiterate adults who have decided they are tired of living on the fringe of the economy?

I am sure those three organizations would have substantially different outcomes. Can you pick which will have the highest and lowest success rates? I can.

By the way, the methodology used by all the rating agencies would go out the window.

As the article points out, this issue already exists with overhead ratios. If a donor gives to the cancer research organization across town only because their overhead ratio is half that of your drug rehab program, then outcome measures showing a huge portion of your program participants get on their feet and are still sober five years later won’t do you any good.

With the current accounting method of calculating overhead ratios, those charities with a mission that can easily incorporate lots of GIK look far ‘better’ than all other charities. Charities with an unpopular cause or who are new must spend more to communicate their message and look far ‘worse’ than all others. I perceive few donors understand that dynamic.
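To see that dynamic in miniature, here is a sketch with entirely hypothetical dollar amounts; the charities and figures are my own invention, not from the article.

```python
# Hypothetical numbers: how large gifts-in-kind (GIK) flatter the
# overhead ratio without changing what a charity actually does.

def overhead_ratio(program, fundraising, admin):
    """Fundraising plus G&A as a share of total expenses."""
    total = program + fundraising + admin
    return (fundraising + admin) / total

# Charity A: $1M cash program spending, $300k total overhead.
print(f"No GIK:       {overhead_ratio(1_000_000, 200_000, 100_000):.1%}")  # 23.1%

# Charity B: identical cash operations, plus $4M of donated
# medicine booked as program expense at fair value.
print(f"With $4M GIK: {overhead_ratio(5_000_000, 200_000, 100_000):.1%}")  # 5.7%
```

Same staff, same fundraising spend, same cash programs, yet one charity looks four times ‘leaner’ than the other to a donor scanning ratios.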

Third concern mentioned in the article:

Impact measures are less controllable

Management controls how much money is spent on fundraising and G&A and how it is spent. Management can’t control variables like what happens in the economy and how that affects program outcomes.

The article suggests that will make it more difficult to get good outcome measures and make sure they are interpreted correctly. Accountants call this representational faithfulness.

I agree.

Picture two child sponsorship programs each working in a different country. For whatever reasons, the country where NPO A is located sees a tremendous boom in the economy over the next decade. There will be a magnified impact from the efforts in A’s program compared to NPO B.

Looking at the outcome measures will show A is more effective than B. In that scenario, A gets the benefit of things outside its control.

The irony would be that more contributions likely would be flowing to NPO A even though the economy then starts growing in the country where B is located and their outcomes improve dramatically while A’s slow down.

Which organization was more effective? We won’t know for sure because the economic growth skewed the outcomes.

The distortion would be even bigger if one of the organizations successfully isolated the economic growth from its outcome measure and the other did not. Another researcher would have to disassemble the data to find out whether the stats were properly adjusted for the same outside variables.
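A stylized sketch of the confounding problem. The model and every number here are made up purely to illustrate the point: observed outcomes mix the program’s real effect with whatever the local economy did on its own.

```python
# Made-up model: why raw outcomes can reward the NPO that happened
# to work in a booming economy, and how backing out the economy's
# contribution can flip the comparison.

def raw_outcome(program_effect, economy_growth):
    # Observed outcome = real program effect + economic tailwind.
    return program_effect + economy_growth

# NPO A: weaker program, booming economy. NPO B: stronger program, flat economy.
a_raw = raw_outcome(program_effect=5, economy_growth=8)   # observed: 13
b_raw = raw_outcome(program_effect=7, economy_growth=1)   # observed: 8

print("Raw outcomes:     A =", a_raw, " B =", b_raw)      # A looks better

# A researcher who can estimate each country's growth can subtract it out.
a_adj = a_raw - 8
b_adj = b_raw - 1
print("Economy-adjusted: A =", a_adj, " B =", b_adj)      # B was more effective
```

In real research the economy’s contribution has to be estimated, not read off a line of code, which is exactly why a second researcher would need the underlying data to check the adjustment.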


I believe the movement toward outcome measures is a good thing.

We will need to carefully consider the downside risks and figure out how to manage them.

The article is superb. I recommend you check it out.

The conclusion:

Taken together, these concerns about reliability, comparability, and controllability of impact measures suggest the staying power of accounting measures amidst a barrage of criticism is no accident. … I am not suggesting measuring impact is not worth the trouble. What I am saying is that the notable weaknesses of impact measures are the very things at which accounting excels. For this reason, impact measures are better viewed as a complement to, not substitute for, accounting measures.

5 thoughts on “Outcome measures – On the other hand…”

  1. 1. Impact measures are less reliable – Less reliable than overhead rates? Besides being an inaccurate measure of NFPs, aren’t overhead rates unreliable? Perhaps what they meant to say is “less verifiable.” Overhead rates are reliable only in that anyone can pull up a 990 and calculate the overhead rate for themselves, but that doesn’t make it reliable.

    2. Impact measures are less easily compared – Overhead rates are comparable only in that they are all called overhead rates. Perhaps they mean “less convenient.” Overhead rates are convenient to compare because you can look at two numbers on a page and note that one is different from the other, but you’ve learned nothing about the underlying programs. Evaluating programs is hard, and in our culture of instant gratification, people just want to pick orgs based on the number of stars.

    3. Impact measures are less controllable – Evaluating the impact of organizations is always going to be difficult, but that doesn’t mean it should be ignored and we just go back to looking at arbitrary financial measures that are used only because of convenience.

    NPOs are not measured by financial ratios because financial ratios are not what NPOs are trying to achieve. Their argument seems to be based on the fact that accounting measures happen to be available, and available conveniently through public 990s and AFS, not because they are accurate measures of NPOs (notice none of their points is “Impact measures are a less accurate measure of an NPO’s purpose”). I would turn their conclusion around and say, “Accounting measures are better viewed as complements to, not substitutes for, impact measures.”

    1. Hi Leif:

      Thanks for the comments.

      1. Using some very loose accounting descriptions, the functional allocation is easier to prepare, clearer to review, and easily tested by others. Outcome measures far less so. Thus, less reliable.

      Laying the numbers out on the 990 allows others to review the calculations at some level for their own assessment of believability. For example, if you see a huge amount of telemarketing time allocated to program and you don’t believe it, you can reallocate the numbers yourself. Can’t do that with an outcome measure.

      Still worth doing but we need to acknowledge it will be messy. Not that functional allocations aren’t already messy.

      2. I’ve long had a problem with the lack of comparability of program services. It isn’t appropriate to compare the ratios for a GIK based org, an org that’s new or working with an unpopular cause, and all the others.

      As to the instant gratification factor, I agree. That will be an issue with any measurement.

      Do you suppose that the huge variation between sectors will make it easier? Drug rehab programs will all be measured with low numbers. Domestic abuse shelters would probably have low numbers, but higher than rehab. A relatively low annual outcome measure for a child sponsorship program which is aiming for change over 10 or 15 years would be a tremendous success. Really messy sectors, like organizations trying to end sex trafficking or finding a cure for cancer, might not see any measurable outcomes for years. Some sectors might see all entities with 60 or 80 or 90% measures on whatever it is they are measuring because it is an easy sector to work in.

      3. Less controllability is an issue that can be dealt with. As I’ve talked to people who’ve done actual research, it is possible for people with research skills to isolate multiple variables to identify the specific factor you are looking for. It can be done. It’s difficult, but possible.

      The danger here is having to trust that the researcher did a good job isolating all the uncontrollables. To review the work would require another researcher going through all the data.

      Here are a few questions for discussion (you can probably guess my answers):

      Should we try to figure out how to implement some sort of outcome measures?

      Will it be difficult?

      Would it be worth the effort?

    2. Leif:

      In the context of the article on Compassion, you and I are probably clueless on how to isolate the overall GDP change from the impact of Compassion’s programs.

      However, those are the skill sets that come naturally to a research statistician or economist.

      Just a stray question to illustrate the concept – how does the mix of countries where an NPO works, oh say World Vision as an example, affect the expected outcomes over the course of a decade compared to the mix of countries where Compassion works? Should we expect one program to have a higher outcome because of where they are working? I have no clue. A developmental economist could quantify an answer.


  2. I think you’ve hinted at another key part of this discussion, and that is that “Not-for-Profit” isn’t an industry.

    Think about the for-profit world: does anyone consider all for-profits to be exactly the same with the same measures of success? If you were deciding on an investment, would you evaluate McDonalds the same way you evaluate Google? If McDonalds had two stars and Google three stars, would you automatically choose Google as the right investment strategy? Of course not. There’s so much more that goes into it. For one, they are completely different industries. Just because they are “for-profit” does not make them the same. There are tech companies, fast-food companies, pharmaceutical companies, broadcasting, staffing, auto manufacturers. “For-profit” is an extremely broad, overarching term describing a wide variety of companies.

    If you work at a large CPA firm, you likely work only in a specific industry. You don’t say, “I work in the for-profit industry”; you likely work in manufacturing, or retail, or tech, or financial institutions, etc.

    However, when it comes to non-profits, we do group them all up together as if they are one homogeneous group. CN does make some accommodations for different industries, but applies the same measures nonetheless. In evaluating charities, you’re going to have to apply different measures to different industries just as is done in the for-profit realm. You won’t be able to compare the outcome measures of a drug rehab center to the outcome measures of an Int’l R&D org (nor would you be able to compare the overhead rates, for that matter).

    Evaluation of a charity is going to need to be focused, not broadly applied.

    1. Leif:

      Great comment.

      One of the more entertaining exercises in my long-ago grad finance class was looking at a dozen ratios for a bunch of companies and matching them to a list of industries. It was easy for me as a CPA. The one with huge receivables and payables and around 10% equity was a bank. The one with very low turnover of inventory and very high margin was a jewelry store. The one with huge turnover and very low margins was a grocery store.

      One of my classmates was amazed that I was able to match them. “How did you do that?” he wondered.

      By knowing what different industries look like. They are very different.

      Same with nonprofits.

      Universities, bible colleges, mission sending organizations, R&D entities, counseling centers, rehab programs, and pregnancy care centers all look very different.

      They have different operational risks and audit risks. They have different sensitivities on the functional allocation and different fundraising strategies.

      The types of outcomes will vary just as much.

      Comparing outcome measures will take some serious thinking. Will it be worth it?

