This is a guest post by Justin Han, who interned with the Spark team in summer 2021
Medicare Advantage serves one in every three Medicare members, equating to around 19 million Americans. With over 3,550 plans available, what makes a Medicare Advantage plan “good” and what makes one “bad”?
In this article, I’ll break down the CMS (Centers for Medicare & Medicaid Services) Five-Star Quality Rating system, exploring how it’s measured and the incentives it creates. I’ll conclude with an analysis of where it succeeds and fails, and recommend ways to improve it.
Breaking down the stars
Medicare Advantage plans are rated using the CMS Five-Star Quality Rating System; one star reflects a poor plan, while five stars reflect an excellent one. Who uses it? You’d think consumers would use the Star ratings to evaluate plans, but in practice the ratings barely factor into their decisions at all. The system is primarily used by state regulators, health insurers, lenders, and investors, as it impacts incentive payments, referral networks, and financing.
The incentive structure works roughly like this: a 5-star plan gets a 5% bonus on new enrollees from Medicare and a 70% rebate, compared to a 3-star plan with no bonus and a 50% rebate.
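To make the incentive math concrete, here’s a simplified sketch of how a star rating could translate into payment. The function name, dollar figures, and the two-tier bonus/rebate logic are illustrative assumptions drawn only from the tiers mentioned above; the real CMS rules (quartile benchmarks, double-bonus counties, half-star rebate tiers) are considerably more detailed.

```python
def plan_payment(benchmark, bid, stars):
    """Illustrative sketch only: maps a star rating to an adjusted
    benchmark payment and a rebate, using just the two tiers cited
    in the text (5-star: 5% bonus / 70% rebate; 3-star: 0% / 50%)."""
    bonus_pct = 0.05 if stars >= 5 else 0.0   # hypothetical simplification
    rebate_pct = 0.70 if stars >= 5 else 0.50
    adj_benchmark = benchmark * (1 + bonus_pct)
    # Rebate: a share of the gap between what the plan bids and the
    # (bonus-adjusted) benchmark, returned to the plan to fund benefits.
    rebate = max(0.0, rebate_pct * (adj_benchmark - bid))
    return adj_benchmark, rebate
```

With a $1,000 benchmark and a $900 bid, the 5-star plan keeps a larger rebate from a larger benchmark, which is exactly why the rating matters more to insurers than to consumers.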
The CMS Five-Star Quality Rating System factors in 38 measures with different weights depending on their importance.
Let’s play a bit of Jeopardy. The 38 measures for Medicare Advantage with a Prescription Drug Plan (MAPD) are organized into five categories:
- Staying Healthy: Screenings, Tests and Vaccines
- Managing Chronic (Long Term) Conditions
- Member Experience with the Health Plan
- Member Complaints and Changes in the Health Plan’s Performance
- Health Plan Customer Service
Let’s take a closer look at what one of the measures looks like. Let’s go with Managing Chronic Conditions for $500!
What is the measure “Controlling Blood Sugar” (which is weighted three times)?
CMS tracks the percentage of plan members with diabetes whose A1C lab test during the year showed their average blood sugar was under control, as defined by HEDIS (the Healthcare Effectiveness Data and Information Set). In short, there’s a lot of number crunching (shoutout to our Excel wizards) for each health plan, based on measuring methods ranging from surveys to percentages reported by the carrier. And with so many measures depending on patient participation, healthcare companies have to somehow encourage their enrollees to actually get tested and go to their physicians… and sit in those awkward waiting rooms… with terrible wifi… and Vogue magazines from 2010…
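To see how differently weighted measures roll up into one rating, here’s a minimal sketch of a weighted average over per-measure star scores. The measure names, scores, and all weights except the 3x on Controlling Blood Sugar are made-up placeholders; CMS’s actual methodology also involves score clustering and further adjustments like the Categorical Adjustment Index discussed below.

```python
# Hypothetical sketch: overall rating as a weighted average of per-measure
# star scores, rounded to the nearest half-star.
def weighted_star_rating(measures):
    total = sum(score * weight for score, weight in measures.values())
    weight_sum = sum(weight for _, weight in measures.values())
    return round(total / weight_sum * 2) / 2  # nearest half-star

example = {
    # measure name: (star score on that measure, weight) -- illustrative
    "Controlling Blood Sugar": (4, 3.0),  # outcome measure, weighted 3x
    "Breast Cancer Screening": (5, 1.0),  # weight assumed for illustration
    "Getting Needed Care":     (3, 1.5),  # weight assumed for illustration
}
```

The triple weight means a slip on one chronic-condition outcome can drag the overall rating down more than two process measures combined.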
Clearly there’s a lot that goes into measuring the plan ratings. But is it worth it?
Where the stars don’t align
If you want to dive into the performance data, be my guest. But what the data doesn’t show is the litany of factors outside a plan’s control that impact its star ratings, such as patient demographics and the financial strength of the health systems where it operates. If a plan operates in an area where the cards are already stacked against it, it earns fewer bonuses and rebates. This, in turn, either pushes the plan out entirely or leaves it with less to invest in the future.
CMS is aware of this issue, but its efforts are falling short. One attempt to address demographic variance is the Categorical Adjustment Index, which was introduced in 2017. It adjusts ratings for dually Medicare-Medicaid eligible individuals and disabled beneficiaries. But the problem extends to all Medicare beneficiaries.
As this paper found, plans’ rankings for treatment of diabetes, cholesterol, and high blood pressure change significantly when adjusted for socioeconomic factors. Some of the top-rated plans would see lower star ratings if diversity measures were factored in: top-rated plans actually had larger racial (comparing White, Hispanic, and Black members) and socioeconomic disparities in care. Some of our 5-star plans could be 3-star plans in hiding!
Furthermore, the rating system focuses on upstream administrative tasks that don’t necessarily lead to better outcomes (either improved quality or lower costs). Plans are extraordinarily good at gaming the system, so we may be seeing another instance of Goodhart’s Law: ratings have improved over the last five years, with a (highly) questionable impact on costs and quality. 80% of plans now have 4 stars or more.
Relative or absolute stars?
Today, the Star system applies an absolute rating that evaluates plans across measures with few adjustments. This is why we see rating inflation and variability across geographies. Certain plans look “bad” because outside factors (hospital distance, specialist accessibility, or internet access) might cap them at a four-star rating. Highly-rated plans are already significantly more concentrated in affluent cities and suburbs, while lower-rated plans cluster in rural areas and areas of lower socioeconomic status.
There are two potential other approaches. The first is an absolute rating that adjusts for geography and demographics. The second is a relative rating that puts all plans on a bell curve either within their county or across the U.S., adjusted for demographics.
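The relative approach can be sketched in a few lines: rank plans by their raw quality score and map percentile rank onto the 1–5 star scale. This is purely illustrative of grading on a curve, not a CMS method, and in practice you’d curve within a county or demographic cohort rather than nationally.

```python
def relative_stars(scores):
    """Illustrative curved rating: rank plans by raw score within a
    comparison group and map percentile rank onto 1-5 stars."""
    ordered = sorted(scores, key=scores.get)  # worst to best
    n = len(ordered)
    stars = {}
    for rank, plan in enumerate(ordered):
        pct = rank / (n - 1) if n > 1 else 1.0  # percentile in [0, 1]
        stars[plan] = 1 + round(pct * 8) / 2    # 1-5 in half-star steps
    return stars
```

Under a curve like this, a plan serving a disadvantaged area would be compared against its true peers, so the same raw score could earn more stars than it would on today’s absolute scale.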
The pushback against a relative rating system is that it would lead to inconsistent standards of care across the country. But it would more accurately credit plans that perform well given the people and resources they’re working with.
I don’t have a clear answer, but there’s a lot more work to be done to create an equitable system that routes funding to where it’s needed most.