The Washington-based Providence health system has undertaken a rigorous research program, including randomized trials, to understand the impact of new technology deployments internally. Providence researchers recently published results on the impact of ambient clinical intelligence (ACI) on documentation workload and burnout.
This study, published in Future Healthcare Journal, measured the effect of using Nuance DAX Copilot from Microsoft to reduce documentation workload and improve provider well-being. Using provider feedback as well as data from Epic’s Signal tool, the study found that ACI significantly reduced documentation burden, provider frustration and burnout. Providers spent less time on documentation each day and 2.5 fewer hours per week on off-hours documentation.
To dive into the study’s details, Healthcare Innovation spoke recently with Providence’s Maulin Shah, M.D., chief medical information officer; Scott Smitherman, M.D., M.B.A., associate vice president, CMIO – Providence Clinical Network; and Staci Wendt, Ph.D., director of the Providence Health Research Accelerator.
HCI: Has Providence been working on the issue of documentation burden for a while? For instance, has it participated in the KLAS Arch Collaborative?
Shah: We’ve been involved with KLAS for a number of years, and we’ve been doing surveys as frequently as multiple times a year to try to see what our interventions do. We have technological interventions, but frequently the interventions are operational or simply personal ones.
Scott’s team actually built a structure for one-on-one physician coaching, sitting down with doctors to help them get better. What we found is that of all the possible interventions, the most cost-effective and the most effective is when you go and talk to a doctor, one on one.
Now we’re looking at all sorts of potential technological interventions as well, because documentation burden isn’t just writing notes. You could argue that writing messages to patients is a type of documentation. We have AI tools that we’ve developed here at Providence to drive that as well. Documentation burden could also be just surfing the chart to find information about our patients, and being ready for your encounter. So we’re working on tools in that space as well, partnering with our EHR vendor.
HCI: The study mentions using the Epic Signal data. Can you use that to assess which clinicians are having the most challenges with “pajama time” or other issues and reach out to them to offer help?
Smitherman: Epic gives us a wonderful set of tools to measure this. We can track those clicks, and we have long done that. The ambulatory informatics teams, both virtually and in person, have long used those tools to target who’s having a problem. But we have been so excited by our ambient tool, DAX, that we’re offering it to all of our clinicians. We feel it’s valuable enough that we’re not just prioritizing it to doctors having the most problems with documentation. We’re offering it to everybody. More than half of our primary care doctors are using it on a regular basis.
HCI: Staci, I wanted to ask about your role. You are director of the Providence Health Research Accelerator. Can you describe that accelerator and how that fed into working on a project like this?
Wendt: We’re a relatively new group at Providence. We’re part of the Providence Institute for Clinical Innovation. We’ve been around for about four years, and we are a research-focused group at the enterprise level.
Our group is responsible for testing ideas and recruiting patients and our clinicians, looking across a variety of stakeholders to understand what’s happening. We also test new ideas and new care pathways. We work with Maulin’s team in the AI tool space, and this project in particular is one of our first collaborations.
HCI: Is there also an AI governance structure set up to work with outside vendors to vet and pilot new AI tools?
Shah: Yes, I lead our clinical AI review council, and there’s a very formal process for both internal and external AI, including defining what AI is, and what has a clinical impact. We have a clear risk matrix of how likely it is that if the AI fails, it’s going to result in harm. And the more harmful it might be, the higher level of scrutiny required. Our AI review council wants to understand how you’re addressing things like anomalies, hallucinations, and bias. We also look specifically at what is the ongoing monitoring? How are we going to know if, for example, bias is creeping into this?
HCI: When this project started, was DAX already being used across the health system or was this when it was brand new and you were first piloting it?
Smitherman: It was really brand new. The data was collected in late 2022, which was before large language models burst into everybody’s consciousness. DAX and Nuance were the first in this space. My last trip before COVID, in January 2020, was to their headquarters to see the technology. All of us were skeptical that this was really going to work. And at that time, Nuance was using a model that required a human reviewer before they pushed the note into the medical record, before a physician used it. So this was really early on.
It’s a credit to the organization that Staci leads that we were able to do it. I mean, as a big organization like ours tries to implement these tools, we want to get them in the hands of clinicians as fast as possible. But it was very new, and, putting the structure around it to do a study like this early on, while you’re trying to do all of the work of making sure it’s safe and getting it in the hands of folks, is a lot of effort.
HCI: The study mentioned that there haven’t been that many studies done in this space yet. Some have surveyed clinicians about their impressions of the impact, but haven’t actually measured it. Is that just because it’s so new or is this kind of research challenging to do anyway?
Wendt: This is one of the first studies to randomize physicians to either an early implementer group or a late implementer group. That actually can be difficult to do, because you need to have buy-in. It creates an added layer of difficulty for the teams who are trying to roll out the technology. Also, you need to have buy-in from your research participants that they might be randomized to receive the technology nine months down the road. And you need to have an idea of what the randomization looks like and be controlling for all those other outside factors.
But when you have randomization, that is the gold standard of research, and it allows you to look at what the actual impact of the tool is. I think another design feature of this study was that we asked the physicians over time for their subjective measurement of their own burnout and their frustration with their documentation. We had a very high engagement rate from all of our research participants over this multi-month period. We also looked at changes for each individual provider over time, relative to what they were experiencing before DAX. We saw a decrease in their burnout, a decrease in their frustration, and then that, paired with objective measurements from Epic Signal data, really just reinforces from this multi-method pathway that there is something going on here with the use of DAX.
Shah: Randomization is really rare and it’s very hard to do, but we’re not only doing this for research. I mean, it’s nice that we have research studies out of it. But we’re really doing this to understand what the impact is internally, and not get persuaded by fun observational data that people show to claim a clear impact and argue for scaling. So we’re doing this in a number of different areas, where we have high-value, high-impact tools that we think will make a big difference. Let’s randomize, let’s do formal studies in order to drive our business, in addition to contributing to the literature.
HCI: So you are doing this type of study with implementations of other technologies as well?
Shah: Yes, and a lot of those are still in progress in terms of publication, but Staci’s team’s been driving a lot of the work to bring that research rigor to our applied informatics work.
HCI: I have talked to other health system execs who describe doing kind of a taste testing of these ambient AI solutions, where they’re exposing the clinical teams to four different ones to get their feedback. Did you feel a need to look at the other ones like Abridge or Ambience? Or do you feel like you’ve had this positive reaction to DAX and you’ve done this research, so that’s good enough to scale up with it?
Shah: It is a complicated, nuanced question. When you’re operating at our scale, which is one of the largest medical groups in the country even if you look at just our primary care, you have to consider more than flavors, right? We have to consider things like scalability. What kind of backing do these companies have? What is the likelihood that they’ll sell to somebody else? We have to look at a lot more things than doctors’ opinions.
In addition, we also have to be thinking about their roadmaps and how those roadmaps fit our strategy. We’ve got nursing in our sites, and we need to move forward on nursing. Are those other vendors doing that? So once you start getting down to the specifics of what a large organization like Providence needs, it’s a very small group. We are continuing conversations with some of those other groups to always understand their roadmap, and with Microsoft to understand their roadmap, and looking at where they converge. I’ll tell you my opinion, and I believe it is Scott’s opinion as well: a lot of these technologies are ready to converge over the next couple of years. So while one may be ahead in one feature and another ahead in a different feature, it’s going to be hard to distinguish them. And it is not clear to us right now what the differentiators will be in three or four years. And since we don’t know what they will be in three to four years, going with the horse we have and the relationship with Microsoft we have right now makes the most sense.
HCI: Do you also have to understand and factor in what the EHR vendor Epic itself is doing in the AI space?
Shah: 100%. No matter what solution we implement, no matter what technology, one of our core concepts is workflow integration. The last thing we need is to add technology that increases burden even as it’s supposed to reduce it, right?
HCI: So how are you looking at AI around areas such as data retrieval, chart summarization, documentation and billing? Are you developing some of those in house?
Smitherman: To build on your earlier question of looking at what our vendor, Epic, is doing, there are lots of areas, like chart summarization. If I sit down and see a patient, in a perfect world, it would bring up information relevant to the conversation I’m having. Once I start talking to a patient with congestive heart failure, it would be great if the AI was surfacing for me the patient’s last echocardiogram, their last visit with their cardiologist, as well as relevant clinical decision support. Those kinds of connections, which would have been sci-fi five years ago, are now going to happen. And it’s a question of how it happens. Using ambient to document is definitely converging among the vendors, and we’re excited about what’s next and how we’re going to build these connections. The vendors are trying to differentiate themselves by offering one piece of that or another or jumping into a specific space. Our job is to look at this as a whole, evaluate all of our tools and find the right solution. So we’re excited about DAX’s connections to Epic and what it can do right now, and we’re strategically looking at other vendors as well to address some other areas.
HCI: How are you feeling about the pace of scaling up of the ambient tools now? You have such a huge system; does it feel like the uptake has been great? Or are there areas where you have to double down on efforts to get the uptake you want?
Smitherman: It is hard work. I mean, we’ve been at this longer than almost any other health system. We were one of the first to offer an enterprise license to all of our clinicians in the ambulatory space. We’ve been pushing it, but there are certain areas, such as some of our specialties, where it’s been a slower ramp-up than it has been in primary care. Getting doctors to try new things is always difficult, even if some of the benefits seem so obvious to us.
Shah: Not just doctors, but getting anyone to try new things is actually pretty hard. You’re comfortable with what you do. I mean, do you really want to change it, even if it’s better? But once they try it, we have a very low rate of people stopping using it, and that’s actually an area that we’re looking at carefully, to understand who has tried it but is no longer using it. That’s important for us to understand.
Wendt: I think there’s also some buy-in that comes with just hearing that there’s a dedication from Providence, and a plan for the tools that we’re using, and that there’s an investment. We’re not asking folks to spend time learning a new technology that is just going to go away in a month because we decide we want to pivot. I think that’s something that we’re hearing in some other work that we’re doing, where folks say that knowing that about Providence is very helpful in deciding to sign in and try something.
HCI: Are you also rolling this out in the inpatient setting — in the ED, with nursing and hospitalists? And are the challenges there different?
Shah: I’m gonna put nursing aside for a second. I’ll come back to that. We already rolled this out to the emergency centers, and we’re getting pretty good uptake there. We started our pilot with inpatient just the last week or two, and it seems like it’s going pretty well, but we have very little data so far.
What I can say is that, just as they have optimized the notes for the ambulatory setting, they’re going to have to spend just as much time optimizing them for those other settings. And I think it’s going to take time, because they need to see examples, right? They need to get enough feedback, in the millions, and they will pretty quickly, and I think it’s going to be a really strong tool in those spaces.
Nursing is tricky, and we’re 100% committed to finding the value of ambient to nursing, but I don’t know that it’s a slam dunk, in terms of the same type of workflow. It’s not a conversation that you’re documenting. Nurses don’t inherently say out loud all the things they do. So what is it they’re recording and what’s the value? What’s the highest and best use of that ambient? I don’t think it’s clear. We’re looking at what the vendors are doing right now, and we’ll see what they’re going to do, although, as we’re looking at it right now, I’m not convinced, because they’re all looking at how do you do structured documentation from ambient, which is cool, but maybe it’s not the right use of ambient. So we’re thinking hard about what we’ll do, in addition to what the vendors are doing.
