denubis's blurblog

Breaking the [Flight] Rules by waynehale
Thursday April 25th, 2024 at 7:08 PM

Wayne Hale's Blog

The official NASA history of STS-109 can be found on the agency web page:

STS-109

The last part of that official account reads:

“After a successful launch, flight controllers in Mission Control noticed a degraded flow rate in one of two freon cooling loops that help to dissipate heat from the orbiter. After reviewing the loop’s performance, mission managers gave the crew a “go” to proceed with normal operations. The problem had no impact on any of the crew’s activities. Both cooling loops performed normally on de-orbit and landing.”

The official NASA description of what happened on STS-109 is a lie.

I should know. I was there.

Marianne Dyson and I worked together in Mission Control in the early days of the Space Shuttle program. She had been in touch with Jim Newman, a crew member on STS-109. Jim asked her if she knew how she figured in STS-109 even though she was not working in MCC. Marianne asked me to fill in the story.

Marianne: ‘I was in the Flight Activities Branch and was the book manager for the Post Insertion (nominal) timeline, Launch Day Deorbit, Loss of FES Deorbit, Loss of 2 Freon Loops Deorbit, and ‘If BFS Omit’ procedures. I was responsible for developing and validating all those procedures for STS-1, 2 and 3.’

These were serious and complex checklist procedures for the astronauts to use in flight. Post-Insertion covered the period just after launch when the crew was turning the Space Shuttle into an operational orbital outpost. The Deorbit procedures were all failure responses. Of all the checklists in the pantheon of Space Shuttle procedures, the very hardest to perform was the dreaded ‘Loss of Two Freon Coolant Loops’ power down and deorbit procedure.

To understand, a short discussion of the Flight Rules is necessary. By training and practice the adherence to Flight Rules was burned into the culture of Mission Control. Careful consideration prior to any mission went into the development, review and approval of Flight Rules. During a flight, it was considered a cardinal sin to break a Flight Rule.

Buried deep in the book, page 18-105 (or as my electronic version has it, page 1947 of 2053 total pages), the operative words are found in a table (background rationale following in italics):

Space Shuttle Operational Flight Rules Volume A All flights

Rule A18-1001 Thermal Go/No-Go Criteria

FCL (2)

Ascent Abort if: 2 Lost
Invoke MDF if: --
Enter NPLS if: 1 Lost

Loss of one Freon loop requires a PLS because the next failure (loss of other Freon loop) could result in loss of crew/vehicle.

Nominal ascent is continued so that more time is available to reconfigure for a one Freon loop entry and also because loss of one Freon loop is not an emergency. If both loops are lost, an emergency entry (ascent abort) is required because all cooling to the vehicle is lost.

If both Freon loops are lost, an emergency entry is required because the FC stack temperatures will reach the specification operational limit of 250 deg F within approximately 50 minutes. In addition, the electrolyte will reach the 25 percent operational limit in approximately 75 minutes. At this point, continued operation of the FC’s is questionable. This assumes . . .

To decode: this Flight Rule required an immediate abort during the launch phase if two Freon Coolant Loops fail: either Return to Launch Site, Trans-Atlantic Abort, or Abort Once Around depending on when the failures occurred. Loss of only one FCL during launch did not require an abort but entered into the next step.

During the ‘on orbit’ phase of the mission, the failure of one of the two freon loops would result in ending the mission, planning a landing at the Next Planned Landing Site (NPLS) which by definition was within 24 hours. A PLS landing was always to one of our three primary sites: KSC’s Shuttle Landing Facility in Florida, the Edwards Air Force Base in California, or the White Sands Space Harbor in New Mexico. Weather and timing would determine which of the three to use. NPLS minimized the time exposure to the failure of the remaining good loop balanced with the safety of returning to one of the best landing sites. A third category for some failures involved something called a Minimum Duration Flight but that was not an option for this equipment.

As an aside, if both of the freon loops were to fail while on orbit, perhaps while waiting for NPLS, other rules mandated an emergency landing as soon as possible. ELS sites were identified all around the world but did not have the equipment, long runways, and weather forecasting capability that the PLS sites did.
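The decoding above can be restated as a small lookup. This is only an illustrative sketch of the logic as described in this post, not anything taken from the Flight Rules themselves or from Mission Control software; the names `Phase`, `Action`, and `fcl_action` are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    ASCENT = "ascent"
    ON_ORBIT = "on orbit"


class Action(Enum):
    CONTINUE = "continue nominal operations"
    ASCENT_ABORT = "ascent abort (RTLS, TAL, or AOA depending on timing)"
    NPLS = "end the mission and land at the Next Planned Landing Site"
    EMERGENCY_LANDING = "emergency landing as soon as possible"


def fcl_action(phase: Phase, loops_lost: int) -> Action:
    """Map the number of failed Freon Coolant Loops to the response described in the text."""
    if loops_lost not in (0, 1, 2):
        raise ValueError("the orbiter had exactly two freon loops")
    if loops_lost == 0:
        return Action.CONTINUE
    if loops_lost == 2:
        # Both loops lost: abort during ascent, emergency landing if already on orbit.
        if phase is Phase.ASCENT:
            return Action.ASCENT_ABORT
        return Action.EMERGENCY_LANDING
    # Exactly one loop lost: nominal ascent continues, then the mission ends at NPLS.
    return Action.NPLS


print(fcl_action(Phase.ASCENT, 1).value)
```

The table leaves no room for a loop that is degraded but still flowing, which is exactly the case the rest of this story turns on.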

Since my copy of the Flight Rules dates from late in the program, it documents the history of STS-109. Section 18 contains a definition of ‘loss of a freon loop’ with 3 full pages of background ‘rationale’ (all in italic) describing what happened on STS-109 and the subsequent engineering analysis.

The Space Shuttle was an electric airplane; nothing happened without electricity. There was no control, no anything without the electrical power generated by the three fuel cells. Batteries were non-existent. If the fuel cells did not make electricity, the shuttle was a rock.

Fuel cells combine hydrogen and oxygen to produce electricity, water, and lots of heat. That heat had to be removed to condense the water vapor in the fuel cells so that the water could be removed. If the water was not removed, the fuel cells would ‘flood’, the chemical process would stop working, and electrical generation would cease. The Freon Coolant Loops (FCL) circulated freon as a fluid to collect the heat generated in various parts of the orbiter and transport it to the radiators or the flash evaporators where the heat was dissipated out into space. For redundancy the orbiter had been designed with two loops and each of those had two redundant circulating pumps.

Mission Operations made sure that there was a crew checklist procedure for each and every single item that could break or otherwise fail on the orbiter. Starting before the first Space Shuttle flight, the Mission Control team built step-by-step procedures which were documented, tested, practiced, and validated. Which is to say, proven to work properly with either engineering tests or rigorous numerical analysis.

In a very few cases, there were procedures written for two failures. Since the loss of both freon loops could be catastrophic in a very short time, quick but complex action had to be taken by the crew. This was one of the few checklist procedures to address two like-systems failures. Marianne and a host of other folks worked diligently to provide a way out of that terrible situation. The Loss of Two Freon Loops procedure required powering down much of the electrical equipment on the orbiter to both conserve electricity and reduce the heat generated which had to be removed. The checklist was extremely complex, time consuming, and – worst of all – attempts to validate it were unsuccessful. In other words, working the checklist completely ‘right’ was unlikely to succeed. The probability of LOCV was high.

LOCV – Loss Of Crew and Vehicle.

That is all background to what happened on March 1, 2002.

STS-109 was a mission to service and repair the Hubble Space Telescope. The crew and Mission Control team were well trained, excited about the mission, and dedicated to leaving the Hubble in perfect condition. The Hubble Space Telescope Operations team was anxious to get their instrument fixed. Prelaunch had been difficult with launch scrubs due to weather and technical issues. When STS-109 finally left the ground all of us were pleased.

Ascent Flight Director for the mission was my good friend and colleague John Shannon. The Lead Flight Director was Bryan Austin. I was assigned to be the Mission Operations Director. This was a replay of the team on STS-93, when I got launch fever. I was determined not to fall into that again. See https://waynehale.wordpress.com/2013/10/31/keeping-eileen-on-the-ground-part-ii-or-how-i-got-launch-fever/

My position as MOD was to coordinate with the other members of the Mission Management Team. During the countdown and launch, everybody on the MMT except the MOD was in the Firing Room at the Launch Control Center in Florida. The MMT included the Space Shuttle Program Manager, the JSC, KSC, MSFC, and SSC Center Directors, the Orbiter Project Managers (and project managers of all the other shuttle elements), the Head of the Astronaut Office, the Chief of Space Flight Safety, and almost all the other senior managers in the Space Shuttle Program. The MMT was charged with making the most important decisions, if time were available, regarding any Space Shuttle flight. The MMT was the only body that was allowed, after deliberation, to change a Flight Rule.

When the Public Affairs Officer refers to ‘mission managers’ he means the MMT. The MOD is not authorized to act without their direction. Flight Rules are always to be followed unless the MMT rules otherwise.

Since the countdown had gone so well, and the launch had been delayed, the MMT was really anxious to get home – back to JSC, MSFC, or SSC – as soon as the launch was over. After nominal cutoff of the main engines (MECO), the management team had a few short speeches, took part in the ceremony of the beans and cornbread, and quickly headed to the Shuttle Landing Facility to board the Gulfstream II management aircraft for the flights home. Very limited to no communication was available while they were in flight.

In short, those of us in Mission Control were without senior leadership direction for those hours.

Mission Control never considered ‘ascent’ to be over until after the OMS-2 burn put the orbiter into a stable, non-re-entry orbit, and completed various other critical tasks. Among those required was closing the ET umbilical doors on the belly of the orbiter; changing the onboard computer system from launch to on-orbit software configuration; opening the payload bay doors and establishing freon loop cooling through the radiators; checking out the star tracker navigation system. When all those items were completed, the crew was given a ‘go for orbit ops’. Their first step after that was usually to get out of the bulky launch/entry pressure suits, activate the toilet, and start putting away the chairs on the middeck.

Sometime after MECO, sometime after the MMT got on the airplanes, but before getting a ‘go for orbit ops’, the EECOM (Environmental, Electrical, Consumables Manager) spoke up. Responsible for the cooling on the orbiter, he pointed out that one of the freon coolant loops was not operating at full flow. The flowmeter in Freon Coolant Loop #1 was showing a flow of only 200 lbs./hour.

The failure limit defined in the Flight Rules was anything less than 211 lbs./hour.

Technically, legally, analytically, FCL #1 was considered failed.

Things got very quiet in the Flight Control Room.

We all knew what that meant.

At that point, theoretically, the Flight Director should declare a First Day PLS (Planned Landing) and the crew should start working the procedures to land at Edwards AFB on orbit 3. The timing of the discussion made that dicey; starting down that path would have required a rush job to be ready to retrofire in about 90 minutes. Also, theoretically, the crew should be directed to perform the Loss of One Freon Coolant Loop power down which was long and involved turning off quite a bit of the redundant equipment. That would leave the vehicle open to other failures.

The Ascent Flight Director started doing what any good FD will do – asking a lot of questions of the EECOM. It was the EECOM’s opinion that the Flight Rule was ‘conservative’, the flow rate was just below the limit, and there was enough flow to at least consider continuing. Flight strongly wanted to get to a stable situation and sort options out.

There was a short discussion with the crew about potential power down. They were told to pull out the loss of one freon coolant loop checklist and review it, but take no action just yet.

Here is the crux of the situation: if the other freon loop – the good one – were to fail, quit, leak, whatever; would the questionable freon loop provide enough cooling to avoid the dreaded 2 Freon Coolant Loop procedure?

Maybe.

As John remembers it: “Definitely one of those “is it failed or not” cases and of course being stable on-orbit while you figure things out is not a bad idea.”

The Flight Director turned around and leaned over the MOD console. John looked at me and said: ‘better tell the MMT’. But I couldn’t. They were in the air.

A decision was required.

I punted. I asked John what he recommended. He was inclined to continue on rather than terminate. I told him I concurred and the flight should continue.

Later on, I did get to have a long conversation with the MMT. Much engineering analysis was turned on and worked on very hard during the entire mission.

FCL #1 never regained full flow during flight. So much for ‘Both cooling loops performed normally on de-orbit and landing.’

In the end, all the analysis indicated we made the right decision. As John recently discussed: “This would be a good case for why you have a flight control team instead of just programming the flight rules into a computer. Human judgment and risk trades are critical to spaceflight operations.”

Indeed.

In a pinch, the low-flowing freon loop would have provided just enough cooling by itself, with an appropriate powerdown, to avoid disaster.

But that does not change the fact that we broke the Flight Rule that day.

Weeks after the flight, after all the engineering analysis was complete and double-checked, the flight rule was revised. The new limit at which an FCL is considered failed was 163 lbs./hour, less than the old limit of 211 lbs./hour. New procedures were written and passed validation. Much work was done in case the situation should ever happen again.

It never did.
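For readers who want the numbers side by side, here is a minimal sketch comparing the 200 lbs./hour reading against the old and revised limits. The function and constant names are hypothetical, and it assumes the revised limit uses the same "anything less than" comparison as the old one, which the post does not state explicitly.

```python
# Limits quoted in the post, in lbs./hour.
OLD_FAILURE_LIMIT = 211
REVISED_FAILURE_LIMIT = 163


def loop_considered_failed(flow_lb_per_hr: float, limit_lb_per_hr: float) -> bool:
    """A loop counts as failed when its flow is below the limit (assumed strict comparison)."""
    return flow_lb_per_hr < limit_lb_per_hr


# Flow shown by the FCL #1 flowmeter on launch day, per the post.
observed_flow = 200

print(loop_considered_failed(observed_flow, OLD_FAILURE_LIMIT))      # True
print(loop_considered_failed(observed_flow, REVISED_FAILURE_LIMIT))  # False
```

Under the rule in force on launch day the loop was failed; under the rule written afterward the same reading would not have triggered anything.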

But the decision on STS-109 launch day wasn’t made by ‘the mission managers’. It was John, EECOM, and me.

One final change: when it came my turn to set the rules for the MMT, I added one more step after launch: the MMT had to stay on station at KSC, where there was data and good communications, until after the ‘go for orbit ops’.


Bash dot org may be gone, but I'll carry its spirit with me in 2024.
Thursday April 25th, 2024 at 5:35 PM

Corey Quinn

Bash dot org may be gone, but I'll carry its spirit with me in 2024.


H5N1 update: We have to do better, faster by Katelyn Jetelina
Thursday April 25th, 2024 at 7:43 AM

Your Local Epidemiologist

H5N1—also known as bird flu—continues to spread among animals and, for the first time, is spreading among cows. Thankfully, the risk to the public is low, but the more the virus spreads, the more chances it has to mutate and jump species to spread among humans. Given H5N1’s high mortality rate, we don’t want this to happen.

This outbreak is concerning; unfortunately, communication and data transparency have been profoundly lacking. We must do better and faster — repeating the same communication mistakes that fueled confusion, distrust, and misinformation during Covid-19 is inexcusable.

Here’s what we do and don’t know about H5N1.

This post builds off previous YLE updates of the current cow outbreak. To get up to speed, start here.

H5N1 continues to spread among animals

H5N1 has been detected among 33 dairy cattle herds in 8 states. The virus is spreading through multiple known pathways: wild bird → cow; cow → cow; cow → poultry, and once from cow → human. Thankfully, there is currently no evidence of human-to-human transmission.

How big is the “true” outbreak? We don’t know. Symptomatic testing of animals and humans is voluntary, and asymptomatic testing is not happening (likely due to industry pressure), which means we are flying blind. It’s been reported that more workers have symptoms — such as fever, cough, and lethargy — but are unwilling to test. We could have more human cases. Among the tests conducted, it’s unclear how many have been done, how many were positive, and how many humans have been exposed.

Two positive updates:

There is a federal rule for moving cows now. As of yesterday, the federal government requires testing all lactating cows before moving across states. Finally! Unfortunately, it’s likely too late to contain transmission.
Pigs have been testing negative. This is good news. Pigs are dangerous hosts for H5N1 because they have avian and human receptors. They are known as “mixing vessels” for influenza viruses.

The outbreak is much larger and started earlier than we thought.

We don’t know how big the outbreak is from physically swabbing animals and humans, but two clues suggest this has been spreading under our noses for a while:

Clue #1: The FDA found H5N1 fragments in the milk supply. This was surprising because milk from known infected cows was not going to the market. This confirms that the cow outbreak is bigger than previously known.
Thankfully, milk pasteurization should deactivate any bird flu that makes it into the milk supply. Scientists confirmed this yesterday: they could not grow active virus from the milk samples. This means the virus fragments detected in milk were broken pieces that cannot replicate and, thus, cannot harm humans. The FDA is testing more samples just in case. Thus far, the public has seen no data.

Clue #2: Genomic surveillance can tell us how, where, and when H5N1 mutates. After being reluctant to share genetic data, USDA finally shared 239 viral sequences from animal infections. However, the data was incomplete—the date and location of the sample collection were not included, making it difficult to answer key questions about how the virus has changed over time and, therefore, predict where it might end up.
Also, there was no mention of what USDA scientists found with this data internally. We’ve relied on scientists on social media to walk through what they found after analyzing the raw data, which suggests an estimated spillover to dairy cows starting in December.

Wastewater is spiking. Is it H5N1?

Some on social media discovered wastewater spiking in some places with infected herds. For example, in Amarillo, Texas, wastewater is skyrocketing for flu A while the rest of the state remains low.

Could this be H5N1? Yes, but we don’t know for sure; the government has not officially shared anything about wastewater. This is incredibly disappointing, given this is the perfect use case for wastewater monitoring.

While we don’t have a wastewater test for H5N1 yet, it has many similarities to flu strain A. Some wastewater systems include human wastewater and stormwater. Given that Amarillo has stockyards, this strengthens the possibility that it’s H5N1, especially since we aren’t seeing it in other parts of the state. The most likely scenario? The spike is from milk dumping or animal sewage. We need more data.

The tools work, but…

The U.S. government has confirmed that Tamiflu and the stockpiled H5N1 vaccines are predicted to have efficacy if this does move to humans. This is great news. And I agree there shouldn’t be panic. But I caution leaders against the “we’re fine, we have vaccines” attitude.

What about manufacturing and supply?
What about the rest of the globe?
What about vaccine hesitancy?
And decline in trust?
And access problems?

Of course, fully relying on vaccines and overconfidence were among the big mistakes of the Covid-19 emergency. As one of our colleagues wrote, “It’s best to face these threats with humility and determination.”

A communication void will be filled with confusion, mistrust, and misinformation

Information from the response has not been easy to find, has not been complete, and has not been backed with data, leaving many of us to piece together a fuzzy picture.

This is a big problem for many reasons:

Misinformation brews in information voids. People, rightfully so, have questions and can’t find answers.
Trusted messengers don’t know what’s going on. During an outbreak, top-down, credible, and consistent communication is necessary. Equally important is actively equipping trusted messengers—mass media, scientists, physicians, and community leaders—so they can communicate from the bottom up.
Tools could be at risk. For example, wastewater surveillance is one of the only tools that wasn’t weaponized during the Covid-19 emergency. My biggest concern is that if we aren’t transparent—walking the public through what we are and are not doing with wastewater—people will become hesitant about this surveillance.
Chips away at credibility. Although brilliant scientists are within each agency, their expertise isn’t shining through.
Harming global capacity to respond. This is an emerging global pandemic threat. Other countries need to know, for example, if they should start testing their cows. If they should be looking for mutations, and where.

We need a coordinated response from our government. Yes, there are multiple players involved. And yes, they have their own priorities, legal authorities, agility, experience, and politics. However, honest, frequent, direct communication earns the public’s trust and confidence. If not, communities are starved for good information during outbreaks or emergencies, leading to unnecessary anxiety, confusion, and frustration.

After much pressure, government agencies finally hosted a live briefing for the media yesterday. This is a positive step in the right direction and, I hope, a sign that the winds are changing.

Bottom line

Responses need to get better faster. H5N1 is a dangerous disease that can affect our economy, food security, and animal and human health. This response has been incredibly difficult to watch on the heels of Covid-19 (and mpox and other emergencies like the East Palestine train accident). We get just so many “practice runs” before it starts costing lives again.

Love, YLE and Dr. P

“Your Local Epidemiologist (YLE)” is written and founded by Dr. Katelyn Jetelina, M.P.H. Ph.D.—an epidemiologist, wife, and mom of two little girls. During the day, she is a senior scientific consultant to several organizations, including CDC. At night, she writes this newsletter. Her main goal is to “translate” the ever-evolving public health world so that people will be well-equipped to make evidence-based decisions. This newsletter is free, thanks to the generous support of fellow YLE community members. To support this effort, subscribe below:

Subscribe now

Kristen Panthagani, MD, PhD, is a resident physician and Yale Emergency Scholar, completing a combined Emergency Medicine residency and research fellowship focusing on health literacy and communication. You can find her on Threads, Instagram, or subscribe to her website here. Views expressed belong to Dr. P, not her employer.


Quoting James Betker
Thursday April 25th, 2024 at 5:07 AM

Simon Willison's Weblog

I’ve been at OpenAI for almost a year now. In that time, I’ve trained a lot of generative models. [...] It’s becoming awfully clear to me that these models are truly approximating their datasets to an incredible degree. [...] What this manifests as is – trained on the same dataset for long enough, pretty much every model with enough weights and training time converges to the same point. [...] This is a surprising observation! It implies that model behavior is not determined by architecture, hyperparameters, or optimizer choices. It’s determined by your dataset, nothing else. Everything else is a means to an end in efficiently delivery compute to approximating that dataset.

— James Betker


All the cloud vendors say they’ve been “working on AI for decades,” but Intel wa...
Wednesday April 24th, 2024 at 9:49 PM

Corey Quinn

All the cloud vendors say they’ve been “working on AI for decades,” but Intel was shipping Pentiums that were bad at math in 1994.


The Machine of Eternal Summer 022 by Novil
Wednesday April 24th, 2024 at 7:53 PM

The Adventurous Scarlet Carolus and the Machine of Eternal Summer

