The interview went beautifully. She arrived five minutes early, asked smart questions about your check-in times, mentioned another host she'd worked with, and laughed at the story about the guest who tried to fit a kayak in the hall closet. You shook hands feeling lucky. Three weeks later you're standing in your unit at 3:40 p.m. — check-in at four — looking at a bathroom mirror still fogged with streaks and a bed made the way a teenager makes a bed when told to.

Nothing about that afternoon contradicts the interview. That's the uncomfortable part. The interview measured exactly what it always measures: how good someone is at interviews. If you want to know how to hire an Airbnb cleaner who will still be excellent on turnover forty, you need to understand why the conversation felt so convincing — and what actually predicts the work.

The Halo Effect Is Running Your Interview

In 1920, psychologist Edward Thorndike noticed something strange in how military officers rated their subordinates: a man judged handsome or likable was also rated more intelligent, more disciplined, and a better leader, across traits that have little to do with each other. He called it the halo effect, and it has been replicated in one form or another for a century. One warm impression bleeds outward and colors every other judgment we make about a person.

An interview is a halo-generation machine. Punctuality, eye contact, an easy laugh: these are real signals, but they're signals of social fluency, not of whether someone will run a lint roller over the couch when nobody is watching. Related work by Nalini Ambady and Robert Rosenthal on "thin slices" showed that people form stable judgments of others from just seconds of observed behavior. Those snap judgments are impressively consistent. In a hiring chat, though, they're consistent about the wrong thing: they tell you far more about how much you'll like someone than about how they'll clean.

So when the interview goes well, what you've actually learned is that this person is pleasant to talk to. Useful, since you'll be texting them for months. But it's a personality reading, not a performance forecast.

What a Century of Hiring Research Actually Found

The good news is that this question — what predicts job performance before you've seen the job performed? — is one of the most studied in all of organizational psychology. Frank Schmidt and John Hunter's landmark 1998 meta-analysis in Psychological Bulletin pooled roughly eighty-five years of personnel selection research to rank hiring methods by how well they predicted later performance.

The pattern is blunt. Near the top: work sample tests — having the candidate actually do a piece of the job. Also strong: structured interviews, where every candidate gets the same job-relevant questions scored against defined criteria. Down the list: the classic unstructured interview, the friendly free-form chat most of us instinctively run. It isn't worthless, but it's dramatically weaker than watching someone work, largely because it's so easily hijacked by the halo effect.

Translate that to short-term rental hosting and the conclusion writes itself: the single most predictive thing you can do before hiring an Airbnb cleaner is not a better conversation. It's a paid trial clean.

The Paid Trial Clean: Your Work Sample Test

A trial clean is exactly what it sounds like. You pay the candidate your full turnover rate — always pay; free "auditions" select for desperation and start the relationship with resentment — to clean the unit once, under realistic conditions, before you commit.

Realistic conditions matter more than hosts expect. Don't hand over a lightly used unit after a two-night couple's stay. If you can, schedule the trial after a genuinely messy checkout, give them your actual checklist, your actual supply closet, and your actual time window. A work sample only predicts the job to the degree that it is the job. A cleaner who shines with unlimited time in an easy unit tells you little about a ninety-minute window behind a family of five.

Then leave. Seriously — a trial clean with the owner hovering in the kitchen is a different task than the one you're hiring for. The job is cleaning unsupervised. Test that.

Score It Before You See It

Here's where the second lesson from the selection research comes in: structure beats impressions even when you're evaluating real work. If you walk through the finished unit thinking "how do I feel about this?", the halo effect simply relocates from the interview to the walkthrough — you liked her, so the bathroom looks fine.

Instead, write your scoring criteria before the trial, and make them concrete and binary wherever possible. Under the beds or not. Inside the microwave or not. Coffee maker reservoir emptied or not. Duvet corners actually seated in the cover or not. Ten to fifteen checks, decided in advance, each answerable yes or no. Include two or three items a rushed cleaner reliably skips — the top of the refrigerator, the shower drain cover, the inside of the trash can lid — because those items measure the exact trait you're hiring for: conscientiousness when no one is checking.
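If you'd rather keep the tally somewhere more durable than a sticky note, the idea can be sketched in a few lines of Python. This is only an illustration, not a prescribed tool: the names Check, CHECKS, and score_trial are made up for this example, the list is shortened to the items mentioned above (a real one would run ten to fifteen), and no pass mark is assumed, because where you draw the line is your call.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    """One yes/no item, written down before the trial clean."""
    item: str
    often_skipped: bool = False  # True for items a rushed cleaner tends to miss


# Hypothetical checklist built from the examples in this article.
CHECKS = [
    Check("under the beds"),
    Check("inside the microwave"),
    Check("coffee maker reservoir emptied"),
    Check("duvet corners seated in the cover"),
    Check("top of refrigerator", often_skipped=True),
    Check("shower drain cover", often_skipped=True),
    Check("inside of trash can lid", often_skipped=True),
]


def score_trial(results: dict[str, bool], checks: list[Check] = CHECKS) -> dict:
    """Tally a walkthrough. Every check must have an explicit yes or no."""
    missing = [c.item for c in checks if c.item not in results]
    if missing:
        # Refusing to score a partial walkthrough keeps "it felt fine" out of the result.
        raise ValueError(f"No answer recorded for: {', '.join(missing)}")

    skipped = [c for c in checks if c.often_skipped]
    return {
        "passed": sum(1 for c in checks if results[c.item]),
        "total": len(checks),
        "often_skipped_passed": sum(1 for c in skipped if results[c.item]),
        "often_skipped_total": len(skipped),
        "failed_items": [c.item for c in checks if not results[c.item]],
    }


if __name__ == "__main__":
    # Example walkthrough: everything done except the top of the refrigerator.
    walkthrough = {c.item: True for c in CHECKS}
    walkthrough["top of refrigerator"] = False
    print(score_trial(walkthrough))
```

Reporting the often-skipped items separately is deliberate: a cleaner can pass most of the list and still miss every one of the items that measure care when no one is checking, and a single total would hide that.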

One more high-signal test that costs you thirty seconds: stage a small problem. A burned-out bulb in the hallway. A remote with dead batteries. You are not testing whether they fix it. You're testing whether they tell you. A cleaner who volunteers "hey, the hall light is out" on a trial — when raising problems feels riskiest — is showing you the reporting instinct that will someday save you from a guest discovering the broken lamp instead.

Fix the Interview You Still Have to Do

None of this makes the conversation useless; it makes the conversation's job smaller. Use it for logistics fit — availability on weekends, backup plans, comfort with same-day windows — and make it structured: the same questions for every candidate, focused on past behavior rather than hypotheticals. "Tell me about a time a job took much longer than expected — what did you do?" outperforms "Are you reliable?" because everyone answers the second question the same way.

Do the same with references. "Was she good?" invites politeness. "What was the hardest part of working with her?" and "Would you rehire her, and for what kind of property?" invite information.

What the Trial Can't Tell You

A trial clean is the best predictor available, and it's still a snapshot. It tells you what this person's best looks like — their audition performance. It cannot tell you what turnover forty looks like, in August, third clean of the day, when the standard they're maintaining lives entirely in their memory of a walkthrough from months ago.

That's not a flaw in the person you hired. It's a property of every human system: performance drifts toward whatever gets noticed. The hosts who keep great cleaners great aren't re-running trials — they're running a lightweight loop where the checklist arrives fresh each turnover, completion is confirmed with photos, and problems have an easy path back to the host. Hire with a work sample; retain with a feedback loop.

Where Stayput Fits

This is the gap Stayput was built for. The trial clean tells you whom to trust; Stayput makes that trust durable — each turnover, your cleaner gets the job by SMS (no app to install), confirms the work with photos tied to your standards, and flags low supplies before a guest finds the empty shelf. The same structured, see-the-work principle that made your hiring decision sound keeps making your Tuesday turnovers sound, at $19 a month per property. If you've found the right cleaner and want to keep them that way, take a look at Stayput.