I guess at least HR doesn’t have to read 1,000 resumes. Heck, to be frank, could they make sense of the first 10 resumes?
dc3k · 2026-06-29 05:01:09 UTC
Disregarding the fact that this thing is completely broken, its grading rubric is ridiculous to begin with (as was mentioned in the article itself, but I must reiterate how completely stupid this is):
> 35 points for open source contributions
> 30 for personal projects
I don't contribute to open source or have personal projects because I don't spend my free time doing what I do 40 hours a week to make a living. My 15 years of work experience is worth a maximum of 25%, so any company using this idiotic system would pass on me immediately. Open source and personal projects are fine, but in no sane world are they worth 65% of a resume's score.
adrianN · 2026-06-29 05:11:44 UTC
They are selecting for people who are fine working in their free time. If you contribute to open source you are more likely to contribute to the company on weekends. If instead you have other hobbies or a family that takes up non-work hours you are more likely to drop your pen after forty hours.
emj · 2026-06-29 05:25:24 UTC
You might have numbers on that but after working in a place with a strict no more than 40 hour policy my view is that people overwork for many reasons. Being an open source enthusiast is not one of them.
stevesimmons · 2026-06-29 05:48:47 UTC
I'm not sure that follows. I stopped making open source contributions when I switched from mature companies to startups.
Now all my "non-work" time is spent on startup work. And none of that is visible via GitHub.
matheusmoreira · 2026-06-29 05:49:32 UTC
Maybe they're selecting for intrinsic motivation. People who enjoy programming to the point they do it for fun, not just because it pays.
Free software work doesn't imply we work for free. We work on our projects, the stuff that we actually enjoy working on. Nobody is going to work on corporate products without adequate compensation.
lukan · 2026-06-29 06:16:44 UTC
"Nobody is going to work on corporate products without adequate compensation."
I guess there sadly are many nobodies who do this to hope to become somebody.
matheusmoreira · 2026-06-29 06:19:19 UTC
If the open source work is part of a hiring pipeline, sure. Contributing to some repository and having it serve as a resume that gets you hired is also a form of compensation. If the work is also enjoyable, then it's a win either way.
another-dave · 2026-06-29 09:09:48 UTC
> If you contribute to open source you are more likely to contribute to the company on weekends
I wonder if that assumption is borne out in reality though?
I'd imagine if someone's OSS contributions are enough of a factor that it's worth hiring them, they're not going to drop it on a whim to work extra hours on the day job.
(Assuming you weed out open source contributions like "I made a todo list app in React but licenced it as MIT" or "I fixed a typo in the docs for NextJS". )
jerrythegerbil · 2026-06-29 05:01:24 UTC
> I fail 65% of the time. Same exact resume, different luck.
As someone who’s run hiring pipelines for technical roles in the past few years, that’s actually a fantastic number. I objectively hate saying that, but it’s true.
35% chance of elevating a technical individual to the next stage with no effort? I’ve seen as many as 100+ applicants an hour even when including a domain specific screener question. That’s 35 “screened” applicants in an hour. Were valid candidates screened out? Yes. Do you still have a candidate pool 35x larger than you need? Unfortunately, also yes.
The volume of applicants is SO HIGH such that your chances of getting moved to the next stage are actually markedly worse if AI isn’t involved. If you didn’t apply immediately (using an AI bot) there’s 50+ people ahead of you, and an exhausted technical leader if they ever make it to your resume.
Referral bonuses exist for a reason.
kyralis · 2026-06-29 05:06:54 UTC
Is it? Or is it a 65% chance of a resume getting ignored before a single human sees it, reducing your pipeline's likelihood of catching qualified candidates by the same?
Gates that reduce resume flow-through are only useful if their reduction is correlated with quality. Otherwise they're just dragging out your hiring process or unnecessarily causing you to ultimately lower your hiring bars.
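To put rough numbers on that, here is a small expected-value sketch. The pool size, the number of qualified applicants, and the pass rates are all made-up assumptions; the only point is that a gate passing everyone at the same rate leaves the qualified share of the pool exactly where it started.

```python
def qualified_share_after_gate(qualified, unqualified, pass_qualified, pass_unqualified):
    """Expected share of qualified candidates among those who pass the gate."""
    passed_qualified = qualified * pass_qualified
    passed_unqualified = unqualified * pass_unqualified
    return passed_qualified / (passed_qualified + passed_unqualified)

# Hypothetical pool: 1000 applicants, 100 of them qualified.
base_share = 100 / 1000

# A gate uncorrelated with quality passes everyone at the same 35% rate.
random_gate = qualified_share_after_gate(100, 900, 0.35, 0.35)

# A gate correlated with quality passes qualified candidates more often.
correlated_gate = qualified_share_after_gate(100, 900, 0.70, 0.30)

print(base_share, round(random_gate, 3), round(correlated_gate, 3))
```

With equal pass rates the pool gets smaller but no better; only the second gate improves the ratio for the next stage.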
bagels · 2026-06-29 05:10:54 UTC
The goal for the interviewer is to have a much higher ratio of good/bad candidates after the first screening. This means the more costly time you spend on the second step has a better return.
jerrythegerbil · 2026-06-29 05:23:06 UTC
> Gates that reduce resume flow-through are only useful if their reduction is correlated with quality.
The volume makes it infeasible to review everyone for quality, even at an hour scale. The conclusion and solution is inevitable, though I wish it were different. 35% is actually really good if you’re not coming in through a referral.
The current reality is <1% and the person reviewing you is exhausted.
Brian_K_White · 2026-06-29 05:31:40 UTC
This reasoning isn't.
sevenzero · 2026-06-29 05:40:42 UTC
What an inhumane way of looking at this. Hiring is deeply flawed, you know it, and yet you keep job postings open for weeks/months in case "the one" magically appears on your doorstep instead of just interviewing 10-20 people and just pick one...
Corpo bullshittery at its finest.
LinXitoW · 2026-06-29 08:47:19 UTC
What's the alternative? Everyone's up in arms, but I see ZERO viable alternatives proposed.
If you have 1000 applications for every job, and you know that a bunch of these applications are "a bad fit", to put it mildly, you have to filter. And you cannot realistically give every resume a good, human look. By the time HR would be done, the market has already moved on five times.
So, what is the real difference between being overlooked because HR could only look at the first 100 resumes, or the AI filtered all 1000 resumes down to 100? In the end, a fuckton of potentially great people get their feelings hurt either way.
sevenzero · 2026-06-29 09:07:15 UTC
>instead of just interviewing 10-20 people and just pick one
Here's a realistic proposition. HR just wants to inflate numbers so that they seem busy looking for the right fit. Keep posting open for 1 week, manually filter for another week, invite people, employ one. Plenty of people with degrees looking for jobs right now, I don't see what's the issue with just trying one. Companies desperately look for the "magic" applicant that checks all boxes, while also trying to pay them almost minimum wage.
kasey_junk · 2026-06-29 10:29:30 UTC
If your hiring pipeline is employing a filter that a) is no better than random chance and b) is expensive to implement, get rid of the filter.
Instead of spending all those resources on resume filtering, hire resume blind. Instead of using llms for a thing they are bad at (subjective decision making) use them to build a deterministic process that isn’t.
Use work sample hiring as the filter. Make the work sample automatic to sign up for and judge.
RugnirViking · 2026-06-29 11:18:38 UTC
Great question. The alternative is not accepting 1000 applicants. Nobody said you have to keep up your job posting for two weeks, or two hours for that matter. Stop once you have enough. Enough is defined by whatever number you would have filtered to. In the rare case none of the first ten applicants were appropriate, just open it again until you've got another tranche.
Arodex · 2026-06-29 11:58:21 UTC
That's just another type of randomness (who was online during the short time the posting was opened).
sevenzero · 2026-06-29 12:03:32 UTC
At least this would not force applicants to fine tune their applications to the latest LLM bullshit bingo.
RugnirViking · 2026-06-29 13:00:53 UTC
right. But if you go online and look for a job, then the ones that are available at that moment will actually read your application
Xirdus · 2026-06-29 13:37:07 UTC
"Being online during the short time" heavily favors bots. In a way, AI screening tools saved us from the future of everybody buying resume-spamming-as-a-service because it became as important to use these as getting a college degree.
jarito · 2026-06-29 12:04:06 UTC
You are assuming quality applicants are evenly distributed in terms of time of application - they aren’t. If you cut off at 100, you will only get a sample of people spewing fully automated application bots which mostly aren’t what you want.
MichaelDickens · 2026-06-29 13:54:43 UTC
If that's true, then it suggests an easy fix: leave your application up for four hours, then discard all applications you get for the first two.
Xirdus · 2026-06-29 13:45:16 UTC
> If you have 1000 applications for every job, and you know that a bunch of these applications are "a bad fit", to put it mildly, you have to filter. And you cannot realistically give every resume a good, human look.
At 10 seconds per resume, it would take you 3 hours to go through all 1000 resumes. I don't know what you consider "good" and "human", but my human eyes could easily do good enough, fully manual pre-screening at a rate of 1 requisition per day.
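For reference, the arithmetic behind that estimate as a tiny sketch. The 10-second figure is the one above; the 30-second case is only there to show how quickly the total grows with a slower pace.

```python
def screening_hours(resume_count, seconds_per_resume):
    """Total manual screening time in hours, with no breaks or context switches."""
    return resume_count * seconds_per_resume / 3600

print(round(screening_hours(1000, 10), 2))  # 2.78
print(round(screening_hours(1000, 30), 2))  # 8.33
```

So "3 hours" is a slight round-up at 10 seconds each, and tripling the time per resume turns it into a full working day.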
BeetleB · 2026-06-29 18:30:58 UTC
> At 10 seconds per resume, it would take you 3 hours to go through all 1000 resumes.
At 10 seconds per resume, I would not assume that you're screening better than the LLM.
AlexeyBelov · 2026-06-30 06:48:08 UTC
You could remove the least relevant resumes very quickly. Maybe not 10 seconds, but 30 seconds per resume for sure.
What I hear happens now: people apply for a "Senior Golang SWE" with 2 years of experience with C#. Or, relevant for hiring in the US, the job posting says the visa is required, but people apply without it anyway.
bee_rider · 2026-06-29 14:21:17 UTC
It’s weird because unemployment is still quite low, right?
Maybe a platform could be designed where candidates have one account for multiple companies, and the number of applications on the platform is limited to, say, ten per person per month or something. To get people to be selective. I don’t think this should be the only way to apply, but maybe the companies involved could look there first.
falsemyrmidon · 2026-06-29 07:07:28 UTC
You may as well just randomly pick 65 to discard, if your only goal is to reduce the number for review.
ayuhito · 2026-06-29 11:14:49 UTC
That’s exactly it for large scale hiring with finite resources.
It’s all probabilities in the end. And if an LLM gives you a more relevant pool vs random distribution, that’s still a net benefit.
aesthesia · 2026-06-29 05:54:28 UTC
So the question is: is the score given by this system correlated with candidate quality? I don't think this post gives enough data to know.
lowbloodsugar · 2026-06-29 05:54:58 UTC
Except the bit about ranking a decades long S3 engineer lower than an intern with GitHub repo.
PufPufPuf · 2026-06-29 06:25:13 UTC
In that case, I have a pre-screening system to sell you. Through state of the art technology, it only lets through the best* 1% of applications.
*According to our proprietary, undisclosed, non-deterministic metric, which may or may not be Math.random
I worked at a startup that judged their hiring pipeline quality using rejection rate criteria.
spike021 · 2026-06-29 06:28:12 UTC
there have got to be better ways to optimize pipelines. maybe set a limit on the number of applications for a role based on how many you/your team can reliably go through. if more are needed then open the role for another wave of applications.
ludicrousdispla · 2026-06-29 06:56:26 UTC
So the logical solution is for candidates to submit multiple applications with slight variations to their contact info, "John Schmidt", "John J. Schmidt", "John J. J. Schmidt", "John Jacob J. Schmidt", "J. J. Jingleheimer Schmidt", etc.
ambicapter · 2026-06-29 14:52:05 UTC
It's a good day to have 3 middle names.
yuliyp · 2026-06-29 15:30:42 UTC
Hey, that's my name too!
Terr_ · 2026-06-29 19:25:17 UTC
Whenever I send them out
The filters always route:
"Spammer: John Jacob Jingleheimer Schmidt"
[N/A] [N/A] [N/A] [N/A]
IshKebab · 2026-06-29 07:19:26 UTC
I wonder if you could solve this for programming specifically as follows:
1. Give them some easy leetcode questions. Nothing that a competent programmer would have any problem with.
2. If they pass, ask for a deposit of like $20. Shouldn't be an issue for people who are actually serious.
3. Do more simple leetcode questions but this time on zoom so you can tell if they are using AI. If they pass that they get the deposit back.
(Yeah I know there are real-time interview cheat AI programs but based on what I've seen on demos of them it's super obvious when they're being used.)
Probably not practical but just a thought!
never_inline · 2026-06-29 11:36:25 UTC
This selects for desperation.
jghn · 2026-06-29 14:29:50 UTC
I'm not going to do any of those 3 things for a would-be employer.
IshKebab · 2026-06-29 15:20:05 UTC
They don't seem like unreasonable things to me so I guess it also helps filter out unreasonable people!
hju22_-3 · 2026-06-30 08:22:10 UTC
Number 2 is so unusual I would immediately flag the company as a scam. How is asking for money a reasonable approach? There are alternative ways, easier too depending on country, and all you'd be doing is selecting for desperation while also spreading bad rumors from all the people who nope out when they see such malpractice.
recursivecaveat · 2026-06-29 07:42:57 UTC
If you have no requirements for accuracy, you can just advance 35% of applicants at random.
If the first 50 people who apply are all bots, why are you reading resumes in order of submission?
wodenokoto · 2026-06-29 12:11:29 UTC
One of the first things you do when hiring is to set a period and randomize the order of resumes when reviewing, because early application is not a strong signal.
mrhottakes · 2026-06-29 17:03:59 UTC
Sounds like you're pretty bad at hiring pipelines.
heldrida · 2026-06-30 14:47:06 UTC
Are you suggesting using AI/Agents to apply for roles?
dvt · 2026-06-29 05:07:08 UTC
An alarming number of people don't understand that LLMs work via purely stochastic processes, so I'm happy to see in-depth pieces like this. I'm looking for a job and maybe this is why it's so hard to get a callback these days: resumes are just dumped in some LLM black hole and no one really knows how it works. The author says:
> temperature 0.1 — low, supposedly nudging the model toward deterministic outputs
This is not correct (and is briefly touched on later in the piece when he sets temperature to 0), temperature is not some kind of "deterministic" switch, but rather it affects the sampling distribution (which becomes more "spiky"—but is still very much a distribution).
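A minimal sketch of what "spiky but still a distribution" means, assuming the usual temperature-scaled softmax. The logits are made-up values, not taken from any real model.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature); only defined here for temperature > 0."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.5, 0.5]  # hypothetical logits for three tokens
for t in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 4) for p in probs])
```

Lowering the temperature concentrates mass on the top token, but at 0.1 the runner-up still has a small nonzero probability, so repeated sampling can still diverge.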
bluechair · 2026-06-29 05:10:55 UTC
Willing to be corrected but I believe this type of automated resume filtering is illegal. Not saying it never happens but my understanding is it is not typical.
small_scombrus · 2026-06-29 05:17:56 UTC
They don't need to actually filter/blackhole to have the same virtual effect.
Show someone a list of resumes with an "applicant score*" and they'll naturally ignore the ones with a low ranking
*scores are generated with AI, mistakes may be made, use only as a guide and verify results
thayne · 2026-06-29 05:25:15 UTC
I would expect that to depend on jurisdiction.
I don't know for sure, but I would be surprised if it was illegal in my particular US state. You might be able to argue the AI has inherent biases that introduce illegal discrimination in the hiring process, but my understanding is winning a case like that would be very difficult, especially since most employers are very cagey about their hiring process and why they made a decision.
ivan_gammel · 2026-06-29 05:41:11 UTC
In situations when you get hundreds of applications for one open position (real market now), whatever reduces your pool to the size a human can handle, works. You can preserve some diversity metrics in the process. This particular filtering is rather primitive, but LLM as a first filter can definitely do the job. You may burn less tokens than the hourly rate of your HR and it will be fairer than just dumping 50% of unread CVs in trash.
369548684892826 · 2026-06-29 06:48:20 UTC
Great until someone realises you’ve filtered out minority groups from the application process (most developers are men so maybe the LLM decided they’re the best fit, but you’ll never know exactly why it screwed you over) and you suddenly have an expensive lawsuit
TeMPOraL · 2026-06-29 09:51:39 UTC
LLMs are DEI-aware, as over the past few years, their vendors all had various high profile news stories with their models and their default biases, so it's more likely they'll heavily discriminate in favor of minority candidates, not against them. Still, in both cases it would indicate whoever is operating the system is doing a really, really lazy job. It's really not hard to test and supervise LLMs on tasks where they give you a mere 2-10x leverage, and prompt adherence today is much better than it was 3 years ago.
this happened a decade ago when a US court tried to make sentencing decisions via ML. it was easily demonstrated that the training data was flawed because the justice system was flawed, so the data it was trained on was weighted against minorities because it oversampled them because, you know, police routinely oversample and poverty forces oversampling
nonetheless, people will defend history as perfect and say those samples, like nepo babies, are "perfect".
ivan_gammel · 2026-06-29 14:28:12 UTC
What “not so smart” person would filter minority groups out of the process in 2026? It’s more likely that a 90/10 gender imbalance will be converted to 60/40 or even 50/50. Diverse teams are more fun and stable.
dgellow · 2026-06-29 06:28:17 UTC
Illegal where?
elric · 2026-06-29 07:06:21 UTC
Under GDPR, you have the right to request manual processing whenever personal data is processed automatically to make a decision about you that has "significant impact". Not being hired seems like it would qualify.
aesthesia · 2026-06-29 05:50:23 UTC
A distribution with all probability mass on one outcome is deterministic, so in principle, setting temperature to 0 _should_ result in deterministic outputs. There are a few reasons it might not, but I don't think any of these apply when running a local model like the author did.
valzam · 2026-06-29 05:59:05 UTC
I mean the easiest explanation would be that the model harness doesn't always take the most likely token but does top-k sampling or similar. temperature just means that probabilities get more and more equalized, boosting the chance that an unlikely token gets picked. but even with temp 0 you could have 0.8 T1, 0.19 T2, ... and sometimes sample T2
aesthesia · 2026-06-29 06:07:14 UTC
No, this can't happen at temperature 0. The formula defining temperature-adjusted softmax isn't strictly defined at 0, but taking the limit (in the case where all logits are distinct) results in probability 1 being placed on the largest logit. Samplers will typically special case temperature 0 and pick the most likely token at each step.
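Roughly what that special case looks like, as a toy sampler. The function name and structure are hypothetical and not taken from any particular inference library; real samplers also apply top-k, top-p, and penalties around this step.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick a token index; temperature 0 is special-cased to greedy argmax."""
    if temperature == 0:
        # Greedy decoding: no division by zero, no random draw.
        # Python's max() returns the first maximal index if there is a tie.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

rng = random.Random(0)
logits = [1.2, 3.4, 0.7]  # made-up logits
greedy_picks = [sample_token(logits, 0, rng) for _ in range(5)]
sampled_pick = sample_token(logits, 1.0, rng)
print(greedy_picks, sampled_pick)
```

Given identical logits, the temperature 0 branch returns the same index every time; any run-to-run variation then has to come from the logits themselves.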
dvt · 2026-06-29 06:13:29 UTC
This is a very authoritative answer that should be more nuanced and caveated as implementation-dependent. In some cases, repetition penalties take precedence over sampling; top_k and top_p can also be handled before or after the temperature step. In other cases, `0` is turned into like 1e-10 or some super tiny float value (which can drift if you do any arithmetic with it). Routing, quantization, etc. can also have an effect on sampling. And yes, in some cases, setting temperature to 0 can mean "pure greedy decoding" which makes the decoder about as deterministic as it can get.
easygenes · 2026-06-29 06:12:47 UTC
There are. If the kernels are nondeterministic (e.g. timing issues) there are minor changes between runs, on a single system, even with greedy decode enabled (typically what temperature=0 achieves).
IshKebab · 2026-06-29 06:13:50 UTC
Setting the temperature to 0 should give deterministic results but that's not any better - it's just hiding the huge variance by only taking one sample.
317070 · 2026-06-29 06:21:29 UTC
> so in principle, setting temperature to 0 _should_ result in deterministic outputs
It is a common misconception, but it is not true even in principle. If I have 2 or more logits which are equal to the maximum of my logits, I will sample uniformly at random from them with any temperature, even zero. Sampling from softmax([1, 0, 1]) is still stochastic at temperature 0, because the limit is to sample uniformly from the first or the last element.
Anyway: "GPUs don't do deterministic matrix multiplications" is the biggest source of randomness in LLMs. GPUs put the associativity of the sums in matrix multiplications in arbitrary order, and this has a huge impact on the logits coming out of the neural network.
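The underlying effect is easy to reproduce on a CPU. This is a contrived sketch in plain Python floats (no GPU involved) showing that summation order changes the result, and that a changed sum can flip which logit wins. The numbers are picked to exaggerate the effect.

```python
def sum_in_order(values):
    """Plain left-to-right float accumulation, with no compensation tricks."""
    total = 0.0
    for v in values:
        total += v
    return total

# Floating-point addition is not associative.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False

# Same three terms, two accumulation orders, two different sums.
forward = sum_in_order([1e16, 1.0, -1e16])    # the 1.0 is absorbed by 1e16
reordered = sum_in_order([1e16, -1e16, 1.0])  # the 1.0 survives

# Treat the sum as one logit and compare it against a fixed competitor.
other_logit = 0.5
winner_forward = 0 if forward > other_logit else 1
winner_reordered = 0 if reordered > other_logit else 1
print(forward, reordered, winner_forward, winner_reordered)
```

Real logits differ by far smaller amounts than this, but near-ties between top tokens are where a reordered reduction can change the greedy pick.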
EvgeniyZh · 2026-06-29 06:34:41 UTC
You don't have to sample uniformly. You could take the lowest index of all maxima.
But yeah, the main source of randomness is non-deterministic matmul, and temperature does nothing with it
DougBTX · 2026-06-29 08:21:08 UTC
> GPUs put the associativity of the sums in matrix multiplications in arbitrary order
That’s user-controlled too, not an inherent property of GPUs:
The matrix multiplication is only deterministic for sparse-dense products under these settings:
> torch.bmm() when called on sparse-dense CUDA tensors
And it's not listed under the operations that raise an exception otherwise, so I'm not sure the docs promise that dense-dense matrix-matrix products are deterministic.
DougBTX · 2026-06-29 12:02:23 UTC
Oh, thanks, that’s interesting, I thought it covered that too!
jstanley · 2026-06-29 08:22:09 UTC
> "GPUs don't do deterministic matrix multiplications" is the biggest source of randomness in LLMs.
But this isn't a fundamental property of LLMs, it's just an implementation detail. It's pretty obvious that if you evaluate the matrix multiplications correctly and deterministically sample from the highest-probability outputs, you will have a deterministic LLM.
vbarrielle · 2026-06-29 08:38:47 UTC
It may be an implementation detail, but in practice, if the only way to get a deterministic output is to run on the CPU, then it's not going to be usable.
317070 · 2026-06-29 09:53:13 UTC
Actually, Google's TPUs are also deterministic!
Dylan16807 · 2026-06-29 11:00:17 UTC
You can tell GPUs what order to do math instructions in.
croes · 2026-06-29 07:24:25 UTC
So you would always get the same result, but it could be the wrong one
srdjanr · 2026-06-29 07:36:28 UTC
Of course, nothing can guarantee the right answer from LLMs
make3 · 2026-06-29 06:32:30 UTC
A more spiky distribution exactly makes the distribution closer to deterministic. That's not the point though. Even in greedy (deterministic) decoding, it is still a black box that reacts in ways that are unpredictable to the inputs. Switching one word around might lead to different scores for example.
fluoridation · 2026-06-29 10:25:31 UTC
Yeah, this is the forest that the people arguing about math trees are missing. It doesn't matter that the algorithm is deterministic if the algorithm passes the input through a cryptographic hash function to make a yes/no decision. The result may be perfectly reproducible and still nonsensical in its distribution with respect to its input domain.
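To make the hash analogy concrete, a toy screener that is fully deterministic and completely meaningless. The function name and the 35% pass rate are assumptions for illustration; nobody is claiming a real tool works like this.

```python
import hashlib

def hash_screen(resume_text, pass_rate=0.35):
    """Deterministic yes/no decision that ignores what the resume actually says."""
    digest = hashlib.sha256(resume_text.encode("utf-8")).digest()
    # Map the first 8 bytes of the digest to a number in [0, 1).
    value = int.from_bytes(digest[:8], "big") / 2**64
    return value < pass_rate

resume = "15 years of distributed storage experience"
print(hash_screen(resume), hash_screen(resume))  # always the same answer twice
print(hash_screen(resume + "."))                 # one character can change it
```

Every run gives the same verdict for the same text, and about 35% of inputs pass, yet the verdict carries no information about the candidate.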
miki123211 · 2026-06-29 07:04:10 UTC
In theory, temperature 0 does make the LLM deterministic.
Well, in theory theory, temperature 0 doesn't really exist. Mathematically, as lim temperature->0, the distribution gets spikier and spikier, the most likely sample goes to almost-but-not-quite infinity and the rest go to almost-but-not-quite 0. In practice, temperature=0 is literally a separate branch of an if statement that just picks the most common sample (using the actual formula that works for non-zero values would cause a zero division).
However, due to things such as batching and even different kinds of floating point imprecisions for different algorithm implementations, the probability distribution itself often differs run-by-run, so what you sample from it also differs.
sigmoid10 · 2026-06-29 07:12:40 UTC
>in theory theory, temperature 0 doesn't really exist.
It does exist very much, even if you go to pure math. Look at the softmax function and take the limit as T->0. It becomes a dirac-delta function. I.e. in a discrete setting (like for LLMs with a finite set of output tokens), probability P becomes one for argmax and 0 for everything else. Only in coding practice it is easier to implement T=0 as a simple if check that directly chooses argmax instead of calculating the limit of some function that includes 1/T quotients. But setting T to zero is, in both theory and practice, turning the usual probability function into greedy sampling.
317070 · 2026-06-29 09:51:03 UTC
> Look at the softmax function and take the limit as T->0. It becomes a dirac-delta function.
In pure math, it does not always do that. It becomes a dirac-delta comb with equal weight on every maximum. There can be more than 1 maximum. Setting the temperature to zero turns into greedy sampling, but greedy sampling is not necessarily deterministic as you can have multiple equally optimal options.
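Written out, assuming the standard temperature-scaled softmax over logits z_1, ..., z_n, with M denoting the set of indices that attain the maximum:

```latex
p_i(T) = \frac{\exp(z_i / T)}{\sum_{j=1}^{n} \exp(z_j / T)}, \qquad T > 0

\lim_{T \to 0^{+}} p_i(T) =
\begin{cases}
  \dfrac{1}{|M|} & \text{if } i \in M, \\[4pt]
  0 & \text{otherwise,}
\end{cases}
\qquad M = \{\, i : z_i = \max_{j} z_j \,\}
```

For z = (1, 0, 1) this gives M = {1, 3} and a limit of (1/2, 0, 1/2). The single-spike case is just |M| = 1.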
sigmoid10 · 2026-06-29 10:44:16 UTC
That is not a problem for LLMs, because in practice floating point inaccuracies (in particular after exponentiation) prevent values from being exactly equal. That's why greedy sampling generally produces deterministic output for LLMs. The real gotchas are elsewhere (like with batch inference as we've seen with earlier GPTs). But unlike what the earlier comment says, this is a non-issue mathematically.
skissane · 2026-06-29 10:59:23 UTC
> That is not a problem for LLMs, because in practice floating point inaccuracies (in particular after exponentiation) prevent values from being exactly equal
Any two tokens ending up with the exact same logit is very unlikely, but not impossible; and as the number of output tokens grows, the odds that it will happen eventually get higher and higher.
I suppose, to ensure determinism, rank by logit then token ID, so you still have a deterministic winner even if occasionally two tokens get precisely identical logits.
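A minimal sketch of that ordering, assuming token IDs are just list indices and the logits contain no NaN values:

```python
def greedy_pick(logits):
    """Index of the largest logit; exact ties go to the lowest token ID."""
    # Sort key: larger logit first (hence the negation), then smaller index.
    return min(range(len(logits)), key=lambda i: (-logits[i], i))

print(greedy_pick([1.0, 0.0, 1.0]))       # 0
print(greedy_pick([0.2, 0.9, 0.9, 0.1]))  # 1
```

This removes the tie as a source of randomness, but it does nothing about the logits themselves differing between runs.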
spott · 2026-06-29 19:16:32 UTC
You aren't looking for a random set of tokens that have the exact same logit, you are looking for the largest n tokens to have the exact same probability.
This is exceedingly unlikely, as training will only push one of them up for any individual sample. There are likely some pathological situations that could end up with that situation, maybe, but it is pretty unlikely in a general case.
StilesCrisis · 2026-06-29 12:38:55 UTC
"Makes unlikely" is very different from "prevents."
If there's one counterexample, it's not really deterministic.
rkozik1989 · 2026-06-29 13:16:45 UTC
Exactly, consider the scenario where laws are at play and violating them could cost companies thousands. Recently my father received a 'request for address' letter addressed to me at his nursing home, the building has always been a nursing home, and he's also in his mid-70s. That's very obviously a violation of the Fair Debt Collection Practices Act. Imagine the implication of this if the law firm in question used an AI-assisted data enriching product to find this information. That SaaS company is not only liable to that one law firm but every law firm who uses their software. It's potentially a federal class action lawsuit.
My point is, deterministic logic matters in certain circumstances 100% of the time. Forcing the LLM to make something unlikely is not good enough because a series of mistakes could very quickly bankrupt the company.
Lerc · 2026-06-29 14:27:46 UTC
>My point is, deterministic logic matters in certain circumstances 100% of the time. Forcing the LLM to make something unlikely is not good enough because a series of mistakes could very quickly bankrupt the company.
If your argument is that the danger of equal values being selected inconsistently breaks determinism, that's a trivial problem to solve.
Any non-infinite precision numbering system by definition is at the limits of its precision when equal values occur. If you need to order such values you can extend the precision and add on a deterministically unique tiny value (position, order encountered, etc.). Your original value stays in the same precision range but they are now unique.
It's usually more likely that you want to sacrifice a little precision for determinism so you can quantise to allocate the range where you apply the unique ID.
For example if you had an array of 256 fp32 values but you required them to be unique, you can lop off 8 bits of mantissa and replace it with its index in the array. Every value is then unique.
Granted token dictionaries make for some fairly hefty indexes now, but the principle applies in general, it's easily solvable if you are prepared to spend some precision or do some extra calculation.
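A sketch of the 256-value fp32 example, using `struct` to get at the raw bits. The helper names are made up, and it assumes finite inputs (stamping index bits onto an infinity would turn it into a NaN).

```python
import struct

def float_to_bits(x):
    """Reinterpret a value as the 32-bit pattern of its fp32 encoding."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as an fp32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def make_unique(values):
    """Replace the low 8 mantissa bits of each fp32 value with its array index."""
    if len(values) > 256:
        raise ValueError("8 bits can only encode 256 positions")
    out = []
    for index, v in enumerate(values):
        bits = float_to_bits(v)
        bits = (bits & 0xFFFFFF00) | index  # drop 8 bits of precision, stamp the index
        out.append(bits_to_float(bits))
    return out

values = [1.0] * 256  # worst case: every value identical
unique = make_unique(values)
print(len(set(unique)), max(unique))
```

Each value moves by well under 1e-4 here, and an argmax over the result now has exactly one winner. For positive values the highest index wins a former tie; stamping the inverted index instead would favour the lowest.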
Comments
> 35 points for open source contributions
> 30 for personal projects
I don't contribute to open source or have personal projects because I don't spend my free time doing what I do 40 hours a week to make a living. My 15 years of work experience is worth a maximum of 25%, so any company using this idiotic system would pass on me immediately. Open source and personal projects are fine, but in no sane world are they worth 65% of a resume's score.
Now all my "non-work" time is spent on startup work. And none of that is visible via GitHub.
Free software work doesn't imply we work for free. We work on our projects, the stuff that we actually enjoy working on. Nobody is going to work on corporate products without adequate compensation.
I guess there sadly are many nobodies who do this to hope to become somebody.
I wonder if that assumption is bourne out in reality though?
I'd imagine if someone's OSS contributions are enough of a factor that it's worth hiring them, they're not going to drop it on a whim to work extra hours on the day job.
(Assuming you weed out open source contributions like "I made a todo list app in React but licenced it as MIT" or "I fixed a typo in the docs for NextJS". )
As someone who’s run hiring pipelines for technical roles in the past few years, that’s actually a fantastic number. I objectively hate saying that, but it’s true.
35% chance of elevating a technical individual to the next stage with no effort? I’ve seen as many as 100+ applicants an hour even when including a domain specific screener question. That’s 35 “screened” applicants in an hour. Were valid candidates screened out? Yes. Does you still have a candidate pool 35x larger than you need? Unfortunately, also yes.
The volume of applicants is SO HIGH such that your chances of getting moved to the next stage are actually markedly worse if AI isn’t involved. If you didn’t apply immediately (using an AI bot) there’s 50+ people ahead of you, and an exhausted technical leader if they ever make it to your resume.
Referral bonuses exist for a reason.
Gates that reduce resume flow-through are only useful if their reduction is correlated with quality. Otherwise they're just dragging out your hiring process or unnecessarily causing you to ultimately lower your hiring bars.
The volume is infeasible to review everyone for quality, even at an hour scale. The conclusion and solution is inevitable, though I wish it were different. 35% is actually really good if you’re not coming in through a referral.
The current reality is <1% and the person reviewing you is exhausted.
Corpo bullshittery at its finest.
If you have 1000 applications for every job, and you know that a bunch of these applications are "a bad fit", to put it mildly, you have to filter. And you cannot realistically give every resume a good, human look. By the time HR would be done, the market has already moved on five times.
So, what is the real difference between being overlooked because HR could only look at the first 100 resumes, or the AI filtered all 1000 resumes down to 100? In the end, a fuckton of potentially great people get their feelings hurt either way.
Here's a realistic proposition. HR just wants to inflate numbers so that they seem busy looking for the right fit. Keep posting open for 1 week, manually filter for another week, invite people, employ one. Plenty of people with degrees looking for jobs right now, I don't see what's the issue with just trying one. Companies desperately look for the "magic" applicant that checks all boxes, while also trying to pay them almost minimum wage.
Instead of spending all those resources on resume filtering, hire resume blind. Instead of using llms for a thing they are bad at (subjective decision making) use them to build a deterministic process that isn’t.
Use work sample hiring as the filter. Make the work sample automatic to sign up for and judge.
At 10 seconds per resume, it would take you 3 hours to go through all 1000 resumes. I don't know what you consider "good" and "human", but my human eyes could easily do good enough, fully manual pre-screening at a rate of 1 requisition per day.
At 10 seconds per resume, I would not assume that you're screening better than the LLM.
What I hear happens now: people apply for a "Senior Golang SWE" with 2 years of experience with C#. Or, relevant for hiring in the US, the job posting says the visa is required, but people apply without it anyway.
Maybe a platform could be designed where candidates have one account for multiple companies, and the number of applications on the platform is limited to, say, ten per person per month or something. To get people to be selective. I don’t think this should be the only way to apply, but maybe the companies involved could look there first.
It’s all probabilities in the end. And if an LLM gives you a more relevant pool vs random distribution, that’s still a net benefit.
*According to our proprietary, undisclosed, non-deterministic metric, which may or may not be Math.random
https://stackoverflow.com/questions/16833100/why-does-the-mo...
1. Give them some easy leetcode questions. Nothing that a competent programmer would have any problem with.
2. If they pass, ask for a deposit of like $20. Shouldn't be an issue for people who are actually serious.
3. Do more simple leetcode questions but this time on zoom so you can tell if they are using AI. If they pass that they get the deposit back.
(Yeah I know there are real-time interview cheat AI programs but based on what I've seen on demos of them it's super obvious when they're being used.)
Probably not practical but just a thought!
If the first 50 people who apply are all bots, why are you reading resumes in order of submission?
> temperature 0.1 — low, supposedly nudging the model toward deterministic outputs
This is not correct (and is briefly touched on later in the piece when he sets temperature to 0). Temperature is not some kind of "deterministic" switch; rather, it affects the sampling distribution, which becomes more "spiky" but is still very much a distribution.
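To make the "spiky but still a distribution" point concrete, here is a minimal sketch of temperature-scaled softmax. The logits are made-up values and `softmax_with_temperature` is a hypothetical helper, not anything from the article's code.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        # The formula divides by the temperature, so it only covers T > 0.
        raise ValueError("temperature must be positive for this formula")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for three candidate tokens (an assumption for illustration).
logits = [2.0, 1.8, 0.5]

for t in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 4) for p in probs]}")
```

With these particular logits, the runner-up token still gets about 12% of the probability mass at temperature 0.1, so repeated runs can legitimately disagree.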
Show someone a list of resumes with an "applicant score*" and they'll naturally ignore the ones with a low ranking
*scores are generated with AI, mistakes may be made, use only as a guide and verify results
I don't know for sure, but I would be surprised if it was illegal in my particular US state. You might be able to argue the AI has inherent biases that introduce illegal discrimination in the hiring process, but my understanding is winning a case like that would be very difficult, especially since most employers are very cagey about their hiring process and why they made a decision.
nonetheless, people will defend history as perfect and say those samples, like nepo babies, are "perfect".
It is a common misconception, but it is not true even in principle. If I have 2 or more logits which are equal to the maximum of my logits, I will sample uniformly at random from them at any temperature, even zero. Sampling from softmax([1, 0, 1]) is still stochastic at temperature 0, because the limit is to sample uniformly from the first or the last element.
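To spell out the limit for the softmax([1, 0, 1]) example, here is a short derivation. The symbols z (logits), T (temperature), and M (the set of indices that attain the maximum logit) are my own notation, not something from the thread.

```latex
p_i(T) = \frac{e^{z_i/T}}{\sum_j e^{z_j/T}}
       = \frac{e^{(z_i - z_{\max})/T}}{\sum_j e^{(z_j - z_{\max})/T}},
\qquad M = \{\, i : z_i = z_{\max} \,\}

\lim_{T \to 0^+} p_i(T) =
\begin{cases}
  1/|M| & i \in M \\
  0     & i \notin M
\end{cases}

% For z = (1, 0, 1):
p(T) = \frac{(1,\; e^{-1/T},\; 1)}{2 + e^{-1/T}}
\;\xrightarrow[T \to 0^+]{}\; \left(\tfrac12,\; 0,\; \tfrac12\right)
```

Every term with a logit below the maximum decays to zero, while the tied maxima split the mass evenly, so the limiting distribution is only a single spike when the maximum is unique.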
Anyway: "GPUs don't do deterministic matrix multiplications" is the biggest source of randomness in LLMs. Floating-point addition is not associative, and GPUs group the sums inside a matrix multiplication in an arbitrary order, which has a huge impact on the logits coming out of the neural network.
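The underlying effect is easy to reproduce on a CPU, since it comes from floating-point addition itself rather than anything GPU-specific. A minimal sketch in plain Python, with values chosen purely for illustration:

```python
# Floating-point addition is not associative: regrouping the same three
# numbers changes the rounded result.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left == right)  # the two groupings disagree in the last bit

# A more dramatic case: a small term is absorbed by a large one
# depending on which pair is added first.
big = [1e16, -1e16, 1.0]
sum_in_order = (big[0] + big[1]) + big[2]   # large terms cancel first
sum_regrouped = big[0] + (big[1] + big[2])  # 1.0 is rounded away first
print(sum_in_order, sum_regrouped)
```

A parallel reduction that groups the partial sums differently from run to run can therefore produce slightly different logits from identical inputs.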
That’s user-controlled too, not an inherent property of GPUs:
https://docs.pytorch.org/docs/2.12/generated/torch.use_deter...
> torch.bmm() when called on sparse-dense CUDA tensors
And it's not listed under the operations that raise an exception otherwise, so I'm not sure the docs promise that dense-dense matrix-matrix products are deterministic.
But this isn't a fundamental property of LLMs, it's just an implementation detail. It's pretty obvious that if you evaluate the matrix multiplications correctly and deterministically sample from the highest-probability outputs, you will have a deterministic LLM.
Well, in theory, temperature 0 doesn't really exist. Mathematically, as temperature -> 0, the distribution gets spikier and spikier: the probability of the most likely token goes to almost-but-not-quite 1 and the rest go to almost-but-not-quite 0. In practice, temperature=0 is literally a separate branch of an if statement that just picks the most likely token (using the actual formula that works for non-zero values would cause a zero division).
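For illustration, the "separate branch" typically looks something like the sketch below. This is a hypothetical `sample_token` helper under my own assumptions, not the code of any particular inference library.

```python
import math
import random

def sample_token(logits, temperature, rng=random):
    """Pick a token index from raw logits at the given temperature."""
    if temperature == 0:
        # Special case: dividing by zero is impossible, so take the argmax.
        # Python's max() returns the first index among exact ties.
        return max(range(len(logits)), key=lambda i: logits[i])
    # General case: softmax(logits / temperature), then sample from it.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [0.5, 2.0, 1.0]  # made-up values
print(sample_token(logits, 0))    # greedy branch
print(sample_token(logits, 0.8))  # sampled branch, may vary per call
```

The greedy branch is deterministic only for a fixed logits vector. If the logits themselves shift between runs, the argmax can shift with them.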
However, due to things such as batching and even different kinds of floating point imprecisions for different algorithm implementations, the probability distribution itself often differs run-by-run, so what you sample from it also differs.
It does exist very much, even if you go to pure math. Look at the softmax function and take the limit as T->0. It becomes a Dirac delta function, i.e. in a discrete setting (like for LLMs with a finite set of output tokens), probability P becomes 1 for argmax and 0 for everything else. Only in coding practice is it easier to implement T=0 as a simple if check that directly chooses argmax instead of calculating the limit of some function that includes 1/T quotients. But setting T to zero, in both theory and practice, turns the usual probability function into greedy sampling.
In pure math, it does not always do that. It becomes a dirac-delta comb with equal weight on every maximum. There can be more than 1 maximum. Setting the temperature to zero turns into greedy sampling, but greedy sampling is not necessarily deterministic as you can have multiple equally optimal options.
Any two tokens ending up with the exact same logit is very unlikely, but not impossible; and as the number of output tokens grows, the odds that it will eventually happen get higher and higher.
I suppose, to ensure determinism, rank by logit then token ID, so you still have a deterministic winner even if occasionally two tokens get precisely identical logits.
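A minimal sketch of that tie-break rule, assuming the logits arrive as a plain list indexed by token ID. The name `greedy_pick` is hypothetical.

```python
def greedy_pick(logits):
    """Return the token ID with the highest logit; exact ties go to the lowest ID."""
    # Compare on (-logit, token_id): a higher logit wins first,
    # and among equal logits the smaller token ID wins.
    return min(range(len(logits)), key=lambda token_id: (-logits[token_id], token_id))

print(greedy_pick([1.0, 0.0, 1.0]))       # tie between IDs 0 and 2
print(greedy_pick([0.0, 3.0, 3.0, 1.0]))  # tie between IDs 1 and 2
```

This fixes the winner for a given logits vector. It does nothing about logits that differ between runs because of floating-point ordering.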
This is exceedingly unlikely, as training will only push one of them up for any individual sample. There are likely some pathological situations that could end up with that situation, maybe, but it is pretty unlikely in a general case.
If there's one counterexample, it's not really deterministic.
My point is, deterministic logic matters in certain circumstances 100% of the time. Forcing the LLM to make something unlikely is not good enough because a series of mistakes could very quickly bankrupt the company.
If your argument is that the danger of equal values being selected inconsistently breaks determinism, that's a trivial problem to solve.
Any non-infinite precision numbering system is by definition at the limits of its precision when equal values occur. If you need to order such values you can extend the precision and add on a deterministically unique tiny value (position, order encountered, etc.). Your original value stays in the same precision range but they are now unique.
It's usually more likely that you want to sacrifice a little precision for determinism, so you can quantise to allocate the range where you apply the unique ID.
For example, if you had an array of 256 fp32 values but you required them to be unique, you can lop off 8 bits of mantissa and replace them with the value's index in the array. Every value is then unique.
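A rough sketch of the mantissa trick using only the standard library. The helper names are hypothetical, and it assumes finite inputs that fit in fp32 (overwriting the low bits of an infinity would turn it into a NaN).

```python
import struct

def float_to_bits(x):
    """Reinterpret x, rounded to fp32, as a 32-bit unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_float(bits):
    """Reinterpret a 32-bit unsigned integer as an fp32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def make_unique(values):
    """Overwrite the low 8 mantissa bits of each fp32 value with its index."""
    if len(values) > 256:
        raise ValueError("8 bits can only encode 256 distinct indices")
    unique = []
    for index, value in enumerate(values):
        bits = float_to_bits(value)
        bits = (bits & 0xFFFFFF00) | index  # drop 8 mantissa bits, insert index
        unique.append(bits_to_float(bits))
    return unique

tied = [1.0] * 256          # worst case: every value identical
untied = make_unique(tied)
print(len(set(untied)))     # number of distinct values after the rewrite
print(max(untied) - 1.0)    # size of the perturbation that was introduced
```

The cost is 8 of the 23 mantissa bits, so values are perturbed by at most 255 units in the last place. Values that were already closer together than that can end up reordered.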
Granted token dictionaries make for some fairly hefty indexes now, but the principle applies in general, it's easily solvable if you are prepared to spend some precision or do some extra calculation.