
How do you solve GMAT Critical Reasoning Evaluate questions without second-guessing the conclusion?

TestPrep Istanbul
June 19, 2026 · 25 min read

A GMAT Critical Reasoning Evaluate question asks the candidate to treat the argument like a measuring instrument and pick the answer choice that would best measure how well the argument performs. Most candidates read it as a Strengthen or Weaken item and then get punished for choosing the option that makes the argument feel more persuasive. The stem, however, is asking something narrower: which option, if true, would do the most to help a reader decide whether the argument's conclusion is well supported. The conclusion is fixed, the premises are fixed, and the answer is a yardstick, not a verdict. On the GMAT Focus, this question type appears among the Critical Reasoning stems of the 23-question Verbal section and rewards a very different reading protocol than Strengthen, Weaken, Assumption, Inference, or Flaw. The rest of this article walks through how to read the stem, how to pre-phrase a yardstick, how to test each option against that yardstick, and the five tactical moves that separate a 78 from an 84 on this single CR family.

What the Evaluate stem is actually asking the candidate to do

An Evaluate stem on the GMAT Critical Reasoning section usually contains a small cluster of phrasings that mark it as different from the other question types. The candidate who misses the marker falls straight into a Strengthen reading, picks the most supportive option, and loses the point. The marker phrases to scan for are 'would be most useful to determine', 'would be most helpful in evaluating', 'which of the following would be most important to know', and the rarer 'answer choice that would do the most to assess'. The shared grammatical shape of all of these is a verb of judgement ('determine', 'evaluate', 'know', 'assess') attached to a known conclusion. The argument has already been made. The candidate is not being asked to repair the argument, to find a missing link, to attack a hidden assumption, or to draw a brand-new conclusion that the author was too timid to state. The candidate is being asked to nominate a piece of information whose truth value would change how much weight the argument deserves.

Reading the stem at this granularity is the first tactical move. In practice, candidates who score above 80 on GMAT Verbal almost always write down, in five words or fewer, what the author is trying to prove before they look at the options. They then ask themselves: 'what fact, if I knew it, would make this conclusion feel solid or shaky?' That fact is the yardstick. Once the yardstick exists, every answer option is checked against it like a key against a lock. A choice that supports the conclusion fails the lock, because the question was never about supporting. A choice that attacks the conclusion also fails the lock, because the question was never about attacking. Only the option whose truth value actually changes the weight of the conclusion, in either direction, sits inside the lock.

The most common error on Evaluate stems is to treat 'would help evaluate' as 'would help strengthen'. The two phrasings share a vocabulary, but they have different goals. Strengthening is one-directional: a Strengthen answer adds evidence on the conclusion's side. Evaluation is two-directional: an Evaluate answer must be capable of swinging the reader in either direction depending on which way the new fact falls. This is the feature that disqualifies Strengthen-shaped and Weaken-shaped distractors and that separates Evaluate from every other CR family. The candidate who can hold that distinction in working memory is already halfway to the right answer before reading the options.

Another practical reading habit: the Evaluate stem almost never contains a conclusion whose truth is in question. The author of the argument is the one making the claim. The candidate is being asked to play the role of a sceptical peer reviewer who is considering whether the author's claim is well grounded. A peer reviewer does not write the conclusion. A peer reviewer does not repair the argument. A peer reviewer asks one specific question that, once answered, lets them decide. That specific question is the yardstick the test wants the candidate to nominate.

The yardstick method: pre-phrasing before the options

Pre-phrasing is the single highest-leverage habit a candidate can build for GMAT Critical Reasoning, and it is especially powerful on Evaluate stems. The yardstick method has three steps. Step one: read the passage and identify the conclusion, the premises, and the assumed causal or comparative link between them. Step two: ask, in plain English, 'what fact, if I knew it, would change my mind about whether the conclusion is well supported?' Step three: convert that fact into a one-line binary question whose answer is yes or no. The yardstick is now an actual question, not a vague feeling.

For example, a typical GMAT CR passage might read: 'Last year, Company X replaced its paper-based training with an interactive video programme, and the average test score of newly hired staff rose by 14 points. Therefore, interactive video training is more effective than paper-based training for new hires.' A weak candidate will look at the options and pick one that states a fact pointing in a single direction, such as a claim that the new cohort was more senior, because that sounds like it weakens the argument. A strong candidate will pause after reading the stem 'which of the following would be most useful in evaluating the argument?' and will pre-phrase: 'did the new hires trained by video have more prior experience than the earlier hires trained on paper?' That yardstick is two-directional. If the video-trained hires were more experienced, the conclusion is weakened. If the two groups were equally experienced, the conclusion is strengthened. A yardstick that swings both ways is exactly what Evaluate demands.
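The two-directional test can be written down as a tiny classifier. The sketch below is only an illustration of the idea, not an official scoring rule: the function name `classify` and the +1 / 0 / -1 coding of effects are assumptions introduced here, and the candidate still has to judge each effect by reading the argument.

```python
# Effects are coded +1 (supports the conclusion), -1 (undermines it), 0 (no bearing).
# The coding scheme and the function name are illustrative assumptions.
def classify(effect_if_yes: int, effect_if_no: int) -> str:
    """Label a candidate yardstick by how its two possible answers move the conclusion."""
    if effect_if_yes * effect_if_no < 0:
        return "evaluate"            # the two answers pull in opposite directions
    if effect_if_yes == 0 and effect_if_no == 0:
        return "irrelevant"          # neither answer touches the conclusion
    net = effect_if_yes + effect_if_no
    return "strengthen-shaped" if net > 0 else "weaken-shaped"


# Yardstick from the Company X example:
# "Did the video-trained hires have more prior experience than the paper-trained hires?"
# yes -> experience may explain the score gain (-1); no -> training is the likelier cause (+1)
print(classify(effect_if_yes=-1, effect_if_no=+1))  # evaluate
```

A fact that helps when true and says nothing when false, such as `classify(+1, 0)`, comes back as strengthen-shaped, which is the distractor pattern the yardstick is meant to screen out.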

Why does pre-phrasing save time on the GMAT Focus, where each CR stem is budgeted at roughly 90 seconds? Because it converts the options section from a five-way argument contest into a five-key lock test. The candidate is no longer judging each option on its persuasive surface. The candidate is asking, of each option: 'does this fact, if true, change how much weight the conclusion deserves?' If yes, the option survives. If no, the option is rejected regardless of how interesting it sounds. In a timed section, this is the difference between 90 seconds of focused rejection and 4 minutes of circling.

Pre-phrasing also immunises the candidate against Strengthen-shaped distractors. The most common wrong answer on Evaluate is a Strengthen answer, and the reason is that Strengthen answers often contain real facts that would help the argument. The candidate reads the option, feels the help, and picks it. Pre-phrasing breaks this trap by reminding the candidate that the question is not asking what would help. The question is asking what would let the reader decide. Help and decisiveness are not the same criterion, and confusing them is the most common way candidates plateau at a 76 or 78 on GMAT Verbal.

How to test each option against a yardstick in under 30 seconds

Once the yardstick is on the page, each option is a five-second pass-fail test. The candidate reads the option, paraphrases it, and asks three binary questions in order. Question one: 'is this fact about the conclusion, or about something else entirely?' If it is about something else, the option is rejected. Off-topic options are common because the GMAT often includes a distractor that, while relevant to the topic of the argument, is irrelevant to whether the conclusion follows from the premises. Question two: 'does this fact have a clear yes or no consequence for the conclusion?' If the option's effect on the conclusion is ambiguous or one-directional, the option is rejected. Question three: 'is this option asking the same question as my yardstick, or a different one?' If it is asking a different question, even a good one, the option is rejected.

Question two is the one that catches most candidates. The GMAT deliberately includes Weaken-shaped distractors whose truth would, in fact, hurt the argument, but whose absence would tell the candidate nothing. A Weaken answer only has one direction of impact. An Evaluate answer must have two. The candidate should be ruthless: if the option's truth value only swings the conclusion in one direction, the option is not an Evaluate answer, no matter how persuasive it reads.

Let us run a worked example. (Four options are shown for brevity; the real test offers five.) Argument: 'Since the city of Astonbury banned single-use plastic bags in retail stores, the volume of plastic waste collected from residential bins has dropped by 9 per cent. The ban is therefore a successful environmental policy.' Stem: 'which of the following would be most useful in evaluating the argument?' The candidate pre-phrases: 'is the drop in residential-bin plastic waste actually caused by the ban, or by something else, such as a shift to reusable bags purchased before the ban?' Option A: 'A neighbouring city without a similar ban saw a 12 per cent drop in residential-bin plastic waste over the same period.' Option A is a two-directional yardstick: if it is true, the drop looks like a regional trend and the ban is less impressive. If it is false and the neighbouring city saw little or no drop, the ban looks stronger. Option A is the answer. Option B: 'Retailers in Astonbury reported a 20 per cent rise in sales of reusable cloth bags in the month after the ban.' Option B is one-directional: it strengthens the conclusion but cannot weaken it. Option C: 'The ban included a public-awareness campaign that cost the city 1.4 million dollars.' Option C is off-topic. Option D: 'Plastic bag manufacturers in Astonbury reported a 30 per cent drop in revenue.' Option D is about manufacturers, not about the conclusion's causal claim. The right answer is the option whose truth value changes the weight of the conclusion in either direction, which is exactly Option A.
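For readers who like to see the pass-fail test as a procedure, the three binary questions can be sketched as a small screening function and run over the Astonbury options. Everything named here (`Option`, `screen`, the +1 / 0 / -1 effect coding) is a hypothetical illustration, and the values assigned to each option are hand-coded readings that follow the walkthrough above rather than anything the test supplies.

```python
from dataclasses import dataclass


@dataclass
class Option:
    label: str
    about_conclusion: bool   # Question one: does the fact bear on the conclusion at all?
    effect_if_true: int      # +1 supports, -1 undermines, 0 tells the reader nothing
    effect_if_false: int
    matches_yardstick: bool  # Question three: same question as the pre-phrased yardstick?


def screen(opt: Option) -> str:
    """Apply the three questions in order and report the first failure."""
    if not opt.about_conclusion:
        return "reject: off-topic"
    if opt.effect_if_true * opt.effect_if_false >= 0:
        return "reject: not two-directional"
    if not opt.matches_yardstick:
        return "reject: different question"
    return "keep"


# Hand-coded readings of the four Astonbury options (assumed, following the walkthrough).
options = [
    Option("A", True, -1, +1, True),    # regional trend if true, ban looks stronger if false
    Option("B", True, +1, 0, False),    # helps if true, says little if false
    Option("C", False, 0, 0, False),    # campaign cost: off-topic
    Option("D", False, 0, 0, False),    # manufacturer revenue: off-topic
]

for opt in options:
    print(opt.label, screen(opt))
```

Only Option A survives all three checks; B fails the second question and C and D fail the first.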

Notice how the worked example illustrates the most common trap: Option B is a Strengthen answer that feels 'helpful'. A candidate who skipped the yardstick step would be tempted by B, because B sounds like it reinforces the policy. But the question was not 'which option would help the argument.' The question was 'which option would help the reader decide.' B does not help the reader decide, because knowing B is true only pushes the reader in one direction. The yardstick method forces the candidate to read every option through the lens of decision weight, not persuasion weight, and that is what makes the method portable across every Evaluate stem in the GMAT Focus question bank.

The five tactical moves that separate a 78 from an 84 on Evaluate

Move one is to mark the verb in the stem. The candidate should underline, mentally or on the scratch pad, the word 'evaluate', 'determine', 'assess', or 'most useful in deciding'. The verb is the lens, and the rest of the stem is the conclusion. Move two is to write the yardstick as a yes-or-no question, not as a sentence. A yardstick expressed as 'did X happen?' is much easier to test against options than a yardstick expressed as 'the fact that X happened would be relevant.' Move three is to reject any option whose impact on the conclusion is one-directional. One-direction options are Strengthen or Weaken answers, and they cannot be Evaluate answers, no matter how close they feel to the topic. Move four is to check that the option's fact is about the conclusion, not about a side topic. Side-topic options are common on the GMAT Focus and they lure candidates who have not pinned down the conclusion. Move five is to commit. The five-move protocol takes about 75 seconds per stem in practice, and committing to the protocol is what carries a candidate from a 78 plateau to a stable 82 or 84 on GMAT Verbal.

Candidates who already practise pre-phrasing on Strengthen and Assumption often find that Evaluate feels slower at first. That is expected, because Evaluate forces the candidate to think in two directions instead of one. The investment pays off in score reliability: a candidate who can pre-phrase a yardstick will rarely fall for a Strengthen-shaped distractor, and Strengthen-shaped distractors are the single most common reason candidates miss Evaluate questions. The protocol is also portable to the new Verbal section of the GMAT Focus, where Critical Reasoning shares the floor with Reading Comprehension. The same five moves apply, and the same scoring implications follow. As a rough rule of thumb, a clean Evaluate answer is worth about one point on the Verbal score and a Strengthen-shaped miss costs about one, which matters most when the candidate has been oscillating between 80 and 83 and is looking for the next push. Because the section is adaptive, the exact effect of any single question varies.

Why Evaluate feels harder than Strengthen, and how to retrain the reflex

Evaluate feels harder than Strengthen for a structural reason. Strengthen asks the candidate to push in one direction. The cognitive task is to find the option that adds the most force. Evaluate asks the candidate to think in two directions, holding both outcomes in working memory while checking the option. That is a heavier working-memory load, and it is why many candidates fall back on a familiar Strengthen reading when they are tired. The retraining is mechanical. After reading the stem, the candidate should write the words 'two-way' on the scratch pad. That tiny gesture reminds the candidate, on every Evaluate stem, that the answer must be capable of weakening or strengthening depending on the truth value. The 'two-way' marker is unglamorous and it works.

A second reason Evaluate feels harder is that the test's distractor design is more aggressive on this family. On Strengthen, the wrong answers are usually off-topic or weak, and the right answer is a clear winner. On Evaluate, the wrong answers are often perfectly relevant and perfectly true-sounding, and the right answer is the one that swings in two directions. Candidates who trained on official practice questions before encountering Evaluate on the real test often report a moment of panic, because the options all sound reasonable. The panic dissolves once the yardstick is on the page, but only if the candidate built the habit during preparation. The test is not the place to build a new habit, and the GMAT Focus scoring scale punishes verbal habits that are not yet automatic.

A third reason is that Evaluate and Assumption share a tempting overlap. The candidate who has been drilling Assumption questions sometimes reads an Evaluate stem and picks an answer that fills the missing link, because filling the missing link also feels like deciding the argument. The two question types are not the same. An Assumption answer, if false, would crash the argument. An Evaluate answer, if true, would tell the reader how much weight the argument deserves, regardless of whether the assumption is true. The candidate who confuses the two will pick the answer that 'makes the argument work' on an Evaluate stem, and that answer is almost always a Strengthen answer in disguise. The yardstick method, applied to every Evaluate stem, keeps the two question types separate in working memory.

In my experience tutoring candidates between 76 and 84 on GMAT Verbal, the Evaluate family is often the highest-leverage family to drill, even more than Flaw or Inference. The reason is that Evaluate requires a meta-cognitive move — thinking about the argument as an object — and once a candidate can do that move, every other CR family becomes slightly easier. The candidate starts reading passages as instruments to be measured, not as claims to be agreed with, and that is the cognitive shift that pushes Verbal scores from a plateau into a stable 82 to 85 band.

Reading the argument like an instrument, not a story

The most productive mental shift a candidate can make for Evaluate is to stop reading the passage as a story and start reading it as a measuring instrument. A story has characters, motives, and emotions. An instrument has a conclusion, premises, and a causal or comparative link between them. The candidate who reads the passage as a story is tempted to ask 'is the conclusion reasonable?' The candidate who reads the passage as an instrument asks 'how well does this instrument measure what it claims to measure?' The two questions sound similar, but they produce different yardsticks and different right answers.

Reading-as-instrument has a practical consequence for the scratch pad. After reading the argument, the candidate should be able to draw a one-line diagram: [Premise 1] + [Premise 2] → [Conclusion]. The arrow is the inferential claim, and the arrow is the part of the argument that the Evaluate yardstick is going to test. Most Evaluate questions on the GMAT Focus target the arrow. The premises are usually taken for granted. The conclusion is usually accepted as the author's claim. What is being evaluated is the inferential step from premises to conclusion, and the yardstick is the fact whose truth would let the reader decide whether the arrow holds.

This reading habit also helps on Assumption and Flaw, because both of those question types also target the arrow. The candidate who draws the arrow on the scratch pad is, in effect, training for three question families at once. Evaluate is the most demanding of the three, but it is also the one that gives the most back, because the meta-cognitive habit transfers cleanly. For most candidates reading this, the Evaluate family is the family that decides whether the Verbal score plateaus at 78 or climbs into the low 80s, and the reading-as-instrument habit is the cheapest way to make that climb.

Common pitfalls and how to avoid them on Evaluate stems

Five pitfalls account for the majority of Evaluate misses. Pitfall one is reading the stem as a Strengthen stem. The fix is to underline the verb in the stem and to write the word 'two-way' on the scratch pad. Pitfall two is picking the most supportive option because it 'helps' the argument. The fix is the yardstick: if the option's truth value only pushes in one direction, the option fails the Evaluate criterion. Pitfall three is picking an off-topic option that is true but irrelevant to the conclusion. The fix is to draw the [Premise] → [Conclusion] arrow and to check that the option's fact touches the arrow. Pitfall four is picking an Assumption-shaped answer that would crash the argument if false. The fix is to remember that Evaluate asks what would let the reader decide, not what the argument needs to stay standing. Pitfall five is spending more than 90 seconds on the stem. The fix is the five-move protocol: mark the verb, write the yardstick as a yes-or-no question, reject one-directional options, reject off-topic options, and commit.

Another common pitfall is misidentifying the conclusion. A surprising number of Evaluate misses come from the candidate testing the yardstick against the wrong sentence. If the candidate thinks the conclusion is 'the ban reduced plastic waste' when the author is actually claiming 'the ban is a successful environmental policy', the yardstick will be wrong and every option will be tested against the wrong target. The fix is to ask, sentence by sentence, 'is this the author's main point, or is it supporting the author's main point?' The author's main point is the conclusion, and the supporting sentences are the premises. Evaluate's right answer will always touch the conclusion, not the premises, because the question is about the inferential weight of the argument as a whole.

A final pitfall is the time-pressure panic. The candidate reads the stem, skips the yardstick, glances at the options, sees a Strengthen answer that feels right, picks it, moves on, and loses the point. The point loss is invisible in the moment, but it shows up later as a missed Evaluate stem. The fix is mechanical: write the yardstick on the scratch pad before reading the options. This takes 10 seconds and saves 30, and the net effect is a faster, more accurate Evaluate response. Candidates who have built this habit into their preparation find that Evaluate stops being the family they dread and becomes one of the more predictable families on the test.

Practice architecture: how to drill Evaluate without burning the question bank

Drilling Evaluate is a question of structure, not volume. The candidate who burns 200 Critical Reasoning stems in random order will plateau, because the brain generalises across question types and stops being able to tell Evaluate from Strengthen. The candidate who drills 40 Evaluate stems in a single sitting, with the yardstick method on the scratch pad, will move their score, because the brain is being asked to repeat one specific protocol until the protocol becomes automatic. For most candidates, a four-week block of pure Evaluate drilling is the difference between a 78 and an 84 on Verbal.

The week-by-week structure can be simple. Week one: 10 official Evaluate stems per day, untimed, with the yardstick method on the scratch pad. The goal is accuracy, not speed. Week two: 12 to 15 Evaluate stems per day, timed at 100 seconds per stem, with the five-move protocol. The goal is speed under the protocol. Week three: 10 mixed CR stems per day, including Evaluate, Assumption, Strengthen, and Weaken, with the verb in the stem underlined on every stem. The goal is to keep the protocol active when Evaluate is no longer the only family. Week four: 8 mixed CR stems per day, plus Reading Comprehension, with the scratch-pad protocol active. The goal is to keep Evaluate accuracy above 85 per cent while the rest of Verbal is also being trained. By the end of week four, the candidate should be answering Evaluate stems in roughly 75 to 85 seconds with a stable accuracy rate.
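To see what the four-week block adds up to, the daily quotas above can be totalled in a few lines. This is a rough sketch: the seven-day drilling week is an assumption (the plan does not say how many days per week to drill), and weeks three and four count mixed CR stems, not Evaluate stems alone.

```python
# (week, stems per day low, stems per day high, seconds per stem)
# seconds is None where the plan gives no per-stem time limit.
PLAN = [
    (1, 10, 10, None),
    (2, 12, 15, 100),
    (3, 10, 10, None),   # mixed CR stems
    (4, 8, 8, None),     # mixed CR stems, alongside Reading Comprehension
]
DAYS_PER_WEEK = 7  # assumption: adjust if drilling fewer days per week


def weekly_range(low: int, high: int, days: int = DAYS_PER_WEEK) -> tuple[int, int]:
    """Return the (minimum, maximum) number of stems drilled in one week."""
    return low * days, high * days


total_low = sum(weekly_range(low, high)[0] for _, low, high, _ in PLAN)
total_high = sum(weekly_range(low, high)[1] for _, low, high, _ in PLAN)
print(f"Stems over four weeks: {total_low} to {total_high}")

for week, low, high, secs in PLAN:
    if secs is not None:
        print(f"Week {week}: {low * secs / 60:.0f} to {high * secs / 60:.0f} timed minutes per day")
```

Under these assumptions the block comes to 280 to 301 stems, and the timed work in week two takes 20 to 25 minutes a day before review.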

It is also worth keeping an error log. Every Evaluate miss should be written down with the yardstick that was pre-phrased, the option that was picked, and the reason the option was wrong. Most candidates discover, after 30 logged misses, that the same three or four pitfalls keep appearing. Those are the pitfalls to drill against. Error logs are unglamorous and they are the single highest-leverage preparation habit a GMAT candidate can build, because the test is not about practising what the candidate already does well. The test is about removing the specific mistakes that keep the score stuck on a plateau.
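An error log of this kind can live in a notebook or a spreadsheet; for candidates who prefer a script, a minimal sketch might look like the following. The `Miss` record, the `pitfall` tag, and the three sample entries are all invented for illustration and are not real practice results.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Miss:
    yardstick: str      # the yes-or-no question that was pre-phrased
    picked: str         # the option that was chosen
    reason_wrong: str   # why that option failed
    pitfall: str        # short tag used for counting (an added convenience field)


# Invented sample entries, for illustration only.
log = [
    Miss("Did something other than the ban cause the drop?", "B",
         "only supports the conclusion", "strengthen-shaped"),
    Miss("Were the two cohorts equally experienced?", "D",
         "true but unrelated to the conclusion", "off-topic"),
    Miss("Is the sample representative?", "A",
         "only supports the conclusion", "strengthen-shaped"),
]


def top_pitfalls(entries: list[Miss], n: int = 4) -> list[tuple[str, int]]:
    """Return the n most frequent pitfall tags with their counts."""
    return Counter(e.pitfall for e in entries).most_common(n)


print(top_pitfalls(log))
```

Tagging each miss with one short pitfall label is what makes the recurring three or four patterns visible once the log grows.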

Scoring implications: how Evaluate feeds the Verbal total on the GMAT Focus

The GMAT Focus Verbal section is scored on a 60 to 90 scale, and its 23 questions are split between Critical Reasoning and Reading Comprehension. Evaluate is one of seven or eight CR families that appear, and on most test forms the candidate will see two to three Evaluate stems per Verbal section. As a rough rule of thumb, a clean Evaluate answer is worth about one point on the Verbal score and a missed Evaluate costs about one point, although the section is adaptive and the exact effect of any single question varies. The Evaluate family is therefore worth roughly two to three points out of the Verbal total, which is the difference between a 78 and an 80, or between an 82 and an 84. For candidates who are oscillating between two adjacent score bands, mastering the Evaluate family is often the cheapest point-per-hour improvement available.

The scoring also interacts with preparation strategy. The GMAT Focus does not return a sub-score for each CR family, so the candidate cannot tell from the score report whether Evaluate is the weak point. The error log is the only way to see the family-by-family pattern, and the error log is therefore the diagnostic instrument that drives the next block of preparation. Candidates who score in the low 80s on Verbal often have a single family dragging them down, and on a typical diagnostic that family is Evaluate, because Evaluate is the family that punishes one-directional reading most heavily.

For most candidates, the right preparation strategy is to drill Evaluate early, before drilling Strengthen or Weaken, because the meta-cognitive move that Evaluate demands transfers to every other CR family. A candidate who can read an argument as an instrument and pre-phrase a yardstick will find Strengthen, Weaken, Assumption, and Flaw easier to handle, because the reading habit does not change from one family to the next. Evaluate is, in that sense, the family that pays compound interest across the rest of the Verbal section, and that is why a focused four-week block of Evaluate drilling is the highest-leverage move a candidate can make when the Verbal score is stuck below 80.

Putting the protocol together on test day

On test day, the protocol is mechanical and short. The candidate reads the passage, draws the [Premise] → [Conclusion] arrow, writes the yardstick as a yes-or-no question, marks the verb in the stem, reads the options, rejects one-directional options first, rejects off-topic options second, picks the option that matches the yardstick, and moves on. The whole sequence takes 75 to 90 seconds for a clean Evaluate stem and 100 to 120 seconds for a difficult one. The candidate should not linger. A missed Evaluate stem is worth the same as a missed Strengthen stem, and the test's adaptive structure means that the next stem is more important than the previous one. The right trade is to commit, move on, and trust the protocol.
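The pacing trade-off is easy to check with arithmetic. The sketch below assumes the published GMAT Focus Verbal format of 23 questions in 45 minutes; the `slack` helper is a hypothetical name, and the section average is only a reference point, since Reading Comprehension questions also carry passage-reading time.

```python
SECTION_MINUTES = 45    # assumption: published GMAT Focus Verbal timing
SECTION_QUESTIONS = 23  # assumption: published GMAT Focus Verbal length

avg_seconds = SECTION_MINUTES * 60 / SECTION_QUESTIONS


def slack(seconds_spent: float) -> float:
    """Seconds gained (positive) or lost (negative) against the section average."""
    return avg_seconds - seconds_spent


for label, spent in [("clean, fast", 75), ("clean, slow", 90), ("difficult", 120)]:
    print(f"{label}: {slack(spent):+.0f} s against a {avg_seconds:.0f} s average")
```

On these numbers a clean Evaluate stem banks time for the rest of the section, while a difficult one at 120 seconds runs slightly over the average, which is one reason not to linger.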

Test-day discipline also includes the scratch pad. The yardstick is not a mental act; it is a written act. Candidates who try to hold the yardstick in working memory often drift back to a Strengthen reading under time pressure, because Strengthen is the most practised CR family and the brain defaults to it. Writing the yardstick on the scratch pad takes 10 seconds and removes the drift. The scratch pad is the only preparation tool the candidate can take into the test, and on Evaluate stems it is the difference between a clean answer and a guess.

The protocol is portable across the Verbal section. Once a candidate has built the habit on Evaluate, the same scratch-pad moves work on Assumption, Flaw, and Inference. The arrow, the yardstick, and the verb-marker are three small acts that convert every CR family into a measurable instrument. For most candidates reading this, the Evaluate family is the family that decides whether the Verbal score climbs from 78 to 84, and the five-move protocol is the cheapest way to make that climb. TestPrep İstanbul's diagnostic assessment is a natural starting point for candidates building a sharper preparation plan for the GMAT Focus Verbal section.

Conclusion and next steps

The Evaluate family is the CR family that asks the candidate to read the argument as an instrument and to nominate a yardstick whose truth would let the reader decide how much weight the conclusion deserves. The yardstick method, the five-move protocol, and the reading-as-instrument habit are the three preparation moves that carry a candidate from a 78 plateau into a stable 82 to 85 band on GMAT Verbal. The next step for any candidate serious about moving past a plateau is a focused four-week block of Evaluate drilling, with the protocol on the scratch pad on every stem, plus an error log that surfaces the specific pitfalls that keep the score stuck. The Evaluate family is the family to drill first.

Frequently asked questions

What is the difference between an Evaluate and a Strengthen question on GMAT Critical Reasoning?
A Strengthen question asks which option would add the most weight to the argument's conclusion; the right answer pushes the conclusion in one direction only. An Evaluate question asks which option, if true, would let the reader decide how much weight the conclusion deserves; the right answer must be capable of pushing the conclusion in either direction depending on its truth value. Strengthen is one-directional, Evaluate is two-directional, and confusing the two is the most common Evaluate mistake on the GMAT Focus Verbal section.
How long should I spend on an Evaluate stem during the GMAT Focus Verbal section?
Budget roughly 75 to 90 seconds on a clean Evaluate stem and up to 120 seconds on a difficult one, including the 10 seconds it takes to write the yardstick on the scratch pad. The five-move protocol (mark the verb, write the yardstick, reject one-directional options, reject off-topic options, commit) should fit inside that budget once the habit is automatic. Lingering past two minutes is rarely worth it, because on an adaptive test the next stem is more valuable than one already reduced to a guess.
How many Evaluate questions appear on a single GMAT Focus Verbal section?
On most GMAT Focus test forms, a candidate will see two to three Evaluate stems among the 23 Verbal questions, which are split between Critical Reasoning and Reading Comprehension. The Verbal section is scored on a 60 to 90 scale, and as a rough rule of thumb every clean Evaluate is worth about one point and every miss costs about one point, which is the difference between an 80 and an 82 or between an 82 and an 84 on the Verbal total.
Can I pre-phrase the yardstick before I read the answer options?
Yes, and that is the single highest-leverage habit on Evaluate stems. Write the yardstick as a yes-or-no question on the scratch pad, then test each option against the yardstick as a five-second pass-fail check. Pre-phrasing converts the options section from a five-way argument contest into a five-key lock test and immunises the candidate against Strengthen-shaped distractors, which are the most common wrong answers on Evaluate.
What is the best way to drill Evaluate without burning through my GMAT question bank?
Block the family. Spend a four-week block on pure Evaluate stems, ten to fifteen per day, untimed in week one and timed in week two, then mix the family back into a broader CR rotation in weeks three and four. Keep an error log that records the yardstick, the option picked, and the reason the option was wrong, and most candidates will find that the same three or four pitfalls account for almost every miss.