PTE Academic is a computerised English language proficiency test accepted by universities, governments, and professional bodies worldwide. Its Speaking section presents a series of integrated tasks that simultaneously evaluate listening comprehension, oral production, and time-pressured response generation. Among the most consequential task types are Repeat Sentence and Describe Image — two tasks that assess distinct skill sets yet share the same scoring rubric. Understanding precisely how these tasks are evaluated, what behaviours the scoring algorithm rewards, and which strategic responses maximise each criterion empowers candidates to move beyond surface-level preparation and develop targeted, evidence-based approaches.

The PTE Academic Speaking section: how it is structured and why task-specific preparation matters

The Speaking component of PTE Academic presents its task types in a fixed sequence. After a brief personal introduction, candidates encounter Read Aloud, followed by Repeat Sentence, Describe Image, Re-tell Lecture, and Answer Short Question. Summarise Spoken Text, which is sometimes listed alongside these tasks, belongs to the Listening section. The total time allocated to the Speaking section is approximately 30 to 35 minutes, and individual response windows are short: 15 seconds for Repeat Sentence and 40 seconds for Describe Image.

What distinguishes high-scoring candidates from average performers is not raw language ability alone — though that remains foundational. The decisive factor is tactical execution under cognitive load. Each task type presents a particular combination of listening demand, short-term memory reliance, articulatory accuracy, and content organisation. Candidates who study the scoring dimensions for each task and align their response strategy accordingly consistently outperform those who rely on general fluency alone.

The two tasks examined in this article — Repeat Sentence and Describe Image — represent opposite poles of the speaking skill spectrum. Repeat Sentence tests receptive processing and verbatim or near-verbatim oral reproduction. Describe Image demands synthesising visual information into a coherent spoken summary within a strict time ceiling. Both are scored on three identical criteria: oral fluency, pronunciation, and content. Understanding how each criterion operates independently and interactively is the foundation of effective preparation.

The scoring rubric for Repeat Sentence and Describe Image: a granular breakdown

PTE Academic uses automatic speech recognition (ASR) technology to evaluate speaking responses. The scoring system assigns separate marks for three dimensions, each contributing to the final speaking score and, by extension, the overall PTE score.

Oral fluency

Oral fluency measures the smoothness, rhythm, and naturalness of speech production. For a maximum score, responses must be produced at a natural pace without hesitation, repetition, false starts, or elongated pauses. The ASR system detects irregularities in speech flow that indicate non-native patterns. Candidates who pause mid-sentence, restart phrases, or insert filler sounds such as "er" or "um" systematically lose fluency marks. A perfectly fluent response reflects the cadence of a confident, articulate speaker — not the measured reading pace of someone translating thought into words.
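For self-review, the hesitation markers described above can be counted in a transcript of a practice recording. The following is a minimal sketch only: the filler list and the `fluency_flags` helper are illustrative assumptions, and the counts say nothing about how the real scoring engine measures fluency.

```python
import re

# Illustrative filler list (an assumption, not exhaustive).
FILLERS = {"er", "um", "uh", "erm"}


def fluency_flags(transcript: str) -> dict[str, int]:
    """Count filler sounds and immediate word repetitions in a transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    fillers = sum(1 for w in words if w in FILLERS)
    # The same word twice in a row (e.g. "the the") usually marks a false start.
    repeats = sum(
        1
        for prev, cur in zip(words, words[1:])
        if prev == cur and cur not in FILLERS
    )
    return {"words": len(words), "fillers": fillers, "repeats": repeats}


if __name__ == "__main__":
    print(fluency_flags("The um the the graph shows er a steady rise"))
```

Tracking these counts across practice sessions gives a rough indication of whether hesitation is decreasing, even though it cannot capture pauses or rhythm.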

Pronunciation

Pronunciation scoring evaluates whether individual phonemes and consonant clusters are produced with sufficient accuracy for the ASR system to recognise them. The scoring operates on a five-point scale: from minimal proficiency (consonantal approximation only) through developing, competent, and good, to an expert band. Expert-level pronunciation requires that vowel sounds and diphthongs conform to standard Received Pronunciation or General American patterns, and that stress patterns on polysyllabic words follow natural English rhythm. Candidates with non-native phonetic backgrounds should concentrate on problem phonemes specific to their first language rather than attempting wholesale accent modification.

Content

Content scoring differs between the two tasks. In Repeat Sentence, the system checks whether key content words from the original stimulus have been reproduced in the response. The algorithm is tolerant of minor morphological changes — for example, "walked" may be accepted when the stimulus contained "walk" — but the omission of critical lexical items substantially reduces the content score. In Describe Image, content scoring evaluates whether the response addresses the main features of the image (title, trend, comparison, or key data point) and whether the description remains relevant to what is actually depicted.
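The idea of key-word coverage in Repeat Sentence can be approximated for practice purposes. The sketch below is not the scoring algorithm itself: the function-word list, the crude suffix stripping, and the `content_coverage` helper are all assumptions made for illustration, intended only to show how a response might be checked against a stimulus while tolerating minor changes such as "walk" versus "walked".

```python
import re

# Hypothetical function-word list; any real engine's list is unknown.
FUNCTION_WORDS = {
    "a", "an", "the", "of", "to", "in", "on", "at", "and", "or",
    "but", "is", "are", "was", "were", "be", "for", "with", "by",
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def crude_stem(word: str) -> str:
    """Strip a few common suffixes so that 'walked' and 'walk' compare equal."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def content_coverage(stimulus: str, response: str) -> float:
    """Return the fraction of stimulus content words present in the response."""
    key_words = [
        crude_stem(w) for w in tokenize(stimulus) if w not in FUNCTION_WORDS
    ]
    if not key_words:
        return 0.0
    spoken = {crude_stem(w) for w in tokenize(response)}
    return sum(1 for w in key_words if w in spoken) / len(key_words)


if __name__ == "__main__":
    stimulus = "The students walk to the library"
    print(content_coverage(stimulus, "The students walked to the library"))
    print(content_coverage(stimulus, "The students walk"))
```

The suffix stripping is deliberately simple and will miss some pairs (for example "increase" and "increased"), so the result is best treated as a rough self-check rather than a predicted score.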

Six Repeat Sentence patterns and how to handle each one

The Repeat Sentence task presents candidates with an audio stimulus lasting three to nine seconds, followed by a microphone activation period during which the response must be produced. The stimulus is played only once. Preparation strategies must account for the structural variety of the sentences encountered.

Pattern 1: Simple declarative sentences

These contain a subject, verb, and object with no subordinate clauses. The primary challenge is articulatory accuracy and maintaining natural intonation. Candidates should focus on delivering the sentence in a single breath group without disrupting the intonation contour.

Pattern 2: Sentences with embedded subordinate clauses

These sentences require candidates to maintain clause structure during repetition. The common error is omitting the subordinate clause or merging it incorrectly with the main clause. A practised technique is to identify the conjunction (because, although, when, which) before beginning the response — this anchors memory of the clause boundary.

Pattern 3: Sentences containing numerical data or statistics

Numbers, percentages, and dates in the stimulus require precise reproduction. Candidates frequently transpose digits or misplace decimal points. Drilling number reproduction separately — including years, fractions, and large rounded figures — builds the motor memory needed for accurate repetition under time pressure.
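A number drill of this kind can be generated automatically so that each practice session uses fresh material. The sketch below is one possible approach, with a hypothetical `number_drill` helper and arbitrarily chosen ranges; the prompts would still need to be read aloud by a partner or a text-to-speech tool before being repeated.

```python
import random
from typing import Optional


def number_drill(count: int, seed: Optional[int] = None) -> list[str]:
    """Return `count` random number prompts of mixed types."""
    rng = random.Random(seed)

    def year() -> str:
        return str(rng.randint(1900, 2030))

    def percentage() -> str:
        # One decimal place, e.g. "47.3%".
        return f"{rng.randint(1, 99)}.{rng.randint(1, 9)}%"

    def fraction() -> str:
        denominator = rng.choice([2, 3, 4, 5, 8, 10])
        return f"{rng.randint(1, denominator - 1)}/{denominator}"

    def rounded_figure() -> str:
        # A single digit followed by three to six zeros, e.g. "4,000,000".
        return f"{rng.randint(1, 9) * 10 ** rng.randint(3, 6):,}"

    makers = [year, percentage, fraction, rounded_figure]
    return [rng.choice(makers)() for _ in range(count)]


if __name__ == "__main__":
    for prompt in number_drill(5):
        print(prompt)
```

Passing a fixed `seed` reproduces the same list, which is useful when repeating a drill to measure improvement.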

Pattern 4: Sentences with compound structures joined by conjunctions

Coordinated sentences linked by "and," "but," or "or" require balanced pacing across both halves. Rushing through the first half for fear of running out of recording time frequently results in compression or omission of the second clause. Equal pacing across coordinated elements signals fluent, confident delivery to the ASR system.

Pattern 5: Sentences with complex vocabulary or low-frequency lexical items

Stimulus sentences occasionally include academic or domain-specific vocabulary that candidates may not immediately recognise. When a word is unfamiliar, candidates should approximate the phonetic shape as closely as possible rather than omitting the word entirely. A close approximation may still be recognised and credited, whereas an omitted word cannot be.

Pattern 6: Sentences with idiomatic expressions or fixed phrases

Idioms and fixed collocations (for example, "by and large," "to and from," "up and down") must be reproduced as unanalysed units. Attempting to parse the idiom during repetition disrupts flow and almost certainly reduces the oral fluency score.

Four Describe Image response structures that the scoring rewards

Describe Image presents candidates with a visual stimulus — typically a graph, chart, diagram, map, photograph, or process flow — and requires a 40-second oral description. The response must be well-structured, cover the main visual features, and maintain natural spoken delivery throughout. The following four structures provide reliable frameworks applicable across different image types.

Structure 1: Overview plus supporting details

This structure leads with a general statement about what the image depicts before unpacking specific data points. For a line graph, the overview might state the overall trend. The supporting section then references key values, turning points, and notable anomalies. This structure works well for all trend-based visualisations and ensures content coverage without requiring memorisation of every data point.

Structure 2: Categorical comparison

When the image presents multiple categories, groups, or time periods for comparison, this structure organises the response by addressing each category in sequence before drawing a comparative conclusion. For pie charts, bar charts, or grouped column charts, categorical comparison is the most efficient approach because it prevents the speaker from jumping between unrelated data series and risking disorganised, fragmented delivery.

Structure 3: Process narration

For diagrams illustrating processes, workflows, or life cycles, the speaker should narrate sequentially from beginning to end, using sequencing language (first, then, next, subsequently, finally). The process structure must include an opening statement identifying the nature of the process depicted, followed by sequential description of each stage, and a closing observation about the overall outcome or purpose.

Structure 4: Spatial-observational description

Photographs, maps, and architectural diagrams require a spatial organisation strategy. The speaker begins with the central or most prominent element, then moves systematically (left to right, top to bottom, foreground to background) through the remaining features. This structure is less formulaic than the others and requires practice to avoid repetition or omission of key elements.

Common pitfalls in Repeat Sentence and Describe Image, and how to avoid them

Both tasks present predictable failure modes that can be systematically eliminated through targeted practice and self-awareness.

Starting before the microphone is active

One of the most consequential technical errors in Repeat Sentence is initiating the response before the recording indicator confirms that the microphone is active. PTE Academic activates the microphone only after the stimulus has finished playing and a brief delay has elapsed. Anything said during this pre-activation window is not recorded and therefore earns no marks. Candidates should wait for the visual cue, typically a progress bar or recording indicator, before beginning to speak.

Over-pronouncing in Repeat Sentence

Some candidates, attempting to maximise pronunciation marks, artificially exaggerate vowel and consonant sounds, slow their delivery significantly, or insert emphatic stress on every word. This strategy backfires because the oral fluency criterion penalises unnatural, non-continuous speech. The correct approach is to reproduce the sentence at a natural conversational pace, trusting that competent pronunciation embedded in fluent delivery will score higher than stilted over-pronunciation.

Describing without interpreting in Describe Image

A common content-level error in Describe Image is reciting data points without explaining their significance. For example, describing a graph by simply reading out each value in sequence ("In 2010, the value was 40. In 2011, it was 45. In 2012, it was 50...") fails to demonstrate the integrative, analytical skills the task is designed to assess. High-scoring responses include trend descriptions ("the value increased steadily"), comparative observations ("category A consistently exceeded category B"), and occasional inference ("this suggests a growing demand for..."). These interpretive elements substantially improve the content score.
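The difference between reciting values and describing a trend can be made concrete with a small practice aid. The sketch below is an illustration under stated assumptions: the `describe_trend` and `overview_sentence` helpers and their phrase choices are hypothetical, and "steadily" is used loosely to mean that every step moves in the same direction.

```python
def describe_trend(values: list[float], tolerance: float = 0.0) -> str:
    """Summarise a data series as a short trend phrase."""
    if len(values) < 2:
        raise ValueError("at least two values are needed to describe a trend")
    steps = [later - earlier for earlier, later in zip(values, values[1:])]
    if all(step > tolerance for step in steps):
        return "increased steadily"
    if all(step < -tolerance for step in steps):
        return "decreased steadily"
    if all(abs(step) <= tolerance for step in steps):
        return "remained stable"
    net_change = values[-1] - values[0]
    if net_change > tolerance:
        return "fluctuated but rose overall"
    if net_change < -tolerance:
        return "fluctuated but fell overall"
    return "fluctuated with no clear overall change"


def overview_sentence(label: str, start: int, end: int, values: list[float]) -> str:
    """Build a one-sentence overview from a label, a period, and the data."""
    return f"Overall, {label} {describe_trend(values)} between {start} and {end}."


if __name__ == "__main__":
    print(overview_sentence("the value", 2010, 2012, [40, 45, 50]))
```

Comparing a sentence of this kind with one's own spoken overview is a quick way to check that the response states the trend rather than listing each figure.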

Running out of time before covering key features

Describe Image responses that trail off before completing the description of the image's main features lose content marks proportionally. The 40-second time window should be managed by allocating approximately 10 to 15 seconds to the overview and remaining time to supporting details. Candidates who spend too long on the opening overview often find themselves unable to address the data points. Conversely, beginning with data points without establishing context risks an incoherent response.
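The time allocation above translates into an approximate word budget. The following sketch assumes a speaking rate expressed in words per minute (the default of 135 is an arbitrary mid-range assumption) and uses hypothetical helper names; it simply converts seconds into word counts so that a practice script can be sized before it is spoken.

```python
def word_budget(seconds: float, words_per_minute: float) -> int:
    """Return the whole number of words that fit into `seconds` at the given rate."""
    return int(seconds * words_per_minute / 60)


def plan_response(
    total_seconds: float = 40.0,
    overview_seconds: float = 12.0,
    words_per_minute: float = 135.0,
) -> dict[str, int]:
    """Split a response window into overview and detail word budgets."""
    if not 0 < overview_seconds < total_seconds:
        raise ValueError("overview time must be positive and shorter than the total")
    total = word_budget(total_seconds, words_per_minute)
    overview = word_budget(overview_seconds, words_per_minute)
    return {
        "overview_words": overview,
        "detail_words": total - overview,
        "total_words": total,
    }


if __name__ == "__main__":
    print(plan_response(40, 10, 120))
    print(plan_response(40, 15, 150))
```

At 120 words per minute, a 10-second overview is about 20 words, which leaves roughly 60 words for supporting details within the 40-second window.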

| Dimension | Repeat Sentence | Describe Image |
| --- | --- | --- |
| Primary skill assessed | Short-term memory and auditory processing | Visual analysis and information synthesis |
| Input modality | Audio only (single exposure) | Static visual image |
| Typical stimulus duration | 3 to 9 seconds | Not applicable; image visible throughout response window |
| Response time limit | 15 seconds | 40 seconds |
| Content scoring focus | Key lexical items reproduced from stimulus | Main image features described; relevance to visual maintained |
| Structural demand | Imitative: follow original sentence structure | Generative: construct original spoken text |
| Common failure mode | Omission of trailing clause or subordinate content | Incomplete description; no interpretive commentary |
| Optimal fluency strategy | Single breath-group delivery; natural intonation | Planned transitions; consistent pace without memorised scripts |

Developing a practice routine that targets both tasks simultaneously

Effective preparation for Repeat Sentence and Describe Image requires a dual-focus practice schedule that addresses the distinct cognitive demands of each task while building the underlying skills they share.

For Repeat Sentence, daily listening-and-reproduction drills should form the foundation. Candidates should use authentic PTE practice materials and reproduce each sentence immediately after hearing it, recording the response for self-review. The review process should evaluate three dimensions simultaneously: oral fluency (smoothness of delivery), pronunciation (accuracy of individual sounds), and content (presence of all key lexical items). Identifying the weakest dimension for each stimulus and drilling that dimension specifically produces faster improvement than undifferentiated repetition.
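The self-review routine described above can be logged and summarised with very little code. The sketch below assumes the candidate rates each recording on the three dimensions using any consistent numeric scale of their own; the `weakest_dimension` and `drill_priorities` helpers are hypothetical names introduced for illustration.

```python
from collections import Counter

DIMENSIONS = ("fluency", "pronunciation", "content")


def weakest_dimension(ratings: dict[str, int]) -> str:
    """Return the lowest-rated dimension; ties go to the earlier one in DIMENSIONS."""
    return min(DIMENSIONS, key=lambda dimension: ratings[dimension])


def drill_priorities(log: list[dict[str, int]]) -> list[tuple[str, int]]:
    """Count how often each dimension was the weakest across all logged attempts."""
    return Counter(weakest_dimension(entry) for entry in log).most_common()


if __name__ == "__main__":
    practice_log = [
        {"fluency": 3, "pronunciation": 2, "content": 4},
        {"fluency": 2, "pronunciation": 3, "content": 3},
        {"fluency": 4, "pronunciation": 1, "content": 2},
    ]
    print(drill_priorities(practice_log))
```

The dimension at the top of the list is the one that was weakest most often, and therefore the natural candidate for the next round of focused drilling.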

For Describe Image, weekly timed practice sessions with a variety of image types are recommended. Candidates should aim to complete at least five to eight Describe Image responses per practice session, each within the 40-second limit, and review recordings against the scoring rubric. A useful exercise is to complete a Describe Image response twice — once with a structured framework (overview plus details) and once with a different framework (categorical comparison) — and compare the quality of content coverage and fluency in each version.

Shadowing exercises, in which candidates listen to model PTE responses and replicate the rhythm, pacing, and phrasing, are particularly valuable for building the oral fluency that both tasks require. Shadowing should be conducted at normal conversational speed, not slowed down, to train the motor patterns of natural English speech.

The relationship between speaking scores and overall PTE Academic results

It is important to understand that Repeat Sentence and Describe Image do not contribute to the speaking score alone. PTE Academic uses an integrated scoring model in which individual task scores contribute to multiple subscores and ultimately to the overall score. A candidate's performance in Repeat Sentence and Describe Image directly influences the speaking score, but may also contribute to the listening score — because Repeat Sentence requires simultaneous listening and speaking — and to enabling skills such as oral fluency and pronunciation that appear across multiple subscores.

This integrated architecture means that targeted improvement in Repeat Sentence and Describe Image can have downstream benefits for the overall PTE score that exceed the proportional weight of these individual tasks. Candidates whose target institutions require a minimum overall score, rather than minimum subscores, should note that maximising performance on the higher-weight tasks — including both speaking tasks — is the most efficient route to achieving the overall threshold.

The communicative skills assessed in Repeat Sentence (listening accuracy combined with spoken reproduction) and Describe Image (visual literacy combined with spoken summary) are precisely the competencies that universities and professional bodies seek in candidates who will need to function effectively in English-medium academic and professional environments. Preparing for these tasks is not merely a test strategy — it develops the underlying abilities that the PTE is designed to measure.

Next steps for targeted PTE Speaking preparation

Candidates who have reviewed the scoring rubric and identified which criterion — oral fluency, pronunciation, or content — represents their primary area for improvement should structure their preparation accordingly. For candidates whose primary weakness is oral fluency, the focus should be on shadowing practice, reducing hesitation markers, and building the confidence to deliver responses at a natural pace without self-correction. For those whose pronunciation requires attention, phoneme-level drills targeting the specific sounds that differ most from standard English patterns will yield the greatest return on investment. For candidates whose content scores are inconsistent, structured frameworks and deliberate practice with image analysis — ensuring every response includes an overview, supporting details, and at least one interpretive observation — will close the gap.

A structured diagnostic assessment provides an evidence-based starting point for identifying which of these areas warrants the greatest concentration of effort. TestPrep offers complimentary initial assessments that isolate specific skill gaps across all PTE task types, enabling candidates to build a preparation programme that is targeted, efficient, and grounded in the scoring criteria examined in this article.

Frequently asked questions

What is the difference between the content scoring criterion in Repeat Sentence and Describe Image?

In Repeat Sentence, content scoring evaluates whether key lexical items from the audio stimulus have been reproduced in the spoken response. Minor grammatical changes (such as verb tense variation) are tolerated, but omitting critical content words substantially reduces the score. In Describe Image, content scoring assesses whether the response covers the main features of the visual stimulus, remains relevant to what is actually depicted, and includes interpretive commentary rather than mere data recitation. The two tasks assess fundamentally different content types — auditory memory versus visual analysis.

How can I improve my oral fluency score in both Repeat Sentence and Describe Image?

Oral fluency is best improved through daily shadowing practice, in which candidates listen to natural English speech at conversational pace and reproduce it immediately, mimicking rhythm, intonation, and connected-speech features such as linking and elision. Reducing reliance on written notes, avoiding self-correction during responses, and training yourself to speak in complete breath groups rather than halting, word-by-word delivery all contribute to higher oral fluency scores on both tasks.

Is it better to speak quickly or slowly for maximum pronunciation marks in PTE Academic?

Neither extreme is optimal. Speaking too slowly produces an unnatural, non-continuous delivery that is penalised under the oral fluency criterion. Speaking too quickly risks reducing pronunciation accuracy as the articulatory system struggles to maintain precise sound production at speed. The ideal delivery for both Repeat Sentence and Describe Image is a natural conversational pace (approximately 120 to 150 words per minute) that allows the ASR system to clearly identify each phoneme while maintaining the smooth, uninterrupted flow that signals fluent oral production.
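Speaking rate is straightforward to measure from a practice recording once it has been transcribed. The sketch below is a simple self-check, using the 120 to 150 words-per-minute range given above as default bounds; the helper names are hypothetical, and the duration is assumed to be the length of the spoken response in seconds.

```python
def speaking_rate(transcript: str, duration_seconds: float) -> float:
    """Return the speaking rate in words per minute."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return len(transcript.split()) * 60 / duration_seconds


def pace_feedback(rate: float, low: float = 120.0, high: float = 150.0) -> str:
    """Classify a speaking rate against a target range."""
    if rate < low:
        return "slower than target"
    if rate > high:
        return "faster than target"
    return "within target range"


if __name__ == "__main__":
    sample = "The chart shows a steady rise in sales over ten years"
    rate = speaking_rate(sample, 5.0)
    print(round(rate), pace_feedback(rate))
```

Word counts are a coarse measure, since they ignore word length and pauses, but they are enough to reveal a consistent tendency to rush or to drag.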

How long should I spend analysing the image before beginning my Describe Image response?

Candidates receive a short preparation period before the microphone opens (25 seconds in the standard test format), and the image remains visible throughout the 40-second response window, so it can be consulted while speaking. The first five to seven seconds of the preparation period are best spent identifying the image type, the main subject, and the most significant visual features; the remainder can be used to plan an opening sentence. This orientation prevents the common error of beginning the description without establishing context, which often results in an incoherent or incomplete response. The 40-second window itself should be dedicated to speaking, not silent analysis.

Can partial responses still earn scores in Repeat Sentence if I cannot reproduce the entire sentence?

Yes. The content scoring criterion in Repeat Sentence is not all-or-nothing. The algorithm awards partial credit for each key content word that is successfully reproduced. A response that captures the first half of the sentence with all its key words but omits a trailing clause will earn partial content marks. However, omitting substantial portions of the sentence — particularly the main verb or subject — will result in a minimal content score. Candidates who cannot recall the complete sentence should reproduce whatever they can remember accurately and fluently, rather than attempting to reconstruct and risking multiple errors that would also reduce pronunciation and fluency scores.

How does PTE Academic score Repeat Sentence and Describe Image - and what separates high-flyers from average candidates