PTE Academic is a computerised English language proficiency test accepted by universities, governments, and professional bodies worldwide. Its Speaking section presents a series of integrated tasks that simultaneously evaluate listening comprehension, oral production, and time-pressured response generation. Among the most consequential task types are Repeat Sentence and Describe Image — two tasks that assess distinct skill sets yet share the same scoring rubric. Understanding precisely how these tasks are evaluated, what behaviours the scoring algorithm rewards, and which strategic responses maximise each criterion empowers candidates to move beyond surface-level preparation and develop targeted, evidence-based approaches.
The PTE Academic Speaking section: how it is structured and why task-specific preparation matters
The Speaking component of PTE Academic consists of several task types administered in sequence. After a brief, unscored personal introduction, candidates encounter tasks including Read Aloud, Repeat Sentence, Describe Image, Re-tell Lecture, and Answer Short Question. Summarise Spoken Text, which is sometimes listed alongside these, is delivered in the Listening section rather than in Speaking. The Speaking section takes approximately 30 to 35 minutes in total, and the response windows are short: 15 seconds for Repeat Sentence and 40 seconds for Describe Image.
What distinguishes high-scoring candidates from average performers is not raw language ability alone — though that remains foundational. The decisive factor is tactical execution under cognitive load. Each task type presents a particular combination of listening demand, short-term memory reliance, articulatory accuracy, and content organisation. Candidates who study the scoring dimensions for each task and align their response strategy accordingly consistently outperform those who rely on general fluency alone.
The two tasks examined in this article — Repeat Sentence and Describe Image — represent opposite poles of the speaking skill spectrum. Repeat Sentence tests receptive processing and verbatim or near-verbatim oral reproduction. Describe Image demands synthesising visual information into a coherent spoken summary within a strict time ceiling. Both are scored on three identical criteria: oral fluency, pronunciation, and content. Understanding how each criterion operates independently and interactively is the foundation of effective preparation.
The scoring rubric for Repeat Sentence and Describe Image: a granular breakdown
PTE Academic uses automated speech recognition (ASR) technology to evaluate speaking responses. The scoring system assigns separate marks for three dimensions, each contributing to the final speaking score and, by extension, the overall PTE score.
Oral fluency
Oral fluency measures the smoothness, rhythm, and naturalness of speech production. For a maximum score, responses must be produced at a natural pace without hesitation, repetition, false starts, or elongated pauses. The ASR system detects irregularities in speech flow that indicate non-native patterns. Candidates who pause mid-sentence, restart phrases, or insert filler sounds such as "er" or "um" systematically lose fluency marks. A perfectly fluent response reflects the cadence of a confident, articulate speaker — not the measured reading pace of someone translating thought into words.
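For self-review, the hesitation markers described above can be counted in a typed-out transcript of a practice recording. The following is a minimal sketch, not a reproduction of how the PTE scoring engine measures fluency; the `FILLERS` list and the `fluency_flags` helper are hypothetical names introduced only for illustration.

```python
import re

# Hypothetical list of hesitation markers; extend it to match your own habits.
FILLERS = {"er", "um", "uh", "erm"}


def fluency_flags(transcript: str) -> dict:
    """Count fillers and immediate word repetitions in a practice transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    fillers = sum(1 for w in words if w in FILLERS)
    # An immediately repeated word ("the the") is treated as a possible false start.
    repeats = sum(
        1 for a, b in zip(words, words[1:]) if a == b and a not in FILLERS
    )
    return {"words": len(words), "fillers": fillers, "repeats": repeats}


print(fluency_flags("The um the the results were er published last year"))
# {'words': 10, 'fillers': 2, 'repeats': 1}
```

Tracking these counts across practice sessions gives a rough indication of whether hesitation is decreasing, although it says nothing about pauses, which would need the audio itself.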
Pronunciation
Pronunciation scoring evaluates whether individual phonemes and consonant clusters are produced with sufficient accuracy for the ASR system to recognise them. The scoring operates on a five-point scale: from minimal proficiency (consonantal approximation only) through developing, competent, and good, to an expert band. Expert-level pronunciation requires that vowel sounds and diphthongs conform to standard Received Pronunciation or General American patterns, and that stress patterns on polysyllabic words follow natural English rhythm. Candidates with non-native phonetic backgrounds should concentrate on problem phonemes specific to their first language rather than attempting wholesale accent modification.
Content
Content scoring differs between the two tasks. In Repeat Sentence, the system checks whether key content words from the original stimulus have been reproduced in the response. The algorithm is tolerant of minor morphological changes — for example, "walked" may be accepted when the stimulus contained "walk" — but the omission of critical lexical items substantially reduces the content score. In Describe Image, content scoring evaluates whether the response addresses the main features of the image (title, trend, comparison, or key data point) and whether the description remains relevant to what is actually depicted.
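The idea of checking key content words, with tolerance for small morphological changes, can be approximated when reviewing Repeat Sentence practice. The sketch below is an assumption-laden illustration rather than Pearson's algorithm: the stop-word list, the `crude_stem` suffix stripping, and the `content_coverage` function are all hypothetical.

```python
import re

# Small illustrative stop-word list (an assumption, not taken from any scoring guide).
STOP_WORDS = {
    "a", "an", "the", "of", "to", "in", "on", "at", "is", "are",
    "was", "were", "and", "or", "but", "for", "by", "with",
}


def content_words(sentence):
    """Lower-case the sentence and drop function words."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(word):
    """Very rough suffix stripping so that 'walked' matches 'walk'."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def content_coverage(stimulus, response):
    """Return the share of stimulus content words found in the response,
    plus the list of (stemmed) words that were missed."""
    target = [crude_stem(w) for w in content_words(stimulus)]
    spoken = {crude_stem(w) for w in content_words(response)}
    missing = [w for w in target if w not in spoken]
    coverage = (len(target) - len(missing)) / len(target) if target else 0.0
    return coverage, missing


print(content_coverage(
    "The students walk to the library on Mondays",
    "The students walked to library",
))
# (0.75, ['monday'])
```

The list of missed words is usually more useful than the percentage, because it shows whether omissions cluster at the end of the sentence or around particular word types.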
Six Repeat Sentence patterns and how to handle each one
The Repeat Sentence task presents candidates with an audio stimulus lasting three to nine seconds, followed by a microphone activation period during which the response must be produced. The stimulus is played only once. Preparation strategies must account for the structural variety of the sentences encountered.
Pattern 1: Simple declarative sentences
These contain a subject, verb, and object with no subordinate clauses. The primary challenge is articulatory accuracy and maintaining natural intonation. Candidates should focus on delivering the sentence in a single breath group without disrupting the intonation contour.
Pattern 2: Sentences with embedded subordinate clauses
These sentences require candidates to maintain clause structure during repetition. The common error is omitting the subordinate clause or merging it incorrectly with the main clause. A practised technique is to identify the conjunction (because, although, when, which) before beginning the response — this anchors memory of the clause boundary.
Pattern 3: Sentences containing numerical data or statistics
Numbers, percentages, and dates in the stimulus require precise reproduction. Candidates frequently transpose digits or misplace decimal points. Drilling number reproduction separately — including years, fractions, and large rounded figures — builds the motor memory needed for accurate repetition under time pressure.
Pattern 4: Sentences with compound structures joined by conjunctions
Coordinated sentences linked by "and," "but," or "or" require balanced pacing across both halves. Rushing through the first half for fear of running out of recording time frequently results in compression or omission of the second clause. Equal pacing across coordinated elements produces the even, confident delivery that the oral fluency criterion rewards.
Pattern 5: Sentences with complex vocabulary or low-frequency lexical items
Stimulus sentences occasionally include academic or domain-specific vocabulary that candidates may not immediately recognise. When a word is unfamiliar, candidates should approximate the phonetic shape as closely as possible rather than omitting the word entirely. Partial credit is awarded for near-production of unfamiliar lexical items.
Pattern 6: Sentences with idiomatic expressions or fixed phrases
Idioms and fixed collocations (for example, "by and large," "to and from," "up and down") must be reproduced as unanalysed units. Attempting to parse the idiom during repetition disrupts flow and almost certainly reduces the oral fluency score.
Four Describe Image response structures that PTE scoring rewards
Describe Image presents candidates with a visual stimulus — typically a graph, chart, diagram, map, photograph, or process flow — and requires a 40-second oral description. The response must be well-structured, cover the main visual features, and maintain natural spoken delivery throughout. The following four structures provide reliable frameworks applicable across different image types.
Structure 1: Overview plus supporting details
This structure leads with a general statement about what the image depicts before unpacking specific data points. For a line graph, the overview might state the overall trend. The supporting section then references key values, turning points, and notable anomalies. This structure works well for all trend-based visualisations and ensures content coverage without requiring memorisation of every data point.
Structure 2: Categorical comparison
When the image presents multiple categories, groups, or time periods for comparison, this structure organises the response by addressing each category in sequence before drawing a comparative conclusion. For pie charts, bar charts, or grouped column charts, categorical comparison is the most efficient approach because it prevents the speaker from jumping between unrelated data series and risking disorganised, fragmented delivery.
Structure 3: Process narration
For diagrams illustrating processes, workflows, or life cycles, the speaker should narrate sequentially from beginning to end, using sequencing language (first, then, next, subsequently, finally). The process structure must include an opening statement identifying the nature of the process depicted, followed by sequential description of each stage, and a closing observation about the overall outcome or purpose.
Structure 4: Spatial-observational description
Photographs, maps, and architectural diagrams require a spatial organisation strategy. The speaker begins with the central or most prominent element, then moves systematically (left to right, top to bottom, foreground to background) through the remaining features. This structure is less formulaic than the others and requires practice to avoid repetition or omission of key elements.
Common pitfalls in Repeat Sentence and Describe Image, and how to avoid them
Both tasks present predictable failure modes that can be systematically eliminated through targeted practice and self-awareness.
Starting before the microphone is active
One of the most consequential technical errors in Repeat Sentence is initiating the response before the recording indicator confirms that the microphone is active. PTE Academic activates the microphone only after the stimulus has finished playing and a brief delay has elapsed. Responses produced during this pre-activation window are not recorded and receive zero marks. Candidates should wait for the visual cue — typically a progress bar or indicator — before beginning to speak.
Over-pronouncing in Repeat Sentence
Some candidates, attempting to maximise pronunciation marks, artificially exaggerate vowel and consonant sounds, slow their delivery significantly, or insert emphatic stress on every word. This strategy backfires because the oral fluency criterion penalises unnatural, non-continuous speech. The correct approach is to reproduce the sentence at a natural conversational pace, trusting that competent pronunciation embedded in fluent delivery will score higher than stilted over-pronunciation.
Describing without interpreting in Describe Image
A common content-level error in Describe Image is reciting data points without explaining their significance. For example, describing a graph by simply reading out each value in sequence ("In 2010, the value was 40. In 2011, it was 45. In 2012, it was 50...") fails to demonstrate the integrative, analytical skills the task is designed to assess. High-scoring responses include trend descriptions ("the value increased steadily"), comparative observations ("category A consistently exceeded category B"), and occasional inference ("this suggests a growing demand for..."). These interpretive elements substantially improve the content score.
Running out of time before covering key features
Describe Image responses that trail off before covering the image's main features lose content marks for whatever is left undescribed. The 40-second window should be managed by allocating approximately 10 to 15 seconds to the overview and the remaining 25 to 30 seconds to supporting details. Candidates who spend too long on the opening overview often find themselves unable to address the data points. Conversely, beginning with data points without establishing context risks an incoherent response.
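The time split can be turned into an approximate word budget when scripting practice responses. This is a rough sketch under stated assumptions: the default speaking rate of 150 words per minute is illustrative and not a PTE requirement, and `describe_image_budget` is a hypothetical helper.

```python
def describe_image_budget(total_seconds=40, overview_seconds=12, words_per_minute=150):
    """Split the response window and estimate a word budget for each part.

    The 150 words-per-minute default is an assumed conversational pace;
    measure your own rate from a recording and substitute it.
    """
    if not 0 < overview_seconds < total_seconds:
        raise ValueError("overview_seconds must fall inside the response window")
    detail_seconds = total_seconds - overview_seconds
    words_per_second = words_per_minute / 60
    return {
        "overview_words": round(overview_seconds * words_per_second),
        "detail_seconds": detail_seconds,
        "detail_words": round(detail_seconds * words_per_second),
    }


print(describe_image_budget())
# {'overview_words': 30, 'detail_seconds': 28, 'detail_words': 70}
```

At that assumed pace, a 12-second overview is roughly two sentences, which leaves room for three or four supporting observations.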
| Dimension | Repeat Sentence | Describe Image |
|---|---|---|
| Primary skill assessed | Short-term memory and auditory processing | Visual analysis and information synthesis |
| Input modality | Audio only (single exposure) | Static visual image |
| Typical stimulus duration | 3 to 9 seconds | Not applicable — image visible throughout response window |
| Response time limit | 15 seconds | 40 seconds |
| Content scoring focus | Key lexical items reproduced from stimulus | Main image features described; relevance to visual maintained |
| Structural demand | Imitative — follow original sentence structure | Generative — construct original spoken text |
| Common failure mode | Omission of trailing clause or subordinate content | Incomplete description; no interpretive commentary |
| Optimal fluency strategy | Single breath-group delivery; natural intonation | Planned transitions; consistent pace without memorised scripts |
Developing a practice routine that targets both tasks simultaneously
Effective preparation for Repeat Sentence and Describe Image requires a dual-focus practice schedule that addresses the distinct cognitive demands of each task while building the underlying skills they share.
For Repeat Sentence, daily listening-and-reproduction drills should form the foundation. Candidates should use authentic PTE practice materials and reproduce each sentence immediately after hearing it, recording the response for self-review. The review process should evaluate three dimensions simultaneously: oral fluency (smoothness of delivery), pronunciation (accuracy of individual sounds), and content (presence of all key lexical items). Identifying the weakest dimension for each stimulus and drilling that dimension specifically produces faster improvement than undifferentiated repetition.
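The review step can be kept honest with a simple log of self-assessed marks. The sketch below assumes a hypothetical 0 to 5 self-rating per dimension for each practice sentence; these are the candidate's own estimates, not official PTE scores, and `weakest_dimension` is an illustrative name.

```python
from collections import defaultdict
from statistics import mean


def weakest_dimension(ratings):
    """Average self-assessed marks per dimension and return the lowest one.

    `ratings` is a list of dicts, one per practice sentence, for example
    {"fluency": 4, "pronunciation": 3, "content": 5}.
    """
    by_dimension = defaultdict(list)
    for rating in ratings:
        for dimension, score in rating.items():
            by_dimension[dimension].append(score)
    averages = {dim: mean(scores) for dim, scores in by_dimension.items()}
    return min(averages, key=averages.get), averages


session = [
    {"fluency": 4, "pronunciation": 3, "content": 5},
    {"fluency": 3, "pronunciation": 3, "content": 4},
]
weakest, averages = weakest_dimension(session)
print(weakest)  # pronunciation
```

Running this over a week of sessions shows which dimension to prioritise in the next round of drills.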
For Describe Image, weekly timed practice sessions with a variety of image types are recommended. Candidates should aim to complete five to eight Describe Image responses per practice session, each within the 40-second limit, and review the recordings against the scoring rubric. A useful exercise is to describe the same image twice, once using the overview-plus-details framework and once using categorical comparison, and then compare content coverage and fluency across the two recordings.
Shadowing exercises, in which candidates listen to model PTE responses and replicate the rhythm, pacing, and phrasing, are particularly valuable for building the oral fluency that both tasks require. Shadowing should be conducted at normal conversational speed, not slowed down, to train the motor patterns of natural English speech.
The relationship between speaking scores and overall PTE Academic results
Repeat Sentence and Describe Image do not feed the speaking score in isolation. PTE Academic uses an integrated scoring model in which individual task scores contribute to multiple subscores and ultimately to the overall score. Performance on both tasks directly influences the speaking score. Repeat Sentence may also contribute to the listening score, because the response depends on accurately processing the audio stimulus, and both tasks feed enabling skills such as oral fluency and pronunciation that appear across multiple subscores.
This integrated architecture means that targeted improvement in Repeat Sentence and Describe Image can have downstream benefits for the overall PTE score that exceed the proportional weight of these individual tasks. Candidates whose target institutions require a minimum overall score, rather than minimum subscores, should note that maximising performance on the higher-weight tasks — including both speaking tasks — is the most efficient route to achieving the overall threshold.
The communicative skills assessed in Repeat Sentence (listening accuracy combined with spoken reproduction) and Describe Image (visual literacy combined with spoken summary) are precisely the competencies that universities and professional bodies seek in candidates who will need to function effectively in English-medium academic and professional environments. Preparing for these tasks is not merely a test strategy — it develops the underlying abilities that the PTE is designed to measure.
Next steps for targeted PTE Speaking preparation
Candidates who have reviewed the scoring rubric and identified which criterion — oral fluency, pronunciation, or content — represents their primary area for improvement should structure their preparation accordingly. For candidates whose primary weakness is oral fluency, the focus should be on shadowing practice, reducing hesitation markers, and building the confidence to deliver responses at a natural pace without self-correction. For those whose pronunciation requires attention, phoneme-level drills targeting the specific sounds that differ most from standard English patterns will yield the greatest return on investment. For candidates whose content scores are inconsistent, structured frameworks and deliberate practice with image analysis — ensuring every response includes an overview, supporting details, and at least one interpretive observation — will close the gap.
A structured diagnostic assessment provides an evidence-based starting point for identifying which of these areas warrants the greatest concentration of effort. TestPrep offers complimentary initial assessments that isolate specific skill gaps across all PTE task types, enabling candidates to build a preparation programme that is targeted, efficient, and grounded in the scoring criteria examined in this article.