Strings and Text Processing字符串与文本处理
A string is a sequence of characters — every piece of text your program touches is a string. This guide builds from the ground up: how strings are indexed and sliced in Python, which built-in methods (upper, lower, split, join, replace) do the heavy lifting, how concatenation and f-strings produce formatted output, how to search inside a string with in, find, and count, how to loop character by character, and how to combine these tools in a complete text-processing program. All code is Python; bilingual explanations throughout.字符串(string,字符串)是字符(character,字符)的序列——程序处理的每一段文本都是字符串。本指南从基础讲起:Python 中字符串的索引(index,索引)和切片(slicing,切片)、哪些内置方法(method,方法)承担重活(upper、lower、split、join、replace)、如何用拼接(concatenation,拼接)和 f-string 生成格式化(formatting,格式化)输出、如何用 in、find、count 在字符串中搜索、如何逐字符遍历,以及如何将这些工具组合成完整的文本处理(text processing,文本处理)程序。全部代码为 Python;全篇双语说明。
How to use this guide如何使用本指南
Strings are present in every curriculum, but the depth of coverage varies. AP CSP Topic 3.4 directly assesses string operations via the exam reference sheet. Ontario ICS3C A1.2 is the most explicit string-manipulation expectation, naming character swapping, capitalisation, extracting substrings, and counting occurrences — all covered here. Ontario ICS3U A1.1 covers strings as a variable type. BC has no dedicated string standard but anchors to "manipulate numbers and text" in Computer Studies 10. Alberta CSE1110/CSE1120 name strings as a data type and concatenation as an operator. The table below shows which rows are core for you; each section cites the curriculum document it was checked against.字符串出现在所有课程大纲中,但覆盖深度各有不同。AP CSP 主题 3.4 通过考试参考手册直接评估字符串操作。安大略 ICS3C A1.2 是最明确的字符串操作期望,点名了字符交换、首字母大写、提取子字符串和计算出现次数——本指南均有涵盖。安大略 ICS3U A1.1 将字符串作为变量类型涵盖。BC 没有专门的字符串标准,但依附于 Computer Studies 10 中的"操作数字和文本"。阿尔伯塔 CSE1110/CSE1120 将字符串列为数据类型,将拼接列为运算符。下表显示哪些行对你是核心内容;每节均注明所依据的课纲文件。
| If you are in…如果你在… | Focus on these sections重点学习 | Defer / lighter可推迟 / 减负 | Source依据 |
|---|---|---|---|
| 🇺🇸 US CSTA / AP CSP美国 CSTA / AP CSP | §1 through §7 in full. AP CSP Topic 3.4 (Strings) and Skill 4.B assess string operations directly on the exam. CSTA 3B-AP-12 names strings as a fundamental data structure.§1 至 §7 完整学习。AP CSP 主题 3.4(字符串)和技能 4.B 在考试中直接评估字符串操作。CSTA 3B-AP-12 将字符串列为基本数据结构。 | The word-count worked example (§7) is enrichment; the core exam skill is reading and predicting string-expression results.词频统计例题(§7)为拓展内容;核心考试技能是阅读和预测字符串表达式的结果。 | CSTA K-12 and AP CSP — CSTA 3B-AP-12; AP CSP Big Idea 3 Topic 3.4; Skill 4.B— CSTA 3B-AP-12;AP CSP 大概念 3 主题 3.4;技能 4.B |
| 🇨🇦 ON Grade 11 — ICS3U / ICS3C安大略 11 年级 — ICS3U / ICS3C | §1 through §7. ICS3U A1.1 requires strings as a variable type (§1). ICS3C A1.2 requires character-level manipulation, substring extraction, and occurrence counting (§3, §5, §7).§1 至 §7。ICS3U A1.1 要求字符串作为变量类型(§1)。ICS3C A1.2 要求字符级操作、子字符串提取和出现次数计数(§3、§5、§7)。 | ICS3U university floor: §6 (looping over characters) and §7 (full word-count program) are enrichment; ICS3C college stream: all seven sections are assessed.ICS3U 大学预备基础:§6(遍历字符)和 §7(完整词频统计程序)为拓展;ICS3C 大学课程:全部七节均被评估。 | ON/BC Computer Studies 11-12 — ICS3U A1.1; ICS3C A1.2— ICS3U A1.1;ICS3C A1.2 |
| 🇨🇦 BC — Computer Studies 10BC — Computer Studies 10 | §1 through §4 as core (string basics, slicing, methods, formatting). §5–§7 (searching, looping, word-count) as enrichment.§1 至 §4 为核心(字符串基础、切片、方法、格式化)。§5–§7(搜索、遍历、词频统计)为拓展。 | BC has no dedicated string-processing standard; all content here sits under the general "store and manipulate numbers and text" bullet.BC 没有专门的字符串处理标准;本指南所有内容均在通用"存储和操作数字和文本"条目下。 | ON/BC Computer Studies 11-12 — BC CS10 "store and manipulate numbers and text"— BC CS10"存储和操作数字和文本" |
| 🇨🇦 AB — CSE1110 / CSE1120阿尔伯塔 — CSE1110 / CSE1120 | §1 through §4 as core. CSE1110 outcome 2.4.3 (strings as a data type) and 2.4.6 (concatenation operators) are the assessed anchors. §5–§7 map to CSE1120 outcome 2.6 (concatenation and interpolation operators).§1 至 §4 为核心。CSE1110 结果 2.4.3(字符串作为数据类型)和 2.4.6(拼接运算符)是被评估的依据。§5–§7 对应 CSE1120 结果 2.6(拼接和插值运算符)。 | Alberta CSE has no standalone string-processing module; all content here sits within the Structured Programming 1/2 outcomes.阿尔伯塔 CSE 没有独立的字符串处理模块;本指南所有内容均在结构化编程 1/2 结果范围内。 | Alberta CTS Computing Science — CSE1110 outcomes 2.4.3, 2.4.6; CSE1120 outcome 2.6— CSE1110 结果 2.4.3、2.4.6;CSE1120 结果 2.6 |
Memorise: indexing is zero-based; slicing is s[start:stop] (stop excluded); the five key methods (upper, lower, split, join, replace); f-string syntax f"..."; and the in operator for membership testing. Read every cram-cheat box and skim the worked examples.背熟:索引从零开始;切片为 s[start:stop](stop 不含);五个关键方法(upper、lower、split、join、replace);f-string 语法 f"...";以及用于成员测试的 in 运算符。阅读每个速记框并浏览例题。
Work through all seven sections and the worked example in §7. AP CSP Skill 4.B asks you to trace string expressions step by step — practice reading s[1:4], s.split(","), and " ".join(words) without running the code. ON ICS3C A1.2 requires you to write programs that extract substrings and count occurrences from scratch.完整学习全部七节及 §7 的例题。AP CSP 技能 4.B 要求你逐步追踪字符串表达式——练习在不运行代码的情况下阅读 s[1:4]、s.split(",") 和 " ".join(words)。ON ICS3C A1.2 要求你从头编写提取子字符串和计算出现次数的程序。
String Basics and Indexing字符串基础与索引
- A string is a sequence of characters.字符串是字符的序列。 Every character (letter, digit, space, punctuation) is stored at a numbered position called an index. Indices start at 0, not 1.每个字符——字母、数字、空格、标点——都存储在一个称为索引的编号位置。索引从 0 开始,不是 1。
- Read one character with bracket notation.用方括号表示法读取单个字符。 s[0] is the first character; s[-1] is the last character; s[-2] is the second-to-last.s[0] 是第一个字符;s[-1] 是最后一个字符;s[-2] 是倒数第二个字符。
- Strings are immutable.字符串是不可变的。 You cannot change a character in place (s[0] = "X" raises a TypeError). Every "modification" creates a new string object.你无法就地修改字符(s[0] = "X" 会引发 TypeError)。每次"修改"都会创建一个新的字符串对象。
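The immutability rule above can be checked directly. The following is a minimal sketch; the variable names `error_message` and `renamed` are illustrative and not part of the guide, and the slice `name[1:]` uses the slicing syntax covered in the next section.

```python
# Illustrative sketch: a string cannot be changed in place.
name = "Dingrui"

try:
    name[0] = "X"              # item assignment is not supported for str
except TypeError as err:
    error_message = str(err)   # keep the message so it can be inspected

# Build a new string instead: new first character + the rest of the old one.
renamed = "X" + name[1:]

print(error_message)
print(name)      # Dingrui  (the original is untouched)
print(renamed)   # Xingrui
```

The original `name` still holds "Dingrui" after the failed assignment; only the new object `renamed` carries the change.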
Predict the output of each line without running the code.不运行代码,预测每行的输出。
name = "Dingrui"
print(name[0]) # D
print(name[3]) # g
print(name[-1]) # i
print(name[-3]) # r
print(len(name)) # 7
以上代码:name[0] 取第 1 个字符,name[-1] 取最后一个字符,len() 返回字符串长度。Index map: D=0, i=1, n=2, g=3, r=4, u=5, i=6. Negative indices count from the right: -1=i, -2=u, -3=r, so name[-3] is "r". len("Dingrui") = 7.
Given s = "Python", what is s[2]?给定 s = "Python",s[2] 的值是什么?
Indices: P=0, y=1, t=2, h=3, o=4, n=5, so s[2] is "t".索引:P=0, y=1, t=2, h=3, o=4, n=5。因此 s[2] 为 "t"。
Given s = "Hello", what is s[-1]?给定 s = "Hello",s[-1] 的值是什么?
The negative index -1 always refers to the last character. "Hello" ends in "o", so s[-1] = "o".负索引 -1 始终指向最后一个字符。"Hello" 以 "o" 结尾,因此 s[-1] = "o"。
s[-1] = last character = "o". s[0] = "H". Negative indices count from the right: -1 is last, -2 is second-to-last.s[-1] = 最后一个字符 = "o"。s[0] = "H"。负索引从右计数:-1 是最后,-2 是倒数第二。
String Slicing字符串切片
Slice syntax: s[start:stop] includes start and excludes stop.切片语法:s[start:stop]——包含 start,不含 stop。
- s[1:4]: characters at indices 1, 2, 3 (NOT 4).索引 1、2、3 处的字符(不含 4)。
- s[:3]: omit start and it defaults to 0. Gives the first 3 characters.省略 start 则默认为 0。返回前 3 个字符。
- s[2:]: omit stop and the slice runs to the end. Gives everything from index 2 onward.省略 stop 则到末尾。返回从索引 2 开始的全部字符。
- s[::2]: a step of 2 takes every other character. s[::-1] reverses the string.步长为 2 → 每隔一个字符取一个。s[::-1] 反转字符串。
The most common mistake: the stop index is excluded. If you want the characters at positions 0, 1, 2, write s[0:3], not s[0:2]. ON ICS3C A1.2 names "extract a portion of an address"; that is a slice.最容易出错的地方:stop 索引是不包含的。如果你想要位置 0、1、2 处的字符,你写 s[0:3],不是 s[0:2]。ON ICS3C A1.2 点名"提取地址的一部分"——那就是切片。
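To make the slice forms concrete, here is a small sketch that applies each one to the same string. The variable names are illustrative only.

```python
# Each slice form applied to one string; indices: c=0 o=1 m=2 p=3 u=4 t=5 e=6 r=7
s = "computer"

first_three = s[0:3]   # indices 0, 1, 2 (stop 3 is excluded)
same_thing = s[:3]     # omitted start defaults to 0
from_two = s[2:]       # index 2 through the end
every_other = s[::2]   # indices 0, 2, 4, 6
backwards = s[::-1]    # whole string, read right to left

print(first_three)   # com
print(same_thing)    # com
print(from_two)      # mputer
print(every_other)   # cmue
print(backwards)     # retupmoc
```

A quick self-check when start and stop are both inside the string: the length of s[start:stop] is stop minus start, so s[0:3] has 3 characters.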
Given date = "2024-06-15", extract the year, month, and day as separate strings.给定 date = "2024-06-15",将年、月、日提取为单独的字符串。
date = "2024-06-15"
year = date[0:4] # "2024"
month = date[5:7] # "06"
day = date[8:10] # "15"
print(year, month, day) # 2024 06 15
以上:date[0:4] 取索引 0~3(共 4 字符),date[5:7] 取索引 5~6,date[8:10] 取索引 8~9。这是 ON ICS3C A1.2"提取地址的一部分"的典型例子。Index positions: 2=0, 0=1, 2=2, 4=3, -=4, 0=5, 6=6, -=7, 1=8, 5=9. date[0:4] captures indices 0–3 = "2024". date[5:7] = indices 5–6 = "06". date[8:10] = indices 8–9 = "15". This is the canonical ON ICS3C A1.2 "extract a portion" operation.
Given s = "computer", what does s[2:5] return?给定 s = "computer",s[2:5] 返回什么?
"computer": c=0, o=1, m=2, p=3, u=4, t=5, e=6, r=7. s[2:5] = indices 2,3,4 = "mpu". Stop index 5 is excluded."computer":c=0,o=1,m=2,p=3,u=4,t=5,e=6,r=7。s[2:5] = 索引 2,3,4 = "mpu"。stop 索引 5 不包含。
The slice [2:5] gives indices 2, 3, 4; the stop (5) is excluded. "computer"[2]="m", [3]="p", [4]="u" → "mpu".切片 [2:5] 给出索引 2、3、4——stop(5)不包含。"computer"[2]="m",[3]="p",[4]="u" → "mpu"。
What does "Hello"[::-1] return?"Hello"[::-1] 返回什么?
A step of -1 reverses the string. "Hello"[::-1] reads every character from right to left: "olleH".步长 -1 会反转字符串。"Hello"[::-1] 从右到左读取每个字符:"olleH"。
[::-1] = reverse. Start and stop are omitted (whole string), step is -1 (backwards). Result: "olleH".[::-1] = 反转。start 和 stop 省略(整个字符串),step 为 -1(反向)。结果:"olleH"。
String Methods字符串方法
- s.upper(): returns a new string with all characters uppercased. "hello".upper() → "HELLO".返回所有字符大写的新字符串。"hello".upper() → "HELLO"。
- s.lower(): returns all lowercase. Useful for case-insensitive comparison.返回全小写。用于不区分大小写的比较。
- s.split(sep): splits on sep and returns a list. "a,b,c".split(",") → ["a","b","c"]. If sep is omitted, splits on any whitespace.按 sep 分割并返回列表。"a,b,c".split(",") → ["a","b","c"]。省略 sep 则按任意空白分割。
- sep.join(iterable): the inverse of split. "-".join(["a","b","c"]) → "a-b-c".split 的逆操作。"-".join(["a","b","c"]) → "a-b-c"。
- s.replace(old, new): replaces every occurrence of old with new. "cats and cats".replace("cats","dogs") → "dogs and dogs".将每个 old 替换为 new。"cats and cats".replace("cats","dogs") → "dogs and dogs"。
Curriculum anchor, ON ICS3C A1.2: capitalising a first letter → upper()/lower(); "extract a portion" → slice + split; "count occurrences" → count() in §5. AB CSE1120 outcome 2.6 names "concatenation and interpolation operators"; join() is the idiomatic Python tool for concatenating a list of strings.课程依据:ON ICS3C A1.2(原文):"将首字母大写" → upper()/lower();"提取一部分" → 切片 + split;"计算出现次数" → count()(见 §5)。AB CSE1120 结果 2.6 点名"拼接和插值运算符"——join() 是 Python 的惯用拼接工具。
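The sketch below exercises split, join, replace, and upper on small literals, mainly to show that split and join undo each other and that every method hands back a new object. All variable names are illustrative.

```python
# split and join are inverses when the same separator is used.
csv_line = "a,b,c"
parts = csv_line.split(",")        # list of three one-character strings
dashed = "-".join(parts)           # same items, different separator
round_trip = ",".join(parts)       # identical to csv_line again

# split() with no argument splits on any run of whitespace.
words = "  the   quick  fox ".split()

# replace and upper return new strings; the literals are not modified.
swapped = "cats and cats".replace("cats", "dogs")
shout = "hello".upper()

print(parts)        # ['a', 'b', 'c']
print(dashed)       # a-b-c
print(round_trip)   # a,b,c
print(words)        # ['the', 'quick', 'fox']
print(swapped)      # dogs and dogs
print(shout)        # HELLO
```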
Given row = " Alice , Biology , 92 ", produce a clean list of stripped, lowercased fields.给定 row = " Alice , Biology , 92 ",生成去空格、小写的字段列表。
row = " Alice , Biology , 92 "
fields = row.split(",") # [" Alice ", " Biology ", " 92 "]
clean = [f.strip().lower() for f in fields]
print(clean) # ['alice', 'biology', '92']
以上:split(",") 按逗号分割(方法,method),strip() 去除两端空白,lower() 转为小写。这三个步骤可以链式调用。Chain: split(",") splits on comma (returns list), then strip() removes surrounding whitespace, then lower() normalises case. This pattern (split, strip, lower) is a common idiom for cleaning simple comma-separated rows.
What does "Hello World".lower() return?"Hello World".lower() 返回什么?
"hello world". lower() converts every character to lowercase, returning a new string. Spaces are unchanged.lower() 将每个字符转换为小写,返回新字符串。空格不变。
lower() lowercases everything. upper() would give "HELLO WORLD". The original string is unchanged because strings are immutable.lower() 将所有字母转为小写。upper() 会得到 "HELLO WORLD"。原字符串不变,因为字符串是不可变的。
What does ",".join(["a","b","c"]) return?",".join(["a","b","c"]) 返回什么?
join() concatenates the list elements with the separator "," between each pair. Result: "a,b,c".join() 将列表元素用分隔符 "," 连接在一起。结果:"a,b,c"。
sep.join(list) produces a string, not a list. The separator goes between each element: "a" + "," + "b" + "," + "c" = "a,b,c".sep.join(list) 产生字符串,不是列表。分隔符位于每个元素之间:"a" + "," + "b" + "," + "c" = "a,b,c"。
Concatenation and String Formatting字符串拼接与格式化
- + operator (concatenation)+ 运算符(拼接): "Hello" + " " + "World" → "Hello World". Only works with strings; use str(n) to convert numbers first."Hello" + " " + "World" → "Hello World"。只能用于字符串;先用 str(n) 将数字转换。
- f-string (formatted string literal)f-string(格式化字符串字面量): f"Score: {score}". The expression inside {} is evaluated and inserted. Readable and preferred in modern Python.f"Score: {score}"。{} 内的表达式被求值并插入。现代 Python 中首选,可读性强。
- str.format(): "Score: {}".format(score). Older style, still common in legacy code."Score: {}".format(score)。旧式写法,在旧代码中仍常见。
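As a sketch of an f-string that embeds both plain variables and a computed expression, the hypothetical helper below builds one report line. The function name `report_line`, the `out_of` parameter, and the `:.1f` format specifier (one digit after the decimal point) are additions for illustration and go slightly beyond what this guide covers.

```python
def report_line(name, subject, score, out_of=100):
    """Format one report line; the percentage is computed inside the braces."""
    return f"{name} | {subject} | {score}/{out_of} ({score / out_of * 100:.1f}%)"


line = report_line("Alice", "Biology", 92)
print(line)   # Alice | Biology | 92/100 (92.0%)
```

Note that no str() call is needed: the f-string converts the integers and the computed float to text on its own.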
Given a student name, subject, and score, produce a formatted report line three different ways.给定学生姓名、科目和分数,用三种不同方式生成格式化的报告行。
name = "Alice"
subject = "Biology"
score = 92
# Method 1: + concatenation
line1 = name + " | " + subject + " | " + str(score)
# Method 2: f-string
line2 = f"{name} | {subject} | {score}"
# Method 3: .format()
line3 = "{} | {} | {}".format(name, subject, score)
print(line1) # Alice | Biology | 92
print(line2) # Alice | Biology | 92
print(line3) # Alice | Biology | 92
以上三种方式输出相同。+ 需要 str(score) 转换;f-string(格式化字符串)最简洁,现代 Python 推荐。注意字符串是不可变的(immutable),每次操作都返回新字符串。All three produce the same output. The + method requires explicit str(score) conversion; f-strings handle it automatically and are the modern Python recommendation. Note: every operation returns a new string because strings are immutable.
What does "Score: " + str(85) return?"Score: " + str(85) 返回什么?
str(85) converts the integer 85 to the string "85", then + concatenates it with "Score: " to give "Score: 85".str(85) 将整数 85 转换为字符串 "85",然后 + 将其与 "Score: " 拼接,得到 "Score: 85"。
Without str(), the + between a string and an int raises TypeError. With str(85) the conversion succeeds and produces "Score: 85".没有 str(),字符串和整数之间的 + 会引发 TypeError。有 str(85) 则转换成功,产生 "Score: 85"。
Given x = 7, what does f"Value is {x * 2}" return?给定 x = 7,f"Value is {x * 2}" 返回什么?
In an f-string, the expression inside {} is evaluated at runtime. x * 2 = 7 * 2 = 14, so the result is the string "Value is 14".在 f-string 中,{} 内的表达式在运行时被求值。x * 2 = 7 * 2 = 14,因此结果是字符串 "Value is 14"。
f-strings evaluate the expression inside {}; they do not insert the literal text. x * 2 evaluates to 14, giving "Value is 14".f-string 对 {} 中的表达式求值——不是插入字面文本。x * 2 求值为 14,得到 "Value is 14"。
Searching Within Strings在字符串中搜索
- sub in s: returns True or False. Fastest way to check membership. "cat" in "concatenate" → True.返回 True 或 False。检查成员关系的最快方式。"cat" in "concatenate" → True。
- s.find(sub): returns the index of the first occurrence, or -1 if not found. "hello".find("ll") → 2.返回第一次出现的索引,未找到则返回 -1。"hello".find("ll") → 2。
- s.count(sub): returns the number of non-overlapping occurrences. "banana".count("a") → 3.返回不重叠出现次数。"banana".count("a") → 3。
Curriculum anchor: ON ICS3C A1.2 asks students to count the occurrences of a word or letter; that is s.count(). AP CSP Topic 3.4 and Skill 4.B assess reading and predicting string-operation results; all three tools above appear on the exam reference sheet as string procedures.课程依据:ON ICS3C A1.2(原文):"计算单词或字母的出现次数"——那就是 s.count()。AP CSP 主题 3.4 和技能 4.B 评估阅读和预测字符串操作结果——以上三种工具都出现在考试参考手册的字符串程序中。
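find() reports only the first match and count() skips overlapping matches. The hypothetical helper `find_all` below is one way to list every start index, using the optional second argument of str.find() (the position to start searching from). It is a sketch for illustration, not a built-in.

```python
def find_all(text, sub):
    """Return the start index of every occurrence of sub, overlaps included."""
    positions = []
    start = text.find(sub)
    while start != -1:                      # -1 means no further match
        positions.append(start)
        start = text.find(sub, start + 1)   # resume one character further on
    return positions


print(find_all("banana", "an"))   # [1, 3]
print(find_all("banana", "a"))    # [1, 3, 5]
print(find_all("aaa", "aa"))      # [0, 1]  (overlapping matches are listed)
print("aaa".count("aa"))          # 1       (count() ignores the overlap)
```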
Use all three search tools on the sentence "the quick brown fox".在句子 "the quick brown fox" 上使用全部三种搜索工具。
sentence = "the quick brown fox"
# in — membership test
print("fox" in sentence) # True
print("cat" in sentence) # False
# find — first index (-1 if absent)
print(sentence.find("quick")) # 4
print(sentence.find("cat")) # -1
# count — occurrences
print(sentence.count("o")) # 2 (brown, fox)
print(sentence.count("the")) # 1
以上:in 返回布尔值(True/False),find() 返回第一个匹配的索引(方法,method),count() 计算不重叠的出现次数。"brown" 中的 o 和 "fox" 中的 o 各算一次,共 2 次。in returns bool. find() returns the start index of the first match (or -1). count() counts non-overlapping occurrences: "o" appears in "brown" (index 12) and "fox" (index 17), for a total of 2.
What does "banana".count("an") return?"banana".count("an") 返回什么?
"an" appears twice in "banana", at positions 1 and 3. count() counts non-overlapping matches, so the answer is 2."an" 在 "banana" 中出现两次:位置 1 和 3。count() 计算不重叠匹配,因此答案是 2。
What does "hello".find("x") return?"hello".find("x") 返回什么?
find() returns -1 when the substring is not found. "x" is not in "hello", so the result is -1.find() 在未找到子字符串时返回 -1。"x" 不在 "hello" 中,因此结果是 -1。
find() returns an integer: the index if found, or -1 if not found. It never returns False or None. Use in for a boolean membership test.find() 返回整数:找到则返回索引,未找到则返回 -1。它不会返回 False 或 None。用 in 进行布尔成员测试。
Looping Over Characters遍历字符
- For-each (direct)for-each(直接遍历): for ch in s: ch takes each character in turn. Cleaner when you only need the character, not its index.ch 依次取每个字符。当只需要字符而不需要索引时更简洁。
- For-range (index access)for-range(索引访问): for i in range(len(s)): use s[i] inside the loop. Needed when you must know the position.在循环内用 s[i]。当必须知道位置时使用。
Count how many vowels are in the word "education" using both loop styles.用两种循环方式统计单词 "education" 中的元音字母数量。
word = "education"
vowels = "aeiou"
# Style 1: for-each (cleaner)
count1 = 0
for ch in word:
    if ch in vowels:
        count1 += 1
print(count1) # 5
# Style 2: for-range (index-based)
count2 = 0
for i in range(len(word)):
    if word[i] in vowels:
        count2 += 1
print(count2) # 5
以上:Style 1 用 for ch in word 逐字符(character)遍历;Style 2 用索引(index)访问。"education" 中的元音:e, u, a, i, o,共 5 个。两种方式结果相同。Both styles produce the same result. "education": e(vowel), d, u(vowel), c, a(vowel), t, i(vowel), o(vowel), n, giving 5 vowels. Style 1 is more Pythonic; Style 2 is needed when the position matters (e.g., to build a new string that changes the character at position i, since word[i] itself cannot be assigned to).
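Character swapping is one task where the index-based loop earns its place. The sketch below is a hypothetical helper, `swap_chars`, that builds a new string with two positions exchanged; it assumes i and j are valid non-negative indices.

```python
def swap_chars(s, i, j):
    """Return a copy of s with the characters at indices i and j exchanged."""
    result = ""
    for k in range(len(s)):      # k is the position, s[k] the character
        if k == i:
            result += s[j]       # position i receives the character from j
        elif k == j:
            result += s[i]       # position j receives the character from i
        else:
            result += s[k]       # every other position is copied unchanged
    return result


print(swap_chars("code", 0, 3))        # eodc
print(swap_chars("education", 0, 8))   # nducatioe
```

A for-each loop could not do this on its own, because it never learns which position each character came from.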
What does the following code print?以下代码打印什么?
s = "hello"
result = ""
for ch in s:
    result = ch + result
print(result)
Each character is prepended to result. Trace: ""→"h"→"eh"→"leh"→"lleh"→"olleh". The loop reverses the string.每个字符被前置(放在)result 之前。追踪:""→"h"→"eh"→"leh"→"lleh"→"olleh"。该循环反转了字符串。
Each iteration executes result = ch + result, which places the new character before the accumulated string. After all 5 characters, the string is reversed: "olleh".每次迭代执行 result = ch + result,将新字符放在累积字符串之前。经过所有 5 个字符后,字符串被反转:"olleh"。
for i in range(len(s)) gives both the index i and the character s[i], so you know where each character is. Use this when position matters (e.g., swapping characters, building a new string with changes at specific positions).for i in range(len(s)) 同时给出索引 i 和字符 s[i],因此你知道每个字符的位置。当位置重要时使用此方式(例如,交换字符、在特定位置构建带更改的新字符串)。
for ch in s gives the character but not its index. When you need the position, use for i in range(len(s)) and access the character as s[i].for ch in s 给出字符但不给出其索引。当你需要位置时,使用 for i in range(len(s)) 并以 s[i] 访问字符。
Text-Processing Worked Example: Word Count文本处理综合例题:词频统计
- Step 1: Normalise.第 1 步——规范化。 lower() + strip() so "The" and "the" count as the same word.lower() + strip() 使"The"和"the"被计为同一个词。
- Step 2: Split.第 2 步——分割。 split() with no argument splits on any whitespace, collapsing multiple spaces.无参数的 split() 按任意空白分割,合并多个空格。
- Step 3: Count.第 3 步——计数。 Loop over the word list and count occurrences, or use a dictionary for frequency per word.遍历词列表并计数,或用字典统计每个词的频率。
Given a sentence, count the total number of words, find the most frequent word, and report whether a target word appears.给定一个句子,统计总词数,找出出现最多的词,并报告目标词是否出现。
sentence = "the cat sat on the mat and the cat"
# Step 1: Normalise
text = sentence.lower().strip()
# Step 2: Split into words
words = text.split() # ["the","cat","sat","on","the","mat","and","the","cat"]
# Step 3: Count total words
total = len(words)
print(f"Total words: {total}") # Total words: 9
# Step 4: Frequency of each unique word
freq = {}
for w in words:
    if w in freq:
        freq[w] += 1
    else:
        freq[w] = 1
print(freq)
# {'the': 3, 'cat': 2, 'sat': 1, 'on': 1, 'mat': 1, 'and': 1}
# Step 5: Most frequent word
top = max(freq, key=freq.get)
print(f"Most frequent: '{top}' ({freq[top]} times)")
# Most frequent: 'the' (3 times)
# Step 6: Target search
target = "cat"
print(f"'{target}' appears: {'yes' if target in words else 'no'}")
# 'cat' appears: yes
以上流水线:lower() 规范化,split() 分割为词列表,len() 计总词数,字典统计词频(频率),max() 找最高频词,in 检查目标词是否存在。这综合了本单元所有字符串技能。Pipeline: lower() normalises case, split() tokenises (splits into word list), len() counts total words, the dict loop counts frequency per word, max() finds the top word, and in tests membership. This integrates all six earlier sections.
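The same pipeline can be wrapped in reusable functions. This is a sketch only; the names `word_frequencies` and `most_frequent` are hypothetical, and `dict.get(word, 0)` stands in for the if/else branch used in the worked example.

```python
def word_frequencies(sentence):
    """Lowercase the sentence, split on whitespace, and count each word."""
    freq = {}
    for word in sentence.lower().split():
        freq[word] = freq.get(word, 0) + 1   # 0 is the default for a new word
    return freq


def most_frequent(freq):
    """Return the word with the highest count (first one wins on a tie)."""
    return max(freq, key=freq.get)


freq = word_frequencies("The cat sat on the mat and THE cat")
print(freq)                  # {'the': 3, 'cat': 2, 'sat': 1, 'on': 1, 'mat': 1, 'and': 1}
print(most_frequent(freq))   # the
print("cat" in freq)         # True
```

most_frequent assumes a non-empty dictionary; max() raises ValueError on an empty one, so a real program would check for an empty sentence first.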
Given text = "To be or not to be", what does text.lower().split() return?给定 text = "To be or not to be",text.lower().split() 返回什么?
lower() first converts to all lowercase → "to be or not to be". Then split() splits on whitespace into a list: ["to","be","or","not","to","be"].lower() 先转为全小写 → "to be or not to be"。再 split() 按空白分割成列表:["to","be","or","not","to","be"]。
Chaining .lower().split() applies both operations in sequence. Result is a list of lowercase words.链式调用 .lower().split() 按顺序应用两个操作。结果是小写词的列表。
After running the word-count program on "the cat sat on the mat and the cat", what is the total number of words?对 "the cat sat on the mat and the cat" 运行词频统计程序后,总词数是多少?
Splitting on spaces gives "the","cat","sat","on","the","mat","and","the","cat": 9 tokens. len(words) = 9.按空格分割:"the","cat","sat","on","the","mat","and","the","cat"——9 个词元。len(words) = 9。
Going deeper: removing punctuation before word-count (Honors, ICS3C / CSE1120)深入:计数前去除标点(荣誉,ICS3C / CSE1120)
Real text contains commas, periods, and quotation marks, so "word," and "word" count as different tokens. The standard fix is to strip punctuation from each token before inserting it into the frequency dictionary. One approach: word.strip(".,!?;:\""). A more general solution uses the str.translate() method with str.maketrans() to remove all punctuation at once. ON ICS3C A1.2 names "extract a portion" and "count occurrences" as the assessed skills; handling punctuation cleanly is the distinction between a basic and a polished solution.真实文本包含逗号、句号和引号,因此 "word," 和 "word" 会被计为不同的词元。标准修复是在将每个词元插入频率字典之前去除标点。一种方法:word.strip(".,!?;:\"")。更通用的解决方案使用 str.translate() 方法配合 str.maketrans() 一次去除所有标点。ON ICS3C A1.2 将"提取一部分"和"计算出现次数"列为评估技能;干净地处理标点是基础解法和精良解法之间的区别。
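Both approaches mentioned above can be sketched as follows. The helper name `strip_punctuation` is hypothetical; `string.punctuation` is the standard-library constant listing ASCII punctuation, and the three-argument form of `str.maketrans` marks those characters for deletion.

```python
import string


def strip_punctuation(text):
    """Remove every ASCII punctuation character from text."""
    table = str.maketrans("", "", string.punctuation)   # third argument: characters to delete
    return text.translate(table)


sentence = 'The cat, the "mat", and the cat!'
words = strip_punctuation(sentence.lower()).split()
print(words)   # ['the', 'cat', 'the', 'mat', 'and', 'the', 'cat']

# Per-token alternative: strip() only trims the ends of each token.
token = "word,".strip(".,!?;:\"")
print(token)   # word
```

One trade-off to be aware of: translate() also deletes apostrophes and hyphens inside words, so "don't" becomes "dont", whereas strip() leaves interior characters alone.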
Exam Strategy and Common Pitfalls考试策略与常见陷阱
- Zero-based indexing.从零开始的索引。 The first character is always index 0, not 1. Every year, students lose marks by writing s[1] for the first character. Negative indices count from the right: s[-1] is always the last character.第一个字符始终是索引 0,不是 1。每年都有学生因写 s[1] 作为第一个字符而失分。负索引从右计数:s[-1] 始终是最后一个字符。
- Stop is excluded in slices.切片中的 stop 不包含。 s[0:3] gives three characters (indices 0, 1, 2), NOT four. To get the first n characters, write s[:n].s[0:3] 给出三个字符(索引 0、1、2),不是四个。要获取前 n 个字符,写 s[:n]。
- Strings are immutable.字符串是不可变的。 s.upper() returns a new string; it does NOT change s. Always assign the result: s = s.upper(). Forgetting this is the most common method-question mistake.s.upper() 返回新字符串——它不改变 s。必须赋值:s = s.upper()。忘记这点是方法题最常见的错误。
- find() returns -1, not False.find() 返回 -1,不是 False。 Use in for a boolean test; use find() when you need the position. On the AP CSP exam, the reference-sheet string operations use 1-based indexing, so adjust by one relative to Python.布尔测试用 in;需要位置时用 find()。在 AP CSP 考试中:参考手册字符串操作使用从 1 开始的索引——相对于 Python 调整一位。
- Evaluate method chains left to right.从左到右求值方法链。 For a chain like "hello".upper().replace("L","X"), each method runs on the result of the one before it. Step 1: "HELLO". Step 2: "HEXXO". Show each intermediate value for full marks.对于链式调用如 "hello".upper().replace("L","X"):从左到右求值。第 1 步:"HELLO"。第 2 步:"HEXXO"。展示每个中间值以获得满分。
- Count characters, not words.计字符,不是词。 When predicting s.count("x"), count every occurrence of the literal character, not the number of words. Overlapping substrings are NOT counted: "aaa".count("aa") = 1, not 2.预测 s.count("x") 时,计算字面字符的每次出现,不是词数。重叠子字符串不计:"aaa".count("aa") = 1,不是 2。
- Convert numbers before + concatenation.在 + 拼接前转换数字。 "Score: " + 92 raises a TypeError. Use str(92) or an f-string instead: f"Score: {92}"."Score: " + 92 会引发 TypeError。改用 str(92) 或 f-string:f"Score: {92}"。
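Several of the pitfalls above can be reproduced in a few lines. The sketch below uses illustrative variable names only and stores each result so it can be inspected.

```python
# Pitfall: calling a method without assigning the result changes nothing.
s = "hello"
s.upper()                      # the new string is thrown away
after_bare_call = s            # still "hello"
s = s.upper()                  # reassign to keep it

# Pitfall: find() returns integers, so do not use it as a True/False test.
missing = "hello".find("x")    # -1, which an if-test treats as true
at_start = "hello".find("h")   # 0, which an if-test treats as false

# Pitfall: str + int raises TypeError; convert the number first.
try:
    label = "Score: " + 92
except TypeError:
    label = "Score: " + str(92)

# Method chains run left to right; naming each step makes the trace explicit.
step1 = "hello".upper()
step2 = step1.replace("L", "X")

print(after_bare_call, s)   # hello HELLO
print(missing, at_start)    # -1 0
print(label)                # Score: 92
print(step1, step2)         # HELLO HEXXO
```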
Flashcards闪卡
- s[0] vs s[-1]?s[0] 和 s[-1] 分别是什么? s[0] = first character. s[-1] = last character. Indices start at 0; negative indices count from the right.s[0] = 第一个字符。s[-1] = 最后一个字符。索引从 0 开始;负索引从右计数。
- s[start:stop] includes start, excludes stop. s[:3] = first 3. s[2:] = from index 2 to end. s[::-1] = reversed.s[start:stop]——包含 start,不含 stop。s[:3] = 前 3 个。s[2:] = 索引 2 到末尾。s[::-1] = 反转。
- s.upper() / s.lower(): returns a new string with all letters uppercase / lowercase. The original is unchanged (immutable). Assign to use it: s = s.lower().返回所有字母大写/小写的新字符串。原字符串不变(不可变)。需赋值才能使用:s = s.lower()。
- s.split(sep): splits the string at sep, returns a list. No argument = split on any whitespace. "a,b".split(",") → ["a","b"].在 sep 处分割字符串,返回列表。无参数 = 按任意空白分割。"a,b".split(",") → ["a","b"]。
- sep.join(iterable): the inverse of split. Joins the list elements into one string with sep between each pair. "-".join(["a","b","c"]) → "a-b-c".split 的逆操作。将列表元素用 sep 连接成一个字符串。"-".join(["a","b","c"]) → "a-b-c"。
- s.replace(old, new): replaces every old with new, returns a new string. "aa".replace("a","b") → "bb".将每个 old 替换为 new,返回新字符串。"aa".replace("a","b") → "bb"。
- f"text {expr}": the expression inside {} is evaluated and inserted. f"Score: {90+2}" → "Score: 92". Modern Python preferred.f"text {expr}"——{} 内的表达式被求值并插入。f"Score: {90+2}" → "Score: 92"。现代 Python 推荐。
- sub in s: returns True if sub is found anywhere in s, else False. Fastest membership test. "cat" in "concatenate" → True.如果 sub 在 s 中出现则返回 True,否则 False。最快的成员测试。"cat" in "concatenate" → True。
- s.find(sub): returns the index of the first occurrence, or -1 if not found. Never returns False or None.返回第一次出现的索引,未找到则返回 -1。不会返回 False 或 None。
- s.count(sub): returns the number of non-overlapping occurrences of sub in s. "banana".count("a") → 3. ON ICS3C A1.2 "count occurrences."返回 sub 在 s 中不重叠的出现次数。"banana".count("a") → 3。ON ICS3C A1.2"计算出现次数"。
- for ch in s: gives character, no index. for i in range(len(s)): gives index and character via s[i]. Use range when position matters.for ch in s:给出字符,无索引。for i in range(len(s)):通过 s[i] 给出索引和字符。当位置重要时用 range。
- Word-count pipeline: 1. lower() normalise. 2. split() tokenise. 3. Loop and count with a dict. ON ICS3C A1.2 "count occurrences of a word."1. lower() 规范化。2. split() 分词。3. 用字典循环计数。ON ICS3C A1.2"计算单词出现次数"。
- You cannot do s[0] = "X"; it raises TypeError. Every method returns a new string. To "modify" s, reassign: s = s.replace("a","b").不能执行 s[0] = "X"——引发 TypeError。每个方法都返回新字符串。要"修改" s,需重新赋值:s = s.replace("a","b")。
Practice Quiz综合测验
Given s = "Data", what is s[1]?给定 s = "Data",s[1] 是什么?
Zero-based indexing: D=0, a=1, t=2, a=3. s[1] = "a".从零开始索引:D=0, a=1, t=2, a=3。s[1] = "a"。
What does "Science"[1:5] return?"Science"[1:5] 返回什么?
"Science": S=0, c=1, i=2, e=3, n=4, c=5, e=6. [1:5] = indices 1,2,3,4 = "cien". Stop index 5 excluded."Science":S=0,c=1,i=2,e=3,n=4,c=5,e=6。[1:5] = 索引 1,2,3,4 = "cien"。stop 索引 5 不含。
What does "hello world".split() return?"hello world".split() 返回什么?
split() with no argument splits on whitespace, returning a list of words: ["hello", "world"].无参数的 split() 按空白分割,返回词列表:["hello", "world"]。
split() returns a list of tokens, not individual characters and not a number.split() 返回词元列表,不是单个字符,也不是数字。
Given name = "Alice", what does f"Hello {name}!" evaluate to?给定 name = "Alice",f"Hello {name}!" 的结果是什么?
In an f-string, {name} is evaluated and replaced with the value of name = "Alice". Result: "Hello Alice!".在 f-string 中,{name} 被求值并替换为 name 的值 = "Alice"。结果:"Hello Alice!"。
f-strings evaluate {} expressions at runtime. {name} becomes "Alice".f-string 在运行时对 {} 表达式求值。{name} 变为 "Alice"。
What does "mississippi".count("ss") return?"mississippi".count("ss") 返回什么?
count("ss") counts non-overlapping "ss" substrings. In "mississippi": positions 2 and 5 = 2 occurrences.count("ss") 计算不重叠的 "ss" 子字符串。在 "mississippi" 中:位置 2 和 5 = 2 次出现。
The following code is meant to count the vowels in "code" and print 2. What does it actually print?以下代码本应统计 "code" 中的元音并打印 2。实际打印什么?
s = "code"
count = 0
for ch in s:
    if ch in "aeiou":
        count = 1
print(count)
The bug is count = 1 instead of count += 1. Every time a vowel is found, count is reset to 1, not incremented. "code" has vowels "o" and "e"; the last one sets count = 1. Fix: count += 1.错误是 count = 1 而不是 count += 1。每次找到元音时,count 被重置为 1,而不是递增。"code" 有元音 "o" 和 "e";最后一个将 count 设为 1。修复:count += 1。
count = 1 resets count to 1 on every vowel, so it does not accumulate. The last vowel "e" sets count = 1, so print(count) outputs 1.count = 1 在每次遇到元音时将 count 重置为 1——不累积。最后一个元音 "e" 将 count 设为 1,因此 print(count) 输出 1。
Readiness Checklist准备就绪清单
Tick each item when you can do it cold, without notes, on a first attempt.能在无笔记、首次尝试下完成,再勾选每一项。
- Given any string, state the character at a specific positive index and negative index without running code. Explain why indices start at 0. 🇺🇸 AP CSP 3.4 / 🇨🇦 AB CSE1110 2.4.3给定任意字符串,不运行代码说出特定正索引和负索引处的字符。解释为何索引从 0 开始。🇺🇸 AP CSP 3.4 / 🇨🇦 AB CSE1110 2.4.3
- Predict the output of any slice expression s[start:stop] or s[start:stop:step] by applying the "stop is excluded" rule. 🇺🇸 AP CSP Skill 4.B / 🇨🇦 ON ICS3C A1.2通过应用"stop 不含"规则,预测任意切片表达式 s[start:stop] 或 s[start:stop:step] 的输出。🇺🇸 AP CSP 技能 4.B / 🇨🇦 ON ICS3C A1.2
- Use upper(), lower(), split(), join(), and replace() correctly and explain why assigning the return value is required. 🇨🇦 ON ICS3C A1.2 / AB CSE1120 2.6正确使用 upper()、lower()、split()、join() 和 replace(),并解释为何需要赋值返回值。🇨🇦 ON ICS3C A1.2 / AB CSE1120 2.6
- Write a formatted output line using an f-string that embeds at least one variable and one computed expression. 🇨🇦 AB CSE1110 2.4.6 / AB CSE1120 2.6用 f-string 写出包含至少一个变量和一个计算表达式的格式化输出行。🇨🇦 AB CSE1110 2.4.6 / AB CSE1120 2.6
- Distinguish the return values of in (bool), find() (int / -1), and count() (int); predict each for a given string and target without running the code. 🇺🇸 AP CSP 3.4 / 🇨🇦 ON ICS3C A1.2区分 in(布尔值)、find()(整数 / -1)和 count()(整数)的返回值;对给定字符串和目标,不运行代码预测各自结果。🇺🇸 AP CSP 3.4 / 🇨🇦 ON ICS3C A1.2
- Write a for-each loop that counts vowels in a string from scratch, without using built-in methods. 🇨🇦 ON ICS3C A1.2 / BC CS10从头编写一个 for-each 循环,不使用内置方法统计字符串中的元音。🇨🇦 ON ICS3C A1.2 / BC CS10
- Explain when to use for ch in s vs for i in range(len(s)), and give a task that requires each form. 🇨🇦 ON ICS3C A1.2解释何时使用 for ch in s,何时使用 for i in range(len(s)),并各举一个需要该形式的任务。🇨🇦 ON ICS3C A1.2
- Write the three-step word-count pipeline (lower(), split(), dict loop) for a given sentence and produce the correct frequency dictionary. 🇨🇦 ON ICS3C A1.2 "count occurrences of a word"为给定句子编写三步词频统计流水线(lower()、split()、字典循环)并生成正确的频率字典。🇨🇦 ON ICS3C A1.2"计算单词出现次数"
- Trace a string-expression chain (e.g., " Hello ".strip().lower().replace("h","j")) step by step, writing the intermediate string after each method call. 🇺🇸 AP CSP Skill 4.B逐步追踪字符串表达式链(如 " Hello ".strip().lower().replace("h","j")),在每次方法调用后写出中间字符串。🇺🇸 AP CSP 技能 4.B
- Extract the year, month, and day from a date string of the form "YYYY-MM-DD" using slicing, without using split(). 🇨🇦 ON ICS3C A1.2 "extract a portion"仅用切片(不用 split())从 "YYYY-MM-DD" 格式的日期字符串中提取年、月、日。🇨🇦 ON ICS3C A1.2"提取一部分"
- Honors (ICS3C / CSE1120): Write a program that removes all punctuation from a sentence before doing a word-count, producing clean frequency data. 🇨🇦 ON ICS3C A1.2 / AB CSE1120 2.6荣誉(ICS3C / CSE1120):编写一个程序,在词频统计前去除句子中的所有标点,生成干净的频率数据。🇨🇦 ON ICS3C A1.2 / AB CSE1120 2.6
What This Feeds Into本单元的去向
String manipulation is a prerequisite skill for almost every real program. Within the HS Computer Science series, the data-structures and searching units depend on your ability to tokenise text, compare strings, and loop character by character. Both downstream AP courses test string operations directly. The link below points to the AP CSA guide confirmed to exist in this repo; AP CSA Unit 1 introduces Java String methods (which closely parallel the Python methods in this guide).字符串操作是几乎所有真实程序的先决技能。在 HS Computer Science 系列中,数据结构和搜索单元依赖于你分词、比较字符串和逐字符循环的能力。两门下游 AP 课程都直接测试字符串操作。以下链接指向本仓库中已确认存在的 AP CSA 指南;AP CSA Unit 1 介绍 Java String 方法(与本指南中的 Python 方法高度对应)。
Within High School Computer Science.在 HS Computer Science 内部。
The Data Structures unit builds on split() and tokenisation to process lists of strings. The Searching and Sorting unit applies the character-comparison logic you practiced in §5 to sort strings alphabetically. The Software Development unit uses the same lower().strip() normalisation pipeline for input validation. AP CSP Topic 3.4 (Strings) directly tests the indexing, slicing, and method knowledge from §1–§3 of this guide.数据结构单元在 split() 和分词的基础上处理字符串列表。搜索与排序单元将 §5 中练习的字符比较逻辑应用于按字母顺序排序字符串。软件开发单元将相同的 lower().strip() 规范化流水线用于输入验证。AP CSP 主题 3.4(字符串)直接测试本指南 §1–§3 中的索引、切片和方法知识。
AP feeder links (existing in this repo).AP 衔接链接(本仓库中已有)。
AP CSP Topic 3.4 (Strings) is the direct exam anchor: the reference-sheet string operations (substring, concatenation, length) all appear in the AP CSP free-response and multiple-choice sections. Skill 4.B ("Determine the result of code segments") means you must be able to predict a string expression's output without running it — exactly the tracing practice in the worked examples here.AP CSP 主题 3.4(字符串)是直接的考试依据:参考手册字符串操作(子字符串、拼接、长度)都出现在 AP CSP 简答题和选择题部分。技能 4.B("确定代码段的结果")意味着你必须能够不运行代码就预测字符串表达式的输出——正是本指南例题中练习的追踪技能。